Efficient and precise single-cell reference atlas mapping with Symphony

Joyce B. Kang, Aparna Nathan, Nghia Millard, Laurie Rumker, D. Branch Moody, Ilya Korsunsky, Soumya Raychaudhuri.
bioRxiv.   
Abstract
Recent advances in single-cell technologies and integration algorithms make it possible to construct large, comprehensive reference atlases from multiple datasets encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map new query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony, a novel algorithm for building compressed, integrated reference atlases of ≥10^6 cells and enabling efficient query mapping within seconds. Based on a linear mixture model framework, Symphony precisely localizes query cells within a low-dimensional reference embedding without the need to reintegrate the reference cells, facilitating the downstream transfer of many types of reference-defined annotations to the query cells. We demonstrate the power of Symphony by (1) mapping a query containing multiple levels of experimental design to predict pancreatic cell types in human and mouse, (2) localizing query cells along a smooth developmental trajectory of human fetal liver hematopoiesis, and (3) harnessing a multimodal CITE-seq reference atlas to infer query surface protein expression in memory T cells. Symphony will enable the sharing of comprehensive integrated reference atlases in a convenient, portable format that powers fast, reproducible querying and downstream analyses.
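
To make the idea of annotation transfer concrete, the following is a minimal sketch of what can happen once query cells already sit in the same low-dimensional embedding as the reference. It is not Symphony's mapping step and does not implement the linear mixture model. It assumes, purely for illustration, that labels are transferred by a k-nearest-neighbor majority vote. The function name `knn_transfer_labels`, the toy coordinates, and the labels "alpha" and "beta" are all hypothetical.

```python
from collections import Counter
import math


def knn_transfer_labels(ref_embedding, ref_labels, query_embedding, k=5):
    """Give each query cell the majority label of its k nearest reference cells.

    Illustrative assumption: reference and query coordinates already live in
    the same low-dimensional space (the mapping itself is not shown here).
    """
    predicted = []
    for query_cell in query_embedding:
        # Euclidean distance from this query cell to every reference cell.
        dists = [math.dist(query_cell, ref_cell) for ref_cell in ref_embedding]
        # Indices of the k closest reference cells.
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
        # Majority vote over the neighbors' labels.
        votes = Counter(ref_labels[i] for i in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


# Toy 2-D embedding with two well-separated groups of reference cells.
ref_embedding = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
    (5.0, 5.1), (4.9, 4.8), (5.2, 5.0),
]
ref_labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]
query_embedding = [(0.1, 0.0), (5.1, 4.9)]

predicted = knn_transfer_labels(ref_embedding, ref_labels, query_embedding, k=3)
print(predicted)
```

The brute-force distance computation is chosen for readability only. At the atlas sizes described above, an approximate or indexed neighbor search would be the more practical choice.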