{"doi":"10.1038/s41592-023-02035-2","title":"Population-level integration of single-cell datasets enables multi-scale analysis across samples","abstract":"The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli on population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.","journal":"Nature Methods","year":2023,"id":10224,"datarank":1.9792604182791975,"base_score":4.6913478822291435,"endowment":4.6913478822291435,"self_citation_contribution":0.7037021823343717,"citation_network_contribution":1.2755582359448259,"self_endowment_contribution":0.7037021823343717,"citer_contribution":1.2755582359448259,"corpus_percentile":null,"corpus_rank":null,"citation_count":108,"citer_count":86,"citers_with_citation_signal":59,"citers_with_endowment":59,"datacite_reuse_total":0,"is_dataset":false,"is_dataset_confidence":0.1513,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2023-10-09","fair_score":null,"fair_percentile":null,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":1697,"name":"Soroor Hediyeh-zadeh","orcid":"0000-0001-7513-6779","position":1,"is_corresponding":false},{"id":22238,"name":"Amir Ali Moinfar","orcid":"0009-0005-4680-2724","position":2,"is_corresponding":false},{"id":22237,"name":"Marco Wagenstetter","orcid":null,"position":3,"is_corresponding":false},{"id":11447,"name":"Luke Zappia","orcid":"0000-0001-7744-8565","position":4,"is_corresponding":false},{"id":1694,"name":"Mohammad Lotfollahi","orcid":"0000-0001-6858-7985","position":5,"is_corresponding":false},{"id":42,"name":"Fabian Joachim Theis","orcid":"0000-0002-2419-1943","position":6,"is_corresponding":false},{"id":12137,"name":"Carlo De Donno","orcid":"0000-0002-9553-0121","position":0,"is_corresponding":true}],"reference_count":68,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":null,"fair_a":null,"fair_i":null,"fair_r":null,"fair_zscore":null,"fair_rationale":null,"fair_model":null,"fair_agent_version":null,"fair_fulltext_source":null,"fair_has_llm":null,"fair_computed_at":null,"clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}