{"doi":"10.1101/2024.09.24.614721","title":"Complex genetic variation in nearly complete human genomes","abstract":"Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.","journal":null,"year":2024,"id":1762,"datarank":0.7232425000724075,"base_score":3.258096538021482,"endowment":3.258096538021482,"self_citation_contribution":0.4887144807032224,"citation_network_contribution":0.23452801936918516,"self_endowment_contribution":0.4887144807032224,"citer_contribution":0.23452801936918516,"corpus_percentile":55.00406834825061,"corpus_rank":554,"citation_count":27,"citer_count":16,"citers_with_citation_signal":8,"citers_with_endowment":8,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9328,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2024-09-25","fair_score":41.4583,"fair_percentile":20.734388742304308,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":19693,"name":"Peter Ebert","orcid":"0000-0001-7441-532X","position":1,"is_corresponding":false},{"id":19694,"name":"Peter A. Audano","orcid":"0000-0002-5187-0415","position":2,"is_corresponding":false},{"id":19695,"name":"Mark Loftus","orcid":"0000-0002-6279-6855","position":3,"is_corresponding":false},{"id":19696,"name":"David Porubsky","orcid":"0000-0001-8414-8966","position":4,"is_corresponding":false},{"id":19697,"name":"Feyza Yilmaz","orcid":"0000-0001-8795-5800","position":6,"is_corresponding":false},{"id":19698,"name":"Pille Hallast","orcid":"0000-0002-0588-3987","position":7,"is_corresponding":false},{"id":19699,"name":"Timofey Prodanov","orcid":"0000-0001-7469-6651","position":8,"is_corresponding":false},{"id":19700,"name":"DongAhn Yoo","orcid":"0000-0003-0033-3721","position":9,"is_corresponding":false},{"id":19701,"name":"Carolyn A. Paisie","orcid":"0000-0003-4306-4154","position":10,"is_corresponding":false},{"id":19702,"name":"William T. Harvey","orcid":"0000-0003-0646-7528","position":11,"is_corresponding":false},{"id":19703,"name":"Xuefang Zhao","orcid":"0000-0003-4036-9577","position":12,"is_corresponding":false},{"id":19704,"name":"Gianni V. Martino","orcid":"0009-0005-4143-7465","position":13,"is_corresponding":false},{"id":19705,"name":"Mir Henglin","orcid":"0000-0003-3604-4868","position":14,"is_corresponding":false},{"id":19706,"name":"Katherine M. Munson","orcid":"0000-0001-8413-6498","position":15,"is_corresponding":false},{"id":19707,"name":"K Siddique-e Rabbani","orcid":"0009-0004-1448-2167","position":16,"is_corresponding":false},{"id":2121,"name":"Chen-Shan Chin","orcid":"0000-0003-4394-2455","position":17,"is_corresponding":false},{"id":19708,"name":"Bida Gu","orcid":"0000-0001-8575-997X","position":18,"is_corresponding":false},{"id":19709,"name":"Hufsah Ashraf","orcid":"0000-0001-7760-0627","position":19,"is_corresponding":false},{"id":19710,"name":"Olanrewaju Austine-Orimoloye","orcid":"0000-0002-4390-1437","position":20,"is_corresponding":false},{"id":19711,"name":"Parithi Balachandran","orcid":"0000-0003-3256-1403","position":21,"is_corresponding":false},{"id":19712,"name":"Marc Jan Bonder","orcid":"0000-0002-8431-3180","position":22,"is_corresponding":false},{"id":1288,"name":"Haoyu Cheng","orcid":"0000-0002-9209-5793","position":23,"is_corresponding":false},{"id":13351,"name":"Zechen Chong","orcid":"0000-0001-5750-1808","position":24,"is_corresponding":false},{"id":19713,"name":"Jonathan Crabtree","orcid":"0000-0002-7286-5690","position":25,"is_corresponding":false},{"id":42990,"name":"Grigorios Georgolopoulos","orcid":"0000-0002-9906-4797","position":26,"is_corresponding":false},{"id":19714,"name":"Lisbeth A. Guethlein","orcid":"0000-0002-1301-8301","position":27,"is_corresponding":false},{"id":19715,"name":"Patrick Hasenfeld","orcid":"0000-0003-2319-2482","position":28,"is_corresponding":false},{"id":6314,"name":"Hickey, Glenn","orcid":"0000-0002-2280-9404","position":29,"is_corresponding":false},{"id":19716,"name":"Kendra Hoekzema","orcid":"0000-0002-8058-0177","position":30,"is_corresponding":false},{"id":19717,"name":"Sarah E. Hunt","orcid":"0000-0002-8350-1235","position":31,"is_corresponding":false},{"id":19718,"name":"Matthew Jensen","orcid":"0000-0002-5153-8543","position":32,"is_corresponding":false},{"id":19719,"name":"Yunzhe Jiang","orcid":"0000-0001-8768-0050","position":33,"is_corresponding":false},{"id":2118,"name":"Sergey Koren","orcid":"0000-0002-1472-8962","position":34,"is_corresponding":false},{"id":19721,"name":"Chong Li","orcid":"0000-0003-1949-4074","position":36,"is_corresponding":false},{"id":30887,"name":"Alexandra P. Lewis","orcid":"0000-0002-6195-4786","position":37,"is_corresponding":false},{"id":18070,"name":"Jiaqi Li","orcid":"0000-0003-1587-5910","position":38,"is_corresponding":false},{"id":19722,"name":"Paul J. Norman","orcid":"0000-0001-8370-7703","position":39,"is_corresponding":false},{"id":19723,"name":"Keisuke K. Oshima","orcid":"0009-0002-2229-8998","position":40,"is_corresponding":false},{"id":30899,"name":"Nathan D. Olson","orcid":"0000-0003-2585-3037","position":41,"is_corresponding":false},{"id":2122,"name":"Adam  M. Phillippy","orcid":"0000-0003-2983-8934","position":42,"is_corresponding":false},{"id":19724,"name":"Nicholas R. Pollock","orcid":"0000-0003-0114-528X","position":43,"is_corresponding":false},{"id":13956,"name":"Tobias Rausch","orcid":"0000-0001-5773-5620","position":44,"is_corresponding":false},{"id":30918,"name":"Allison A. Regier","orcid":"0000-0002-1932-8714","position":45,"is_corresponding":false},{"id":19726,"name":"Stephan Scholz","orcid":"0009-0000-0268-1979","position":46,"is_corresponding":false},{"id":19727,"name":"Yuwei Song","orcid":"0000-0003-2537-4343","position":47,"is_corresponding":false},{"id":19728,"name":"Arda Soylev","orcid":"0000-0003-2198-1920","position":48,"is_corresponding":false},{"id":19729,"name":"Arvis Sulovari","orcid":"0000-0003-4354-9020","position":49,"is_corresponding":false},{"id":19730,"name":"Likhitha Surapaneni","orcid":"0000-0002-0575-7673","position":50,"is_corresponding":false},{"id":19731,"name":"Vasiliki Tsapalou","orcid":"0009-0002-3588-7003","position":51,"is_corresponding":false},{"id":19732,"name":"Weichen Zhou","orcid":"0000-0003-4755-1072","position":52,"is_corresponding":false},{"id":14882,"name":"Ying Zhou","orcid":"0000-0002-8107-3927","position":53,"is_corresponding":false},{"id":19733,"name":"Qihui Zhu","orcid":"0000-0003-2401-8443","position":54,"is_corresponding":false},{"id":6273,"name":"Michael C. Zody","orcid":"0000-0001-6594-7199","position":55,"is_corresponding":false},{"id":19734,"name":"Ryan E. Mills","orcid":"0000-0003-3425-6998","position":56,"is_corresponding":false},{"id":19735,"name":"Scott E. Devine","orcid":"0000-0001-7629-8331","position":57,"is_corresponding":false},{"id":19736,"name":"Xinghua Shi","orcid":"0000-0003-4662-3177","position":58,"is_corresponding":false},{"id":19737,"name":"Mike E Talkowski","orcid":null,"position":59,"is_corresponding":false},{"id":19738,"name":"Mark J. P. Chaisson","orcid":"0000-0001-5395-1457","position":60,"is_corresponding":false},{"id":19739,"name":"Alexander T Dilthey","orcid":"0000-0002-6394-4581","position":61,"is_corresponding":false},{"id":19740,"name":"Miriam K. Konkel","orcid":"0000-0002-3190-1667","position":62,"is_corresponding":false},{"id":65595,"name":"Natalia Koralewska","orcid":"0000-0001-7096-0128","position":63,"is_corresponding":false},{"id":19741,"name":"Charles Lee","orcid":"0000-0001-7317-6662","position":64,"is_corresponding":false},{"id":19742,"name":"Christine R. Beck","orcid":"0000-0001-7821-8489","position":65,"is_corresponding":false},{"id":2125,"name":"Evan E. Eichler","orcid":"0000-0002-8246-4014","position":66,"is_corresponding":false},{"id":6321,"name":"Tobias Marschall","orcid":"0000-0002-9376-1030","position":67,"is_corresponding":false},{"id":19743,"name":"Young-Jun Kwon","orcid":"0000-0002-5024-2134","position":68,"is_corresponding":false},{"id":19692,"name":"Glennis A. Logsdon","orcid":"0000-0003-2396-0656","position":0,"is_corresponding":true}],"reference_count":0,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"39372794","pmcid":"PMC11451754","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"green","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":52.5,"fair_a":55.0,"fair_i":25.0,"fair_r":33.3333,"fair_zscore":-0.3395,"fair_rationale":{"fair_score":41.46,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":52.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper provides data accessions and a public release directory URL, but does not describe machine-readable metadata or structured metadata schemas."}]},"A":{"name":"Accessible","score":55.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"Data access is described through public repositories and a Globus endpoint, but the paper lacks a clear, step-by-step protocol for automated access (e.g., API instructions)."}]},"I":{"name":"Interoperable","score":25.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"Standard formats (e.g., FASTA, VCF) are implied and some community identifiers (e.g., ORCID) are used, but no formal reference to standard terminologies, ontologies, or semantic interoperability measures is provided."}]},"R":{"name":"Reusable","score":33.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.5,"signal":null,"rationale":"The paper includes a data-availability statement, a CC BY license, and mentions code availability on GitHub, but lacks a formal software license for all code and does not provide a detailed reproducibility capsule or container for all analyses."}]}},"suggestions":["Provide machine-readable metadata as structured files (e.g., JSON-LD, schema.org) alongside the data release.","Include a clear, step-by-step protocol for automated data access (e.g., wget commands, API endpoints).","Adopt and cite standard ontologies/vocabularies (e.g., SO, GENO) for describing variant types and genomic features.","Add explicit software licenses (e.g., MIT, GPL) to all GitHub repositories, not just PAV.","Create a reproducible computational workflow or container (e.g., Docker, Singularity) that can be run end-to-end to reproduce the results."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:45:36.555109Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}