{"doi":"10.1101/2022.07.09.499321","title":"A Draft Human Pangenome Reference","abstract":"The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.","journal":null,"year":2022,"id":12510,"datarank":2.850870587584712,"base_score":4.304065093204169,"endowment":4.304065093204169,"self_citation_contribution":0.6456097639806255,"citation_network_contribution":2.2052608236040867,"self_endowment_contribution":0.6456097639806255,"citer_contribution":2.2052608236040867,"corpus_percentile":67.45321399511798,"corpus_rank":401,"citation_count":74,"citer_count":49,"citers_with_citation_signal":38,"citers_with_endowment":38,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9401,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2022-07-09","fair_score":30.0,"fair_percentile":11.499560246262094,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":24434,"name":"Mobin 
Asri","orcid":"0000-0002-7194-5138","position":1,"is_corresponding":false},{"id":24456,"name":"Daniel Doerr","orcid":"0000-0002-3720-6227","position":3,"is_corresponding":false},{"id":13532,"name":"Eoghan Harrington","orcid":"0000-0002-4850-2486","position":4,"is_corresponding":false},{"id":6314,"name":"Hickey, Glenn","orcid":"0000-0002-2280-9404","position":5,"is_corresponding":false},{"id":24603,"name":"Julian K. Lucas","orcid":"0000-0001-9163-2756","position":7,"is_corresponding":false},{"id":6315,"name":"Jean Monlong","orcid":"0000-0002-9737-5516","position":8,"is_corresponding":false},{"id":24433,"name":"Lucinda Antonacci-Fulton","orcid":null,"position":9,"is_corresponding":false},{"id":24442,"name":"Silvia Buonaiuto","orcid":"0000-0002-7423-7110","position":10,"is_corresponding":false},{"id":24447,"name":"Pi-Chuan Chang","orcid":"0000-0003-3021-6446","position":11,"is_corresponding":false},{"id":1288,"name":"Haoyu Cheng","orcid":"0000-0002-9209-5793","position":12,"is_corresponding":false},{"id":3632,"name":"Justin Chu","orcid":"0000-0003-0549-4997","position":13,"is_corresponding":false},{"id":6318,"name":"Jordan M. Eizenga","orcid":"0000-0001-8345-8356","position":15,"is_corresponding":false},{"id":757,"name":"Xiaowen Feng","orcid":"0000-0002-2291-1361","position":16,"is_corresponding":false},{"id":30863,"name":"Christian Fischer","orcid":"0000-0002-5023-1080","position":17,"is_corresponding":false},{"id":1766,"name":"Robert S. Fulton","orcid":"0009-0006-6820-3404","position":18,"is_corresponding":false},{"id":24469,"name":"Nanibaa’ A. Garrison","orcid":"0000-0002-6228-3216","position":19,"is_corresponding":false},{"id":24473,"name":"Cristian Groza","orcid":"0000-0001-6624-5404","position":20,"is_corresponding":false},{"id":19201,"name":"Andrea Guarracino","orcid":"0000-0001-9744-131X","position":21,"is_corresponding":false},{"id":19702,"name":"William T. 
Harvey","orcid":"0000-0003-0646-7528","position":22,"is_corresponding":false},{"id":24594,"name":"Simon Heumos","orcid":"0000-0003-3326-817X","position":23,"is_corresponding":false},{"id":30880,"name":"Thibaut Hourlier","orcid":"0000-0003-4894-7773","position":24,"is_corresponding":false},{"id":24505,"name":"Fergal J. Martin","orcid":"0000-0002-1672-050X","position":28,"is_corresponding":false},{"id":24512,"name":"Matthew W. Mitchell","orcid":"0000-0002-6947-0495","position":29,"is_corresponding":false},{"id":19706,"name":"Katherine M. Munson","orcid":"0000-0001-8413-6498","position":30,"is_corresponding":false},{"id":30896,"name":"Moses Njagi Mwaniki","orcid":"0000-0002-4858-2375","position":31,"is_corresponding":false},{"id":30897,"name":"Maria Nattestad","orcid":"0000-0002-4796-2894","position":32,"is_corresponding":false},{"id":30898,"name":"Hugh E. Olsen","orcid":"0000-0002-7293-8853","position":33,"is_corresponding":false},{"id":30901,"name":"Alice B. Popejoy","orcid":"0000-0003-4976-0628","position":34,"is_corresponding":false},{"id":19696,"name":"David Porubsky","orcid":"0000-0001-8414-8966","position":35,"is_corresponding":false},{"id":24530,"name":"Pjotr Prins","orcid":"0000-0002-8021-9162","position":36,"is_corresponding":false},{"id":30906,"name":"Jonas A. Sibbesen","orcid":"0000-0002-5528-0236","position":37,"is_corresponding":false},{"id":24554,"name":"Chad Tomlinson","orcid":"0000-0001-9905-6159","position":38,"is_corresponding":false},{"id":24559,"name":"Flavia Villani","orcid":"0000-0003-3633-0610","position":39,"is_corresponding":false},{"id":24561,"name":"Mitchell R. Vollger","orcid":"0000-0002-8651-1615","position":40,"is_corresponding":false},{"id":6320,"name":"Human Pangenome Reference Consortium","orcid":null,"position":41,"is_corresponding":false},{"id":30869,"name":"Konstantinos Billis","orcid":"0000-0001-8568-4306","position":42,"is_corresponding":false},{"id":19738,"name":"Mark J. P. 
Chaisson","orcid":"0000-0001-5395-1457","position":43,"is_corresponding":false},{"id":20075,"name":"David B. Jaffe","orcid":"0000-0001-8739-568X","position":44,"is_corresponding":false},{"id":2122,"name":"Adam M. Phillippy","orcid":"0000-0003-2983-8934","position":45,"is_corresponding":false},{"id":30911,"name":"Aleksey V. Zimin","orcid":"0000-0001-5091-3092","position":46,"is_corresponding":false},{"id":2125,"name":"Evan E. Eichler","orcid":"0000-0002-8246-4014","position":47,"is_corresponding":false},{"id":14377,"name":"Steven J.M. Jones","orcid":"0000-0003-3394-2208","position":48,"is_corresponding":false},{"id":21329,"name":"Erich D. Jarvis","orcid":"0000-0001-8931-5049","position":49,"is_corresponding":false},{"id":30894,"name":"Jennifer McDaniel","orcid":"0000-0003-1987-0914","position":50,"is_corresponding":false},{"id":24564,"name":"Ting Wang","orcid":"0000-0002-6800-242X","position":51,"is_corresponding":false},{"id":62052,"name":"Nicholas F. Parrish","orcid":"0000-0002-6971-8016","position":52,"is_corresponding":false},{"id":6321,"name":"Tobias Marschall","orcid":"0000-0002-9376-1030","position":53,"is_corresponding":false},{"id":24475,"name":"Leanne Haggerty","orcid":"0000-0001-8843-3596","position":54,"is_corresponding":false},{"id":30887,"name":"Alexandra P. Lewis","orcid":"0000-0002-6195-4786","position":55,"is_corresponding":false},{"id":30899,"name":"Nathan D. 
Olson","orcid":"0000-0003-2585-3037","position":56,"is_corresponding":false},{"id":30913,"name":"Haley Abel","orcid":"0000-0003-3110-8041","position":58,"is_corresponding":false},{"id":30916,"name":"Wen‐Wei Liao","orcid":"0000-0001-8183-213X","position":0,"is_corresponding":true}],"reference_count":127,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"green","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":37.0,"fair_a":53.0,"fair_i":10.0,"fair_r":20.0,"fair_zscore":-1.376,"fair_rationale":{"fair_score":30.0,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":37.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"datacite=0, pmcid=False, pmid=False","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper provides descriptive metadata about the pangenome (e.g., 47 assemblies, 119 Mbp added) but lacks machine-readable metadata such as structured identifiers or formal metadata schemas."}]},"A":{"name":"Accessible","score":53.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear 
data/code access protocol","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper does not specify a clear protocol for accessing the data or code, such as a repository URL, persistent identifier, or download instructions."}]},"I":{"name":"Interoperable","score":10.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper uses standard genomic formats implicitly (e.g., assemblies, alignments) but does not explicitly state the use of standard vocabularies, identifiers, or file formats."}]},"R":{"name":"Reusable","score":20.0,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.333,"signal":null,"rationale":"The paper lacks a data-availability statement, license, or explicit reproducibility details, though it describes the data and methods in general terms."}]}},"suggestions":["Add a data-availability statement 
with a persistent identifier (e.g., DOI) and repository URL for the pangenome assemblies and code.","Include machine-readable metadata using a standard schema (e.g., DCAT, schema.org) and structured identifiers for each assembly.","Specify the file formats (e.g., FASTA, VCF, BAM) and controlled vocabularies (e.g., SO, GENCODE) used for interoperability.","Provide a clear license (e.g., CC0, MIT) and detailed reproducibility steps, including software versions and parameters.","Describe the access protocol (e.g., open access, authentication requirements) and any restrictions on data use."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"abstract_only"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"abstract_only","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:41:21.726642Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}