{"doi":"10.1101/gr.278985.124","title":"Full-resolution HLA and KIR gene annotations for human genome assemblies","abstract":"The human leukocyte antigen (HLA) genes and the killer cell immunoglobulin-like receptor (KIR) genes are critical to immune responses and are associated with many immune-related diseases. Because these genes lie in highly polymorphic regions, they are difficult to study with traditional short-read alignment-based methods. Although modern long-read assemblers can often assemble these genes, using existing tools to annotate HLA and KIR genes in these assemblies remains a nontrivial task. Here, we describe Immuannot, a new computational tool to annotate the gene structures of HLA and KIR genes and to type the allele of each gene. Applying Immuannot to 56 regional and 212 whole-genome assemblies from previous studies, we annotated 9931 HLA and KIR genes and found that almost half of these genes, 4068, have novel sequences compared with the current Immuno Polymorphism Database (IPD). These novel gene sequences are represented by 2664 distinct alleles, some of which contain nonsynonymous variations, resulting in 92 novel protein sequences. We demonstrate the complex haplotype structures at the two loci and report the linkage between HLA/KIR haplotypes and gene alleles. 
We anticipate that Immuannot will speed up the discovery of new HLA/KIR alleles and enable the association of HLA/KIR haplotype structures with clinical outcomes in the future.","journal":"Genome Research","year":2024,"id":6863,"datarank":0.760649695051699,"base_score":3.2188758248682006,"endowment":3.2188758248682006,"self_citation_contribution":0.48283137373023016,"citation_network_contribution":0.2778183213214688,"self_endowment_contribution":0.48283137373023016,"citer_contribution":0.2778183213214688,"corpus_percentile":55.49227013832384,"corpus_rank":548,"citation_count":25,"citer_count":24,"citers_with_citation_signal":17,"citers_with_endowment":17,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.8297,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2024-06-05","fair_score":46.6667,"fair_percentile":43.733509234828496,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":12209,"name":"Li Song","orcid":"0000-0002-0180-7426","position":1,"is_corresponding":false},{"id":30887,"name":"Alexandra P. 
Lewis","orcid":"0000-0002-6195-4786","position":2,"is_corresponding":false},{"id":14882,"name":"Ying Zhou","orcid":"0000-0002-8107-3927","position":0,"is_corresponding":true}],"reference_count":64,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"38839374","pmcid":"PMC11610593","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":52.5,"fair_a":55.0,"fair_i":37.5,"fair_r":41.6667,"fair_zscore":0.1316,"fair_rationale":{"fair_score":46.67,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":52.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper provides a text abstract and methodology but lacks structured, machine-readable metadata such as semantic annotations, structured data dictionaries, or a formal metadata schema."}]},"A":{"name":"Accessible","score":55.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The 
software is available on GitHub under an MIT license and data on Zenodo, but there is no explicit description of an authentication or access protocol for the data/code beyond the URLs."}]},"I":{"name":"Interoperable","score":37.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper uses standard file formats (GTF), community databases (IPD-IMGT/HLA, IPD-KIR), and standard identifiers (allele names), but does not state use of formal ontology terms or vocabulary services for interoperability at a higher level."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"The paper includes a clear software link, an MIT license, and Supplemental Code for reproducibility, but the data-availability statement is limited to a Zenodo link without specifying a formal data use 
license or comprehensive provenance metadata for all outputs."}]}},"suggestions":["Add structured, machine-readable metadata in the paper (e.g., JSON-LD or RDFa) describing the dataset, tool, and key results using community vocabularies like EDAM or schema.org.","Describe the exact authentication or access method required for each data/code repository (e.g., 'no login required for download' or 'available via API without key').","Deposit final annotation files, including novel allele sequences, in a community repository that assigns standard persistent identifiers (e.g., IPD updates or GenBank) and specify the exact file formats and schemas used.","Explicitly state the licenses for the generated data (e.g., 'CC-BY 4.0') and for the software (already MIT), and include a formal data-availability statement with accession numbers for all supplementary data.","Provide a containerized version (e.g., Docker/Singularity) of Immuannot to ensure computational reproducibility without environment setup issues."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:46:37.461164Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}