{"doi":"10.1093/nar/gks1195","title":"GenBank","abstract":"GenBank® (http://www.ncbi.nlm.nih.gov) is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assigns accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.","journal":"Nucleic Acids Research","year":2012,"id":10957,"datarank":18.553747355149323,"base_score":8.064007347096661,"endowment":8.064007347096661,"self_citation_contribution":1.2096011020644992,"citation_network_contribution":17.344146253084823,"self_endowment_contribution":1.2096011020644992,"citer_contribution":17.344146253084823,"corpus_percentile":92.35150528885272,"corpus_rank":95,"citation_count":3276,"citer_count":198,"citers_with_citation_signal":198,"citers_with_endowment":198,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9502,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2012-11-26","fair_score":59.1667,"fair_percentile":92.10642040457344,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":89203,"name":"Mark Cavanaugh","orcid":null,"position":1,"is_corresponding":false},{"id":89204,"name":"Karen Clark","orcid":"0000-0003-4403-1477","position":2,"is_corresponding":false},{"id":19769,"name":"Ilene Karsch-Mizrachi","orcid":null,"position":3,"is_corresponding":false},{"id":42676,"name":"David J. Lipman","orcid":"0009-0002-3443-2023","position":4,"is_corresponding":false},{"id":17121,"name":"James Ostell","orcid":null,"position":5,"is_corresponding":false},{"id":89205,"name":"Eric W. Sayers","orcid":"0000-0001-8394-3802","position":6,"is_corresponding":false},{"id":89206,"name":"D. A. Benson","orcid":null,"position":7,"is_corresponding":false},{"id":11706,"name":"Ilene Karsch‐Mizrachi","orcid":"0000-0002-0289-7101","position":8,"is_corresponding":false},{"id":89202,"name":"Dennis A. Benson","orcid":null,"position":0,"is_corresponding":true}],"reference_count":13,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"23193287","pmcid":"PMC3531190","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by-nc","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":77.5,"fair_a":67.5,"fair_i":50.0,"fair_r":41.6667,"fair_zscore":1.2623,"fair_rationale":{"fair_score":59.17,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":77.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper describes structured metadata for sequence records (accessions, taxonomy, annotations) and machine-readable formats (ASN.1, flat file, FASTA), but does not explicitly discuss machine-actionable metadata beyond identifiers and flat-file structures."}]},"A":{"name":"Accessible","score":67.5,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper provides clear, multi-protocol access via FTP, web interfaces (Entrez, BLAST), Aspera, and daily updates, but does not specify authentication or authorization details for all access methods, assuming open public access."}]},"I":{"name":"Interoperable","score":50.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper explicitly uses standard identifiers (INSDC accession.version, GI numbers), controlled vocabularies (e.g., /experimental, /inference), formats (ASN.1, FASTA, GenBank flat file), and daily data exchange with international partners (ENA, DDBJ), demonstrating full interoperability."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"The paper includes a data-availability statement (GenBank as public database), an open-access license (Creative Commons Attribution-NonCommercial 3.0), and describes quality assurance and versioning, but does not specify a formal citation requirement or provide explicit reproducibility steps for derived analyses."}]}},"suggestions":["Provide explicit machine-readable metadata in a standardized schema (e.g., JSON-LD, schema.org) for improved findability.","Include a dedicated data-availability section with a persistent identifier (DOI) for the specific database version used.","Clarify authentication mechanisms (if any) for restricted access subsets (e.g., confidential sequences).","Define a canonical citation format for the database itself, not just the article, to enhance reuse tracking.","Offer a formal reproducibility checklist or containerized workflow for analyses using GenBank data."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:29:34.837425Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}