{"doi":"10.1101/2025.07.03.662928","title":"An expanded reference catalog of translated open reading frames for biomedical research","abstract":"Non-canonical (i.e., unannotated) open reading frames (ncORFs) have until recently been omitted from reference genome annotations, despite evidence of their translation, limiting their incorporation into biomedical research. To address this, in 2022, we initiated the TransCODE consortium and built the first community-driven consensus catalog of human ncORFs, which was openly distributed to the research community via Ensembl-GENCODE. While this catalog represented a starting point for reference ncORF annotation, major technical and scientific issues remained. In particular, this initial catalogue had no standardized framework to judge the evidence of translation for individual ncORFs. Here, we present an expanded and refined catalog of the human reference annotation of ncORFs. By incorporating more datasets and by lifting constraints on ORF length and start-codon, we define a comprehensive set of 28,359 ncORFs that is nearly four times the size of the previous catalog. Furthermore, to aid users who wish to work with ncORFs with the strongest and most reproducible signals of translation, we utilized a data-driven framework (i.e. translation signature scores) to assess the accumulated evidence for any individual ncORF. Using this approach, we derive a subset of 7,888 ncORFs with translation evidence on par with canonical protein-coding genes, which we refer to as the Primary set. This set can serve as a reliable reference for downstream analyses and validation, with a particular emphasis on high quality. Overall, this update reflects continual community-driven efforts to make ncORFs accessible and actionable to the broader research public and further iterations of the catalog will continue to expand and refine this resource.","journal":null,"year":2025,"id":9240,"datarank":0.2985064673411729,"base_score":1.9459101490553132,"endowment":1.9459101490553132,"self_citation_contribution":0.29188652235829704,"citation_network_contribution":0.006619944982875882,"self_endowment_contribution":0.29188652235829704,"citer_contribution":0.006619944982875882,"corpus_percentile":47.030105777054516,"corpus_rank":652,"citation_count":7,"citer_count":3,"citers_with_citation_signal":1,"citers_with_endowment":1,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9295,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2025-07-07","fair_score":41.4583,"fair_percentile":20.734388742304308,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":17867,"name":"Jorge Ruiz-Orera","orcid":"0000-0002-8317-0034","position":1,"is_corresponding":false},{"id":17785,"name":"Jack A. S. Tierney","orcid":"0000-0002-5331-3604","position":2,"is_corresponding":false},{"id":17807,"name":"Jim Clauwaert","orcid":"0000-0002-5876-1406","position":3,"is_corresponding":false},{"id":6362,"name":"Eric W. Deutsch","orcid":"0000-0001-8732-0928","position":4,"is_corresponding":false},{"id":17786,"name":"M. Mar Albà","orcid":"0000-0002-7963-7375","position":5,"is_corresponding":false},{"id":17788,"name":"Julie L. Aspden","orcid":"0000-0002-8537-6204","position":6,"is_corresponding":false},{"id":17938,"name":"Pavel V. Baranov","orcid":"0000-0001-9017-0270","position":7,"is_corresponding":false},{"id":78835,"name":"Ariel Alejandro Bazzini","orcid":"0000-0002-2251-5174","position":8,"is_corresponding":false},{"id":78836,"name":"Elspeth A. Bruford","orcid":"0000-0002-8380-5247","position":9,"is_corresponding":false},{"id":17796,"name":"Marie A. Brunet","orcid":"0000-0001-5973-3522","position":10,"is_corresponding":false},{"id":78837,"name":"Tristan Cardon","orcid":"0000-0003-1751-0528","position":11,"is_corresponding":false},{"id":17800,"name":"Anne-Ruxandra Carvunis","orcid":"0000-0002-6474-6413","position":12,"is_corresponding":false},{"id":78838,"name":"Claudio Casola","orcid":"0000-0003-4853-1866","position":13,"is_corresponding":false},{"id":17805,"name":"Jyoti Sharma Choudhary","orcid":"0000-0003-0881-5477","position":14,"is_corresponding":false},{"id":17908,"name":"Alexandre David","orcid":"0000-0003-3365-1339","position":15,"is_corresponding":false},{"id":17820,"name":"Pouya Faridi","orcid":"0000-0002-2712-3356","position":16,"is_corresponding":false},{"id":17821,"name":"Ivo Fierro-Monti","orcid":"0000-0002-5460-2117","position":17,"is_corresponding":false},{"id":78839,"name":"Isabelle Fournier","orcid":"0000-0003-1096-5044","position":18,"is_corresponding":false},{"id":29021,"name":"Michael G. FitzGerald","orcid":"0000-0002-0488-0530","position":19,"is_corresponding":false},{"id":42990,"name":"Grigorios Georgolopoulos","orcid":"0000-0002-9906-4797","position":20,"is_corresponding":false},{"id":17830,"name":"Norbert Hübner","orcid":"0000-0002-1218-6223","position":21,"is_corresponding":false},{"id":19719,"name":"Yunzhe Jiang","orcid":"0000-0001-8768-0050","position":22,"is_corresponding":false},{"id":14693,"name":"Sharon L. R. Kardia","orcid":"0000-0002-9853-3379","position":23,"is_corresponding":false},{"id":78840,"name":"Leron W. Kok","orcid":"0009-0008-0841-2313","position":24,"is_corresponding":false},{"id":17848,"name":"Thomas F. Martinez","orcid":"0000-0002-4011-8164","position":25,"is_corresponding":false},{"id":17853,"name":"Gerben Menschaert","orcid":"0000-0002-7575-2085","position":26,"is_corresponding":false},{"id":21455,"name":"Pengyu Ni","orcid":"0000-0001-9878-5480","position":27,"is_corresponding":false},{"id":5925,"name":"Sandra Orchard","orcid":"0000-0002-8878-3972","position":28,"is_corresponding":false},{"id":17865,"name":"Xavier Roucou","orcid":"0000-0001-9370-5584","position":29,"is_corresponding":false},{"id":11731,"name":"Joel Rozowsky","orcid":"0000-0002-3565-0762","position":30,"is_corresponding":false},{"id":78841,"name":"Michel Salzet","orcid":"0000-0003-4318-0817","position":31,"is_corresponding":false},{"id":78842,"name":"Mauro Siragusa","orcid":"0000-0002-5862-5156","position":32,"is_corresponding":false},{"id":17784,"name":"Michał I. Świrski","orcid":"0000-0002-8585-136X","position":34,"is_corresponding":false},{"id":17931,"name":"Eivind Valen","orcid":"0000-0003-1840-6108","position":35,"is_corresponding":false},{"id":17887,"name":"Juan Antonio Vizcaino","orcid":"0000-0002-3905-4335","position":36,"is_corresponding":false},{"id":78844,"name":"Aaron Wacholder","orcid":"0000-0001-8739-0029","position":37,"is_corresponding":false},{"id":3210,"name":"Wei Wu","orcid":"0000-0002-6556-067X","position":38,"is_corresponding":false},{"id":17893,"name":"Zhi Xie","orcid":"0000-0002-5589-4836","position":39,"is_corresponding":false},{"id":78845,"name":"Robert L. Moritz","orcid":"0000-0002-3216-9447","position":41,"is_corresponding":false},{"id":78847,"name":"Sebastiaan van Heesch","orcid":"0000-0001-9593-1980","position":43,"is_corresponding":false},{"id":35299,"name":"John R. Prensner","orcid":"0000-0002-7024-636X","position":44,"is_corresponding":false},{"id":17874,"name":"Sarah A. Slavoff","orcid":"0000-0002-4443-2070","position":47,"is_corresponding":false},{"id":59139,"name":"Yucheng T. Yang","orcid":"0000-0002-6873-5279","position":48,"is_corresponding":false},{"id":24517,"name":"Shinichi Morishita","orcid":"0000-0002-6201-8885","position":49,"is_corresponding":false},{"id":52738,"name":"Owen J. L. Rackham","orcid":"0000-0002-4390-0872","position":50,"is_corresponding":false},{"id":17804,"name":"Sonia Chothani","orcid":"0000-0002-1010-7069","position":0,"is_corresponding":true}],"reference_count":0,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"40672165","pmcid":"PMC12265627","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":52.5,"fair_a":55.0,"fair_i":25.0,"fair_r":33.3333,"fair_zscore":-0.3395,"fair_rationale":{"fair_score":41.46,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":52.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper provides a DOI and mentions a URL for data availability, but does not describe any machine-readable metadata (e.g., structured data, ontologies, or standardized identifiers beyond basic gene names)."}]},"A":{"name":"Accessible","score":55.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper states data will be available at a specific URL upon final publication, but does not provide a clear, immediate access protocol (e.g., direct download link, repository, or license) for the current preprint version."}]},"I":{"name":"Interoperable","score":25.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper uses standard formats (e.g., GENCODE, Ensembl) and some identifiers (e.g., gene names), but does not specify use of standard vocabularies or machine-readable formats for the ncORF catalog itself."}]},"R":{"name":"Reusable","score":33.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.5,"signal":null,"rationale":"The paper includes a data-availability statement and describes the catalog's content, but lacks a clear license for reuse and does not provide full reproducibility details (e.g., exact code, parameters, or versioned data files)."}]}},"suggestions":["Provide a direct, permanent link to the data (e.g., Zenodo or Figshare) with a DOI in the abstract or data-availability section.","Include a clear open license (e.g., CC0 or CC-BY) for the catalog to enable reuse.","Deposit the catalog in a machine-readable format (e.g., GFF3, BED, or JSON) with accompanying metadata schema.","Add a code availability statement with a link to the analysis scripts and versioned software used.","Specify the exact thresholds and parameters for translation signature scores in a structured, reusable way (e.g., as a supplementary table or configuration file)."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"unpaywall_pdf"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"unpaywall_pdf","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:51:16.917326Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}