{"doi":"10.1093/nar/gkaf687","title":"Proteogenomics-enabled discovery of novel small open reading frame (sORF)-encoded polypeptides in human and mouse tissues","abstract":"Small open reading frames (sORFs) encode an emerging class of functional proteins less than 100 amino acids in length. However, sORFs are incompletely characterized in mice and humans. The development of proteomics and Ribo-seq techniques has enabled the discovery of a number of sORF-encoded peptides (SEPs), but previous proteogenomics studies have been limited to a few cell lines or tissues. Given these limitations, a potentially vast number of sORFs remains to be discovered. We collected community-scale previously published proteomics data including one billion experimental spectra derived from a wide range of mouse and human tissues in order to identify novel sORFs and reveal the tissue expression status of novel and recently annotated sORF-encoded proteins. We have detected several novel sORFs in specific tissues, including a conserved protein-coding upstream overlapping ORF in HNRNPUL2 expressed in human lymphocytes, which may hold important biological functions. This work introduces a simple and efficient filtration strategy to detect novel sORFs. Our workflow will likely prove useful for future studies on sORFs in humans and other animals.","journal":"Nucleic Acids Research","year":2025,"id":10876,"datarank":0.10397207708399181,"base_score":0.6931471805599453,"endowment":0.6931471805599453,"self_citation_contribution":0.10397207708399181,"citation_network_contribution":0.0,"self_endowment_contribution":0.10397207708399181,"citer_contribution":0.0,"corpus_percentile":null,"corpus_rank":null,"citation_count":1,"citer_count":0,"citers_with_citation_signal":0,"citers_with_endowment":0,"datacite_reuse_total":0,"is_dataset":false,"is_dataset_confidence":0.0638,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2025-07-19","fair_score":12.5,"fair_percentile":0.15391380826737028,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":88769,"name":"Yuting Xie","orcid":"0000-0002-5577-6522","position":1,"is_corresponding":false},{"id":88770,"name":"Lingshuo Wang","orcid":"0009-0003-0825-4371","position":2,"is_corresponding":false},{"id":292,"name":"Irwin Jungreis","orcid":"0000-0002-3197-5367","position":3,"is_corresponding":false},{"id":88771,"name":"Tong Ou","orcid":"0000-0002-4872-9621","position":4,"is_corresponding":false},{"id":14693,"name":"Sharon L. R. Kardia","orcid":"0000-0002-9853-3379","position":5,"is_corresponding":false},{"id":88772,"name":"Jia Wang","orcid":"0009-0007-6556-5706","position":6,"is_corresponding":false},{"id":88773,"name":"Yafeng Zhu","orcid":"0000-0003-1947-9026","position":7,"is_corresponding":false},{"id":88768,"name":"Mei Yang","orcid":"0000-0002-3748-8834","position":0,"is_corresponding":true}],"reference_count":66,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":20.0,"fair_a":30.0,"fair_i":0.0,"fair_r":0.0,"fair_zscore":-2.9589,"fair_rationale":{"fair_score":12.5,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":20.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"datacite=0, pmcid=False, pmid=False","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"The paper does not mention any machine-readable metadata, such as structured data descriptions or use of persistent identifiers like DOIs for datasets."}]},"A":{"name":"Accessible","score":30.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"No protocol for accessing the underlying data or code is provided; the text only states that data were collected from published sources without specifying how to access them."}]},"I":{"name":"Interoperable","score":0.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"The paper does not specify use of standard formats, controlled vocabularies, or community identifiers for the data or results."}]},"R":{"name":"Reusable","score":0.0,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"not a dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.0,"signal":null,"rationale":"There is no data-availability statement, license, or description of reproducibility measures; the text only describes the study's findings and methods."}]}},"suggestions":["Provide a data-availability statement with a persistent identifier (e.g., DOI) for the collected spectra and results.","Deposit the workflow and filtration strategy in a public repository with a clear license.","Use standard file formats (e.g., mzML for spectra, FASTA for sequences) and reference ontologies for tissue and cell types.","Include machine-readable metadata (e.g., structured JSON-LD) describing the datasets and their provenance.","Specify a clear access protocol, such as a URL or repository link, for all data and code used in the study."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v1","fulltext_source":"abstract_only"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v1","fair_fulltext_source":"abstract_only","fair_has_llm":true,"fair_computed_at":"2026-06-14T20:30:25.740628Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}