{"doi":"10.1038/s41597-023-02258-0","title":"The Translational Data Catalog - discoverable biomedical datasets","abstract":"The discoverability of datasets resulting from the diverse range of translational and biomedical projects remains sporadic. It is especially difficult for datasets emerging from pre-competitive projects, often due to the legal constraints of data-sharing agreements, and the different priorities of the private and public sectors. The Translational Data Catalog is a single discovery point for the projects and datasets produced by a number of major research programmes funded by the European Commission. Funded by and rooted in a number of these European private-public partnership projects, the Data Catalog is built on FAIR-enabling community standards, and its mission is to ensure that datasets are findable and accessible by machines. Here we present its creation, content, value and adoption, as well as the next steps for sustainability within the ELIXIR ecosystem.","journal":"Scientific Data","year":2023,"id":9732,"datarank":0.3246184725514056,"base_score":1.9459101490553132,"endowment":1.9459101490553132,"self_citation_contribution":0.29188652235829704,"citation_network_contribution":0.032731950193108525,"self_endowment_contribution":0.29188652235829704,"citer_contribution":0.032731950193108525,"corpus_percentile":47.76240846216436,"corpus_rank":643,"citation_count":6,"citer_count":1,"citers_with_citation_signal":1,"citers_with_endowment":1,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9013,"is_oa":true,"file_count":0,"downloads":21,"has_version_chain":false,"published_date":"2023-07-20","fair_score":66.4583,"fair_percentile":96.3060686015831,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":289,"name":"Rocca-Serra, Philippe","orcid":"0000-0001-9853-5668","position":1,"is_corresponding":false},{"id":2540,"name":"Daniel J. B. Clarke","orcid":"0000-0003-3471-7416","position":2,"is_corresponding":false},{"id":56840,"name":"Nirmeen Sallam","orcid":null,"position":3,"is_corresponding":false},{"id":56841,"name":"François Ancien","orcid":"0000-0002-0895-1746","position":4,"is_corresponding":false},{"id":56842,"name":"Abetare Shabani","orcid":null,"position":5,"is_corresponding":false},{"id":56843,"name":"Saeideh Asariardakani","orcid":null,"position":6,"is_corresponding":false},{"id":31883,"name":"Pinar Alper","orcid":"0000-0002-2224-0780","position":7,"is_corresponding":false},{"id":56844,"name":"Soumyabrata Ghosh","orcid":"0000-0003-0659-6733","position":8,"is_corresponding":false},{"id":2487,"name":"Tony Burdett","orcid":"0000-0002-2513-5396","position":9,"is_corresponding":false},{"id":290,"name":"Susanna‐Assunta Sansone","orcid":"0000-0001-5306-5690","position":10,"is_corresponding":false},{"id":2482,"name":"Wei Gu","orcid":"0000-0003-3951-6680","position":11,"is_corresponding":false},{"id":35903,"name":"Venkata Satagopam","orcid":"0000-0002-6532-5880","position":12,"is_corresponding":false},{"id":2498,"name":"Danielle Welter","orcid":"0000-0003-1058-2668","position":0,"is_corresponding":true}],"reference_count":15,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"37474618","pmcid":"PMC10359386","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":90.0,"fair_a":67.5,"fair_i":50.0,"fair_r":58.3333,"fair_zscore":1.9219,"fair_rationale":{"fair_score":66.46,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":90.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper describes rich, machine-readable metadata via the DATS model, Bioschemas markup, and ontology annotations, all of which are explicitly machine-processable."}]},"A":{"name":"Accessible","score":67.5,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"Access protocols are described for hosted datasets (via REMS and identity providers), but for external datasets only DUO codes and licensing are encouraged without providing direct access request functionality."}]},"I":{"name":"Interoperable","score":50.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper explicitly uses community standards (DATS, DCAT, schema.org, OBO ontologies, DUO, Bioschemas) and provides JSON-LD context files for semantic interoperability."}]},"R":{"name":"Reusable","score":58.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"downloads=21","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.833,"signal":null,"rationale":"A clear data-availability statement and license (CC-BY 4.0) are provided, but reproducibility is limited as not all datasets are publicly accessible and no explicit reproducibility checks are mentioned."}]}},"suggestions":["Implement direct data access request functionality for externally hosted datasets to improve accessibility.","Provide explicit reproducibility documentation, e.g., containerized analysis pipelines for the cataloged datasets.","Add formal versioning of dataset entries with a persistent identifier for each version to enhance reusability over time.","Extend Bioschemas profiles to cover 'Project' and 'Study' entities to improve findability via semantic search.","Publish a machine-readable data use policy (e.g., a standard license file) alongside each dataset entry."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:52:04.391687Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}