{"doi":"10.1093/database/bat029","title":"The MetaboLights repository: curation challenges in metabolomics","abstract":"MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/metabolights.","journal":"Database","year":2013,"id":11406,"datarank":2.21359296616098,"base_score":3.9512437185814275,"endowment":3.9512437185814275,"self_citation_contribution":0.5926865577872142,"citation_network_contribution":1.6209064083737654,"self_endowment_contribution":0.5926865577872142,"citer_contribution":1.6209064083737654,"corpus_percentile":65.17493897477624,"corpus_rank":429,"citation_count":51,"citer_count":38,"citers_with_citation_signal":36,"citers_with_endowment":36,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9562,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2013-01-01","fair_score":59.1667,"fair_percentile":92.10642040457344,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":7597,"name":"Kenneth Haug","orcid":"0000-0003-3168-4145","position":1,"is_corresponding":false},{"id":22181,"name":"Pablo Conesa","orcid":"0000-0003-0575-7718","position":2,"is_corresponding":false},{"id":71,"name":"Janna Hastings","orcid":"0000-0002-3469-4923","position":3,"is_corresponding":false},{"id":22183,"name":"Mark Williams","orcid":"0000-0002-1294-1110","position":4,"is_corresponding":false},{"id":22182,"name":"Tejasvi Mahendraker","orcid":null,"position":5,"is_corresponding":false},{"id":284,"name":"Eamonn Maguire","orcid":"0000-0002-7277-7834","position":6,"is_corresponding":false},{"id":5765,"name":"Philippe Rocca-Serra","orcid":null,"position":8,"is_corresponding":false},{"id":73,"name":"Christoph Steinbeck","orcid":"0000-0001-6966-0814","position":10,"is_corresponding":false},{"id":286,"name":"Alejandra Noemí González Beltrán","orcid":"0000-0003-3499-8262","position":11,"is_corresponding":false},{"id":289,"name":"Rocca-Serra, Philippe","orcid":"0000-0001-9853-5668","position":12,"is_corresponding":false},{"id":290,"name":"Susanna‐Assunta Sansone","orcid":"0000-0001-5306-5690","position":13,"is_corresponding":false},{"id":22184,"name":"Reza M. Salek","orcid":"0000-0001-8604-1732","position":0,"is_corresponding":true}],"reference_count":23,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"23630246","pmcid":"PMC3638156","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":77.5,"fair_a":80.0,"fair_i":37.5,"fair_r":41.6667,"fair_zscore":1.2623,"fair_rationale":{"fair_score":59.17,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":77.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper describes rich metadata capture using ISA-Tab, ontologies, and external identifiers (e.g., ChEBI, SMILES, InChI), but does not provide evidence of machine-readable metadata beyond the ISA-Tab format."}]},"A":{"name":"Accessible","score":80.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper clearly states multiple access protocols: direct download, FTP, and online search, with explicit URLs and terms of use, ensuring open access."}]},"I":{"name":"Interoperable","score":37.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper uses standard formats (ISA-Tab, mzTab), ontologies (ChEBI, MSI), and identifiers (ChEBI, PubChem), but does not demonstrate full interoperability with other systems beyond planned future work."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"The paper provides a data-availability statement (open access under CC BY-NC), mentions reproducibility challenges, and offers source code, but lacks a formal license for the data and explicit reproducibility instructions."}]}},"suggestions":["Include explicit machine-readable metadata (e.g., JSON-LD or RDF) to enhance findability.","Add a formal data license (e.g., CC0 or CC BY) to clarify reuse permissions.","Provide a step-by-step reproducibility guide or containerized workflow for each study.","Implement programmatic access (e.g., REST API) to improve accessibility for automated tools.","Use persistent identifiers (e.g., DOIs) for each dataset to ensure long-term findability."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:42:55.018822Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}