{"doi":"10.1093/nar/gkac851","title":"<i>microbioTA</i>: an atlas of the microbiome in multiple disease tissues of <i>Homo sapiens</i> and <i>Mus musculus</i>","abstract":"microbioTA (http://bio-annotation.cn/microbiota) was constructed to provide a comprehensive, user-friendly resource for the application of microbiome data from diseased tissues, helping users improve their general knowledge and deep understanding of tissue-derived microbes. Various microbes have been found to colonize cancer tissues and play important roles in cancer diagnoses and outcomes, with many studies focusing on developing better cancer-related microbiome data. However, there are currently no independent, comprehensive open resources cataloguing cancer-related microbiome data, which limits the exploration of the relationship between these microbes and cancer progression. Given this, we propose a new strategy to re-align the existing next-generation sequencing data to facilitate the mining of hidden sequence data describing the microbiome to maximize available resources. To this end, we collected 417 publicly available datasets from 25 human and 14 mouse tissues from the Gene Expression Omnibus database and use these to develop a novel pipeline to re-align microbiome sequences facilitating in-depth analyses designed to reveal the microbial profile of various cancer tissues and their healthy controls. 
microbioTA is a user-friendly online platform which allows users to browse, search, visualize, and download microbial abundance data from various tissues along with corresponding analysis results, aiming to provide a reference for cancer-related microbiome research.","journal":"Nucleic Acids Research","year":2022,"id":4655,"datarank":0.9802023793677361,"base_score":3.044522437723423,"endowment":3.044522437723423,"self_citation_contribution":0.4566783656585135,"citation_network_contribution":0.5235240137092226,"self_endowment_contribution":0.4566783656585135,"citer_contribution":0.5235240137092226,"corpus_percentile":58.0960130187144,"corpus_rank":516,"citation_count":22,"citer_count":19,"citers_with_citation_signal":17,"citers_with_endowment":17,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.946,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2022-10-03","fair_score":49.7917,"fair_percentile":77.9023746701847,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":49357,"name":"Sainan Zhang","orcid":"0000-0001-7258-4592","position":1,"is_corresponding":false},{"id":49358,"name":"Guoyou He","orcid":null,"position":2,"is_corresponding":false},{"id":49359,"name":"Meiyu Du","orcid":"0000-0001-7784-1308","position":3,"is_corresponding":false},{"id":16292,"name":"Changlu Qi","orcid":null,"position":4,"is_corresponding":false},{"id":49360,"name":"Ruyue Liu","orcid":"0000-0001-6122-7733","position":5,"is_corresponding":false},{"id":49361,"name":"Siyuan Zhang","orcid":"0000-0002-1307-2869","position":6,"is_corresponding":false},{"id":16291,"name":"Liang Cheng","orcid":"0000-0002-6665-6710","position":7,"is_corresponding":false},{"id":18585,"name":"Lei Shi","orcid":"0000-0002-5727-3590","position":8,"is_corresponding":false},{"id":16299,"name":"Xue Zhang","orcid":"0009-0004-0249-0581","position":9,"is_corresponding":false},{"id":49356,"name":"Ping 
Wang","orcid":"0000-0003-4451-1585","position":0,"is_corresponding":true}],"reference_count":36,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"36189892","pmcid":"PMC9825499","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"other-oa","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":65.0,"fair_a":67.5,"fair_i":25.0,"fair_r":41.6667,"fair_zscore":0.4143,"fair_rationale":{"fair_score":49.79,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":65.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper provides descriptive metadata (e.g., tissue, disease, sample number) and mentions manual filtering, but does not describe structured, machine-readable metadata schemas or use of standardized ontologies."}]},"A":{"name":"Accessible","score":67.5,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"Access is clearly stated as free via a web URL 
(http://bio-annotation.cn/microbiota) with no login required, and code is on GitHub, but no explicit access protocol for programmatic retrieval or persistent identifiers for datasets is mentioned."}]},"I":{"name":"Interoperable","score":25.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper uses standard file formats (fastq, Excel, MySQL, PNG, etc.) and taxonomic classification tools (Kraken2, Bracken) but does not specify adoption of community standard vocabularies, ontologies, or identifier schemes (e.g., MIMARKS, NCBI taxonomy IDs) for interoperability."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"Data availability is stated with a direct URL and code link, the paper is Open Access under CC-BY-NC, but reproducibility details (e.g., exact pipeline 
versions, containerization) are not fully specified, and no data license for the hosted datasets is mentioned."}]}},"suggestions":["Provide metadata in a machine-readable format (e.g., JSON-LD, RDF) using standardized ontologies such as NCBI BioSample attributes or MIMARKS.","Assign persistent identifiers (e.g., DOIs) to individual datasets and the database itself to improve citation and long-term access.","Adopt a formal data license (e.g., CC0 or CC-BY 4.0) for all hosted datasets to clarify reuse rights.","Include a containerized analysis pipeline (e.g., Docker/Singularity) and version-controlled code with clear dependency specifications to enhance reproducibility."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:47:21.194879Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}