{"doi":"10.1093/database/baz086","title":"Mammalian Annotation Database for improved annotation and functional classification of Omics datasets from less well-annotated organisms","abstract":"Next-generation sequencing technologies and the availability of an increasing number of mammalian and other genomes allow gene expression studies, particularly RNA sequencing, in many non-model organisms. However, incomplete genome annotation and assignments of genes to functional annotation databases can lead to a substantial loss of information in downstream data analysis. To overcome this, we developed Mammalian Annotation Database tool (MAdb, https://madb.ethz.ch) to conveniently provide homologous gene information for selected mammalian species. The assignment between species is performed in three steps: (i) matching official gene symbols, (ii) using ortholog information contained in Ensembl Compara and (iii) pairwise BLAST comparisons of all transcripts. In addition, we developed a new tool (AnnOverlappeR) for the reliable assignment of the National Center for Biotechnology Information (NCBI) and Ensembl gene IDs. The gene lists translated to gene IDs of well-annotated species such as a human can be used for improved functional annotation with relevant tools based on Gene Ontology and molecular pathway information. We tested the MAdb on a published RNA-seq data set for the pig and showed clearly improved overrepresentation analysis results based on the assigned human homologous gene identifiers. Using the MAdb revealed a similar list of human homologous genes and functional annotation results regardless of whether starting with gene IDs from NCBI or Ensembl. 
The MAdb database is accessible via a web interface and a Galaxy application.","journal":"Database","year":2019,"id":16657,"datarank":0.5678708529474288,"base_score":2.833213344056216,"endowment":2.833213344056216,"self_citation_contribution":0.42498200160843247,"citation_network_contribution":0.14288885133899631,"self_endowment_contribution":0.42498200160843247,"citer_contribution":0.14288885133899631,"corpus_percentile":52.8,"corpus_rank":614,"citation_count":16,"citer_count":6,"citers_with_citation_signal":4,"citers_with_endowment":4,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":null,"is_oa":false,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":null,"fair_score":45.8333,"fair_percentile":43.51363236587511,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":122148,"name":"Shuqin Zeng","orcid":null,"position":1,"is_corresponding":false},{"id":122149,"name":"Mark D Robinson","orcid":null,"position":2,"is_corresponding":false},{"id":122150,"name":"Susanne E Ulbrich","orcid":null,"position":3,"is_corresponding":false},{"id":60926,"name":"Stefan Bauersachs","orcid":"0000-0003-2450-1216","position":4,"is_corresponding":false},{"id":122147,"name":"Jochen T Bick","orcid":null,"position":0,"is_corresponding":false}],"reference_count":0,"raw_metadata":{"has_enrichment":true,"base_score":2.833213344056216,"endowment":2.833213344056216,"datacite_reuse_total":0,"file_count":0,"downloads":0,"views":0,"has_version_chain":false,"is_dataset":false,"is_oa":false,"pmid":"31353404","pmcid":"PMC6661403","openalex_id":"https://openalex.org/W2965072475","authors":[],"funders":[{"funder_name":"Swiss National Science Foundation","grant_id":"31003A_159734","title":null},{"funder_name":"Swiss National Science Foundation","grant_id":"159734","title":"Embryonic Diapause in roe deer: a model for deciphering the control of developmental 
velocity"}],"total_grants":2,"fwci":1.0253,"citation_percentile":0.76143865,"influential_citations":0,"citation_trend":[{"year":2020,"count":5},{"year":2021,"count":5},{"year":2022,"count":2},{"year":2023,"count":1},{"year":2024,"count":1},{"year":2026,"count":1}],"oa_status":"gold","license":"cc-by","oa_locations":[{"url":"https://academic.oup.com/database/article-pdf/doi/10.1093/database/baz086/29007463/baz086.pdf","host_type":"journal"},{"url":"https://doi.org/10.1093/database/baz086","host_type":"GOLD"},{"url":"https://academic.oup.com/database/article-pdf/doi/10.1093/database/baz086/29007463/baz086.pdf","host_type":"publisher"},{"url":"http://academic.oup.com/database/article-pdf/doi/10.1093/database/baz086/29007463/baz086.pdf","host_type":"publisher"},{"url":"https://pubmed.ncbi.nlm.nih.gov/31353404","host_type":"repository"},{"url":"https://www.ncbi.nlm.nih.gov/pmc/articles/6661403","host_type":"repository"},{"url":"http://hdl.handle.net/20.500.11850/360377","host_type":"repository"},{"url":"https://doi.org/10.3929/ethz-b-000360377","host_type":"repository"},{"url":"https://doi.org/10.5167/uzh-177057","host_type":"repository"},{"url":"https://europepmc.org/articles/PMC6661403","host_type":"Europe_PMC"},{"url":"https://europepmc.org/articles/PMC6661403?pdf=render","host_type":"Europe_PMC"},{"url":"https://dx.doi.org/10.5167/uzh-177057","host_type":""},{"url":"https://dx.doi.org/10.3929/ethz-b-000360377","host_type":""},{"url":"http://dx.doi.org/10.1093/database/baz086","host_type":""},{"url":"https://dx.doi.org/10.1093/database/baz086","host_type":""},{"url":"https://www.zora.uzh.ch/id/eprint/177057/","host_type":""}],"fields_of_study":["Bioinformatics and Genomic Networks","Gene expression and cancer classification","Genomics and Phylogenetic Studies","Medicine","Biology","Computer Science","0301 basic medicine","0303 health sciences","03 medical and health sciences","Animals","Databases, Genetic","Gene Ontology","Humans","Molecular Sequence 
Annotation","Sequence Analysis, RNA"],"mesh_terms":["Animals","Humans","Sequence Analysis, RNA","Databases, Genetic","Molecular Sequence Annotation","Gene Ontology"],"keywords":["Ensembl","Annotation","Gene Annotation","Gene","Gene nomenclature","Computational biology","Genome","Database","RefSeq","Genome project","Biology","Genome browser","Genetics","Genomics","Computer science","630 Agriculture","Sequence Analysis, RNA","Molecular Sequence Annotation","1100 General Agricultural and Biological Sciences","1710 Information Systems","10124 Institute of Molecular Life Sciences","10187 Department of Farm Animals","Database Tool","Gene Ontology","1300 General Biochemistry, Genetics and Molecular Biology","Databases, Genetic","Animals","Humans","570 Life sciences; biology"],"sdg_mappings":[{"sdg_number":3,"sdg_label":"3. Good health"}],"linked_datasets":[],"clinical_trials":[],"software_tools":[],"database_accessions":[{"name":"geo"}],"source":"live","citation_network_status":"fetched"},"created_at":"2026-06-02T12:23:22.791493Z","pmid":"31353404","pmcid":"PMC6661403","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":52.5,"fair_a":72.5,"fair_i":25.0,"fair_r":33.3333,"fair_zscore":0.0563,"fair_rationale":{"fair_score":45.83,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":52.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, 
machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper provides a web interface and Galaxy app for the database but does not describe any machine-readable metadata (e.g., structured metadata, schema.org markup, or API metadata) for the data or code."}]},"A":{"name":"Accessible","score":72.5,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":0.5,"signal":"files/OA location present but not flagged OA","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"16 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper clearly states the MAdb is accessible via a web interface (https://madb.ethz.ch) and a Galaxy application, and the AnnOverlappeR code is on GitHub, but no explicit protocol for automated access (e.g., API) is described."}]},"I":{"name":"Interoperable","score":25.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper uses standard formats (GFF, GTF, BLAST) and identifiers (Entrez Gene IDs, Ensembl IDs, HGNC symbols), but does not mention use of standard vocabularies or ontologies for the database content itself."}]},"R":{"name":"Reusable","score":33.33,"criteria":[{"key":"r_license","label":"Clear, open reuse 
license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.5,"signal":null,"rationale":"The paper includes a data-availability statement for the code (GitHub) and database (web page), and the article is Open Access under CC-BY, but no explicit license for the database or code is stated, and reproducibility details (e.g., exact software versions) are incomplete."}]}},"suggestions":["Add machine-readable metadata (e.g., JSON-LD or schema.org markup) to the MAdb web interface to improve findability.","Provide a documented API or SPARQL endpoint for automated access to the MAdb database.","Include a clear license (e.g., MIT for code, CC0 for data) in the GitHub repository and on the web page.","Specify exact versions of all software and dependencies used (e.g., Ensembl release 95, BLAST version) in a reproducibility section.","Use standard ontologies (e.g., OBO Foundry) for functional annotation categories within the database to enhance interoperability."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:48:23.586491Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}