{"doi":"10.1186/1471-2105-10-s11-s8","title":"Structural and functional-annotation of an equine whole genome oligoarray","abstract":"<jats:title>Abstract</jats:title>\n          <jats:sec>\n            <jats:title>Background</jats:title>\n            <jats:p>The horse genome is sequenced, allowing equine researchers to use high-throughput functional genomics platforms such as microarrays; next-generation sequencing for gene expression and proteomics. However, for researchers to derive value from these functional genomics datasets, they must be able to model this data in biologically relevant ways; to do so requires that the equine genome be more fully annotated. There are two interrelated types of genomic annotation: structural and functional. Structural annotation is delineating and demarcating the genomic elements (such as genes, promoters, and regulatory elements). Functional annotation is assigning function to structural elements. The Gene Ontology (GO) is the <jats:italic>de facto</jats:italic> standard for functional annotation, and is routinely used as a basis for modelling and hypothesis testing, large functional genomics datasets.</jats:p>\n          </jats:sec>\n          <jats:sec>\n            <jats:title>Results</jats:title>\n            <jats:p>An Equine Whole Genome Oligonucleotide (EWGO) array with 21,351 elements was developed at Texas A&amp;M University. This 70-mer oligoarray was designed using the approximately 7× assembled and annotated sequence of the equine genome to be one of the most comprehensive arrays available for expressed equine sequences. To assist researchers in determining the biological meaning of data derived from this array, we have structurally annotated it by mapping the elements to multiple database accessions, including UniProtKB, Entrez Gene, NRPD (Non-Redundant Protein Database) and UniGene. We next provided GO functional annotations for the gene transcripts represented on this array. Overall, we GO annotated 14,531 gene products (68.1% of the gene products represented on the EWGO array) with 57,912 annotations. GAQ (GO Annotation Quality) scores were calculated for this array both before and after we added GO annotation. The additional annotations improved the <jats:italic>meanGAQ</jats:italic> score 16-fold. This data is publicly available at <jats:italic>AgBase</jats:italic>\n              <jats:ext-link xmlns:xlink=\"http://www.w3.org/1999/xlink\" xlink:href=\"http://www.agbase.msstate.edu/\" ext-link-type=\"uri\">http://www.agbase.msstate.edu/</jats:ext-link>.</jats:p>\n          </jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion</jats:title>\n            <jats:p>Providing additional information about the public databases which link to the gene products represented on the array allows users more flexibility when using gene expression modelling and hypothesis-testing computational tools. Moreover, since different databases provide different types of information, users have access to multiple data sources. In addition, our GO annotation underpins functional modelling for most gene expression analysis tools and enables equine researchers to model large lists of differentially expressed transcripts in biologically relevant ways.</jats:p>\n          </jats:sec>","journal":"BMC Bioinformatics","year":2009,"id":30991,"datarank":0.861476404279395,"base_score":3.1780538303479458,"endowment":3.1780538303479458,"self_citation_contribution":0.47670807455219194,"citation_network_contribution":0.3847683297272031,"self_endowment_contribution":0.47670807455219194,"citer_contribution":0.3847683297272031,"corpus_percentile":56.7,"corpus_rank":604,"citation_count":23,"citer_count":15,"citers_with_citation_signal":13,"citers_with_endowment":13,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":null,"is_oa":false,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":null,"fair_score":45.8333,"fair_percentile":43.51363236587511,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":166917,"name":"Shane C Burgess","orcid":null,"position":1,"is_corresponding":false},{"id":167091,"name":"Bhanu Chowdhary","orcid":null,"position":2,"is_corresponding":false},{"id":167092,"name":"Cyprianna E Swiderski","orcid":null,"position":3,"is_corresponding":false},{"id":166906,"name":"Fiona M McCarthy","orcid":"0000-0003-2175-5464","position":4,"is_corresponding":false},{"id":167090,"name":"Lauren A Bright","orcid":null,"position":0,"is_corresponding":false}],"reference_count":0,"raw_metadata":{"has_enrichment":true,"base_score":3.1780538303479458,"endowment":3.1780538303479458,"datacite_reuse_total":0,"file_count":0,"downloads":0,"views":0,"has_version_chain":false,"is_dataset":false,"is_oa":false,"pmid":"19811692","pmcid":"PMC3226197","openalex_id":"https://openalex.org/W1972257067","authors":[],"funders":[],"total_grants":0,"fwci":1.3687,"citation_percentile":0.79317147,"influential_citations":2,"citation_trend":[{"year":2013,"count":4},{"year":2015,"count":1},{"year":2017,"count":1},{"year":2018,"count":1},{"year":2019,"count":2},{"year":2020,"count":1},{"year":2023,"count":1},{"year":2024,"count":1}],"oa_status":"gold","license":"cc-by","oa_locations":[{"url":"https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-10-S11-S8","host_type":"journal"},{"url":"https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-10-S11-S8","host_type":"GOLD"},{"url":"https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-10-S11-S8","host_type":"publisher"},{"url":"https://link.springer.com/content/pdf/10.1186/1471-2105-10-S11-S8.pdf","host_type":"publisher"},{"url":"https://doi.org/10.1186/1471-2105-10-s11-s8","host_type":"journal"},{"url":"https://pubmed.ncbi.nlm.nih.gov/19811692","host_type":"repository"},{"url":"https://doaj.org/article/b676cb34d0684f37a9d079c96dee4b08","host_type":"repository"},{"url":"https://hdl.handle.net/1969.1/180232","host_type":"repository"},{"url":"https://www.ncbi.nlm.nih.gov/pmc/articles/3226197","host_type":"repository"},{"url":"http://www.biomedcentral.com/content/pdf/1471-2105-10_suppl_11-S8.pdf","host_type":"BioMedCentral"},{"url":"http://www.biomedcentral.com/1471-2105/10_suppl_11/S8/abstract","host_type":"BioMedCentral"},{"url":"http://www.biomedcentral.com/1471-2105/10_suppl_11/S8","host_type":"BioMedCentral"},{"url":"https://europepmc.org/articles/PMC3226197","host_type":"Europe_PMC"},{"url":"https://europepmc.org/articles/PMC3226197?pdf=render","host_type":"Europe_PMC"}],"fields_of_study":["Bioinformatics and Genomic Networks","Genomics and Phylogenetic Studies","Biomedical Text Mining and Ontologies","Medicine","Biology","Computer Science","Animals","Databases, Genetic","Genome","Genomics","Horses","Oligonucleotide Array Sequence Analysis"],"mesh_terms":["Animals","Horses","Genome","Oligonucleotide Array Sequence Analysis","Genomics","Databases, Genetic"],"keywords":["Annotation","Functional genomics","Gene Annotation","Genome","Computational biology","UniGene","DNA microarray","Genome project","Biology","UniProt","Genomics","RefSeq","Gene nomenclature","Gene prediction","Comparative genomics","Gene","Genetics","Expressed sequence tag","Gene expression"],"sdg_mappings":[],"linked_datasets":[],"clinical_trials":[],"software_tools":[],"database_accessions":[],"source":"live","citation_network_status":"fetched"},"created_at":"2026-06-09T06:10:17.485696Z","pmid":"19811692","pmcid":"PMC3226197","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":65.0,"fair_a":60.0,"fair_i":25.0,"fair_r":33.3333,"fair_zscore":0.0563,"fair_rationale":{"fair_score":45.83,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":65.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper describes structured annotations mapping array elements to multiple databases (UniProtKB, Entrez Gene, etc.) and provides GO annotations, but there is no evidence of machine-readable metadata (e.g., RDF, JSON-LD) or adherence to formal metadata standards like DCAT or Bioschemas."}]},"A":{"name":"Accessible","score":60.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":0.5,"signal":"files/OA location present but not flagged OA","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"14 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"Data is stated to be publicly available at AgBase (http://www.agbase.msstate.edu/) and users can contact for mapping tables, but no explicit access protocol (e.g., API, SPARQL endpoint, or permanent repository with clear download instructions) is described."}]},"I":{"name":"Interoperable","score":25.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper uses standard identifiers (UniProtKB, Entrez Gene, RefSeq, UniGene) and the Gene Ontology for annotation, but does not report use of standard data formats (e.g., GAF, RDF/XML) or controlled vocabularies beyond GO for the array data itself."}]},"R":{"name":"Reusable","score":33.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.5,"signal":null,"rationale":"The paper provides a data-availability statement (publicly available at AgBase) and is published under a CC-BY license, which supports reuse; however, no explicit license for the data itself is stated, and reproducibility is limited as the full annotation dataset is not directly provided or permanently archived with a DOI."}]}},"suggestions":["Provide machine-readable metadata (e.g., JSON-LD or RDF) describing the array annotations following schema.org or DCAT to enhance findability.","Deposit the full annotation mapping and GO annotation table in a persistent repository (e.g., Figshare or Zenodo) with a DOI and include the link in the paper.","Specify a clear data license (e.g., CC0 or CC-BY) for the annotation data itself, separate from the paper license.","Make the annotation data available via a standard API or SPARQL endpoint to improve automated access.","Use standardized file formats (e.g., GAF for GO annotations, or tab-separated with a documented schema) and reference them explicitly."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:47:12.357795Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}