{"doi":"10.1186/gb-2007-8-4-r45","title":"Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags","abstract":"<h4>Background</h4>Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages.<h4>Results</h4>Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories.<h4>Conclusion</h4>This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.","journal":"Genome Biology","year":2007,"id":4994,"datarank":2.7342377229143824,"base_score":4.330733340286331,"endowment":4.330733340286331,"self_citation_contribution":0.6496100010429497,"citation_network_contribution":2.0846277218714326,"self_endowment_contribution":0.6496100010429497,"citer_contribution":2.0846277218714326,"corpus_percentile":66.96501220504476,"corpus_rank":407,"citation_count":75,"citer_count":45,"citers_with_citation_signal":39,"citers_with_endowment":39,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9443,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2007-04-02","fair_score":52.9167,"fair_percentile":79.11169744942832,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":51100,"name":"Susanna Cirera","orcid":"0000-0001-8105-1579","position":1,"is_corresponding":false},{"id":51101,"name":"Jakob Hedegaard","orcid":null,"position":2,"is_corresponding":false},{"id":51102,"name":"Michael J. Gilchrist","orcid":"0000-0001-9762-1951","position":3,"is_corresponding":false},{"id":51104,"name":"Claus Jørgensen","orcid":null,"position":5,"is_corresponding":false},{"id":51105,"name":"Karsten Scheibye-Knudsen","orcid":null,"position":6,"is_corresponding":false},{"id":51106,"name":"Troels Arvin","orcid":null,"position":7,"is_corresponding":false},{"id":51107,"name":"Steen Lumholdt","orcid":null,"position":8,"is_corresponding":false},{"id":51108,"name":"Milena Sawera","orcid":null,"position":9,"is_corresponding":false},{"id":51109,"name":"Trine Green","orcid":null,"position":10,"is_corresponding":false},{"id":51110,"name":"Bente J. Nielsen","orcid":null,"position":11,"is_corresponding":false},{"id":51111,"name":"Jakob H. Havgaard","orcid":"0000-0002-4816-814X","position":12,"is_corresponding":false},{"id":51112,"name":"Carina Rosenkilde","orcid":null,"position":13,"is_corresponding":false},{"id":6308,"name":"Jun Wang","orcid":"0000-0003-2509-9599","position":14,"is_corresponding":false},{"id":30887,"name":"Alexandra P. Lewis","orcid":"0000-0002-6195-4786","position":16,"is_corresponding":false},{"id":15500,"name":"Bin Liu","orcid":"0000-0002-9113-9694","position":17,"is_corresponding":false},{"id":31696,"name":"Songnian Hu","orcid":"0000-0003-3966-3111","position":18,"is_corresponding":false},{"id":54319,"name":"Ning Yang","orcid":"0000-0001-5772-3320","position":19,"is_corresponding":false},{"id":51113,"name":"Wei Li","orcid":"0000-0002-0693-3536","position":20,"is_corresponding":false},{"id":13179,"name":"Jun Yu","orcid":"0000-0001-5008-2153","position":21,"is_corresponding":false},{"id":828,"name":"Jian Wang","orcid":"0000-0002-9589-4056","position":22,"is_corresponding":false},{"id":51114,"name":"Hans-Henrik Stærfeldt","orcid":null,"position":23,"is_corresponding":false},{"id":51115,"name":"Rasmus Wernersson","orcid":"0000-0003-4417-9842","position":24,"is_corresponding":false},{"id":51116,"name":"Lone B Madsen","orcid":null,"position":25,"is_corresponding":false},{"id":51117,"name":"Bo Thomsen","orcid":"0000-0001-9622-0929","position":26,"is_corresponding":false},{"id":13108,"name":"Henrik Hornshøj","orcid":"0000-0001-6728-099X","position":27,"is_corresponding":false},{"id":51118,"name":"Zhan Bujie","orcid":null,"position":28,"is_corresponding":false},{"id":51119,"name":"Xuegang Wang","orcid":null,"position":29,"is_corresponding":false},{"id":51120,"name":"Xuefei Wang","orcid":"0000-0002-9327-9543","position":30,"is_corresponding":false},{"id":16366,"name":"Lars Bolund","orcid":null,"position":31,"is_corresponding":false},{"id":13083,"name":"Søren Brunak","orcid":"0000-0003-0316-5866","position":32,"is_corresponding":false},{"id":14247,"name":"Huanming Yang","orcid":"0000-0002-0858-3410","position":33,"is_corresponding":false},{"id":51121,"name":"Christian Bendixen","orcid":"0000-0002-6909-5162","position":34,"is_corresponding":false},{"id":51122,"name":"Merete Fredholm","orcid":"0000-0002-3563-7648","position":35,"is_corresponding":false},{"id":51123,"name":"Claus B. Jørgensen","orcid":"0000-0003-4749-5306","position":36,"is_corresponding":false},{"id":51124,"name":"Jun-Jun Wang","orcid":null,"position":37,"is_corresponding":false},{"id":51125,"name":"Hans‐Henrik Stærfeldt","orcid":null,"position":38,"is_corresponding":false},{"id":51126,"name":"Lone Bruhn Madsen","orcid":"0009-0003-2069-1849","position":39,"is_corresponding":false},{"id":51127,"name":"Bujie Zhan","orcid":null,"position":40,"is_corresponding":false},{"id":51099,"name":"Jan Gorodkin","orcid":"0000-0001-5823-4000","position":0,"is_corresponding":true}],"reference_count":57,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"17407547","pmcid":"PMC1895994","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":52.5,"fair_a":80.0,"fair_i":37.5,"fair_r":41.6667,"fair_zscore":0.697,"fair_rationale":{"fair_score":52.92,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":52.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper mentions an online resource with a backend SQL database, but does not provide explicit machine-readable metadata or structured metadata formats."}]},"A":{"name":"Accessible","score":80.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper clearly describes access to the data via the PigEST online resource and NCBI trace archive, with specific search terms and accession ranges."}]},"I":{"name":"Interoperable","score":37.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"Standard formats (FASTA, BLAST) and vocabularies (UniProt, GO) are used, but the paper does not explicitly state adherence to community standards for data representation."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"The data is openly available under CC BY license, with detailed methods and supplementary files, but the analysis code is not provided and raw data deposition is not in a fully FAIR-aligned repository."}]}},"suggestions":["Provide machine-readable metadata (e.g., JSON-LD or schema.org) for the dataset to enhance findability.","Deposit raw sequencing data in a FAIR-aligned repository with persistent identifiers (e.g., ArrayExpress or GEO).","Include a software/code repository (e.g., GitHub) with version control for the analysis pipeline to improve reproducibility.","Use standardized data formats (e.g., BAM, VCF) and ontologies for tissue types to improve interoperability.","Add a formal data availability statement with explicit license and access conditions in the abstract or methods section."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:41:28.519426Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}