{"doi":"10.1371/journal.pcbi.1002957","title":"Functional Knowledge Transfer for High-accuracy Prediction of Under-studied Biological Processes","abstract":"A key challenge in genetics is identifying the functional roles of genes in pathways. Numerous functional genomics techniques (e.g. machine learning) that predict protein function have been developed to address this question. These methods generally build from existing annotations of genes to pathways and thus are often unable to identify additional genes participating in processes that are not already well studied. Many of these processes are well studied in some organism, but not necessarily in an investigator's organism of interest. Sequence-based search methods (e.g. BLAST) have been used to transfer such annotation information between organisms. We demonstrate that functional genomics can complement traditional sequence similarity to improve the transfer of gene annotations between organisms. Our method transfers annotations only when functionally appropriate as determined by genomic data and can be used with any prediction algorithm to combine transferred gene function knowledge with organism-specific high-throughput data to enable accurate function prediction. We show that diverse state-of-the-art machine learning algorithms leveraging functional knowledge transfer (FKT) dramatically improve their accuracy in predicting gene-pathway membership, particularly for processes with little experimental knowledge in an organism. We also show that our method compares favorably to annotation transfer by sequence similarity. Next, we deploy FKT with a state-of-the-art SVM classifier to predict novel genes to 11,000 biological processes across six diverse organisms and expand the coverage of accurate function predictions to processes that are often ignored because of a dearth of annotated genes in an organism. 
Finally, we perform in vivo experimental investigation in Danio rerio and confirm the regulatory role of our top predicted novel gene, wnt5b, in leftward cell migration during heart development. FKT is immediately applicable to many bioinformatics techniques and will help biologists systematically integrate prior knowledge from diverse systems to direct targeted experiments in their organism of study.","journal":"PLoS Computational Biology","year":2013,"id":2537,"datarank":0.6141516843333151,"base_score":4.0943445622221,"endowment":4.0943445622221,"self_citation_contribution":0.6141516843333151,"citation_network_contribution":0.0,"self_endowment_contribution":0.6141516843333151,"citer_contribution":0.0,"corpus_percentile":null,"corpus_rank":null,"citation_count":59,"citer_count":0,"citers_with_citation_signal":0,"citers_with_endowment":0,"datacite_reuse_total":0,"is_dataset":false,"is_dataset_confidence":0.0502,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2013-03-14","fair_score":12.5,"fair_percentile":0.15391380826737028,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":27352,"name":"Aaron K. Wong","orcid":"0000-0002-7294-0646","position":1,"is_corresponding":false},{"id":308,"name":"Casey S. Greene","orcid":"0000-0001-8713-9213","position":2,"is_corresponding":false},{"id":30100,"name":"Jessica Rowland","orcid":null,"position":3,"is_corresponding":false},{"id":4926,"name":"Yuanfang Guan","orcid":"0000-0001-8275-2852","position":4,"is_corresponding":false},{"id":30101,"name":"Lars A. Bongo","orcid":null,"position":5,"is_corresponding":false},{"id":30102,"name":"Rebecca D. Burdine","orcid":"0000-0001-6620-5015","position":6,"is_corresponding":false},{"id":5951,"name":"Olga G. 
Troyanskaya","orcid":"0000-0002-5676-5737","position":7,"is_corresponding":false},{"id":30103,"name":"Lars Ailo Bongo","orcid":"0000-0002-7544-2482","position":8,"is_corresponding":false},{"id":30099,"name":"Christopher Y. Park","orcid":"0000-0003-2018-3476","position":0,"is_corresponding":true}],"reference_count":96,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":20.0,"fair_a":30.0,"fair_i":0.0,"fair_r":0.0,"fair_zscore":-2.9589,"fair_rationale":{"fair_score":12.5,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":20.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"datacite=0, pmcid=False, pmid=False","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"The paper text does not describe any machine-readable metadata, such as structured metadata or formal data descriptors, for the data or code."}]},"A":{"name":"Accessible","score":30.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access 
protocol","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"The text provides no protocol, repository link, or statement on how to access the underlying data, code, or supplementary materials."}]},"I":{"name":"Interoperable","score":0.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"No standard file formats, controlled vocabularies, or persistent identifiers are mentioned for the datasets or models used."}]},"R":{"name":"Reusable","score":0.0,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"not a dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.0,"signal":null,"rationale":"There is no data-availability statement, license information, or description of steps for reproduction of results; the text focuses solely on methods and outcomes."}]}},"suggestions":["Deposit all source code and processed data in a public repository (e.g., Zenodo, GitHub) with a persistent 
DOI.","Include a data-availability statement that specifies license and access conditions for reuse.","Use standard ontologies (e.g., Gene Ontology) and file formats (e.g., CSV, RDF) and provide a metadata schema.","Add structured metadata (e.g., schema.org markup) to the paper and its supplementary files to improve findability by machines."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v1","fulltext_source":"abstract_only"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v1","fair_fulltext_source":"abstract_only","fair_has_llm":true,"fair_computed_at":"2026-06-14T20:31:32.411553Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}