{"doi":"10.1038/nature11252","title":"Comprehensive molecular characterization of human colon and rectal cancer","abstract":"To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.","journal":"Nature","year":2012,"id":3285,"datarank":19.172828308050594,"base_score":9.043695294567236,"endowment":9.043695294567236,"self_citation_contribution":1.3565542941850857,"citation_network_contribution":17.81627401386551,"self_endowment_contribution":1.3565542941850857,"citer_contribution":17.81627401386551,"corpus_percentile":93.89747762408462,"corpus_rank":76,"citation_count":8620,"citer_count":200,"citers_with_citation_signal":200,"citers_with_endowment":200,"datacite_reuse_total":25,"is_dataset":true,"is_dataset_confidence":0.6541,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2012-07-18","fair_score":52.9167,"fair_percentile":79.11169744942832,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":35252,"name":"The Cancer Genome Atlas Network","orcid":null,"position":0,"is_corresponding":true}],"reference_count":44,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"22810696","pmcid":"PMC3401966","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"hybrid","license":"cc-by-nc-sa","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":65.0,"fair_a":67.5,"fair_i":37.5,"fair_r":41.6667,"fair_zscore":0.697,"fair_rationale":{"fair_score":52.92,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":65.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=25, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper provides data deposition locations and accession numbers in supplementary materials, but lacks embedded machine-readable metadata within the article itself."}]},"A":{"name":"Accessible","score":67.5,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper clearly states data are deposited in dbGaP and DCC with public URLs, though dbGaP requires authorization for access."}]},"I":{"name":"Interoperable","score":37.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"linked_datasets=0, datacite=25","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"No explicit use of standard file formats, ontologies, or persistent identifiers beyond gene symbols and COSMIC references is mentioned."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"Data are available under a CC BY-NC-SA license, but reproducibility is limited due to lack of code availability and fully detailed pipelines."}]}},"suggestions":["Include structured metadata such as BioProject and sample IDs in the main text.","Specify file formats (e.g., VCF, BAM, MAF) and use standard ontologies for annotations.","Deposit analysis code and computational pipelines in a public repository like GitHub.","Provide a more detailed data availability statement with direct links to all datasets.","Use persistent identifiers (e.g., DOIs) for all data resources and supplementary files."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:26:25.923839Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}