{"doi":"10.1098/rsos.240016","title":"Expanding the data Ark: an attempt to make the data from highly cited social science papers publicly available","abstract":"Access to scientific data can enable independent reuse and verification; however, most data are not available and become increasingly irrecoverable over time. This study aimed to retrieve and preserve important datasets from 160 of the most highly-cited social science articles published between 2008-2013 and 2015-2018. We asked authors if they would share data in a public repository (the Data Ark) or provide reasons if data could not be shared. Of the 160 articles, data for 117 (73%, 95% CI [67%-80%]) were not available and data for 7 (4%, 95% CI [0%-12%]) were available with restrictions. Data for 36 (22%, 95% CI [16%-30%]) articles were available in unrestricted form: 29 of these datasets were already available and 7 datasets were made available in the Data Ark. Most authors did not respond to our data requests and a minority shared reasons for not sharing, such as legal or ethical constraints. 
These findings highlight an unresolved need to preserve important scientific datasets and increase their accessibility to the scientific community.","journal":"Royal Society Open Science","year":2024,"id":7734,"datarank":0.10397207708399181,"base_score":0.6931471805599453,"endowment":0.6931471805599453,"self_citation_contribution":0.10397207708399181,"citation_network_contribution":0.0,"self_endowment_contribution":0.10397207708399181,"citer_contribution":0.0,"corpus_percentile":37.91700569568755,"corpus_rank":716,"citation_count":1,"citer_count":1,"citers_with_citation_signal":0,"citers_with_endowment":0,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.8894,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2024-05-01","fair_score":37.2917,"fair_percentile":18.68953386103782,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":52823,"name":"Steven Michael Crane","orcid":"0000-0001-5385-2926","position":1,"is_corresponding":false},{"id":875,"name":"Tom Elis Hardwicke","orcid":"0000-0001-9485-4952","position":2,"is_corresponding":false},{"id":148,"name":"John P. A. 
Ioannidis","orcid":"0000-0003-3118-6859","position":3,"is_corresponding":false},{"id":52822,"name":"Coby Dulitzki","orcid":"0000-0002-7455-5397","position":0,"is_corresponding":true}],"reference_count":28,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"39076822","pmcid":"PMC11285638","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":40.0,"fair_a":55.0,"fair_i":12.5,"fair_r":41.6667,"fair_zscore":-0.7164,"fair_rationale":{"fair_score":37.29,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":40.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"No description of machine-readable metadata (e.g., schema.org, structured terms) for the paper's own datasets or those placed in the Data Ark."}]},"A":{"name":"Accessible","score":55.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper provides direct links to its own 
data and code (OSF and Code Ocean), but the access protocol for the 7 datasets placed in the Data Ark is not specified, and the retrieval process for the 160 studied datasets relied on email requests without a stated automated access mechanism."}]},"I":{"name":"Interoperable","score":12.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The paper reports using broad disciplinary categories (medical/non-medical) and offers a CSV-like table (Table 1) but does not mention use of standard formats, controlled vocabularies, or persistent identifiers beyond DOIs for the article itself."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"The paper includes a data-availability statement linking to its own data and code under a CC BY license, but does not verify the 
reusability of the 7 contributed datasets, nor does it provide documentation beyond what is implied by 'data, materials, and analysis scripts'."}]}},"suggestions":["Add structured metadata (e.g., JSON-LD with schema.org/Dataset) to the OSF repository to improve findability.","Specify explicit, machine-readable access protocols for each dataset in the Data Ark, e.g., whether download requires authentication.","Use standard data formats (e.g., CSV, JSON, or SPSS/Stata) and controlled vocabularies (e.g., from DDI or CEDAR) for all shared datasets.","Include a README file describing variable meanings, units, and collection methods for each contributed dataset to enhance reusability.","Provide a license and DOI for each individual dataset in the Data Ark, not just a project-level DOI."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v1","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v1","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-17T23:01:23.908540Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}