{"doi":"10.17632/btchxktzyw.2","title":"Data for \"Updated science-wide author databases of standardized citation indicators\"","abstract":"Citation metrics are widely used and misused. We have created a publicly available database of 100,000 top-scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator. Separate data are shown for career-long and single year impact. Metrics with and without self-citations and ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 176 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists who have published at least 5 papers. Career-long data are updated to end-of-2019. \\n\\nThe dataset and code provide an update to previously released (version 1) data under https://doi.org/10.17632/btchxktzyw.1. The version 2 dataset is based on the May 06, 2020 snapshot from Scopus and is updated to citation year 2019. 
In addition to the time period and datacut update, it provides a longer list of authors: it also includes the top 2% for every subfield.","journal":null,"year":2020,"id":11330,"datarank":0.7786988116190094,"base_score":2.3978952727983707,"endowment":2.3978952727983707,"self_citation_contribution":0.3596842909197557,"citation_network_contribution":0.4190145206992537,"self_endowment_contribution":0.3596842909197557,"citer_contribution":0.4190145206992537,"corpus_percentile":55.817737998372664,"corpus_rank":544,"citation_count":10,"citer_count":8,"citers_with_citation_signal":5,"citers_with_endowment":5,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9453,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2020-01-01","fair_score":32.0833,"fair_percentile":13.962181178540018,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":11483,"name":"Jeroen Baas","orcid":"0000-0001-8005-4153","position":0,"is_corresponding":false}],"reference_count":0,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":37.0,"fair_a":58.0,"fair_i":10.0,"fair_r":23.3333,"fair_zscore":-1.1875,"fair_rationale":{"fair_score":32.08,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":37.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"datacite=0, pmcid=False, pmid=False","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex 
id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The text only provides a natural-language description; no mention of structured, machine-readable metadata (e.g., JSON-LD, XML, schema.org) is made."}]},"A":{"name":"Accessible","score":58.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper mentions the dataset is publicly available via a DOI (10.17632/btchxktzyw.2), but does not specify the exact access protocol (e.g., direct download, API) or authentication requirements."}]},"I":{"name":"Interoperable","score":10.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"It uses standard fields (citations, h-index, etc.) 
and a top-2% subfield classification, but no explicit mention of controlled vocabularies or persistent identifiers for fields/subfields."}]},"R":{"name":"Reusable","score":23.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.5,"signal":null,"rationale":"A data-availability statement with a versioned DOI is present, but no explicit license is stated, and there is no description of code dependencies or reproducibility steps."}]}},"suggestions":["Add structured machine-readable metadata (e.g., JSON-LD) to the paper describing the dataset's schema and provenance.","Specify the access protocol (e.g., direct download URL, authentication-free REST API) for the dataset.","Cite standard vocabularies (e.g., ORCID for authors, Scopus subject codes) and include persistent identifiers for fields/subfields.","State an explicit open license (e.g., CC-BY or CC0) for both data and code.","Provide a documented computational workflow or container (e.g., Dockerfile) for reproducible generation of the 
dataset."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"abstract_only"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"abstract_only","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:50:04.214493Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}