{"doi":"10.1101/2023.02.20.528987","title":"DeepHeme: A generalizable, bone marrow classifier with hematopathologist-level performance","abstract":"Morphology-based classification of cells in the bone marrow aspirate (BMA) is a key step in the diagnosis and management of hematologic malignancies. However, it is time-intensive and must be performed by expert hematopathologists and laboratory professionals. We curated a large, high-quality dataset of 41,595 hematopathologist consensus-annotated single-cell images extracted from BMA whole slide images (WSIs) containing 23 morphologic classes from the clinical archives of the University of California, San Francisco. We trained a convolutional neural network, DeepHeme, to classify images in this dataset, achieving a mean area under the curve (AUC) of 0.99. DeepHeme was then externally validated on WSIs from Memorial Sloan Kettering Cancer Center, with a similar AUC of 0.98, demonstrating robust generalization. When compared to individual hematopathologists from three different top academic medical centers, the algorithm outperformed all three. 
Finally, DeepHeme reliably identified cell states such as mitosis, paving the way for image-based quantification of mitotic index in a cell-specific manner, which may have important clinical applications.","journal":null,"year":2023,"id":3160,"datarank":0.25888917560344715,"base_score":1.3862943611198906,"endowment":1.3862943611198906,"self_citation_contribution":0.20794415416798362,"citation_network_contribution":0.050945021435463506,"self_endowment_contribution":0.20794415416798362,"citer_contribution":0.050945021435463506,"corpus_percentile":45.80960130187144,"corpus_rank":667,"citation_count":4,"citer_count":4,"citers_with_citation_signal":3,"citers_with_endowment":3,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.6753,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2023-02-21","fair_score":36.25,"fair_percentile":17.963940193491645,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":6118,"name":"Shenghuan Sun","orcid":"0000-0002-4339-2716","position":1,"is_corresponding":false},{"id":55973,"name":"Jacob G. Van Cleave","orcid":"0009-0006-0900-9222","position":2,"is_corresponding":false},{"id":34511,"name":"Linlin Wang","orcid":"0000-0001-5967-5190","position":3,"is_corresponding":false},{"id":34512,"name":"Fabienne Lucas","orcid":"0000-0002-4388-0349","position":4,"is_corresponding":false},{"id":34513,"name":"Laura Brown","orcid":"0000-0002-9012-6680","position":5,"is_corresponding":false},{"id":34514,"name":"Jacob D. 
Spector","orcid":"0000-0003-3897-9552","position":6,"is_corresponding":false},{"id":34515,"name":"Leonardo Boiocchi","orcid":"0000-0001-5188-6217","position":7,"is_corresponding":false},{"id":34516,"name":"Jeeyeon Baik","orcid":null,"position":8,"is_corresponding":false},{"id":34517,"name":"Menglei Zhu","orcid":"0000-0002-6623-9431","position":9,"is_corresponding":false},{"id":34518,"name":"Orly Ardon","orcid":"0000-0001-8147-933X","position":10,"is_corresponding":false},{"id":34519,"name":"Chuanyi M. Lu","orcid":"0000-0002-5906-2543","position":11,"is_corresponding":false},{"id":34521,"name":"Dmitry B. Goldgof","orcid":"0000-0001-5461-863X","position":13,"is_corresponding":false},{"id":34522,"name":"Iain Carmichael","orcid":"0000-0001-7239-035X","position":14,"is_corresponding":false},{"id":34523,"name":"Sonam Prakash","orcid":"0000-0002-3853-3836","position":15,"is_corresponding":false},{"id":51,"name":"Atul Janardhan Butte","orcid":"0000-0002-7433-2740","position":16,"is_corresponding":false},{"id":34524,"name":"Ahmet Doǧan","orcid":"0000-0001-6576-5256","position":17,"is_corresponding":false},{"id":34509,"name":"Gregory M. 
Goldgof","orcid":"0000-0001-8732-9834","position":0,"is_corresponding":true}],"reference_count":45,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"36865216","pmcid":"PMC9979993","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"green","license":"cc-by-nc-nd","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":52.5,"fair_a":55.0,"fair_i":12.5,"fair_r":25.0,"fair_zscore":-0.8106,"fair_rationale":{"fair_score":36.25,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":52.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"No machine-readable metadata is provided; only a promise to publish images upon acceptance."}]},"A":{"name":"Accessible","score":55.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"Code is accessible on GitHub but lacks a license; data is not yet available and has no clear access protocol beyond a future 
repository."}]},"I":{"name":"Interoperable","score":12.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"Images use PNG but no standard vocabularies (e.g., Cell Ontology) or persistent identifiers are used."}]},"R":{"name":"Reusable","score":25.0,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.333,"signal":null,"rationale":"Data availability statement is non-specific; license is restrictive (CC BY-NC-ND) and code license absent; reproducibility is hindered by unavailable data."}]}},"suggestions":["Release the cell image dataset with a persistent identifier (DOI) as soon as possible","Add machine-readable metadata using schema.org or DCAT to describe the dataset","Include a clear open-source license (e.g., MIT, BSD) for the code on GitHub","Map cell classes to standard ontologies (e.g., the Cell Ontology, CL) to enhance 
interoperability","Provide a formal data citation in the paper with a stable repository link"],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:53:20.587145Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}