{"doi":"10.5281/zenodo.1435833","title":"Node connectivity measurements for Hetionet v1.0 metapaths","abstract":"Hetionet v1.0 is a hetnet (heterogeneous network) with 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. This record contains computed connectivity measurements for Hetionet v1.0 for all metapaths (types of paths) up to length 3. These measurements are designed to assess the extent of connectivity between two nodes along a given metapath. Several types of data are included: <strong>Path counts</strong>: Path counts measure the number of paths from a source node to a target node along a specified metapath. The path count is a special case of the degree-weighted path count (DWPC) metric where the damping exponent parameter is set to 0.0. Path counts for all source–target node combinations of a given metapath are stored in a matrix with source nodes as rows and target nodes as columns. <strong>Degree-weighted path counts</strong>: DWPCs measure the abundance of paths from a source to target node along a given metapath (like path counts), but are adjusted for the degrees along the path such that paths through higher degree nodes are downweighted according to a damping parameter. The DWPCs here use a damping exponent of 0.5 and the same matrix serialization as the path count datasets. The values are not scaled/transformed. To compare to the null DWPCs discussed below, divide each value by the mean DWPC for the entire matrix and apply an inverse hyperbolic sine transformation. <strong>Degree-grouped permutation summaries</strong>: Degree-grouped permutations (DGP) are used to compute the significance of DWPC values. Specifically, they are used to estimate the null distribution for DWPCs from the unpermuted hetnet. DGP summaries provide summary statistics of DWPCs computed on permuted hetnets. The permuted hetnets are derived from Hetionet v1.0 using the XSwap algorithm. 
This approach preserves node degree but randomizes edges to muddle their meaning. DWPCs were computed for 200 permuted networks and grouped by source–target node degree within each metapath. Permuted DWPCs were scaled by dividing by the unpermuted DWPC mean and then inverse hyperbolic sine transformed. Every degree pair for a given metapath has corresponding statistics that summarize its values across permuted hetnets. These statistics include the number of observed DWPCs, the number of nonzero DWPCs, the sum of the DWPCs, and the sum of squared DWPCs. These values are sufficient to calculate the parameters of a gamma-hurdle null DWPC distribution. <strong>Data Format</strong>: The .zip files are HetMat archive files. This simply means that the directory structure and file formats of the archived files conform to the HetMat data structure for storing hetnets on disk. Matrices are stored as scipy.sparse .npz files. .npz is a numpy array serialization format that scipy uses to write sparse matrices to disk. TSV files in this upload report information on the contents of the archives. The .zip-info.tsv files contain a list of all files included in the zip archives. metapath-dwpc-stats.tsv contains summary information on the unpermuted path counts and DWPCs. Note that results are archived by path length, such that all metapaths of length 1 are in a different archive than metapaths of length 2. Therefore, users who only need results for shorter metapaths do not need to download the large archives for longer metapaths. There are 24 metapaths of length 1, 242 metapaths of length 2, and 1939 metapaths of length 3. <strong>Connectivity Search Database</strong>: connectivity-search-pg_dump.sql.gz is a PostgreSQL database dump for use with the connectivity-search-backend repository. <strong>Source code</strong>: These datasets were computed by the bulk.ipynb notebook from greenelab/hetmech@34e95b9. 
<strong>Funding</strong>: This work was supported through a research collaboration with Pfizer Worldwide Research and Development. This work is funded in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grants GBMF4552 and GBMF4560. <strong>More information</strong>: See the manuscript titled Hetnet connectivity search provides rapid insights into how two biomedical entities are related.","journal":"Zenodo (CERN European Organization for Nuclear Research)","year":2018,"id":2204,"datarank":0.10397207708399181,"base_score":0.6931471805599453,"endowment":0.6931471805599453,"self_citation_contribution":0.10397207708399181,"citation_network_contribution":0.0,"self_endowment_contribution":0.10397207708399181,"citer_contribution":0.0,"corpus_percentile":37.91700569568755,"corpus_rank":716,"citation_count":1,"citer_count":0,"citers_with_citation_signal":0,"citers_with_endowment":0,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.888,"is_oa":true,"file_count":14,"downloads":1715,"has_version_chain":false,"published_date":"2018-11-06","fair_score":34.5833,"fair_percentile":16.402814423922603,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":1207,"name":"Michael Zietz","orcid":"0000-0003-0539-630X","position":1,"is_corresponding":false},{"id":1209,"name":"Kyle Kloster","orcid":"0000-0001-5678-7197","position":2,"is_corresponding":false},{"id":1211,"name":"Michael W. Nagle","orcid":"0000-0002-4677-7582","position":3,"is_corresponding":false},{"id":27306,"name":"Blair D. Sullivan","orcid":"0000-0001-7720-6208","position":4,"is_corresponding":false},{"id":308,"name":"Casey S. Greene","orcid":"0000-0001-8713-9213","position":5,"is_corresponding":false},{"id":1208,"name":"Daniel S. 
Himmelstein","orcid":"0000-0002-3012-7446","position":0,"is_corresponding":false}],"reference_count":0,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"green","license":"public-domain","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":37.0,"fair_a":58.0,"fair_i":10.0,"fair_r":33.3333,"fair_zscore":-0.9614,"fair_rationale":{"fair_score":34.58,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":37.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"datacite=0, pmcid=False, pmid=False","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.25,"signal":null,"rationale":"The text describes data types and formats but lacks machine-readable metadata such as structured metadata files or schema.org markup."}]},"A":{"name":"Accessible","score":58.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The text mentions data archives and a database dump but does not specify a clear access protocol like a repository URL or 
download link."}]},"I":{"name":"Interoperable","score":10.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"Standard file formats (npz, TSV, SQL) are used, but there is no mention of standard vocabularies or identifiers for node types or metapaths."}]},"R":{"name":"Reusable","score":33.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"downloads=1715","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.333,"signal":null,"rationale":"Source code is referenced with a commit hash, but no license is provided and no explicit data availability statement is included."}]}},"suggestions":["Add a machine-readable metadata file (e.g., JSON-LD) describing the dataset structure and content.","Provide a direct download link or clear access instructions for the data archives.","Use standard vocabularies (e.g., identifiers.org) for node types and metapaths to enhance interoperability.","Include a license (e.g., 
CC0 for data, MIT for code) to clarify reuse permissions.","Add a formal data availability statement specifying where and how the data can be accessed."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"abstract_only"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"abstract_only","fair_has_llm":true,"fair_computed_at":"2026-06-18T04:57:04.877768Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}