{"doi":"10.1186/1471-2105-7-152","title":"Domain-based small molecule binding site annotation","abstract":"<h4>Background</h4>Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites.<h4>Description</h4>Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at http://smid.blueprint.org. The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated based on E-value, ligand residue identity and domain entropy to assign a level of confidence to hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB, of which 472 (60%) of predicted interactions identically matched the experimental small molecule and of these, 344 had greater than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives.<h4>Conclusion</h4>By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.","journal":"BMC Bioinformatics","year":2006,"id":1642,"datarank":1.6194395919481415,"base_score":3.332204510175204,"endowment":3.332204510175204,"self_citation_contribution":0.49983067652628066,"citation_network_contribution":1.119608915421861,"self_endowment_contribution":0.49983067652628066,"citer_contribution":1.119608915421861,"corpus_percentile":62.73393002441009,"corpus_rank":459,"citation_count":27,"citer_count":25,"citers_with_citation_signal":24,"citers_with_endowment":24,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.7255,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2006-03-17","fair_score":38.3333,"fair_percentile":19.217238346525946,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":15786,"name":"Howard J. Feldman","orcid":null,"position":1,"is_corresponding":false},{"id":74,"name":"Michel J. Dumontier","orcid":"0000-0003-4727-9435","position":2,"is_corresponding":false},{"id":18777,"name":"John J Salama","orcid":null,"position":3,"is_corresponding":false},{"id":1313,"name":"Christopher W.V. Hogue","orcid":"0000-0002-3107-5246","position":4,"is_corresponding":false},{"id":18776,"name":"Kevin A. Snyder","orcid":null,"position":0,"is_corresponding":true}],"reference_count":71,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"16545112","pmcid":"PMC1435939","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":40.0,"fair_a":55.0,"fair_i":25.0,"fair_r":33.3333,"fair_zscore":-0.6222,"fair_rationale":{"fair_score":38.33,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":40.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.0,"signal":null,"rationale":"No machine-readable metadata, structured annotations, or semantic markup are mentioned in the paper text."}]},"A":{"name":"Accessible","score":55.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper states SMID is freely accessible via a web interface and data can be downloaded as tab-delimited files from an FTP server, but no formal protocol, API, or persistent identifiers are provided, and the command-line tool requires a license for commercial use."}]},"I":{"name":"Interoperable","score":25.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.5,"signal":null,"rationale":"The paper uses standard formats (FASTA, ASN.1, GenPept, MySQL) and common identifiers (PDB, GI, CDD), but does not explicitly adopt community-standard ontologies or vocabularies for the domain-small molecule interactions."}]},"R":{"name":"Reusable","score":33.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.5,"signal":null,"rationale":"The paper provides an open-access license (CC BY 2.0), includes a detailed validation methodology, and offers downloadable data, but lacks a formal data-availability statement for all underlying datasets, does not provide code in a public repository, and gives no explicit license for the software beyond a note that commercial users need a license."}]}},"suggestions":["Provide machine-readable metadata (e.g., JSON-LD or RDF) for the database and data files to enhance Findability.","Deposit the SMID-BLAST source code in a public repository (e.g., GitHub) with a clear open-source license to improve Accessibility and Reusability.","Adopt community-standard ontologies (e.g., CHEBI, GO) and persistent identifiers (e.g., DOIs for datasets) to boost Interoperability.","Include a formal data-availability statement specifying exactly which datasets are available, where, and under what conditions.","Provide an API or programmatic access method (e.g., RESTful endpoint) for the SMID database to ease automated Access."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:45:46.052047Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}