| TDC ADMET Group | Lead ID / ADMET | 100.0 | Most-adopted ADMET benchmark. 100+ leaderboard submissions. |
| SAbDab | Hit ID; Lead ID / ADMET; Developmental Candidate | 100.0 | Canonical antibody structure resource. Weekly updates. |
| Observed Antibody Space (OAS) | Hit ID; Lead ID / ADMET | 97.5 | Underlies AbLang, IgLM, AntiBERTa; industry-adopted. |
| PoseBusters | Hit ID | 97.0 | Exposed major failure modes in AlphaFold-Multimer/DiffDock/RFAA. Default pharma filter. |
| PLINDER | Hit ID | 97.0 | Replaces PDBbind as the modern leakage-controlled docking standard. |
| PLINDER v2 Protein-Ligand Benchmark | Hit ID | 97.0 | Consistently cited as the go-to replacement for PDBbind in modern docking evaluation. |
| STRING | Target ID; Disease Modeling | 94.9 | Workhorse for network-based target ID. Distinguish functional vs physical edges. |
| CASP15 | Hit ID; Target ID | 94.9 | Biennial. Introduced ligand prediction category. |
| CASP16 | Hit ID | 94.4 | First full multimer+ligand+RNA joint eval. |
| CAMEO weekly targets | Hit ID | 94.4 | Weekly cadence complements biennial CASP. |
| Boltz-1 Structure Prediction Benchmark | Hit ID | 94.4 | Open-source companion to commercial structure predictors; benchmark splits audited against AlphaFold 3 leakage. |
| ORD Reaction Benchmark | Developmental Candidate | 93.9 | Modern open reaction corpus; industry-scale. |
| Open Problems: Perturbation Prediction | Virtual Cell | 91.9 | Best-in-class rigor (Viash workflow, hidden test, NeurIPS track). |
| PrimeKG | Disease Modeling; Target ID | 91.9 | Modern, well-engineered KG; strong for GNN drug repurposing. |
| FAERS (raw) | Post-market / RWE | 91.1 | Known under-/over-reporting biases. |
| scPerturb | Virtual Cell; Target ID | 88.9 | Canonical harmonized resource. Strong Perturb-seq coverage; weaker for chemical perturbations. |
| PINDER | Hit ID | 88.9 | Expected PPI docking standard. |
| Practical Molecular Optimization (PMO) | Lead ID / ADMET; Developmental Candidate | 88.9 | Sample-efficiency focus exposed shortcomings of reward-maxing methods. |
| CoV-AbDab | Hit ID | 88.9 | Narrow modality but critical for pandemic-preparedness ML. |
| ISM Benchmarks: GPCRs (Insilico) | Hit ID; Lead ID / ADMET | 87.6 | Largest open GPCR affinity benchmark. Leaderboards test external frontier LLMs, so they are not self-referential. |
| CAPRI Rounds | Hit ID | 86.3 | Oldest PPI prediction benchmark. |
| ToxCast | Lead ID / ADMET; IND-enabling | 85.6 | Regulatory-grade broad tox dataset. |
| GNNBench-Drug 2026 | Hit ID; Lead ID / ADMET | 85.6 | IBM-led; overlaps with MoleculeNet but adds modern splits. |
| CAFA5 | Target ID | 84.3 | CAFA5 broke attendance records. |
| MoleculeACE | Lead ID / ADMET | 83.3 | Critical stress-test for generalization; exposed GNN weaknesses. |
| MatBench | Developmental Candidate | 83.3 | Materials-science benchmark; relevant for formulation / co-crystal work. |
| OffSides / TWOSIDES | Post-market / RWE | 83.0 | Key benchmark for DDI + adverse event ML. |
| DrugComb 2.0 Synergy Benchmark | Lead ID / ADMET; Developmental Candidate | 83.0 | Industry-relevant for combination oncology. |
| DMPK Integrated Benchmark | Lead ID / ADMET; Developmental Candidate | 82.5 | AZ/Merck/Pfizer contributed held-out test molecules. |
| DOCKSTRING | Hit ID | 81.3 | Vina scores are a proxy; not a replacement for wet assays. |
| DisGeNET | Disease Modeling; Target ID | 81.0 | Commercial license required for industry. Text-mining noise limits quality. |
| LIT-PCBA | Hit ID | 80.8 | Much fairer than DUD-E; small target count limits coverage. |
| FLIP | Target ID; Developmental Candidate | 80.8 | Complements ProteinGym (smaller but carefully designed splits). |
| GuacaMol | Lead ID / ADMET; Developmental Candidate | 80.5 | First-generation generative benchmark; largely superseded by PMO for goal-directed tasks. |
| Open Systems Pharmacology / PK-Sim | Phase I; IND-enabling | 80.3 | Open alternative to Simcyp. |
| ADMET-AI | Lead ID / ADMET | 79.5 | Strong baselines + web tool; builds on TDC. |
| AMES (mutagenicity) | IND-enabling; Lead ID / ADMET | 79.5 | Core gentox endpoint. |
| scImmuneBench | Virtual Cell; Disease Modeling | 79.5 | Useful for cell-therapy companies evaluating immune foundation models. |
| MoleculeNet | Lead ID / ADMET; Hit ID | 78.0 | Widely cited (3600+); aging splits with known scaffold leakage. |
| USPTO-50K / USPTO-MIT (Retrosynthesis) | Lead ID / ADMET; Developmental Candidate | 78.0 | Known leakage across canonical splits; use time-split or ORD for fairer eval. |
| Tox21 | Lead ID / ADMET; IND-enabling | 77.5 | Field-standard tox benchmark; endpoint count small vs modern suites. |
| Obach PK Dataset | Phase I; IND-enabling; Lead ID / ADMET | 77.0 | Small but highest-quality human-PK dataset. |
| CASF-2016 | Hit ID | 76.2 | Authoritative scoring-power eval; update cadence slow. |
| PDBbind | Hit ID; Lead ID / ADMET | 75.9 | Scaffold/temporal leakage well-documented. Pair with CASF + LeakyPDB. |
| SIDER | Post-market / RWE; IND-enabling | 74.9 | Aging but still widely used. TWOSIDES/OffSides offer newer signals. |
| TAPE | Target ID; Developmental Candidate | 74.9 | Historically important; largely superseded by ProteinGym/FLIP for fitness and by PEER for broader tasks. |
| Simcyp Validation Sets | Phase I; Phase II; IND-enabling | 74.4 | Industry gold standard but proprietary. Open benchmarks exist via OSP Suite. |
| PEER | Target ID; Developmental Candidate | 74.4 | Broader than TAPE, tighter than ProteinGym; good middle ground. |
| ClawBio Skill Correctness Bench | Disease Modeling; Target ID; Clinical Development | 74.2 | Independent third-party bench structurally precludes self-reference. Coverage narrow but rigor exemplary. |
| hERG (cardio-tox) TDC | IND-enabling; Lead ID / ADMET | 73.9 | Small but widely benchmarked. Industry pairs with SafetyPanel-5. |
| DILI / LD50 Zhu | IND-enabling; Lead ID / ADMET | 73.9 | Essential IND-enabling endpoints. |
| DUD-E | Hit ID | 72.9 | Well-known analog bias in decoy selection; use LIT-PCBA / PLINDER for fair VS. |
| MOSES | Lead ID / ADMET; Developmental Candidate | 72.4 | Distribution-learning metrics known to saturate. |
| PerturbBench | Virtual Cell | 71.4 | Pharma-led (Genentech); well-specified eval. |
| ClinTox | Lead ID / ADMET; IND-enabling | 65.6 | Small, binary; saturated. Useful only as sanity check. |
| DEKOIS 2.0 | Hit ID | 57.5 | Historical reference; use LIT-PCBA / PLINDER for modern VS. |