Initiatives (Meta-platforms, Consortia, Competitions)

31 initiatives collectively track 1,749 benchmarks. The table is sorted by composite score, highest first.

# | Initiative | Kind | Benchmarks tracked | As of | Host | Score | Links
1 | Therapeutics Data Commons (TDC) | meta-platform | 83 | 2026-05-12 | Zitnik Lab, Harvard Medical School (+ MIT, Stanford, Georgia Tech collaborators) | 100.0 | site · gh
2 | CASP (Critical Assessment of Structure Prediction) | competition | 16 | 2026-05-12 | Prediction Center, UC Davis | 100.0 | site · gh
3 | ProteinGym | meta-platform | 217 | 2026-05-12 | Marks Lab (Harvard) + OATML (Oxford) + DeepMind | 97.5 | site · gh
4 | ELIXIR Infrastructure | consortium | 18 | 2026-05-12 | EMBL-EBI + 23 EU member nodes | 97.5 | site · gh
5 | CAMEO | competition | 4 | 2026-05-12 | Biozentrum Basel + SIB | 94.4 | site · gh
6 | PoseBusters Evaluation Suite | meta-platform | 3 | 2026-05-12 | Oxford OPIG (Deane Lab) | 93.9 | site · gh
7 | PLINDER / PINDER | meta-platform | 2 | 2026-05-12 | Biozentrum Basel + VantAI + Isomorphic Labs + EPFL | 93.9 | site · gh
8 | Open Problems in Single-Cell Analysis | consortium | 29 | 2026-05-12 | CZI + Helmholtz Munich + Yale + HMS | 91.9 | site · gh
9 | Polaris Hub | meta-platform | 48 | 2026-05-12 | Polaris consortium (Valence Labs, Recursion, Novartis, Pfizer, Merck, AstraZeneca) | 91.4 | site · gh
10 | ScienceAIBench | meta-platform | 227 | 2026-05-12 | Insilico Medicine | 90.6 | site · gh
11 | DREAM Challenges | competition | 74 | 2026-05-12 | Sage Bionetworks + IBM + academic partners | 89.4 | site · gh
12 | MIMIC-IV / eICU | data-platform | 14 | 2026-05-12 | MIT Lab for Computational Physiology | 89.4 | site · gh
13 | CZI Virtual Cell / CellxGene / VCC | consortium | 12 | 2026-05-12 | Chan Zuckerberg Initiative / CZ Biohub | 88.9 | site · gh
14 | Open Reaction Database (ORD) | data-platform | 1 | 2026-05-12 | ORD consortium (Doyle, Coley, Pfizer, Merck, BASF) | 88.9 | site · gh
15 | Drug Discovery Benchmarks (DDB) | meta-platform | 206 | 2026-05-12 | Insilico Medicine | 87.6 | site · gh
16 | CAFA | competition | 6 | 2026-05-12 | Radivojac / Friedberg / Jiang consortium | 86.8 | site · gh
17 | CAPRI | competition | 56 | 2026-05-12 | EBI + CCP4 | 86.3 | site · gh
18 | FAERS / SIDER / OffSides / TWOSIDES | data-platform | 4 | 2026-05-12 | FDA CDER + Tatonetti Lab | 85.6 | site · gh
19 | InsilicoBench | meta-platform | 162 | 2026-05-12 | Insilico Medicine | 84.6 | site · gh
20 | FLIP | meta-platform | 15 | 2026-05-12 | Rostlab TUM + AlQuraishi Lab Columbia | 80.8 | site · gh
21 | CPTAC | consortium | 10 | 2026-05-12 | NCI Office of Cancer Clinical Proteomics Research | 80.8 | site · gh
22 | DeepChem | meta-platform | 40 | 2026-05-12 | DeepChem community | 80.0 | site · gh
23 | MoleculeNet | meta-platform | 17 | 2026-05-12 | DeepChem community (Pande Lab alumni) | 78.0 | site · gh
24 | TrialBench / HINT / TOP | meta-platform | 4 | 2026-05-12 | Fu/Sun Lab, Georgia Tech + HMS | 76.5 | site · gh
25 | ClawBio Benchmarks | meta-platform | 10 | 2026-05-03 | ClawBio (open source, MIT) | 74.2 | site · gh
26 | EU-OPENSCREEN / EUbOPEN | consortium | 5 | 2026-05-12 | EU-OPENSCREEN ERIC + IMI EUbOPEN | 73.9 | site · gh
27 | PKU-AIDD / ChinaDrug Benchmarks | consortium | 7 | 2026-05-12 | PKU + SIMM CAS + Tsinghua + Baidu + Huawei | 73.9 | site · gh
28 | Kaggle - Pharma / Bio Competitions | competition | 23 | 2026-05-12 | Google / Kaggle + sponsoring companies | 71.9 | site · gh
29 | Papers With Code - Drug Discovery | meta-platform | 120 | 2026-05-12 | Meta AI / Papers With Code community | 71.6 | site · gh
30 | PDBbind / CASF | meta-platform | 6 | 2026-05-12 | SIMM, Chinese Academy of Sciences | 70.4 | site · gh
31 | HuggingFace - Bio/Chem Datasets | data-platform | 310 | 2026-05-12 | HuggingFace + community uploaders | 67.8 | site · gh
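The headline totals can be cross-checked against the table rows. The sketch below is a minimal check, not part of the original page: it copies each row's benchmark count and composite score by hand (the `ROWS` name is illustrative), then confirms the row count, the sum, and which column the rows are actually ordered by.

```python
# (benchmarks tracked, composite score) for each initiative, in table order.
# Values are transcribed by hand from the table; ROWS is an illustrative name.
ROWS = [
    (83, 100.0), (16, 100.0), (217, 97.5), (18, 97.5), (4, 94.4),
    (3, 93.9), (2, 93.9), (29, 91.9), (48, 91.4), (227, 90.6),
    (74, 89.4), (14, 89.4), (12, 88.9), (1, 88.9), (206, 87.6),
    (6, 86.8), (56, 86.3), (4, 85.6), (162, 84.6), (15, 80.8),
    (10, 80.8), (40, 80.0), (17, 78.0), (4, 76.5), (10, 74.2),
    (5, 73.9), (7, 73.9), (23, 71.9), (120, 71.6), (6, 70.4),
    (310, 67.8),
]

counts = [count for count, _ in ROWS]
scores = [score for _, score in ROWS]

total = sum(counts)

# A column is "sorted descending" if no value is smaller than its successor.
sorted_by_score = all(a >= b for a, b in zip(scores, scores[1:]))
sorted_by_count = all(a >= b for a, b in zip(counts, counts[1:]))

print(len(ROWS), total, sorted_by_score, sorted_by_count)
# → 31 1749 True False
```

The rows add up to the stated 1,749 benchmarks across 31 initiatives, and the listing order follows the composite score rather than the benchmark count.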

Per-initiative detail

Therapeutics Data Commons (TDC): 83 benchmarks tracked (as of 2026-05-12)

Open-science platform curating ML datasets/tasks across the drug discovery pipeline with unified API, splits, and leaderboards.

Kind
meta-platform
Host
Zitnik Lab, Harvard Medical School (+ MIT, Stanford, Georgia Tech collaborators)
Founded
2021-02
License model
MIT (code); per-dataset licenses for data
URL
https://tdcommons.ai/
GitHub
https://github.com/mims-harvard/TDC
Composite score
100.0
Flags
none

Count methodology: Scraped tdcommons.ai single_pred/multi_pred/generation overview pages 2026-05-12: single-pred ~38 datasets (ADME/Tox/HTS/QM/Yields/Epitope/Develop/CRISPROutcome), multi-pred ~32 datasets (DTI/DDI/PPI/GDA/DrugRes/DrugSyn/PeptideMHC/AntibodyAff/MTI/Catalyst/TCREpitope/TrialOutcome/ProteinPeptide/PerturbOutcome/scDTI), generation ~13 (MolGen/RetroSyn/Reaction/SBDD). 8 named leaderboard groups.

Notes

Most comprehensive ML-ready therapeutics benchmark hub. Published at NeurIPS 2021 and in Nat Chem Bio 2022.

CASP (Critical Assessment of Structure Prediction): 16 benchmarks tracked (as of 2026-05-12)

Biennial blind evaluation of protein structure prediction; drove AlphaFold's validation.

Kind
competition
Host
Prediction Center, UC Davis
Founded
1994
License model
Public
URL
https://predictioncenter.org/
GitHub
N/A
Composite score
100.0
Flags
none

Count methodology: predictioncenter.org archives: CASP1 (1994) through CASP16 (2024) = 16 editions; ~100 targets × ~5 categories per edition.
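The edition count follows from the biennial schedule. As a small illustrative sketch (the variable names are not from the source), enumerating every second year from 1994 to 2024 gives the 16 editions counted here:

```python
# CASP runs every two years; CASP1 was held in 1994 and CASP16 in 2024.
first_year, last_year = 1994, 2024

edition_years = list(range(first_year, last_year + 1, 2))

# Map edition labels to years: CASP1 -> 1994, CASP2 -> 1996, ...
editions = {f"CASP{i}": year for i, year in enumerate(edition_years, start=1)}

print(len(editions), editions["CASP1"], editions["CASP16"])
# → 16 1994 2024
```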

Notes

Historical gold standard for blind evaluation. CASP15 added ligands; CASP16 added multimer + RNA.

ProteinGym: 217 benchmarks tracked (as of 2026-05-12)

Large-scale benchmark for protein fitness prediction from DMS + clinical variant effects.

Kind
meta-platform
Host
Marks Lab (Harvard) + OATML (Oxford) + DeepMind
Founded
2022
License model
MIT
URL
https://proteingym.org/
GitHub
https://github.com/OATML-Markslab/ProteinGym
Composite score
97.5
Flags
none

Count methodology: ProteinGym v1.2 README + NeurIPS 2023 paper: 217 DMS substitution assays + 66 indel assays + 2525 ClinVar clinical variants.

Notes

De facto standard for variant effect prediction. Clinical track enables ESM/EVE/AlphaMissense fair comparison.

ELIXIR Infrastructure: 18 benchmarks tracked (as of 2026-05-12)

European life-science data infrastructure hosting benchmark-relevant resources (UniProt, Ensembl, ChEMBL, PDBe, IntAct).

Kind
consortium
Host
EMBL-EBI + 23 EU member nodes
Founded
2013
License model
Mostly CC-BY
URL
https://elixir-europe.org/
GitHub
N/A
Composite score
97.5
Flags
none

Count methodology: ELIXIR Core Data Resources list 2026-05: ~18 resources with benchmark/leaderboard components.

Notes

Meta-resource of meta-resources.

CAMEO: 4 benchmarks tracked (as of 2026-05-12)

Continuous weekly blind eval of protein 3D / multimer / ligand prediction using pre-release PDB structures.

Kind
competition
Host
Biozentrum Basel + SIB
Founded
2013
License model
CC-BY 4.0
URL
https://www.cameo3d.org/
GitHub
N/A
Composite score
94.4
Flags
none

Count methodology: cameo3d.org 2026-05: 4 active categories (3D monomer, 3D multimer, model quality, ligand pocket); ~1000 targets/year.

Notes

Excellent continuous cadence complementing CASP.

PoseBusters Evaluation Suite: 3 benchmarks tracked (as of 2026-05-12)

Physics-aware validation of docking/co-folding poses; 19 checks + curated test sets.

Kind
meta-platform
Host
Oxford OPIG (Deane Lab)
Founded
2023-08
License model
BSD-3-Clause
URL
https://posebusters.readthedocs.io/
GitHub
https://github.com/maabuu/posebusters
Composite score
93.9
Flags
none

Count methodology: GitHub README: PoseBusters v1 (308 complexes), v2 (428), Astex Diverse Set (85) = 3 canonical suites.

Notes

Changed pose-prediction evaluation norms; now the default pose filter in pharma.

PLINDER / PINDER: 2 benchmarks tracked (as of 2026-05-12)

Leakage-controlled protein-ligand (PLINDER) and protein-protein (PINDER) docking datasets.

Kind
meta-platform
Host
Biozentrum Basel + VantAI + Isomorphic Labs + EPFL
Founded
2024-07
License model
CC-BY 4.0
URL
https://www.plinder.sh/
GitHub
https://github.com/plinder-org/plinder
Composite score
93.9
Flags
none

Count methodology: plinder.sh + pinder.sh: 2 major benchmarks (PLINDER 460k systems, PINDER 267k systems).

Notes

Replacing PDBbind/CASF for modern docking ML eval.

Open Problems in Single-Cell Analysis: 29 benchmarks tracked (as of 2026-05-12)

Community benchmark suite for single-cell analysis with reproducible Viash/Nextflow pipelines and NeurIPS tracks.

Kind
consortium
Host
CZI + Helmholtz Munich + Yale + HMS
Founded
2021-06
License model
MIT
URL
https://openproblems.bio/
GitHub
https://github.com/openproblems-bio/openproblems
Composite score
91.9
Flags
none

Count methodology: openproblems.bio task registry + Luecken et al. Nat Biotech 2025: 29 benchmark tasks (batch integration, denoising, dim-reduction, label projection, perturbation, spatial, multimodal).

Notes

Gold-standard single-cell benchmarking rigor; Nat Biotech 2025.

Polaris Hub: 48 benchmarks tracked (as of 2026-05-12)

Industry-curated small-molecule benchmarks with working groups on method-comparison standards.

Kind
meta-platform
Host
Polaris consortium (Valence Labs, Recursion, Novartis, Pfizer, Merck, AstraZeneca)
Founded
2023-10
License model
CC-BY or Polaris Community License per benchmark
URL
https://polarishub.io/
GitHub
https://github.com/polaris-hub/polaris
Composite score
91.4
Flags
none

Count methodology: polarishub.io/benchmarks public listing 2026-05: ~48 public benchmarks across Recursion, Valence, Novartis, AstraZeneca, Polaris Small Molecule Steering Committee orgs.

Notes

Industry-led counterweight to academic benchmarks. Strong on method-comparison rigor.

ScienceAIBench: 227 benchmarks tracked (as of 2026-05-12)

Insilico Medicine's public scientific-AI benchmark portal. Spans biology (longevity, target ID), affinity/binding, ADMET, clinical trials, biologics, materials; leaderboards benchmark frontier LLMs (GPT-5.x, Claude Opus/Sonnet 4.x, Gemini 3, Grok 4.1, DeepSeek v3.2, Kimi K2.x).

Kind
meta-platform
Host
Insilico Medicine
Founded
2025
License model
CC-BY (per portal); academic-friendly
URL
https://scienceaibench.insilico.com/
GitHub
N/A (hosted portal)
Composite score
90.6
Flags
none

Count methodology: Fetched https://scienceaibench.insilico.com/api/benchmarks on 2026-05-12; meta.totalBenchmarks=227 across 7 taxonomy categories × 17 suites. Leaderboard submitters are external frontier LLMs (top entries: Grok 4.1, GPT 5.1/5.2, Claude Opus 4.5/4.6, Gemini 3 Flash, DeepSeek v3.2, Kimi K2.5). Not self-referential: Insilico's own models are not on the leaderboards.
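Reading the count out of such a response could look like the sketch below. It is an assumption-laden illustration: only the `meta.totalBenchmarks` field is taken from the methodology note, while the `total_benchmarks` helper and the rest of the sample payload (including the `benchmarks` key) are hypothetical, and no request is made to the real endpoint.

```python
import json


def total_benchmarks(payload: str) -> int:
    """Return the meta.totalBenchmarks value from a JSON response body."""
    data = json.loads(payload)
    return int(data["meta"]["totalBenchmarks"])


# Hypothetical response body: only meta.totalBenchmarks is grounded in the
# methodology note; the "benchmarks" key is a placeholder for illustration.
sample_body = '{"meta": {"totalBenchmarks": 227}, "benchmarks": []}'

print(total_benchmarks(sample_body))
# → 227
```

Working from a saved response body rather than a live call keeps the count reproducible for the stated fetch date.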

Notes

Biggest of the three Insilico portals. Live leaderboards regenerate against frontier LLMs, so the portal is NOT flagged self-referential. Strong longevity / aging benchmark slice (unique). Moves up the aging-relevance ranking.

DREAM Challenges: 74 benchmarks tracked (as of 2026-05-12)

Long-running crowd-sourced biomedical prediction challenges, many pharma-sponsored.

Kind
competition
Host
Sage Bionetworks + IBM + academic partners
Founded
2006
License model
Per-challenge (mostly CC-BY-NC)
URL
https://dreamchallenges.org/
GitHub
https://github.com/dreamchallenges
Composite score
89.4
Flags
none

Count methodology: dreamchallenges.org/closed-challenges + /active as of 2026-05: 74 completed/active challenges; ~38 drug-discovery-relevant.

Notes

Historical impact on field norms. Cadence has slowed since 2022.

MIMIC-IV / eICU: 14 benchmarks tracked (as of 2026-05-12)

ICU EHR datasets used for clinical outcome, adverse-event, and PK/PD benchmarks.

Kind
data-platform
Host
MIT Lab for Computational Physiology
Founded
2016 / 2020 (v4)
License model
PhysioNet credentialed
URL
https://physionet.org/content/mimiciv/
GitHub
https://github.com/MIT-LCP/mimic-code
Composite score
89.4
Flags
none

Count methodology: PhysioNet + BigBio MIMIC-IV benchmarks 2026-05: 14 derived benchmarks (mortality, LOS, readmission, sepsis, AKI, drug dosing, phenotyping).

Notes

Canonical for clinical ML. US-centric.

CZI Virtual Cell / CellxGene / VCC: 12 benchmarks tracked (as of 2026-05-12)

Umbrella for CZI-funded virtual-cell benchmark initiatives: CellxGene, Virtual Cell Challenge, Tabula atlases.

Kind
consortium
Host
Chan Zuckerberg Initiative / CZ Biohub
Founded
2016 / 2024 (VCC)
License model
CC-BY 4.0
URL
https://chanzuckerberg.com/science/programs-resources/virtual-cells/
GitHub
https://github.com/chanzuckerberg
Composite score
88.9
Flags
none

Count methodology: chanzuckerberg.com/science 2026-05: Virtual Cell Challenge (4 tracks), CellxGene Census benchmarks (4), Tabula Sapiens-derived eval suites (4).

Notes

VCC is becoming the canonical virtual-cell benchmark.

Open Reaction Database (ORD): 1 benchmark tracked (as of 2026-05-12)

Open reaction repository in a schema-validated format; enables reaction / yield / retrosynthesis benchmarks.

Kind
data-platform
Host
ORD consortium (Doyle, Coley, Pfizer, Merck, BASF)
Founded
2021-07
License model
CC-BY-SA 4.0
URL
https://open-reaction-database.org/
GitHub
https://github.com/open-reaction-database
Composite score
88.9
Flags
none

Count methodology: open-reaction-database.org 2026-05: ~2.1M reactions as single versioned benchmark corpus.

Notes

Biggest open reaction corpus; industry donations accelerating.

Drug Discovery Benchmarks (DDB): 206 benchmarks tracked (as of 2026-05-12)

Insilico's drug-discovery-specific benchmark portal: TargetBench, Longevity Benchmark, GPCR affinity, PDBbind-style tasks, ISM ADMET, TDC ADMET mirror, ClinBench, biologics.

Kind
meta-platform
Host
Insilico Medicine
Founded
2025
License model
CC-BY (per portal)
URL
https://ddb.insilico.com/
GitHub
N/A (hosted portal)
Composite score
87.6
Flags
none

Count methodology: Fetched https://ddb.insilico.com/api/benchmarks on 2026-05-12; meta.totalBenchmarks=206 across 6 categories × 15 suites.

Notes

Drug-discovery focused cut. Includes a mirror of TDC ADMET for cross-platform comparability.

CAFA: 6 benchmarks tracked (as of 2026-05-12)

Blind eval of protein function prediction against time-delayed UniProt-GOA.

Kind
competition
Host
Radivojac / Friedberg / Jiang consortium
Founded
2010
License model
Public
URL
https://biofunctionprediction.org/
GitHub
N/A
Composite score
86.8
Flags
none

Count methodology: biofunctionprediction.org archives: CAFA1–5 (2010–2023) + CAFA6 announced 2025 = 6 editions.

Notes

CAFA5 (Kaggle, 2023) drew 1625 teams.

CAPRI: 56 benchmarks tracked (as of 2026-05-12)

Blind prediction of protein-protein complexes, protein-peptide, and protein-ligand assemblies.

Kind
competition
Host
EBI + CCP4
Founded
2001
License model
Public
URL
https://www.ebi.ac.uk/pdbe/complex-pred/capri/
GitHub
N/A
Composite score
86.3
Flags
none

Count methodology: EBI CAPRI archive: Round 1 (2001) through Round 56 (2024).

Notes

Oldest PPI prediction benchmark.

FAERS / SIDER / OffSides / TWOSIDES: 4 benchmarks tracked (as of 2026-05-12)

FDA adverse event reports + SIDER/OffSides/TWOSIDES derivatives for post-market signal detection.

Kind
data-platform
Host
FDA CDER + Tatonetti Lab
Founded
1969 / 2012
License model
Public / CC-BY
URL
https://www.fda.gov/drugs/surveillance/questions-and-answers-fdas-adverse-event-reporting-system-faers
GitHub
N/A
Composite score
85.6
Flags
none

Count methodology: FAERS (19M+ reports) + 3 derived benchmarks (SIDER, OffSides, TWOSIDES) = 4.

Notes

Essential for pharmacovigilance ML. Known reporting biases.

InsilicoBench: 162 benchmarks tracked (as of 2026-05-12)

Compact cut of the Insilico benchmark stack focused on biology (longevity), GPCR affinity, retrosynthesis, ADMET, and clinical trials.

Kind
meta-platform
Host
Insilico Medicine
Founded
2025
License model
CC-BY (per portal)
URL
https://insilicobench.insilico.com/
GitHub
N/A (hosted portal)
Composite score
84.6
Flags
none

Count methodology: Fetched https://insilicobench.insilico.com/api/benchmarks on 2026-05-12; meta.totalBenchmarks=162 across 5 categories (Biology 19, Affinity/Binding 88, Chemical Synthesis 2, ADMET 28, Clinical Trials 25).
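The per-category figures can be reconciled with the reported total. A minimal sketch, using only the numbers quoted in the methodology note (the variable names are illustrative):

```python
# Category counts as quoted in the count methodology.
categories = {
    "Biology": 19,
    "Affinity/Binding": 88,
    "Chemical Synthesis": 2,
    "ADMET": 28,
    "Clinical Trials": 25,
}
reported_total = 162  # meta.totalBenchmarks as quoted

computed_total = sum(categories.values())

# Identify the dominant category and its share of all benchmarks.
largest = max(categories, key=categories.get)
share = categories[largest] / computed_total

print(computed_total == reported_total, largest, f"{share:.1%}")
# → True Affinity/Binding 54.3%
```

The categories sum to the reported 162, and affinity/binding tasks account for a little over half of the portal.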

Notes

Curated subset of ScienceAIBench. It uses the same leaderboard model pool, so it is also NOT flagged self-referential.

FLIP: 15 benchmarks tracked (as of 2026-05-12)

Protein fitness benchmarks focused on realistic train/test splits (AAV, GB1, Meltome, SCL, Bind).

Kind
meta-platform
Host
Rostlab TUM + AlQuraishi Lab Columbia
Founded
2021-12
License model
CC-BY 4.0
URL
https://benchmark.protein.properties/
GitHub
https://github.com/J-SNACKKB/FLIP
Composite score
80.8
Flags
none

Count methodology: FLIP README: 5 landscapes × 3 splits = 15 benchmarks.

Notes

Complementary to ProteinGym (smaller but careful splits).

CPTAC: 10 benchmarks tracked (as of 2026-05-12)

Integrated proteogenomic datasets across 10 tumor types; hosts DREAM proteogenomic benchmarks.

Kind
consortium
Host
NCI Office of Cancer Clinical Proteomics Research
Founded
2011
License model
dbGaP controlled / public tiers
URL
https://proteomics.cancer.gov/programs/cptac
GitHub
https://github.com/PayneLab/cptac
Composite score
80.8
Flags
none

Count methodology: CPTAC data portal 2026-05: 10 tumor types with full proteogenomic characterization (BR, CO, EN, GBM, HNSCC, LSCC, LUAD, OV, PDAC, CCRCC).

Notes

Deep but narrow (oncology).

DeepChem: 40 benchmarks tracked (as of 2026-05-12)

OSS library bundling molecular ML benchmark datasets and baselines; hosts MoleculeNet.

Kind
meta-platform
Host
DeepChem community
Founded
2016
License model
MIT
URL
https://deepchem.io/
GitHub
https://github.com/deepchem/deepchem
Composite score
80.0
Flags
none

Count methodology: deepchem.molnet module listing: ~40 packaged datasets (MoleculeNet core + extensions).

Notes

Excellent reproducibility: one-liner dataset loaders.

MoleculeNet: 17 benchmarks tracked (as of 2026-05-12)

Benchmark suite covering quantum, physical, biophysical, physiological molecular ML tasks.

Kind
meta-platform
Host
DeepChem community (Pande Lab alumni)
Founded
2018-03
License model
MIT
URL
https://moleculenet.org/
GitHub
https://github.com/deepchem/deepchem
Composite score
78.0
Flags
data-leakage-known

Count methodology: Wu et al. 2018 Chem Sci + DeepChem repo enumeration: QM7/QM7b/QM8/QM9, ESOL, FreeSolv, Lipophilicity, PCBA, MUV, HIV, BACE, BBBP, Tox21, ToxCast, SIDER, ClinTox, PDBbind (17).

Notes

Historically foundational; many splits have documented leakage. Community has largely moved to TDC / Polaris for new work.

TrialBench / HINT / TOP: 4 benchmarks tracked (as of 2026-05-12)

Suite of benchmarks for clinical trial outcome prediction.

Kind
meta-platform
Host
Fu/Sun Lab, Georgia Tech + HMS
Founded
2022
License model
MIT
URL
https://github.com/futianfan/clinical-trial-outcome-prediction
GitHub
https://github.com/futianfan/clinical-trial-outcome-prediction
Composite score
76.5
Flags
none

Count methodology: Fu et al. 2022-2024: HINT (17k trials), TOP (17k), TrialBench (21k trials, 12k drugs), CT-Outcome.

Notes

First rigorous ML benchmarks on trial outcomes. Limited by CTgov quality.

ClawBio Benchmarks: 10 benchmarks tracked (as of 2026-05-03)

Public scientific-correctness leaderboard for bio-analysis skills. Independent third-party benchmark (clawbio_bench, authored by Biostochastics LLC) tests ClawBio skills on safety, correctness, honesty. Public failure surface with remediation tasks.

Kind
meta-platform
Host
ClawBio (open source, MIT)
Founded
2026-04
License model
MIT
URL
https://clawbio.ai/benchmarks.html
GitHub
https://github.com/biostochastics/clawbio_bench
Composite score
74.2
Flags
none

Count methodology: Scraped https://clawbio.ai/benchmarks.html on 2026-05-12; last bench run 2026-05-03 against ClawBio commit 7820473 using clawbio_bench v0.1.5. 10 skills audited: claw-metagenomics, equity-scorer, nutrigx-advisor, bio-orchestrator, pharmgx-reporter, fine-mapping, clinical-variant-reporter, cvr-acmg-correctness, gwas-prs, cvr-variant-identity. 168/182 tests passing (92.3%).
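The quoted pass rate follows directly from the test counts. A minimal arithmetic sketch (variable names are illustrative):

```python
# Test counts as quoted for the 2026-05-03 bench run.
passed, total_tests = 168, 182

failed = total_tests - passed
pass_rate = passed / total_tests

print(f"{passed}/{total_tests} passing ({pass_rate:.1%}), {failed} failing")
# → 168/182 passing (92.3%), 14 failing
```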

Notes

Independent third-party bench in a separate repo, so structurally NOT self-referential. Coverage is narrow (bio-analysis skills) but rigor is exemplary (three dimensions: safety × correctness × honesty). A model for how skill/agent correctness should be audited.

EU-OPENSCREEN / EUbOPEN: 5 benchmarks tracked (as of 2026-05-12)

EU chemical biology ERIC compound libraries + EUbOPEN chemogenomic probes.

Kind
consortium
Host
EU-OPENSCREEN ERIC + IMI EUbOPEN
Founded
2018
License model
CC-BY
URL
https://www.eu-openscreen.eu/
GitHub
N/A
Composite score
73.9
Flags
none

Count methodology: eu-openscreen.eu + eubopen.org 2026-05: ECBD (1), Bioactivity sets (2), EUbOPEN probe set (1), EUOS solubility (1).

Notes

EUOS solubility benchmark (on Polaris) is most ML-ready.

PKU-AIDD / ChinaDrug Benchmarks: 7 benchmarks tracked (as of 2026-05-12)

PKU AI Drug Discovery + SIMM CAS + Tsinghua + Baidu + Huawei benchmark releases.

Kind
consortium
Host
PKU + SIMM CAS + Tsinghua + Baidu + Huawei
Founded
2020
License model
Apache-2.0 / MIT
URL
https://aidd.pku.edu.cn/
GitHub
https://github.com/pku-aidd
Composite score
73.9
Flags
self_referential

Count methodology: PKU-AIDD + SIMM CAS + BDBench GitHub 2026-05: 7 public releases (PocketBench, ProteinInvBench, GeoMol-CN, HelixFold-Bench, UniMol-Bench, BDBench, PDBbind-China).

Notes

Growing Chinese benchmark ecosystem. Some self-referential flags (HelixFold on its own bench).

Kaggle - Pharma / Bio Competitions: 23 benchmarks tracked (as of 2026-05-12)

Industry-sponsored ML competitions (Merck MAC 2012, Open Problems ×3, NovoZymes, BMS, CAFA 5).

Kind
competition
Host
Google / Kaggle + sponsoring companies
Founded
2010
License model
Per-competition
URL
https://www.kaggle.com/competitions
GitHub
N/A
Composite score
71.9
Flags
none

Count methodology: Kaggle search for bio/chem/pharma competitions 2010-2025: 23 distinct drug-discovery-adjacent competitions identified.

Notes

Impactful one-off events; leaderboards go stale post-close.

Papers With Code - Drug Discovery: 120 benchmarks tracked (as of 2026-05-12)

Aggregates published ML benchmarks with linked code; crowd-curated.

Kind
meta-platform
Host
Meta AI / Papers With Code community
Founded
2018
License model
Per benchmark
URL
https://paperswithcode.com/area/medical
GitHub
N/A
Composite score
71.6
Flags
none

Count methodology: paperswithcode.com/area/medical + /task search 2026-05: ~120 drug-discovery-adjacent benchmarks (DTI, generation, ADMET, structure, drug response, etc.).

Notes

Useful for discovery; curation quality varies sharply.

PDBbind / CASF: 6 benchmarks tracked (as of 2026-05-12)

Curated experimental binding affinities for PDB complexes + CASF scoring power tests.

Kind
meta-platform
Host
SIMM, Chinese Academy of Sciences
Founded
2004
License model
Academic-only
URL
http://www.pdbbind.org.cn/
GitHub
N/A
Composite score
70.4
Flags
data-leakage-known

Count methodology: pdbbind.org.cn: 2 splits (refined + general) × 3 CASF editions (2013, 2016, 2020) = 6 configurations.
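The six configurations are the Cartesian product of splits and CASF editions. The sketch below enumerates them; the `split/CASF-year` label format is an illustrative assumption, not a naming scheme taken from pdbbind.org.cn.

```python
from itertools import product

# Splits and CASF editions as listed in the count methodology.
splits = ["refined", "general"]
casf_editions = [2013, 2016, 2020]

# Label format "split/CASF-year" is illustrative only.
configurations = [
    f"{split}/CASF-{year}" for split, year in product(splits, casf_editions)
]

print(len(configurations))
# → 6
for name in configurations:
    print(name)
```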

Notes

Known leakage; still dominant in published benchmarks. Academic-only licensing limits pharma use.

HuggingFace - Bio/Chem Datasets: 310 benchmarks tracked (as of 2026-05-12)

HuggingFace Datasets hub filtered for bio/chem benchmarks (tdc, bigbio, InstaDeep).

Kind
data-platform
Host
HuggingFace + community uploaders
Founded
2020
License model
Per-dataset
URL
https://huggingface.co/datasets
GitHub
N/A
Composite score
67.8
Flags
none

Count methodology: huggingface.co/datasets tag search (biology/chemistry/medical/drug-discovery) + curated orgs tdc/bigbio/InstaDeepAI 2026-05: ~310 entries, with duplication.

Notes

High discoverability, low quality floor.