MolGenBench
A comprehensive benchmark for molecular generation in real-world drug discovery, with a specific focus on hit-to-lead (H2L) optimization. It comprises 220,005 experimentally confirmed active molecules across 120 targets and 5,433 chemical series, and introduces novel, pharmaceutically grounded evaluation metrics.
Composite
78.2
Experimental validation
Retrospective
Stages
Hit ID, Lead ID / ADMET
Modalities
small-molecule
Task types
molecular_generation, lead_optimization, de_novo_design
Size
molecules: 220,005
targets: 120
series: 5,433
License
Unknown
First release
2025-11
Last updated
2025-11
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Benchmarking Real-World Applicability of Molecular Generative Models from De novo Design to Lead Optimization with MolGenBench · 2025 · paper · doi:10.1101/2025.11.03.686215 · 6 citations
Flags
hit_to_lead, real_world_metrics
Experts
—
Groups
—
Hosted by
—
Related benchmarks
Rubric (7-criterion)
rigor
5
coverage
4
maintenance
3
adoption
3
quality
4
accessibility
3
industry_relevance
5
Notes
Reveals a significant gap between the capabilities of current generative models and the demands of real-world H2L optimization. Introduces novel metrics for target-specific rediscovery of active compounds and for progressive potency optimization. Built on a large, experimentally confirmed dataset.
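One common way to frame a target-specific rediscovery metric is the fraction of a target's known actives that a model reproduces among its generated molecules. The sketch below assumes that framing only; it is not necessarily MolGenBench's exact definition. The function names `rediscovery_rate` and `per_target_rediscovery`, and the example identifiers, are hypothetical. It also assumes molecules arrive as pre-canonicalized identifiers (for example InChIKeys or canonical SMILES from one toolkit), so exact string equality counts as a rediscovery.

```python
from typing import Dict, Iterable, List


def rediscovery_rate(generated: Iterable[str], reference_actives: Iterable[str]) -> float:
    """Fraction of reference actives that appear among the generated molecules.

    Assumption: both inputs are canonical identifiers from the same toolkit,
    so identical strings mean identical molecules. Duplicates in `generated`
    are counted once.
    """
    reference = set(reference_actives)
    if not reference:
        raise ValueError("reference_actives must not be empty")
    found = reference.intersection(generated)  # actives the model reproduced
    return len(found) / len(reference)


def per_target_rediscovery(
    generated_by_target: Dict[str, List[str]],
    actives_by_target: Dict[str, List[str]],
) -> Dict[str, float]:
    """Rediscovery rate per target; a target with no generated set scores 0.0."""
    return {
        target: rediscovery_rate(generated_by_target.get(target, []), actives)
        for target, actives in actives_by_target.items()
    }


# Hypothetical toy data: placeholder identifiers, not MolGenBench records.
actives = {"T1": ["mol_A", "mol_B", "mol_C", "mol_D"], "T2": ["mol_E", "mol_F"]}
generated = {"T1": ["mol_A", "mol_C", "mol_X", "mol_A"], "T2": ["mol_Z"]}
scores = per_target_rediscovery(generated, actives)
```

Scoring per target, rather than pooling all targets, keeps a model from looking strong simply because it rediscovers many actives for one heavily populated target.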