BOOM (Benchmarking Out-Of-Distribution Molecular Predictions)
Systematic benchmark evaluating out-of-distribution (OOD) performance in molecular property prediction. Over 150 model-task combinations evaluated. Key finding: no model consistently achieves strong OOD generalization; average OOD error is roughly 3× larger than in-distribution error.
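The headline 3× gap can be illustrated with a minimal sketch. The error values below are hypothetical placeholders, not BOOM's published results; the benchmark's actual metrics and data are available via the project page and dataset links.

```python
# Hypothetical per-model errors (e.g., RMSE) on in-distribution (ID)
# and out-of-distribution (OOD) test splits -- illustrative numbers only.
results = {
    "model_a": {"id_error": 0.21, "ood_error": 0.64},
    "model_b": {"id_error": 0.18, "ood_error": 0.55},
    "model_c": {"id_error": 0.25, "ood_error": 0.71},
}

def ood_gap(results):
    """Mean ratio of OOD error to ID error across models."""
    ratios = [r["ood_error"] / r["id_error"] for r in results.values()]
    return sum(ratios) / len(ratios)

print(f"mean OOD/ID error ratio: {ood_gap(results):.2f}")
```

A ratio near 3 in this toy table mirrors the benchmark's key finding that OOD error averages about three times the in-distribution error.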
Composite
83.3
Experimental validation
None
Stages
Hit ID · Lead ID / ADMET
Modalities
small-molecule
Task types
property_prediction · ood_generalization
Size
model-task_combinations: 150
License
MIT
First release
2025-05
Last updated
2025-12
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
HuggingFace
→ HF
Paper
BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models · 2025 · paper · 7 citations
Flags
neurips_2025 · ood_evaluation · critical_finding
Experts
—
Groups
—
Hosted by
—
Related benchmarks
Rubric (7 criteria)
rigor
5
coverage
4
maintenance
3
adoption
3
quality
4
accessibility
5
industry_relevance
5
Notes
NeurIPS 2025. A critical benchmark quantifying the gap between in-distribution and OOD performance in molecular ML. Highly relevant for real-world drug discovery, where generalization to novel chemotypes is the goal. A frontier challenge for chemical ML.