BOOM (Benchmarking Out-Of-Distribution Molecular Predictions)
Systematic benchmark evaluating out-of-distribution (OOD) performance in molecular property prediction. Over 150 model-task combinations evaluated. Key finding: no model consistently achieves strong OOD generalization; average OOD error is roughly 3× larger than in-distribution error.
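The headline 3× gap can be illustrated with a minimal sketch. The error values below are hypothetical placeholders, not BOOM's published results; the benchmark's actual metrics and data are available via the project page and dataset links.

```python
# Hypothetical per-model errors (e.g., RMSE) on in-distribution (ID)
# and out-of-distribution (OOD) test splits -- illustrative numbers only.
results = {
    "model_a": {"id_error": 0.21, "ood_error": 0.64},
    "model_b": {"id_error": 0.18, "ood_error": 0.55},
    "model_c": {"id_error": 0.25, "ood_error": 0.71},
}

def ood_gap(results):
    """Mean ratio of OOD error to ID error across models."""
    ratios = [r["ood_error"] / r["id_error"] for r in results.values()]
    return sum(ratios) / len(ratios)

print(f"mean OOD/ID error ratio: {ood_gap(results):.2f}")
```

A ratio near 3 in this toy table mirrors the benchmark's key finding that OOD error averages about three times the in-distribution error.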
Composite
83.3
Experimental validation
None
Stages
Hit ID · Lead ID / ADMET
Modalities
small-molecule
Task types
property_prediction · ood_generalization
Size
model-task_combinations: 150
License
MIT
First release
2025-05
Last updated
2025-12
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
HuggingFace
→ HF
Paper
BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models · 2025 · paper · 7 citations
Flags
neurips_2025 · ood_evaluation · critical_finding
Experts
—
Groups
—
Hosted by
—
Related benchmarks
Rubric (7 criteria)
rigor
5
coverage
4
maintenance
3
adoption
3
quality
4
accessibility
5
industry_relevance
5
Notes
NeurIPS 2025. A critical benchmark quantifying the gap between in-distribution and OOD performance in molecular ML. Highly relevant for real-world drug discovery, where generalization to novel chemotypes is the goal. A frontier challenge for chemical ML.