DO Challenge 2025 (DeepOrigin Autonomous Drug Discovery)

Benchmark for autonomous AI agents in drug discovery. Agents must identify the top 1,000 molecules from a pool of 1M conformations under a limited budget of 100K score queries. It tests ML-based sampling, strategic resource management, and code execution in autonomous discovery pipelines.
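The budgeted selection task described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the pool is synthetic and scaled down (10,000 items, 1,000 queries, top 100), each item has a single made-up feature, the hidden score is a noisy linear function of that feature, and the overlap metric is a stand-in rather than the benchmark's confirmed scoring rule. The function name `run_budgeted_selection` is hypothetical and is not part of any DO Challenge code.

```python
import random


def run_budgeted_selection(n_pool=10_000, budget=1_000, top_k=100, seed=0):
    """Toy explore-then-exploit selection under a fixed query budget.

    Returns (overlap with the true top-k as a fraction, number of queries used).
    """
    rng = random.Random(seed)

    # Synthetic pool (assumption): one observable feature per item and a
    # hidden score that is correlated with it.
    features = [rng.gauss(0.0, 1.0) for _ in range(n_pool)]
    hidden = [2.0 * x + rng.gauss(0.0, 0.5) for x in features]

    queried = {}  # item index -> revealed score

    def query(i):
        # Each new item costs one unit of budget; repeats are free.
        if i not in queried:
            if len(queried) >= budget:
                raise RuntimeError("query budget exhausted")
            queried[i] = hidden[i]
        return queried[i]

    # Phase 1 (explore): spend half the budget on a uniform random sample.
    for i in rng.sample(range(n_pool), budget // 2):
        query(i)

    # Fit a one-variable least-squares line to the labeled sample.
    xs = [features[i] for i in queried]
    ys = [queried[i] for i in queried]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x

    def predict(i):
        return slope * features[i] + intercept

    # Phase 2 (exploit): spend the remaining budget on the unlabeled items
    # with the highest predicted score.
    unlabeled = [i for i in range(n_pool) if i not in queried]
    unlabeled.sort(key=predict, reverse=True)
    for i in unlabeled[: budget - len(queried)]:
        query(i)

    # Submission: rank by the revealed score where known, else by prediction.
    def estimate(i):
        return queried.get(i, predict(i))

    submission = sorted(range(n_pool), key=estimate, reverse=True)[:top_k]
    true_top = sorted(range(n_pool), key=hidden.__getitem__, reverse=True)[:top_k]
    overlap = len(set(submission) & set(true_top)) / top_k
    return overlap, len(queried)


if __name__ == "__main__":
    overlap, used = run_budgeted_selection()
    print(f"queries used: {used}, overlap with true top-k: {overlap:.2f}")
```

The sketch splits the budget evenly between exploration and exploitation only to keep the example short. How to divide the budget, which model to fit, and how many rounds to run are the kinds of strategic decisions the benchmark leaves to the agent.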

Composite
70.9
Experimental validation
None
Stages
Hit ID
Modalities
ai_agent, small-molecule
Task types
virtual_screening, active_learning, agent_evaluation
Size
molecular_conformations: 1,000,000
query_budget: 100,000
License
Apache-2.0
First release
2025-03
Last updated
2025-05
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Can AI Agents Design and Implement Drug Discovery Pipelines? · 2025 · paper · doi:10.5281/zenodo.15296510 · 5 citations
Flags
competition, agent_benchmark

Rubric (7-criterion)

rigor
4
coverage
2
maintenance
3
adoption
3
quality
4
accessibility
4
industry_relevance
5

Notes

The first benchmark aimed specifically at AI agents, rather than standalone models, in drug discovery. The multi-agent system 'Deep Thought' outperformed most human teams but fell short of expert solutions. The benchmark tests integrated pipeline design rather than isolated tasks.
