DO Challenge 2025 (DeepOrigin Autonomous Drug Discovery)

Benchmark for autonomous AI agents in drug discovery. Agents must identify the top 1,000 molecules among 1,000,000 conformations while spending at most 100,000 score queries. It tests ML-based sampling, strategic resource management, and code execution for autonomous discovery pipelines.
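The budgeted-selection setting described above can be illustrated with a small sketch. This is not the benchmark's API, data, or scoring rule: `BudgetedOracle`, `fit_line`, `select_top_k`, and `run_demo` are hypothetical names, the pool is synthetic with a single made-up feature, the sizes are scaled down (10,000 items, 1,000 queries, top 100) to keep the same 10% budget ratio, and the overlap measure is only an illustrative stand-in for whatever metric the benchmark actually uses. The strategy shown is a simple two-round explore-then-exploit loop, one of many possible approaches.

```python
import random


class BudgetedOracle:
    """Hypothetical stand-in for a score-query interface with a hard cap."""

    def __init__(self, true_scores, budget):
        self._scores = true_scores
        self.budget = budget
        self.seen = {}  # index -> measured score

    def query(self, idx):
        if idx in self.seen:
            return self.seen[idx]  # assumption: repeated queries are free
        if len(self.seen) >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.seen[idx] = self._scores[idx]
        return self.seen[idx]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def select_top_k(features, oracle, k, rng):
    n = len(features)
    # Round 1 (explore): spend half the budget on a uniform random sample.
    explore = rng.sample(range(n), oracle.budget // 2)
    for i in explore:
        oracle.query(i)
    # Fit a surrogate model on the measured items.
    slope, intercept = fit_line(
        [features[i] for i in explore], [oracle.seen[i] for i in explore]
    )
    # Round 2 (exploit): spend the rest on the best-predicted unmeasured items.
    remaining = oracle.budget - len(oracle.seen)
    unmeasured = [i for i in range(n) if i not in oracle.seen]
    unmeasured.sort(key=lambda i: slope * features[i] + intercept, reverse=True)
    for i in unmeasured[:remaining]:
        oracle.query(i)
    # Submit the k best items among those actually measured.
    return sorted(oracle.seen, key=oracle.seen.get, reverse=True)[:k]


def run_demo(pool_size=10_000, budget=1_000, k=100, seed=0):
    rng = random.Random(seed)
    # Synthetic pool: one feature per item, score is a noisy linear function.
    features = [rng.random() for _ in range(pool_size)]
    true_scores = [2.0 * x + rng.gauss(0, 0.2) for x in features]
    oracle = BudgetedOracle(true_scores, budget)
    picked = select_top_k(features, oracle, k, rng)
    true_top = set(
        sorted(range(pool_size), key=true_scores.__getitem__, reverse=True)[:k]
    )
    overlap = len(true_top & set(picked)) / k  # illustrative metric only
    return overlap, len(oracle.seen)


if __name__ == "__main__":
    overlap, used = run_demo()
    print(f"queries used: {used}, overlap with true top-k: {overlap:.2f}")
```

Splitting the budget evenly between exploration and exploitation is an arbitrary choice for the sketch; a real pipeline would likely tune that split, iterate over more rounds, and use a much richer model than a one-feature line fit.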

Composite

70.9

Experimental validation

None

Stages

Hit ID

Modalities

ai_agent, small-molecule

Task types

virtual_screening, active_learning, agent_evaluation

Size

molecular_conformations: 1,000,000
query_budget: 100,000

License

Apache-2.0

First release

2025-03

Last updated

2025-05

Official site

→ project page

Leaderboard

→ leaderboard

Dataset

→ dataset

Code / GitHub

→ repository

HuggingFace

→ HF

Paper

Can AI Agents Design and Implement Drug Discovery Pipelines? · 2025 · paper · doi:10.5281/zenodo.15296510 · 5 citations

Flags

competition, agent_benchmark

Experts

—

Groups

—

Hosted by

—

Related benchmarks

—

Rubric (7-criterion)

rigor

coverage

maintenance

adoption

quality

accessibility

industry_relevance

Notes

First benchmark specifically for AI agents (not just models) in drug discovery. Multi-agent system 'Deep Thought' outperformed most human teams but underperformed expert solutions. Tests integrated pipeline design rather than isolated tasks.
