LSD: Large-Scale Docking Database

Open-source dataset of 6.3 billion explicitly evaluated ligand-target docking pairs across 11 protein targets. Provides docking scores, SMILES, poses for top molecules, and in vitro validation results. Designed for ML model development and chemical space exploration.

Composite
82.5
Experimental validation
Wet-lab confirmed
Stages
Hit ID
Modalities
protein_structure, small-molecule
Task types
virtual_screening, docking, scoring
Size
ligand-target_pairs: 6,300,000,000
targets: 11
License
CC-BY-4.0
First release
2025-02
Last updated
2025-04
Paper
A database for large-scale docking and experimental results · 2025 · doi:10.1021/acs.jcim.5c00394 · 8 citations
Flags
ultra_large_scale, experimental_validation
Related benchmarks
DOCKSTRING, LIT-PCBA, DUD-E

Rubric (7-criterion)

rigor
4
coverage
5
maintenance
3
adoption
3
quality
4
accessibility
5
industry_relevance
5

Notes

At 6.3 billion explicitly evaluated ligand-target pairs, the scale is unprecedented for public docking data. Includes experimental in vitro validation for a subset of molecules. From the Shoichet Lab at UCSF. Well suited to training ML scoring functions and to active learning in virtual screening.
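
As a rough illustration of the kind of preprocessing these uses involve, the sketch below picks the best-scoring molecules from a stream of (SMILES, docking score) records without sorting the whole stream, which matters at billions of rows. The record layout, the `DockedLigand` type, the `top_scoring` helper, and the toy values are all hypothetical and not part of the LSD release; the sketch also assumes that lower (more negative) docking scores are better.

```python
import heapq
from typing import Iterable, NamedTuple


class DockedLigand(NamedTuple):
    # Hypothetical record layout, not the actual LSD schema.
    smiles: str
    score: float  # docking score; assumed lower (more negative) = better


def top_scoring(ligands: Iterable[DockedLigand], k: int) -> list[DockedLigand]:
    """Return the k best-scoring ligands, best first.

    heapq.nsmallest keeps only k candidates in memory, so the input
    can be a generator over a very large file.
    """
    return heapq.nsmallest(k, ligands, key=lambda lig: lig.score)


# Toy records standing in for rows of a docking-score table (made-up values).
toy = [
    DockedLigand("CCO", -21.4),
    DockedLigand("c1ccccc1", -35.2),
    DockedLigand("CC(=O)O", -18.9),
    DockedLigand("CCN", -40.1),
]

best = top_scoring(toy, k=2)
print([lig.smiles for lig in best])
```

A selection like this could serve as the first labelled batch for a surrogate model, with later batches chosen by the model rather than by raw score.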
