BELKA (Big Encoded Library for Chemical Assessment)
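Largest public DNA-encoded library (DEL) dataset: ~133M small molecules with 3.6B binding measurements against three protein targets, BRD4 (bromodomain-containing protein 4), sEH (soluble epoxide hydrolase), and HSA (human serum albumin). Released by Leash Biosciences and used for a NeurIPS 2024 Kaggle competition. Includes a library split for out-of-distribution (OOD) evaluation.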
Composite
82.5
Experimental validation
Retrospective
Stages
Hit ID
Modalities
dna_encoded_library, small-molecule
Task types
binding_prediction, virtual_screening
Size
molecules: 133,000,000
measurements: 3,600,000,000
targets: 3
License
CC-BY-4.0
First release
2024-04
Last updated
2024-10
Official site
Leaderboard
Dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Introducing BELKA: Big Encoded Library for Chemical Assessment · 2024 · 25 citations
Flags
neurips_2024, kaggle, ultra_large_scale, competition
Experts
—
Groups
—
Hosted by
—
Rubric (7-criterion)
rigor: 4
coverage: 3
maintenance: 3
adoption: 5
quality: 4
accessibility: 5
industry_relevance: 5
Notes
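Basis of a NeurIPS 2024 Kaggle competition. At ~133M molecules and 3.6B measurements, the scale is unprecedented for public binding data. The library split tests out-of-distribution generalization by evaluating on a library held out from training. DEL technology makes it practical to screen this much chemical space. Now also available on Polaris Hub.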
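
The idea behind a library split can be sketched in a few lines of Python. This is a minimal illustration, not BELKA's actual splitting code: the `library_id` field, the `library_split` function, and the toy records are all hypothetical, and the real dataset schema is not described in this entry.

```python
def library_split(records, held_out_libraries):
    """Send every record from a held-out library to test, the rest to train.

    `records` is a list of dicts with a hypothetical `library_id` key.
    """
    held_out = set(held_out_libraries)
    train, test = [], []
    for rec in records:
        if rec["library_id"] in held_out:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Toy records for illustration only; these are not BELKA measurements.
records = [
    {"smiles": "CCO", "library_id": "lib_A", "binds": 0},
    {"smiles": "CCN", "library_id": "lib_A", "binds": 1},
    {"smiles": "c1ccccc1", "library_id": "lib_B", "binds": 0},
    {"smiles": "CC(=O)O", "library_id": "lib_B", "binds": 1},
    {"smiles": "CCCC", "library_id": "lib_C", "binds": 0},
]

train, test = library_split(records, held_out_libraries={"lib_B"})

train_libs = {rec["library_id"] for rec in train}
test_libs = {rec["library_id"] for rec in test}
```

Unlike a random split, no library appears on both sides, so a model scored on `test` cannot rely on having seen closely related molecules from the same library during training.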