Private / Industry benchmarks (28)
Access note: the benchmarks below are built on datasets that are not publicly accessible. They are catalogued only as industry-relevance references and to surface the public proxies that academic and open-source work can use instead. Where access is possible at all, it is typically gated by a research collaboration, a data-sharing agreement, or conditional-access terms; the remaining datasets are closed entirely.
Compiled from published papers, SEC filings, conference talks, and press releases. Each entry names the closest public proxy benchmark.
| Benchmark | Owner | Type | Stage | Modality | Access | Estimated size | Public proxy |
|---|---|---|---|---|---|---|---|
| AstraZeneca CAS-backed DMPK Benchmarks | AstraZeneca | pharma | Lead ID / ADMET | small-molecule | closed | ~200k measured DMPK endpoints | TDC ADMET Group |
| Bristol-Myers Squibb Internal SAR Benchmark | Bristol-Myers Squibb | pharma | Hit ID | small-molecule | closed | ~1M assay points | ChEMBL |
| Chugai Antibody Engineering Benchmark | Chugai Pharmaceutical (Roche) | pharma | Hit ID | biologic | collaboration | undisclosed | Therapeutic Antibody Design Benchmark 2026 |
| Deep Genomics RNA Therapeutics Benchmark | Deep Genomics | biotech | Hit ID | rna-therapeutic | collaboration | undisclosed | mRNA Design Benchmark (CodonBench 2026) |
| Exscientia Precision Medicine Benchmark | Exscientia | biotech | Hit ID | small-molecule | closed | undisclosed | ChEMBL |
| FDA CDRH Internal AI Validation Sets | FDA Center for Devices and Radiological Health | regulatory | Post-market / RWE | cross-modality | closed | undisclosed | FAERS (raw) |
| Flatiron Health Real-World Oncology Benchmark | Flatiron Health (Roche) | pharma | Post-market / RWE | cross-modality | data-sharing-agreement | ~4M oncology patients | MIMIC-IV Benchmark Tasks |
| Genentech gRED Structure-Activity Dataset | Genentech gRED | pharma | Hit ID | small-molecule | closed | undisclosed; referenced as 'millions of assay points' | ChEMBL |
| Gilead Internal Antiviral Benchmark | Gilead Sciences | pharma | Hit ID | small-molecule | closed | ~500k compounds screened for antiviral activity | ASAP Discovery Antiviral 2025 |
| Ginkgo Bioworks Biologics Design Benchmark | Ginkgo Bioworks | biotech | Hit ID | biologic | collaboration | millions of enzyme variants with activity measurements | Protein Design Benchmark 2026 |
| IBM RXN Internal Retrosynthesis Benchmark | IBM Research | tech | Hit ID | small-molecule | collaboration | ~5-10M proprietary reactions beyond USPTO | USPTO-50K / USPTO-MIT (Retrosynthesis) |
| Insilico Longevity Benchmark (Full Dataset) | Insilico Medicine | pharma | Virtual Cell | cross-modality | conditional-access | ~1M individuals across NHANES + internal cohorts, 500k methylation samples | Longevity Benchmark (Insilico) |
| Isomorphic Labs Internal Structure/Docking Benchmark | Isomorphic Labs (Alphabet) | biotech | Hit ID | cross-modality | closed | undisclosed | PLINDER v2 Protein-Ligand Benchmark |
| Merck Internal ADMET Benchmark (Demystifying ADMET) | Merck & Co. | pharma | Lead ID / ADMET | small-molecule | closed | ~150k compounds across 17 endpoints | Polaris ADMET |
| Meta FAIR Protein Design Internal Eval | Meta FAIR / EvolutionaryScale | biotech | Hit ID | biologic | closed | undisclosed | ProteinGym |
| Moderna mRNA Design Internal Benchmark | Moderna | pharma | Hit ID | rna-therapeutic | closed | ~100k constructs with HEK293 and primary-cell expression readout | mRNA Design Benchmark (CodonBench 2026) |
| NIBR Therapeutics Data (NTD) | Novartis NIBR | pharma | Lead ID / ADMET | small-molecule | collaboration | ~4M compounds, 20M assays | ChEMBL |
| Open Problems Sponsor-Private Challenge Data | Open Problems consortium (sponsors: 10x Genomics, Chan Zuckerberg Biohub) | consortium | Virtual Cell | cross-modality | conditional-access | varies per competition (~100k cells each) | Open Problems: Perturbation Prediction |
| Open Targets Pharma Partner Extensions | Open Targets consortium | consortium | Target ID | cross-modality | data-sharing-agreement | undisclosed | Open Targets Platform |
| Pfizer Phase II Trial Benchmark Dataset | Pfizer | pharma | Phase II | cross-modality | closed | ~1,500 trials, 40,000 patients | HINT / TrialBench |
| Pfizer mRNA/LNP Internal Benchmark | Pfizer | pharma | Hit ID | rna-therapeutic | closed | undisclosed | mRNA Design Benchmark (CodonBench 2026) |
| Recursion Full Phenomics Dataset | Recursion Pharmaceuticals | pharma | Hit ID | small-molecule | conditional-access | ~50 PB of images, 18M compound wells | RxRx3 Phenomics Benchmark |
| Roche pRED ADMET Benchmark | Roche pRED | pharma | Lead ID / ADMET | small-molecule | closed | ~300k compound-endpoint pairs | TDC ADMET Group |
| Sanofi Internal PK/PD Benchmark | Sanofi | pharma | IND-enabling | small-molecule | closed | undisclosed | Obach PK Dataset |
| Takeda Internal AI Pipeline Benchmark | Takeda | pharma | Lead ID / ADMET | small-molecule | collaboration | undisclosed | TDC ADMET Group |
| Tempus Oncology AI Benchmark | Tempus Labs | biotech | Target ID | cross-modality | data-sharing-agreement | ~200k sequenced patients + outcomes | CPTAC Proteogenomic Benchmarks |
| Valence Labs Internal ADMET Extensions | Valence Labs (Recursion) | biotech | Lead ID / ADMET | small-molecule | conditional-access | ~150k compounds | Polaris ADMET |
| Xaira Therapeutics Foundation Benchmark | Xaira Therapeutics | biotech | Virtual Cell | cross-modality | closed | undisclosed; built on Illumina-scale partnerships | Virtual Cell Benchmark Suite 2026 |
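Because the main use of this catalog is finding the public proxy for a private benchmark, it can help to pull that mapping out of the table programmatically. The sketch below is a minimal, hypothetical example: `parse_table`, `group_by_proxy`, and `non_closed` are illustrative helper names (not part of any published tool), only four rows of the table are embedded to keep it short, and the row splitting assumes no cell contains a literal `|`, which holds for this table.

```python
from collections import defaultdict

# Hypothetical example input: four rows copied from the catalog table above,
# kept in their original pipe-delimited markdown form.
CATALOG = """\
| Benchmark | Owner | Type | Stage | Modality | Access | Estimated size | Public proxy |
|---|---|---|---|---|---|---|---|
| AstraZeneca CAS-backed DMPK Benchmarks | AstraZeneca | pharma | Lead ID / ADMET | small-molecule | closed | ~200k measured DMPK endpoints | TDC ADMET Group |
| Roche pRED ADMET Benchmark | Roche pRED | pharma | Lead ID / ADMET | small-molecule | closed | ~300k compound-endpoint pairs | TDC ADMET Group |
| Takeda Internal AI Pipeline Benchmark | Takeda | pharma | Lead ID / ADMET | small-molecule | collaboration | undisclosed | TDC ADMET Group |
| Valence Labs Internal ADMET Extensions | Valence Labs (Recursion) | biotech | Lead ID / ADMET | small-molecule | conditional-access | ~150k compounds | Polaris ADMET |
"""


def split_row(line: str) -> list[str]:
    """Split one markdown table row into trimmed cell strings.

    Assumes no cell contains a literal '|' character.
    """
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited markdown table into one dict per data row."""
    lines = [ln for ln in markdown.splitlines() if ln.strip()]
    header = split_row(lines[0])
    # lines[1] is the |---| separator row, so data rows start at index 2.
    return [dict(zip(header, split_row(ln))) for ln in lines[2:]]


def group_by_proxy(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each public proxy to the private benchmarks it stands in for."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        groups[row["Public proxy"]].append(row["Benchmark"])
    return dict(groups)


def non_closed(rows: list[dict[str, str]]) -> list[str]:
    """Benchmarks whose Access column is anything other than 'closed'."""
    return [row["Benchmark"] for row in rows if row["Access"] != "closed"]


if __name__ == "__main__":
    rows = parse_table(CATALOG)
    # Print proxies alphabetically with the number of private benchmarks each covers.
    for proxy, names in sorted(group_by_proxy(rows).items()):
        print(f"{proxy}: {len(names)} private benchmark(s)")
    print("Not fully closed:", ", ".join(non_closed(rows)))
```

Run over the full table instead of the four embedded rows, the same grouping shows at a glance which proxies, such as ChEMBL or the TDC ADMET Group, stand in for several private benchmarks at once.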