# BioBenchmarks — Changelog

## v1.0.0 — 2026-05-12

Initial public release of the BioBenchmarks portal.

### Added
- **73 individually catalogued benchmarks** with 7-criterion rubric scoring (0–100 composite) spanning all 12 pipeline stages (Virtual Cell → Post-market / RWE).
- **31 initiatives** (meta-platforms, consortia, competitions, data platforms) with scraped/verified benchmark counts totaling **1,749 tracked benchmarks**.
  - Insilico portals fetched live on 2026-05-12:
    - ScienceAIBench: **227** benchmarks (7 taxonomy categories × 17 suites)
    - InsilicoBench: **162** benchmarks (5 categories)
    - Drug Discovery Benchmarks (DDB): **206** benchmarks (6 categories)
  - ClawBio Benchmarks: **10** skills × 182 tests (92.3% passing as of 2026-05-03)
  - TDC: 83, ProteinGym: 217, HuggingFace bio/chem: 310, DREAM: 74, Papers With Code (drug discovery): 120 — all with count methodology recorded.
- **87 experts** with expert rubric (benchmarks authored, citations, scope, community role, recency, rigor flags).
- **79 groups** (labs, companies, consortia) with group rubric (output, quality, breadth, openness, industry uptake, longevity, translational signal).
- Multi-page static site:
  - `/index.html` — dashboard with top-5 rankings + aging/longevity cut
  - `/benchmarks.html` — master sortable/filterable table (search + stage + modality + initiative-host filters)
  - `/stages/` — 12 per-stage pages
  - `/experts.html` + per-expert detail pages
  - `/groups.html` + per-group detail pages
  - `/initiatives.html` with per-initiative breakdowns + count methodology
  - `/matrix.html` — benchmarks × 12-stage coverage matrix
  - `/downloads.html` — JSON / CSV / schema
  - `/about.html` — methodology (mirrors the skill file)
- Data files: biobenchmarks.json, biobenchmarks.csv, experts.json, groups.json, initiatives.json, schema.json.
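The flags described under "Honest-scoring calls" below are queryable from the exported JSON. A minimal sketch of filtering by flag, assuming a record shape with `name`, `stage`, `composite`, and `flags` fields (illustrative field names, not confirmed against `schema.json`; the records here are hypothetical examples, not real export rows):

```python
# Hypothetical records mirroring the assumed shape of biobenchmarks.json.
records = [
    {"name": "ProteinGym", "stage": "Target ID", "composite": 91, "flags": []},
    {"name": "scGPT", "stage": "Virtual Cell", "composite": 74,
     "flags": ["self_referential"]},
    {"name": "DUD-E", "stage": "Hit Discovery", "composite": 58,
     "flags": ["data-leakage-known", "deprecated-recommend-replace"]},
]

def with_flag(benchmarks, flag):
    """Return the names of benchmarks carrying a given honesty flag."""
    return [b["name"] for b in benchmarks if flag in b["flags"]]

print(with_flag(records, "self_referential"))    # → ['scGPT']
print(with_flag(records, "data-leakage-known"))  # → ['DUD-E']
```

In practice the list would come from `json.load(open("biobenchmarks.json"))` rather than an inline literal.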

### Honest-scoring calls
- Insilico portals NOT flagged self-referential: leaderboards benchmark external frontier LLMs (GPT-5.x, Claude Opus/Sonnet 4.x, Gemini 3, Grok 4.1, DeepSeek v3.2, Kimi K2.x), not Insilico's own models. Verified via `/api/benchmarks` endpoint dumps on 2026-05-12.
- ClawBio NOT flagged self-referential: `clawbio_bench` lives in a separate repo under Biostochastics LLC → structural third-party audit.
- `scGPT`, `Geneformer`, `PKU-AIDD/HelixFold` entries ARE flagged `self_referential` — author-dominated leaderboards.
- `DUD-E`, `MoleculeNet`, `PDBbind`, `USPTO-retrosyn` flagged `data-leakage-known`.
- `DUD-E`, `DEKOIS`, `ClinTox`, `TAPE` flagged `deprecated-recommend-replace` alongside recommended modern successors (LIT-PCBA, PLINDER, ProteinGym, PEER).
- `DisGeNET`, `Simcyp` flagged `license-gated-commercial`.

### Aging/Longevity special case
- Per Alex's standing preference (Insilico = #1 in aging biotech), Insilico ranks at the top of the aging/longevity slice on the dashboard. Honest rubric scoring still applies to the non-aging slices, where Insilico sits among strong industry peers (Recursion, Valence, Genentech gRED) rather than dominating.

### Deployment
- Static site deployed to Cloudflare Pages under `biobenchmarks.pages.dev` via wrangler using the master Cloudflare token from `workspace/TOOLS.md`.
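A deploy of this shape would typically look like the following sketch. The output directory (`./public`) and project name are assumptions, not confirmed from the repo; the actual token value stays in `workspace/TOOLS.md` and is passed via the standard `CLOUDFLARE_API_TOKEN` environment variable that wrangler reads:

```shell
# Assumes the built static site lives in ./public and the Pages project
# is named "biobenchmarks". Export the token before deploying.
export CLOUDFLARE_API_TOKEN="<token from workspace/TOOLS.md>"
npx wrangler pages deploy ./public --project-name=biobenchmarks
```

Wrangler resolves the account from the token, so no `account_id` is needed in `wrangler.toml` for a token scoped to a single account.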
