Protein Language Model Eval 2026

Consolidated evaluation harness for ESM, ProGen, xTrimoPGLM, and Evo: 37 zero-shot tasks spanning fitness, structure, and function.

Composite

100.0

Experimental validation

Wet-lab confirmed

Stages

Virtual Cell, Hit ID

Modalities

biologic

Task types

zero-shot-fitness, representation-quality

Size

tasks: 37
proteins: 9,800,000

License

MIT

First release

2026-02

Last updated

2026-04

Official site

→ project page

Leaderboard

→ leaderboard

Dataset

→ dataset

Code / GitHub

→ repository

HuggingFace

→ HF

Paper

Protein LM Eval 2026: a community harness · Rives A, Rao R, et al. · 2026 · paper · doi:10.1101/2026.02.10.550123 · 76 citations

Flags

none

Experts

—

Groups

—

Hosted by

—

Related benchmarks

ProteinGym, TAPE, PEER, FLIP

Rubric (7-criterion)

rigor

coverage

maintenance

adoption

quality

accessibility

industry_relevance

Notes

Collaboration between Meta FAIR and EvolutionaryScale; includes held-out targets with wet-lab fitness measurements.
