FGBench (Functional Group Molecular Property Reasoning)

A benchmark for molecular property reasoning at the functional-group (FG) level in large language models (LLMs). It contains 625,000 problems annotated with functional groups and their positions in the molecule, covers 245 functional groups, and includes both regression and classification tasks. It tests how well a model understands structure-property relationships.

Composite
77.0
Experimental validation
None
Stages
Hit ID, Lead ID / ADMET
Modalities
text, small-molecule
Task types
property_prediction, llm_evaluation, interpretability
Size
problems: 625,000
functional_groups: 245
License
MIT
First release
2025-08
Last updated
2026-04
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models · 2025 · paper · 5 citations
Flags
neurips_2025, llm_benchmark, interpretability
Related benchmarks
MoleculeNet, MoleculeACE

Rubric (7-criterion)

rigor
4
coverage
4
maintenance
3
adoption
3
quality
4
accessibility
5
industry_relevance
4
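
The page does not state how the composite score is derived from the rubric. As a rough sanity check only, the sketch below assumes each criterion is scored out of 5 and weighted equally (both are assumptions, not something this entry confirms) and expresses the rubric total as a percentage.

```python
# Rubric scores as listed above.
rubric = {
    "rigor": 4,
    "coverage": 4,
    "maintenance": 3,
    "adoption": 3,
    "quality": 4,
    "accessibility": 5,
    "industry_relevance": 4,
}

MAX_PER_CRITERION = 5  # assumption: 5-point scale (the highest score shown is 5)

total = sum(rubric.values())                    # 27
max_total = MAX_PER_CRITERION * len(rubric)     # 35
unweighted_pct = 100 * total / max_total        # about 77.14

print(f"{total}/{max_total} = {unweighted_pct:.1f}%")  # prints "27/35 = 77.1%"
```

Under these assumptions the unweighted result is about 77.1, close to but not identical to the listed composite of 77.0, so the site may apply weights or rounding that are not shown here.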

Notes

NeurIPS 2025 Datasets & Benchmarks Track paper. Its results show that LLMs struggle with FG-level property reasoning. The benchmark addresses the gap between molecule-level and substructure-level understanding, and it also provides a framework for generating new chemistry QA pairs.
