FGBench (Functional Group Molecular Property Reasoning)

Benchmark for evaluating molecular property reasoning in LLMs at the functional-group (FG) level. Its 625,000 problems are annotated with functional groups and their positions in each molecule, covering 245 functional groups across regression and classification tasks. It tests whether models understand how specific substructures relate to molecular properties.
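The card lists both regression and classification tasks. The sketch below shows one rough way a mixed task set could be scored: mean absolute error over regression items and accuracy over classification items. The `Problem` record, its field names, and the choice of MAE and accuracy are assumptions for illustration. FGBench's own metrics and data schema may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Union

Answer = Union[float, bool, str]

@dataclass
class Problem:
    """Hypothetical problem record; the real FGBench schema may differ."""
    task_type: str  # assumed values: "regression" or "classification"
    target: Answer

def score(problems: Sequence[Problem], predictions: Sequence[Answer]) -> Dict[str, Optional[float]]:
    """Mean absolute error on regression items, accuracy on classification items."""
    if len(problems) != len(predictions):
        raise ValueError("problems and predictions must have the same length")
    abs_errors = []
    hits = []
    for prob, pred in zip(problems, predictions):
        if prob.task_type == "regression":
            abs_errors.append(abs(float(pred) - float(prob.target)))
        elif prob.task_type == "classification":
            hits.append(pred == prob.target)
        else:
            raise ValueError(f"unknown task type: {prob.task_type!r}")
    # None signals that the set contained no items of that task type.
    return {
        "mae": sum(abs_errors) / len(abs_errors) if abs_errors else None,
        "accuracy": sum(hits) / len(hits) if hits else None,
    }

if __name__ == "__main__":
    problems = [
        Problem("regression", 1.5),
        Problem("regression", -0.5),
        Problem("classification", True),
        Problem("classification", False),
    ]
    predictions = [1.0, 0.0, True, True]
    print(score(problems, predictions))  # prints {'mae': 0.5, 'accuracy': 0.5}
```

Keeping the two task types in separate metrics avoids mixing a continuous error with a hit rate in a single number; any combined score would need an explicit, documented weighting.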

Composite

77.0

Experimental validation

None

Stages

Hit ID, Lead ID / ADMET

Modalities

text, small-molecule

Task types

property_prediction, llm_evaluation, interpretability

Size

problems: 625,000
functional_groups: 245

License

MIT

First release

2025-08

Last updated

2026-04

Official site

→ project page

Leaderboard

→ leaderboard

Dataset

→ dataset

Code / GitHub

→ repository

HuggingFace

→ HF

Paper

FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models · 2025 · paper · 5 citations

Flags

neurips_2025, llm_benchmark, interpretability

Experts

—

Groups

—

Hosted by

—

Related benchmarks

MoleculeNet, MoleculeACE

Rubric (7-criterion)

rigor

coverage

maintenance

adoption

quality

accessibility

industry_relevance

Notes

NeurIPS 2025 Datasets & Benchmarks Track. Results show that current LLMs struggle with FG-level property reasoning. The benchmark addresses the gap between molecular-level and substructure-level understanding, and provides a framework for generating new chemistry QA pairs.
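To make the QA-generation idea concrete, here is a minimal template-based sketch. It assumes a hypothetical `FGEdit` record pairing two molecules that differ by one functional group, together with that group's atom positions and a property difference. The record layout, question wording, and `make_qa_pair` helper are all hypothetical and are not FGBench's actual generation pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass
class FGEdit:
    """Hypothetical record: two molecules that differ by a single functional group."""
    base_smiles: str
    edited_smiles: str
    functional_group: str
    atom_indices: Tuple[int, ...]  # assumed: atom positions of the FG in edited_smiles
    property_name: str
    delta: float  # property(edited) - property(base)

def make_qa_pair(edit: FGEdit) -> Dict[str, Union[str, float]]:
    """Turn one FG edit into a question plus regression and classification answers."""
    question = (
        f"Molecule A is {edit.base_smiles}. Molecule B is {edit.edited_smiles}, "
        f"which adds a {edit.functional_group} group at atom indices "
        f"{list(edit.atom_indices)}. How does {edit.property_name} change from A to B?"
    )
    if edit.delta > 0:
        label = "increase"
    elif edit.delta < 0:
        label = "decrease"
    else:
        label = "no change"
    # One edit yields a regression target (the delta) and a classification target (the direction).
    return {"question": question, "answer_value": edit.delta, "answer_label": label}

if __name__ == "__main__":
    example = FGEdit(
        base_smiles="c1ccccc1",     # benzene
        edited_smiles="Oc1ccccc1",  # phenol
        functional_group="hydroxyl",
        atom_indices=(0,),          # the O atom is atom 0 in this SMILES string
        property_name="logP",
        delta=-0.7,                 # placeholder value for illustration, not from FGBench
    )
    print(make_qa_pair(example))
```

Deriving both answer forms from one record is a design choice of this sketch; it mirrors the card's mention of regression and classification tasks but is not a claim about how FGBench builds its items.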
