OmniDocBench v1.5

Built by OpenDataLab. The dataset contains 1,355 pages drawn from papers, books, slides, exams, newspapers, and magazines. It scores text extraction via edit distance, formula recognition via CDM, table structure via TEDS, and reading order via edit distance. Overall = ((1 − Text Edit) × 100 + Table TEDS + Formula CDM) / 3.

Models Evaluated: 18
Dataset Size: 1,355 pages
Metrics: 5
Source: GitHub

Overall Score = ((1 − Text Edit) × 100 + Table TEDS + Formula CDM) / 3
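As a quick check of the formula, the following minimal Python sketch recomputes the overall score from the three component metrics. The function name `overall_score` and its parameter names are illustrative, not part of any benchmark tooling; the input values are taken from the Gemini-3-Flash row of the rankings.

```python
def overall_score(text_edit: float, table_teds: float, formula_cdm: float) -> float:
    """Overall = ((1 - Text Edit) * 100 + Table TEDS + Formula CDM) / 3.

    text_edit is on a 0-1 scale (lower is better); table_teds and
    formula_cdm are on a 0-100 scale (higher is better).
    """
    return ((1.0 - text_edit) * 100.0 + table_teds + formula_cdm) / 3.0


# Gemini-3-Flash: Text Edit 0.077, TEDS 87.7, CDM 90.2
score = overall_score(text_edit=0.077, table_teds=87.7, formula_cdm=90.2)
print(round(score, 1))  # 90.1
```

Note that TEDS-S and Read Order do not enter the overall score; they are reported alongside it.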

Rankings

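| # | Model | Organization | Overall | Text Edit↓ | CDM↑ | TEDS↑ | TEDS-S↑ | Read Order↓ |
|---|-------|--------------|---------|------------|------|-------|---------|-------------|
| 1 | Gemini-3-Flash | Google | 90.1 | 0.077 | 90.2 | 87.7 | 92.6 | 0.081 |
| 2 | Nanonets OCR2+ | Nanonets | 89.5 | 0.056 | 90.3 | 79.1 | 83.6 | 0.090 |
| 3 | Gemini-3-Pro | Google | 88.8 | 0.078 | 87.3 | 87.0 | 91.7 | 0.084 |
| 4 | GPT-5.2 | OpenAI | 88.0 | 0.111 | 90.1 | 84.9 | 89.5 | 0.098 |
| 5 | Claude Sonnet 4.6 | Anthropic | 86.9 | 0.165 | 90.2 | 87.1 | 91.2 | 0.149 |
| 6 | Claude Opus 4.6 | Anthropic | 85.9 | 0.151 | 88.5 | 84.4 | 89.1 | 0.136 |
| 7 | Datalab Marker | Datalab | 85.5 | 0.109 | 88.3 | 79.1 | 83.7 | 0.106 |
| 8 | Gemini 3.1 Pro | Google | 85.3 | 0.082 | 83.3 | 80.8 | 85.4 | 0.073 |
| 9 | GPT-5.4 | OpenAI | 85.3 | 0.089 | 83.4 | 81.3 | 86.7 | 0.077 |
| 10 | GPT-5-Mini | OpenAI | 82.5 | 0.138 | 86.7 | 74.6 | 80.1 | 0.121 |
| 11 | GPT-4.1 | OpenAI | 79.9 | 0.167 | 82.2 | 74.0 | 83.8 | 0.115 |
| 12 | Claude Haiku 4.5 | Anthropic | 79.6 | 0.224 | 84.2 | 77.1 | 83.8 | 0.178 |
| 13 | Ministral-8B | Mistral AI | 78.3 | 0.157 | 83.3 | 67.1 | 73.8 | 0.125 |
| 14 | GLM-OCR | Zhipu AI | 69.2 | 0.144 | 84.7 | 37.4 | 39.3 | 0.141 |
| 15 | GPT-5-Nano | OpenAI | 63.4 | 0.319 | 61.0 | 61.2 | 69.5 | 0.243 |
| 16 | Gemma-3-12B-IT | Google | 44.6 | 0.476 | 50.0 | 31.6 | 46.9 | 0.364 |
| 17 | Llama-3.2-Vision-11B | Meta | 44.6 | 0.541 | 55.4 | 32.6 | 42.9 | 0.340 |
| 18 | Pixtral-12B | Mistral AI | 42.3 | 0.641 | 58.8 | 32.1 | 50.8 | 0.422 |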

Metrics

Text Edit (lower is better)

Character-level edit distance between predicted and ground-truth text blocks. Lower values indicate more accurate text extraction.
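To make the idea concrete, here is a minimal Python sketch of a normalized character-level edit distance. It uses plain Levenshtein distance and divides by the length of the longer string; that normalization, and the function names `levenshtein` and `normalized_edit_distance`, are assumptions for illustration and may differ from the benchmark's own evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, truth: str) -> float:
    """Edit distance scaled to [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(pred), len(truth))
    return 0.0 if longest == 0 else levenshtein(pred, truth) / longest


print(normalized_edit_distance("abcd", "abcf"))  # 0.25
```

A score of 0 means the predicted text matches the ground truth exactly, and a score of 1 means no characters could be reused.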

CDM (higher is better)

Character Detection Matching score for display formulas. Measures structural and symbolic accuracy of recognized mathematical expressions.

TEDS (higher is better)

Tree Edit Distance-based Similarity for tables. Evaluates both content and structure of extracted tables.
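For reference, TEDS is usually defined over the tree representations of the two tables. The form below is the commonly cited definition; that the benchmark's implementation matches it in every detail is an assumption. The symbols $T_a$ and $T_b$ stand for the predicted and ground-truth table trees, $|T|$ for the number of nodes in a tree, and $\mathrm{EditDist}$ for the tree edit distance between them.

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

The definition yields a value between 0 and 1; the rankings appear to report it multiplied by 100.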

TEDS-S (higher is better)

Structure-only TEDS that ignores cell content. Focuses purely on table layout and cell spanning.

Read Order (lower is better)

Edit distance measuring how well the model preserves the correct reading order across multi-column and complex layouts.
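One way to picture this metric is as an edit distance over sequences of block identifiers rather than characters. The Python sketch below assumes each layout block has an integer ID and that the distance is normalized by the longer sequence; both choices, and the function name `order_edit_distance`, are illustrative assumptions rather than the benchmark's actual procedure.

```python
from typing import Hashable, Sequence


def order_edit_distance(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Normalized Levenshtein distance between two block orderings."""
    if not pred and not truth:
        return 0.0
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            curr.append(min(
                prev[j] + 1,             # drop a predicted block
                curr[j - 1] + 1,         # insert a missing block
                prev[j - 1] + (p != t),  # keep if equal, else substitute
            ))
        prev = curr
    return prev[-1] / max(len(pred), len(truth))


# Two-column page: blocks 0 and 1 sit in the left column, 2 and 3 in the right.
truth = [0, 1, 2, 3]
row_wise = [0, 2, 1, 3]  # a model that reads straight across both columns
print(order_edit_distance(row_wise, truth))  # 0.5
```

In this toy case, reading across the columns instead of down them misplaces two of the four blocks, giving a distance of 0.5.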