OlmOCR Bench

v1.0

7,010 pass/fail tests built by Allen AI. A model takes a page image as input and returns structured text. Output is scored on five axes: whether it renders math correctly, preserves table layout, picks up headers and captions, suppresses watermarks and page numbers, and maintains reading order. Sources include arXiv papers, old scans, financial filings, and multi-column layouts.

Models Evaluated

18

Dataset Size

1,403 pages · 7,010 tests

Metrics

5

Source

View on GitHub

Overall Score = average of the Math, Table, Present, Absent, Order, and Base scores. The Base score is not listed as a column in the rankings.
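As a minimal sketch of the formula above, the snippet below takes the unweighted mean of the six category scores. The function name `overall_score`, the rounding to one decimal place (chosen to match how scores are displayed), and the example values are all assumptions for illustration. The example values are made up and are not leaderboard data, since the Base score is not published on this page.

```python
from statistics import mean
from typing import Mapping

# Category names as they appear in the Overall Score formula.
CATEGORIES = ("Math", "Table", "Present", "Absent", "Order", "Base")


def overall_score(scores: Mapping[str, float]) -> float:
    """Unweighted mean of the six category scores, rounded to one decimal.

    Rounding to one decimal is an assumption based on the displayed precision.
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise KeyError(f"missing category scores: {missing}")
    return round(mean(scores[c] for c in CATEGORIES), 1)


# Made-up illustrative values, NOT taken from the leaderboard.
example = {
    "Math": 80.0,
    "Table": 90.0,
    "Present": 70.0,
    "Absent": 60.0,
    "Order": 75.0,
    "Base": 99.0,
}
print(overall_score(example))  # prints 79.0
```

Because Base enters the average but is not shown, averaging only the five visible columns will generally not reproduce the published Overall value.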

Rankings

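| # | Model | Organization | Overall | Math | Table | Present | Absent | Order |
|---|-------|--------------|---------|------|-------|---------|--------|-------|
| 1 | Nanonets OCR2+ | Nanonets | 82.2 | 82.5 | 86.3 | 71.6 | 93.1 | 76.1 |
| 2 | Datalab Marker | Datalab | 82.1 | 84.2 | 83.3 | 75.5 | 88.6 | 73.5 |
| 3 | Gemini 3.1 Pro | Google | 74.6 | 82.3 | 89.8 | 70.5 | 43.3 | 65.0 |
| 4 | Claude Sonnet 4.6 | Anthropic | 74.4 | 87.1 | 86.0 | 47.0 | 41.8 | 66.6 |
| 5 | Claude Opus 4.6 | Anthropic | 73.9 | 86.1 | 84.5 | 49.1 | 39.9 | 67.7 |
| 6 | Gemini-3-Pro | Google | 73.5 | 80.1 | 85.2 | 73.4 | 30.5 | 73.8 |
| 7 | GPT-5.4 | OpenAI | 73.4 | 83.1 | 91.1 | 66.9 | 25.2 | 74.7 |
| 8 | GPT-5.2 | OpenAI | 72.2 | 79.3 | 86.8 | 68.9 | 27.1 | 72.7 |
| 9 | Gemini-3-Flash | Google | 69.2 | 79.2 | 64.5 | 75.7 | 27.8 | 69.0 |
| 10 | GLM-OCR | Zhipu AI | 66.7 | 68.4 | 58.9 | 36.1 | 89.1 | 71.9 |
| 11 | Ministral-8B | Mistral AI | 57.8 | 50.8 | 79.2 | 42.0 | 55.7 | 71.4 |
| 12 | GPT-5-Mini | OpenAI | 56.7 | 51.5 | 70.6 | 70.2 | 26.1 | 74.4 |
| 13 | Claude Haiku 4.5 | Anthropic | 56.2 | 58.7 | 83.3 | 25.5 | 42.0 | 53.4 |
| 14 | GPT-4.1 | OpenAI | 55.5 | 60.0 | 59.1 | 47.3 | 34.9 | 59.4 |
| 15 | Llama-3.2-Vision-11B | Meta | 47.2 | 39.4 | 60.8 | 30.5 | 69.9 | 51.9 |
| 16 | Pixtral-12B | Mistral AI | 36.8 | 32.8 | 46.7 | 15.4 | 74.5 | 25.1 |
| 17 | GPT-5-Nano | OpenAI | 22.8 | 2.4 | 55.1 | 13.0 | 43.1 | 47.0 |
| 18 | Gemma-3-12B-IT | Google | 20.6 | 10.0 | 34.7 | 14.6 | 67.9 | 7.7 |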

Metrics

Math (higher is better)

Accuracy of mathematical equation rendering and LaTeX output.

Table (higher is better)

Fidelity of table structure and content preservation.

Present (higher is better)

Correct inclusion of visible document elements like headers and captions.

Absent (higher is better)

Correct suppression of artifacts like watermarks and page numbers.

Order (higher is better)

Accuracy of multi-column and complex layout reading order.
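
The page does not spell out how a set of pass/fail tests becomes a 0 to 100 score. A minimal sketch, assuming each metric is simply the percentage of that category's tests that pass, is shown below. The function name `category_pass_rates`, the `(category, passed)` input shape, and the toy results are hypothetical and are not taken from the benchmark's own tooling.

```python
from collections import defaultdict


def category_pass_rates(results):
    """Return {category: percent of tests passed}, rounded to one decimal.

    `results` is an iterable of (category, passed) pairs. This input shape
    is an assumption made for illustration only.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for category, ok in results:
        total[category] += 1
        passed[category] += bool(ok)  # True counts as 1, False as 0
    return {c: round(100.0 * passed[c] / total[c], 1) for c in total}


# Hypothetical toy results, NOT real benchmark output.
toy = [
    ("Math", True),
    ("Math", False),
    ("Table", True),
    ("Table", True),
    ("Order", False),
]
print(category_pass_rates(toy))
# prints {'Math': 50.0, 'Table': 100.0, 'Order': 0.0}
```

Under this assumption, a low Absent score would mean the model often reproduced watermarks or page numbers that the tests expected it to leave out.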