OlmOCR Bench

v1.0

7,010 pass/fail tests built by the Allen Institute for AI (Ai2). A model is fed a page image and must return structured text. Tests check five things: whether the output renders math correctly, preserves table layout, picks up headers and captions, suppresses watermarks and page numbers, and maintains reading order. Source documents include arXiv papers, old scans, financial filings, and multi-column layouts.
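To make the pass/fail idea concrete, here is a minimal sketch of two such checks: one for suppressed text (a page number) and one for reading order. The function names, the exact-substring matching, and the sample output are all assumptions made for illustration; the benchmark's real test harness may match text differently (for example, with fuzzy matching).

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not affect matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def text_absent(output: str, unwanted: str) -> bool:
    """Pass if unwanted text (e.g. a page number or watermark) is not in the output."""
    return normalize(unwanted) not in normalize(output)


def in_reading_order(output: str, first: str, second: str) -> bool:
    """Pass if both snippets occur and `first` appears before `second`."""
    out = normalize(output)
    i = out.find(normalize(first))
    j = out.find(normalize(second))
    return i != -1 and j != -1 and i < j


# Hypothetical model output for one two-column page.
sample_output = """# Results
The left column ends here.

The right column starts here."""

checks = [
    text_absent(sample_output, "Page 7 of 12"),
    in_reading_order(sample_output, "left column ends", "right column starts"),
]
pass_rate = 100 * sum(checks) / len(checks)  # percentage of checks passed
```

Each check returns a plain boolean, so a score for a page or a category is just the share of checks that pass.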

Models Evaluated

29

Dataset Size

1,403 pages · 7,010 tests

Overall Score = macro average (unweighted mean) across 8 dataset categories; the rankings table shows seven of them.
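A macro average gives every category the same weight regardless of how many tests it contains. The sketch below contrasts it with a micro average, which pools all tests first. The category names and pass counts are made up for illustration and are not taken from the benchmark.

```python
from statistics import mean


def macro_average(pass_rates: dict[str, float]) -> float:
    """Unweighted mean of per-category pass rates (each category counts once)."""
    return mean(pass_rates.values())


def micro_average(counts: dict[str, tuple[int, int]]) -> float:
    """Pass rate over all tests pooled together (large categories dominate)."""
    passed = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return 100 * passed / total


# Hypothetical (passed, total) counts for two categories of different size.
counts = {"tables": (90, 100), "old_scans": (5, 10)}
pass_rates = {name: 100 * p / t for name, (p, t) in counts.items()}

macro = macro_average(pass_rates)  # (90.0 + 50.0) / 2
micro = micro_average(counts)      # 100 * 95 / 110
```

With these numbers the macro average is 70.0 while the micro average is about 86.4, so a weak score on a small category pulls the macro average down much more.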

Rankings

#	Model	Vendor	Overall	ArXiv Math	H&F	Long/Tiny	Multi-Col	Old Scans	Scans Math	Tables
1	Nanonets OCR-3	Nanonets	87.4	89.2	96.6	93.4	87.6	49.6	88.9	94.2
2	Datalab Marker	Datalab	83.2	83.8	88.6	90.3	80.9	51.9	86.7	83.4
3	Nanonets OCR2+	Nanonets	82.0	83.8	96.8	89.4	83.9	42.4	74.5	86.3
4	GPT-5.4	OpenAI	81.0	83.1	20.1	82.6	83.7	43.9	82.3	91.1
5	GPT-5.2	OpenAI	79.1	78.4	26.2	83.3	81.0	40.9	84.5	86.8
6	Qwen3-VL-Plus	Alibaba	77.9	88.3	34.6	88.0	86.7	51.0	88.0	86.6
7	Gemini-3-Pro	Google	77.7	87.1	26.7	90.5	77.3	43.7	75.1	73.6
8	Qwen3.5-9B	Alibaba	77.2	86.0	48.8	84.8	83.4	46.8	82.5	86.3
9	Qwen3-VL-235B	Alibaba	76.8	88.4	33.6	88.9	85.9	49.6	81.2	86.7
10	Qwen3.5-4B	Alibaba	75.4	86.7	47.2	83.9	79.2	41.1	81.9	85.0
11	Gemini-3-Flash	Google	75.3	80.1	27.4	90.3	75.3	45.8	73.6	64.6
12	Claude Opus 4.6	Anthropic	74.1	86.6	35.5	73.8	79.5	20.5	83.0	84.5
13	Claude Sonnet 4.6	Anthropic	73.9	87.5	37.4	71.3	78.5	19.8	84.3	86.0
14	Qwen3.5-2B	Alibaba	71.9	82.1	49.6	77.1	74.7	38.0	75.3	80.7
15	GPT-5-Mini	OpenAI	69.9	50.1	25.5	84.6	82.9	41.1	60.5	70.6
16	Gemini 3.1 Pro	Google	69.8	74.4	33.2	84.6	72.7	42.2	72.5	45.1
17	Mistral Small 4	Mistral AI	69.6	64.2	41.3	71.3	82.4	36.5	77.5	84.0
18	GLM-OCR	Zhipu AI	68.4	67.3	89.2	35.7	80.3	41.3	75.5	59.0
19	Qwen3.5-0.8B	Alibaba	64.8	73.8	62.2	62.0	70.0	31.6	60.3	63.8
20	Claude Haiku 4.5	Anthropic	61.2	55.4	37.6	36.2	62.8	19.6	79.5	83.4
21	Qwen-VL-OCR	Alibaba	59.0	66.8	24.7	74.7	75.1	37.3	50.7	43.0
22	Ministral-8B	Mistral AI	58.7	48.9	53.0	56.6	82.4	27.9	62.9	79.3
23	GPT-4.1	OpenAI	49.4	58.2	32.1	53.4	66.3	37.6	71.4	59.2
24	Llama-3.2-Vision-11B	Meta	49.1	37.2	68.3	34.2	59.5	29.8	53.7	60.9
25	Gemma-4-E4B-it	Google	47.0	20.4	48.4	26.0	37.1	28.3	49.8	66.9
26	Pixtral-12B	Mistral AI	38.3	30.0	72.8	12.2	26.8	29.1	50.7	46.8
27	Gemma-4-E2B-it	Google	38.2	11.2	55.1	10.0	13.8	23.4	30.8	53.4
28	GPT-5-Nano	OpenAI	35.2	2.5	41.6	5.0	52.1	29.7	1.7	55.2
29	Gemma-3-12B-IT	Google	20.6	0.0	0.0	0.0	0.0	0.0	0.0	0.0
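Because the overall ranking hides large per-category differences, it can help to look up the leader in each column. The sketch below does that for the top four rows only, copied from the table above with the rank column dropped; the `leader` helper is a hypothetical name, and the results describe this four-row subset, not the full leaderboard.

```python
import csv
import io

# Top four rows of the rankings table, tab-separated (subset for illustration).
TSV = (
    "Model\tOverall\tArXiv Math\tH&F\tLong/Tiny\tMulti-Col\tOld Scans\tScans Math\tTables\n"
    "Nanonets OCR-3\t87.4\t89.2\t96.6\t93.4\t87.6\t49.6\t88.9\t94.2\n"
    "Datalab Marker\t83.2\t83.8\t88.6\t90.3\t80.9\t51.9\t86.7\t83.4\n"
    "Nanonets OCR2+\t82.0\t83.8\t96.8\t89.4\t83.9\t42.4\t74.5\t86.3\n"
    "GPT-5.4\t81.0\t83.1\t20.1\t82.6\t83.7\t43.9\t82.3\t91.1\n"
)

rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))


def leader(rows: list[dict[str, str]], column: str) -> tuple[str, float]:
    """Return (model, score) for the highest score in the given column."""
    best = max(rows, key=lambda r: float(r[column]))
    return best["Model"], float(best[column])


# Leader per score column within this subset.
leaders = {col: leader(rows, col) for col in rows[0] if col != "Model"}

for col, (model, score) in leaders.items():
    print(f"{col}: {model} ({score})")
```

Within this subset, the model ranked first overall does not lead every column: a different model tops H&F and another tops Old Scans.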

Metrics

Higher is better for every metric.

ArXiv Math · H&F (headers and footers) · Long/Tiny (long, tiny text) · Multi-Col (multi-column) · Old Scans · Scans Math · Tables