OlmOCR Bench
v1.0
7,010 pass/fail tests built by Allen AI. Feed a page image in, get structured text out. Scored on five axes: does it render math correctly, preserve table layout, pick up headers and captions, suppress watermarks and page numbers, and maintain reading order. Sources include arXiv papers, old scans, financial filings, and multi-column layouts.
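Overall Score = average of the Math, Table, Present, Absent, Order, and Base scores. Base has no column of its own in the rankings, so Overall is not simply the mean of the five columns shown.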
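As a rough illustration of that formula, the sketch below averages six category scores and also backs out the Base value that a published Overall would imply. It assumes Overall is a plain unweighted mean, which is how the formula reads but is not confirmed beyond that. The function names are hypothetical, and because published numbers are rounded to one decimal, the implied Base is only approximate (off by up to a few tenths).

```python
from statistics import mean


def overall_score(math, table, present, absent, order, base):
    """Unweighted mean of the six category scores (assumed from the stated formula)."""
    return mean([math, table, present, absent, order, base])


def implied_base(overall, math, table, present, absent, order):
    """Back out the Base score implied by a published Overall.

    Inputs are rounded to one decimal, so the result is approximate.
    """
    return 6 * overall - (math + table + present + absent + order)


# Category scores from the rank 1 row of the rankings table.
row = dict(math=82.5, table=86.3, present=71.6, absent=93.1, order=76.1)

base = implied_base(82.2, **row)
print(round(base, 1))                              # 83.6
print(round(overall_score(base=base, **row), 1))   # 82.2
```

The mean of the five visible columns in that row is about 81.9, not 82.2, which is consistent with a sixth score being part of the average.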
Rankings
| # | Model | Developer | Overall | Math | Table | Present | Absent | Order |
|---|---|---|---|---|---|---|---|---|
| 1 | Nanonets OCR2+ | Nanonets | 82.2 | 82.5 | 86.3 | 71.6 | 93.1 | 76.1 |
| 2 | Datalab Marker | Datalab | 82.1 | 84.2 | 83.3 | 75.5 | 88.6 | 73.5 |
| 3 | Gemini 3.1 Pro | Google | 74.6 | 82.3 | 89.8 | 70.5 | 43.3 | 65.0 |
| 4 | Claude Sonnet 4.6 | Anthropic | 74.4 | 87.1 | 86.0 | 47.0 | 41.8 | 66.6 |
| 5 | Claude Opus 4.6 | Anthropic | 73.9 | 86.1 | 84.5 | 49.1 | 39.9 | 67.7 |
| 6 | Gemini-3-Pro | Google | 73.5 | 80.1 | 85.2 | 73.4 | 30.5 | 73.8 |
| 7 | GPT-5.4 | OpenAI | 73.4 | 83.1 | 91.1 | 66.9 | 25.2 | 74.7 |
| 8 | GPT-5.2 | OpenAI | 72.2 | 79.3 | 86.8 | 68.9 | 27.1 | 72.7 |
| 9 | Gemini-3-Flash | Google | 69.2 | 79.2 | 64.5 | 75.7 | 27.8 | 69.0 |
| 10 | GLM-OCR | Zhipu AI | 66.7 | 68.4 | 58.9 | 36.1 | 89.1 | 71.9 |
| 11 | Ministral-8B | Mistral AI | 57.8 | 50.8 | 79.2 | 42.0 | 55.7 | 71.4 |
| 12 | GPT-5-Mini | OpenAI | 56.7 | 51.5 | 70.6 | 70.2 | 26.1 | 74.4 |
| 13 | Claude Haiku 4.5 | Anthropic | 56.2 | 58.7 | 83.3 | 25.5 | 42.0 | 53.4 |
| 14 | GPT-4.1 | OpenAI | 55.5 | 60.0 | 59.1 | 47.3 | 34.9 | 59.4 |
| 15 | Llama-3.2-Vision-11B | Meta | 47.2 | 39.4 | 60.8 | 30.5 | 69.9 | 51.9 |
| 16 | Pixtral-12B | Mistral AI | 36.8 | 32.8 | 46.7 | 15.4 | 74.5 | 25.1 |
| 17 | GPT-5-Nano | OpenAI | 22.8 | 2.4 | 55.1 | 13.0 | 43.1 | 47.0 |
| 18 | Gemma-3-12B-IT | Google | 20.6 | 10.0 | 34.7 | 14.6 | 67.9 | 7.7 |
Metrics
Math (higher is better)
Accuracy of mathematical equation rendering and LaTeX output.
Table (higher is better)
Fidelity of table structure and content preservation.
Present (higher is better)
Correct inclusion of visible document elements like headers and captions.
Absent (higher is better)
Correct suppression of artifacts like watermarks and page numbers.
Order (higher is better)
Accuracy of multi-column and complex layout reading order.
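To make the pass/fail idea concrete, here is a minimal sketch of what Present, Absent, and Order checks could look like. It is not the benchmark's actual implementation: the function names are hypothetical, exact substring matching is an assumption (a real harness would likely need some tolerance for OCR noise), and treating a score as the percentage of passing tests is also an assumption.

```python
def check_present(output: str, text: str) -> bool:
    """Pass if text that is visible on the page appears in the output."""
    return text in output


def check_absent(output: str, text: str) -> bool:
    """Pass if an artifact (watermark, page number) was left out of the output."""
    return text not in output


def check_order(output: str, before: str, after: str) -> bool:
    """Pass if both snippets appear and `before` comes first."""
    i, j = output.find(before), output.find(after)
    return i != -1 and j != -1 and i < j


def pass_rate(results: list[bool]) -> float:
    """Percentage of passing tests; 0.0 for an empty list."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Made-up model output for a two-column page with a figure caption.
sample = "Figure 1: Loss curve\nLeft column text.\nRight column text."

results = [
    check_present(sample, "Figure 1: Loss curve"),
    check_absent(sample, "Page 3 of 12"),
    check_order(sample, "Left column", "Right column"),
    check_order(sample, "Right column", "Left column"),
]
print(results)             # [True, True, True, False]
print(pass_rate(results))  # 75.0
```

Under this framing a model can score well on Present and poorly on Absent at the same time: copying everything on the page passes every inclusion check while failing the suppression checks.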