Intelligent Document Processing Leaderboard
Comprehensive document AI leaderboard comparing the best models across OCR, table extraction, key information extraction, and visual QA. Compare performance, accuracy, and cost.
3 Benchmarks · 24 Models · Open evaluation
| # | Model | Provider | Overall | OlmOCR | OmniDoc | IDP |
|---|---|---|---|---|---|---|
| 1 | Nanonets OCR-3 | Nanonets | 85.9 | 87.4 | 90.0 | 80.2 |
| 2 | Nanonets OCR2+ | Nanonets | 81.8 | 82.0 | 89.5 | 73.8 |
| 3 | GPT-5.4 | OpenAI | 81.0 | 73.4 | 85.3 | 84.4 |
| 4 | Qwen3-VL-Plus | Alibaba | 80.1 | 77.9 | 82.5 | 79.8 |
| 5 | Qwen3-VL-235B | Alibaba | 79.6 | 76.8 | 81.9 | 80.0 |
| 6 | Gemini-3-Pro | Google | 79.4 | 67.7 | 88.8 | 81.8 |
| 7 | Claude Sonnet 4.6 | Anthropic | 79.1 | 69.3 | 86.9 | 81.2 |
| 8 | Claude Opus 4.6 | Anthropic | 78.8 | 69.3 | 85.9 | 81.1 |
| 9 | Gemini-3-Flash | Google | 78.6 | 65.3 | 90.1 | 80.5 |
| 10 | Gemini 3.1 Pro | Google | 78.5 | 60.7 | 85.3 | 89.6 |
| 11 | GPT-5.2 | OpenAI | 78.0 | 68.7 | 88.0 | 77.4 |
| 12 | Qwen3.5-9B | Alibaba | 76.7 | 77.2 | 76.7 | 76.2 |
| 13 | Qwen3.5-4B | Alibaba | 72.5 | 75.4 | 67.6 | 74.5 |
| 14 | GPT-5-Mini | OpenAI | 71.7 | 59.3 | 82.5 | 73.3 |
| 15 | Mistral Small 4 | Mistral AI | 71.5 | 69.6 | 76.4 | 68.5 |
| 16 | Claude Haiku 4.5 | Anthropic | 70.2 | 58.2 | 79.6 | 72.9 |
| 17 | Ministral-8B | Mistral AI | 69.5 | 58.7 | 78.3 | 71.7 |
| 18 | GPT-4.1 | OpenAI | 69.5 | 54.0 | 79.9 | 74.7 |
| 19 | GLM-OCR | Zhipu AI | 64.2 | 68.4 | 69.2 | 54.9 |
| 20 | Qwen3.5-2B | Alibaba | 62.6 | 71.9 | 48.7 | 67.1 |
| 21 | Qwen3.5-0.8B | Alibaba | 57.8 | 64.8 | 47.3 | 61.2 |
| 22 | GPT-5-Nano | OpenAI | 52.0 | 26.8 | 63.4 | 65.8 |
| 23 | Llama-3.2-Vision-11B | Meta | 50.8 | 49.1 | 44.6 | 58.6 |
| 24 | Pixtral-12B | Mistral AI | 46.5 | 38.3 | 42.3 | 59.0 |
About the Leaderboard
The Intelligent Document Processing (IDP) Leaderboard provides a comprehensive evaluation framework for assessing the capabilities of AI models on document understanding and processing tasks. Each model's overall score is the unweighted mean of its three benchmark scores (OlmOCR, OmniDoc, and IDP).
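The scoring rule above can be sketched as a one-line computation. This is an illustrative snippet (the function name and the one-decimal rounding convention are assumptions inferred from the table's formatting, not an official scoring script):

```python
def overall_score(olmocr: float, omnidoc: float, idp: float) -> float:
    """Unweighted mean of the three benchmark scores, shown to one decimal."""
    return round((olmocr + omnidoc + idp) / 3, 1)

# Example using the top-ranked entry from the table:
# Nanonets OCR-3 scores 87.4 (OlmOCR), 90.0 (OmniDoc), 80.2 (IDP)
print(overall_score(87.4, 90.0, 80.2))  # 85.9, matching its Overall column
```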