A unified leaderboard for OCR, KIE, classification, QA, table extraction, and confidence score evaluation
The Intelligent Document Processing (IDP) Leaderboard evaluates AI models on document understanding and processing tasks. It covers seven aspects of document intelligence: OCR, key information extraction (KIE), classification, question answering (QA), table extraction, confidence score evaluation, and long document processing.
This benchmark is part of the Intelligent Document Processing (IDP) Leaderboard and assesses how well different models understand long documents. For results on the other document understanding tasks, please visit the full leaderboard.
Long Document Processing assesses a model's ability to process and understand lengthy documents. This includes maintaining context across multiple pages, understanding document structure, and accurately retrieving information from large documents.
Rank | Model | Avg | Nanonets-LongDocBench |
---|---|---|---|
1 | claude-3.7-sonnet (reasoning: low) | 75.93 | 75.93 |
2 | qwen2.5-vl-32b-instruct | 75.62 | 75.62 |
3 | gemma-3-27b-it | 72.95 | 72.95 |
4 | gemini-2.5-flash-preview-04-17 | 69.08 | 69.08 |
5 | InternVL3-38B-Instruct | 68.30 | 68.30 |
6 | gpt-4o-2024-08-06 | 66.90 | 66.90 |
7 | gemini-2.5-pro-preview-03-25 (reasoning: low) | 66.69 | 66.69 |
8 | o4-mini-2025-04-16 | 66.13 | 66.13 |
9 | gpt-4.1-2025-04-14 | 66.00 | 66.00 |
10 | gpt-4o-2024-11-20 | 63.95 | 63.95 |
11 | gemini-2.0-flash | 56.01 | 56.01 |
12 | gpt-4o-mini-2024-07-18 | 55.48 | 55.48 |
13 | claude-sonnet-4 | 40.06 | 40.06 |
14 | qwen2.5-vl-72b-instruct | 37.47 | 37.47 |
15 | mistral-small-3.1-24b-instruct | 29.23 | 29.23 |
16 | gpt-4.1-nano-2025-04-14 | 27.89 | 27.89 |
17 | llama-4-maverick (400B-A17B) | 27.74 | 27.74 |
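
The page does not say how the Avg column is derived. The sketch below assumes it is the unweighted mean of a model's per-dataset scores, which would explain why it equals the Nanonets-LongDocBench column when that is the only dataset. The function name `rank_models` and the dictionary layout are illustrative, not part of the leaderboard's tooling; the scores are copied from three rows of the table.

```python
from statistics import mean

# Per-dataset scores copied from three rows of the table above,
# deliberately listed out of order to show the sort.
scores = {
    "gemma-3-27b-it": {"Nanonets-LongDocBench": 72.95},
    "gemini-2.5-flash-preview-04-17": {"Nanonets-LongDocBench": 69.08},
    "qwen2.5-vl-32b-instruct": {"Nanonets-LongDocBench": 75.62},
}


def rank_models(scores):
    """Return (rank, model, avg) tuples sorted by descending average.

    Assumption: Avg is the unweighted mean over datasets, rounded to 2 places.
    """
    averages = {
        model: round(mean(list(per_dataset.values())), 2)
        for model, per_dataset in scores.items()
    }
    ordered = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return [(rank, model, avg) for rank, (model, avg) in enumerate(ordered, start=1)]


leaderboard = rank_models(scores)
for rank, model, avg in leaderboard:
    print(f"{rank} | {model} | {avg:.2f}")
```

If more long-document datasets were added, each model's inner dictionary would gain an entry and the average would no longer match any single column.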