A unified leaderboard for OCR, key information extraction (KIE), classification, QA, table extraction, and confidence score evaluation
The Intelligent Document Processing (IDP) Leaderboard provides a comprehensive evaluation framework for assessing the capabilities of various AI models in document understanding and processing tasks, covering seven critical aspects of document intelligence.
This benchmark is part of the IDP Leaderboard and assesses model performance on the visual question answering task. For a comprehensive evaluation across all document understanding tasks, please visit the full leaderboard.
Visual Question Answering (VQA) tests a model's ability to understand and answer questions about document content. This includes both text-based questions and questions that require understanding of the document's visual layout and structure.
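Answers on DocVQA-style benchmarks are commonly scored with ANLS (Average Normalized Levenshtein Similarity), which gives partial credit for near-miss answers. The leaderboard does not state its exact metric, so the sketch below is an illustrative assumption, not the leaderboard's published scoring code:

```python
# Sketch of ANLS scoring for a single question (assumed metric, not
# confirmed by this leaderboard). The dataset score is the mean of
# per-question scores.

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def anls(prediction: str, gold_answers: list[str], tau: float = 0.5) -> float:
    """1 - normalized edit distance, zeroed when the distance exceeds tau.
    Takes the best match over all accepted gold answers."""
    best = 0.0
    for gold in gold_answers:
        p, g = prediction.strip().lower(), gold.strip().lower()
        denom = max(len(p), len(g)) or 1
        nl = levenshtein(p, g) / denom
        best = max(best, 1.0 - nl if nl < tau else 0.0)
    return best
```

With the usual threshold `tau = 0.5`, an exact match scores 1.0, a one-character slip scores slightly less, and an unrelated answer scores 0.0.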
| Rank | Model | Avg | ChartQA | DocVQA |
|---|---|---|---|---|
| 1 | o4-mini-2025-04-16 | 87.07 | 87.78 | 86.35 |
| 2 | gemini-2.5-pro-preview-03-25 (reasoning: low) | 85.99 | 85.77 | 86.22 |
| 3 | gemini-2.5-flash-preview-04-17 | 85.16 | 84.82 | 85.51 |
| 4 | claude-3.7-sonnet (reasoning: low) | 83.47 | 79.43 | 87.51 |
| 5 | claude-sonnet-4 | 82.51 | 79.13 | 85.89 |
| 6 | gemini-2.0-flash | 82.03 | 79.28 | 84.79 |
| 7 | qwen2.5-vl-32b-instruct | 81.36 | 77.67 | 85.05 |
| 8 | gpt-4.1-2025-04-14 | 80.37 | 77.76 | 82.97 |
| 9 | qwen2.5-vl-72b-instruct | 80.10 | 76.20 | 84.00 |
| 10 | llama-4-maverick (400B-A17B) | 80.10 | 72.81 | 87.39 |
| 11 | mistral-medium-3 | 80.02 | 74.68 | 85.37 |
| 12 | gpt-4o-2024-08-06 | 79.08 | 75.27 | 82.89 |
| 13 | gpt-4o-2024-11-20 | 75.60 | 72.34 | 78.87 |
| 14 | InternVL3-38B-Instruct | 74.82 | 71.43 | 78.21 |
| 15 | gpt-4.1-nano-2025-04-14 | 74.08 | 65.27 | 82.89 |
| 16 | gpt-4o-mini-2024-07-18 | 72.86 | 66.30 | 79.42 |
| 17 | mistral-small-3.1-24b-instruct | 71.50 | 66.16 | 76.83 |
| 18 | gemma-3-27b-it | 66.85 | 57.65 | 76.05 |
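The Avg column appears to be the unweighted mean of the ChartQA and DocVQA scores, rounded for display (an assumption; the leaderboard does not state its weighting). A quick sanity check over a few rows:

```python
# Assumption: Avg = mean(ChartQA, DocVQA). Scores copied from the table above.
rows = [
    # (model, avg, chartqa, docvqa)
    ("o4-mini-2025-04-16", 87.07, 87.78, 86.35),
    ("claude-3.7-sonnet",  83.47, 79.43, 87.51),
    ("llama-4-maverick",   80.10, 72.81, 87.39),
    ("gemma-3-27b-it",     66.85, 57.65, 76.05),
]
for model, avg, chartqa, docvqa in rows:
    recomputed = (chartqa + docvqa) / 2
    # Allow 0.01 slack for two-decimal display rounding in the table.
    assert abs(recomputed - avg) <= 0.01, (model, recomputed, avg)
```

Every row checked falls within display-rounding distance of the simple mean, consistent with an equal weighting of the two tasks.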