Intelligent Document Processing Leaderboard

A unified leaderboard for OCR, KIE, classification, QA, table extraction, and confidence score evaluation

This work is sponsored by Nanonets.

About the Leaderboard

The Intelligent Document Processing (IDP) Leaderboard provides a comprehensive evaluation framework for assessing the capabilities of various AI models in document understanding and processing tasks. This benchmark covers seven critical aspects of document intelligence:

  • Key Information Extraction (KIE): Evaluating the ability to extract structured information from documents
  • Visual Question Answering (VQA): Testing comprehension of document content through questions
  • Optical Character Recognition (OCR): Measuring text recognition accuracy across different document types
  • Document Classification: Assessing categorization capabilities
  • Long Document Processing: Evaluating performance on lengthy documents
  • Table Extraction: Testing tabular data understanding and extraction
  • Confidence Score: Measuring the reliability and calibration of model predictions
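The confidence-score criterion above concerns calibration: whether a model's stated confidence matches how often it is actually correct. The leaderboard does not specify its scoring formula here, so purely as an illustrative sketch, expected calibration error (ECE) is one common way to quantify calibration; the `ece` helper below is hypothetical, not the benchmark's metric:

```python
def ece(confidences, correct, n_bins=10):
    """Expected calibration error: bin predictions by confidence,
    then average |mean confidence - accuracy| weighted by bin size."""
    total = len(confidences)
    error = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins; the final bin also includes confidence == 1.0.
        bucket = [i for i, c in enumerate(confidences)
                  if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not bucket:
            continue
        avg_conf = sum(confidences[i] for i in bucket) / len(bucket)
        accuracy = sum(correct[i] for i in bucket) / len(bucket)
        error += len(bucket) / total * abs(avg_conf - accuracy)
    return error

# A model that says 0.95 but is right only 3 times out of 4 is overconfident:
print(ece([0.95, 0.95, 0.95, 0.95], [1, 1, 1, 0]))  # 0.2
```

A well-calibrated model drives this gap toward zero in every confidence bin.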

This page is part of the Intelligent Document Processing (IDP) Leaderboard and assesses model performance on the table extraction task. For a comprehensive evaluation across all document understanding tasks, please visit the full leaderboard.

Table Extraction Leaderboard

Table Extraction evaluates how well models can identify, understand, and extract tabular data from documents. This includes preserving table structure, relationships between cells, and accurately extracting both numerical and textual content.
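The page does not state how table-extraction scores are computed, so as a minimal illustrative sketch only: one simple way to compare a predicted table against a reference is cell-level F1 over (row, column, text) triples, which rewards both correct structure and correct content. The `cell_f1` function below is hypothetical and is not the leaderboard's metric:

```python
def cell_f1(pred, gold):
    """F1 over (row, col, text) cell triples of two tables."""
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)                    # cells matching position and text
    precision = tp / len(pred)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 0, "Item"), (0, 1, "Qty"), (1, 0, "Widget"), (1, 1, "3")]
pred = [(0, 0, "Item"), (0, 1, "Qty"), (1, 0, "Widget"), (1, 1, "5")]
print(cell_f1(pred, gold))  # one mis-read cell -> 0.75
```

Note this treats a cell shifted by one row as fully wrong; structure-aware metrics (e.g. tree-edit-based similarity) give partial credit instead.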

| Rank | Model | Avg | nanonets_small_sparse_structured | nanonets_small_dense_structured | nanonets_small_sparse_unstructured | nanonets_long_dense_structured | nanonets_long_sparse_structured | nanonets_long_sparse_unstructured |
|------|-------|-----|----------------------------------|---------------------------------|------------------------------------|--------------------------------|---------------------------------|-----------------------------------|
| 1 | claude-sonnet-4 | 93.44 | 98.14 | 98.98 | 83.95 | 92.75 | 94.87 | 91.96 |
| 2 | claude-3.7-sonnet (reasoning: low) | 91.23 | 98.29 | 99.06 | 84.89 | 92.82 | 92.92 | 79.38 |
| 3 | gemini-2.5-pro-preview-03-25 (reasoning: low) | 79.51 | 81.80 | 86.58 | 72.58 | 91.72 | 88.36 | 55.99 |
| 4 | qwen2.5-vl-32b-instruct | 77.46 | 99.07 | 98.89 | 34.55 | 89.22 | 86.29 | 56.74 |
| 5 | gemini-2.5-flash-preview-04-17 | 75.82 | 89.00 | 94.30 | 60.36 | 89.99 | 84.33 | 36.92 |
| 6 | gpt-4.1-2025-04-14 | 74.34 | 90.23 | 97.02 | 66.22 | 75.79 | 69.99 | 46.76 |
| 7 | llama-4-maverick | 74.15 | 89.94 | 97.90 | 52.57 | 92.50 | 86.24 | 25.74 |
| 8 | gemini-2.0-flash | 71.32 | 86.35 | 93.09 | 52.12 | 85.07 | 72.70 | 38.62 |
| 9 | o4-mini-2025-04-16 | 70.76 | 95.48 | 98.70 | 66.64 | 66.56 | 68.51 | 28.65 |
| 10 | mistral-medium-3 | 70.21 | 82.58 | 97.30 | 65.61 | 75.60 | 64.33 | 35.86 |
| 11 | gpt-4o-2024-08-06 | 64.30 | 76.14 | 94.11 | 61.00 | 65.04 | 54.11 | 35.38 |
| 12 | mistral-small-3.1-24b-instruct | 61.64 | 72.19 | 89.96 | 58.10 | 64.95 | 57.52 | 27.13 |
| 13 | gpt-4o-2024-11-20 | 60.74 | 76.46 | 92.10 | 61.62 | 53.02 | 49.83 | 31.39 |
| 14 | InternVL3-38B-Instruct | 58.03 | 84.37 | 92.19 | 74.46 | 32.00 | 33.31 | 31.85 |
| 15 | gemma-3-27b-it | 52.38 | 77.11 | 88.60 | 56.01 | 39.42 | 28.55 | 24.57 |
| 16 | gpt-4.1-nano-2025-04-14 | 50.83 | 68.48 | 89.32 | 47.21 | 50.82 | 33.23 | 15.89 |
| 17 | gpt-4o-mini-2024-07-18 | 50.47 | 57.31 | 82.50 | 51.34 | 53.37 | 32.60 | 25.67 |
| 18 | qwen2.5-vl-72b-instruct | 48.58 | 63.03 | 80.77 | 59.60 | 28.45 | 33.41 | 26.21 |

BibTeX

@misc{IDPLeaderboard,
  title={IDPLeaderboard: A Unified Leaderboard for Intelligent Document Processing Tasks},
  author={Souvik Mandal and Nayancy Gupta and Ashish Talewar and Paras Ahuja and Prathamesh Juvatkar and Gourinath Banda},
  howpublished={\url{https://idp-leaderboard.org}},
  year={2025},
}
Souvik Mandal*¹, Nayancy Gupta*², Ashish Talewar*¹, Paras Ahuja*¹, Prathamesh Juvatkar*¹, Gourinath Banda*²
¹Nanonets, ²IIT Indore