Intelligent Document Processing Leaderboard

A unified leaderboard for OCR, KIE, classification, QA, table extraction, and confidence score evaluation

This work is sponsored by Nanonets.

About the Leaderboard

The Intelligent Document Processing (IDP) Leaderboard provides a comprehensive evaluation framework for assessing the capabilities of various AI models in document understanding and processing tasks. This benchmark covers seven critical aspects of document intelligence:

  • Key Information Extraction (KIE): Evaluating the ability to extract structured information from documents
  • Visual Question Answering (VQA): Testing comprehension of document content through questions
  • Optical Character Recognition (OCR): Measuring text recognition accuracy across different document types
  • Document Classification: Assessing categorization capabilities
  • Long Document Processing: Evaluating performance on lengthy documents
  • Table Extraction: Testing tabular data understanding and extraction
  • Confidence Score: Measuring the reliability and calibration of model predictions

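The leaderboard does not specify how confidence calibration is scored, but a common way to measure it is expected calibration error (ECE). The sketch below is only an illustration of the general idea, not the leaderboard's actual metric: predictions are binned by confidence, and each bin's average confidence is compared against its accuracy.

```python
# Illustrative sketch of expected calibration error (ECE); the
# leaderboard's actual confidence-score metric may differ.

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: predicted probabilities in [0, 1];
    correct: 1 if the prediction was right, else 0."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # bins are (lo, hi], with 0.0 included in the first bin
        in_bin = [i for i in range(n)
                  if (confidences[i] > lo or (b == 0 and confidences[i] == 0.0))
                  and confidences[i] <= hi]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        accuracy = sum(correct[i] for i in in_bin) / len(in_bin)
        # weight each bin's calibration gap by its share of predictions
        ece += len(in_bin) / n * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated model (confidence always equal to empirical accuracy) scores an ECE of 0; larger values indicate over- or under-confidence.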
The Key Information Extraction results below form one component of the full IDP Leaderboard. For a comprehensive evaluation across all document understanding tasks, please visit the full leaderboard.

Key Information Extraction (KIE) Leaderboard

Key Information Extraction (KIE) evaluates a model's ability to identify and extract specific information from documents, such as names, dates, amounts, and other structured data. This task measures how accurately models can locate and understand key entities within documents.
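As a rough illustration of how a KIE prediction might be scored, the hypothetical sketch below compares predicted key-value pairs against ground truth with exact string match. The scoring used by the leaderboard's datasets may well be more lenient (e.g. normalized or fuzzy matching); this only conveys the shape of the task.

```python
# Hypothetical field-level KIE scoring sketch: exact-match accuracy
# over ground-truth fields. Not the leaderboard's official metric.

def kie_accuracy(predicted: dict, ground_truth: dict) -> float:
    """Fraction of ground-truth fields whose value was extracted exactly."""
    if not ground_truth:
        return 1.0
    hits = sum(1 for key, value in ground_truth.items()
               if predicted.get(key, "").strip() == value.strip())
    return hits / len(ground_truth)
```

For example, a model that extracts the invoice date and total correctly but misses the vendor name would score 2/3 on a three-field document.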

Rank  Model                                           Avg    Nanonets-KIE  Docile  Handwritten-Forms
 1    gemini-2.5-pro-preview-03-25 (reasoning: low)   79.66  91.00         65.79   82.18
 2    qwen2.5-vl-32b-instruct                         79.63  89.18         69.18   80.54
 3    gemini-2.5-flash-preview-04-17                  77.99  91.29         63.35   79.34
 4    gemini-2.0-flash                                77.22  88.31         65.06   78.28
 5    qwen2.5-vl-72b-instruct                         76.11  90.52         58.37   79.45
 6    claude-3.7-sonnet (reasoning: low)              76.09  87.61         66.80   73.86
 7    o4-mini-2025-04-16                              75.43  86.91         59.52   79.85
 8    mistral-medium-3                                74.21  86.49         61.82   77.94
 9    llama-4-maverick (400B-A17B)                    73.30  85.78         61.70   72.43
10    gemma-3-27b-it                                  72.81  85.14         60.18   73.13
11    gpt-4.1-2025-04-14                              72.68  87.85         61.20   68.98
12    claude-sonnet-4                                 71.91  85.78         63.53   66.42
13    gpt-4o-2024-08-06                               71.83  88.63         56.37   70.48
14    gpt-4o-2024-11-20                               70.91  88.03         56.56   68.15
15    InternVL3-38B-Instruct                          70.31  84.02         57.47   69.42
16    gpt-4o-mini-2024-07-18                          70.03  86.37         60.45   63.26
17    gpt-4.1-nano-2025-04-14                         66.25  80.21         51.13   67.41
18    mistral-small-3.1-24b-instruct                  63.73  75.47         47.07   68.64
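
The Avg column appears to be the unweighted mean of the three dataset scores, rounded to two decimals (e.g. row 1: (91.00 + 65.79 + 82.18) / 3 ≈ 79.66). A quick sanity check:

```python
# Recompute the Avg column as the unweighted mean of the dataset
# scores, rounded to two decimal places (an assumption inferred
# from the published numbers).

def row_average(scores):
    return round(sum(scores) / len(scores), 2)

print(row_average([91.00, 65.79, 82.18]))  # → 79.66
```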

BibTeX

@misc{IDPLeaderboard,
  title={IDPLeaderboard: A Unified Leaderboard for Intelligent Document Processing Tasks},
  author={Souvik Mandal and Nayancy Gupta and Ashish Talewar and Paras Ahuja and Prathamesh Juvatkar and Gourinath Banda},
  howpublished={https://idp-leaderboard.org},
  year={2025},
}
Souvik Mandal*¹, Nayancy Gupta*², Ashish Talewar*¹, Paras Ahuja*¹, Prathamesh Juvatkar*¹, Gourinath Banda*²
¹ Nanonets   ² IIT Indore