Intelligent Document Processing Leaderboard

About the Leaderboard

The Intelligent Document Processing (IDP) Leaderboard evaluates AI models on document understanding and processing tasks. The benchmark covers seven aspects of document intelligence:

Key Information Extraction (KIE): Evaluating the ability to extract structured information from documents
Visual Question Answering (VQA): Testing comprehension of document content through questions
Optical Character Recognition (OCR): Measuring text recognition accuracy across different document types
Document Classification: Assessing categorization capabilities
Long Document Processing: Evaluating performance on lengthy documents
Table Extraction: Testing tabular data understanding and extraction
Confidence Score: Measuring the reliability and calibration of model predictions

The OCR benchmark below is one part of the Intelligent Document Processing (IDP) Leaderboard, which assesses AI models across document understanding tasks. For the complete evaluation, please visit the full leaderboard.

Optical Character Recognition (OCR) Leaderboard

Optical Character Recognition (OCR) measures a model's ability to accurately convert images of text into machine-readable text. This includes handling various fonts, layouts, and document conditions while maintaining high accuracy in text recognition.
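The text above does not say how recognition accuracy is scored. As an illustration only, the sketch below uses a common choice for OCR evaluation: a similarity score derived from character-level edit distance. The function names `levenshtein` and `ocr_similarity`, and the normalization by the longer string, are assumptions made for this example and are not taken from the leaderboard.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def ocr_similarity(prediction: str, reference: str) -> float:
    """Hypothetical 0-100 score: 100 means the strings are identical."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 100.0  # two empty strings are treated as a perfect match
    return 100.0 * (1 - levenshtein(prediction, reference) / longest)


# A dropped diacritic costs one substitution out of four characters.
print(ocr_similarity("cafe", "café"))
```

Under this assumed metric, a single missed diacritic in a short word lowers the score noticeably, which is the kind of error a diacritics-focused dataset would surface.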

Rank	Model	Avg	OCR-Handwriting	OCR-Handwriting-Rotated	OCR-Digital-Diacritics
1	gemini-2.5-pro-preview-03-25 (reasoning: low)	81.18	72.19	72.28	99.08
2	gemini-2.0-flash	80.05	71.24	70.30	98.63
3	gemini-2.5-flash-preview-04-17	78.90	69.40	68.46	98.85
4	gpt-4.1-2025-04-14	75.64	65.40	62.64	98.87
5	gpt-4o-2024-11-20	74.91	64.39	61.58	98.75
6	gpt-4o-2024-08-06	74.56	64.48	60.79	98.39
7	o4-mini-2025-04-16	72.82	61.64	58.60	98.23
8	gpt-4o-mini-2024-07-18	72.43	61.04	59.13	97.13
9	llama-4-maverick(400B-A17B)	70.66	58.88	55.28	97.82
10	qwen2.5-vl-72b-instruct	69.61	56.43	57.45	94.95
11	claude-3.7-sonnet (reasoning: low)	69.19	57.41	52.29	97.87
12	mistral-medium-3	69.05	57.63	54.68	94.85
13	gpt-4.1-nano-2025-04-14	67.09	54.48	50.33	96.46
14	InternVL3-38B-Instruct	66.31	51.30	53.02	94.62
15	claude-sonnet-4	64.09	51.64	42.87	97.75
16	gemma-3-27b-it	54.75	55.49	52.64	56.12
17	mistral-small-3.1-24b-instruct	51.01	41.82	39.14	72.05
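The Avg column appears to be the unweighted mean of the three dataset scores. The sketch below recomputes it for three rows copied from the table. Treat the averaging rule as an inference from the published numbers rather than a documented definition: for a few other rows (for example gemini-2.0-flash, where the recomputed value is 80.06 against a listed 80.05) the result differs by 0.01, presumably because the published sub-scores are already rounded.

```python
from statistics import mean

# (listed Avg, (OCR-Handwriting, OCR-Handwriting-Rotated, OCR-Digital-Diacritics))
# Values copied from the table above.
rows = {
    "gemini-2.5-pro-preview-03-25 (reasoning: low)": (81.18, (72.19, 72.28, 99.08)),
    "gpt-4.1-2025-04-14": (75.64, (65.40, 62.64, 98.87)),
    "gemma-3-27b-it": (54.75, (55.49, 52.64, 56.12)),
}


def recomputed_avg(scores):
    """Unweighted mean of the per-dataset scores, rounded to two decimals."""
    return round(mean(scores), 2)


for model, (listed, scores) in rows.items():
    print(f"{model}: listed={listed:.2f} recomputed={recomputed_avg(scores):.2f}")
```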

BibTeX

@misc{IDPLeaderboard,
  title={IDPLeaderboard: A Unified Leaderboard for Intelligent Document Processing Tasks},
  author={Souvik Mandal and Nayancy Gupta and Ashish Talewar and Paras Ahuja and Prathamesh Juvatkar and Gourinath Banda},
  howpublished={https://idp-leaderboard.org},
  year={2025},
}

Souvik Mandal*¹, Nayancy Gupta*², Ashish Talewar*¹, Paras Ahuja*¹, Prathamesh Juvatkar*¹, Gourinath Banda*²

¹Nanonets, ²IIT Indore