Intelligent Document Processing Leaderboard

A unified leaderboard for OCR, KIE, classification, QA, table extraction, and confidence score evaluation

This work is sponsored by Nanonets.

About the Leaderboard

The Intelligent Document Processing (IDP) Leaderboard provides a comprehensive evaluation framework for assessing the capabilities of various AI models in document understanding and processing tasks. This benchmark covers seven critical aspects of document intelligence:

  • Key Information Extraction (KIE): Evaluating the ability to extract structured information from documents
  • Visual Question Answering (VQA): Testing comprehension of document content through questions
  • Optical Character Recognition (OCR): Measuring text recognition accuracy across different document types
  • Document Classification: Assessing categorization capabilities
  • Long Document Processing: Evaluating performance on lengthy documents
  • Table Extraction: Testing tabular data understanding and extraction
  • Confidence Score: Measuring the reliability and calibration of model predictions
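
The leaderboard does not spell out here how confidence calibration is scored; a standard metric for this is Expected Calibration Error (ECE), which bins predictions by confidence and compares each bin's average confidence to its accuracy. A minimal sketch (the binning scheme and bin count are illustrative assumptions, not the leaderboard's exact protocol):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between confidence and accuracy per bin.

    confidences: list of floats in [0, 1]
    correct: list of 0/1 flags (1 = prediction was right)
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # half-open bins [lo, hi); the last bin also includes 1.0
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - avg_conf)
    return ece
```

A well-calibrated model that reports 0.95 confidence should be right about 95% of the time; large gaps inflate the ECE.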

The OCR benchmark below is one component of the leaderboard. For the complete evaluation across all document understanding tasks, please visit the full leaderboard.

Optical Character Recognition (OCR) Leaderboard

Optical Character Recognition (OCR) measures a model's ability to accurately convert images of text into machine-readable text across varied fonts, layouts, and document conditions.
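The exact scoring formula is not stated here; OCR accuracy is commonly derived from character error rate (CER), the character-level edit distance between prediction and reference, normalized by reference length. A minimal sketch, assuming a plain Levenshtein distance over characters:

```python
def edit_distance(ref, hyp):
    # Classic Levenshtein DP, kept to two rows for O(len(hyp)) memory.
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return prev[n]

def cer(ref, hyp):
    # Character error rate: edits normalized by reference length.
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

For example, `cer("hello", "helo")` is 0.2 (one deletion over five reference characters); a score like the leaderboard's could then be reported as accuracy, e.g. 100 × (1 − CER).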

| Rank | Model | Avg | OCR-Handwriting | OCR-Handwriting-Rotated | OCR-Digital-Diacritics |
|------|-------|-----|-----------------|-------------------------|------------------------|
| 1 | gemini-2.5-pro-preview-03-25 (reasoning: low) | 81.18 | 72.19 | 72.28 | 99.08 |
| 2 | gemini-2.0-flash | 80.05 | 71.24 | 70.30 | 98.63 |
| 3 | gemini-2.5-flash-preview-04-17 | 78.90 | 69.40 | 68.46 | 98.85 |
| 4 | gpt-4.1-2025-04-14 | 75.64 | 65.40 | 62.64 | 98.87 |
| 5 | gpt-4o-2024-11-20 | 74.91 | 64.39 | 61.58 | 98.75 |
| 6 | gpt-4o-2024-08-06 | 74.56 | 64.48 | 60.79 | 98.39 |
| 7 | o4-mini-2025-04-16 | 72.82 | 61.64 | 58.60 | 98.23 |
| 8 | gpt-4o-mini-2024-07-18 | 72.43 | 61.04 | 59.13 | 97.13 |
| 9 | llama-4-maverick (400B-A17B) | 70.66 | 58.88 | 55.28 | 97.82 |
| 10 | qwen2.5-vl-72b-instruct | 69.61 | 56.43 | 57.45 | 94.95 |
| 11 | claude-3.7-sonnet (reasoning: low) | 69.19 | 57.41 | 52.29 | 97.87 |
| 12 | mistral-medium-3 | 69.05 | 57.63 | 54.68 | 94.85 |
| 13 | gpt-4.1-nano-2025-04-14 | 67.09 | 54.48 | 50.33 | 96.46 |
| 14 | InternVL3-38B-Instruct | 66.31 | 51.30 | 53.02 | 94.62 |
| 15 | claude-sonnet-4 | 64.09 | 51.64 | 42.87 | 97.75 |
| 16 | gemma-3-27b-it | 54.75 | 55.49 | 52.64 | 56.12 |
| 17 | mistral-small-3.1-24b-instruct | 51.01 | 41.82 | 39.14 | 72.05 |

BibTeX

@misc{IDPLeaderboard,
  title={IDPLeaderboard: A Unified Leaderboard for Intelligent Document Processing Tasks},
  author={Souvik Mandal and Nayancy Gupta and Ashish Talewar and Paras Ahuja and Prathamesh Juvatkar and Gourinath Banda},
  howpublished={https://idp-leaderboard.org},
  year={2025},
}
Souvik Mandal*¹, Nayancy Gupta*², Ashish Talewar*¹, Paras Ahuja*¹, Prathamesh Juvatkar*¹, Gourinath Banda*²

¹Nanonets, ²IIT Indore