Mistral AI Unveils OCR 3: Advanced OCR Model for Document Processing

Mistral AI Releases OCR 3: A Smaller Optical Character Recognition (OCR) Model for Structured Document AI at Scale

Introduction to Mistral OCR 3

Mistral AI has launched the latest version of its optical character recognition service, Mistral OCR 3. The model, identified as mistral-ocr-2512, extracts both text and images from PDFs and other document formats while preserving the original layout. It costs $2 per 1,000 pages processed, with a 50% discount available through the Batch API.

What Can You Expect from Mistral OCR 3?

Mistral OCR 3 is tailored for enterprise-level document handling. It is built to process forms, scanned documents, complex tables, and handwriting. In Mistral's internal tests, it outperformed its predecessor, Mistral OCR 2, with a 74% win rate across the document types compared.

Key Features of Mistral OCR 3

Outputs document content in markdown format, preserving the original layout.
Enriches output with HTML-formatted tables when table formatting is enabled.
Supports easy integration with downstream systems for enhanced analytics and retrieval.

Integration with Mistral Document AI

OCR 3 is an important component of Mistral Document AI, a document processing tool that combines OCR with structured data extraction and Document QnA. Users can access it in the Document AI Playground within Mistral AI Studio, where they can upload PDFs or images and receive either clean text or structured JSON output without writing any code. The same OCR capabilities are available through a public API, so teams can move from exploratory use to production applications without changing tools.

Supported Document Formats

The OCR processor supports a wide array of document formats through a single API endpoint. You can submit documents using:

document_url: for PDFs, .pptx, .docx, and similar file types.
image_url: for image formats like PNG, JPEG, or AVIF.
Uploaded or base64 encoded content via the same schema.

All of this information is detailed in the OCR Processor documentation found on Mistral’s official site.
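The choice between the two fields can be sketched as a small request builder. This is a minimal illustration, not Mistral's client: the helper name build_ocr_request, the extension list, and the "type" discriminator key are assumptions made for the example, so check the OCR Processor documentation for the exact schema before relying on it.

```python
import json

OCR_MODEL = "mistral-ocr-2512"  # model identifier quoted in this article

# Assumption: these extensions are routed to image_url, everything else to document_url.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".avif")


def build_ocr_request(url: str, model: str = OCR_MODEL) -> dict:
    """Build a request body for one document, choosing the field by file extension."""
    # Ignore any query string when inspecting the extension.
    path = url.lower().split("?", 1)[0]
    if path.endswith(IMAGE_EXTENSIONS):
        document = {"type": "image_url", "image_url": url}
    else:
        document = {"type": "document_url", "document_url": url}
    return {"model": model, "document": document}


if __name__ == "__main__":
    print(json.dumps(build_ocr_request("https://example.com/report.pdf"), indent=2))
```

The builder only assembles the JSON body. Sending it, authentication, and error handling are left out on purpose.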

Response Structure

The API response consists of a JSON object that contains a pages array. Each page entry includes:

Index of the page
Markdown content
List of extracted images
List of tables (when HTML formatting is enabled)
Detected hyperlinks
Optional header and footer fields (if extraction is enabled)
Dimensions of the page

In addition to the pages array, the response includes document annotations for structured insights and usage information for tracking purposes.

When images and HTML tables are extracted, markdown output includes placeholders that link back to the actual content, making it easier to reconstruct the document downstream.
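Reconstruction from placeholders might look like the sketch below. It assumes each placeholder is a markdown link whose target equals a table's id (for example [tbl-0.html](tbl-0.html)) and that each table entry carries "id" and "content" keys. Those shapes, the sample page, and the helper name inline_tables are illustrative assumptions, not confirmed details of the API.

```python
import re

# Assumed placeholder shape: a markdown link or image whose target is an item id,
# e.g. "[tbl-0.html](tbl-0.html)" or "![img-0.jpeg](img-0.jpeg)".
PLACEHOLDER = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)\)")


def inline_tables(page: dict) -> str:
    """Return the page markdown with table placeholders replaced by their HTML."""
    tables = {t["id"]: t["content"] for t in page.get("tables", [])}

    def substitute(match: re.Match) -> str:
        target = match.group(1)
        # Anything that is not a known table id (images, ordinary links) is kept as-is.
        return tables.get(target, match.group(0))

    return PLACEHOLDER.sub(substitute, page["markdown"])


# Hypothetical page entry used only to exercise the function.
sample_page = {
    "index": 0,
    "markdown": "# Invoice\n\n[tbl-0.html](tbl-0.html)\n\n![img-0.jpeg](img-0.jpeg)",
    "tables": [
        {"id": "tbl-0.html", "content": "<table><tr><td>Total</td><td>42</td></tr></table>"}
    ],
    "images": [{"id": "img-0.jpeg"}],
}

print(inline_tables(sample_page))
```

Image placeholders are deliberately left in place here; a real pipeline would resolve them against the extracted image list in the same way.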

Enhancements from Mistral OCR 2

With the release of Mistral OCR 3, several significant improvements have been made when compared to its predecessor:

Handwriting Recognition: The new model has enhanced capabilities for accurately interpreting cursive and handwritten text.
Form Processing: Improved detection of boxes, labels, and handwritten inputs, especially in complex layouts like invoices.
Robustness to Scanned Documents: It now handles compression artifacts and other distortions more effectively.
Complex Tables: Improved reconstruction of tables with headers, merged cells, and multi-row blocks, preserving layout integrity.

Pricing and Batch Processing

Mistral OCR 3 is priced at $2 per 1,000 pages for standard OCR and $3 per 1,000 pages for pages that require structured annotations. With the Batch Inference API, the effective price for standard OCR drops to $1 per 1,000 pages, thanks to the 50% discount applied to batch processing jobs.
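The arithmetic behind these figures can be written out as a small estimator. The rates come from the paragraph above; the function names are hypothetical, and because the article only states the batch discount for standard OCR, the sketch does not apply it to annotated pages.

```python
STANDARD_USD_PER_1000 = 2.0   # standard OCR, per 1,000 pages
ANNOTATED_USD_PER_1000 = 3.0  # pages with structured annotations, per 1,000 pages
BATCH_DISCOUNT = 0.5          # 50% off for batch processing jobs


def estimate_standard_cost(pages: int, batch: bool = False) -> float:
    """Estimated USD cost of standard OCR, optionally at the batch rate."""
    cost = pages / 1000 * STANDARD_USD_PER_1000
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return round(cost, 2)


def estimate_annotated_cost(pages: int) -> float:
    """Estimated USD cost for pages with structured annotations (no batch discount assumed)."""
    return round(pages / 1000 * ANNOTATED_USD_PER_1000, 2)


print(estimate_standard_cost(250_000))              # 500.0
print(estimate_standard_cost(250_000, batch=True))  # 250.0
print(estimate_annotated_cost(250_000))             # 750.0
```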

Annotations and Additional Features

This model also incorporates two key features: Annotations for structured data and Bounding Box (BBox) extraction. These functionalities empower developers to label specific document regions and retrieve bounding boxes for various elements, facilitating streamlined mapping into other systems or user interfaces.
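As one way to map a bounding box into a user interface, the sketch below converts absolute corner coordinates into page-relative fractions that an overlay can scale to any render size. The field names (top_left_x, top_left_y, bottom_right_x, bottom_right_y, width, height) and the helper to_relative_box are assumptions for illustration; the article does not specify the response schema for boxes.

```python
from typing import NamedTuple


class RelativeBox(NamedTuple):
    """Bounding box expressed as fractions of the page width and height."""
    left: float
    top: float
    width: float
    height: float


def to_relative_box(element: dict, dimensions: dict) -> RelativeBox:
    """Convert absolute corner coordinates into page-relative fractions."""
    page_w = dimensions["width"]
    page_h = dimensions["height"]
    return RelativeBox(
        left=element["top_left_x"] / page_w,
        top=element["top_left_y"] / page_h,
        width=(element["bottom_right_x"] - element["top_left_x"]) / page_w,
        height=(element["bottom_right_y"] - element["top_left_y"]) / page_h,
    )


# Hypothetical element and page size used only to exercise the function.
box = to_relative_box(
    {"top_left_x": 100, "top_left_y": 200, "bottom_right_x": 500, "bottom_right_y": 600},
    {"width": 1000, "height": 2000},
)
print(box)
```

Working in fractions keeps the overlay independent of the resolution at which the page is displayed.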

Conclusion

In summary, Mistral OCR 3 is an OCR service aimed at enterprise document processing. It improves on its predecessor in handwriting, forms, scanned documents, and complex tables, and it exposes the same capabilities to developers through an API at $2 per 1,000 pages.

FAQs

What is Mistral OCR 3?

Mistral OCR 3 is an advanced optical character recognition service designed to extract text and images from documents while maintaining their original structure.