AI startup Mistral has released Mistral OCR, a multilingual optical character recognition (OCR) API that the company says can accurately convert any PDF to a text or Markdown file.

The ability to accurately convert scanned or digitized PDFs to editable text files remains a challenge for language service providers that require structured input for translation management systems.

With Mistral OCR, text and Markdown output allows PDFs to be readily ingested by downstream applications for further automated processing. The release also introduces the use of documents as prompts, enabling users to extract information from PDFs and return it as structured output.

Commenting on the release, Mistral stated, “Unlike other models, Mistral OCR comprehends each element of documents—media, text, tables, equations—with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts content in an ordered interleaved text and images.”

“As a result, Mistral OCR is an ideal model to use in combination with a RAG system taking multimodal documents (such as slides or complex PDFs) as input,” the company added.

According to Mistral, the tool is natively multilingual and multimodal, and can parse, understand, and transcribe scripts, fonts, and languages “across all continents.” The company published a demo of the solution together with quality scores across a range of languages and scripts that reportedly exceeded those of competitors Azure OCR, Google Document AI, and Gemini 2.0.

OCR Experts React

Following the announcement, users were quick to test Mistral’s claim to have created the “world’s best document understanding API.”

Kushal Byatnal, CEO of document processing platform Extend, stated, “There is still a large gap for businesses in going from raw OCR outputs to document processing for mission-critical use cases. […] Anyone who goes in expecting 100% automation is in for a surprise.”

“You still need to build and label datasets, orchestrate pipelines, detect uncertainty, and correct with human-in-the-loop, fine-tune, and a lot more. You can certainly get close to full automation over time, but it’s going to take time and effort. But the future is on the horizon!” he added.

Raunak Chowdhuri, founder of AI document ingestion provider Reducto, published an independent comparison of Mistral OCR and Gemini Flash 2.0, and stated that “on financial documents, we find it drops content and hallucinates [on] complex tables. On healthcare forms, we found it misses basic checkbox detection and fails to correct table structure.”

“Overall, […] we find that Mistral is 43.5% less accurate when examining downstream LLM accuracy on complex parsed forms,” he concluded.

The founders of Pulse AI agreed, publishing their own stress tests of Mistral OCR, which reached similar conclusions.

However, Mistral’s tool also drew some praise. One user tested the tool’s output in Thai, a language not listed on Mistral OCR’s language benchmark, and noted, “Straight away [Mistral OCR] detects that the language is Thai. […] It displays Thai characters in Unicode [in JSON]. It’s done a pretty good job with the Thai characters and being able to OCR them.”

“Remember that this is doing a structured output, not just OCRing everything. We’re telling it what we wanted to find in there and it’s been able to do that. So if you are looking for multilingual [processing], this is definitely worth checking out.”

Mistral OCR reportedly processes 2,000 pages per minute, at a price of USD 1 for 1,000 pages. The tool is available through an API and can also be self-hosted.
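
To put those reported figures in perspective, the short Python sketch below estimates processing time and cost for a batch of pages. It is back-of-the-envelope arithmetic only: the function name `estimate_ocr_job` is hypothetical, the defaults are simply the throughput and price quoted above, and real-world results may differ because of rate limits, document complexity, or pricing changes.

```python
def estimate_ocr_job(pages: int,
                     pages_per_minute: float = 2000.0,
                     usd_per_1000_pages: float = 1.0) -> tuple[float, float]:
    """Return (minutes, cost_usd) for an OCR job of `pages` pages.

    Defaults are the figures reported in the article, not guaranteed values.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    minutes = pages / pages_per_minute           # reported throughput
    cost_usd = pages / 1000.0 * usd_per_1000_pages  # reported list price
    return minutes, cost_usd


if __name__ == "__main__":
    minutes, cost = estimate_ocr_job(50_000)
    print(f"{minutes:.1f} min, USD {cost:.2f}")  # 25.0 min, USD 50.00
```

Under these assumptions, a 50,000-page archive would take roughly 25 minutes and cost about USD 50 at the quoted rate.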


