French LLM developer Mistral AI has announced the release of a new API designed to handle complex PDF documents. Mistral OCR is an optical character recognition (OCR) API that can turn any PDF document into a text file to make it easier for AI algorithms to process.

The language models that power popular generative algorithms like OpenAI’s ChatGPT work particularly well with raw text. So companies looking to introduce their own AI workflows know the importance of storing and indexing data in a clean format so that the information can be reused by AI algorithms.

Unlike many OCR APIs, Mistral OCR is multimodal: it recognizes not only text but also the illustrations and photographs placed between text blocks. The API generates bounding boxes around the graphic elements it detects and includes them in the output. The result of processing a PDF document with Mistral OCR is text formatted in Markdown, which AI algorithms process more efficiently.
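To give a rough idea of what working with such output could look like, here is a minimal Python sketch. It does not call Mistral's service. The `sample` dictionary is a hypothetical result, and the field names (`pages`, `index`, `markdown`, `images`, `top_left_x` and so on) are assumptions made for illustration, not something the article confirms. The `collect_output` helper is likewise invented here: it joins the Markdown of each page and gathers the bounding boxes of detected images.

```python
from dataclasses import dataclass


@dataclass
class ImageBox:
    """One detected graphic element (hypothetical structure)."""
    page: int
    image_id: str
    box: tuple  # (left, top, right, bottom) in page coordinates


def collect_output(response: dict):
    """Join per-page Markdown and gather image bounding boxes."""
    markdown_parts = []
    boxes = []
    for page in response["pages"]:
        markdown_parts.append(page["markdown"])
        for img in page.get("images", []):
            boxes.append(
                ImageBox(
                    page=page["index"],
                    image_id=img["id"],
                    box=(
                        img["top_left_x"],
                        img["top_left_y"],
                        img["bottom_right_x"],
                        img["bottom_right_y"],
                    ),
                )
            )
    # Separate pages with a blank line so Markdown blocks do not merge.
    return "\n\n".join(markdown_parts), boxes


# Hypothetical OCR result for a two-page PDF (assumed field names).
sample = {
    "pages": [
        {
            "index": 0,
            "markdown": "# Annual report\n\n![img-0.jpeg](img-0.jpeg)",
            "images": [
                {
                    "id": "img-0.jpeg",
                    "top_left_x": 40,
                    "top_left_y": 120,
                    "bottom_right_x": 560,
                    "bottom_right_y": 480,
                }
            ],
        },
        {"index": 1, "markdown": "| Year | Revenue |\n|---|---|", "images": []},
    ]
}

markdown, boxes = collect_output(sample)
```

The combined `markdown` string could then be indexed for search or passed to a language model as context, while `boxes` records where each image sat on its page.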

“Over the years, organizations accumulate a large number of documents, often in PDF or slide format, that are not accessible to LLM processing, especially for RAG systems [Retrieval-Augmented Generation: a technique for retrieving data and using it as context for generative AI algorithms]. With Mistral OCR, our customers can transform complex documents into readable content in all languages. This is a key step towards the widespread adoption of AI assistants in companies that need to simplify access to extensive internal documentation,” says Guillaume Lample, co-founder and chief scientific officer of Mistral.

Mistral OCR is available on the company’s own platform, as well as on the infrastructure of Mistral’s cloud partners, such as AWS, Azure, and others. For companies that work with sensitive or classified data, Mistral offers an on-premises deployment of the API. The company claims that Mistral OCR outperforms similar APIs from Google, Microsoft, and OpenAI. It tested the API on complex PDF documents, including those containing mathematical expressions, complex layouts, and tables.
