Is it possible to force OCR for documents containing text and image data?
Optical Character Recognition (OCR) is applied automatically whenever we do not find any text data in your document. This is the case, for example, when your PDF contains scanned document pages.
Some applications, however, produce documents that contain text data and, at the same time, images with embedded text. In this case, our default setting will not apply OCR because some text is already present in the document.
If your PDF documents mix text data with images that contain embedded text, you can enable the 'Force OCR' option. Once activated, OCR will be applied to all documents. To activate it, go to 'Document Parser > Settings' and set 'OCR' to 'Force OCR'.
Additionally, if your documents are in landscape format and need to be rotated, you will also need to set the preprocessing option to 'Force OCR', which automatically rotates your documents.