Feat: Add Tesseract OCR feature to selection tool
Hello dear Okular developers,
This merge request introduces a much-requested feature: Optical Character Recognition (OCR) for image selections.
This allows users to quickly extract unselectable text (from scanned documents or image-based PDFs) and copy it directly to the clipboard.
Implementation Details:
- Tool: Uses Tesseract for OCR processing.
- Workflow: A new "OCR to Clipboard" option is added to the context menu when an image area is selected in Selection Mode.
- Image Quality Optimization (crucial for accuracy): To ensure high-quality recognition from low-DPI screen captures, the image selection is upscaled 3x and converted to grayscale (Format_Grayscale8) before being passed to the Tesseract engine. This significantly boosts accuracy.
- Localization: Tesseract is initialized with the multi-language setting "tur+eng" for broad utility.
- Scope: This feature is implemented in `okularpart` and uses `Tesseract_LIBRARIES` for linking, to maintain consistency with the project's existing CMake style.

**Demonstration Video:** You can see the workflow in action here:
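
Note for reviewers on the preprocessing step from the implementation details: the sketch below illustrates the "upscale 3x, then convert to 8-bit grayscale" idea using only the C++ standard library. It is not the code from this merge request. The `RgbImage` and `GrayImage` structs and the `toGray` and `upscaleAndGray` functions are hypothetical names made up for illustration, the scaling is plain nearest-neighbor, and the luma weights are one common integer approximation. The actual change works on Qt images and may use a different scaling filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for an image buffer (the real code uses Qt images).
struct RgbImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels; // packed R, G, B per pixel, row-major
};

struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels; // one byte per pixel, row-major
};

// Integer luma approximation: weights 11/32, 16/32, 5/32 for R, G, B.
inline std::uint8_t toGray(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return static_cast<std::uint8_t>((11 * r + 16 * g + 5 * b) / 32);
}

// Upscale by an integer factor (nearest-neighbor) and convert to grayscale
// in a single pass. The default factor of 3 mirrors the 3x upscale above.
inline GrayImage upscaleAndGray(const RgbImage &src, int factor = 3)
{
    assert(factor >= 1);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * src.height * 3);

    GrayImage out;
    out.width = src.width * factor;
    out.height = src.height * factor;
    out.pixels.resize(static_cast<std::size_t>(out.width) * out.height);

    for (int y = 0; y < out.height; ++y) {
        for (int x = 0; x < out.width; ++x) {
            // Map each output pixel back to its source pixel.
            const std::size_t s =
                (static_cast<std::size_t>(y / factor) * src.width + x / factor) * 3;
            out.pixels[static_cast<std::size_t>(y) * out.width + x] =
                toGray(src.pixels[s], src.pixels[s + 1], src.pixels[s + 2]);
        }
    }
    return out;
}
```

A buffer in this layout (one byte per pixel, `width` bytes per row) is the kind of input that Tesseract's `TessBaseAPI::SetImage` accepts once the engine has been initialized with a language string such as "tur+eng". How the merge request wires this up is best checked in the diff itself.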