Highlight specific text in a scanned PDF file

Yeow_Ruo_Ming · April 14, 2023, 4:22am

Is it possible to highlight specific text in a scanned PDF file? Because the file is essentially an image (even though the format is PDF), I find it hard to locate the specific text I want to highlight.

Allan_Zimmermann · April 14, 2023, 8:08am

OpenRPA does not support document processing.
It does support OCR, but primarily for finding text and images on the screen.
If you can get the image file out of the PDF, you could use “Load from file” and then “Get Text” to try to read the text in it. But keep in mind that these were not really designed for that kind of usage.
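On the original question of *locating* the text: once an OCR engine has returned word-level boxes for a page image, finding where a phrase sits is mostly bookkeeping. Below is a minimal Python sketch, assuming the OCR output is a dict of parallel lists with the keys `text`, `left`, `top`, `width` and `height` in pixels (the shape `pytesseract.image_to_data(..., output_type=Output.DICT)` returns) and that the DPI the page image was rendered or scanned at is known. It converts pixel boxes to PDF points (72 per inch) with the origin at the top-left of the page. The names `Rect`, `find_phrase_boxes` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned box in PDF points, origin at the top-left of the page."""
    x0: float
    y0: float
    x1: float
    y1: float


def find_phrase_boxes(ocr: dict, phrase: str, dpi: int) -> list[Rect]:
    """Return one Rect per occurrence of `phrase` among the OCR'd words.

    Hypothetical helper. `ocr` is assumed to hold parallel lists under the keys
    "text", "left", "top", "width", "height" (pixel units), like the dict that
    pytesseract.image_to_data(..., output_type=Output.DICT) returns.
    Matching is exact per word after lowercasing, so "Total:" will not match "total".
    """
    def to_pt(px: float) -> float:
        # PDF user space has 72 points per inch; the image has `dpi` pixels per inch.
        return px * 72 / dpi

    target = [w.lower() for w in phrase.split()]
    if not target:
        return []

    # Keep non-empty words with their original index (OCR often emits blank entries).
    words = [(i, t.strip().lower()) for i, t in enumerate(ocr["text"]) if t.strip()]

    rects = []
    for start in range(len(words) - len(target) + 1):
        window = words[start:start + len(target)]
        if [w for _, w in window] != target:
            continue
        idx = [i for i, _ in window]
        # Union of the matched word boxes; a phrase that wraps onto the next
        # line gets a single box covering both lines.
        x0 = min(ocr["left"][i] for i in idx)
        y0 = min(ocr["top"][i] for i in idx)
        x1 = max(ocr["left"][i] + ocr["width"][i] for i in idx)
        y1 = max(ocr["top"][i] + ocr["height"][i] for i in idx)
        rects.append(Rect(to_pt(x0), to_pt(y0), to_pt(x1), to_pt(y1)))
    return rects


# Made-up OCR output for a page scanned at 300 DPI.
sample = {
    "text":   ["", "Invoice", "Total", "Amount", "42.00"],
    "left":   [0, 100, 100, 220, 400],
    "top":    [0, 50, 300, 300, 300],
    "width":  [0, 150, 110, 160, 90],
    "height": [0, 40, 30, 30, 30],
}
print(find_phrase_boxes(sample, "total amount", dpi=300))
```

The returned rectangles could then be drawn onto the PDF page with a library such as PyMuPDF (its `Page.add_highlight_annot` accepts a rectangle); that step is left out of the sketch. If the scan is embedded as a full-page image, the effective DPI is roughly the image width in pixels divided by the page width in inches (page width in points / 72).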
If you want to extract text or tabular data from a PDF or image, you should use a third-party solution such as AWS Textract, Google Document AI / Google Vision AI, ABBYY, etc.
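With a hosted OCR service the idea is the same; only the coordinate format differs. For example, Amazon Textract returns `Blocks` whose `Geometry.BoundingBox` holds `Left`, `Top`, `Width` and `Height` as ratios of the page size, measured from the top-left corner. A minimal sketch, assuming the response has already been parsed into a Python dict and that the page size in points is known (612 x 792 for US Letter), that returns the page-coordinate box of every `LINE` block containing a phrase. The helper name and sample data are hypothetical, and the sketch ignores which page a block belongs to.

```python
from typing import Any

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in points, origin top-left


def textract_line_boxes(response: dict[str, Any], phrase: str,
                        page_width: float, page_height: float) -> list[Box]:
    """Hypothetical helper: boxes of every LINE block whose text contains `phrase`.

    Textract's BoundingBox values are ratios of the page width/height,
    so they are scaled by the page size to get point coordinates.
    """
    needle = phrase.lower()
    if not needle:
        return []
    boxes = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue
        if needle not in block.get("Text", "").lower():
            continue
        bb = block["Geometry"]["BoundingBox"]
        x0 = bb["Left"] * page_width
        y0 = bb["Top"] * page_height
        boxes.append((x0, y0, x0 + bb["Width"] * page_width,
                      y0 + bb["Height"] * page_height))
    return boxes


# Trimmed, made-up response in the shape Textract uses.
response = {
    "Blocks": [
        {"BlockType": "PAGE", "Geometry": {"BoundingBox": {
            "Left": 0.0, "Top": 0.0, "Width": 1.0, "Height": 1.0}}},
        {"BlockType": "LINE", "Text": "Invoice No. 1001", "Geometry": {"BoundingBox": {
            "Left": 0.125, "Top": 0.0625, "Width": 0.25, "Height": 0.03125}}},
        {"BlockType": "LINE", "Text": "Total amount: 42.00", "Geometry": {"BoundingBox": {
            "Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.125}}},
    ]
}
print(textract_line_boxes(response, "total amount", 612, 792))
```

Matching on `LINE` blocks avoids stitching individual `WORD` blocks together, at the cost of highlighting the whole line rather than just the phrase.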
Someone made an example of how to do that using Google here. I did not make or test it.

Here is an example I made using both OpenRPA and Node-RED. It does object detection, but it can easily be adapted for document processing or simple OCR too.

system · April 21, 2023, 8:08am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.