Getting a Number from a Scanned PDF (JPEG)

dediof39 · November 3, 2023, 8:24am

I am trying to get the number that is inside the red box using OpenRPA, but my robots can’t read the number inside it. Do you know of any method I can use?

Velinkton · November 3, 2023, 8:31am

@dediof39
What is your document format? The approach will differ depending on the format.

Treating the document as an image and running text recognition on it can serve as a universal method.

dediof39 · November 3, 2023, 8:34am

Thank you for your answer, Velinkton. My document type is PDF. When I convert it to Word, the content comes through as a JPEG image.

dediof39 · November 3, 2023, 8:35am

And the issue here is that the number I’m trying to get changes every time, so I can’t give it a specific number.

Allan_Zimmermann · November 3, 2023, 9:01am

OpenRPA does not support document processing. It does have OCR, but it is designed and optimized for UI automation, NOT text extraction (and the reliability is far too low for any meaningful text extraction).
For that, you need to use third-party tools such as ABBYY, AWS Textract, Google Document AI, or similar.

In the open-rpa/examples-files repository on GitHub you can find two projects that could take you some of the way. object-detection and ocr-with-google-vision both use Google’s Vision API and can easily be changed to extract text, but for the best results you should update them to use Google Document AI.

Note: I cannot help with that over the forum; I just made the examples to get people started.
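
As a rough sketch of the step that comes after the OCR call: services like the ones named above generally return each recognized word together with a bounding box, so a number that changes every time can be found by its position on the page rather than by its value. The Python below assumes that output has already been converted into a simple list. The `Word` class, the sample words, and the `RED_BOX` coordinates are all hypothetical and do not come from OpenRPA or from any specific OCR API.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word plus its bounding box in pixel coordinates (hypothetical shape)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def words_in_region(words, left, top, right, bottom):
    """Keep the words whose box centre falls inside the given rectangle."""
    selected = []
    for w in words:
        cx = (w.left + w.right) / 2
        cy = (w.top + w.bottom) / 2
        if left <= cx <= right and top <= cy <= bottom:
            selected.append(w)
    return selected


def extract_number(words):
    """Join the words top-to-bottom, left-to-right and return the first run of digits, or None."""
    ordered = sorted(words, key=lambda w: (w.top, w.left))
    text = " ".join(w.text for w in ordered)
    match = re.search(r"\d+", text)
    return match.group(0) if match else None


# Made-up OCR output, for illustration only.
ocr_words = [
    Word("Invoice", 40, 30, 120, 50),
    Word("No:", 130, 30, 160, 50),
    Word("2023-11", 400, 30, 470, 50),
    Word("784512", 210, 100, 290, 122),
]

# Assumed pixel coordinates of the red box: (left, top, right, bottom).
RED_BOX = (200, 90, 300, 130)

number = extract_number(words_in_region(ocr_words, *RED_BOX))
print(number)
```

Because the lookup is by region, the same code keeps working when the value inside the box changes from one document to the next. It does assume the box sits at roughly the same place on every scan; if the layout shifts, the coordinates would need to be located first, for example relative to a nearby label.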
