It’s kind of hard to answer properly when you did not link to the article.
So, RPA can be done either using image recognition (which is sensitive to changes in DPI and is very hard to make work across multiple computers), or using various technologies that were originally made for accessibility but are now also used for UI testing and RPA (when talking about Windows elements and Java).
For image recognition, most vendors already use AI when searching for matches. When taking a screenshot and searching for something, there will always be small variations, so we use machine learning to find as close a match as we can. OpenRPA uses OpenCV for this.
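To make the "as close a match as we can" idea concrete, here is a toy sketch of the technique behind functions like OpenCV's `matchTemplate` (which is what OpenRPA relies on): slide the template over the screenshot and score each position with normalized cross-correlation, so a nearly identical region still scores near 1.0. This is plain Python on 2D lists of grayscale values for illustration only; the names and data layout are my own, not OpenRPA's code.

```python
# Toy fuzzy template matching: find where a small template best fits
# inside a larger "screenshot", tolerating small pixel differences.

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized patches."""
    flat_p = [v for row in patch for v in row]
    flat_t = [v for row in template for v in row]
    mean_p = sum(flat_p) / len(flat_p)
    mean_t = sum(flat_t) / len(flat_t)
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(flat_p, flat_t))
    den_p = sum((p - mean_p) ** 2 for p in flat_p) ** 0.5
    den_t = sum((t - mean_t) ** 2 for t in flat_t) ** 0.5
    if den_p == 0 or den_t == 0:
        return 0.0
    return num / (den_p * den_t)

def best_match(screenshot, template):
    """Slide the template over the screenshot; return (row, col, score)."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -1.0)
    for r in range(len(screenshot) - th + 1):
        for c in range(len(screenshot[0]) - tw + 1):
            patch = [row[c:c + tw] for row in screenshot[r:r + th]]
            score = ncc(patch, template)
            if score > best[2]:
                best = (r, c, score)
    return best
```

A real engine then accepts any location scoring above some threshold (say 0.8), which is exactly what makes the match survive small rendering variations between screenshots.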
But that is not what you were talking about.
The idea of using image recognition with an added AI option to intelligently “guess” what was supposed to be clicked when no exact match is found is interesting. Say I want to click an OK button: there can be a billion ways to create an OK button, but as humans we recognize one very quickly, and AI can be trained to do the same. With a big set of screenshots of different UIs plus metadata, it would be possible to do. Both RoboCorp and OpenIAP have run trials with that, but to my knowledge, no one has it as a production-ready feature (?)
When using APIs, either through accessibility features or application APIs like SAP, Chrome, mainframe, etc., we are constantly fine-tuning our selectors to be specific enough to avoid “hitting” the wrong item, but also resilient to minor changes (the title changes, the automation ID changes, the UI changes, the number of UI elements shifts so positions move, etc.). There can be a huge difference in how easy this is to accomplish in different applications.
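To illustrate the specificity-versus-resilience trade-off, here is a minimal sketch of attribute-based selectors with wildcards, in the spirit of the XML selectors UiPath and similar tools use. The selector format, attribute names, and sample values are all made up for the example; the point is just that the resilient selector anchors on stable attributes and wildcards the volatile ones.

```python
import fnmatch

# A selector is a dict of attribute patterns; an element is a dict of
# concrete attributes. '*' absorbs the parts that change between runs
# (document names in titles, generated automation IDs, ...).

def matches(selector, element):
    """True if every selector attribute pattern matches the element."""
    return all(
        key in element and fnmatch.fnmatch(str(element[key]), pattern)
        for key, pattern in selector.items()
    )

# Too specific: breaks as soon as the file name in the title changes.
brittle = {"cls": "Button", "title": "OK - report_2022.xlsx", "automationid": "btn_47"}

# Resilient: anchored on the stable attributes, wildcarding the rest.
resilient = {"cls": "Button", "title": "OK*"}
```

With an element like `{"cls": "Button", "title": "OK - report_2023.xlsx", "automationid": "btn_12"}`, the brittle selector fails while the resilient one still hits, yet the resilient one is still specific enough not to match a random button.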
I have never heard of any company shipping something that can intelligently fix a broken selector on its own. UiPath has made some very good attempts, though: if a selector breaks, you can record a new one and have UiPath compare the two and produce a working version. That “smells” a little like it, but you were describing something that does it by itself, and that feature does not. I know they also did some testing with “self-healing bots” that would try to fix selectors automatically, but to my knowledge, those are not production-ready (?)
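The compare-and-merge idea can be sketched in a few lines: given the broken selector and a freshly recorded one, keep the attributes they agree on and generalize the ones that drifted to a common-prefix wildcard. This is purely illustrative and nothing like UiPath's actual repair logic; all names and values here are hypothetical.

```python
import os

def generalize(a, b):
    """Identical values stay exact; differing values keep their common
    prefix plus a wildcard."""
    if a == b:
        return a
    return os.path.commonprefix([a, b]) + "*"

def merge_selectors(old, new):
    """Merge a broken selector with a freshly recorded one into a
    version that matches both recordings."""
    return {key: generalize(old[key], new[key]) for key in old.keys() & new.keys()}

broken = {"cls": "Button", "title": "Save - v1.2", "idx": "3"}
fresh  = {"cls": "Button", "title": "Save - v1.3", "idx": "4"}
# merge_selectors(broken, fresh) keeps cls, turns title into "Save - v1.*"
# and idx into "*".
```

The hard part, and presumably why the self-healing variants never shipped, is deciding *when* to widen like this automatically without silently starting to click the wrong element.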
A completely different approach would be to use AI to understand what it is you want to achieve and let the AI “do its thing.” No recorder, no selectors: just describe to a language model what you want to achieve, and an AI (or a combination of different AIs) resolves the task, often using NLP to understand the UI. I may be grasping at straws here, but I feel that is kind of what TagUI was taking the first steps toward: letting the user write human text, and the bot made it happen.
I am 100% sure there are multiple companies working on something like that right now, but I have not seen any demos or white papers about it yet. It has been on my to-do list to try something myself, but due to financial issues, I have had to put this project on hold.
Adept AI has made a working prototype that only works in browsers (and that is a magnificent accomplishment!!!). You can see videos of this on their Twitter feed https://twitter.com/AdeptAILabs/status/1570144499187453952.
But a general-purpose RPA solution that keeps working even when the UI changes does not exist yet. I can guarantee you, though, it’s coming, and soon.
I would love to be the first to offer that in an open-source setting, but without funding, it’s not going to happen.