It’s kind of hard to answer properly when you did not link to the article.
So, RPA can be done either using image recognition (which is sensitive to changes in DPI and is very hard to make work across multiple computers), or using various technologies that were originally made for accessibility but are now also used for UI testing and RPA (when talking about Windows elements and Java).
For image recognition, most vendors already use AI when searching for matches. When taking a screenshot and searching for something, there will always be small variations, so we use machine learning to find as close a match as we can. OpenRPA uses OpenCV for this.
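To make the "as close a match as we can" idea concrete, here is a toy sketch of the technique behind functions like OpenCV's `matchTemplate` (which is what OpenRPA relies on): slide the template over the screenshot and score each position with normalized cross-correlation, so a nearly identical region still scores near 1.0. This is plain Python on 2D lists of grayscale values for illustration only; the names and data layout are my own, not OpenRPA's code.

```python
# Toy fuzzy template matching: find where a small template best fits
# inside a larger "screenshot", tolerating small pixel differences.

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized patches."""
    flat_p = [v for row in patch for v in row]
    flat_t = [v for row in template for v in row]
    mean_p = sum(flat_p) / len(flat_p)
    mean_t = sum(flat_t) / len(flat_t)
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(flat_p, flat_t))
    den_p = sum((p - mean_p) ** 2 for p in flat_p) ** 0.5
    den_t = sum((t - mean_t) ** 2 for t in flat_t) ** 0.5
    if den_p == 0 or den_t == 0:
        return 0.0
    return num / (den_p * den_t)

def best_match(screenshot, template):
    """Slide the template over the screenshot; return (row, col, score)."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -1.0)
    for r in range(len(screenshot) - th + 1):
        for c in range(len(screenshot[0]) - tw + 1):
            patch = [row[c:c + tw] for row in screenshot[r:r + th]]
            score = ncc(patch, template)
            if score > best[2]:
                best = (r, c, score)
    return best
```

A real engine then accepts any location scoring above some threshold (say 0.8), which is exactly what makes the match survive small rendering variations between screenshots.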
But that is not what you were talking about.
The idea of using image recognition with an added AI option to intelligently “guess” what was supposed to be clicked when no exact match is found is interesting. Say I want to click an OK button: there can be a billion ways to create an OK button, but as humans we recognize one very quickly, and AI can be trained to do the same. With a big set of screenshots of different UIs plus metadata, it would be possible to do. Both RoboCorp and OpenIAP have run trials with that, but to my knowledge, no one has it as a production-ready feature (?)
When using APIs, either through accessibility features or application APIs like SAP, Chrome, mainframe, etc., we are constantly fine-tuning our selectors to be specific enough to avoid “hitting” the wrong item, but also resilient to minor changes (the title changes, the automation ID changes, the UI changes, the number of UI elements shifts so positions move, etc.). There can be a huge difference in how easy this is to accomplish in different applications.
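To illustrate the specificity-versus-resilience trade-off, here is a minimal sketch of attribute-based selectors with wildcards, in the spirit of the XML selectors UiPath and similar tools use. The selector format, attribute names, and sample values are all made up for the example; the point is just that the resilient selector anchors on stable attributes and wildcards the volatile ones.

```python
import fnmatch

# A selector is a dict of attribute patterns; an element is a dict of
# concrete attributes. '*' absorbs the parts that change between runs
# (document names in titles, generated automation IDs, ...).

def matches(selector, element):
    """True if every selector attribute pattern matches the element."""
    return all(
        key in element and fnmatch.fnmatch(str(element[key]), pattern)
        for key, pattern in selector.items()
    )

# Too specific: breaks as soon as the file name in the title changes.
brittle = {"cls": "Button", "title": "OK - report_2022.xlsx", "automationid": "btn_47"}

# Resilient: anchored on the stable attributes, wildcarding the rest.
resilient = {"cls": "Button", "title": "OK*"}
```

With an element like `{"cls": "Button", "title": "OK - report_2023.xlsx", "automationid": "btn_12"}`, the brittle selector fails while the resilient one still hits, yet the resilient one is still specific enough not to match a random button.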
I have never heard of any company shipping something that can intelligently fix a broken selector on its own. UiPath has made some very good attempts, though: if a selector breaks, you can record a new one and have UiPath compare the two and produce a working version. That “smells” a little like it, but you were describing something that does it by itself, and that feature does not. I know they also did some testing with “self-healing bots” that would try to fix selectors automatically, but to my knowledge, those are not production-ready (?)
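The compare-and-merge idea can be sketched in a few lines: given the broken selector and a freshly recorded one, keep the attributes they agree on and generalize the ones that drifted to a common-prefix wildcard. This is purely illustrative and nothing like UiPath's actual repair logic; all names and values here are hypothetical.

```python
import os

def generalize(a, b):
    """Identical values stay exact; differing values keep their common
    prefix plus a wildcard."""
    if a == b:
        return a
    return os.path.commonprefix([a, b]) + "*"

def merge_selectors(old, new):
    """Merge a broken selector with a freshly recorded one into a
    version that matches both recordings."""
    return {key: generalize(old[key], new[key]) for key in old.keys() & new.keys()}

broken = {"cls": "Button", "title": "Save - v1.2", "idx": "3"}
fresh  = {"cls": "Button", "title": "Save - v1.3", "idx": "4"}
# merge_selectors(broken, fresh) keeps cls, turns title into "Save - v1.*"
# and idx into "*".
```

The hard part, and presumably why the self-healing variants never shipped, is deciding *when* to widen like this automatically without silently starting to click the wrong element.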
A completely different approach would be to use AI to understand what it is you want to achieve and let the AI “do its thing.” No recorder, no selectors: just describe to a language model what you want to achieve, and an AI (or a combination of different AIs) resolves the task, often using NLP to understand the UI. I may be grasping at straws here, but I feel that is kind of what TagUI was taking the first steps toward: letting the user write human text, and the bot made it happen.
I am 100% sure there are multiple companies working on something like that right now, but I have not seen any demos or white papers about it yet. It has been on my to-do list to try something myself, but due to financial issues, I have had to put this project on hold.
Adept AI has made a working prototype that only works in browsers (and that is a magnificent accomplishment!!!). You can see videos of this on their Twitter feed https://twitter.com/AdeptAILabs/status/1570144499187453952.
But a general-purpose RPA solution that keeps working even when the UI changes does not exist yet. I can guarantee you, though, it’s coming, and soon.
I would love to be the first to offer that in an open-source setting, but without funding, it’s not going to happen.