Feature collaboration: Local storage provider options

OpenRPA version: 1.4.56-pre-release
OpenFlow version: 1.5.8 (not really relevant)
Using app.openiap.io or self hosted openflow: app
Error message: N/A
Screenshot or video: N/A
Attach a simple workflow from OpenRPA or NodeRED that reproduces the error/issue: N/A

At the last community meetup we talked about issues with LiteDB, specifically around Citrix/VDI setups (LiteDB doesn’t play nice with some of them).
After some more discussion, the longer-term solution seems to be to replace LiteDB with something else, but in the meantime some users (yours truly :slight_smile: ) are blocked from working in those environments.

With that in mind, I’d like to spend some time on pushing what would help me right now and also making further changes down the line easier in the main release channel (since I don’t want to keep maintaining a parallel branch). So before I get down to code, I’d like to confirm some assumptions so it really is compatible.

Prerequisites, as I see it, to solve my issue and make it compatible with mainline releases would be:

  1. Current usage of LiteDB is unaffected (backwards compatible).
  2. Further changes to local storage providers in the future must be doable without much rework.
  3. There must be at least one other configuration option aside from LiteDB to solve the issue that sparked this.

To meet the prerequisites, after some code checking, this is what I have in mind right now:

  • Add an abstraction layer around current LiteDB usage (ILocalStorageProvider or something similar) [main code changes in: RobotInstance, Workflow, Project]
  • Add an implementation of ILocalStorageProvider for LiteDB
  • Add an implementation of ILocalStorageProvider for no local storage (in-memory only, as discussed)
  • Add localstorageprovider (string?) to settings.json and Interfaces.Config for choosing local storage. It should default to LiteDB if the config option is missing, for backwards compatibility.
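
The four points above can be sketched roughly as follows. This is a minimal illustration in Python rather than OpenRPA's C#, and every name in it (`LocalStorageProvider`, `InMemoryProvider`, `resolve_provider_name`, the method names) is a hypothetical stand-in, not existing OpenRPA code. Only the `localstorageprovider` setting name and the LiteDB default come from the proposal.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

# Assumption: "litedb" is the value that selects the existing behaviour.
DEFAULT_PROVIDER = "litedb"


class LocalStorageProvider(ABC):
    """Hypothetical Python stand-in for the proposed ILocalStorageProvider."""

    @abstractmethod
    def save(self, collection: str, item_id: str, doc: dict) -> None: ...

    @abstractmethod
    def load(self, collection: str, item_id: str) -> Optional[dict]: ...

    @abstractmethod
    def delete(self, collection: str, item_id: str) -> None: ...

    @abstractmethod
    def list_ids(self, collection: str) -> List[str]: ...


class InMemoryProvider(LocalStorageProvider):
    """Keeps everything in RAM; nothing survives a restart."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, dict]] = {}

    def save(self, collection: str, item_id: str, doc: dict) -> None:
        # Store a copy so later changes by the caller do not leak in.
        self._data.setdefault(collection, {})[item_id] = dict(doc)

    def load(self, collection: str, item_id: str) -> Optional[dict]:
        doc = self._data.get(collection, {}).get(item_id)
        return dict(doc) if doc is not None else None

    def delete(self, collection: str, item_id: str) -> None:
        self._data.get(collection, {}).pop(item_id, None)

    def list_ids(self, collection: str) -> List[str]:
        return sorted(self._data.get(collection, {}))


def resolve_provider_name(settings: dict) -> str:
    """Return the configured provider, falling back to LiteDB when the key is missing or empty."""
    return settings.get("localstorageprovider") or DEFAULT_PROVIDER
```

With this shape, an existing settings.json that lacks the key keeps resolving to LiteDB, which is what keeps the change backwards compatible.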

That’s the gist of it.

After the above, current and default usage should remain the same with LiteDB, but there would be a configuration option to use OpenRPA in an online-only mode, with no local storage of detectors/projects/workflows/queues.

If other storage providers are added in the future, they would just need to implement the interface and be added as a configuration option, so it opens up further development if the need arises (e.g. a local git repo).

Biggest hurdle to overcome will be how to implement local wiq without LiteDB, but I’m not sure if that’s a realistic use case to worry about. Personally I’d be okay if the in-memory-only mode would not support local wiq at all, since it realistically doesn’t make sense for it to do so.

So - what do you think?


Excellent idea :+1:
I’ve been spending a little time on OpenRPA again.
The latest updates to master should fix the issue I mentioned in the meeting.
I will see if I can start on this one of these days.
Edit: just re-read your post, I had missed the part where you suggest you will add this. That would be awesome.
Or we could work on it in parallel in the NoLiteDB branch of skadefro/openrpa on GitHub.

ok, spent a little while on this the last two days.
It might not be as simple as I originally thought.

Right now, the way it works is that everything is kept “in memory” in ObservableCollection instances (this was to ensure the UI is always up to date), and the LocallyCached class is used to save to disk using LiteDB. When something is added/modified/deleted, it’s marked as dirty in the local database. This field is then later used to sync up to openflow.
Now, if all calls always followed that route (through LocallyCached), it could potentially be done by just adding all the logic to LocallyCached. But for speed optimization, there are several places where updates (and queries) are done directly against the local database, and then I depend on LoadServerData, or a later update, to handle syncing to openflow.
Lastly, there are quite a lot of places that query LiteDB directly, to ensure they would work if openflow was down, and/or to limit the load on openflow.
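
To make the dirty-flag idea concrete, here is a minimal sketch in Python (not OpenRPA's C#). The `LocalCache` class and the `push` callback are hypothetical; only the `isDirty` / `isDeleted` field names are taken from this thread. It assumes deletions are kept as tombstones until they have been pushed.

```python
from typing import Callable, Dict, List, Optional

# Sync bookkeeping that stays local and is never sent to the server.
LOCAL_ONLY = ("isDirty", "isDeleted")


class LocalCache:
    """Hypothetical local store: every change is flagged until it has been pushed."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert(self, doc_id: str, fields: dict) -> None:
        doc = self._docs.setdefault(doc_id, {"_id": doc_id})
        doc.update(fields)
        doc["isDirty"] = True
        doc["isDeleted"] = False

    def delete(self, doc_id: str) -> None:
        # Keep a tombstone so the deletion can still be synced later.
        if doc_id in self._docs:
            self._docs[doc_id]["isDeleted"] = True
            self._docs[doc_id]["isDirty"] = True

    def dirty(self) -> List[dict]:
        return [d for d in self._docs.values() if d["isDirty"]]

    def sync(self, push: Callable[[str, Optional[dict]], None]) -> int:
        """Push every dirty document; a payload of None means 'delete on the server'."""
        pushed = 0
        for doc in self.dirty():
            doc_id = doc["_id"]
            if doc["isDeleted"]:
                push(doc_id, None)
                del self._docs[doc_id]  # tombstone no longer needed
            else:
                push(doc_id, {k: v for k, v in doc.items() if k not in LOCAL_ONLY})
                doc["isDirty"] = False
            pushed += 1
        return pushed
```

The sketch only covers the path where every change goes through the cache. The direct database updates mentioned above are exactly the calls that would bypass `upsert` and `delete` here.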

The beauty of simply removing LiteDB is that it forces you to go through every single place where there is an interaction and implement an alternative. I kind of like the idea of only saving in RAM and openflow, but that also means maintaining two versions of OpenRPA, since I know a lot of people are using it offline only or depend on it to work with unstable internet connections.

We could try to implement an interface that allows both insert/update/delete and queries, use that while replacing all LiteDB references, and then re-implement all the LiteDB-based code on top of it, but I think that will be a relatively big task. But if that works, it would also allow for implementing cool alternatives, like my big dream of adding a GitHub implementation.

My daughter insisted I watch her play Roblox for the last hour, which gave me time to think a little.
Maybe a better approach would be to add a proper DAL (data access layer).
So there is simply one way to work with data from OpenRPA’s point of view, and then the DAL can handle support for caching, saving online, to disk etc.
It feels a little messy the way I do it right now, where multiple places are each trying to decide how and where to save.
The hard part is going to be deciding whether we should keep the ObservableCollections in RobotInstance or make them part of the DAL too. Putting them into the DAL would be the most correct way, but the reason I originally added them there was that it takes A LOT of resources and time to create something that can handle updates from non-UI threads and also notify WPF UI elements of updates.

I have pushed a suggestion on how to add a storage provider into OpenRPA into the NoLiteDB branch.
It’s a little crude, and needs a lot more testing, but seems to work quite well for now.

  • As one of the first things during plugin loading in App.xaml.cs, it now supports locating and loading classes in DLLs that implement IStorage from OpenRPA.Interfaces.
  • I then added StorageProvider.cs in OpenRPA, and I suggest this is used to apply all actions to all providers.
    My original idea was to add both LiteDB and openflow to this and add support for priorities, but for now, this is only used for LOCAL files.
  • I then updated all places where LiteDB was used to use StorageProvider, and added a few checks for null values (more might need to be added).
  • I then added OpenRPA.Storage.Filesystem and OpenRPA.Storage.LiteDB to support saving local files either on the local file system or using LiteDB.
  • I then tested and validated that OpenRPA will work completely online without saving anything locally if no Storage DLL has been added. It also works with both added (but it will NOT auto sync between them, so if you first add one and later add the other, they WILL be out of sync).
  • I was depending on a “trick” where LiteDB ignores JsonIgnore but honors BsonIgnore, to skip certain properties toward openflow but keep them locally.
    So I first added a “placeholder” implementation of those attributes, and then I added logic to the 2 providers to be able to handle this (DoNotIgnoreResolver.cs and UseIgnoreMapper.cs). We save sync information like isDirty and isDeleted locally, but not inside openflow. The same goes for UI state, like which projects are expanded and which workflows are selected.
    This might need some rework to be cleaner?
    It’s also broken with regards to BsonField, but I don’t think that matters (?)
  • I noticed the UI would spam an insane amount of updates to the database/storage provider, so I simplified some UI state code to make it less spammy.
  • I added a better check for database updates during closing of OpenRPA.
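
The "apply all actions to all providers" idea from the list above can be sketched like this. It is an illustration in Python, not the code in the NoLiteDB branch: `DictStorage` and `StorageFanOut` are hypothetical names, and reading from the first provider that has the document is an assumption, not a description of what StorageProvider.cs does.

```python
from typing import Dict, List, Optional


class DictStorage:
    """Hypothetical stand-in for one IStorage plugin (e.g. file system or LiteDB)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Dict[str, dict] = {}

    def insert(self, doc: dict) -> None:
        self.docs[doc["_id"]] = dict(doc)

    def find(self, doc_id: str) -> Optional[dict]:
        return self.docs.get(doc_id)


class StorageFanOut:
    """Applies every write to every registered provider."""

    def __init__(self, providers: List[DictStorage]) -> None:
        self.providers = list(providers)

    def insert(self, doc: dict) -> None:
        # With zero providers this is a no-op, i.e. nothing is saved locally.
        for provider in self.providers:
            provider.insert(doc)

    def find(self, doc_id: str) -> Optional[dict]:
        # Assumption: return the first hit; the real code may behave differently.
        for provider in self.providers:
            doc = provider.find(doc_id)
            if doc is not None:
                return doc
        return None
```

Two behaviours described in the list fall out of this shape: with no providers registered nothing is stored locally, and a provider registered later never receives the documents written before it was added, so the providers drift out of sync.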

I wonder if the last 2 parts will solve your issues with LiteDB getting corrupted? If not, you now have 2 other options: use the local file system and .json files, or simply do not save at all (this will not scale well when you have many workflows).
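
The local-only properties from the list above (sync flags and UI state that are kept locally but not sent to openflow) boil down to two serializers over the same object. Here is a small Python sketch of that idea. It does not reproduce DoNotIgnoreResolver.cs or UseIgnoreMapper.cs; the function names are hypothetical, and `isExpanded` / `isSelected` are assumed field names for the UI state.

```python
import json

# isDirty and isDeleted are named in this thread; the two UI-state names are assumptions.
LOCAL_ONLY_FIELDS = frozenset({"isDirty", "isDeleted", "isExpanded", "isSelected"})


def to_server_json(doc: dict) -> str:
    """Serialize for the server: local-only fields are dropped."""
    public = {k: v for k, v in doc.items() if k not in LOCAL_ONLY_FIELDS}
    return json.dumps(public, sort_keys=True)


def to_local_json(doc: dict) -> str:
    """Serialize for local storage: everything is kept."""
    return json.dumps(doc, sort_keys=True)
```

Whatever mechanism is used (attributes in C#, a field set here), each storage provider has to apply the "keep everything" variant, and only the openflow path applies the "drop local-only" one.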

Let me know what you think.
If this works out, I think I will also add a Git provider. That could be fun. And then only use the openflow connection for events.

Well, that escalated quickly :smiley:

Didn’t think of doing it as plugins, that’s a neat way of keeping to the open/closed principle.

Sounds good to me. I still think it would be nice to have it configurable, so that all could be installed, but you could disable/enable without messing with the installation folders (in VDI setups that’s an image change, so it adds a non-insignificant delay/cost).

That’s amazing :slight_smile:
Did a quick look into the commits, and only saw one potential bug there.

Just checked on normal version, and indeed, it looks to be saving whole record on selected/expanded changes. Awesome that it got discovered as a side effect of this.

I tried to look for that, but couldn’t find it - where is it located?

It potentially could, but even if not, they’re good changes, and the local fs/no save options should (in theory) provide workarounds for all kinds of issues.
I’ll check when we could push it to a testable VDI environment, although it would be great to have a beta build for that.

I’ll try to do some checks/cleanups on that branch in the afternoon, but seriously this is already way more than I expected to happen (especially in such a short time) :slight_smile: :bowing_man:

PS.

Judging by last meetup reaction, that would make at least a few developers happy :slight_smile: .

Super low key. I added a listener on the app domain’s close event that sets a boolean, and then I remember to check that inside all functions. (In the old days you would get a ton of errors about not being able to update the database while closing down. And all of those updates were triggered by the UI changing during closing, so they were all non-important updates.)
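
As a rough illustration of that pattern (in Python, not the actual OpenRPA code): a flag is set once shutdown begins, and every write path checks it first. `ShutdownAwareStore` and its methods are hypothetical names, and in a real app `begin_shutdown` would be wired to the close event.

```python
import threading
from typing import List, Tuple


class ShutdownAwareStore:
    """Hypothetical store that skips non-essential writes once shutdown has started."""

    def __init__(self) -> None:
        # An Event is used so the flag is safe to read from any thread.
        self._closing = threading.Event()
        self.writes: List[Tuple[str, dict]] = []

    def begin_shutdown(self) -> None:
        """Call this from the application's close handler."""
        self._closing.set()

    def save_ui_state(self, doc_id: str, state: dict) -> bool:
        """Return True if the write was performed, False if it was skipped."""
        if self._closing.is_set():
            return False
        self.writes.append((doc_id, dict(state)))
        return True
```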

I have fixed a few issues.
I have added plugin settings to each storage provider and added an “enabled” setting for each, so you can enable/disable each plugin using settings.json (remember you need to close OpenRPA before updating settings.json).
I have added the two storage provider DLLs to the MSI installer.
I have uploaded a signed installer as release 1.4.57.1 of skadefro/openrpa on GitHub.


Thank you very much for this.

I’ve been stuck on getting it to work and couldn’t find out why it’s failing.
After some more debugging, and testing on 2 machines, and after facepalming that I skipped a release and actually had much more changes than I thought I had, I’ve narrowed it down to the Script plugin python error killing the app completely without any error message.

Once that was sorted out (disabled embedded python on the ‘faulty’ machine) I was able to get the app to run.
The storage solution seems to be working (at least at face value), but aside from the python issue I’m now facing issues with loading custom activity .dll’s from extensions that were working in 1.4.54. Not sure how related that is.

All that said, the issues that I’m encountering actually seem to be more related to 1.4.55/56 than to the IStorage implementation, so to not derail this one I’ll try to test with 1.4.56 and open separate issues for the python and custom .dll’s issues if I can reproduce them there.

The python module requires that you are using a supported version, or it will completely crash the process.
This has always been an issue. That is why, by default, it uses an embedded version of python. But this has a few limitations, so if you have the correct version of python in %PATH%, you can disable embedded and use the one in the path.

Not sure about dlls loading.
I changed the DLL loader to use EndsWith instead of Contains … it should not matter, but maybe check that?
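
For anyone checking this, the two tests only differ when the pattern appears somewhere other than the end of the file name. A quick Python illustration follows. The file names and patterns are made up, and the pattern the real loader matches on is not stated in this thread.

```python
def matches_contains(filename: str, pattern: str) -> bool:
    """Equivalent of a Contains check: the pattern may appear anywhere."""
    return pattern in filename


def matches_endswith(filename: str, pattern: str) -> bool:
    """Equivalent of an EndsWith check: the pattern must be the tail of the name."""
    return filename.endswith(pattern)


# Hypothetical file names, only to show where the two checks disagree.
candidates = ["Acme.Activities.dll", "Acme.Activities.dll.config"]
only_contains = [
    name for name in candidates
    if matches_contains(name, ".dll") and not matches_endswith(name, ".dll")
]
```

So if a custom DLL was previously picked up because the pattern sat in the middle of its name, it would be skipped after the switch; if the pattern is just the extension, only things like `.dll.config` files are affected.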

Did some testing now, and a couple things stand out:

  1. It gets confused when both storage options are enabled.
    Even when connected to openflow (so no syncing between providers needed), it looks like only one (LiteDB, but there’s no guarantee I think) gets synchronized. This results in interesting behavior (like strict throwing on updating a workflow that exists in LiteDB but not in local files). It could be treated as a known misconfiguration I think (i.e. only use one at a time).
    With OpenFlow though it synchronizes pretty nicely when only one is active, and since that’s the “recommended” setup I wouldn’t consider this a big deal.

  2. Workflow instances spam can get pretty harsh with real projects, as every execution creates a new file.
    It’s not that different from LiteDB, but the overhead seems a bit higher in storage space (pretty easy to clean up, but still something to know about).

That said, if you’re running from Documents, and you have OneDrive, it gets absolutely confused with what is happening and seems to get into a synchronization loop until OpenRPA is closed (OneDrive’s “touch” seems to trigger a change in the modified timestamp for OpenRPA, which then triggers a write, which triggers a sync, and so on ad nauseam).
So for FileStorage I’d definitely recommend running from the roaming profile to avoid that.

All in all it seems to be working fine, but there are some caveats to look out for.

  1. Agree. For now only one should be supported, since there is no sync support (between providers).
    The default right now is that LiteDB is enabled and filesystem is disabled, so people running OpenRPA in offline mode are not suddenly unable to see their workflows.
  2. Hmm, I agree, and maybe do not completely agree.
    Workflow Foundation is supposed to be transactionally safe, hence we NEED to also save state when requested. If you don’t need to be able to resume workflows after closing OpenRPA, you can just disable state by setting disable_instance_store to true in settings.json (unless you also depend on openrpa_instances for reporting on running/failed/successful runs; I can see we might need a “middle way”).

I think it works fine as it is, just the overhead is higher. So no changes here needed, would just be good to note that in the docs (which btw I’ll try to add soon if time allows).


I’ve PR’d a small documentation update, not sure how much is really needed there (or if it should be on that page or a separate one).

And also did some more testing and tbh it looks to be working just fine as long as it is configured correctly, so I’m all for putting it into the next release.

I was contemplating adding a config option to redirect where the local things are stored, but realistically that can be easily added later if it would actually be needed.

I was close to adding it to the UI as well, but decided to skip it.
I think it’s best to keep LiteDB as the default and “flip it” for the next major release.
So only people who really need it will use the file provider, and for them I think it’s OK that they need to add it to settings.json or to the registry.
