Change proposal: NugetPackageManager restore functionality

AndrewK · July 10, 2024, 9:53pm

This is a summary change discussion/proposal based on findings related to:
FR: Nuget feed in OpenFlow
(Bug?) Updating dependency leaves loaded dlls unchanged

I’ll try to lay down the train of thoughts, so it’s easier to see why the conclusions are as they are.

Please find below a short list of the main issues, from a user/deployer perspective:

Issue 1:
OpenRPA loads all dll’s from the extensions folder on application start.
This leads to extensions dll’s being locked for update (changing the dependency version in a project does not change the actual dll in extensions), which leads to the update not doing what it’s supposed to and leads to invalid workflows.
Note: this does not occur when a dependency has been added in the same run of OpenRPA, as it can then actually remove the file (process still has ownership of it).

Issue 2:
OpenRPA does not ensure that dependency projects are actually present and loaded for available projects that have a dependency change (it does on project manual import).
This leads to missing dependencies when remotely deploying projects with package dependencies.

Issue 3:
OpenRPA Package Manager only allows for changing dependencies of the projects, but not installing the dependencies on the local machine, and all dependencies show as Installed.
This means that if a package is added to a project as a dependency, Package Manager shows it as Installed, even though it’s completely missing from the machine. To actually install it, the user has to either install or update the package (and thus alter the project), it does not give the possibility to do an equivalent of a nuget restore for available projects.

Issue synthesis:
In summary, the only way to actually install the extension on your machine is to either manually put correct dll’s into the extensions when OpenRPA is closed, or “update” the project dependencies (requires update rights to the project).

Solution approach:
Since reworking OpenRPA to be multi-process or multi-appdomain is not worth the gain just for extensions (it’s a niche use case ), and “external” solutions are at best bandaid hacks, the solution should most likely center around NugetPackageManager and how (when) the extensions are loaded.

Addition 1:
Add a nuget restore equivalent
All of the functionality on the NugetPackageManager side is already there, it should only require adding a new call path to install the packages without altering the projects.
If I’m reading the code correctly, it could work similar to this (a bit hacky, but ehh):

private bool InstallAllProjectDependencies()
{
	var allDependencies = new Dictionary<string, string>();

	foreach (var project in availableProjects
							.Where(p => p.dependencies != null && project.dependencies.Count > 0))
	{
		Log.Debug($"Found dependencies on project: {project.name}");
		foreach (var newDependency in project.dependencies)
		{
			if (allDependencies.ContainsKey(newDependency.Key))
			{
				if (allDependencies[newDependency.Key] != newDependency.Value)
				{
					// could have a better errMsg
					string errMsg = $"Dependency clash for {newDependency.Key}: {newDependency.Value}.";
					Log.Error(errMsg);
					return false;
				}
				
			} else {
				Log.Debug($"Added dependency to load: {newDependency.Key}: {newDependency.Value}");
				allDependencies[newDependency.Key] = newDependency.Value;
			}
		}
	}

	foreach (var kvp in allDependencies)
	{
		// yanked from Project.cs
		 foreach (var jp in allDependencies)
		{
			var ver_range = VersionRange.Parse(jp.Value);
			if (ver_range.IsMinInclusive)
			{
				Log.Information("DownloadAndInstall " + jp.Key);
				var target_ver = NuGet.Versioning.NuGetVersion.Parse(ver_range.MinVersion.ToString());
				await NuGetPackageManager.Instance.DownloadAndInstall(null, new NuGet.Packaging.Core.PackageIdentity(jp.Key, target_ver), true); // can call with project = null as it's not used there? Otherwise create a dummy throwaway project before the loop
			}
		}
	}
	return true;
}

When to call it:

Potentially this could be called instead of the current extension assembly loading in app.xaml, although I’m not sure if the projects are available at that point.
Or, the LoadDll’s parameter could be changed to false and call it before the extensions loading. That way it would do an auto-restore before loading the extensions (config flag default false? → set to true for end users/robots?).

Aside of that, I think it would be good to add a call like that (with true for load) on user demand (either on project level or for all, the code is very similar). That way, the user (dev) would have good control of what’s happening when.

Summary:
I think adding an auto-restore on startup would “fix” most of the extensions version management issues related to this.

So… thoughts?

If this sounds reasonable, I’ll be happy to proceed with seeing if it actually works as intended (at least the manual restore part, not sure about the auto-restore part on startup).
As in - I can add it and PR it in if it works, so you don’t have to worry about this for now I just need an opinion if this is a change that you’re open to before diving into code.

For now, something like this manual restore + the PackageRestore workflow from the OpenFlow Nuget attempt should be a good workaround.

Allan_Zimmermann · July 11, 2024, 8:19am

I’m not sure I agree on issue 3.
When OpenRPA starts, it does a NuGet install for all projects, AND it installs packages as you “add” them, both locally and remotely.

foreach (var _id in updatePackages)
{
    try
    {
        var p = Projects.FindById(_id);
        await p.InstallDependencies(true);
    }
    catch (Exception ex)
    {
        Log.Error(ex.ToString());
    }
}

inside LoadServerData in RobotInstance.cs.
I have done a ton of tests on this a few years ago, but I trust you if you tell me it’s broken now, but please check again just to be safe?

I have a tendency to “misread/understand,” and then after I reply and come back later, I get a “doh” moment. So bear with me if I misunderstood:

The reason I load all DLLs no matter what is to get the toolbox updated in case a DLL contains activities. I guess we could add some “caching” on what DLLs have activities and what has not, and then only load those on startup and the rest from CurrentDomain.AssemblyResolve, but I kind of assume you have a dependency because you intend on using it, hence it would still be locked on the machines you actually want to update, so it would not solve the main issue.

Allan_Zimmermann · July 11, 2024, 8:21am

Oh, I just had my first “doh” moment…
You mean, if it loads the DLLs at startup and then gets an updated project from the server, the DLLs are already locked? True… In that case, it makes sense to not load the DLLs right away, but I’m not sure how to tackle “push loading the toolbox” to later? But for most robots, I would assume they are always online and therefore have an up-to-date project definition.

AndrewK · July 11, 2024, 8:50am

Setup:
Nuget source set:
<add key="OpenRPAPackages" value="C:\Users\{myUser}\AppData\Roaming\OpenRPA\packages"/

Available project:

Dependencies state (nothing installed):

Starting OpenRPA…

Opening a workflow:

XAML relevant parts:

Folder states (still empty):

After running the workflow I sent you yesterday the things install and work just fine.
So - either there is one more place that keeps track of the “installed” state that I missed, and that’s why it doesn’t restore, or it just doesn’t restore.
The thing is if I missed the place that also needs to be cleaned up (btw - checked GAC, these packages are not there), then just clearing the extensions folder will lead to that state.

Allan_Zimmermann · July 11, 2024, 8:58am

try adding a breakpoint to DownloadAndInstall in NuGetPackageManager.cs
does it fail on GetNearest check ?
if not, does it hit the “InstallFile” foreach ?

AndrewK · July 11, 2024, 9:57am

I’ll check in the evening on another machine that I’m not messing with so much.

Today popped up a potential workaround for the dll lock as well (just putting it here so I don’t forget).
Since OpenRPA installs into and loads from the same folder (and that causes the lock-on-startup-load), what if it didn’t?
So the installations and so on would work the same, but not the loading on-start. It could do an extensions-runtime-cache copy (in the openrpa folder) at startup, and load from there. And on close (or also at startup in case of hard crashes) - clean it up.
Since the issue does not surface when you’re in the same session (it still has file ownership and can do whatever with them) and it can’t unload from the domain anyway (so if you load a different version of the same thing… I’m not really sure how that interacts under the hood) so a restart is still needed, then maybe?
Sure, it would increase the startup time a tiny bit, but it shouldn’t be even noticeable unless it’s running on a really slow HDD and there’s waaaay too many extensions.

AndrewK · July 11, 2024, 4:39pm

I’ve set these up as below (the first one just to check if debugger attaches correctly, disabling on first hit):

It hit none of them on startup in app when opening the package manager it hits the initialize breakpoint, so for the whole procedure nothing happened there.

Allan_Zimmermann · July 11, 2024, 5:00pm

Maybe we miss understood eachother
either you installed a package locally, using the package manager ui
This will then call the install function

or you installed it on a remote machine, and your local openrpa needs to “catch” it …

if it’s running it does so in the onWatchEvent inside RobotInstance.cs that calls InstallDependencies
if it was offline, then doing startup, it “catches” the updated project inside LoadServerData that keeps track on update projects and call InstallDependencies for each of them

simply restarting will not do anything

Allan_Zimmermann · July 11, 2024, 5:03pm

and you might say, but what about a “new” robot …
well, when that starts, the local database is empty, so LoadServerData will see all projects as “new” and call InstallDependencies on those.

so if you dont’ want to update a project to test, you can also just close openrpa, delete the localdatabase and start it again

AndrewK · July 11, 2024, 5:10pm

Yeah, I just realized I was going on a completely wrong path. That’s a bit embarrassing

After making sure that it had a project version mismatch between machines:
It does try to install, if the project versions are different, and then it hits the locked dll issue since it already loaded them.

It works fine on a “new” robot, as you said, but after a restart there (so the dll’s lock out), it’s back to square one.

I’ll try to debug some more later, I think I know how to sidestep that (with the extensions-runtime-cache idea). If it works, we could also enable back the Update button, since that should work too. And then extensions should be safe for use, meaning that custom models and so on are pretty much risk free (as long as you can give nuget access to OpenRPA the packages they need).

Sorry for the confusion, I should have been more rigorous when testing on a multi-machine setup so easy to miss a spot.

Once we’ll get this working, I promise to make a video tutorial on how to make use of the extensions for data modeling - and after our last talk about different stack pieces, with a class model that works for both .net framework 4.6.2 (OpenRPA) and .net6 (agents). That’s one way to future-proof potential moving of OpenRPA parts to agents later for scalability/fine coding control, and keep the data strongly typed on both ends on the same model.

Allan_Zimmermann · July 11, 2024, 6:59pm

I’m just happy you are willing to put in some time to this.
I know we cannot make it perfect, due to OpenRPA being a one-process thing right now, but any help to make it better is always welcome.

AndrewK · July 11, 2024, 10:52pm

Small update and I’m out for today:
The good:
extensions-runtime-cache sidestep of the dll-lock seems to be working this is really nice, and allows for turning the Update back

But it’s a western, unfortunately

The bad:
When 2+ projects have a dependency overlap, the newest version will be used. This is fine, and as it should be, but since it’s completely invisible to the user, it can lead to an invalid xaml (or behaviour change) if the versions have breaking changes between them. This is for both direct dependencies, and further down the dependency tree.

The ugly:
When 2+ projects have a dependency overlap, and you uninstall (remove) the dependency from one of them, this triggers a dependency uninstall, which will leave the other project “broken” for execution (missing dependency). Looking through the code, this could even happen if one project has A->B->C dependency, and the other has a direct B or C, so it’s really hard to find at first as a user.

So… I’m working now on a dependency resolver that will take into account all available projects, traverse the dependecy trees of them (using all available nuget sources), then flatten the dependencies to find the “highest minimum version” that fits all of them, then a) produce a recommendation to the user to update dependencies for conflicting projects, or b) triggers a install for the packages (if there’s no conflict).

So far I have the flattened dependency “highest minimum version”, and could implement triggering an install, but it doesn’t check for conflicts yet.
Ahh, the good ol’ dependency hell mitigations

For now I’m adding it as a side-functionality, not changing anything in the existing NugetPackageManager, but since a lot of the functionality overlaps (especially the package finding), I think it could be deduplicated later.

More tomorrow, if time allows

PS. If anyone wants to have a look

AndrewK · July 18, 2024, 2:26pm

Time didn’t allow, so no progress, but found also a potential bug/unintended behavior around this line:

github.com

open-rpa/openrpa/blob/4bdd00d59fdd3968b3d4491496bb108ffd9f0595/OpenRPA/NuGet/NuGetPackageManager.cs#L168


      
          public async Task<List<IPackageSearchMetadata>> SearchPackageVersions(string packageid, bool includePrerelease)
          {
              var ret = new List<IPackageSearchMetadata>();
              foreach (var sourceRepository in DefaultSourceRepositoryProvider.GetRepositories())
              {
                  var searchResource = await sourceRepository.GetResourceAsync<PackageMetadataResource>();
                  using (var cacheContext = new SourceCacheContext())
                  {
                      cacheContext.DirectDownload = true;
                      var metadataList = await searchResource.GetMetadataAsync(packageid, includePrerelease, true, cacheContext, NullLogger.Instance, CancellationToken.None);
                      if (metadataList.Count() > ret.Count())
                      {
                          ret = metadataList.ToList();
                      }
                  }
              }
              return ret;
          }
          public async Task<List<PackageSearchItem>> GetLocal(Project project, PackageIdentity identity)
          {
              var result = new List<PackageSearchItem>();

Since for my testing purposes I’ve put in OpenFlow just the versions I wanted, it wasn’t actually finding the package version I was expecting it to see due to (I think, can’t go to debug mode right now) a different feed having more versions of that package.

What seems to be potentially causing it is a little above the highlighted line - if I’m reading it correctly, the search uses always all feeds, regardless of which one is selected, and returns results from the one that has most versions (which might not include the version that is needed).

Truth be told, I have a working “workaround” for these issues (with the custom workflows, and adding the extension-runtime-cache), so I’m not sure if this should be pursued further, as it seems (and this might be overshooting) that most of the NugetPackageManager (including the UI bindings) would need changes.
So I’m not really sure if it’s worth it (still would be fun though, but time and so on).

I guess we could just pull in the changes related to the folder cache (so dll’s don’t lock, ergo update works as expected) and leave the rest for later?

AndrewK · July 25, 2024, 8:31am

Update so the topic doesn’t close

We’ve talked with Allan on how to handle this, and the solution looks like this so far:

On OpenRPA start, before loading plugins and extensions clean the extensions folder
On LoadServerData first run (so on startup), instead of installing dependencies just for updated projects, install for all (and since extensions are cleaned, we avoid the dll lock without creating even more copies of the dll’s)
Before install for all it will first go through all available projects, gather all dependencies (including dependency dependencies recursively) using all configured nuget feeds (exit on first match), then compose a deduplicated list of all package dependencies and their versions
If any package is referenced with multiple versions, it will a) take the highest version (only sensible option really) and b) prepare the alert for the userthat there is a dependency clash (including from where it originates: projects and dependency chain to get to the conflicting versions)
It will then install all dependencies in highest versions required
Lastly it will Log.Output the dependency clashes (so it’s the last message posted → easiest to see)

The check for clashes could potentially be later added on actions like closing package manager (so it can tell you about the clashes after you potentially changed them), and maybe also after it gets a new project version (since that will still trigger a dependency install if needed).

Please do note that regardless of what we will add here, there will still be a need to restart OpenRPA when dependencies change versions, since the old one is loaded and it cannot be changed in the AppDomain without restart with current design. So that part is unavoidable, but at least it will do that automatically on restart and alert on clashes, so it should be easier to catch that something might be off.

yashshah · July 25, 2024, 7:36pm

Just ask from where all extension dll will be downloaded? Is it from Openflow or from nugget feed?

AndrewK · July 25, 2024, 8:12pm

From nuget feeds.
Adding a nuget feed to OpenFlow is out of scope of this change (still would be cool to do it, but it’s a different part of the stack).

AndrewK · July 26, 2024, 3:44pm

Can’t trust anyone to not introduce dependency clashes

Or in short:
System.Text.Json (which is a built in .net library) has 2 dependencies (System.Memory and System.Text.Encodings.Web) which both rely on System.Runtime.CompilerServices.Unsafe BUT one needs version 4.5.3, the other 6.0.0…

Obviously this still works in this case, but this just goes to show that dependency clashes can be everywhere and introduced by anyone at any point to anything, even multiple levels down the line.

All I’m trying to say is I finally got the dependency path working correctly (recursion, yay) which was needed for proper moving forward, so now it’s relatively downhill to wrap this up on this part of the feature.

AndrewK · July 26, 2024, 8:55pm

Ok, I think it’s done
PR will be opened soon.

Result on different mismatches when just doing the check (opening NugetPackageManager or starting OpenRPA with automatic restore turned off):

And when you turn it on, then it cleans the extensions folder and does a full reinstall on startup:

It can be controlled from settings.json here (wasn’t sure if it should be snake case?)

Tested with missing dependency installations, partially missing dependency installations, version mismatches and things present. Also with different projects having different direct dependencies (easiest to clash).
Did not test with trying to install a package that straight up doesn’t exist in any of the available nuget sources, because that’s already handled (IIRC).

When a live-update of a project comes, it still installs as it did before, this is effectively a “side functionality” that does the dependency checks and installations in a little bit different way.

Note: this does not fix the bug from this post higher up where a package version is not found in the UI of Package Manager because some other feed has more versions of it.

Testing of course still needed, as right now it just “works on my machine”

Allan_Zimmermann · July 27, 2024, 8:25am

You are fast … Nice ! …

( i think it would be best to keep config names as the others … ie using underscore for space and all lowercase )

AndrewK · July 28, 2024, 12:11pm

Added to PR, now it’s snake_case