Rust API 0.0.17 - missing file download await on popworkitem?

OpenRPA version: N/A
OpenFlow version: v1.5.10
Using app.openiap.io or self hosted openflow: self-hosted
Error message:
Screenshot or video:
Attach a simple workflow from OpenRPA or NodeRED that reproduces the error/issue:

Using the Rust API 0.0.17 in a local agent test environment, it appears there’s a missing await somewhere in the file download of PopWorkitem.
This was observed with a CSV file of a bit over 2 MB, where the end of the file was cut off, causing read errors.
Adding an await Task.Delay(2000); after the call made it download correctly, which points to a missing await somewhere. The workaround is not scalable, as the delay would need to vary with file size, connection speed, etc.
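
As a stopgap that adapts to file size better than a fixed delay, one could poll the file until its size stops changing. This is only a heuristic sketch, not a fix for the library: DownloadGuard and WaitForStableSizeAsync are hypothetical names that are not part of the OpenIAP API, and the approach assumes the remaining bytes do eventually reach the disk.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class DownloadGuard
{
    // Hypothetical helper (not part of the OpenIAP API).
    // Polls the file size and returns it once `stableChecks` consecutive polls
    // report the same size as the poll before them. Throws on timeout.
    public static async Task<long> WaitForStableSizeAsync(
        string path,
        TimeSpan pollInterval,
        int stableChecks,
        TimeSpan timeout,
        CancellationToken ct = default)
    {
        var deadline = DateTime.UtcNow + timeout;
        long lastSize = -1;   // -1 means "no size observed yet"
        int unchanged = 0;

        while (DateTime.UtcNow < deadline)
        {
            long size = File.Exists(path) ? new FileInfo(path).Length : -1;

            if (size >= 0 && size == lastSize)
            {
                unchanged++;
                if (unchanged >= stableChecks) return size;
            }
            else
            {
                // Size changed (or file not there yet): start counting again.
                unchanged = 0;
                lastSize = size;
            }

            await Task.Delay(pollInterval, ct);
        }

        throw new TimeoutException(
            $"File size of '{path}' did not stabilize within {timeout}.");
    }
}
```

It would be called right after PopWorkitem and before opening the file, for example with a 100 ms poll interval, 3 stable checks and a 30 s timeout. If the missing bytes are only written when the library releases the file, the size would never change and this check would pass on a truncated file, so comparing against the size shown in OpenFlow is still the more reliable test.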

Repro:

  1. Add a workitem with an attached file of at least a couple MB (depending on your connection throughput the size may vary)
  2. Use the following code with api 0.0.17:
     var workItem = await openflowClient.PopWorkitem("FileUploadTestQueue", downloadfolder: WORKING_FOLDER);
  3. Try reading the file immediately after that line (or debug stop).
  4. Compare the downloaded file with the version from OpenFlow.

Also, a minor issue with PopWorkitem:
when checking the .files property of the workitem, it’s null. Not sure if that’s how it’s supposed to be. It makes finding the downloaded files harder than it should be: one has to work through temp folders one item at a time (and hope that everything was fetched), cleaning up between transactions, as there isn’t a straightforward way of knowing which files came with this workitem.
Or maybe I’m missing a better way to do it.
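
Until .files is populated, one way to know which files belong to a workitem without cleaning up between transactions is to give every pop its own empty sub-folder. A minimal sketch, assuming the download target can be chosen per call via downloadfolder as in the repro above; WorkitemFolders and PopIntoFreshFolderAsync are hypothetical names, and the actual pop is passed in as a delegate so the helper does not depend on the client type.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class WorkitemFolders
{
    // Hypothetical helper (not part of the OpenIAP API).
    // Creates a fresh, uniquely named sub-folder under `baseFolder`, runs `pop`
    // with that folder as the download target, and returns the full paths of
    // the files found in the folder afterwards.
    public static async Task<string[]> PopIntoFreshFolderAsync(
        string baseFolder,
        Func<string, Task> pop)
    {
        string folder = Path.Combine(baseFolder, Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(folder);

        await pop(folder);

        // The folder was empty before the pop, so everything in it now
        // was written while `pop` ran.
        return Directory.GetFiles(folder);
    }
}
```

With the client from the repro, the delegate would be something like: async folder => { workItem = await openflowClient.PopWorkitem("FileUploadTestQueue", downloadfolder: folder); }. This only tells you which files arrived, not whether each one is complete.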

I fixed the code to also add files to the workitem result when calling push/pop/update workitem.

But I cannot reproduce the issue with files not getting downloaded correctly.
What is different in my code versus yours?

case "pp":
    var pushwi = new Workitem { name = "test from dotnet", payload = "{\"_type\": \"test\"}"};
    // var pushwires = await client.PushWorkitem("q2", pushwi,new string[] {"2023_State of the Union address_multilingual.pdf"});
    var pushwires = await client.PushWorkitem("q2", pushwi,new string[] {"assistant-linux-x86_64.AppImage"});
    // var pushwires = await client.PushWorkitem("q2", pushwi,new string[] {"../testfile.csv"});
    Console.WriteLine("Pushed workitem: {0} {1}", pushwires.id, pushwires.name);
    break;
case "p":
    if(System.IO.Directory.Exists("downloads")) {
        System.IO.Directory.Delete("downloads", true);
    }
    System.IO.Directory.CreateDirectory("downloads");
    var popwi = await client.PopWorkitem("q2", downloadfolder: "downloads");
    if(popwi != null) {
        Console.WriteLine("Popped workitem: {0} {1}", popwi.id, popwi.name);
        for(var i = 0; i < popwi.files.Length; i++) {
            Console.WriteLine("File: {0}", popwi.files[i].filename);
            System.IO.File.Copy("downloads/" + popwi.files[i].filename, "downloads/" + popwi.files[i].filename + ".copy");
        }
    } else {
        Console.WriteLine("No workitem to pop.");
    }
    break;

Sorry for the late reply.

This is the updated download part based on your example:

        System.Console.WriteLine(openflowClient.connected());
        if (!Directory.Exists(WORKING_FOLDER))
        {
            Directory.CreateDirectory(WORKING_FOLDER);
        } else {
            Directory.Delete(WORKING_FOLDER, true);
            Directory.CreateDirectory(WORKING_FOLDER);
        }
        var workItem = await openflowClient.PopWorkitem("FileUploadTestQueue", downloadfolder: WORKING_FOLDER);
        if (workItem == null)   return;
        System.Console.WriteLine("Files count: " + workItem.files.Length);
        foreach (var file in workItem.files)
        {
            System.Console.WriteLine("file: " + file.filename);
        }
        // await Task.Delay(2000);
        var downloadedFiles = Directory.GetFiles(WORKING_FOLDER, "");
        System.Console.WriteLine("Found files count: " + downloadedFiles.Length);
        foreach (var downloadedFile in downloadedFiles)
        {
            System.Console.WriteLine("Found file: " + downloadedFile);
        }

And here’s the output:

True
Files count: 0
Found files count: 1
Found file: C:\Users\[SNIP]\VSCodeProjects\MasterDataAgent\bin\Debug\net6.0\TEMPCSV\MasterData_[SNIP].csv

So it looks like the workitem thinks it does not have any files, even though it does. Note that this is exactly why I’m doing a Directory.GetFiles.

After this code executes, the found file path (there’s always 1 in the work item) is passed to a function that reads it like this:

            using var reader = new StreamReader(csvFilePath);

            using var csv = new CsvReader(reader, config);

            csv.Context.RegisterClassMap<CustomerMap>();
            var records = csv.GetRecordsAsync<[SNIP]OrgRootObject>();
            await foreach (var item in records) // <--- this line throws on iteration 14163

Which throws for the faulty download:

Exception has occurred: CLR/CsvHelper.MissingFieldException
An exception of type 'CsvHelper.MissingFieldException' occurred in System.Private.CoreLib.dll but was not handled in user code: 'Field at index '2' does not exist. You can ignore missing fields by setting MissingFieldFound to null.
IReader state:
   ColumnCount: 2
   CurrentIndex: 2
   HeaderRecord:
["KUNDENR","FORETAKSNR","NAVN", [SNIP]
IParser state:
   ByteCount: 0
   CharCount: 2077314
   Row: 14163
   RawRow: 14163
   Count: 2
   RawRecord:
609192,9'
   at CsvHelper[SNIP]

When I check the file, in OpenFlow it has 2 105 353 bytes, but the downloaded version has 2 088 960 bytes.
The record is indeed cut off in the downloaded file, and matches the RawRecord that CsvHelper reports as faulty. In the file in OpenFlow, that record (and the later ones, up to 14295) is fine.
If I add a delay (the commented line earlier), everything works fine, so this confirms that the code itself and the file are not wrong.
I’ve also triple checked that it’s redownloading and not using some other version of the file.
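
A quick check on the two sizes above: the downloaded copy is 16 393 bytes short, and its length is an exact multiple of 8 192 bytes (255 × 8 KiB). That pattern is consistent with a last buffered chunk not being written before the call returns, though it does not prove it. The helper names below are made up for illustration only.

```csharp
using System;

public static class TruncationCheck
{
    // Number of bytes missing from the downloaded copy.
    public static long MissingBytes(long expected, long actual) => expected - actual;

    // Largest power of two that evenly divides `size` (size must be > 0).
    // Uses the two's-complement trick: size & -size isolates the lowest set bit.
    public static long LargestPowerOfTwoDivisor(long size) => size & -size;
}
```

TruncationCheck.MissingBytes(2_105_353, 2_088_960) gives 16393 and TruncationCheck.LargestPowerOfTwoDivisor(2_088_960) gives 8192, while the same call on the full size 2_105_353 gives 1.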

For additional context, we’re on OpenFlow 1.5.10.48 (just realized the initial post was lacking the detailed version).

Based on the code you showed, I guess the issue is that the workitem thinks it does not have any files attached, even though it does.
When I check the workitem record in OpenFlow, the file is properly listed there:
[screenshot: workitem record in OpenFlow with the file listed]

  • I have not yet pushed an updated version with the fix that lets you see the files in the workitem, but that only concerns the “wrapper” code in dotnet; the Rust code itself sees the files and parses the list in order to download them. I wanted to wait with that until I had also fixed the potential download issue, but I don’t mind pushing a version 0.0.18 so you can run my test as well.
  • You clearly did all the right tests, and I agree it looks like a missing await somewhere, but I cannot reproduce it with 2-3 MB files, 130 MB files, or even 1.2 GB files. Maybe it’s the file itself or the content type. Would it be possible for you to share the file with me by email? If not, could you create a version that does not contain sensitive data, check that it still fails for you, and then share it with me?

I can check with 0.0.18 first.

As for the repro file: I’ll try to fit it in today, but in case I don’t:
It’s a UTF-8 BOM CSV file (OpenFlow identifies it as text/csv), read with the code above (so a StreamReader passed to a CsvReader, using the CsvHelper package).

Unfortunately time is very short right now, and I’m starting holiday time off soon, so won’t be able to do much more than that for now.
