app.openiap.io has been down a LOT over the last 2 days.
It has been a pain to figure out what was wrong, but I think I finally “nailed” it.
History is generated using jsondiffpatch, an absolutely wonderful and brilliant package, but it finally met its match. If you feed it a document with an array of more than 57,395 entries, it uses all the RAM, even if you give it 6 gigabytes, and that is exactly what I was doing while running housekeeping on my users role.
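To give a rough idea (this is just a sketch, not the actual history code; the threshold and helper names are made up): jsondiffpatch has to match array members against each other, so one huge array is enough to exhaust the heap, and the obvious workaround is to skip the diff for documents like that.

```typescript
import * as jsondiffpatch from "jsondiffpatch";

// jsondiffpatch instance roughly like one used for history diffs.
// objectHash lets it match array members by _id instead of by position.
const differ = jsondiffpatch.create({
    objectHash: (obj: any) => obj._id || JSON.stringify(obj),
});

// Illustrative guard: skip the diff entirely when a document contains a
// very large array, since array matching is what blows up the heap.
const MAX_ARRAY_LENGTH = 10000; // hypothetical cutoff, not the real one

function hasHugeArray(doc: any): boolean {
    return Object.values(doc).some(
        (value) => Array.isArray(value) && value.length > MAX_ARRAY_LENGTH
    );
}

export function safeDiff(oldDoc: any, newDoc: any): any {
    if (hasHugeArray(oldDoc) || hasHugeArray(newDoc)) {
        return undefined; // no delta: store a full snapshot (or nothing) instead
    }
    return differ.diff(oldDoc, newDoc);
}
```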
The users role has been kind of a problem for me since OpenFlow started to gain traction. Keeping it up to date started getting harder once I got over 2,000 users, and with more than 50,000 it just got too big.
A long time ago, I moved the “users” logic out of the actual role, but I kept updating the role object in the collection too, “just to be safe” … I have now removed that, to save IO in the database and to avoid more issues with RAM. It makes no sense to have an object that big inside MongoDB anyway.
This is getting embarrassing.
It crashes one or two times an hour, either because the database stops responding or because one of the API nodes uses all the RAM on the host it’s running on.
A long time ago I added a heapdump_onstop setting so I can debug what is wrong, but the dump takes so long to write that Kubernetes kills the pod before it finishes … ARGH …
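For reference, the idea behind heapdump_onstop is roughly this (my sketch, not the actual OpenFlow code): write a heap snapshot when the pod gets SIGTERM. Writing a multi-gigabyte snapshot is slow and synchronous, so Kubernetes’ termination grace period (30 seconds by default) runs out and the pod is SIGKILLed before the dump is finished.

```typescript
import * as v8 from "v8";

// Sketch of a heapdump-on-stop hook: write a heap snapshot when the pod is
// asked to shut down, so the dump can be inspected afterwards.
// v8.writeHeapSnapshot() blocks the event loop and, on a heap of several GB,
// can easily take longer than Kubernetes' default 30s grace period, at which
// point the container is SIGKILLed and the file is left incomplete.
if (process.env.heapdump_onstop === "true") {
    process.on("SIGTERM", () => {
        const filename = `/tmp/heap-${Date.now()}.heapsnapshot`;
        console.log(`writing heap snapshot to ${filename}`);
        v8.writeHeapSnapshot(filename); // synchronous, blocks until done
        process.exit(0);
    });
}
```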
Anyway, I’m sorry to anyone affected by this; I’m trying really hard to figure out what is going on.
I think I managed to get it solved now.
And I did not need to change any code. For some strange reason, MongoDB now requires almost 3 times as much memory as it used to.
It used to crash 2-3 times an hour … but I have only seen one crash in the last 24 hours.
God damn it, this has been driving me nuts.
But I finally found one more thing. I kept having a crash 3 times a day, always at the same “time” of day, so either someone external was running something on a schedule or my housekeeping job was having issues.
It turns out my housekeeping job was the problem. My “calculate db usage” job was making MongoDB run out of memory and crash.
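For anyone curious, a usage calculation like that boils down to an aggregation along these lines (a sketch with made-up collection and field names, not the exact job): grouping every document happens in memory unless you let the stage spill to disk, which is the kind of thing that keeps a job like this from driving mongod out of memory.

```typescript
import { Db } from "mongodb";

// Sketch of a per-user "db usage" style aggregation (collection and field
// names are illustrative). allowDiskUse lets the $group stage spill to disk
// instead of holding the whole grouping in memory.
async function calculateUsage(db: Db) {
    return db
        .collection("entities")
        .aggregate(
            [
                // $bsonSize (MongoDB 4.4+) gives the size of each document in bytes
                { $group: { _id: "$_createdbyid", bytes: { $sum: { $bsonSize: "$$ROOT" } } } },
                { $sort: { bytes: -1 } },
            ],
            { allowDiskUse: true }
        )
        .toArray();
}
```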
This has now been fixed. While working on that, I’m also preparing to add a crude interface for tracking slow queries.
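Nothing fancy; conceptually it is something like this (a sketch only, with a made-up threshold, not the actual interface): turn on MongoDB’s built-in profiler and read the slow operations back out of system.profile.

```typescript
import { MongoClient } from "mongodb";

// Sketch of crude slow-query tracking: enable MongoDB's built-in profiler
// for operations slower than slowms, then read the recorded operations back
// out of the system.profile collection.
async function listSlowQueries(uri: string, dbname: string) {
    const client = new MongoClient(uri);
    await client.connect();
    try {
        const db = client.db(dbname);
        // profiling level 1 = only record operations slower than slowms
        await db.command({ profile: 1, slowms: 250 });
        const slow = await db
            .collection("system.profile")
            .find({ millis: { $gt: 250 } })
            .sort({ ts: -1 })
            .limit(20)
            .toArray();
        for (const op of slow) {
            console.log(op.op, op.ns, `${op.millis}ms`);
        }
    } finally {
        await client.close();
    }
}
```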
I feel like giving up soon.
So I spent HOURS refining my sidecar to better handle MongoDB replica sets on Kubernetes, and I freaking forgot to set the connection string to use the full replica set, so when MongoDB did a failover, app.openiap.io was down for 30 minutes until I saw the email about the site being down … ARGH …
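For anyone else running MongoDB as a replica set on Kubernetes, the difference is just in the connection string (hostnames and set name below are only examples): point the driver at every member and name the set, so it can follow a failover instead of hanging on the old primary.

```typescript
// Single host: works until that node steps down, then the client has nowhere to go.
const singleHost = "mongodb://mongodb-0.mongodb:27017/openflow";

// Full replica set: list every member and name the set with replicaSet=,
// so the driver discovers the topology and reconnects to the new primary
// after a failover.
const fullReplicaSet =
    "mongodb://mongodb-0.mongodb:27017,mongodb-1.mongodb:27017,mongodb-2.mongodb:27017/openflow?replicaSet=rs0";
```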
That is hard to answer. What is the issue you are having? What are the symptoms, and do you see any errors? (My issue is inside Kubernetes: if something uses too much RAM, I don’t get an error, it just gets killed, and I have no way of “tracking” what is using all that RAM.) On Docker this is usually not a problem, since by default it does not enforce resource limits.
Mine is deployed on Docker. OpenFlow goes down and becomes inaccessible, and runs normally again after a restart of the container. Unfortunately, I didn’t capture the Docker logs.