Activity log #outage

Hi All,

Yesterday, between 1:07pm and 1:38pm Pacific time, and again this morning between 4:18am and 5:24am Pacific time, the connections to the activity database were exhausted. During those times, access to the activity log was slow or returned errors. Some writes to the activity log were also not recorded; I'm not sure how many.

When this happened yesterday, I fixed the issue by restarting all the services. I then spent some time trying to figure out what caused it, but was unable to come up with a reason. When it happened again this morning, I had a second data point and was able to track down the problem and fix it. It turned out that an export of a very large group (and its subgroups) would trigger this. The code involved had not been touched in a very long time; it was the combination of the large size of the group and the ever-increasing load on our databases that caused this to become a problem.

I've rewritten that part of the group export process so that this does not happen again. I still need to think through how I can be notified more quickly when an intermittent issue like this happens.

Please let me know if you have any questions.

Thanks,
Mark
