Activity log #outage


 

Hi All,

Yesterday, between 1:07pm and 1:38pm Pacific time, and again this morning between 4:18am and 5:24am Pacific time, the connections to the activity database were exhausted. During those windows, access to the activity log was slow or returned errors. Some writes to the activity log were also not recorded; I'm not sure how many.

When this happened yesterday, I fixed the issue by restarting all the services. I then spent some time trying to figure out what caused it, but was unable to come up with a reason. When it happened again this morning, I had a second data point and was able to track down the problem and fix it. It turned out that an export of a very large group (and its subgroups) would trigger this. The code involved had not been touched in a very long time; the large size of the group involved, combined with the ever-increasing load on our databases, caused this to become a problem.

I've rewritten that part of the group export process so that this does not happen again. I still need to think through how I can be notified more quickly when an intermittent issue like this happens.

Please let me know if you have any questions.

Thanks,
Mark


Andy I
 

Thank you, Mark.  I think we fell into that window, in #30729 about the missing message and log entries.  The message in question was sent at 4:38 AM Pacific time today, and gone a few hours later.

So, is the theory that the message was legitimately deleted by its author, and the only things lost were the log entries for sending and deleting it?

Have there been other outages in the last month or so?  I have seen other things I couldn't explain in the log.

Andy


 

On Tue, Nov 2, 2021 at 11:54 AM Andy <AI.egrps+io@...> wrote:
> Thank you, Mark.  I think we fell into that window, in #30729 about the missing message and log entries.  The message in question was sent at 4:38 AM Pacific time today, and gone a few hours later.

Please send me, off-list, the subject of the message and the sender. If you can give me the Message-ID from the header, that would help as well.

 
> Have there been other outages in the last month or so?  I have seen other things I couldn't explain in the log.

No, this is the first time this has ever happened.

Thanks,
Mark