Outage report #outage

Hi All,

We had an incident starting at 4:04pm and lasting until 4:21pm. During that time the website was only sporadically available, and email was not delivered at all.

A post to a group triggered a digest to 271 of its members. Each digest was 15MB in size, and because each digest is unique, that meant roughly 4GB of messages (271 × 15MB ≈ 4GB). The process responsible for sending email, called karl (after Karl "The Mailman" Malone), currently keeps all messages in memory. Including overhead, this immediately consumed well over 4GB of memory, and the karl process ran out of memory. The main web machine became unresponsive and was rebooted. This sequence repeated several more times, until I was able to add some code to karl to skip reading those large digest messages when it started up. The machine is now stable.
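
The 4GB figure comes from holding every unique digest in memory at the same time. A minimal Python sketch of that arithmetic, assuming decimal megabytes and ignoring the per-message overhead mentioned above (`in_memory_queue_bytes` is a hypothetical name, not part of karl):

```python
# Decimal megabytes: an assumption, since the post doesn't say MB vs. MiB.
MB = 1_000_000


def in_memory_queue_bytes(recipients: int, digest_bytes: int) -> int:
    """Lower bound on the memory needed to hold one unique digest per
    recipient at once. Per-message overhead, which pushed the real
    figure higher, is ignored."""
    return recipients * digest_bytes


total = in_memory_queue_bytes(271, 15 * MB)
print(total // MB, "MB")  # 4065 MB, i.e. roughly 4GB
```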

To resolve this, I need to rewrite parts of karl so that it does not keep large messages in memory. Once that is done, the large digest messages will be delivered.
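
One common way to keep large messages out of a mail queue's memory is to spool them to disk and stream them out in chunks when sending. The post doesn't say how karl will be changed, so this is only a minimal Python sketch of that general technique; `SPOOL_THRESHOLD`, `QueuedMessage`, `enqueue`, `open_body`, and `send` are hypothetical names, and the 1MB cutoff is an arbitrary assumption:

```python
import io
import os
import shutil
import tempfile
from dataclasses import dataclass
from typing import BinaryIO, Optional

# Hypothetical cutoff: bodies larger than this go to disk instead of RAM.
SPOOL_THRESHOLD = 1_000_000  # bytes


@dataclass
class QueuedMessage:
    recipient: str
    body: Optional[bytes] = None      # small messages stay in memory
    spool_path: Optional[str] = None  # large messages are kept in a file


def enqueue(recipient: str, body: bytes, spool_dir: str) -> QueuedMessage:
    """Queue a message, spilling large bodies to a file in spool_dir."""
    if len(body) <= SPOOL_THRESHOLD:
        return QueuedMessage(recipient, body=body)
    fd, path = tempfile.mkstemp(dir=spool_dir, suffix=".eml")
    with os.fdopen(fd, "wb") as f:
        f.write(body)
    return QueuedMessage(recipient, spool_path=path)


def open_body(msg: QueuedMessage) -> BinaryIO:
    """Return a readable binary stream for the message body."""
    if msg.spool_path is not None:
        return open(msg.spool_path, "rb")
    return io.BytesIO(msg.body)


def send(msg: QueuedMessage, out: BinaryIO) -> None:
    """Copy the body to an output stream in 64KB chunks, so a spooled
    message is never fully loaded into memory while sending."""
    with open_body(msg) as src:
        shutil.copyfileobj(src, out, 64 * 1024)
```

In this sketch the body still passes through memory once while it is written; a fuller version would generate each digest straight into its spool file. Because the queue holds only a path for large messages, a restarted sender would not need to read every large message back into memory.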

Thanks,
Mark

Sugar <sugarsyl71@...>

As always, Mark, God bless you and all you do to help us all enjoy groups.io

Sugar

‘I have loved the stars too fondly to be fearful of the night.’

Sugar

From: beta@groups.io [mailto:beta@groups.io] On Behalf Of Mark Fletcher
Sent: Monday, January 30, 2017 4:36 PM
To: beta@groups.io
Subject: [beta] Outage report #outage

Mark Fletcher

As an addendum, the extra-large digest was caused by a bug. The group in question is not set to resize photos in email. Normally we resize photos in digests to a maximum of 400x400 or the group's own resize setting, whichever is smaller. Because of the bug, for groups that don't resize emailed photos, we were not resizing digest photos at all. Hence the extra-large digests. This has been fixed.
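
The corrected rule can be sketched as follows. This is a minimal Python illustration, not groups.io's code: it assumes a group's resize setting is a single maximum dimension (None meaning the group doesn't resize emailed photos), and `digest_max_dim` and `fit_within` are hypothetical names.

```python
from typing import Optional, Tuple

DIGEST_MAX_DIM = 400  # digests cap photos at 400x400, per the post


def digest_max_dim(group_max_dim: Optional[int]) -> int:
    """Largest width/height allowed for a photo in a digest.

    group_max_dim is the group's resize-in-email setting, or None when the
    group doesn't resize emailed photos (a hypothetical representation).
    """
    if group_max_dim is None:
        # The bug: this case led to no resizing at all. The digest cap
        # should still apply.
        return DIGEST_MAX_DIM
    # "Whichever is smaller" of the digest cap and the group setting.
    return min(DIGEST_MAX_DIM, group_max_dim)


def fit_within(width: int, height: int, max_dim: int) -> Tuple[int, int]:
    """Scale a photo down (never up) to fit a max_dim x max_dim box,
    preserving its aspect ratio."""
    scale = min(1.0, max_dim / width, max_dim / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```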

Thanks, Mark

Bob Bellizzi

Sounds like self denial of service to me <g>

Bob Bellizzi