Outage report #outage

Hi All,

The Groups.io site was unreachable for 20 minutes this morning, between 3:03am and 3:23am Pacific Time. Linode, our hosting service, has to reboot all instances in their fleet as part of a process to address the Meltdown and Spectre bugs. I was notified yesterday that two of our instances would be rebooted between 3am and 5am. One of the instances is our main load balancer, so I knew there would be a service interruption. But since it was just a reboot, in the middle of the night, at some point within a two-hour window, I figured the interruption would last less than a minute, so I didn't send out a note. I should have. I do not know why the reboot took 20 minutes; the other instance was shut down and rebooted immediately.

I will be adding another load balancer today to provide redundancy, so that the load balancer is no longer a single point of failure.


Thanks,
Mark