Data center power loss #outage
Wednesday, 20 June 2018 9:30pm to
Thursday, 21 June 2018 12:39am
(GMT-07:00) America/Los Angeles
On June 20 at approximately 9:30pm, Linode's Fremont datacenter lost Internet connectivity, effectively taking the site offline. Connectivity was restored after midnight, and the site was brought back online around 12:39am on June 21. Linode says that a power outage was responsible, but that's all the information they've given. More than half of the machines in the Groups.io cluster were rebooted during the outage. All machines came back up without issues.
I was not paged when the site went down; I happened to notice it at about 10pm. The system I use to check whether the entire site is reachable failed to notify me in this instance. I need to fix that.
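For illustration only, an external reachability check of this kind might look like the sketch below. It is not the monitoring system referred to above, whose details are not given; the function names, the 10-second timeout, and the three-consecutive-failures threshold are all assumptions.

```python
import urllib.error
import urllib.request
from typing import Sequence


def site_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP GET to `url` gets any non-5xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, so it is reachable unless it reports a 5xx error.
        return exc.code < 500
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        # (urllib.error.URLError is a subclass of OSError).
        return False


def should_page(results: Sequence[bool], threshold: int = 3) -> bool:
    """Page only when the most recent `threshold` checks all failed.

    `results` is the history of check outcomes, oldest first.
    The threshold is a hypothetical value chosen to avoid paging on one blip.
    """
    if len(results) < threshold:
        return False
    return not any(results[-threshold:])
```

A check like this is only useful if it runs from somewhere outside the datacenter it monitors, since a probe hosted alongside the site goes down with it.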
Groups.io is hosted in only one datacenter. To avoid this type of downtime in the future, a multi-datacenter setup will be needed. I have a technical path to get there, but it greatly complicates the system. Given that this is only the second time in four years that the datacenter has gone down, moving to a multi-datacenter setup is low priority right now.