Forums offline

Incident Report for Home Assistant

Postmortem

The hardware node hosting the forums instance had flaky network hardware, and AWS had scheduled it for decommissioning on June 2nd. The forums went down randomly because of the flaky hardware and recovered on their own within a few minutes. As a proactive measure, the instance was migrated to a new hardware node shortly after the downtime, to avoid the downtime that AWS would otherwise have caused on June 2nd.

Posted May 17, 2019 - 22:37 PDT

Resolved

The network issues have been resolved and the instance has been migrated to a new hardware node to avoid downtime on June 2nd.

Posted May 17, 2019 - 22:36 PDT

Update

AWS has confirmed underlying networking issues with the hardware the instance is on. The hardware node was already scheduled for retirement on June 2nd due to the networking issues. The forums would have gone down at that time while AWS migrated our instance to a new hardware node. Instead, we are going to do this maintenance right now, without AWS intervention, to ensure no downtime on June 2nd. ETA to service restoration is 15 minutes.

Posted May 17, 2019 - 22:25 PDT

Monitoring

The forums have recovered on their own, whether by random chance or AWS intervention (haven’t heard back from AWS yet, so unsure). Continuing to monitor the situation.

Posted May 17, 2019 - 22:07 PDT

Update

Support case has been opened with AWS. Should have a response within 1 hour at worst. Continuing to investigate on our end.

Posted May 17, 2019 - 22:03 PDT

Investigating

The forums are offline due to a hardware failure at AWS. Teams are working to recover the instance.

Posted May 17, 2019 - 21:45 PDT

This incident affected: Forums (Forums).