Monitoring AWS issues
Incident Report for Home Assistant
Resolved
This incident has been resolved.
Posted Jun 13, 2023 - 14:38 PDT
Monitoring
AWS is currently experiencing issues affecting our services.

https://health.aws.amazon.com/health/status

From the AWS status page:
[02:29 PM PDT] Lambda synchronous invocation APIs have recovered. We are still working on processing the backlog of asynchronous Lambda invocations that accumulated during the event, including invocations from other AWS services (such as SQS and EventBridge). Lambda is working to process these messages during the next few hours and during this time, we expect to see continued delays in the execution of asynchronous invocations.

[02:00 PM PDT] Many AWS services are now fully recovered and marked Resolved on this event. We are continuing to work to fully recover all services.

[01:48 PM PDT] Beginning at 11:49 AM PDT, customers began experiencing errors and latencies with multiple AWS services in the US-EAST-1 Region. Our engineering teams were immediately engaged and began investigating. We quickly narrowed down the root cause to be an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors directly for customers (including through API Gateway) and indirectly through the use by other AWS services. We have associated other services that are impacted by this issue to this post on the Health Dashboard.

Additionally, customers may experience authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS. Customers may also experience intermittent issues when attempting to call or initiate a chat to AWS Support.

We are now observing sustained recovery of the Lambda invoke error rates, and recovery of other affected AWS services. We are continuing to monitor closely as we work towards full recovery across all services.

[01:38 PM PDT] We are beginning to see an improvement in the Lambda function error rates. We are continuing to work towards full recovery.

[01:14 PM PDT] We are continuing to work to resolve the error rates invoking Lambda functions. We're also observing elevated errors obtaining temporary credentials from the AWS Security Token Service, and are working in parallel to resolve these errors.

[12:36 PM PDT] We are continuing to experience increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. We have identified the root cause as an issue with AWS Lambda, and are actively working toward resolution. For customers attempting to access the AWS Management Console, we recommend using a region-specific endpoint (such as: https://us-west-2.console.aws.amazon.com). We are actively working on full mitigation and will continue to provide regular updates.

[12:26 PM PDT] We have identified the root cause of the elevated errors invoking AWS Lambda functions, and are actively working to resolve this issue.

[12:19 PM PDT] AWS Lambda function invocation is experiencing elevated error rates. We are working to identify the root cause of this issue.

[12:08 PM PDT] We are investigating increased error rates and latencies in the US-EAST-1 Region.
Posted Jun 13, 2023 - 13:29 PDT
This incident affected: Home Assistant Cloud (Remote UI, Alexa, Google Assistant, Webhooks, Account).