Delayed event processing

Degraded

Nov 05 at 01:00pm CET

Affected services

[EU] Trace Ingestion

Resolved
Nov 05 at 03:54pm CET

Today, between 12:54 CET and 13:18 CET, we observed database connection issues in the asynchronous event processors in one of our hosting zones. In response, we stopped our workers, which caused the previously reported delays. Before we scaled down the worker instances, there was a window of approximately 10 minutes in which events that had been accepted by the API were not processed by the workers and were dropped with an error. We estimate that this affected about one third of the events sent to our EU instance within that window.

We have added error handling that records and stores failed events, so that we can replay them later instead of dropping them when an error occurs.
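
For readers curious about the general pattern, the following is a minimal sketch of recording failed events for later replay. It is an illustration only and not our actual implementation: the names FailedEventStore, process_batch, and the handler callback are hypothetical, and the store is kept in memory here, whereas a real system would persist it durably.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class FailedEventStore:
    """Hypothetical in-memory store for events whose processing failed."""

    records: List[str] = field(default_factory=list)

    def record(self, event: Dict, error: Exception) -> None:
        # Keep the event together with the error so it can be inspected and replayed.
        self.records.append(json.dumps({"event": event, "error": repr(error)}))

    def replay(self, handler: Callable[[Dict], None]) -> int:
        """Re-run the handler on stored events; events that fail again are stored again."""
        pending, self.records = self.records, []
        replayed = 0
        for raw in pending:
            entry = json.loads(raw)
            try:
                handler(entry["event"])
                replayed += 1
            except Exception as exc:
                self.record(entry["event"], exc)
        return replayed


def process_batch(
    events: Iterable[Dict],
    handler: Callable[[Dict], None],
    store: FailedEventStore,
) -> int:
    """Process events one by one; on error, store the event instead of dropping it."""
    processed = 0
    for event in events:
        try:
            handler(event)
            processed += 1
        except Exception as exc:
            store.record(event, exc)
    return processed
```

In this sketch, a worker would call process_batch during normal operation and call store.replay(handler) once the underlying problem (for example, a database connection issue) has been resolved.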

We apologize for any inconvenience.

Updated
Nov 05 at 01:38pm CET

We found the root cause and provided a fix. Everything should be working as expected now.

Created
Nov 05 at 01:00pm CET

We have identified an issue with our database and are observing delayed event processing. We are working on finding the root cause and providing a fix.