Resolved
The incident has been resolved by a patch. The underlying issue was an unbounded query that kept fetching data until our containers ran out of memory and were OOM-killed.
Identified
We've identified the issue and limited the blast radius to tracing-specific API calls. The UI and the prompt management features should be fully available again. We're working on a patch for the underlying problem.
Investigating
We’re seeing elevated error rates impacting US-region APIs (including non-tracing routes like Prompts) and parts of the UI, and are working to restore normal service as quickly as possible.