Incidents | Langfuse Incidents reported on status page for Langfuse https://status.langfuse.com/ [US] Delayed Ingestion Processing https://status.langfuse.com/incident/611913 Mon, 30 Jun 2025 19:41:00 -0000 https://status.langfuse.com/incident/611913#1bb4f1f826bc0a71a0511d1dd6cf1d60740ad4f6fef46a6f1325f24ef5d13822 We scaled our services and events are processed in time. [US] Delayed Ingestion Processing https://status.langfuse.com/incident/611913 Mon, 30 Jun 2025 17:36:00 -0000 https://status.langfuse.com/incident/611913#1d2bb297f4a369815fd29d8958d0eea0b900aa98189748acac65a4e4c418e538 Since 07:36 pm CEST, we have delayed ingestion on the US data region. We currently have a delay of ~3 minutes. We are scaling out systems to allow for higher throughput. Delayed ingestion processing https://status.langfuse.com/incident/608249 Tue, 24 Jun 2025 11:45:00 -0000 https://status.langfuse.com/incident/608249#df29d670865d86f1d14225094073657eabfb181f30b76c9c75dd9764fdaa95d4 All systems are healthy again. Delayed ingestion processing https://status.langfuse.com/incident/608249 Tue, 24 Jun 2025 10:32:00 -0000 https://status.langfuse.com/incident/608249#edaf46ade9216b2dbb27227ceffbc0c1b1b64fd17de8de61b6dce9048d1267c4 Ingestion is currently delayed. [US] Delayed Ingestion Processing https://status.langfuse.com/incident/608127 Tue, 24 Jun 2025 09:22:00 -0000 https://status.langfuse.com/incident/608127#4d0109204bb472e42240c191852dae34db943e32bac89bc521a226a3b4a0edfa The issue has been resolved and we're processing events in real time again. [US] Delayed Ingestion Processing https://status.langfuse.com/incident/608127 Tue, 24 Jun 2025 08:32:00 -0000 https://status.langfuse.com/incident/608127#e3e1da05a3a152db4cab3868e8e849420afbcaedeba004c54d6ea0e12f810aa6 The delay is back to 5 minutes. We're actively working on a mitigation.
[US] Delayed Ingestion Processing https://status.langfuse.com/incident/608127 Tue, 24 Jun 2025 08:09:00 -0000 https://status.langfuse.com/incident/608127#32315ec2136dc72487297aac8059d06d1712ec43b2fbbd3f28937c30c4fdede4 We still see a fluctuation around wait times, but reduced the average time to approximately 1 minute. [US] Delayed Ingestion Processing https://status.langfuse.com/incident/608127 Tue, 24 Jun 2025 07:21:00 -0000 https://status.langfuse.com/incident/608127#1bdb0feb04546c71cdbd04334e0145e129e36904eb20e4cca9d7a8b50fe6de37 We are observing a delay around ingestion event processing of approximately 5min. We are investigating the situation. [US] Ingestion is currently delayed. https://status.langfuse.com/incident/607897 Mon, 23 Jun 2025 23:01:00 -0000 https://status.langfuse.com/incident/607897#5486d7e93fb706d917e69980eaabb174e396e46c31061569aca4443cdd5c5e17 All queues are processed in time. [US] Ingestion is currently delayed. https://status.langfuse.com/incident/607897 Mon, 23 Jun 2025 22:30:00 -0000 https://status.langfuse.com/incident/607897#db48dfcebd7e5ae5fcb25e53772d0dc78010cca555aa2cdee354b60d85871b1c Currently, event ingestion in the US data region is delayed by 7 minutes. We are working on scaling our systems. Delayed ingestion https://status.langfuse.com/incident/607678 Mon, 23 Jun 2025 14:13:00 -0000 https://status.langfuse.com/incident/607678#d05bb18c4972469d94477ee4fc3704b9f9e0287da91cc40d45b5b83e692d676e All systems are healthy again. Delayed ingestion https://status.langfuse.com/incident/607678 Mon, 23 Jun 2025 12:32:00 -0000 https://status.langfuse.com/incident/607678#964276be4ccf4e0082cc90e8f7dec25d0fe7795b514996d9512d3312a5058ff4 Ingestion is currently delayed. [US] API latencies for prompts API https://status.langfuse.com/incident/600348 Tue, 10 Jun 2025 20:18:00 -0000 https://status.langfuse.com/incident/600348#1ab4ea069c50a215ca4df32d4407eede8a685aec70dc6a07e686542f2df31dc7 We have scaled our services to address latency issues.
The p90 latency is now reduced from 10 seconds to 30 milliseconds. [US] API latencies for prompts API https://status.langfuse.com/incident/600348 Tue, 10 Jun 2025 19:51:00 -0000 https://status.langfuse.com/incident/600348#bbd28ad0f68ee5ac3d2fe262e339991970097f1130e48c5b27f086ed651d557f We experience higher than usual Prompt API latencies. We are investigating the root cause. [EU + US] Failing Google Auth and Degraded API Performance https://status.langfuse.com/incident/572481 Tue, 20 May 2025 09:03:00 -0000 https://status.langfuse.com/incident/572481#da37a6935d74a21d7ea946199705c3643c3cec0cc6f60ff29e3b3c68f5b88995 All systems are healthy again and Google Auth issues are resolved. [EU + US] Failing Google Auth and Degraded API Performance https://status.langfuse.com/incident/572481 Tue, 20 May 2025 08:59:00 -0000 https://status.langfuse.com/incident/572481#89b41193dfc7e608a884f7e1ce7a855e6764d8e6a1b44f9b6f790855608854c0 API and tracing performance have recovered. Google Auth issues remain in both US and EU.
[EU + US] Failing Google Auth and Degraded API Performance https://status.langfuse.com/incident/572481 Tue, 20 May 2025 08:10:00 -0000 https://status.langfuse.com/incident/572481#7f64b5d9d107a6b533550599911671f7af3006a07a4d2762580af7916d2bc1ba We're currently observing failing Google Authentications across the EU and US environment and elevated error rates and latencies for API and frontend operations within the US environment. We're investigating the situation.
Elevated error rates across public APIs and UI https://status.langfuse.com/incident/557594 Tue, 06 May 2025 17:31:00 -0000 https://status.langfuse.com/incident/557594#e947bcde001278191c729f7d910cef08fb0240c0bf4b8a03a5ad7b7750f82596 Issue has been resolved. Elevated error rates across public APIs and UI https://status.langfuse.com/incident/557594 Tue, 06 May 2025 17:00:00 -0000 https://status.langfuse.com/incident/557594#7812ac724a5964bb171491d8dacc50036f183417c4a0b525b7196676423b86d2 We are currently experiencing elevated error rates in both the Langfuse Cloud UI and on public API routes affecting traces, observations, and scores. Investigation is ongoing. Delayed ingestion https://status.langfuse.com/incident/554990 Thu, 01 May 2025 15:21:00 -0000 https://status.langfuse.com/incident/554990#24a405a72758629919fb1f1e5242a6835b7db8922e5c9a05bc365e9fd327b94d All events are inserted in a minute or less. Delayed ingestion https://status.langfuse.com/incident/554990 Thu, 01 May 2025 14:37:00 -0000 https://status.langfuse.com/incident/554990#0c5eb40ce227e2c5f886d18b595ac80a82bc2450171923f9ac417e5e9faaedb0 Ingestion is currently delayed in EU. Investigation is ongoing. Delayed ingestion https://status.langfuse.com/incident/551705 Fri, 25 Apr 2025 12:20:00 -0000 https://status.langfuse.com/incident/551705#c6177471b8fa5c26886c5a0819219d20efafcf0bf3a1ffa349f67ebb28f45eaa Ingestion has recovered. All services work as expected.
Delayed ingestion https://status.langfuse.com/incident/551705 Fri, 25 Apr 2025 11:58:00 -0000 https://status.langfuse.com/incident/551705#a42d27e88f2e9e6f11fefc053242ae81b28270edabd43103e553181abf003c33 Ingestion is currently delayed in EU. Investigation ongoing. Delayed ingestion https://status.langfuse.com/incident/550406 Thu, 24 Apr 2025 13:50:00 -0000 https://status.langfuse.com/incident/550406#ead7d898b7543a60103af31010437808eeddc136863f3e3e168bafb2004e81fd Ingestion is currently delayed in EU. Investigation is ongoing. Delayed ingestion https://status.langfuse.com/incident/550359 Thu, 24 Apr 2025 12:06:00 -0000 https://status.langfuse.com/incident/550359#a543a01383bd644d7ad6a467a6e60554165e2fedb4c6c05956c262f22e6238e9 Ingestion is currently delayed in EU region. Investigation is ongoing. Delayed event ingestion https://status.langfuse.com/incident/549647 Wed, 23 Apr 2025 09:46:00 -0000 https://status.langfuse.com/incident/549647#2b1ce53b8cdd953fb9675416ee68232670d5ffd0fb9accedce250fe255752c55 Ingestion delayed for around 10 minutes. Langfuse interface unavailable for some users [workaround: refresh the page] https://status.langfuse.com/incident/545805 Tue, 15 Apr 2025 19:28:00 -0000 https://status.langfuse.com/incident/545805#e8c15e188ff5f2703351af47ba80d8f2ec4b578fe6873a7d0fd9093863aaa8b4 Fix deployed and issue resolved. Langfuse interface unavailable for some users [workaround: refresh the page] https://status.langfuse.com/incident/545805 Tue, 15 Apr 2025 18:54:00 -0000 https://status.langfuse.com/incident/545805#b0cbae685eedc6ea70c0392a5d0c02791819f192ffa3364ec7b44b00451e8721 We have identified the issue and are currently deploying a fix.
It will be available in 20-30 minutes across all regions. In the meantime, reloading the page should resolve the issue if you encounter it. Langfuse interface unavailable for some users [workaround: refresh the page] https://status.langfuse.com/incident/545805 Tue, 15 Apr 2025 18:33:00 -0000 https://status.langfuse.com/incident/545805#a0576096e8decc02ed7057de5b885ddcb74ea8fd26c0fb0babcc6ae56f8c22f4 We received reports that the UI is unavailable for some users; we are investigating the situation. High prompt API latencies https://status.langfuse.com/incident/533789 Mon, 24 Mar 2025 22:15:00 -0000 https://status.langfuse.com/incident/533789#5416cbb9f0a4704c0268de9bd5d6ed6c348e3bbcd842c9431149de85080ad980 The Prompts API responds with very low latency again. High prompt API latencies https://status.langfuse.com/incident/533789 Mon, 24 Mar 2025 21:07:00 -0000 https://status.langfuse.com/incident/533789#8f108b68807f8e0ac6e63120ffadbde3c3bc2ddeb8dcf28a93185410231681d3 Since 10:07 pm CET, we observe abnormally high delays on our Prompts API due to delays in the event loop of our Node containers.
[US] Delayed event ingestion https://status.langfuse.com/incident/531874 Thu, 20 Mar 2025 22:50:00 -0000 https://status.langfuse.com/incident/531874#14a2e4ddd140873d030c29ef4558f981d754b6c8314e2d44f806028b88e89b5c All events are processed in time. We continue to scale our systems to serve increasing demand. [US] Delayed event ingestion https://status.langfuse.com/incident/531874 Thu, 20 Mar 2025 21:04:00 -0000 https://status.langfuse.com/incident/531874#488d7078a6a522ab75b683f15279b802bc7e0850474bf4a4b2059207d1815f29 We're currently observing around 20 min delay in processing newly ingested events for the US environment. We're investigating the situation. [US] Delayed event ingestion https://status.langfuse.com/incident/531874 Thu, 20 Mar 2025 20:54:00 -0000 https://status.langfuse.com/incident/531874#6eb05460a48285b7b1a036310eede5a10d828fd624ff92c8cffc55f3f90ebad4 We're currently observing around 12 min delay in processing newly ingested events for the US environment. We're investigating the situation. [US] Trace Ingestion Delayed https://status.langfuse.com/incident/530921 Wed, 19 Mar 2025 18:00:00 -0000 https://status.langfuse.com/incident/530921#9cf7ab2e9905721184cdacf801353898caa10b8bc5c006ef09b588351d0e9d86 We have resolved the incident. [US] Trace Ingestion Delayed https://status.langfuse.com/incident/530921 Wed, 19 Mar 2025 17:42:00 -0000 https://status.langfuse.com/incident/530921#961f89c108583636f528599bc80ed352fd562475ddb9ab3c033ba6b02903b5d4 We have scaled workers, wait time reduced to approx. 2mins. [US] Trace Ingestion Delayed https://status.langfuse.com/incident/530921 Wed, 19 Mar 2025 17:30:00 -0000 https://status.langfuse.com/incident/530921#8bcd6fb7391935db2a402e458480cf879cca4550de582910bb8bc6baec571a2b We're currently observing around 4min delay in processing newly ingested events for the US environment. We're investigating the situation. 
[EU] Trace Ingestion Delay https://status.langfuse.com/incident/530753 Wed, 19 Mar 2025 11:53:00 -0000 https://status.langfuse.com/incident/530753#e9fd3e9e2225cd2e57a13db5937d5bea1cda9ea6f94fc3badb78f61d3da5618e The issue has been resolved. [EU] Trace Ingestion Delay https://status.langfuse.com/incident/530753 Wed, 19 Mar 2025 11:49:00 -0000 https://status.langfuse.com/incident/530753#a33d41d22a9852afbd95a420a431fa3875f3fc4ec25a67c8ea266f2285e9e90a We're currently observing around 5min delay in processing newly ingested events for the EU environment. Situation is recovering. [EU] Trace Ingestion Delay https://status.langfuse.com/incident/530753 Wed, 19 Mar 2025 11:34:00 -0000 https://status.langfuse.com/incident/530753#d268247a65af2b48a43432699a1e3a77f66e12c0ceb956d6036ba7afc6377589 We're currently observing around 14min delay in processing newly ingested events for the EU environment. We're investigating the situation. [EU] Trace Ingestion Delay https://status.langfuse.com/incident/530753 Wed, 19 Mar 2025 11:29:00 -0000 https://status.langfuse.com/incident/530753#2ef284dce94862c32c1cbd518e2c261c090c6832167363950cc27f46f7493254 We're currently observing around 9min delay in processing newly ingested events for the EU environment. We're investigating the situation. [EU] Trace Ingestion Delay https://status.langfuse.com/incident/530753 Wed, 19 Mar 2025 11:24:00 -0000 https://status.langfuse.com/incident/530753#72f1fd3fdfaff92234e9a870e28fbbdb97dea35d86953f8989d9b504a5b03a44 We're currently observing around 6min delay in processing newly ingested events for the EU environment. We're investigating the situation. [EU] Delayed Trace Ingestion https://status.langfuse.com/incident/530741 Wed, 19 Mar 2025 11:05:00 -0000 https://status.langfuse.com/incident/530741#b9e2b43e45a33088e3fcd38d9286363e9fa8c0f3b05c9bebe7fc80f2d8483e1e Issue has been resolved. 
[EU] Delayed Trace Ingestion https://status.langfuse.com/incident/530741 Wed, 19 Mar 2025 10:59:00 -0000 https://status.langfuse.com/incident/530741#844c04e3e402e15beb989da45ea56e2474b92e0bbf13ff99cd03d15fbd6c6be2 We're currently observing around 6min delay in processing newly ingested events for the EU environment. We're investigating the situation. [EU] Delayed Trace Ingestion https://status.langfuse.com/incident/530261 Tue, 18 Mar 2025 16:09:00 -0000 https://status.langfuse.com/incident/530261#747d730a4992b0aee0438bdf355a4cc43975ec2edf5218b628d4932c6a38a211 The issue has been resolved. [EU] Delayed Trace Ingestion https://status.langfuse.com/incident/530261 Tue, 18 Mar 2025 15:21:00 -0000 https://status.langfuse.com/incident/530261#12ba73f18ffb3ccdb17fb2ae8b2e9e4c4a80309424cf3df7346305453daf6728 We're currently observing around 7min delay in processing newly ingested events for the EU environment. We're investigating the situation. [US] Delayed trace ingestion https://status.langfuse.com/incident/526445 Tue, 11 Mar 2025 14:17:00 -0000 https://status.langfuse.com/incident/526445#63847f1bdbe04e3406a4f99779bbdd75f1e9c7dca1f157e011a52ce9d1fc8465 We're back to normal processing times. [US] Delayed trace ingestion https://status.langfuse.com/incident/526445 Tue, 11 Mar 2025 14:11:00 -0000 https://status.langfuse.com/incident/526445#0965b60e15c0ca3743faa32798f15b86c2beeed252eb7999d33b81dde44ca4e2 We made changes to our processing and reduced the processing delay to approximately 3 minutes. [US] Delayed trace ingestion https://status.langfuse.com/incident/526445 Tue, 11 Mar 2025 13:16:00 -0000 https://status.langfuse.com/incident/526445#aad41d254f083d24103c3783b64f7d5198bece83a17cb7ddd5d49f835fba6913 We are currently investigating a delayed processing of newly ingested traces on the US data region. 
Current delay: 10 minutes [EU & US] Elevated Error Rates Across Tracing APIs https://status.langfuse.com/incident/518284 Tue, 25 Feb 2025 17:39:00 -0000 https://status.langfuse.com/incident/518284#5c210182cafed001227ff770dfea584fcb934c8938f8e3e370bfec18f5d90476 Our error rates are back down and operations behave normally. [EU & US] Elevated Error Rates Across Tracing APIs https://status.langfuse.com/incident/518284 Tue, 25 Feb 2025 17:26:00 -0000 https://status.langfuse.com/incident/518284#ece720ffe6dd6ea900c7308422211d3764e2ccfd608b1535575579f471fcbd9c Together with the ClickHouse team we developed two fix candidates that are currently being tested. We see a reduction in the total error rate across multiple endpoints and will continue to observe the situation. [EU & US] Elevated Error Rates Across Tracing APIs https://status.langfuse.com/incident/518284 Tue, 25 Feb 2025 10:25:00 -0000 https://status.langfuse.com/incident/518284#f57f4bf35ea7ca2a411796e0518c2bbfeb0eaba84b5661bc649df6860836fb0e The incident is still ongoing and we're working on a resolution with the ClickHouse Cloud team. Overall, we see a total of 1-2% of tracing related List and Get calls fail with a higher impact on queries that span longer timeframes.
[EU & US] Elevated Error Rates Across Tracing APIs https://status.langfuse.com/incident/518284 Mon, 24 Feb 2025 19:26:00 -0000 https://status.langfuse.com/incident/518284#7f9efe748a9abcfa9635087729ee76a040b6f6a366a9e8609d312e4d54b10c25 We're observing elevated error rates for reads in the application and the API. Prompt management, authentication, and ingestion are not affected. Our team is investigating the situation. [US data region] Elevated error rates across APIs https://status.langfuse.com/incident/516221 Thu, 20 Feb 2025 16:21:00 -0000 https://status.langfuse.com/incident/516221#606e43c341196e2e53b99e407f1cb367b267bcd04c89ec4f0d3e2b1d22702009 [Prod US] We scaled our web containers and no longer see elevated error rates across APIs.
[US data region] Elevated error rates across APIs https://status.langfuse.com/incident/516221 Thu, 20 Feb 2025 15:33:00 -0000 https://status.langfuse.com/incident/516221#f61bdf221875e221202b5ac93916afccaccc4428c3de73f862bc9169f1c70b3b We see elevated error rates across APIs. This is a small share of requests (<1%); we are actively working on identifying the root cause. Prompt Management and Tracing are robust to these transient errors as they retry/cache requests on errors.
[US] Tracing ingestion delayed https://status.langfuse.com/incident/505073 Fri, 31 Jan 2025 15:06:00 -0000 https://status.langfuse.com/incident/505073#ba643ece0f2dbe8937eaaa93c60278ece92d740313a885d8a93b35a99afb559d We have resolved this issue. [US] Tracing ingestion delayed https://status.langfuse.com/incident/505073 Fri, 31 Jan 2025 14:33:00 -0000 https://status.langfuse.com/incident/505073#3c39b176e63c4676e657f13133fa1d9dd29b5b342b42b2a637a645774a1de3aa Due to a drastic surge in trace volume, we currently observe a 7-minute delay in trace processing. Delayed event ingestion in US data region https://status.langfuse.com/incident/504522 Thu, 30 Jan 2025 16:23:00 -0000 https://status.langfuse.com/incident/504522#5eaea213b42d5d7d9c6e83fe432ace0014da0d74f5ce843f1e61c4e6d0045cfe The queue is processing in time again. Delayed event ingestion in US data region https://status.langfuse.com/incident/504522 Thu, 30 Jan 2025 15:34:00 -0000 https://status.langfuse.com/incident/504522#53fd4980ec404a11ff5fd8f47a4f49d80ddf775b82112e956086546361193e1e We're currently observing around 10 minutes of delay for event ingestion in the US data region. Slow UI and failing API endpoints https://status.langfuse.com/incident/500919 Thu, 23 Jan 2025 23:31:00 -0000 https://status.langfuse.com/incident/500919#08e5b12a4b20f6e387754aac9303736d5d56c161865aa7e2b8c62ddf138cee32 Root cause identified and resolved: The issue was caused by an inefficient database query that was consuming excessive resources on our ClickHouse cluster. Slow UI and failing API endpoints https://status.langfuse.com/incident/500919 Thu, 23 Jan 2025 23:13:00 -0000 https://status.langfuse.com/incident/500919#1fab90af678369e1e09f2eef51f972734d133b78e8c22f90ff481d648efa5545 We currently experience a slow UI and partially failing APIs to fetch data from Langfuse. We are working on finding the root cause.
Delayed event ingestion in US data region https://status.langfuse.com/incident/499612 Wed, 22 Jan 2025 10:47:00 -0000 https://status.langfuse.com/incident/499612#c14a4ca6e42a8ed543ce9e1f659da54ae2a4d70e337f6647cdf624610c326693 We are now processing all events in time. We identified the root cause of the issue and continue to invest in performance improvements to ensure all data is processed in time. Delayed event ingestion in US data region https://status.langfuse.com/incident/499612 Tue, 21 Jan 2025 08:44:00 -0000 https://status.langfuse.com/incident/499612#4bcb93aaabc7119b502043521a4184638cc62614142a872060ba4e44408f6706 We're currently observing around 10 minutes of delay for event ingestion in the US data region. High latencies and timeouts on ingestion API https://status.langfuse.com/incident/498505 Mon, 20 Jan 2025 02:35:00 -0000 https://status.langfuse.com/incident/498505#e338890bd9a48d33412959e06969f65c557e6afcc6d3af5f50e7ddb5a8ad1a7d We scaled our ingestion containers and reduced the amount of observed timeouts and latencies. We continue to observe the situation. We count 19.95k requests that timed out or were closed by the client. High latencies and timeouts on ingestion API https://status.langfuse.com/incident/498505 Mon, 20 Jan 2025 02:21:00 -0000 https://status.langfuse.com/incident/498505#b37891ad2de8e96c8562da56da0feaefbf4079a6360b0bcede882a75e750e8ab We are observing high latencies on our ingestion API. We are investigating the issue and working to resolve it. Delayed event ingestion in US data region https://status.langfuse.com/incident/497535 Fri, 17 Jan 2025 18:06:00 -0000 https://status.langfuse.com/incident/497535#7ad12e7cf4a9c9ad478db6e2d810cd68e1ee556f4fbd81e9b3a97437f88cebf5 We process all events within 60 seconds as of now.
Delayed event ingestion in US data region https://status.langfuse.com/incident/497535 Fri, 17 Jan 2025 17:54:00 -0000 https://status.langfuse.com/incident/497535#65029ced66b07db59ac4d179004be5d93c00d5ab1e785e8d47d7b21482bfcbcf The delay is down to approximately 10 min. Delayed event ingestion in US data region https://status.langfuse.com/incident/497535 Fri, 17 Jan 2025 17:47:00 -0000 https://status.langfuse.com/incident/497535#1e36454e4cf9f599f93b39b7efa48c690dd948aac5751c058657765d1d4c62f8 The delay is down to approximately 15min. Delayed event ingestion in US data region https://status.langfuse.com/incident/497535 Fri, 17 Jan 2025 17:32:00 -0000 https://status.langfuse.com/incident/497535#8dc0eb8b5c8d88f40c2f157293aad6b40c2ca22e1915a5b52c46223c38205b84 We continue to scale our infrastructure. Delay is at 25 minutes right now. We persist all events which are entering our infrastructure. We will process all events eventually. Delayed event ingestion in US data region https://status.langfuse.com/incident/497535 Fri, 17 Jan 2025 17:04:00 -0000 https://status.langfuse.com/incident/497535#08a9db77c0564c79b19528d037ee4c9832c947d40ab071b4f7c7bc2c748e7140 We're currently observing around 10 minutes of delay for event ingestion in the US data region. Delayed event ingestion in US data region https://status.langfuse.com/incident/496402 Wed, 15 Jan 2025 19:22:00 -0000 https://status.langfuse.com/incident/496402#6281d5967ce22ab54944b1f5a3d2713be6365709f00cb0427071b6d0c0c8f69a We are fully recovered and process events within 60 seconds. Delayed event ingestion in US data region https://status.langfuse.com/incident/496402 Wed, 15 Jan 2025 18:10:00 -0000 https://status.langfuse.com/incident/496402#d455e987226033bb43dce73eb5a2d1633b7e12f0b9dc4e0e8fa59549b681cecc We observe delayed event ingestion. Events are visible in the UI and available on GET APIs with a delay of up to 7 minutes. We are investigating the issue and scaling our services to process events faster. 
Delayed event processing in EU data region https://status.langfuse.com/incident/495249 Mon, 13 Jan 2025 17:44:00 -0000 https://status.langfuse.com/incident/495249#a7506c27c5a269026c1afb1dbf73eb93d86cb17c57cc2787fcedcdbf134eb636 We increased our processing rate and have ingestion latencies of less than 60 seconds. Delayed event processing in EU data region https://status.langfuse.com/incident/495249 Mon, 13 Jan 2025 17:00:00 -0000 https://status.langfuse.com/incident/495249#238ad8366d35d80588ee4cc0f6880b5987f1d9926c74fdfcef294fe9d18b36d3 Since 17:00 UTC, we observe delayed event ingestion. Events are visible in the UI and available on GET APIs with a delay of up to 7 minutes. We are investigating the issue and scaling our services to process events faster. [US] Downtime across all APIs https://status.langfuse.com/incident/492749 Wed, 08 Jan 2025 19:17:00 -0000 https://status.langfuse.com/incident/492749#72a78575dde54f7477c4d1232fe506c4bed1838b04847dc6dadc4aab0638fc02 All APIs are recovered. All traces that have been accepted prior to the full downtime at 7:06pm GMT have also been processed and are available via APIs and UI. [US] Delayed tracing processing and temporary downtime https://status.langfuse.com/incident/492708 Wed, 08 Jan 2025 19:17:00 -0000 https://status.langfuse.com/incident/492708#e5d353b0a4182170e9590ef7dfd0ed3ed7ade3292fdccc58b2b67e763fa9b019 All traces that have been accepted prior to the full downtime (https://status.langfuse.com/incident/492749) at 7:06pm GMT have also been processed and are available via APIs and UI.
[US] Downtime across all APIs https://status.langfuse.com/incident/492749 Wed, 08 Jan 2025 19:13:00 -0000 https://status.langfuse.com/incident/492749#86192a3977dae08edec4dbb72df76f9db6244c42803410f96d2d7f0857b18cd0 Langfuse UI and prompts APIs are available again, we are recovering the trace ingestion endpoint.
[US] Downtime across all APIs https://status.langfuse.com/incident/492749 Wed, 08 Jan 2025 19:06:00 -0000 https://status.langfuse.com/incident/492749#4cb07a908b74574f377d59c55076716687a7224542e45e26556a0692d241bce7 A configuration change made to resolve another issue (https://status.langfuse.com/incident/492708) led to downtime across services in the US region. We are working on fixing the issue while recovering all services. [US] Delayed tracing processing and temporary downtime https://status.langfuse.com/incident/492708 Wed, 08 Jan 2025 17:43:00 -0000 https://status.langfuse.com/incident/492708#3a34da0bcea38b737b31519b81213b302c3fddbe8a63f6d437ce4a1e20deb979 Our ClickHouse database is currently not able to process incoming events at the desired speed.
We observe up to 10 minutes of delayed event ingestion and are working on scaling the database and our containers for faster ingestion. Delayed ingestion processing https://status.langfuse.com/incident/492077 Tue, 07 Jan 2025 16:15:00 -0000 https://status.langfuse.com/incident/492077#e3c17db69b034911c344bfa7ed66de09e954b006a14b27bd82187072663c1a5a We resolved all issues and are processing events at a high rate. We are now processing with a delay of 3 minutes or less. Delayed ingestion processing https://status.langfuse.com/incident/492077 Tue, 07 Jan 2025 03:30:00 -0000 https://status.langfuse.com/incident/492077#431f06ee99ab0f48b2d2735bd4dda76c83aee2fd08e7c92ce514d3d48636d38b Our ClickHouse database is currently not able to process incoming events at the desired speed. We observe up to 15 minutes of delayed event ingestion and are working on scaling the database and our containers for faster ingestion. Delayed trace ingestion https://status.langfuse.com/incident/489902 Thu, 02 Jan 2025 18:14:00 -0000 https://status.langfuse.com/incident/489902#849d06657cb575db156c15ba5aba847bfac060673ae74096d629a8a900e5be72 The issue is resolved, trace processing is back to regular service levels and all outstanding traces that were accumulated have been processed. Delayed trace ingestion https://status.langfuse.com/incident/489902 Thu, 02 Jan 2025 18:06:00 -0000 https://status.langfuse.com/incident/489902#ee0fe563d9cc1896fa179aef45b51e47e4fb3b5b5ba6bac9b9dbc8ea9cc7562b At the current rates, we expect to be fully caught up with ingested traces in the next 15 minutes. Delayed trace ingestion https://status.langfuse.com/incident/489902 Thu, 02 Jan 2025 18:05:00 -0000 https://status.langfuse.com/incident/489902#2271138c3ea20396cf5d7afee0553aaa7fb3a66c34bd2e5a14fa1cc86d457886 After halting trace deletion operations, we have now disabled deletion across the UI on Langfuse Cloud in order to make the UI more predictable. We will work on improving the performance of deletions before enabling them again.
Delayed trace ingestion https://status.langfuse.com/incident/489902 Thu, 02 Jan 2025 17:18:00 -0000 https://status.langfuse.com/incident/489902#3d94a70a04499e9cbe2f964f2b2fba88e318b6ac6c8706ce3297709f272273ae We are halting processing of trace deletions made via the UI as they contribute to this issue. We are further investigating the problem. Delayed trace ingestion https://status.langfuse.com/incident/489902 Thu, 02 Jan 2025 16:52:00 -0000 https://status.langfuse.com/incident/489902#2d20edf8623d80429f94085ab27c1f01feb878adaca1b0a1ebdd571c926723ae We have identified the issue and are now catching up with the delayed trace processing. Delayed trace ingestion https://status.langfuse.com/incident/489902 Thu, 02 Jan 2025 16:40:00 -0000 https://status.langfuse.com/incident/489902#47e8dd105d0c810b4a849f98b2cb0ebb282c64bdade208296ec2b2b6107e270d Currently trace ingestion is delayed on Langfuse EU. We are investigating this issue. [US] Delayed trace ingestion and increased levels of frontend errors https://status.langfuse.com/incident/483811 Wed, 18 Dec 2024 11:25:00 -0000 https://status.langfuse.com/incident/483811#cedb73e4b993a69a009619c94baadab94478cb1e11054a9e133d4370b37ef172 All events are processed in time again. [US] Delayed trace ingestion and increased levels of frontend errors https://status.langfuse.com/incident/483811 Wed, 18 Dec 2024 11:13:00 -0000 https://status.langfuse.com/incident/483811#0baa5412a8522ab639333f2d32c661878e51f305271df80ab9b04572f2ead182 Wait time is at 15 minutes right now and we see a downwards trend.
[US] Delayed trace ingestion and increased levels of frontend errors https://status.langfuse.com/incident/483811 Wed, 18 Dec 2024 11:00:00 -0000 https://status.langfuse.com/incident/483811#b660c3caf6b52e0cb75b072394d54ccbc6decd94eb301e9d11f9f45360eb33ac We currently observe delayed trace ingestion and elevated error rates in the Langfuse UI. We are investigating the issue. No events are lost and they will be processed with a delay of roughly 12 minutes. Delayed event processing https://status.langfuse.com/incident/483600 Wed, 18 Dec 2024 05:15:00 -0000 https://status.langfuse.com/incident/483600#05b647b31f702e08c668c2310f783382e7c20d15d2ee4e8434fa7b69f34ce44d Events are processed within a minute now. Delayed event processing https://status.langfuse.com/incident/483600 Wed, 18 Dec 2024 05:06:00 -0000 https://status.langfuse.com/incident/483600#4d5c4e4f1159b57bc952b61bd29522814d0945bc5431b546590aed1c244b81d7 Events are still delayed. The delay is at 3 minutes at the moment. We continue observing the situation.
Delayed event processing https://status.langfuse.com/incident/483600 Wed, 18 Dec 2024 01:36:00 -0000 https://status.langfuse.com/incident/483600#9c47503c653f5ef451a613acb59bed29575e231e86efa2ed0ca44e992eab7147 In the US data region, we have observed delayed event processing since 00:45 UTC. In the worst case, the delay (the time between an event arriving at our ingestion API and becoming visible in the UI) can be up to 8 minutes. We are working on scaling the system. Database unresponsive https://status.langfuse.com/incident/476987 Fri, 13 Dec 2024 11:53:00 -0000 https://status.langfuse.com/incident/476987#8dae916d45708993baec196017e300b04e24cd994bcba2118bc7a59580bc78cc Queued trace and project deletions were processed. We're fully back to normal operations. Database unresponsive https://status.langfuse.com/incident/476987 Fri, 13 Dec 2024 04:24:00 -0000 https://status.langfuse.com/incident/476987#ecfda36a5394574dc50422c75910f26e6c8d92c6f6d8d87eb81a542d3984db24 All events and API functionality have been fully restored and are operating as expected. We've temporarily disabled Trace and Project deletions to prevent performance issues with our ClickHouse database. While users can still initiate deletions through the UI, these requests will be queued and processed once we re-enable the deletion functionality. Database unresponsive https://status.langfuse.com/incident/476987 Fri, 13 Dec 2024 04:13:00 -0000 https://status.langfuse.com/incident/476987#84c857af42a33b9810556f404ba3b8b6c3adb2db9e70743a84bbaf7f1cbb80de We identified the root cause and provided a fix. API latencies and error rates are improving. We will slowly speed up event processing again. Database unresponsive https://status.langfuse.com/incident/476987 Fri, 13 Dec 2024 04:00:00 -0000 https://status.langfuse.com/incident/476987#cde4491fc207301d3aaac0a07f5c27d54b0f0604e07724396947914190a4b1ee Since 04:45 GMT, we observe low responsiveness of our database.
Incoming events are not processed and API calls which return tracing data (traces, observations, scores) fail or take longer. Delayed event processing and API errors https://status.langfuse.com/incident/469278 Thu, 28 Nov 2024 15:33:00 -0000 https://status.langfuse.com/incident/469278#4cf204c678b34a175d14775f5b48fa31e911fbdcb09344bde77c5513633b4693 We are fully recovered. Delayed event processing and API errors https://status.langfuse.com/incident/469278 Thu, 28 Nov 2024 15:21:00 -0000 https://status.langfuse.com/incident/469278#fdb4cff0b0f5ad0ff6fadf91f67e0eab3561bddaf4e52cc3e4bfbe577776c8db The UI is working as expected. We are currently processing all delayed events. We will be caught up shortly. Delayed event processing and API errors https://status.langfuse.com/incident/469278 Thu, 28 Nov 2024 15:11:00 -0000 https://status.langfuse.com/incident/469278#1526fda4f63e32b12d699fc19b52899f1080bf11f85ae4658ac10d204afc16dd We are currently experiencing very high load on our ClickHouse cluster, which causes CPU and memory issues. We are scaling the database momentarily. Our scores API (api/public/scores) returned 500 due to performance issues. https://status.langfuse.com/incident/467354 Mon, 25 Nov 2024 10:52:00 -0000 https://status.langfuse.com/incident/467354#81d3feefc1e19be319e19c489bc90d56f30dcca241c5b8837c1b1f96c31cc0b0 We rolled back our change. The API behaves as expected now. Our scores API (api/public/scores) returned 500 due to performance issues.
https://status.langfuse.com/incident/467354 Mon, 25 Nov 2024 09:30:00 -0000 https://status.langfuse.com/incident/467354#9d352459798c424d245a812e98e163ad88a51323c5a4d02c9028c458ba7329bb Due to performance issues on our scores API (api/public/scores), we returned 500 errors in some cases. We changed the behavior of that API over the weekend. We rolled that change back and are working on a fix. Event ingestion delayed on EU instance, affects GET API and UI https://status.langfuse.com/incident/464273 Tue, 19 Nov 2024 21:33:00 -0000 https://status.langfuse.com/incident/464273#9549e85e89a899965455d650a3dbc7fbf68248d7d2ba7cd0e5fecf05b9ff3583 We have processed all events that went missing during the outage. Event ingestion delayed on EU instance, affects GET API and UI https://status.langfuse.com/incident/464273 Tue, 19 Nov 2024 18:25:00 -0000 https://status.langfuse.com/incident/464273#8ac97d5c867e3f033a9d1ebec1544a0836dcc98277fe33041905f8be4b3c4928 APIs behave as expected as of now. We are recovering events from between 18:17:00 UTC and 18:25:00 UTC, which are currently not visible in the UI or via the REST API. Event ingestion delayed on EU instance, affects GET API and UI https://status.langfuse.com/incident/464273 Tue, 19 Nov 2024 18:24:00 -0000 https://status.langfuse.com/incident/464273#a9eba0366863a25af251ffa0e61a1978e638b65f79aec47199f1a6a7dad86727 Due to a peak in resource usage, event ingestion is delayed on the EU instance. This affects the GET API and UI. We are investigating the issue.
Delayed event processing https://status.langfuse.com/incident/456365 Tue, 05 Nov 2024 14:54:00 -0000 https://status.langfuse.com/incident/456365#8e7f47b4c3ad6f742dd8da913c6e574b9dd27d42932a68035e37c45be410a42a Today, between 12:54 CET and 13:18 CET, we observed database connection issues in our asynchronous event processors in one of our hosting zones. Due to those issues, we stopped our workers, which caused the previously reported delays. Before we scaled down the worker instances, we had a window of approximately 10 minutes in which events that were accepted on the API were not processed by the worker and were dropped with an error. This should affect about 1/3 of events that were sent to our EU instance within that timeframe. We added additional error handling to record and store failed events, which allows us to replay them in the future instead of dropping them on errors. We apologize for any inconvenience. Delayed event processing https://status.langfuse.com/incident/456365 Tue, 05 Nov 2024 12:38:00 -0000 https://status.langfuse.com/incident/456365#a3f641fb5d06867a77f35a66b0ae34be2cef446c304079e464c0f7659a16ada2 We found the root cause and provided a fix. Everything should be working as expected now. Delayed event processing https://status.langfuse.com/incident/456365 Tue, 05 Nov 2024 12:00:00 -0000 https://status.langfuse.com/incident/456365#9ba0223eecf10075d89a37e41cc3e0c10ce39aed7d5fd6cbe490aa5bc3a611fd We identified an issue with our database and observe delayed event processing. We are working on finding the root cause and providing a fix. UI degraded, some routes error, backend API routes unaffected https://status.langfuse.com/incident/454052 Thu, 31 Oct 2024 17:59:00 -0000 https://status.langfuse.com/incident/454052#cdff47cb5fcd79eac7002d945a38314ab3de21df6b32f037d33d741a4be32d98 We identified the issue and resolved it.
UI degraded, some routes error, backend API routes unaffected https://status.langfuse.com/incident/454052 Thu, 31 Oct 2024 17:49:00 -0000 https://status.langfuse.com/incident/454052#c56e3c0b50f1f623a50481267c34eb726458fd04c000b9fe2df46a55035ab3fc Some tables in the UI are degraded and do not load. We are working on a fix. Delayed data processing https://status.langfuse.com/incident/444520 Mon, 14 Oct 2024 19:42:00 -0000 https://status.langfuse.com/incident/444520#ef8f663d038dfe6c8b29311b4d385805252af438902c39cb785fd958dcfecf05 All historic data is processed. Delayed data processing https://status.langfuse.com/incident/444520 Mon, 14 Oct 2024 19:41:00 -0000 https://status.langfuse.com/incident/444520#f8073e5c04b0f76240b9c4e56850b25d547f1012e55b0b4909d95b64d5720848 We fixed the bug and are processing events again. Processing will be delayed for EU. The US environment is processing as expected. Events between 09:34 GMT and 09:40 GMT will be processed later.
Delayed data processing https://status.langfuse.com/incident/444520 Mon, 14 Oct 2024 19:40:00 -0000 https://status.langfuse.com/incident/444520#27ade2dd2e0624c061f0216257418f57195f0cec46843f101870efb9a692a702 We fully caught up all queues and process events at the usual latencies. Some events from the time between 09:34 GMT and 09:40 GMT failed to process. We are working on recovering them.
Delayed data processing https://status.langfuse.com/incident/444520 Mon, 14 Oct 2024 19:34:00 -0000 https://status.langfuse.com/incident/444520#16380f9d334b734b90c8856a63e058b1e96f1db20cb795713611c0f7702bf9b1 Due to a programming error, we observe delayed event processing. [EU] Issues with Ingestion https://status.langfuse.com/incident/443076 Fri, 11 Oct 2024 18:33:00 -0000 https://status.langfuse.com/incident/443076#1527e9870161e2de282723fcf8d4d99a21e6e1fde3896117f44332ac761d805f The incident was resolved by completing the infrastructure upgrade. [EU] Issues with Ingestion https://status.langfuse.com/incident/443076 Fri, 11 Oct 2024 18:26:00 -0000 https://status.langfuse.com/incident/443076#d8fe9e42c3564e3dc25665430d8449659c51dbd32063f1bcc3faf6d64bb94030 Our metrics indicate that ingestion calls succeed again. We're monitoring the situation. [EU] Issues with Ingestion https://status.langfuse.com/incident/443076 Fri, 11 Oct 2024 18:24:00 -0000 https://status.langfuse.com/incident/443076#e6af4acdf42db88f629ac6ec209c67e6f237e59cd5b7af0e1bc2ba8b1739de53 We're currently observing elevated error rates on our ingestion endpoint. This is caused by an infrastructure update. Our team is working on a resolution. E-Mail Login Not Working https://status.langfuse.com/incident/441112 Tue, 08 Oct 2024 09:58:00 -0000 https://status.langfuse.com/incident/441112#a0fb329ee06efbbb6db9a9798f9e9baca5718230f112a7a264e0ee5ac86aeb56 All cloud environments should be working again.
E-Mail Login Not Working https://status.langfuse.com/incident/441112 Tue, 08 Oct 2024 09:52:00 -0000 https://status.langfuse.com/incident/441112#56969c3c24d6388e2b5f1ce685b26e5fa454e8f52f9d69e49c97882959788f87 We reverted the critical change and validated the effect in our staging environment. A fix is available via v2.83.1 and is being deployed to the Langfuse Cloud environments. E-Mail Login Not Working https://status.langfuse.com/incident/441112 Tue, 08 Oct 2024 09:30:00 -0000 https://status.langfuse.com/incident/441112#9ac8c3c0b687cba7d6b54e2d564f73a8445395225f100ced389ef15bced88eb9 We are currently investigating an incident where logins via email-address and password are not working. Infrastructure outage caused failing APIs, slow event processing, and lost events. https://status.langfuse.com/incident/432177 Thu, 19 Sep 2024 20:06:00 -0000 https://status.langfuse.com/incident/432177#4746914b5988e304655c63c29cb330b31ca9e07bcf061d7f58db688113540f45 We resolved all infrastructure issues.
Users might have experienced failing API calls. Also, some traces were lost and others were processed slowly. We make continuous efforts to increase the stability and resilience of Langfuse. Infrastructure outage caused failing APIs, slow event processing, and lost events. https://status.langfuse.com/incident/432177 Thu, 19 Sep 2024 16:28:00 -0000 https://status.langfuse.com/incident/432177#ac5b8f2bba1144fee86b2591288975bbd6d42d6a6de26afe2d3cc49bdbf6de7f We exhausted connections on our database. Hence, our containers are partially unable to complete their API requests or queue processing. Infrastructure upgrade https://status.langfuse.com/incident/427501 Wed, 11 Sep 2024 07:59:00 -0000 https://status.langfuse.com/incident/427501#c8414a12d4fb9bd72a47bcdd95c7afbe0d870a579c09021360e25f97f781e5cc All systems operate as expected again.
Infrastructure upgrade https://status.langfuse.com/incident/427501 Wed, 11 Sep 2024 07:28:00 -0000 https://status.langfuse.com/incident/427501#6ac4c6002811366dcaab668ffa97d20b32abcd163aa1656fc65448e34d5995dc We will execute an infrastructure upgrade on our EU data region. During this time, data ingestion via the SDKs will be possible. Fetching Prompts will be degraded. Using the Langfuse UI will not be possible for the migration period. Infrastructure upgrade https://status.langfuse.com/incident/425525 Sat, 07 Sep 2024 07:26:00 -0000 https://status.langfuse.com/incident/425525#756465fd7b6b2e43ef947d5d064d502e8a0c3191c2d85d7df4e11d3b7051a2de Everything has been executed successfully. Infrastructure upgrade https://status.langfuse.com/incident/425525 Sat, 07 Sep 2024 07:20:00 -0000 https://status.langfuse.com/incident/425525#aba2f92f0ce37f6b4fd1c7942452be199437816724b70ebe063d655689320a15 We will execute an infrastructure upgrade on our US data region. During this time, data ingestion via the SDKs will be possible. Fetching Prompts will be degraded. Logging into Langfuse will not be possible for the migration period.
Infrastructure upgrade https://status.langfuse.com/incident/422581 Sun, 01 Sep 2024 15:26:00 -0000 https://status.langfuse.com/incident/422581#3836730fb84591d44ce04022ca53e03704a6bb0f02ea46f1d9304d0b6f0608bc Everything has been executed successfully. Infrastructure upgrade https://status.langfuse.com/incident/422581 Sun, 01 Sep 2024 15:20:00 -0000 https://status.langfuse.com/incident/422581#047292eb8ce113dd0c727baa74caf66fdf7b4bef1c1f22e14db517713200cc90 We will execute an infrastructure upgrade on our US data region. During this time, data ingestion via the SDKs will be possible. Fetching Prompts will be degraded. Logging into Langfuse will not be possible for the migration period. API rate limiting bug https://status.langfuse.com/incident/423047 Fri, 30 Aug 2024 11:41:00 -0000 https://status.langfuse.com/incident/423047#ab72ee488cc9f445a31d167f70e908cf6b56a7050176566fdfb5be0c7410d8fa Bug was reverted. All APIs are fully functional.
API rate limiting bug https://status.langfuse.com/incident/423047 Fri, 30 Aug 2024 10:37:00 -0000 https://status.langfuse.com/incident/423047#4b67f78442faae4c5d51468f01823ade485f9850f91b4c359e55c5e124f58af1 When releasing a new implementation of custom API limits, we discovered a bug and reverted the release immediately. Users on a paid plan were not authorized during the deployment.
API latencies and error rates https://status.langfuse.com/incident/423014 Tue, 27 Aug 2024 21:42:00 -0000 https://status.langfuse.com/incident/423014#62ee5e70acce1bde5f73a1df4a0a3f1c319cdbdb1c2a0aac1c64d8c49834faec All APIs work as expected. API latencies and error rates https://status.langfuse.com/incident/423014 Tue, 27 Aug 2024 20:50:00 -0000 https://status.langfuse.com/incident/423014#024ededded9cf61adbe041813abd4a33d5f9182e00326b383f81517d5dd320f8 Our API endpoints partially fail or take a long time to respond as we are rate limited by AWS. Our infrastructure exceeded its network traffic limits. We are upgrading the infrastructure to resolve the problem. Infrastructure upgrade https://status.langfuse.com/incident/419479 Mon, 26 Aug 2024 14:24:00 -0000 https://status.langfuse.com/incident/419479#69b698be478139efb488945180904efaedccb12e6471961fff3846c91f5b3fff All APIs are recovered. Infrastructure upgrade https://status.langfuse.com/incident/419479 Mon, 26 Aug 2024 14:19:00 -0000 https://status.langfuse.com/incident/419479#da535560ef9e9cb1f312c0abe92dfd5f549549dc6c7711ecd837fb2d9918a4f6 We executed an infrastructure upgrade which led to downtime on our ingestion API. We use Redis internally and upgraded the Redis instance.
During this upgrade, there was a 3-minute window in which we dropped events on our ingestion API. All SDKs are built to retry on errors and do not throw exceptions into the application. API latencies and error rates https://status.langfuse.com/incident/423008 Fri, 23 Aug 2024 21:42:00 -0000 https://status.langfuse.com/incident/423008#f99407e569eb7cec832ef8083c4dc6c152b5365b64d695c5f52f6af558ac6f31 All APIs behave as expected after infrastructure updates. API latencies and error rates https://status.langfuse.com/incident/423008 Fri, 23 Aug 2024 20:50:00 -0000 https://status.langfuse.com/incident/423008#723f543360a125eef3dd50544dd3a7316b323774a8203848e72ac97d6be1ef10 Our API endpoints partially fail or take a long time to respond because we are rate limited by AWS. We exceeded the network traffic limits of our infrastructure. We are upgrading the infrastructure to resolve the problem. API latencies and error rates https://status.langfuse.com/incident/423006 Thu, 22 Aug 2024 18:20:00 -0000 https://status.langfuse.com/incident/423006#ac5c7b1e6d9f5dd1e7ef42af3a88ac06519cb7fb69ed70c0b81d9bd3cbb9b16e We upgraded our infrastructure. Everything behaves as expected now.
API latencies and error rates https://status.langfuse.com/incident/423006 Thu, 22 Aug 2024 18:00:00 -0000 https://status.langfuse.com/incident/423006#7484e11ded60277dabe8dd64057fb921abf943b8c3ebd6b8220e141fb8a5fd90 Our Redis queue is not reachable due to exceeded network traffic. We are upgrading our infrastructure to resolve the issue. APIs degraded https://status.langfuse.com/incident/395806 Tue, 09 Jul 2024 18:51:00 -0000 https://status.langfuse.com/incident/395806#63c18d7cf686cc9063244788c72da9efa24c44f5c223997948cbe2ae128f6cc2 All services are fully restored.
APIs degraded https://status.langfuse.com/incident/395806 Tue, 09 Jul 2024 18:20:00 -0000 https://status.langfuse.com/incident/395806#bd87c61af2b15744bd3c57231600a772634b480a11479d3f38eeeb107e7c1943 We have identified the issue and are currently deploying a fix.
APIs degraded https://status.langfuse.com/incident/395806 Tue, 09 Jul 2024 18:15:00 -0000 https://status.langfuse.com/incident/395806#fa8cc499ec9db7892b449e5117c28ec12e2b7d4b5520b9d374df0331906f0395 A share of API calls is degraded; we are investigating the issue.
Degraded APIs in US data region https://status.langfuse.com/incident/394204 Fri, 05 Jul 2024 22:05:00 -0000 https://status.langfuse.com/incident/394204#74c7a38ccb33c3bc4db2d641910bc39ad73011be6e9f0c6b3e9adcae7b104e2a Enabling OpenTelemetry-based instrumentation on the API routes hosted on Vercel resulted in Gateway Timeouts (HTTP 504) for a portion of requests from 6:00 PM to 9:00 PM (UTC). The share of 504 timeouts across different parts of the application was as follows:
- US overall: 2.77%
- US Public API (/api/public*): 2.87%
- US Tracing (/api/public/ingestion): 0.89%
- US Prompt Management (/api/public/prompts): 14.01%
The behavior of Langfuse SDKs when the API was partially unavailable:
- Tracing: The Langfuse SDKs retried each batch of trace events, which helped reduce data loss during this partial outage.
- Prompt Management: The Langfuse SDKs cached fetched prompts and served the stale cache if the Langfuse API was not available. In cases where there was no cached prompt version (e.g., after redeployment or on new instances), this partial outage may have caused runtime exceptions.
Langfuse v3 will introduce a series of infrastructure changes to increase the overall robustness of the service and the observability stack. We are also currently migrating to EKS on AWS. The change that led to this incident is part of that overall effort and was intended to add OpenTelemetry-based observability to all services of the core application. While this change worked flawlessly on the EKS-based services, it negatively affected the current production deployment of the APIs on Vercel. The EU data region was completely unaffected by this issue.
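The SDK behavior described in this post-mortem (retry on failure, serve a stale cached prompt when the API is unavailable, and fail only when nothing is cached) can be illustrated with a minimal Python sketch. This is not the Langfuse SDK's implementation: the class `StaleOnErrorPromptCache`, the exception `PromptUnavailableError`, and the parameters `ttl_seconds`, `max_retries`, `backoff_seconds`, and `fallback` are hypothetical names chosen for illustration, and the `fetch` callable stands in for whatever request an application makes to a prompts endpoint.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PromptUnavailableError(RuntimeError):
    """Raised when a prompt can be neither fetched nor served from cache."""


class StaleOnErrorPromptCache:
    """Hypothetical cache: fresh entries are served directly, failed
    refreshes fall back to the stale entry instead of raising."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 60.0,
        max_retries: int = 2,
        backoff_seconds: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._max_retries = max_retries
        self._backoff = backoff_seconds
        self._clock = clock
        # prompt name -> (prompt text, time it was fetched)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str, fallback: Optional[str] = None) -> str:
        entry = self._entries.get(name)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]  # still fresh, no network call

        last_error: Optional[Exception] = None
        for attempt in range(self._max_retries + 1):
            try:
                value = self._fetch(name)
            except Exception as exc:  # e.g. an HTTP 504 surfaced as an error
                last_error = exc
                if attempt < self._max_retries:
                    # exponential backoff between attempts
                    time.sleep(self._backoff * (2 ** attempt))
                continue
            self._entries[name] = (value, self._clock())
            return value

        if entry is not None:
            return entry[0]  # API unavailable: serve the stale prompt
        if fallback is not None:
            return fallback  # cold start: use the caller-supplied default
        raise PromptUnavailableError(
            f"prompt {name!r} unavailable and not cached"
        ) from last_error
```

The final branch corresponds to the cold-start case named above: a freshly deployed instance has no cached version, so a failed fetch surfaces as an exception unless the caller supplies a `fallback` value.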
Degraded APIs in US data region https://status.langfuse.com/incident/394204 Fri, 05 Jul 2024 21:01:00 -0000 https://status.langfuse.com/incident/394204#0348339bb94e9cf7d72b603cb3b884d14b4d11ccdfcc6a83a59511fd308850a5 We have reverted a change to our observability stack that caused these issues.
All APIs are now fully operational. We will post a post-mortem shortly. Degraded APIs in US data region https://status.langfuse.com/incident/394204 Fri, 05 Jul 2024 18:00:00 -0000 https://status.langfuse.com/incident/394204#797c603b7845aa4fc5ac012478455e083a1162aa704603dc67cb0b0d802160fa Some APIs time out (HTTP 504 - Gateway Timeout) in the US data region; we are investigating this issue.
Downtime of EU instance https://status.langfuse.com/incident/390740 Fri, 28 Jun 2024 12:48:00 -0000 https://status.langfuse.com/incident/390740#821b46bcfd6fa91de218efb1e782f4f0e8891ea052a2de385dd33118fd5677d7 We have identified the root cause (an infrastructure migration script which overused database resources) and fixed it. All APIs are available again. Downtime of EU instance https://status.langfuse.com/incident/390740 Fri, 28 Jun 2024 12:43:00 -0000 https://status.langfuse.com/incident/390740#824318c926ff4fe27ff1df63f87a63a78fbf6c2db1556979b287d5381ce3a4e3 We are investigating the issue