Update - Incident Update — March 5, 2026

On the afternoon of March 4th, we confirmed the primary root cause of this incident. We have monitored our infrastructure continuously since then and are not seeing ongoing impact to our clients.

Root Cause:

This incident was caused by a series of compounding issues, but the primary root cause is a performance flaw in a third-party SQL parsing library used throughout our platform. We identified the problematic behavior early in our investigation, but because SQL parsing is a centralized function critical to Deploy, Refresh, and API operations, we could not simply disable it. Fixing it required coordination with the third-party vendor, who delivered a resolution to us this morning.

What We Did While Awaiting the Vendor Fix:

Rather than wait, we invested heavily in hardening the platform against the impact of this flaw:

  - Reliability improvements: Added timeout and retry mechanisms to prevent jobs from hanging indefinitely on unresponsive connections
  - Resource isolation: Separated heavy SQL parsing workloads onto dedicated infrastructure so they no longer block critical scheduler operations
  - Performance fixes: Identified and resolved a code-level regression that was amplifying the parsing cost per request
  - Monitoring & alerting: Deployed new observability tooling and health checks that detect a degraded state and automatically remediate the application
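
As a rough illustration of the timeout and retry approach in the first bullet above, the sketch below bounds each attempt with a timeout and retries with exponential backoff. It is an illustrative sketch only, not the production implementation; the function name `run_with_timeout_and_retry` and all of its parameters are hypothetical and chosen for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout_and_retry(task, timeout_s=1.0, max_attempts=3, backoff_s=0.1):
    """Call task(); abandon an attempt after timeout_s and retry with backoff.

    Hypothetical helper for illustration. Note that a timed-out attempt keeps
    running in its background thread; a real system would also need to cancel
    the underlying connection or query.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # A fresh single-thread executor per attempt, so a hung attempt
        # cannot block the next one.
        executor = ThreadPoolExecutor(max_workers=1)
        future = executor.submit(task)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            last_error = TimeoutError(f"attempt {attempt} exceeded {timeout_s}s")
        except Exception as exc:  # the task itself failed; retry it as well
            last_error = exc
        finally:
            executor.shutdown(wait=False)
        if attempt < max_attempts:
            # Exponential backoff: backoff_s, 2 * backoff_s, 4 * backoff_s, ...
            time.sleep(backoff_s * 2 ** (attempt - 1))
    raise last_error
```

With this shape, a job that hangs on an unresponsive connection fails after a bounded time instead of waiting indefinitely, and transient slowness gets a limited number of further attempts.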

Today's Release:

Today we released version 7.30.6, which includes additional reliability and performance improvements that further reduce the impact of the underlying parsing flaw. We will continue to monitor closely and will ship the vendor's fix as soon as it passes our quality checks.

Next Steps:

 - We are preparing a detailed Incident Report that will be available upon request, including a full timeline and additional details.
 - We are testing an updated fix from our vendor and expect to release an update early next week, pending our quality checks.

We appreciate your patience and collaboration while this incident has affected your operations.

Mar 06, 2026 - 03:28 UTC
Update - Starting at approximately 10 PM Pacific time on Tuesday, March 3rd, a single resource in our Scheduler API resource pool degraded in performance and did not recover. This affected performance across the product because the Scheduler is our core API service. At 9 AM Pacific time on Wednesday, March 4th, this resource was restarted and operations returned to normal.

We are still investigating why we experienced a similar scenario on the morning of March 3rd. We have added monitoring and alerting for this specific behavior to reduce the impact if it recurs before we deliver a full resolution.
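
To illustrate the kind of check that can catch a single resource that degrades and does not recover, the sketch below flags a resource once every latency sample in a rolling window exceeds a threshold. This is a minimal sketch under stated assumptions, not the monitoring we deployed; the class `DegradationMonitor`, the threshold, the window size, and the `on_degraded` callback are all hypothetical.

```python
from collections import deque


class DegradationMonitor:
    """Flag a resource when its latency stays above a threshold (illustrative)."""

    def __init__(self, threshold_ms=500.0, window=5, on_degraded=None):
        self.threshold_ms = threshold_ms
        self.window = window
        # Hypothetical remediation hook, e.g. page on-call or restart the resource.
        self.on_degraded = on_degraded or (lambda resource_id: None)
        self._samples = {}     # resource_id -> recent latency samples
        self._flagged = set()  # resources already reported as degraded

    def record(self, resource_id, latency_ms):
        """Record one sample; return True while the resource is degraded."""
        samples = self._samples.setdefault(resource_id, deque(maxlen=self.window))
        samples.append(latency_ms)
        # Degraded only if the window is full and every sample is slow, so a
        # single slow request does not trigger remediation.
        sustained = len(samples) == self.window and min(samples) > self.threshold_ms
        if sustained and resource_id not in self._flagged:
            self._flagged.add(resource_id)
            self.on_degraded(resource_id)  # fire once per degradation episode
        elif not sustained:
            self._flagged.discard(resource_id)
        return sustained
```

Requiring a full window of slow samples trades a short detection delay for fewer false alarms, and firing the callback once per episode avoids repeated restarts of the same resource.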

Updates to follow with a full timeline and root cause of this entire incident.

Mar 04, 2026 - 21:47 UTC
Update - We identified a resource that needed to be restarted and the DEADLINE_EXCEEDED errors clients were seeing should cease.

This was unrelated to the overall incident we are tracking here, and work will continue toward fully resolving that incident.

Mar 03, 2026 - 15:07 UTC
Update - Clients are reporting errors running refresh and deploy operations. We are investigating and will provide updates.
Mar 03, 2026 - 11:54 UTC
Update - We are no longer seeing general or widespread issues with our operations and will continue to monitor this incident over the weekend as we work towards a final solution and root cause analysis.
Feb 28, 2026 - 00:09 UTC
Update - The update we released yesterday has resolved the ongoing Deploy and Refresh failures our clients were experiencing. We are continuing to monitor and will provide details on the root cause of the issue as soon as they are available.

A small number of our clients are seeing delays running Deploy operations that are unrelated to the ongoing issue this week. We have identified a likely root cause and, once it is confirmed and resolved, will transition this issue to Operational.

Feb 27, 2026 - 17:48 UTC
Update - Clients may still see issues with Deploy and Refresh operations, which we are continuing to monitor. We have released an additional update, version 7.29.5, that includes follow-on improvements to the reliability of our Refresh and Deploy operations. These improvements are also included in version 7.29.5 of our coa CLI, which is a recommended upgrade for all customers.

We are continuing to monitor and provide updates as this incident progresses.

Feb 27, 2026 - 01:06 UTC
Update - Clients can expect to continue experiencing delays and timeouts. We have developed an additional reliability optimization and are targeting its release within the next 2 hours. We will post our next update here when it is available.
Feb 26, 2026 - 23:39 UTC
Update - We have released a new version, 7.29.4, that mitigates the performance issues our customers are experiencing. We expect it to reduce customer impact from this issue. We recommend that any customers using the coa CLI upgrade.

We will continue to update this issue as we make progress towards the root cause.

Feb 26, 2026 - 20:53 UTC
Update - Between 2 and 5 AM Pacific time, we discovered that jobs were again backing up and not processing as expected.

Engineering has developed a patch to resolve this issue, which we are targeting for release by 12 PM Pacific. We will continue to monitor and resolve any issues with job processing and will update this incident when the patch has shipped.

Feb 26, 2026 - 16:43 UTC
Update - Jobs are no longer timing out and the Scheduler is operational again. We will continue to monitor over the next 12-24 hours and will provide an update tomorrow with more details on the cause of the service interruption.
Feb 26, 2026 - 00:06 UTC
Monitoring - We are no longer observing job timeouts at this time. Our team continues to monitor the system closely to ensure stability. We will provide further updates if anything changes.
Feb 25, 2026 - 17:37 UTC
Update - We are continuing to investigate. We will post updates at the top of each hour, or sooner if new information becomes available.
Feb 25, 2026 - 17:15 UTC
Identified - Job scheduling issues in the US region have recurred. We have identified that certain jobs are becoming unresponsive and blocking the rest of the queue. We are intervening to clear the backlog and are investigating the underlying cause of these stalled processes. Customers should expect intermittent delays or timeouts in the interim.
Feb 25, 2026 - 16:10 UTC
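
The update above describes unresponsive jobs blocking the rest of the queue. As a generic illustration of how a per-job deadline keeps one stalled job from holding up the others, here is a minimal sketch. It does not represent our Scheduler; the job names, durations, deadline, and concurrency limit are hypothetical, and `asyncio.sleep` stands in for real work.

```python
import asyncio


async def run_job(name, duration_s, deadline_s):
    """Run one simulated job, cancelling it if it exceeds deadline_s."""
    try:
        # asyncio.sleep is a stand-in for the real job body.
        await asyncio.wait_for(asyncio.sleep(duration_s), timeout=deadline_s)
        return (name, "done")
    except asyncio.TimeoutError:
        return (name, "timed out")


async def drain_queue(jobs, deadline_s=0.1, concurrency=2):
    """Process (name, duration_s) jobs with bounded concurrency."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(name, duration_s):
        # The worker slot is released when the job finishes or times out,
        # so a stalled job can only occupy a slot for deadline_s.
        async with semaphore:
            return await run_job(name, duration_s, deadline_s)

    # gather returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(n, d) for n, d in jobs))


if __name__ == "__main__":
    queue = [("stalled", 5.0), ("quick-1", 0.01), ("quick-2", 0.01)]
    print(asyncio.run(drain_queue(queue)))
```

In this example the "stalled" job is cancelled at its deadline while both quick jobs complete, rather than the whole queue waiting behind the stalled one.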
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 25, 2026 - 14:07 UTC
Identified - We have identified a bottleneck in our job scheduling queue in the US region. We are currently recovering our infrastructure and monitoring the backlog of stale jobs. Customers may continue to see some older jobs fail, but new jobs are starting to process. We will provide further updates as the queue returns to normal levels.
Feb 25, 2026 - 13:13 UTC
Investigating - We are investigating reports of jobs intermittently timing out. Our team is actively working to identify the root cause and will provide updates as soon as more information becomes available.
Feb 25, 2026 - 11:21 UTC

About This Site

We publish in real time the operational status of our systems as well as descriptions of historical incidents. Incidents published on this site may reflect downtime due to scheduled maintenance or issues related to external applications or third parties.

If you are experiencing an operational issue with our services please inform us so that we can assist you:

Web: https://support.coalesce.io
Email: support@coalesce.io

Catalog Operational
API Operational
Chrome Extension Operational
Microsoft Teams App Operational
Slack App Operational
Web App Operational
AI Capabilities Operational
Transform Operational
API Operational
Scheduler Operational
Web App Operational
Mar 6, 2026

Unresolved incident: Intermittent Job Timeouts.
