Summary
On November 24, 2014, some CRM Online organizations hosted in one of our North American data centers went offline. Monitoring detected the issue, and Dynamics Service Engineering followed established troubleshooting processes to investigate and fix it. Fewer than 1% of customers in North America were affected.
Customer Impact
During a portion of the incident, affected customers would have experienced very slow load times or timeouts while trying to access their CRM Online organization.
Incident Start Date and Time
November 24, 2014 9:52 PM PST
Date and Time Service was Restored
November 24, 2014 10:12 PM PST
Root Cause
One of the SQL servers in a single cluster began to experience higher than normal CPU utilization, causing slow performance. The Service Engineering team received an alert for this condition and began failing over some availability groups to alternate database servers to alleviate the high CPU. Unfortunately, some of the availability groups did not fail over cleanly, and customers whose databases were in those groups experienced a brief outage until the groups were brought back online.
Next Step(s)
Issue | Next Step | Team Owner | Timeline |
Not all databases failed over cleanly | Review logs and other data to determine why some databases did not fail over cleanly. | Microsoft Dynamics CRM Online Service Engineering | Underway |
High CPU utilization | Investigate the cause of the high CPU utilization on the single SQL server and determine how to protect the service should it recur. | Microsoft Dynamics CRM Online Service Engineering | Underway |