Summary
On Tuesday, October 28, 2014, an issue in one of Microsoft's datacenters caused some customers in EMEA to see sandbox errors, SQL errors, or failed workflows. Fewer than 25% of the customers in the region reported the issue. Microsoft engineers investigated and took steps to mitigate it. Microsoft then contacted the customers who had escalated the issue to confirm that it was resolved.
Customer Impact
Some European customers experienced degraded performance with workflows or sandbox services, including timeout errors, workflow failures, and slowness. This service incident (SI) was initially posted as having worldwide impact, but investigation determined that the impact was limited to some customers in Europe. Customers in the Americas and the Asia Pacific region were not affected.
Incident Start Date and Time
Tuesday, October 28, 2014 at 9:00 AM UTC
Date and Time Service was Restored
Tuesday, October 28, 2014 at 5:00 PM UTC
Root Cause
An infrastructure server became overloaded with processing requests. When all of the server's resources were exhausted, application processes failed.
Next Step(s)
Issue | Next Step | Team Owner | Timeline |
Monitoring and Alerting | Review the server and service monitoring and alerting currently in place for this type of server, and identify and recommend changes. Implement the agreed-upon monitoring and alerting changes. | Microsoft Dynamics CRM Online Service Engineering | Mid-November |
Operations and Design | Review the overall system design to identify and implement short- and long-term changes that ensure platform performance and reliability. | Microsoft Dynamics CRM Teams (Product Engineering and Service Engineering) | Some related longer-term changes are already planned for an upcoming release. Near-term changes are under discussion and will be implemented as soon as feasible. |