Portal, Customer API and data processing backend unavailable
Incident Report for Patchman
Resolved
There have been no regressions on our platform. We will plan maintenance to reintroduce the previously faulty node into the database cluster when it is required. For now, we are closing this incident.
Posted Jan 26, 2018 - 16:23 CET
Monitoring
We continue to monitor the situation. While we have finished what we considered to be emergency maintenance on the faulty database node, we are still running lower-profile tests on it to validate that our steps have solved the problems.
Posted Jan 26, 2018 - 12:13 CET
Update
The cluster rebalancing has been completed and all services are fully operational. Emergency maintenance on the faulty database node is ongoing.
Posted Jan 26, 2018 - 10:43 CET
Identified
Some of the steps needed to prevent future regressions require significant downtime on one of our database nodes. We will rebalance our database cluster to take the problematic node out of service. This may cause short periods of unavailability on the Portal and Customer API.
Posted Jan 26, 2018 - 10:31 CET
Update
All systems have remained stable over the past hour. We are still investigating the root cause and taking measures to prevent this from happening again.
Posted Jan 26, 2018 - 10:06 CET
Monitoring
All services are back online. We are closely monitoring the situation for regressions and are addressing the problem that caused this outage in the first place.
Posted Jan 26, 2018 - 09:15 CET
Update
The Portal and Customer API are back online. We are still working on the data processing backend.
Posted Jan 26, 2018 - 09:10 CET
Identified
Due to issues with our database cluster, we are seeing intermittent failures on various services. The cause has been identified and we are working on a solution.
Posted Jan 26, 2018 - 09:05 CET