Portal and Customer API intermittently unavailable

Incident Report for Patchman

Resolved

No changes or abnormalities have been observed during the extended heightened monitoring period and all systems and services have continued to operate normally. This incident has been fully resolved.

Posted Sep 24, 2019 - 11:49 CEST

Update

No changes have been observed in the last 24 hours. However, due to the scale and impact of this incident, we will maintain heightened monitoring for the next 48 hours.

Posted Sep 20, 2019 - 09:12 CEST

Update

We can confirm that all systems still look good and are performing as expected.

Posted Sep 19, 2019 - 13:10 CEST

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Sep 19, 2019 - 07:32 CEST

Update

All systems are back online and performing as expected. We are still working on adding more resiliency, but none of this work will be noticeable to customers.

The backlog of server scan/detection data processing remains and is currently estimated to take at least 6 more hours to process.

Posted Sep 19, 2019 - 06:29 CEST

Update

The Data Processing Backend is also being brought back online in phases.

Please note that there is currently a significant backlog in the processing of data sent from servers, which causes data in the Portal to be out of date. This will resolve automatically over the next several hours.

Posted Sep 19, 2019 - 06:18 CEST

Update

We are gradually bringing services back online. The Portal and Customer API are operating again, and we are closely monitoring their resumption of service. Work on the other components is continuing in the meantime.

Posted Sep 19, 2019 - 06:10 CEST

Update

We hope to resume service within the next two hours.

Posted Sep 19, 2019 - 05:01 CEST

Update

The work is still ongoing. We will be switching to updates every two hours.

Posted Sep 19, 2019 - 03:01 CEST

Update

We are currently performing a series of additional backups and expanding our redundant infrastructure. To keep the risk as low as possible, we want to complete all of this before we resume service.

Rest assured that we can confirm no data loss is involved in this incident, and we do not currently expect to be at risk of any. The current downtime is purely by choice, to give ourselves room to take the necessary precautions against future incidents.

Another update will be provided in an hour.

Posted Sep 19, 2019 - 02:04 CEST

Update

Due to the current pace of verification work, we will be switching to hourly updates on this incident.

Posted Sep 19, 2019 - 01:04 CEST

Update

Our stability verification is still ongoing. Unfortunately, we are unable to provide an ETA at this time.

We understand the problems this is causing for our customers, but we want to emphasize that we believe this is currently the most prudent course of action to prevent further issues and to strengthen the platform going forward. We sincerely apologize for the inconvenience.

We will provide another update on this incident within 30 minutes.

Posted Sep 19, 2019 - 00:34 CEST

Update

Our stability verification is still ongoing. Unfortunately, we are unable to provide an ETA at this time.

We will provide another update on this incident within 30 minutes.

Posted Sep 19, 2019 - 00:07 CEST

Update

Damaged components of the stack have been replaced. Before we resume service, we are running additional safety and hardening checks to verify the continued stability of the cluster.

We will provide another update on this incident within 30 minutes.

Posted Sep 18, 2019 - 23:41 CEST

Identified

We have identified a major outage in our database cluster. Work is currently in progress to resolve this. Unfortunately, we are unable to provide an ETA at this time.

We will provide another update on this incident within 30 minutes.

Posted Sep 18, 2019 - 23:32 CEST

Update

The Data Processing Backend cluster is also affected. We are still investigating.

Posted Sep 18, 2019 - 23:17 CEST

Investigating

The Portal and Customer API are experiencing issues with availability. We are investigating the issue.

Posted Sep 18, 2019 - 23:10 CEST

This incident affected: Portal, Customer API, and Data Processing Infrastructure.