Sustained HTTP 500 errors on Portal
Incident Report for Patchman
Resolved
We have seen no further regressions. We will continue to monitor closely, but are closing this incident for now. We will publish a post-mortem for this incident soon.
Posted Jan 22, 2024 - 10:38 CET
Monitoring
At this time all errors appear to have been resolved. We will continue to monitor the situation for several days. If you do encounter anything you are unsure about, please don't hesitate to reach out to support@patchman.co.
Posted Jan 19, 2024 - 14:15 CET
Update
There are still certain errors related to API authentication and key management. We are aware of the cause and are working on its resolution.
Posted Jan 19, 2024 - 09:20 CET
Update
All systems are operational again, although the Data Processing Infrastructure is still processing a significant backlog. It may take some time for the data in the Portal to be up to date again.

We are continuing work on the backend to run additional checks and verifications and to add further redundancy.
Posted Jan 19, 2024 - 09:13 CET
Update
The restoration has been completed, and we are bringing our services back up again.
Posted Jan 19, 2024 - 09:07 CET
Update
We are currently restoring a backup from cold storage, which we expect to take several hours to complete. We will update this post once there is progress to report on.
Posted Jan 18, 2024 - 19:49 CET
Update
Our restoration process has hit a problem, and we have to re-evaluate the next steps. There are still fallback plans, but we have to validate them before we can decide which one to use.
Posted Jan 18, 2024 - 18:43 CET
Update
Another one of our redundancy steps is taking longer than expected. We understand that it is frustrating not to have a reliable estimate for when the system will be back online. We are doing everything we can to get this done as soon as possible. At this point we are bottlenecked by the time our systems need to process the large amount of data held in the affected database.
Posted Jan 18, 2024 - 17:58 CET
Update
The recovery process is proceeding steadily, but we have hit a delay because one step took longer than expected. We now expect to be back online at 18:15 CET (UTC+1).
Posted Jan 18, 2024 - 17:25 CET
Update
A viable path to restoration has been identified, and we are currently going through the steps of making this happen. We have various redundant steps and failsafes in this procedure that give us the best chance possible, but all these additional steps mean that the process will take some time to complete. At this point, we expect the remaining downtime to last until roughly 17:30 CET (UTC+1).
Posted Jan 18, 2024 - 16:08 CET
Identified
Unfortunately, we have concluded that part of our main database system for the Patchman Portal has been corrupted. We are currently determining which remediation is safest: cluster recovery or backup restoration. Our goal is of course to minimize data loss, but we want to make sure that the solution we choose fully remediates the corruption.
Posted Jan 18, 2024 - 15:04 CET
Update
So far we have determined that there is a consistency problem in our database backend. We are trying to identify the exact problem so that we can apply the correct resolution. For now, to protect your data in our system, we will unfortunately need to keep the Portal down.
Posted Jan 18, 2024 - 14:14 CET
Investigating
We are currently investigating an issue that is causing the Portal to produce consistent HTTP 500 errors. At this time the Portal is completely unavailable.
Posted Jan 18, 2024 - 13:54 CET
This incident affected: Portal, Customer API, and Data Processing Infrastructure.