This maintenance batch has completed, and all affected services are back online.
Posted 2 days ago. Jan 17, 2018 - 05:15 CET
The Portal, Customer API and Data Processing Backend have been temporarily disabled while maintenance is performed on multiple clusters that these systems depend on.
Posted 2 days ago. Jan 17, 2018 - 05:01 CET
Our next batch will run tonight at 5:00 AM CET and will include multiple components of multiple clusters. To minimize the risk of unexpected side effects, we will temporarily shut down the Portal web interface and Customer API altogether while maintenance is running. We expect the downtime to last roughly 15 minutes.
Posted 3 days ago. Jan 16, 2018 - 09:43 CET
Rebalancing has completed.
We will next update this incident once the next maintenance batch has been scheduled.
Posted 4 days ago. Jan 15, 2018 - 10:43 CET
After last night's successful security maintenance, we will now rebalance our clusters to bring everything back to its original, stable state. This may once again cause some Agent API connections to reset.
Posted 4 days ago. Jan 15, 2018 - 10:01 CET
All preparations for tonight's maintenance have been completed. None of the maintenance should have any noticeable service impact for our customers.
Posted 5 days ago. Jan 14, 2018 - 22:44 CET
A batch of servers will be rebooted around 3 AM UTC tonight, including machines in the message queue and Agent API clusters. To minimize the impact, we will preventively take certain nodes out of their respective clusters for the duration of the maintenance and rebalance some clusters to handle the modifications. As a result of these changes, you may see Agent API connections being cycled, and the Data Processing backends will be running at reduced capacity for the night.
Posted 5 days ago. Jan 14, 2018 - 22:10 CET
Our infrastructure provider has notified us that they have started maintenance on our systems, which involves full machine reboots. Due to the severity of the situation, all of this will occur at very short notice. We will do our best to keep this post updated as maintenance progresses, to inform you of which systems may be temporarily unavailable as a result.
Posted 9 days ago. Jan 09, 2018 - 23:48 CET
All systems are fully operational again and the backlog on the Agent API has been processed.
We expect to take more steps in the coming days, so we will keep this incident open. For now, we will be monitoring our systems to verify that our maintenance has had no unintended side effects.
Posted 14 days ago. Jan 05, 2018 - 14:02 CET
Our final step of maintenance will involve a short downtime on the Portal and Customer API of roughly 2 minutes.
Posted 14 days ago. Jan 05, 2018 - 13:56 CET
All connections to the Agent API are now re-establishing.
Posted 14 days ago. Jan 05, 2018 - 12:54 CET
The networking problems have been resolved. We are continuing with a maintenance step that will reset all manager connections one final time.
Posted 14 days ago. Jan 05, 2018 - 12:50 CET
We are currently seeing networking issues on the Agent API that appear unrelated to the current maintenance. We will be investigating this before continuing with the maintenance.
Posted 14 days ago. Jan 05, 2018 - 12:32 CET
The Agent API is fully operational again. Note that due to the significant downtime, our data processing backend is currently dealing with a small backlog, which will be processed over the next few hours.
Posted 14 days ago. Jan 05, 2018 - 12:07 CET
The Agent API is resuming normal operation. It may take some time for all connections to be restored.
Posted 14 days ago. Jan 05, 2018 - 11:45 CET
Some of the maintenance is unfortunately taking longer than initially estimated. The Agent API is currently still unavailable. We are working hard to make sure it resumes service as soon as possible.
Posted 14 days ago. Jan 05, 2018 - 11:40 CET
We are now temporarily disabling our entire Agent API cluster to give ourselves the freedom to perform all the necessary steps in rapid succession without requiring complete intermediate connection recovery. We expect this downtime to total 5 to 10 minutes.
Posted 14 days ago. Jan 05, 2018 - 11:17 CET
We have performed the initial batch of updates that can be applied without service interruption. Our next step is to perform selective updates to our Agent API platform. Since this cluster allows connections to fail over from one node to another, we expect little interruption, but you may see temporary connection loss reported by your agent.
Posted 14 days ago. Jan 05, 2018 - 10:26 CET
We would like to make our customers aware that we, in conjunction with our infrastructure provider, will be performing high-priority security updates and system reboots at very short notice to address recently discovered vulnerabilities in the CPU architecture of our infrastructure, published and covered under the names Meltdown (CVE-2017-5754) and Spectre (CVE-2017-5753, CVE-2017-5715).
We will be updating this post with more information as we progress, and will inform you about any service unavailability resulting from this emergency maintenance.