Service Disruption 2023-02-10
On Friday 10th of February at around 13:30 we detected a severe slowdown in the performance of the Manu Online application. The degradation was so severe that our system was in practice unusable.
Our investigation found no problems with our servers. The problem was with the data centre network, which was preventing the servers from communicating properly with each other. We quickly escalated this problem to Rackspace, our data centre provider. They confirmed that there was a wider data centre disruption outside of Manu Online systems. All customers of their London data centre were affected.
Rackspace fixed the problem, but it did take them some time. Here is their statement:
“On 10 February 2023, starting approximately at 11:40 UTC, a portion of customers may have experienced a brief period of latency or packet loss as traffic failed over to the redundant device in the LON3 data center due to high CPU utilization in the aggregation router. At 14:39 UST, engineers applied changes to the offended ports on the affected aggregation router which solved the issue.”
We apologise for this rare service disruption. Over the years, Manu Online has occasionally suffered a few hours of network failure. However, we are still proud that the system has never taken a full day off work.