Kernel Upgrade [04/06/2011]


Cody R.

We'll be performing a kernel upgrade on Wednesday, April 6th, between 12PM and 3PM PDT. While we use KSplice to keep our kernels up to date with critical bug and security fixes, it doesn't pull in new features from the upstream kernel. As a result, we perform semi-annual kernel updates to keep everything up to date and in line with upstream.

Date: 04/06/2011

Start time (PDT): 12:00pm

End time (PDT): 3:00pm

Duration: 1 hour

Estimated Down Time: 10 minutes


There was an issue during the maintenance of this machine. We're actively investigating and working on resolving it.

We'll post more updates shortly - we're currently waiting for data center technicians to resolve a few issues with the IPMI/virtual media we use.


There have been some unforeseen issues with the IPMI and network connectivity that are preventing us from investigating further and taking corrective action. We're working with data center technicians to get this resolved so we can actively investigate the machine.

We'll be posting more updates as we receive them.


We'll be providing more information on this issue within the next 24 hours; however, the machine is currently online. We're wrapping up our initial maintenance and expect everything to be online shortly. We'll update this thread when everything is fully online.


All sites and services should be fully accessible at this time. Thank you to everyone for being patient throughout the downtime.

As Cody mentioned, we'll be providing a more detailed explanation of today's issues within the next 24 hours, once we're able to compile all the necessary information on what led to the crash.


  • 3 weeks later...

This issue was caused by a software bug that resulted in major corruption of the operating system, requiring us to repair it from backups. The bug was supposedly fixed several years ago; however, as best we can tell, it was somehow reintroduced. It requires a specific set of circumstances to trigger, and unfortunately one of our machines produced them. We could not stop the resulting problems quickly enough to avoid affecting system availability.

Once the problem was identified, we had trouble using our rescue system on the server. This required further assistance, this time from our data center, to restore the rescue system, since we needed it to restore services. Once we had a working rescue system, we spent the rest of the time repairing the operating system, which took extensive testing before we were confident that everything was corrected and would function as it did before.

