Tony Posted September 16, 2009
The Jedi server is once again acting up, which we believe is caused by the RAID card's firmware. Datacenter technicians have suggested that we upgrade the firmware, which will hopefully solve this issue. As it stands, the RAID card reports all drives as healthy while also reporting a degraded array, and its logs show that one of the drives has in fact failed. For these reasons, tonight between 11:00 PM and 1:00 AM EDT we will be updating the server's RAID card firmware, despite the short notice. Upgrading the firmware requires taking the server offline, so unfortunately it will be down for 15-30 minutes. Once the server is back up, we will start all virtual machine instances again. We will also use this maintenance window to upgrade the node's kernel to the latest version. We have encountered several bugs that do not affect service, but we would like to see them fixed. We're very sorry for any inconvenience this may cause, but this maintenance is necessary so that there is no chance of data loss.
Date: 09/16/2009
Start time (EDT): 11:00 PM
End time (EDT): 1:00 AM
Duration: 2 hours
Estimated downtime: 15-30 minutes
Tony Posted September 17, 2009
The server has just gone offline so that this maintenance can be completed.
Tony Posted September 17, 2009
The server has been back up for 5 minutes now; we're just working on getting all the virtual machines started back up.
Tony Posted September 17, 2009
All virtual machines should be up and running now. Everything went as planned: the firmware is now up to date, and the RAID card's reporting system is no longer giving us misinformation. The kernel upgrade has also fixed some lingering bugs on the system itself. The total downtime for all of this was 20 minutes, plus about another 10 minutes to get every virtual machine back up and running.
Tony Posted September 17, 2009
The server is running a verify on the array in the background, just in case one of the drives is bad or holds bad data. Other than that, everything is resolved, so this window is being marked as completed.