Emergency Maintenance [08/01/2009]


Posted

This morning at 9:42 AM CDT we were alerted to a degraded array on the Marlin server. Upon inspection we discovered two failed drives as well as a failed backup battery unit. The battery unit protects against power failure: any data that has not yet been written to disk is preserved and written once power is restored. Without the battery, the write caching feature is disabled, which decreases performance.
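For anyone curious why a failed battery affects performance, here is a rough sketch of the behavior described above. It is a toy model, not the actual firmware logic of the card in Marlin; the class and method names (`RaidController`, `write`, `power_restored`) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RaidController:
    """Toy model of a RAID card with a battery-backed write cache (hypothetical)."""

    battery_ok: bool
    cache: list = field(default_factory=list)  # writes held in controller memory
    disk: list = field(default_factory=list)   # writes committed to the drives

    def write(self, block: str) -> str:
        if self.battery_ok:
            # Write caching enabled: acknowledge as soon as the data is in
            # the cache, which is fast. The battery keeps it safe if power fails.
            self.cache.append(block)
            return "cached"
        # No working battery: caching is disabled, so every write has to
        # reach the disk before it is acknowledged, which is slower.
        self.disk.append(block)
        return "written"

    def power_restored(self) -> None:
        # Data preserved by the battery is written to disk once power returns.
        self.disk.extend(self.cache)
        self.cache.clear()


healthy = RaidController(battery_ok=True)
degraded = RaidController(battery_ok=False)
print(healthy.write("block-1"))   # acknowledged from cache
print(degraded.write("block-1"))  # acknowledged only after reaching disk
```

With a healthy battery the write is acknowledged from the cache; with a failed battery it goes straight to disk, which is where the performance drop comes from.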

We are still investigating our options, as this machine has had a backup battery fail before. We may choose to replace the RAID card in this instance.

We will be doing this maintenance as soon as possible, which will unfortunately require downtime. Once a datacenter technician has made an assessment, we'll update this thread with the estimated downtime and when the maintenance will be done.

Posted

We will be replacing the backup battery unit within the next 30 minutes. The replacement will most likely take 15 minutes, and unfortunately the machine will have to be down while it is done. Once that is complete, we will replace the two bad drives while the server is online (hot swap).

Posted

The drives have been replaced and the array is now rebuilding. Unfortunately, I/O wait will be much higher than normal during the rebuild because the server currently has only two functioning drives.
