Tony Posted July 15, 2009
The saturn server went offline at 8:36 PM CDT, and we are currently investigating the problem. We believe it may be related to the file system for the home directory going read-only. We will update this thread once we have more information.
Tony Posted July 15, 2009
The machine is back online and serving requests once again. It may be slightly slower than usual for the next 10 minutes while it rebuilds caches in memory for various applications.
Tony Posted July 15, 2009
The file system has gone read-only once again. We're investigating some unusual errors coming from the RAID card.
Tony Posted July 15, 2009
We're still working on it. For the time being, the server can serve requests, but you'll be unable to add new data until we resolve the underlying cause of the file system issues.
Tony Posted July 15, 2009
We're going to run an fsck on the /home partition, as we have been unable to keep the partition from going read-only. Unfortunately, this means the machine may be down for up to an hour. We assume something with one of the drives caused the issue, and it's not something we can correct without running an fsck. This is one of the cases where the downtime is necessary for the health of the system and the data on the machine. We are also continuing to investigate the RAID itself.
Tony Posted July 15, 2009
We're going to play it safe and run an fsck on all the other partitions as well, since they're quite small in comparison to /home. It will still take some time; we'll update once we have more information.
Tony Posted July 15, 2009
The fsck on the /home partition just reached phase 2 and is now checking the directory structure. Hopefully it will be done soon.
Tony Posted July 15, 2009
The server should be back online now and will hopefully stay that way.
Tony Posted July 15, 2009
The process of verifying the integrity of the RAID array is still ongoing. Unfortunately, it is causing some slowness on the server, which we cannot correct. This is a necessary step in maintaining the integrity of the data on the server.
Tony Posted July 16, 2009
The verification should be finished, and everything is running at full speed again. For the most part, though, things have been very fast today once it got through the initial portion. We've now restarted the tasks that had been stopped in the meantime (stats runs and backups).