President-Anonymous Posted August 28, 2012

I reported this via ticket 3 months ago, when it was around 70%. We're now getting close to 85-90%.
Brian Posted August 29, 2012

We have monitoring set up for each partition on every system, with thresholds that trigger when the remaining drive space reaches a warning level and a critical level. Should a partition fill up for whatever reason (this happens very, very rarely), we also have safeguards in place to clear space on the partition so the system continues to function.

One other thing to note is that percentages can be misleading depending on the size of the partition. You could have a warning saying a partition is 80% full, yet there could be 10GB free on it, which may be 5x more space than that particular partition should ever need.
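To illustrate the point about percentages versus absolute free space, here is a minimal sketch of a per-partition check with a warning and a critical level. It is not the host's actual monitoring: the function names `classify` and `check_partition`, the 80% warning threshold and the 2 GiB critical floor are all hypothetical placeholders. Only `shutil.disk_usage` is a real standard-library call.

```python
import os
import shutil

GIB = 1024 ** 3

# Hypothetical thresholds for illustration only; the host's real values were not stated.
WARN_PERCENT = 80.0          # percent of usable space in use
MIN_FREE_BYTES = 2 * GIB     # absolute floor: below this, the partition is critical


def classify(used: int, free: int,
             warn_percent: float = WARN_PERCENT,
             min_free: int = MIN_FREE_BYTES) -> str:
    """Classify one partition as "ok", "warning" or "critical".

    The percentage is computed like df's Use% column, used / (used + free).
    Absolute free space decides "critical"; the percentage alone can only
    raise a "warning".
    """
    usable = used + free
    if usable == 0 or free < min_free:
        return "critical"
    used_percent = 100.0 * used / usable
    if used_percent >= warn_percent:
        return "warning"
    return "ok"


def check_partition(path: str) -> str:
    """Read live usage for the filesystem containing `path` and classify it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify(usage.used, usage.free)


if __name__ == "__main__":
    for mount in ("/", "/var"):
        if os.path.isdir(mount):  # /var does not exist on every OS
            print(mount, check_partition(mount))
```

With these placeholder numbers, a partition that is 96% full but still has 4 GiB free only raises a warning, while a small partition with under 2 GiB free is critical regardless of its percentage.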
President-Anonymous Posted August 30, 2012

Either way, overall it does seem better.
Fowler Posted June 1, 2013

> We have monitoring set up for each partition on every system, with thresholds that trigger when the remaining drive space reaches a warning level and a critical level. Should a partition fill up for whatever reason (this happens very, very rarely), we also have safeguards in place to clear space on the partition so the system continues to function.
>
> One other thing to note is that percentages can be misleading depending on the size of the partition. You could have a warning saying a partition is 80% full, yet there could be 10GB free on it, which may be 5x more space than that particular partition should ever need.

AMS001 just hit 100% usage on /var. The result was 500 errors and white pages. A little while later it dropped to 97% and the sites came back online. The page still doesn't make for good viewing, but at least things are working again.

I mention this for two reasons:
1) How can you lot let it get to 100%?
2) What were the "safeguards"? To be honest, I don't think they worked.

For weeks I have been seeing it get closer and closer to 100%, but I had this topic in the back of my mind and thought surely they would not let it reach 100%. Unfortunately, I was wrong.

Seriously guys... You need to up your game a bit. This is just the latest in a long line of issues with you. I really would struggle to recommend you lot to my worst enemies.
Brian Posted June 2, 2013

> 1) How can you lot let it get to 100%?

It doesn't take much, unfortunately. I'm not going to make excuses and say that sitting at 95% or higher is okay, but the percentages matter less than the actual amount of disk space left. Even at 97% full there is still over 4GB of space free. AMS001 is not being used for new accounts, so in theory there is no reason for usage on any given partition to grow rapidly in a short period of time.

How does it happen, then? Tons of reasons, most of which are one-off issues that we fix once we identify them. In this instance our backup software went haywire and wrote a large amount of data in a short time. I'd venture a guess that even with 10GB free on /var we might have hit 100% at the rate it was writing to the partition. Mistakes happen, but this wasn't due to a lack of due diligence.

> 2) What were the "safeguards"? To be honest, I don't think they worked.

You're right, in this case they didn't. They do work 99% of the time, though, and that is why we have so little downtime and so few persistent server issues. I hate to say it, but in this case even our safeguards couldn't have prevented the issue; the backup software was simply on a mission to fill that partition.

> For weeks I have been seeing it get closer and closer to 100%, but I had this topic in the back of my mind and thought surely they would not let it reach 100%. Unfortunately, I was wrong.

I don't see the harm in giving us a poke if you notice something like this. You ultimately stand to benefit if you report an issue that may eventually cause downtime for your website.

> Seriously guys... You need to up your game a bit. This is just the latest in a long line of issues with you. I really would struggle to recommend you lot to my worst enemies.

I'd like to know more about this. I think we're having fewer issues now than we were 6 months ago, and the issues we do have are handled more efficiently and with a stronger focus on customer service than before. I'm proud of the changes we've made, and I definitely don't think we've taken a step back in any aspect of our operation.

I think you've been around long enough to know we don't sweep issues under the rug. When something is our fault, we hold ourselves accountable and do our damnedest to fix it permanently. I'm not happy this happened, and I don't like seeing customers as upset as you are, but at the same time I realize we're not going to have 100% uptime 365 days a year. I think we fixed this very quickly, and we have already begun to address the underlying problem. Feel free to PM me if you want to continue the discussion; I'm all ears.
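Brian's remark that even 10GB free might not have saved /var from a runaway backup job is the case a fixed threshold handles poorly. One common complement is a rate-based check that projects when a partition will fill. The sketch below assumes two free-space samples taken a known interval apart and a linear write rate; the names `seconds_until_full` and `should_alert` and the one-hour horizon are hypothetical, not the host's actual tooling.

```python
from typing import Optional

GIB = 1024 ** 3


def seconds_until_full(free_before: int, free_after: int,
                       interval_s: float) -> Optional[float]:
    """Project when free space hits zero, assuming the recent write rate holds.

    Uses two free-space samples taken interval_s seconds apart. Returns None
    when free space held steady or grew, since no fill is projected.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    consumed = free_before - free_after      # bytes written during the interval
    if consumed <= 0:
        return None
    rate = consumed / interval_s             # bytes per second
    return free_after / rate


def should_alert(free_before: int, free_after: int, interval_s: float,
                 horizon_s: float = 3600.0) -> bool:
    """Alert if the partition is projected to fill within horizon_s seconds."""
    eta = seconds_until_full(free_before, free_after, interval_s)
    return eta is not None and eta <= horizon_s


if __name__ == "__main__":
    # 10 GiB free, then 9 GiB free one minute later: about 540 s of headroom left.
    eta = seconds_until_full(10 * GIB, 9 * GIB, 60.0)
    print(f"projected full in {eta:.0f} s")
```

A writer consuming roughly 1GB a minute turns 10GB of free space into about ten minutes of headroom, so this kind of check can fire while the percentage-based thresholds still look comfortable.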