Everything posted by Tony
-
Network connectivity has been interrupted at this point as work on replacing the Cisco 6509-E Chassis has started.
-
Here's the official notification from networking as to the cause of the issues for the Cobra server:
-
The server has a working network once again. We'll post further information about the network outage once we have it from networking.
-
Just received notice from networking that there is an issue with the switch the machine is on and it's affecting all machines connected to it. Networking is currently working on fixing the issue.
-
The datacenter networking team is currently looking into the issue and we hope to have an update soon with further information.
-
We are currently investigating network connectivity issues on the Cobra server. Once we have further information it will be posted here.
-
This issue was caused by a software bug that resulted in major corruption of the operating system, requiring us to repair it from backups. The bug was supposedly fixed several years ago; however, as best we can tell, it was somehow reintroduced. It takes a certain set of circumstances to trigger, and unfortunately one of our machines produced them. We could not stop the resulting problems quickly enough to avoid affecting system availability. Once the problem was identified, we had issues using our rescue systems on the server, so we needed further assistance from our datacenter to restore the rescue system, which we needed in order to restore services. Once we had a working rescue system, we spent the rest of the time repairing just the operating system, which took extensive testing before we were confident that everything was corrected and would function as it did before.
-
Everything is back up now; we're just waiting for networking to give their report on the issue.
-
Everything in Seattle is down once again while the router is reloaded, which will hopefully be a permanent fix for the issue.
-
Update for when the service was restored:
-
You'll need to open a ticket, as for whatever reason the cluster did not remove the domain when you terminated the account.
-
On Monday April 18th between 11:00am and 1:00pm PDT we will be upgrading the Titan server's PHP 5.2.x to the latest 5.2.x version. We do not expect any downtime during the maintenance window. We will upgrade the system PHP first and then the web servers, so for a short time command line cron jobs and web requests may be handled by different versions. The reason for the upgrade is to pick up various fixes for LSAPI as well as general PHP fixes. If you have any questions about this maintenance window, do not hesitate to contact support.
Date: 04/07/2011
Start time (PDT): 11:00am
End time (PDT): 1:00pm
Estimated Down Time: None
Duration: 2 hours
-
Just check the versions that they claim will work and match up accordingly. You may even be able to upgrade some portions. The only thing you can't run right now is Rails 3. We're working on a non-Mongrel solution for that, but there's no ETA yet as there's a lot of stuff going on.
-
If you're familiar with SSH, what I'd recommend doing is checking your process list like so:
ps -u yourusername
Then find the ruby process, which I assume is still running, and run kill 3983983, with the number being the process ID of the running ruby instance.
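The two steps above can be combined into a small helper. This is only a rough sketch: the function names filter_ruby_pids and list_ruby_pids are made up for illustration, and it assumes the process name starts with "ruby" (possibly behind a path). Review the PIDs it prints before killing anything.

```shell
#!/bin/sh
# Hypothetical helper: read "PID COMMAND" lines on stdin and print the PIDs
# whose command name is ruby (optionally preceded by a path).
filter_ruby_pids() {
    awk '$2 ~ "(^|/)ruby" { print $1 }'
}

# Hypothetical helper: list ruby PIDs owned by a user.
# Defaults to the current user when no username is given.
list_ruby_pids() {
    ps -u "${1:-$(id -un)}" -o pid= -o comm= | filter_ruby_pids
}

# Example usage (left commented out so nothing is killed by accident):
# list_ruby_pids yourusername
# kill 3983983    # replace with a PID printed by the command above
```

If a process ignores the plain kill, check that the PID really is the stuck ruby instance before trying anything more forceful.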
-
There is service-impacting network maintenance on Monday April 25th 2011. This maintenance is necessary and will most likely cause an hour of network downtime. It affects only the Falcon servers, which are on the newer fcr02 customer-facing router. Here is the notice we received regarding this from the datacenter:

Scheduled Data Center Maintenance - FCR02.WDC01 [04/25/2011]
Date: Monday, April 25, 2011 (04/25/2011)
Start Time: 12:00 AM EDT
End Time: 02:00 AM EDT
Services affected: Public Network
Location: WDC01
Duration: 2 hours
==================================================
Event Summary: SoftLayer Engineers along with Cisco TAC have identified a faulty component in the FCR02.WDC01 chassis which has prevented the chassis from running in the required redundancy mode. This part is directly integrated into the chassis of the router which will require replacing the entire Cisco 6509-E chassis. Engineers will be required to power down the entire FCR02.WDC01 chassis to perform this replacement. The expected downtime is 1 hour with the maintenance window being scheduled for up to 2 hours.
Start Time: 00:00 EDT
End Time: 02:00 EDT
Expected Duration: 1 Hour
Customer Impact: During this maintenance, customers will notice a complete loss of connectivity to their servers on the frontend network (public network). Backend network (private network) connectivity will NOT be impacted during this maintenance. While the upgrade duration is scheduled for 4 hours, we only expect around 1 hour of downtime as the router's chassis is replaced and the router is powered on. Again, this will NOT impact the backend network (private network) for customer servers.
-
The machine is back online at this point and services are all coming back up. Just taking a minute to fill the memory caches again to get back to optimal speeds.
-
We've had to extend the window until 5pm PDT as backups are currently running on this machine.
-
This has been completed; the machine is currently starting up all services again.
-
The server is back online now and all services are starting up as I type this.
-
This has been completed. We're doing some cleanup related tasks but all user traffic for shared hosting users should now be served from the Cobra server.
-
We'll be performing a kernel upgrade on Tuesday April 12th between 10:00AM-11:00AM PDT. This upgrade is on short notice because we need to mitigate a software bug affecting the web server software on the server. Once the upgrade is completed, we hope the bug can no longer be reproduced and that system stability will improve as a result.
Date: 04/12/2011
Start time (PDT): 10:00am
End time (PDT): 11:00am
Duration: 1 hour
Estimated Down Time: 10 minutes
-
This is currently underway and should be finished in a few days.