Tony

April Uptime Report


April was unfortunately a tough month at Devoted Host. Our primary datacenter, SoftLayer, was running on generators because of inclement weather in the area. When it switched back to utility power, a 2,500-amp breaker failed, which left all our servers on UPS backup power. Unfortunately, after 30 minutes, machines started to go offline. After about 3 hours of work by SoftLayer's onsite electricians, power to the servers was restored.

I'd like to be clear: unexpected things happen from time to time. Hardware does fail, and the best you can do is restore service as fast as possible. Getting this done within 3 hours was significantly faster than other datacenters have managed when replacing breakers of this size in the past.

Here are the uptime reports for all our servers, including Apollo, which is no longer active, and the newly active Jupiter.


Apollo Uptime Report

Date | Average response time | Uptime | Downtime
1 | 113.566 ms | 100.00% | -
2 | 115.074 ms | 100.00% | -
3 | 112.978 ms | 100.00% | -
4 | 126.135 ms | 99.87% | 1m 52s
5 | 129.114 ms | 100.00% | -
6 | 129.270 ms | 100.00% | -
7 | 116.015 ms | 100.00% | -
8 | 123.200 ms | 100.00% | -
9 | 146.009 ms | 100.00% | -
10 | 144.377 ms | 100.00% | -
11 | 123.792 ms | 100.00% | -
12 | 119.177 ms | 99.75% | 3m 34s
13 | 146.134 ms | 100.00% | -
14 | 118.226 ms | 99.69% | 4m 28s
15 | 122.527 ms | 99.87% | 1m 50s
16 | 119.553 ms | 100.00% | -
17 | 127.432 ms | 100.00% | -
18 | 149.720 ms | 98.46% | 21m 22s
19 | 203.870 ms | 100.00% | -
20 | 168.695 ms | 100.00% | -
21 | 141.747 ms | 100.00% | -
22 | 212.197 ms | 100.00% | -
23 | 260.297 ms | 100.00% | -
24 | 229.536 ms | 100.00% | -
25 | 248.282 ms | 100.00% | -
26 | 141.782 ms | 100.00% | -
27 | 262.005 ms | 100.00% | -
28 | 97.680 ms | 99.87% | 1m 48s
29 | 248.412 ms | 100.00% | -
30 | 180.190 ms | 100.00% | -

Average Response Time: 155.900 ms

Uptime %: 99.92%

Total Downtime: 34m 54s
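For readers who want to check the summary figures, here is a minimal Python sketch that recomputes Apollo's total downtime and monthly uptime from its daily downtimes. It assumes uptime % is simply 100 × (1 − downtime ÷ length of the period); `format_duration`, `uptime_percent` and `SECONDS_PER_DAY` are hypothetical helper names for illustration, not part of the monitoring service.

```python
# Hypothetical helpers (not from the monitoring service): recompute a summary
# from daily downtimes, assuming uptime % = 100 * (1 - downtime / period).

SECONDS_PER_DAY = 24 * 60 * 60


def format_duration(seconds: int) -> str:
    """Format a number of seconds as e.g. '34m 54s' or '1h 8m 0s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


def uptime_percent(downtime_s: int, period_s: int) -> float:
    """Share of the period the service was up, as a percentage."""
    return 100.0 * (1 - downtime_s / period_s)


# Apollo's non-zero daily downtimes for April (day -> seconds).
apollo_downtime_s = {4: 112, 12: 214, 14: 268, 15: 110, 18: 1282, 28: 108}

total_s = sum(apollo_downtime_s.values())
april_s = 30 * SECONDS_PER_DAY

print(format_duration(total_s))                    # 34m 54s
print(f"{uptime_percent(total_s, april_s):.2f}%")  # 99.92%
```

The length of the period matters. Jupiter's 1m 48s of downtime gives `uptime_percent(108, 11 * SECONDS_PER_DAY)` of about 99.99% over the 11 days it was online (the 20th to the 30th), which matches its reported figure; dividing by all 30 days would instead round to 100.00%.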

[Figure: Response Time Graph]

[Figure: Uptime Graph]

There are a few outages of a couple of minutes each that I'm unsure about; the monitoring may have picked up a reboot of the web server. The 21-minute outage on the 18th was unfortunately caused by the server crashing, and we rebooted it as soon as monitoring notified us.


Venus Uptime Report

Date | Average response time | Uptime | Downtime
1 | 131.996 ms | 100.00% | -
2 | 111.240 ms | 100.00% | -
3 | 143.644 ms | 100.00% | -
4 | 150.203 ms | 100.00% | -
5 | 129.246 ms | 100.00% | -
6 | 128.621 ms | 100.00% | -
7 | 143.324 ms | 100.00% | -
8 | 131.763 ms | 100.00% | -
9 | 153.875 ms | 100.00% | -
10 | 137.886 ms | 100.00% | -
11 | 114.887 ms | 100.00% | -
12 | 115.060 ms | 100.00% | -
13 | 113.470 ms | 100.00% | -
14 | 113.087 ms | 95.22% | 1h 8m
15 | 115.156 ms | 89.79% | 2h 25m
16 | 141.109 ms | 100.00% | -
17 | 115.080 ms | 100.00% | -
18 | 138.402 ms | 100.00% | -
19 | 181.812 ms | 100.00% | -
20 | 153.000 ms | 100.00% | -
21 | 137.502 ms | 100.00% | -
22 | 204.743 ms | 100.00% | -
23 | 229.461 ms | 100.00% | -
24 | 255.322 ms | 100.00% | -
25 | 220.109 ms | 100.00% | -
26 | 143.444 ms | 100.00% | -
27 | 240.524 ms | 100.00% | -
28 | 111.669 ms | 100.00% | -
29 | 206.633 ms | 100.00% | -
30 | 187.795 ms | 100.00% | -

Average Response Time: 153.336 ms

Uptime %: 99.49%

Total Downtime: 3h 34m

[Figure: Response Time Graph]

[Figure: Uptime Graph]

As stated at the start, the only downtime was caused by the server losing power, which was out of our control.


Mercury Uptime Report

Date | Average response time | Uptime | Downtime
1 | 159.422 ms | 100.00% | -
2 | 115.578 ms | 100.00% | -
3 | 147.424 ms | 100.00% | -
4 | 121.439 ms | 100.00% | -
5 | 127.083 ms | 100.00% | -
6 | 155.528 ms | 99.87% | 1m 54s
7 | 114.271 ms | 100.00% | -
8 | 119.988 ms | 99.67% | 4m 42s
9 | 151.359 ms | 100.00% | -
10 | 139.147 ms | 100.00% | -
11 | 143.187 ms | 99.84% | 2m 19s
12 | 126.482 ms | 100.00% | -
13 | 116.788 ms | 100.00% | -
14 | 142.847 ms | 94.94% | 1h 12m
15 | 115.447 ms | 90.16% | 2h 20m
16 | 138.729 ms | 100.00% | -
17 | 143.390 ms | 95.58% | 1h 2m
18 | 158.103 ms | 100.00% | -
19 | 221.290 ms | 100.00% | -
20 | 137.586 ms | 100.00% | -
21 | 141.850 ms | 100.00% | -
22 | 169.329 ms | 100.00% | -
23 | 282.215 ms | 100.00% | -
24 | 195.543 ms | 100.00% | -
25 | 211.972 ms | 100.00% | -
26 | 201.946 ms | 100.00% | -
27 | 210.597 ms | 100.00% | -
28 | 110.940 ms | 100.00% | -
29 | 230.864 ms | 100.00% | -
30 | 165.229 ms | 100.00% | -

Average Response Time: 157.186 ms

Uptime %: 99.32%

Total Downtime: 4h 44m

[Figure: Response Time Graph]

[Figure: Uptime Graph]

The power outage affected this machine, but it was also hit by a large DoS attack that caused it to crash. When it rebooted, the file system had to be checked for corruption, and this took an hour to complete.


Mars Uptime Report

Date | Average response time | Uptime | Downtime
1 | 149.441 ms | 100.00% | -
2 | 129.789 ms | 100.00% | -
3 | 141.773 ms | 100.00% | -
4 | 127.077 ms | 100.00% | -
5 | 138.347 ms | 100.00% | -
6 | 121.486 ms | 98.88% | 16m 4s
7 | 124.616 ms | 100.00% | -
8 | 117.444 ms | 100.00% | -
9 | 150.897 ms | 100.00% | -
10 | 154.451 ms | 100.00% | -
11 | 150.984 ms | 99.89% | 1m 34s
12 | 115.573 ms | 100.00% | -
13 | 126.394 ms | 100.00% | -
14 | 132.046 ms | 95.21% | 1h 8m
15 | 124.827 ms | 90.16% | 2h 20m
16 | 136.220 ms | 100.00% | -
17 | 128.433 ms | 100.00% | -
18 | 150.162 ms | 100.00% | -
19 | 205.339 ms | 100.00% | -
20 | 199.453 ms | 100.00% | -
21 | 150.488 ms | 99.85% | 2m 4s
22 | 188.618 ms | 100.00% | -
23 | 245.251 ms | 100.00% | -
24 | 252.726 ms | 100.00% | -
25 | 215.145 ms | 99.56% | 6m 18s
26 | 165.400 ms | 100.00% | -
27 | 274.479 ms | 99.07% | 13m 10s
28 | 109.717 ms | 100.00% | -
29 | 309.464 ms | 100.00% | -
30 | 194.027 ms | 100.00% | -

Average Response Time: 164.336 ms

Uptime %: 99.40%

Total Downtime: 4h 8m

[Figure: Response Time Graph]

[Figure: Uptime Graph]

Like the others, this server was affected by the power outage. We also performed a kernel upgrade late in the month. As for the other small outages, I'm not sure of the cause. Monitoring picked them up, but they appear to be fairly isolated and could simply have been Apache being flooded while technicians solved the problem and blocked the offending IPs.


Jupiter Uptime Report

Date | Average response time | Uptime | Downtime
1 | N/A | N/A | N/A
2 | N/A | N/A | N/A
3 | N/A | N/A | N/A
4 | N/A | N/A | N/A
5 | N/A | N/A | N/A
6 | N/A | N/A | N/A
7 | N/A | N/A | N/A
8 | N/A | N/A | N/A
9 | N/A | N/A | N/A
10 | N/A | N/A | N/A
11 | N/A | N/A | N/A
12 | N/A | N/A | N/A
13 | N/A | N/A | N/A
14 | N/A | N/A | N/A
15 | N/A | N/A | N/A
16 | N/A | N/A | N/A
17 | N/A | N/A | N/A
18 | N/A | N/A | N/A
19 | N/A | N/A | N/A
20 | 224.864 ms | 100.00% | -
21 | 132.831 ms | 100.00% | -
22 | 156.153 ms | 100.00% | -
23 | 291.108 ms | 100.00% | -
24 | 255.024 ms | 100.00% | -
25 | 177.539 ms | 100.00% | -
26 | 160.908 ms | 100.00% | -
27 | 210.755 ms | 100.00% | -
28 | 114.108 ms | 100.00% | -
29 | 232.005 ms | 99.87% | 1m 48s
30 | 187.929 ms | 100.00% | -

Average Response Time: 194.839 ms

Uptime %: 99.99%

Total Downtime: 1m 48s

[Figure: Response Time Graph]

[Figure: Uptime Graph]

Jupiter was brought online late in the month, on the 20th, and posted good uptime. Its only outage was most likely just the web server restarting, which the monitoring picked up. The monitoring system checks at 1-minute intervals, so any outage it detects is counted as at least 1 minute of downtime.
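To illustrate why a web server restart lasting only a few seconds can still show up as a minute or more of downtime, here is a minimal Python sketch of a poll-based monitor. The `Check` record, the `outage_durations` function and the rule that an outage runs from the first failed check to the next successful one are assumptions for illustration only; the actual monitoring service may measure outages differently.

```python
from dataclasses import dataclass

CHECK_INTERVAL_S = 60  # checks run at 1-minute intervals, per the report


@dataclass
class Check:
    timestamp_s: int  # seconds since monitoring started
    up: bool          # did the service respond to this check?


def outage_durations(checks: list[Check]) -> list[int]:
    """Turn a time-ordered list of checks into outage lengths in seconds.

    Assumed model: an outage starts at the first failed check and ends at the
    next successful check, and is never counted as less than one interval.
    An outage still in progress after the last check is not reported.
    """
    durations = []
    down_since = None
    for check in checks:
        if not check.up and down_since is None:
            down_since = check.timestamp_s
        elif check.up and down_since is not None:
            durations.append(max(check.timestamp_s - down_since, CHECK_INTERVAL_S))
            down_since = None
    return durations


# A web server restart lasting only a few seconds, caught by one failed check.
checks = [Check(0, True), Check(60, False), Check(120, True), Check(180, True)]
print(outage_durations(checks))  # [60]
```

Because the monitor only sees the service at each check, it cannot tell a 5-second restart from a 59-second one; both appear as one failed check and are counted as a full interval.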
