Routing Anomalies [1/29/2010]


Tony


We are currently seeing routing anomalies in our Dallas location. They appear to be affecting some ISPs but not all. In some cases the symptom is packet loss, in others higher latency, and in rare cases complete inaccessibility. We'll update this thread as we have more information. The latest issue we tracked was 15 minutes ago, but we've opened this thread to make users aware in case they are still experiencing issues.


Event Times:

Date: 01/29/2010

Location: DAL01

Affected Services: Public Network Connectivity

Devices: CER01.DAL01, CER02.DAL01, CER03.DAL01

Start Time: 3:15 PM CST

End Time: 3:25 PM CST

Duration: Approximately 10 minutes.

Date: 01/29/2010

Location: DAL01

Affected Services: Public Network Connectivity

Devices: CER01.DAL01, CER02.DAL01, CER03.DAL01

Start Time: 4:05 PM CST

End Time: 4:20 PM CST

Duration: Approximately 15 minutes.
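
For anyone totaling the impact across both windows, the arithmetic can be sketched as below. This is only an illustration using the start and end times listed above; the variable names are made up for the example, and both windows are assumed to be in the same time zone (CST), so no zone handling is done.

```python
from datetime import datetime

# Timestamps copied from the two event windows above (both CST).
FMT = "%m/%d/%Y %I:%M %p"
events = [
    ("01/29/2010 3:15 PM", "01/29/2010 3:25 PM"),
    ("01/29/2010 4:05 PM", "01/29/2010 4:20 PM"),
]

def minutes(start, end):
    """Whole minutes between two timestamps in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)

durations = [minutes(start, end) for start, end in events]
total = sum(durations)
print(durations, total)
```

The two windows come to roughly 10 and 15 minutes, or about 25 minutes of impact in total.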

Event Summary:

At approximately 3:15 PM CST, Datacenter Engineers were alerted to a number of routing anomalies happening within the DAL01 facility. Initial investigations determined that there was a significant drop in outbound traffic resulting in loss of connectivity to a number of services. After further investigation, it was found that the Internap FCP (Flow Control Platform) was injecting a large number of routes resulting in customer traffic being black holed. While this would not have impacted all customers, any customer with a prefix being actively engineered by the FCP would have noticed either severely degraded service or loss of service completely for this time period.

As a first step toward resolving the issue, the BGP sessions from CER01.DAL01, CER02.DAL01, and CER03.DAL01 to the FCP were cleared at approximately 3:25 PM CST. This resulted in all invalid prefixes being dropped from the route table. Customer traffic was restored successfully at this point while Datacenter Engineers continued to work on the FCP device.
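
To illustrate why dropping the FCP-learned prefixes restored traffic, here is a minimal sketch of longest-prefix-match route selection. It is a toy model and not the actual router or FCP behavior: the prefixes are documentation ranges, and the peer names ("upstream", "fcp") and next-hop labels ("transit-a", "discard") are hypothetical. The assumption is that an injected route is more specific than the valid one, so it wins the lookup until the session that supplied it is cleared.

```python
import ipaddress

def lookup(routes, destination):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, (hop, _peer) in routes.items() if addr in net]
    if not matches:
        return None
    _net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

def clear_session(routes, peer):
    """Drop every route learned from one peer (a toy stand-in for clearing its BGP session)."""
    return {net: entry for net, entry in routes.items() if entry[1] != peer}

# Hypothetical route table: prefix -> (next hop, peer the route was learned from).
routes = {
    ipaddress.ip_network("0.0.0.0/0"): ("transit-a", "upstream"),
    ipaddress.ip_network("203.0.113.0/25"): ("discard", "fcp"),  # invalid injected route
}

before = lookup(routes, "203.0.113.10")
after = lookup(clear_session(routes, "fcp"), "203.0.113.10")
print(before, after)
```

In this model the destination resolves to the invalid route while the FCP-learned prefix is present, and falls back to the valid, less specific route once that peer's routes are removed.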

At approximately 4:05 PM CST, while troubleshooting the device with the vendor, engineers determined that the FCP was again attempting to install invalid routes into the route table. This resulted in customer traffic being black holed a second time. As a final measure, the FCP device was forcefully reloaded to clear any lingering state that could cause the problem to recur.

Service to impacted customers in the DAL01 facility was restored at approximately 4:20 PM CST. At this time, the FCP device continues to function normally. Datacenter Engineers will continue to monitor the device to ensure there are no further issues. Engineers have also opened a priority case with Internap to determine the root cause and, if required, a long-term fix.

We do apologize for the unexpected outage and appreciate your patience during this event.

