Tony, posted January 29, 2010:

We are currently seeing routing anomalies at our Dallas location. They appear to affect some ISPs but not all. In some cases the symptom is only packet loss, in others it is higher latency, and in rare cases it is complete inaccessibility. We'll update this thread as we have more information. The latest issue we tracked was 15 minutes ago, but we've opened this thread to make users aware in case they are still experiencing problems.
Tony (author), posted January 29, 2010:

When we posted this, the issue had already been resolved. This thread will remain open until we have an official RFO (Reason for Outage).
Tony (author), posted January 30, 2010:

Event Times:

Date: 01/29/2010
Location: DAL01
Affected Services: Public Network Connectivity
Devices: CER01.DAL01, CER02.DAL01, CER03.DAL01
Start Time: 3:15 PM CST
End Time: 3:25 PM CST
Duration: Approximately 10 minutes

Date: 01/29/2010
Location: DAL01
Affected Services: Public Network Connectivity
Devices: CER01.DAL01, CER02.DAL01, CER03.DAL01
Start Time: 4:05 PM CST
End Time: 4:20 PM CST
Duration: Approximately 15 minutes

Event Summary:

At approximately 3:15 PM CST, Datacenter Engineers were alerted to a number of routing anomalies within the DAL01 facility. Initial investigation found a significant drop in outbound traffic, resulting in loss of connectivity to a number of services. Further investigation found that the Internap FCP (Flow Control Platform) was injecting a large number of routes, causing customer traffic to be black-holed. While this would not have impacted all customers, any customer with a prefix being actively engineered by the FCP would have experienced either severely degraded service or a complete loss of service during this period.

As a first step to resolve the issue, the BGP sessions from CER01.DAL01, CER02.DAL01, and CER03.DAL01 to the FCP were cleared at approximately 3:25 PM CST. This dropped all invalid prefixes from the route table, and customer traffic was restored while Datacenter Engineers continued to work on the FCP device.

At approximately 4:05 PM CST, while troubleshooting the device with the vendor, engineers determined that the FCP was again attempting to install invalid routes into the route table, once more causing customer traffic to be black-holed. As a final measure, the FCP device was forcibly reloaded to clear any lingering faults that could cause the problem to recur.
Service to impacted customers in the DAL01 facility was restored at approximately 4:20 PM CST. At this time, the FCP device is functioning normally, and Datacenter Engineers will continue to monitor it to ensure there are no further issues. Engineers have also opened a priority case with Internap to determine the root cause and, if required, a long-term fix. We apologize for the unexpected outage and appreciate your patience during this event.