An analysis of interdomain availability and causes of failures based on active measurements
- Cite this article as: Myakotnykh, E.S., Wittner, O.J., Helvik, B.E. et al. Telecommun Syst (2013) 52: 847. doi:10.1007/s11235-011-9586-1
With the objective of better understanding how the global Internet should achieve an availability on the order of “five nines”, i.e. be available 0.99999 of the time, active measurements were performed between Norway and China through the Global Research Network. End-to-end downtime statistics were collected during two 3-month periods: mid-November 2009 to mid-February 2010, and July 2010 to September 2010. Probe packets were sent every 10 ms between the two measurement systems, supplemented by traceroute measurements every two minutes. The collected data (TTL values, timestamps, sequence numbers and traceroute output) enabled identification and characterization of the IP-level paths between the end-points. Causes of the observed network failures were identified, and insight is gained into the processes preceding and following communication downtimes. We distinguish between inter- and intradomain failures and, when possible, identify the exact link or Autonomous System where a given event occurred. The study shows that end-to-end path availability is mainly affected by interdomain failures and long BGP convergence times, as well as by series of events that are not straightforwardly explained by the anticipated (re)routing behavior.
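To make the link between 10 ms probing and a “five nines” target concrete, here is a minimal sketch of how downtime and availability could be derived from the sequence numbers seen at the receiving end. It is not the authors' analysis method. It assumes that probes carry consecutive sequence numbers, that every run of missing sequence numbers is an outage lasting (lost probes × 10 ms), and that runs shorter than a threshold are ignored as isolated loss. The names `find_outages`, `availability`, `allowed_downtime_s` and the `min_lost` threshold are hypothetical.

```python
from dataclasses import dataclass

PROBE_INTERVAL_S = 0.010  # probes sent every 10 ms, as in the measurement setup


@dataclass
class Outage:
    first_lost_seq: int  # first sequence number that never arrived
    lost_probes: int     # length of the run of consecutive missing probes

    @property
    def duration_s(self) -> float:
        # Assumption: each missing probe stands for one 10 ms slot of downtime.
        return self.lost_probes * PROBE_INTERVAL_S


def find_outages(received_seqs, min_lost=1):
    """Return runs of at least `min_lost` consecutive missing sequence numbers.

    Hypothetical helper. Losses after the last received probe cannot be
    detected from sequence numbers alone and are therefore not reported.
    """
    seqs = sorted(set(received_seqs))
    outages = []
    for prev, cur in zip(seqs, seqs[1:]):
        gap = cur - prev - 1  # number of probes missing between two arrivals
        if gap >= min_lost:
            outages.append(Outage(first_lost_seq=prev + 1, lost_probes=gap))
    return outages


def availability(received_seqs, min_lost=1):
    """Fraction of probe slots not covered by an outage (hypothetical metric)."""
    seqs = sorted(set(received_seqs))
    sent = seqs[-1] - seqs[0] + 1  # probes sent between first and last arrival
    lost = sum(o.lost_probes for o in find_outages(seqs, min_lost))
    return 1.0 - lost / sent


def allowed_downtime_s(target, period_s):
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - target) * period_s


if __name__ == "__main__":
    # Synthetic trace: 1000 probes, a 50-probe outage and one isolated loss.
    received = [s for s in range(1000) if not (100 <= s < 150) and s != 500]
    for o in find_outages(received, min_lost=3):
        print(o.first_lost_seq, o.lost_probes, round(o.duration_s, 3))
    print(round(availability(received, min_lost=3), 3))
    print(round(allowed_downtime_s(0.99999, 365 * 86400), 2))
```

With `min_lost=3` the isolated loss at sequence number 500 is ignored and the synthetic trace has one 0.5 s outage, giving an availability of 0.95. The last line shows why the target is demanding: five nines over a 365-day year leaves only about 315 s (roughly 5.3 minutes) of downtime, so a handful of slow BGP convergence events can consume the whole budget.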