How to Find Correlated Internet Failures

  • Conference paper
Passive and Active Measurement (PAM 2019)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 11419)

Abstract

Even as residential users increasingly rely upon the Internet, connectivity sometimes fails. Characterizing small-scale failures of last mile networks is essential to improving Internet reliability.

In this paper, we develop and evaluate an approach to detect Internet failure events that affect multiple users simultaneously, using measurements from the Thunderping project. Thunderping probes addresses across the U.S. when the areas in which they are geo-located are affected by severe weather alerts. It detects a disruption event when an IP address ceases to respond to pings. We focus on simultaneous disruptions of multiple addresses that are related to each other by geography and ISP, and thus are indicative of a shared cause. Using binomial testing, we detect groups of per-IP disruptions that are unlikely to have happened independently. We characterize these dependent disruption events and present results that challenge conventional wisdom on how such outages affect Internet address blocks.

Notes

  1. Since disruptions are a superset of outages and dynamic reassignment [16], frequent disruptions are not necessarily indicative of poor Internet connectivity. Also, the existence of many aggregates with few disruptions indicates that Thunderping often pinged addresses during weather conditions that were not conducive to disruptions.

References

  1. Argon, O., Bremler-Barr, A., Mokryn, O., Schirman, D., Shavitt, Y., Weinsberg, U.: On the dynamics of IP address allocation and availability of end-hosts. arXiv preprint arXiv:1011.2324 (2010)

  2. Bischof, Z., Bustamante, F., Feamster, N.: The growing importance of being always on - a first look at the reliability of broadband internet access. In: Research Conference on Communications, Information and Internet Policy (TPRC), vol. 46 (2018)

  3. Bischof, Z.S., Bustamante, F.E., Stanojevic, R.: Need, want, can afford: broadband markets and the behavior of users. In: IMC (2014)

  4. Dainotti, A., et al.: Analysis of country-wide Internet outages caused by censorship. In: IMC (2011)

  5. Grover, S., et al.: Peeking behind the NAT: an empirical study of home networks. In: IMC (2013)

  6. Heidemann, J., Pradkin, Y., Govindan, R., Papadopoulos, C., Bartlett, G., Bannister, J.: Census and survey of the visible Internet. In: IMC (2008)

  7. Internet Outage Detection and Analysis (IODA). https://www.caida.org/projects/ioda/

  8. National Hurricane Center Tropical Cyclone Report: Hurricane Irma. https://www.nhc.noaa.gov/data/tcr/AL112017_Irma.pdf

  9. Katz-Bassett, E., Madhyastha, H.V., John, J.P., Krishnamurthy, A., Wetherall, D., Anderson, T.: Studying black holes in the internet with Hubble. In: NSDI (2008)

  10. Line Of Storms Moves Through Oklahoma. http://www.newson6.com/story/36651816/tornado-watch-in-effect-for-ne-oklahoma

  11. Northeast Storm Undergoes Bombogenesis, Bringing 70 MPH Gusts, Almost 350 Reports of Wind Damage, Flooding - The Weather Channel. https://weather.com/forecast/regional/news/2017-10-30-northeast-storm-damaging-winds-flooding

  12. 29–30 October 2017 damaging winds, heavy rainfall & flooding. https://www.weather.gov/aly/October29-302017

  13. More than 1 million power outages in the Northeast after blockbuster fall storm - The Washington Post. https://www.washingtonpost.com/news/capital-weather-gang/wp/2017/10/30/over-one-million-power-outages-in-the-northeast-after-blockbuster-fall-storm/

  14. Comcast outage on Sep 13 2017 in the Outages Mailing List. https://puck.nether.net/pipermail/outages/2017-September/010754.html

  15. Padmanabhan, R.: Analyzing internet reliability remotely with probing-based techniques. Ph.D. thesis, University of Maryland (2018)

  16. Padmanabhan, R., Dhamdhere, A., Aben, E., Claffy, K., Spring, N.: Reasons dynamic addresses change. In: IMC (2016)

  17. Padmanabhan, R., Owen, P., Schulman, A., Spring, N.: Timeouts: beware surprisingly high delay. In: IMC (2015)

  18. Quan, L., Heidemann, J., Pradkin, Y.: Trinocular: understanding internet reliability through adaptive probing. In: SIGCOMM (2013)

  19. Richter, P., Padmanabhan, R., Plonka, D., Berger, A., Clark, D.: Advancing the art of internet edge outage detection. In: IMC (2018)

  20. Sánchez, M.A., et al.: Dasu: pushing experiments to the internet’s edge. In: NSDI (2013)

  21. Schulman, A., Spring, N.: Pingin’ in the rain. In: IMC (2011)

  22. Shah, A., Fontugne, R., Aben, E., Pelsser, C., Bush, R.: Disco: fast, good, and cheap outage detection. In: TMA (2017)

  23. Shavitt, Y., Shir, E.: DIMES: let the internet measure itself. SIGCOMM Comput. Commun. Rev. 35, 71–74 (2005)

  24. Sundaresan, S., Burnett, S., Feamster, N., de Donato, W.: BISmark: a testbed for deploying measurements and applications in broadband access networks. In: USENIX ATC, June 2014

  25. van Belle, G., Heagerty, P.J., Fischer, L.D., Lumley, T.S.: Biostatistics: A Methodology for the Health Sciences, 2nd edn. Wiley, Hoboken (2004)

Acknowledgments

We thank Arthur Berger, Philipp Richter, our shepherd Georgios Smaragdakis, and the anonymous reviewers for their thoughtful feedback. This research is supported by the U.S. Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) via contract number 70RSAT18CB0000015 and by NSF grants CNS-1619048 and CNS-1526635.

Author information

Corresponding author

Correspondence to Ramakrishna Padmanabhan.

A Appendix

A.1 Determining \(D_{min}\)

Section 3.1 described our technique for detecting dependent disruptions through the calculation of \(D_{min}\). Table 1 presents \(D_{min}\), computed for various values of N and \(P_d\). This table shows that, even for large aggregates of IP addresses, only a few simultaneous disruptions are often enough to conclude with confidence that a dependent disruption has occurred.

Table 1. \(D_{min}\) values for varying values of N and \(P_d\). There is less than 0.01% probability according to the binomial test that \(D_{min}\) or more addresses fail for each N and \(P_d\).
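
The \(D_{min}\) computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that \(P_d\) is the per-address probability of disruption in a time window, that the N addresses fail independently under the null hypothesis, and that the 0.01% threshold corresponds to a tail probability of 1e-4. The function names `binom_pmf` and `d_min` are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(n: int, p: float, k: int) -> float:
    """P[X = k] for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + k * log(p) + (n - k) * log1p(-p))


def d_min(n: int, p_d: float, threshold: float = 1e-4):
    """Smallest d such that P[X >= d] < threshold, or None if no d <= n qualifies.

    Assumption: 0 < p_d < 1, and threshold=1e-4 mirrors the 0.01% cutoff.
    """
    if not 0.0 < p_d < 1.0:
        raise ValueError("p_d must lie strictly between 0 and 1")
    tail = 0.0   # running P[X >= d], accumulated from d = n downward
    best = None
    for d in range(n, -1, -1):
        tail += binom_pmf(n, p_d, d)
        if tail >= threshold:
            break        # the tail only grows as d decreases, so stop here
        best = d
    return best


if __name__ == "__main__":
    # Hypothetical aggregate sizes and disruption probabilities.
    for n in (10, 100, 1000):
        for p_d in (0.0001, 0.001, 0.01):
            print(n, p_d, d_min(n, p_d))
```

A return value of `None` means that even the simultaneous disruption of all N addresses would not fall below the threshold, which can happen for small aggregates with a high \(P_d\).
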
Fig. 8. Distribution of the probability that the 20,831 detected dependent disruption events could have occurred independently. For 90% of events, the probability of occurring independently is less than 0.00005.

Fig. 9. Number of dependent disruption events detected per ISP. These numbers reflect the addresses sampled and pinged in the Thunderping dataset more than any major underlying problem in the ISPs' infrastructure. We leave per-ISP comparisons of dependent disruptions to future work.

A.2 Analyzing the Confidence of Detected Disruption Events

Here, we examine our confidence in the 20,831 detected dependent disruption events from Sect. 3.2. According to the binomial test, \(D_{min}\) or more simultaneous disruptions have less than 0.01% probability of occurring independently. We test whether most detected dependent disruption events sit close to this 0.01% threshold or are well clear of it.

Figure 8 shows the distribution of the probability that we incorrectly classify an independent event as dependent. The probability of occurring independently is less than 0.005% for 90% of the events and less than 0.001% for 75%. Thus, the probability that detected events occurred independently is typically much smaller than our threshold of 0.01%.
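
As a rough illustration of this check, the sketch below computes the binomial tail probability for each detected event and reports the fraction of events that fall below a given cutoff. It assumes each event is summarized by a hypothetical tuple `(n, p_d, d)`: aggregate size, per-address disruption probability, and number of simultaneously disrupted addresses. The events shown are made up and are not taken from the Thunderping dataset.

```python
from math import exp, lgamma, log, log1p


def prob_independent(n: int, p_d: float, d: int) -> float:
    """P[X >= d] for X ~ Binomial(n, p_d): the chance that d or more of n
    addresses are disrupted if each fails independently with probability p_d."""
    total = 0.0
    for k in range(n, d - 1, -1):  # start from the far tail
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        total += exp(log_choose + k * log(p_d) + (n - k) * log1p(-p_d))
    return total


def fraction_below(probs, cutoff: float) -> float:
    """Fraction of events whose independent-occurrence probability is below cutoff."""
    return sum(p < cutoff for p in probs) / len(probs)


if __name__ == "__main__":
    # Hypothetical events: (aggregate size, P_d, observed simultaneous disruptions).
    events = [(500, 0.001, 8), (2000, 0.0005, 9), (100, 0.002, 5)]
    probs = [prob_independent(n, p_d, d) for n, p_d, d in events]
    for cutoff in (1e-4, 5e-5, 1e-5):
        print(cutoff, fraction_below(probs, cutoff))
```
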

A.3 Dependent Disruption Events Across ISPs

We grouped dependent disruption events by ISP to check if any ISPs contribute an unusual number of events. Figure 9 shows the top 15 ISPs with dependent disruption events. These top 15 ISPs together account for 13,643 (65%) of all detected events.

We emphasize that these results are not meant to reflect any underlying problems with these ISPs; Thunderping samples and pings large ISPs more frequently and, consequently, finds more disrupted addresses in them. The purpose of this analysis is to ensure that no ISP contributes a disproportionate number of events.
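
The grouping step above amounts to a per-ISP tally. A minimal sketch follows, assuming each detected event carries a single ISP label; the function name `top_isp_share` and the sample labels are hypothetical.

```python
from collections import Counter


def top_isp_share(event_isps, k: int = 15):
    """Return the k ISPs with the most events and the fraction of all events they cover."""
    if not event_isps:
        raise ValueError("need at least one event")
    counts = Counter(event_isps)
    top = counts.most_common(k)
    covered = sum(count for _, count in top)
    return top, covered / len(event_isps)


if __name__ == "__main__":
    # Hypothetical ISP labels, one per detected event.
    events = ["ISP-A"] * 5 + ["ISP-B"] * 3 + ["ISP-C"] * 2
    top, share = top_isp_share(events, k=2)
    print(top, share)
```

With the figures reported above, the same ratio is 13,643 / 20,831, or roughly 65%.
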

Fig. 10. For Comcast, Qwest, and Viasat: minimum actual disrupted addresses in a /24 vs. responsive addresses in a /24, for all /24s with at least \(D_{min}\) addresses that were disrupted during a detected dependent disruption event. All ISPs have /24s with actual disrupted addresses where there continued to be responsive addresses throughout the disruption.

A.4 Dependent Disruptions May Not Disrupt Entire /24s: Implications

Continuing our analysis from Sect. 4.4, we investigated whether the responsiveness of other addresses in /24s with actual disrupted addresses varies across ISPs. Figure 10 shows per-ISP behavior. We see that all these ISPs have /24s with actual disrupted addresses where there continued to be responsive addresses throughout the disruption.

Prior work detecting outages within /24 aggregates may miss these events. Since a single positive response from an address within a /24 could lead Trinocular to conclude that the block is alive [18], it can miss dependent disruption events affecting only a subset of addresses within a /24 address block. Richter et al.’s technique is capable of detecting partial /24 disruptions [19]; indeed, many of their disruptions did not affect all addresses in the /24. However, their choice of the \(\alpha\) parameter in their technique (\(\alpha = 0.5\)) meant that they would only detect disruptions where at least half of the active addresses were disrupted. In this paper, we showed that many /24s with actual disrupted addresses continued to have more than half of their (sampled) addresses responsive.
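
To make the argument concrete, the sketch below contrasts two deliberately simplified per-/24 decision rules on a partially disrupted block. These rules are stand-ins based only on the description above; they are not the actual Trinocular or Richter et al. algorithms, and the function names and counts are hypothetical.

```python
def any_response_alive(responsive: int) -> bool:
    """Simplified 'any positive response' rule: the block counts as alive
    as soon as one address in it responds."""
    return responsive > 0


def fraction_rule_flags(disrupted: int, active: int, alpha: float = 0.5) -> bool:
    """Simplified fraction rule: flag the block only when at least a share
    alpha of its active addresses is disrupted."""
    return active > 0 and disrupted / active >= alpha


if __name__ == "__main__":
    # Hypothetical /24 with 10 sampled addresses, 3 of them disrupted.
    disrupted, active = 3, 10
    responsive = active - disrupted
    print("any-response rule sees block as alive:", any_response_alive(responsive))
    print("fraction rule flags a disruption:", fraction_rule_flags(disrupted, active))
```

Under both simplified rules, this block would not be reported as disrupted, even though three of its addresses went down together.
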

We believe that prior work may be able to detect these events by analyzing broader address aggregates (such as the state-ASN aggregates we use), in addition to /24 aggregates. In preliminary investigations, we found that many of our dependent disruption events consisted of multiple observed disrupted /24s that were each only partially disrupted; that is, a few addresses from many /24s were disrupted simultaneously but there continued to be other responsive addresses in these /24s. One of the largest events had 811 addresses from 42 /24 blocks in the observed disrupted group and 40 of these blocks had responsive addresses. We leave additional analyses for future work but we believe that we detected such events due to the broader aggregate of addresses we considered.
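
One way to examine an observed disrupted group at /24 granularity is sketched below: bucket the disrupted and the still-responsive addresses by /24 and report the blocks that contain both. This is only an illustrative sketch; the function names are hypothetical and the addresses come from documentation ranges, not from the dataset.

```python
from collections import defaultdict
from ipaddress import ip_network


def slash24(addr: str) -> str:
    """Return the /24 block that contains an IPv4 address, e.g. '192.0.2.0/24'."""
    return str(ip_network(f"{addr}/24", strict=False))


def partially_disrupted_blocks(disrupted, responsive):
    """Map each /24 that has both disrupted and responsive addresses
    to a (disrupted_count, responsive_count) tuple."""
    counts = defaultdict(lambda: [0, 0])
    for addr in disrupted:
        counts[slash24(addr)][0] += 1
    for addr in responsive:
        counts[slash24(addr)][1] += 1
    return {
        block: (n_disrupted, n_responsive)
        for block, (n_disrupted, n_responsive) in counts.items()
        if n_disrupted > 0 and n_responsive > 0
    }


if __name__ == "__main__":
    # Hypothetical addresses from documentation-only ranges.
    disrupted = ["192.0.2.10", "192.0.2.11", "198.51.100.5"]
    responsive = ["192.0.2.200", "203.0.113.9"]
    print(partially_disrupted_blocks(disrupted, responsive))
```
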

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Padmanabhan, R., Schulman, A., Dainotti, A., Levin, D., Spring, N. (2019). How to Find Correlated Internet Failures. In: Choffnes, D., Barcellos, M. (eds) Passive and Active Measurement. PAM 2019. Lecture Notes in Computer Science, vol 11419. Springer, Cham. https://doi.org/10.1007/978-3-030-15986-3_14

  • DOI: https://doi.org/10.1007/978-3-030-15986-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15985-6

  • Online ISBN: 978-3-030-15986-3

  • eBook Packages: Computer Science (R0)