How to Find Correlated Internet Failures

Padmanabhan, Ramakrishna; Schulman, Aaron; Dainotti, Alberto; Levin, Dave; Spring, Neil

doi:10.1007/978-3-030-15986-3_14

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 11419)

Included in the following conference series:

International Conference on Passive and Active Network Measurement

Abstract

Even as residential users increasingly rely upon the Internet, connectivity sometimes fails. Characterizing small-scale failures of last mile networks is essential to improving Internet reliability.

In this paper, we develop and evaluate an approach to detect Internet failure events that affect multiple users simultaneously using measurements from the Thunderping project. Thunderping probes addresses across the U.S. when the areas in which they are geo-located are affected by severe weather alerts. It detects a disruption event when an IP address ceases to respond to pings. We focus on simultaneous disruptions of multiple addresses that are related to each other by geography and ISP, and thus are indicative of a shared cause. Using binomial testing, we detect groups of per-IP disruptions that are unlikely to have happened independently. We characterize these dependent disruption events and present results that challenge conventional wisdom on how such outages affect Internet address blocks.

Notes

1. Since disruptions are a superset of outages and dynamic reassignment [16], frequent disruptions are not necessarily indicative of poor Internet connectivity. Also, the existence of many aggregates with few disruptions indicates that Thunderping often pinged addresses during weather conditions that were not conducive to disruptions.

References

1. Argon, O., Bremler-Barr, A., Mokryn, O., Schirman, D., Shavitt, Y., Weinsberg, U.: On the dynamics of IP address allocation and availability of end-hosts. arXiv preprint arXiv:1011.2324 (2010)
2. Bischof, Z., Bustamante, F., Feamster, N.: The growing importance of being always on - a first look at the reliability of broadband internet access. In: Research Conference on Communications, Information and Internet Policy (TPRC), vol. 46 (2018)
3. Bischof, Z.S., Bustamante, F.E., Stanojevic, R.: Need, want, can afford: broadband markets and the behavior of users. In: IMC (2014)
4. Dainotti, A., et al.: Analysis of country-wide Internet outages caused by censorship. In: IMC (2011)
5. Grover, S., et al.: Peeking behind the NAT: an empirical study of home networks. In: IMC (2013)
6. Heidemann, J., Pradkin, Y., Govindan, R., Papadopoulos, C., Bartlett, G., Bannister, J.: Census and survey of the visible Internet. In: IMC (2008)
7. Internet Outage Detection and Analysis (IODA). https://www.caida.org/projects/ioda/
8. National Hurricane Center Tropical Cyclone Report: Hurricane Irma. https://www.nhc.noaa.gov/data/tcr/AL112017_Irma.pdf
9. Katz-Bassett, E., Madhyastha, H.V., John, J.P., Krishnamurthy, A., Wetherall, D., Anderson, T.: Studying black holes in the internet with Hubble. In: NSDI (2008)
10. Line Of Storms Moves Through Oklahoma. http://www.newson6.com/story/36651816/tornado-watch-in-effect-for-ne-oklahoma
11. Northeast Storm Undergoes Bombogenesis, Bringing 70 MPH Gusts, Almost 350 Reports of Wind Damage, Flooding - The Weather Channel. https://weather.com/forecast/regional/news/2017-10-30-northeast-storm-damaging-winds-flooding
12. 29–30 October 2017 damaging winds, heavy rainfall & flooding. https://www.weather.gov/aly/October29-302017
13. More than 1 million power outages in the Northeast after blockbuster fall storm - The Washington Post. https://www.washingtonpost.com/news/capital-weather-gang/wp/2017/10/30/over-one-million-power-outages-in-the-northeast-after-blockbuster-fall-storm/
14. Comcast outage on Sep 13 2017 in the Outages Mailing List. https://puck.nether.net/pipermail/outages/2017-September/010754.html
15. Padmanabhan, R.: Analyzing internet reliability remotely with probing-based techniques. Ph.D. thesis, University of Maryland (2018)
16. Padmanabhan, R., Dhamdhere, A., Aben, E., Claffy, K., Spring, N.: Reasons dynamic addresses change. In: IMC (2016)
17. Padmanabhan, R., Owen, P., Schulman, A., Spring, N.: Timeouts: beware surprisingly high delay. In: IMC (2015)
18. Quan, L., Heidemann, J., Pradkin, Y.: Trinocular: understanding internet reliability through adaptive probing. In: SIGCOMM (2013)
19. Richter, P., Padmanabhan, R., Plonka, D., Berger, A., Clark, D.: Advancing the art of internet edge outage detection. In: IMC (2018)
20. Sánchez, M.A., et al.: Dasu: pushing experiments to the internet’s edge. In: NSDI (2013)
21. Schulman, A., Spring, N.: Pingin’ in the rain. In: IMC (2011)
22. Shah, A., Fontugne, R., Aben, E., Pelsser, C., Bush, R.: Disco: fast, good, and cheap outage detection. In: TMA (2017)
23. Shavitt, Y., Shir, E.: DIMES: let the internet measure itself. SIGCOMM Comput. Commun. Rev. 35, 71–74 (2005)
24. Sundaresan, S., Burnett, S., Feamster, N., de Donato, W.: BISmark: a testbed for deploying measurements and applications in broadband access networks. In: USENIX ATC, June 2014
25. van Belle, G., Heagerty, P.J., Fischer, L.D., Lumley, T.S.: Biostatistics: A Methodology for the Health Sciences, 2nd edn. Wiley, Hoboken (2004)

Acknowledgments

We thank Arthur Berger, Philipp Richter, our shepherd Georgios Smaragdakis, and the anonymous reviewers for their thoughtful feedback. This research is supported by the U.S. Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) via contract number 70RSAT18CB0000015 and by NSF grants CNS-1619048 and CNS-1526635.

Author information

Authors and Affiliations

University of Maryland, College Park, USA
Ramakrishna Padmanabhan, Dave Levin & Neil Spring
CAIDA/UCSD, La Jolla, USA
Ramakrishna Padmanabhan & Alberto Dainotti
UCSD, San Diego, USA
Aaron Schulman

Corresponding author

Correspondence to Ramakrishna Padmanabhan.

Editor information

Editors and Affiliations

Northeastern University, Boston, MA, USA
David Choffnes
Federal University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
Marinho Barcellos

A Appendix

A.1 Determining \(D_{min}\)

Section 3.1 described our technique for detecting dependent disruptions through the calculation of \(D_{min}\). Table 1 presents \(D_{min}\), computed for various values of N and \(P_d\). This table shows that, even for large aggregates of IP addresses, often few simultaneous disruptions are necessary to be able to confidently conclude that a dependent disruption has occurred.

Table 1. \(D_{min}\) values for varying values of N and \(P_d\). There is less than 0.01% probability according to the binomial test that \(D_{min}\) or more addresses fail for each N and \(P_d\).
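The computation behind such a table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that each of the N addresses in an aggregate fails independently with probability \(P_d\) in a given time bin, and that \(D_{min}\) is the smallest count whose one-sided binomial tail probability falls below 0.01%. The function names and the convention of returning `None` when no count qualifies are assumptions made for this example.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1.

    Computed in log space so that large n does not overflow.
    """
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log1p(-p))


def binomial_tail(n: int, p: float, d: int) -> float:
    """P(X >= d): chance that d or more of n addresses fail together
    if every address failed independently with probability p."""
    below = sum(binomial_pmf(n, p, k) for k in range(d))
    return max(0.0, 1.0 - below)


def d_min(n: int, p_d: float, threshold: float = 1e-4):
    """Smallest d with P(X >= d) < threshold (0.01% by default).

    Returns None when even d = n is not rare enough
    (a convention assumed for this sketch).
    """
    for d in range(1, n + 1):
        if binomial_tail(n, p_d, d) < threshold:
            return d
    return None
```

For example, `d_min(2, 0.001)` returns 2: a single failure among two addresses has probability of roughly 0.2% and is not rare enough, whereas both failing together has probability of about 0.0001%.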

A.2 Analyzing the Confidence of Detected Disruption Events

Here, we examine our confidence in the 20,831 detected dependent disruption events from Sect. 3.2. The occurrence of \(D_{min}\) disruptions has less than 0.01% probability according to the binomial test. We test whether most detected dependent disruption events have a probability close to this 0.01% threshold or are well clear of it.

Figure 8 shows the distribution of the probability that we incorrectly classify an independent event as dependent. The probability of occurring independently is less than 0.005% for 90% of the events and less than 0.001% for 75%. Thus, the probability that detected events occurred independently is typically much smaller than our choice of 0.01%.
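One way to write the per-event probability discussed here, assuming a one-sided binomial test in which each of the N addresses in an aggregate is disrupted independently with probability \(P_d\), and writing \(D\) for the number of simultaneous disruptions observed in the event (a symbol introduced only for this illustration):

\[ P(X \ge D) = \sum_{k=D}^{N} \binom{N}{k} P_d^{k} (1 - P_d)^{N-k} \]

Under this reading, an event is detected when \(P(X \ge D) < 10^{-4}\), which holds exactly when \(D \ge D_{min}\). Because the tail probability shrinks as \(D\) grows, events with more disrupted addresses than \(D_{min}\) fall further below the 0.01% threshold.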

A.3 Dependent Disruption Events Across ISPs

We grouped dependent disruption events by ISP to check if any ISPs contribute an unusual number of events. Figure 9 shows the top 15 ISPs with dependent disruption events. These top 15 ISPs together account for 13,643 (65%) of all detected events.

We emphasize that these results are not meant to reflect any underlying problems with these ISPs; Thunderping samples and pings large ISPs more frequently and, consequently, finds more disrupted addresses in them. The purpose of this analysis is to ensure that no ISP contributes unduly many events.

A.4 Dependent Disruptions May Not Disrupt Entire /24s: Implications

Continuing our analysis from Sect. 4.4, we investigated if the responsiveness of other addresses in /24s with actual disrupted addresses would vary across ISPs. Figure 10 shows per-ISP behavior. We see that all these ISPs have /24s with actual disrupted addresses where there continued to be responsive addresses throughout the disruption.

Prior work detecting outages within /24 aggregates may miss these events. Since a single positive response from an address within a /24 could lead Trinocular to conclude that the block is alive [18], it can miss dependent disruption events affecting only a subset of addresses within a /24 address block. Richter et al.’s technique is capable of detecting partial /24 disruptions [19]; indeed, many of their disruptions did not affect all addresses in the /24. However, their choice of the alpha parameter in their technique (\(\alpha = 0.5\)) meant that they would only detect disruptions where at least half of the active addresses were disrupted. In this paper, we showed that many /24s with actual disrupted addresses continued to have more than half of their (sampled) addresses responsive.
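The two block-level conditions described above can be contrasted with a small sketch. These functions are deliberate simplifications of how this section characterizes the prior techniques, not implementations of Trinocular [18] or of Richter et al.'s method [19]; the function names and the example counts are hypothetical.

```python
def any_response_rule_flags_outage(responsive: int, disrupted: int) -> bool:
    """Simplified 'one positive response means the block is alive' rule:
    an outage is flagged only if no sampled address still responds."""
    return disrupted > 0 and responsive == 0


def fraction_rule_flags_outage(responsive: int, disrupted: int,
                               alpha: float = 0.5) -> bool:
    """Simplified threshold rule: an outage is flagged only if at least
    a fraction alpha of the sampled active addresses is disrupted."""
    total = responsive + disrupted
    return total > 0 and disrupted / total >= alpha


# Hypothetical /24 with 10 sampled addresses, 3 of them disrupted.
partial = {"responsive": 7, "disrupted": 3}
missed_by_any_response = not any_response_rule_flags_outage(**partial)
missed_by_fraction = not fraction_rule_flags_outage(**partial)
```

In this hypothetical block both simplified rules stay silent, because seven addresses still respond and only 30% are disrupted. This is the kind of partial /24 disruption that the paragraph above argues block-level detection can miss.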

We believe that prior work may be able to detect these events by analyzing broader address aggregates (such as the state-ASN aggregates we use), in addition to /24 aggregates. In preliminary investigations, we found that many of our dependent disruption events consisted of multiple observed disrupted /24s that were each only partially disrupted; that is, a few addresses from many /24s were disrupted simultaneously but there continued to be other responsive addresses in these /24s. One of the largest events had 811 addresses from 42 /24 blocks in the observed disrupted group and 40 of these blocks had responsive addresses. We leave additional analyses for future work but we believe that we detected such events due to the broader aggregate of addresses we considered.
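A summary like the one quoted for the largest events could be produced by grouping addresses into their /24 blocks. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the sample addresses (taken from documentation-reserved ranges) are hypothetical and are not Thunderping data.

```python
from ipaddress import ip_network


def summarize_by_slash24(disrupted, responsive):
    """Return (number of /24 blocks with at least one disrupted address,
    number of those blocks that also kept a responsive address)."""
    def block(addr):
        # strict=False masks the host bits, giving the enclosing /24.
        return ip_network(f"{addr}/24", strict=False)

    disrupted_blocks = {block(a) for a in disrupted}
    responsive_blocks = {block(a) for a in responsive}
    partially_disrupted = disrupted_blocks & responsive_blocks
    return len(disrupted_blocks), len(partially_disrupted)


# Hypothetical sample: two /24s contain disrupted addresses,
# and one of them still has a responsive address.
blocks, partial_blocks = summarize_by_slash24(
    disrupted=["192.0.2.5", "192.0.2.9", "198.51.100.7"],
    responsive=["192.0.2.200", "203.0.113.1"],
)
```

Here `blocks` is 2 and `partial_blocks` is 1. Applied to the addresses of one dependent disruption event, the same grouping would give the kind of count reported above (42 blocks, 40 of them with responsive addresses).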

About this paper

Cite this paper

Padmanabhan, R., Schulman, A., Dainotti, A., Levin, D., Spring, N. (2019). How to Find Correlated Internet Failures. In: Choffnes, D., Barcellos, M. (eds) Passive and Active Measurement. PAM 2019. Lecture Notes in Computer Science, vol 11419. Springer, Cham. https://doi.org/10.1007/978-3-030-15986-3_14

DOI: https://doi.org/10.1007/978-3-030-15986-3_14
Published: 13 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15985-6
Online ISBN: 978-3-030-15986-3
eBook Packages: Computer Science, Computer Science (R0)
