Computing, Volume 96, Issue 1, pp 53–65

A coordinated view of the temporal evolution of large-scale Internet events

  • Alistair King
  • Bradley Huffaker
  • Alberto Dainotti
  • k. c. Claffy

Abstract

We present a method to visualize large-scale Internet events, such as a large region losing connectivity, or a stealth probe of the entire IPv4 address space. We apply a well-known technique in information visualization—multiple coordinated views—to Internet-specific data. We animate these coordinated views to study the temporal evolution of an event along different dimensions, including geographic spread, topological (address space) coverage, and traffic impact. We explain the techniques we used to create the visualization, and using two recent case studies we describe how this capability to simultaneously view multiple dimensions of events enabled greater insight into their properties.

Keywords

Visualization, Measurement methodology, Security, Software tools, Darknet, Outages

Mathematics Subject Classification

90B18 

1 Introduction

As the Internet grows more critical to our lives, it also grows increasingly complex and resistant to comprehensive modeling or even measurement. The challenge of visualizing large volumes of data reflecting Internet behavior and/or misbehavior is daunting but also irresistible, especially given the expanding scientific, governmental, and popular interest in large-scale outages or attacks. In this work we present an approach to visualizing multiple views of large-scale Internet events simultaneously. We apply a well-developed technique in information visualization—multiple coordinated views [26]—to different Internet-specific data sources. We animate these coordinated views to study the temporal evolution of an event in different visual spaces, including geographic spread, topological (address space) coverage, and traffic impact. These multiple views facilitate a deeper analysis of the dynamics and impact of the event.

The first and most obvious view of interest is geographic coverage of an event. IP addresses found in Internet measurement data can be mapped to an estimated geolocation using a variety of (free or commercial) services [15]. The second view is the address space: each IP address also represents a location in the IP address space, and visualizing how an event moves across that space can reveal patterns of interest, e.g., systematic but stealthy scanning of the global public Internet. The third view we examine is network traffic volume and associated statistics, such as the number of communicating hosts.

We build on and extend a set of software tools to provide coordinated multiple views of large-scale Internet events. We start with CAIDA’s Cuttlefish tool [2], which juxtaposes a geographic view of traffic (Sect. 3) with aggregated traffic statistics (Sect. 4). Cuttlefish is implemented as a Perl script that uses the GD library [3] to render a PNG image per frame. To visualize coverage of an event across the IPv4 address space we use the Measurement Factory’s ipv4-heatmap tool [12]. Section 5 describes our graphical address space representation, which required modifying this tool to display properties of the event throughout its evolution. To combine everything into a single animation, we developed a simple tool that operates on a frame-by-frame basis, appropriately scaling and placing one frame from each view into a single dashboard frame. We then merged these dashboard frames into an animated video of the event.

Section 6 demonstrates the power of our approach to facilitate deeper insights into network data using two case studies of large-scale events that we analyzed in detail in previous work [9, 10]. Both case studies use traffic collected by the UCSD Network Telescope—a large darknet passively capturing traffic sourced mainly by malware-infected hosts around the world [6]. The first event is a botnet-coordinated scan in February 2011, which probed hosts looking for SIP servers across the entire Internet address space. This probing event, which we call “sipscan”, involved millions of hosts and lasted approximately 12 days [9]. The second case study is the government-mandated Internet blackout in Egypt, which isolated the country from the rest of the Internet for more than five days in early 2011 [10]. Animations and screen snapshots of these two examples are at [8].

2 Related work

There has been significant work in visualizing characteristics of Internet traffic, infrastructure, and IPv4 address space, but we are not aware of any published attempt to use a multiple coordinated view to depict the geographic and topological (address space) impact of an anomalous traffic pattern or outage. We review some examples of individual techniques that we leverage in this work.

Lamm et al. [17] visualized web traffic (from a web server to its clients) with a geographic graduated symbol map. They projected three-dimensional bars orthogonal to a globe with oceans, land, and political boundaries as points of reference. Munzner et al. [19] also used a spherical projection with arcs connecting sources and destinations of MBone traffic. Use of a sphere maintains accurate distance between points, but makes it difficult to judge bar height and occludes many data points. Papadakakis et al. [21] used a similar projection of bars to show traffic volume over time (using animation) but against a Mercator (2D) map.

The most popular way to map the large, one-dimensional IPv4 address space to Cartesian coordinates uses the space-filling continuous fractal Hilbert curve (Fig. 1) as mentioned in [23]. Hilbert curves exhibit a property whereby values close in the one-dimensional space are also close in the two-dimensional space covered by the curve, which is useful given how IP address blocks are allocated in contiguous rectangular (CIDR) clusters [13]. Oberheide et al. [20] combined this two-dimensional representation of IP address space with a z-axis to display port, byte count, and bi-directional traffic. The Hilbert curve rendition of IPv4 address space was popularized by Munroe’s xkcd on-line comic to illustrate administrative boundaries of IP address ownership [18], and later used by other researchers to display IPv4 address space reachability [14], BGP-announced address space, and spread of open DNS resolvers across the IPv4 address space [11].

Fig. 1 Examples of Hilbert’s space-filling curves: orders 1, 2 and 3
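
The mapping from a one-dimensional index to a point on the Hilbert curve can be sketched with the widely used iterative index-to-coordinate conversion. This is an illustrative sketch, not the code of the ipv4-heatmap tool: the function names `hilbert_d2xy` and `slash24_pixel` are hypothetical, and the orientation of the curve in that tool may differ from the one produced here.

```python
import ipaddress


def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve of the given order to (x, y)."""
    n = 1 << order          # the grid is n x n cells
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:         # rotate/flip the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def slash24_pixel(ip, order=12):
    """Pixel of the /24 containing ip on a 4096 x 4096 map (order 12)."""
    index = int(ipaddress.IPv4Address(ip)) >> 8   # drop the host byte
    return hilbert_d2xy(order, index)
```

Consecutive indices always land on neighbouring cells, which is the locality property that keeps a CIDR block visually contiguous.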

There have likely been many unpublished uses of multiple coordinated views to investigate Internet phenomena internal to networks, but little is documented in the research literature. One example is the juxtaposition by Huffaker et al. [16] of a geographic view with a 2-D topological layout of an Internet overlay network, which they used to visualize NLANR’s web cache hierarchy. Brown et al. [7] created Cichlid, a multiple-view visualization tool that displayed pre-defined Internet topology and time-series graphs and supported 3D animations.

This work builds on techniques and lessons learned in these studies. We combine a geographic graduated symbol map of source addresses, a Hilbert curve view of the IPv4 address space, and a time-series plot into a coordinated multiple view display and animation for use in visualizing large-scale Internet events.

3 Geographical representation

3.1 Mercator projection

An integral part of any geo-visualization is the mapping between the geographic coordinate system (latitude and longitude) and the on-screen coordinate system (\(x\) and \(y\)). Cuttlefish does not dictate geographic reference points, but instead allows the user to provide an image, its geographic coordinate bounds, and the projection system used. Currently Cuttlefish supports a simple linear translation from geographic to screen coordinates, as well as the commonly used Mercator projection method. The primary drawback of the Mercator projection is that it distorts the size and shape of objects due to the scale increasing from the Equator to the poles. However, most viewers are more familiar with the Mercator projection, so it tends to minimize the cognitive cost of identifying glyph locations. Also, with the advent of the OpenStreetMap project [5], maps with liberal licensing terms are readily available for use with the tool.
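
As an illustration of the Mercator option, the following sketch converts latitude and longitude to pixel coordinates using the standard Mercator formula. It is not Cuttlefish code (Cuttlefish is a Perl script); the function name `mercator_to_pixel` and the `bounds` and `size` parameters are assumptions chosen to mirror the user-supplied image and geographic bounds described above.

```python
import math


def _mercator_y(lat_deg):
    """Standard Mercator ordinate for a latitude given in degrees."""
    phi = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + phi / 2))


def mercator_to_pixel(lat, lon, bounds, size):
    """Map (lat, lon) to (x, y) pixels on a Mercator-projected image.

    bounds = (west, south, east, north) in degrees, size = (width, height).
    Pixel (0, 0) is the top-left corner of the image.
    """
    west, south, east, north = bounds
    width, height = size
    x = (lon - west) / (east - west) * (width - 1)
    top, bottom = _mercator_y(north), _mercator_y(south)
    y = (top - _mercator_y(lat)) / (top - bottom) * (height - 1)
    return round(x), round(y)
```

On a 361 x 201 image bounded at 85 degrees north and south, ten degrees of latitude span roughly twice as many pixels between 60 and 70 degrees as they do near the Equator, which is the distortion discussed above.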

3.2 Day/night terminator

Since network traffic is influenced by patterns of human activity [25], time of day plays an important role in understanding the characteristics of a large-scale Internet event. For example, traffic from malware-infected hosts often has a strong diurnal component [9] because infected machines are typically PCs that are switched off at night by users. On the other hand, infected machines are distributed across many (most) time zones, which can dissipate otherwise stark diurnal variations. To mitigate this problem we use a day/night terminator (labeled in Fig. 2) to visually depict the time of day across the full range of geographic locations. This visual marker enables viewers to intuitively infer morning, noon, evening, and night, as well as how the tempo of an animation corresponds to clock time. Although time-of-day estimation for a given location requires that both (sunrise and sunset) terminators be visible, Cuttlefish also supports a caption denoting the time in one time zone (usually UTC) to assist with such time estimation.

Fig. 2 Day/night terminators in the geographic animations represent local time in different time zones
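
One way to decide which side of the terminator a location falls on is to estimate the subsolar point and test the sign of the cosine of the solar zenith angle. The sketch below is an approximation under stated assumptions, not Cuttlefish's implementation: it uses a common cosine approximation for solar declination and ignores the equation of time, so the terminator may be off by several minutes. The function names are hypothetical.

```python
import math
from datetime import datetime, timezone


def subsolar_point(when_utc):
    """Approximate (lat, lon) in degrees where the sun is directly overhead."""
    day_of_year = when_utc.timetuple().tm_yday
    hours = when_utc.hour + when_utc.minute / 60 + when_utc.second / 3600
    # Cosine approximation of solar declination (assumption, about 1 degree error).
    declination = -23.44 * math.cos(2 * math.pi * (day_of_year + 10) / 365.0)
    # Sun is overhead at longitude 0 at 12:00 UTC and moves 15 degrees per hour.
    longitude = (12.0 - hours) * 15.0
    return declination, longitude


def is_daylight(lat, lon, when_utc):
    """True if the sun is above the horizon at (lat, lon)."""
    sub_lat, sub_lon = subsolar_point(when_utc)
    lat_r, sub_r = math.radians(lat), math.radians(sub_lat)
    cos_zenith = (math.sin(lat_r) * math.sin(sub_r)
                  + math.cos(lat_r) * math.cos(sub_r)
                  * math.cos(math.radians(lon - sub_lon)))
    return cos_zenith > 0
```

Evaluating `is_daylight` for every map pixel of a frame yields the shaded night region; the boundary of that region is the terminator.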

3.3 Value representation

For most large-scale network data, especially with a geographic component, some aggregation must occur to place it on a map or other layout. To aggregate data, Cuttlefish currently supports only a simple summation of data values whose geographic coordinate maps to the same on-screen pixel. So at this point the tool is only useful for visualizing metrics for which total counts are meaningful, such as number of hosts, packets, byte counts, etc. Cuttlefish can display these aggregated data values as either rectangles or circles. With circles, the area of the circle is proportional to the magnitude associated with the location where the circle is centered. With rectangles, their height represents the magnitude associated with their location. Circles are rendered onto the map in descending order of size, with the largest circles drawn first, to minimize complete occlusion of smaller circles. Rectangles are rendered in reverse latitude order, that is, starting from the top of the map, and drawn upward from the aggregated pixel to create a mild 3D effect, with rectangles closer to the bottom of the screen occluding those behind and so appearing closer to the viewer. For both circles and rectangles, the area or height can use either a linear or logarithmic scale between the minimum and maximum values. The value of the glyph can also be represented by color. Color and size can be used together to represent independent metrics, for example number of unique hosts and packet count for a location.
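
The aggregation and glyph-sizing rules above can be sketched as follows. This is a minimal illustration with hypothetical function names, not Cuttlefish code. It assumes values are normalized between a minimum and maximum, that the logarithmic scale is only applied to positive values, and that the circle area (not the radius) is proportional to the normalized value.

```python
import math
from collections import defaultdict


def aggregate_by_pixel(samples):
    """Sum values whose coordinates map to the same on-screen pixel.

    samples is an iterable of ((x, y), value) pairs.
    """
    totals = defaultdict(float)
    for pixel, value in samples:
        totals[pixel] += value
    return dict(totals)


def normalize(value, vmin, vmax, log_scale=False):
    """Position of value between vmin and vmax, as a fraction in [0, 1]."""
    if log_scale:  # requires value, vmin, vmax > 0
        value, vmin, vmax = (math.log(v) for v in (value, vmin, vmax))
    if vmax == vmin:
        return 1.0
    return (value - vmin) / (vmax - vmin)


def circle_radius(value, vmin, vmax, max_radius, log_scale=False):
    """Radius chosen so that the circle's area scales with the value."""
    return max_radius * math.sqrt(normalize(value, vmin, vmax, log_scale))


def draw_order(totals):
    """Largest glyphs first, so smaller circles are drawn on top of them."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Taking the square root when computing the radius matters: scaling the radius linearly would make a location with four times the hosts look sixteen times larger.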

4 A view of network traffic

We augment the geographic view just described with a time-series graph plotting statistics of network traffic observed per time interval, each of which corresponds to one frame of the animation. For each interval we sum the values of one metric (e.g., host or packet count), across all locations, and use these per-frame values to generate a time-series. For example, in the sipscan frame shown in Fig. 6 (discussed further in Sect. 6), the graph in the bottom left corner plots the number of unique source IP addresses observed per 320-s interval, globally across the duration of the scan. Because each y-value in the graph corresponds to a single frame, we use a vertical yellow line (bottom left graph of Fig. 6) to indicate which time interval maps to the current frame being displayed in the geographic view above the traffic graph. This feature allows the viewer to easily track the current position of the animation relative to the overall event duration, and to correlate changes observed in different views.
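
The per-frame statistic described above, the number of unique source addresses per interval, can be computed with a simple binning pass. The sketch below is illustrative only; the function name and the `(timestamp, source)` input format are assumptions, and the 320-s default matches the interval used in the sipscan example.

```python
from collections import defaultdict


def unique_sources_per_interval(packets, start, interval=320):
    """Count unique source IPs in each fixed-length time interval.

    packets is an iterable of (unix_timestamp, source_ip) pairs; the
    returned list has one entry per frame, indexed from `start`.
    """
    seen = defaultdict(set)
    for timestamp, source in packets:
        if timestamp >= start:
            seen[int((timestamp - start) // interval)].add(source)
    if not seen:
        return []
    # Intervals without traffic still get a frame, with a count of zero.
    return [len(seen.get(i, ())) for i in range(max(seen) + 1)]
```

Because list index `i` is also the frame index, the vertical "now" marker for frame `i` is simply drawn at the x-position of the `i`-th point of the series.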

5 Visualizing the address space

As described in Sect. 2, we use Wessels’ ipv4-heatmap tool to visualize the IP address space [12]. By default this tool creates a \(4096\times 4096\) pixel image of the whole IPv4 address space, with each pixel representing a /24 sub-network. It can also be configured to render a smaller portion of the overall address space, which is how we use it to visualize traffic coming to our /8 darknet (i.e., each pixel represents an IP address).

Each point in the image is colored with one of 256 colors in the range from blue (1) to red (255), or black (0). Typically this range is used to indicate the fraction of the subnet that belongs to the observed population. For example, to visualize which IPv4 addresses respond to ping, a red pixel would mean that all 256 possible addresses in the /24 network segment responded. In this typical type of heatmap, a color’s hue conveys the magnitude of the value corresponding to the network segment. Hot colors such as yellow and red represent high values, and cooler colors such as blue represent low values. These color assignments effectively highlight the relationship between points in a single image.

To capture temporal dynamics of an event, e.g., which IP addresses are scanned over time, and how long it has been since other addresses have been scanned, the cold-to-hot range of colors is not intuitive. Lightness provides a more effective method for visualizing temporal data because people are reliably able to order colors by lightness [24]. We modified the ipv4-heatmap tool to use the Hue, Saturation, and Value (HSV) color space, rather than the Red, Green, and Blue (RGB) space. HSV allows a single base Hue to be easily varied to create a monotonically increasing spectrum of colors that become lighter as the Saturation and Value increase. We use a fixed Hue, combined with a linearly increasing progression of equal Saturation and Value parameters, to create a spectrum of colors that move up and toward the center of the HSV color space shown in Fig. 3.

Fig. 3 The Hue, Saturation, Value (HSV) color space (adapted from [22]). When visualizing address space coverage over time, we use a fixed Hue and vary the Saturation and Value parameters to assign lighter shades to more recently active addresses
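
A color ramp with a fixed Hue and equal, linearly increasing Saturation and Value can be sketched with Python's standard `colorsys` module. This is an illustration rather than the modified ipv4-heatmap code (which is not shown in this article); the function name `recency_color` and the default hue of 0.6 are assumptions.

```python
import colorsys


def recency_color(level, hue=0.6):
    """Map a pixel value in 0..255 to an (r, g, b) triple in 0..255.

    Hue stays fixed; Saturation and Value are equal and grow linearly
    with the level, so level 0 is black and level 255 is the brightest shade.
    """
    if not 0 <= level <= 255:
        raise ValueError("level must be in 0..255")
    sv = level / 255.0
    r, g, b = colorsys.hsv_to_rgb(hue, sv, sv)
    return tuple(round(c * 255) for c in (r, g, b))
```

A 256-entry palette is then `[recency_color(v) for v in range(256)]`, and the brightest channel of each entry never decreases as the level rises.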

For the Egyptian outage animation (example frame in Fig. 4), we use a binary state for an address, i.e., if our /8 darknet received traffic from the IP address in the current time interval, we set the value of its pixel to 255, otherwise we use a value of 0. But the sipscan example had more interesting temporal probing dynamics, which we tried to capture in its address space heatmap animation. We use a value of 255 for addresses probed within the current interval, and logarithmically decay this value for addresses probed in prior time intervals. The value is decayed further with every succeeding interval until a lower-bound is reached at which point the value (and corresponding color) remains constant. This decay effect vividly illustrates the temporal dynamics of the scan across the address space, including how recently an address was probed. Figure 5 shows a snapshot frame of the full animation available at [8].
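
The exact decay formula is not given here, so the following is only one plausible sketch of a logarithmic decay with a lower bound. The function names, the `lower_bound` of 64, and the `horizon` of 1000 intervals (the age at which the lower bound is reached) are all assumptions made for illustration.

```python
import math


def decayed_value(age, lower_bound=64, horizon=1000):
    """Pixel value for an address last probed `age` intervals ago.

    age 0 (probed in the current interval) gives 255; the value then
    falls with log(1 + age) until it reaches lower_bound at `horizon`.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    fraction = min(1.0, math.log1p(age) / math.log1p(horizon))
    return max(lower_bound, round(255 - (255 - lower_bound) * fraction))


def frame_values(last_probed, frame_index, **decay_args):
    """Values for one frame, given the last interval each address was probed."""
    return {
        address: decayed_value(frame_index - probed_at, **decay_args)
        for address, probed_at in last_probed.items()
        if probed_at <= frame_index
    }
```

With a logarithmic curve the value drops quickly over the first few intervals and slowly afterwards, so very recent probes stand out while older coverage remains visible.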

Fig. 4 Snapshot from our Hilbert-space animation of source addresses in Africa observed after the Internet blackout in Egypt [8] (animation described in Sects. 5 and 6.2)

Fig. 5 Sample frame from our animation of the temporal evolution of the IP addresses targeted by the sipscan [8]. A Hilbert curve is used to represent the /8 network of the UCSD Network Telescope, in which the order of the three least significant bytes of each address is reversed to show the progression of the scan. Colored pixels correspond to addresses that have been probed up to that time (February 5 2011 11:47 UTC). We use a logarithmic decay effect on the color of each address to represent the elapsed time since the address was last probed; lighter pixels correspond to more recently probed addresses (this frame is a modified version of the original frame in the animation available at [8]; we have exaggerated the decay effect to better show the progression of the scan in a single image)

By leveraging the fractal nature of the Hilbert curve, coupled with its property of grouping addresses within a prefix into a rectangle, we can zoom into a specific network within the address space, and show a detailed view of the addresses that comprise it. Figure 4 shows only the 41.0.0.0/8 network delegated to AfriNIC using a Hilbert curve of order 12. Each light-colored pixel represents a /24 network from which Conficker-like (see footnote 1) packets were received by the UCSD Network Telescope during the hour (February 2 2011 13:00–14:00 UTC) represented by the frame. The shaded dark blue areas are networks we inferred to be in Egypt at the time of the outage, using both the MaxMind GeoLite Country database [4], and the AfriNIC [1] delegations to Egyptian companies, as we described in [10]. By combining this frame with others representing different time periods, we build up an animation of the reduction in the number of unique source addresses located in Egyptian networks during the blackout as observed from the UCSD Network Telescope.

6 Multiple coordinated views

For each view, we divide the duration of the event into fixed time intervals, aggregating data for each interval into a single frame. Using our composition tool we render a frame set, which graphically merges several sub-frames showing different views. All views are rendered with identical bin sizes, so sub-frames with the same index refer to the same time interval.
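
The placement step of such a composition tool can be sketched as a pure layout computation: each sub-frame is scaled to fit its slot while preserving its aspect ratio, then centered. The sketch is illustrative only. The `LAYOUT` dictionary, its slot names, and the 1280 x 720 dashboard size are hypothetical and do not describe the actual tool.

```python
def fit_into(size, slot):
    """Scale (w, h) to fit slot (x, y, w, h), keeping the aspect ratio.

    Returns the placed rectangle (x, y, w, h), centered inside the slot.
    """
    w, h = size
    sx, sy, sw, sh = slot
    scale = min(sw / w, sh / h)
    new_w = min(sw, round(w * scale))
    new_h = min(sh, round(h * scale))
    return (sx + (sw - new_w) // 2, sy + (sh - new_h) // 2, new_w, new_h)


# Hypothetical 1280 x 720 dashboard: map on top, three panels below.
LAYOUT = {
    "map": (0, 0, 1280, 480),
    "traffic": (0, 480, 640, 240),
    "hilbert": (640, 480, 320, 240),
    "hilbert_reversed": (960, 480, 320, 240),
}


def place_frame(view_sizes, layout=LAYOUT):
    """Rectangles for all sub-frames that share one frame index."""
    return {name: fit_into(view_sizes[name], layout[name]) for name in layout}
```

Since every view uses the same bin size, composing dashboard frame `i` only requires loading sub-frame `i` from each view and pasting it into the rectangle computed here.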

Each view provides unique insights into aspects of an event, but when they are combined into a synchronized animation, cues from one view can illuminate the macroscopic behavior. For example, the geographic view has a day/night terminator which allows the viewer to infer the local time of day for a given location. But the terminator alone does not provide any temporal reference for the current time within the overall event—the tool uses a vertical yellow line on the network-traffic graph to provide this temporal reference (bottom left graph of Fig. 6).

Fig. 6 Single frame from our multiple view animation of the sipscan discussed in Sect. 6 and available for viewing at [8]. Combining the geographic and address space coverage with the time-series of network traffic-related statistics (unique number of observed communicating IP addresses) throughout the scan provides a high-level view of the coordinated and distributed nature of the event

The coordinated view also enables visual correlation. The network traffic graph may show behavior phase-shifts during the observed event, e.g., a sudden large drop in the number of hosts generating traffic. Such traffic changes can be visually correlated with patterns in the address space animation and/or associated with specific geographical regions, especially when the geographic map incorporates a traffic-related parameter shown in the traffic graph, e.g., circle size or color.

We next apply this multiple coordinated view technique to analyze two large-scale Internet events—the globally distributed and coordinated “sipscan” we describe in [9], and the Egyptian Internet blackout during the Arab Spring uprising that we measured and analyzed in [10]. For each example we customize the configuration of each view to highlight interesting aspects of the event. Both animations are available at [8].

6.1 Sipscan animation

The sipscan was a botnet-orchestrated stealth scan of the entire IPv4 address space that occurred over approximately 12 days (31 Jan–12 Feb) in early 2011. We identified this scan by analyzing traffic collected at the UCSD Network Telescope and isolated its probing packets through a payload-based signature. We used the multiple coordinated view visualization to confirm and analyze the extraordinarily stealthy scanning behavior of the botnet.

Figure 6 shows an example frame from the animation of the sipscan corresponding to a 320-s interval during the scan. The prominent world map at the top of the frame displays the inferred geographic location of hosts sending probing packets that reached the UCSD Network Telescope. The size of the circles represents the number of unique source IP addresses from this location, and the color represents the number of packets received from all observed IP addresses in this location. We represent both values because we found that hosts send probing packets with different rates (i.e., number of packets from a geographical location is not directly proportional to the number of hosts).

The graph in the lower left of the image plots traffic-related statistics over time, in this case showing the total number of unique source IP addresses sending sipscan probes observed in each 320-s interval. The two black squares in the bottom center and bottom right are address space maps showing the addresses of the /8 network of the UCSD Network Telescope that have been probed up to this point in the scan. While the address map in the center shows the original IP addresses, the one on the right is constructed with the IP addresses in reverse byte order. This alternative representation allowed us to examine and validate our hypothesis (based on manual inspection of packets) that the scan selected target IP addresses by incrementing them in reverse byte order. The animation clearly shows this behavior: the center address map displays an apparently random filling pattern, whereas the one on the right shows a progression that strictly follows the Hilbert curve. The strict observance of this pattern visible in the address map, combined with the geographical view, illustrates strongly coordinated behavior of the bots participating in the scan.
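
The reverse-byte-order transformation used for the right-hand map can be sketched in a few lines. This is an illustration with a hypothetical function name; the 10.0.0.0/8 addresses in the usage note are placeholders and are not the telescope's address block.

```python
import ipaddress


def reverse_low_bytes(ip):
    """Reverse the three least significant bytes: a.b.c.d -> a.d.c.b.

    The first byte is left alone because it is constant within a /8.
    """
    a, b, c, d = ipaddress.IPv4Address(ip).packed
    return str(ipaddress.IPv4Address(bytes([a, d, c, b])))


def offset_in_slash8(ip):
    """Index of an address within its /8 (0 .. 2**24 - 1), one pixel each."""
    return int(ipaddress.IPv4Address(ip)) & 0xFFFFFF
```

Targets such as 10.0.0.0, 10.1.0.0 and 10.2.0.0 look scattered in natural order (their offsets are 65536 apart), but after `reverse_low_bytes` they become 10.0.0.0, 10.0.0.1 and 10.0.0.2, i.e. consecutive offsets that trace the Hilbert curve one pixel at a time.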

Using multiple-coordinated views during the interval containing the abrupt decrease in hosts (between days 6 and 11) facilitated our discovery of another property of the scan: even when the number of the scanning bots plummets, the botmaster adjusts the commands sent to the bots to maintain the same pattern. As reflected in the animation, during this period the rightmost address-space map shows a much slower progression but still strictly following the Hilbert curve. In [9] we conclude that the progression in reverse-byte order, combined with other properties, was part of a strategy to make the scan stealthy. Such findings about the scan progression and the high degree of coordination—which this visualization helped demonstrate—were among the most relevant of our analysis in [9].

6.2 Egyptian censorship animation

The second case study we consider is the Egyptian Internet blackout. Beginning the evening of January 27, most BGP routes to Egyptian networks were progressively withdrawn by governmental order, denying Internet access to the vast majority of the population. This state of no connectivity was maintained for approximately 5 days, until the morning of February 2. In [10] we analyzed the outage using multiple measurement data sources and techniques, including traffic collected at the UCSD Network Telescope, which is the data used in this case study. Specifically, we observed the effect of the blackout on malware-infected Egyptian PCs by analyzing the volume of Conficker-like packets received by the telescope and geolocated to Egypt.

Figure 7 shows one frame from our coordinated view animation of the Egypt Internet blackout. The image represents one hour of data from the last full day of the outage (Feb. 1 2011 06:00–07:00 UTC). Similar to the sipscan animation (Fig. 6), the geographic view dominates the frame. However, because our intent is to highlight events in Egypt, we zoom and center the map around Egypt—including nearby countries to give context. We limit maximum and minimum values for the glyphs to those observed from Egyptian locations to emphasize the effect of the outage.

Fig. 7 Sample frame from our multiple coordinated views animation of the Egypt Internet blackout, available at [8] (note: map image © OpenStreetMap [5] contributors, CC BY-SA)

The network traffic statistics time-series in the bottom left of the frame plots the unique number of hosts that sent packets to the UCSD Network Telescope per hour. We counted only hosts geolocated to Egypt in order to avoid obscuring the outage signal with data from unaffected countries. We delineated the blackout period with red lines to enhance the function of the yellow “now” marker—allowing the viewer to correlate temporal proximity to the outage with features of the other views.

As with the sipscan example, we customize the two IPv4 address space views to shed light on details of the outage. The bottom middle image represents the full IPv4 address space, highlighting /24 networks from which packets were received in the current time interval. The bottom right image provides the same data, but limited to /24 networks within the 41.0.0.0/8 address block delegated to AfriNIC. In this image the networks delegated to organizations in Egypt are shaded with a dark blue color.

The coordinated view in the animation clearly shows the amount of Conficker-like traffic from Egypt dropping, while the surrounding countries continue to generate the same amount of traffic. The gradual decrease visible in the first three days of the outage reflects the progressive disconnection of ISPs that were initially left untouched (probably because they were considered strategically important, such as the ISP serving the Egyptian stock exchange [10]). The zoomed view of the address space confirms this staged process, showing how traffic from address blocks delegated to such ISPs suddenly disappears from the telescope.

7 Conclusion

We applied several well-known techniques in information visualization to the problem of visualizing multiple aspects of large-scale Internet events. Using multiple coordinated views, we implemented and integrated this combination of techniques. We then animated these coordinated views to study the temporal evolution of an event along different dimensions, including geographic spread, topological (address space) coverage, and traffic impact. We used two case studies, a large-scale outage and an Internet-wide address space scan, to show how the tool can both reveal new insights about Internet events and illustrate their characteristics. We designed our toolchain to be general enough to incorporate other types of views, and we hope to integrate it into the UCSD Network Telescope reporting system, which will provide views such as these in near real time.

Footnotes

  1. We use “Conficker-like” to refer to TCP packets destined to port 445, publicized during the Conficker episode but a target of scanning activity for many years.

Notes

Acknowledgments

Support for the UCSD Network Telescope operations and data collection, curation, analysis, and sharing is provided by DHS S&T NBCHC070133 and NSF CNS-1059439. The authors’ efforts on this project were supported by DHS S&T NBCHC070133 (KC), NSF CNS-1059439 and NSF CNS-1228994.

References

  1. AfriNIC. The Registry of Internet Number Resources for Africa (2012) http://www.afrinic.net
  2.
  3. GD graphics library (2012) http://www.boutell.com/gd/
  4. MaxMind GeoLite Country (2012) http://www.maxmind.com/app/geolitecountry
  5. OpenStreetMap (2012) http://www.openstreetmap.org
  6.
  7. Brown J, McGregor A, Braun H-W (2000) Network performance visualization: insight through animation. In: Proceedings of Passive and Active Measurement Workshop, Hamilton
  8.
  9. Dainotti A, King A, Claffy K, Papale F, Pescapé A (2012) Analysis of a “/0” Stealth Scan from a Botnet. In: ACM Internet Measurement Conference, Boston
  10. Dainotti A, Squarcella C, Aben E, Claffy KC, Chiesa M, Russo M, Pescapé A (2011) Analysis of country-wide internet outages caused by censorship. In: ACM SIGCOMM Internet measurement conference, Berlin
  11. Duane Wessels (2009) Gallery of IPv4 Heat Maps. http://maps.measurement-factory.com/gallery/
  12. Duane Wessels (2009) IPv4 Heatmap tool. http://maps.measurement-factory.com/software/
  13. Fuller V, Li T (2006) RFC 4632 (Best current practice)
  14. Heidemann J, Pradkin Y, Govindan R, Papadopoulos C, Bartlett G, Bannister J (2008) Census and survey of the visible internet. In: ACM SIGCOMM Internet measurement Conference, Vouliagmeni
  15. Huffaker B, Fomenkov M, Claffy kc (2011) Geocompare: a comparison of public and commercial geolocation databases. In: Network Mapping and Measurement Conference (NMMC). http://www.caida.org/publications/papers/2011/geocompare-tr/
  16. Huffaker B, Jung J, Wessels D, Claffy k (1998) Visualization of the growth and topology of the NLANR caching hierarchy. In: 3rd International WWW Caching Workshop, Manchester
  17. Lamm S, Reed D, Scullin W (1996) Real-time geographic visualization of world wide web traffic. In: Proceedings of Fifth International World Wide Web Conference (WWW5), Paris, pp 1457–1468
  18. Munroe R (2006) xkcd: MAP of the INTERNET. http://blog.xkcd.com/2006/12/11/the-map-of-the-internet/
  19. Munzner T, Hoffman E, Claffy k, Fenner B (1996) Visualizing the global topology of the MBone. In: IEEE Symposium on Information Visualization, San Francisco
  20. Oberheide J, Goff M, Karir M (2006) Flamingo: visualizing internet traffic. In: Network Operations and Management Symposium. NOMS 2006. 10th IEEE/IFIP, Vancouver
  21. Papadakakis N, Markatos E, Palantir AP (1998) A visualization tool for the world wide web. In: INET’98 Proceedings, Geneva
  22.
  23. Shannon AN, Spires V (2003) Exhaustive search system and method using space-filling curves. Patent, 10 2003. US 6636847
  24. Shirley P, Marschner S (2009) Fundamentals of computer graphics, 3rd edn. A.K Peters Ltd, Wellesley
  25. Thompson K, Miller GJ, Wilder R (1997) Wide-area internet traffic patterns and characteristics. IEEE Netw Mag 11(6):10–23
  26. Wang Baldonado MQ, Woodruff A, Kuchinsky A (2000) Guidelines for using multiple views in information visualization. In: Conference on Advanced visual interfaces, AVI ’00. ACM, New York

Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  • Alistair King (1)
  • Bradley Huffaker (1)
  • Alberto Dainotti (1)
  • k. c. Claffy (1)

  1. CAIDA, University of California, San Diego, USA
