A coordinated view of the temporal evolution of large-scale Internet events
We present a method to visualize large-scale Internet events, such as a large region losing connectivity, or a stealth probe of the entire IPv4 address space. We apply a well-known technique in information visualization—multiple coordinated views—to Internet-specific data. We animate these coordinated views to study the temporal evolution of an event along different dimensions, including geographic spread, topological (address space) coverage, and traffic impact. We explain the techniques we used to create the visualization, and using two recent case studies we describe how this capability to simultaneously view multiple dimensions of events enabled greater insight into their properties.
Keywords: Visualization, Measurement methodology, Security, Software tools, Darknet, Outages
Mathematics Subject Classification: 90B18
1 Introduction
As the Internet grows more critical to our lives, it also grows increasingly complex and resistant to comprehensive modeling or even measurement. The challenge of visualizing large volumes of data reflecting Internet behavior and/or misbehavior is daunting but also irresistible, especially given the expanding scientific, governmental, and popular interest in large-scale outages or attacks. In this work we present an approach to visualizing multiple views of large-scale Internet events simultaneously. We apply a well-developed technique in information visualization, multiple coordinated views, to different Internet-specific data sources. We animate these coordinated views to study the temporal evolution of an event in different visual spaces, including geographic spread, topological (address space) coverage, and traffic impact. These multiple views facilitate a deeper analysis of the dynamics and impact of the event.
The first and most obvious view of interest is the geographic coverage of an event. IP addresses found in Internet measurement data can be mapped to an estimated geolocation using a variety of (free or commercial) services. Each IP address also represents a location in the IP address space, and visualizing how an event moves across the IP address space can reveal patterns of interest, e.g., systematic but stealth scanning of the global public Internet. The third view we examine is network traffic volume and associated statistics, such as the number of communicating hosts.
We build on and extend a set of software tools to provide coordinated multiple views of large-scale Internet events. We start with CAIDA’s Cuttlefish tool, which juxtaposes a geographic view of traffic (Sect. 3) with aggregated traffic statistics (Sect. 4). Cuttlefish is implemented as a Perl script that uses the GD library to render a PNG image per frame. To visualize coverage of an event across the IPv4 address space we use the Measurement Factory’s ipv4-heatmap tool. Section 5 describes our graphical address space representation, which required modifying this tool to display properties of the event throughout its evolution. To combine everything into a single animation, we developed a simple tool that operates on a frame-by-frame basis, appropriately scaling and placing one frame from each view into a single dashboard frame. We merged these frames into an animated video of the event.
Section 6 demonstrates the power of our approach to facilitate deeper insights into network data using two case studies of large-scale events that we analyzed in detail in previous work [9, 10]. Both case studies use traffic collected by the UCSD Network Telescope, a large darknet that passively captures traffic sourced mainly by malware-infected hosts around the world. The first event is a botnet-coordinated scan in February 2011, which probed hosts looking for SIP servers across the entire Internet address space. This probing event, which we call “sipscan”, involved millions of hosts and lasted approximately 12 days. The second case study is the government-mandated Internet blackout in Egypt, which isolated the country from the rest of the Internet for more than five days in early 2011. Animations and screen snapshots of these two examples are available as supplemental data.
2 Related work
There has been significant work in visualizing characteristics of Internet traffic, infrastructure, and IPv4 address space, but we are not aware of any published attempt to use a multiple coordinated view to depict the geographic and topological (address space) impact of an anomalous traffic pattern or outage. We review some examples of individual techniques that we leverage in this work.
Lamm et al. visualized web traffic (from a web server to its clients) with a geographic graduated symbol map. They projected three-dimensional bars orthogonal to a globe with oceans, land, and political boundaries as points of reference. Munzner et al. also used a spherical projection with arcs connecting sources and destinations of MBone traffic. Use of a sphere maintains accurate distance between points, but makes it difficult to judge bar height and occludes many data points. Papadakakis et al. used a similar projection of bars to show traffic volume over time (using animation) but against a Mercator (2D) map.
There have likely been many unpublished uses of multiple coordinated views to investigate Internet phenomena internal to networks, but little is documented in the research literature. One example is Huffaker et al.’s juxtaposition of a geographic view with a 2-D topological layout of an Internet overlay network, which they used to visualize NLANR’s web cache hierarchy. Brown et al. created Cichlid, a multiple-view visualization tool that displayed pre-defined Internet topology and time-series graphs and supported 3D animations.
This work builds on techniques and lessons learned in these studies. We combine a geographic graduated symbol map of source addresses, a Hilbert-curve map of the IPv4 address space, and a time-series plot into a coordinated multiple view display and animation for use in visualizing large-scale Internet events.
3 Geographical representation
3.1 Mercator projection
An integral part of any geo-visualization is the mapping between the geographic coordinate system (latitude and longitude) and the on-screen coordinate system (\(x\) and \(y\)). Cuttlefish does not dictate geographic reference points, but instead allows the user to provide an image, its geographic coordinate bounds, and the projection system used. Currently Cuttlefish supports a simple linear translation from geographic to screen coordinates, as well as the commonly used Mercator projection method. The primary drawback of the Mercator projection is that it distorts the size and shape of objects due to the scale increasing from the Equator to the poles. However, most viewers are more familiar with the Mercator projection, so it tends to minimize the cognitive cost of identifying glyph locations. Also, with the advent of the OpenStreetMap project, maps with liberal licensing terms are readily available for use with the tool.
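The two coordinate mappings mentioned above can be sketched as follows. This is a minimal illustration in Python, not Cuttlefish's actual Perl code; the function names, the `bounds` tuple layout (north, south, west, east, in degrees), and the pixel origin at the top-left corner are assumptions made for the example.

```python
import math

def linear_to_pixel(lat, lon, bounds, width, height):
    """Simple linear translation from degrees to pixel coordinates.

    bounds = (north, south, west, east) of the background image (assumed layout).
    """
    north, south, west, east = bounds
    x = (lon - west) / (east - west) * width
    y = (north - lat) / (north - south) * height
    return x, y

def mercator_northing(lat_deg):
    """Mercator vertical coordinate of a latitude on a unit sphere."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))

def mercator_to_pixel(lat, lon, bounds, width, height):
    """Mercator mapping: longitude stays linear, latitude is stretched."""
    north, south, west, east = bounds
    x = (lon - west) / (east - west) * width
    top = mercator_northing(north)
    bottom = mercator_northing(south)
    y = (top - mercator_northing(lat)) / (top - bottom) * height
    return x, y
```

Both functions agree on the horizontal coordinate; only the vertical spacing differs, which is where the Mercator distortion toward the poles comes from. The Mercator formula is undefined at exactly ±90°, so map bounds are normally cut off before the poles.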
3.2 Day/night terminator
3.3 Value representation
For most large-scale network data, especially with a geographic component, some aggregation must occur to place it on a map or other layout. To aggregate data, Cuttlefish currently supports only a simple summation of data values whose geographic coordinate maps to the same on-screen pixel. So at this point the tool is only useful for visualizing metrics for which total counts are meaningful, such as number of hosts, packets, byte counts, etc. Cuttlefish can display these aggregated data values as either rectangles or circles. With circles, the area of the circle is proportional to the magnitude associated with the location where the circle is centered. With rectangles, their height represents the magnitude associated with their location. Circles are rendered onto the map in descending order of size, with the largest circles drawn first, to minimize complete occlusion of smaller circles. Rectangles are rendered in reverse latitude order, that is, starting from the top of the map, and drawn upward from the aggregated pixel to create a mild 3D effect, with rectangles closer to the bottom of the screen occluding those behind and so appearing closer to the viewer. For both circles and rectangles, the area or height can use either a linear or logarithmic scale between the minimum and maximum values. The value of the glyph can also be represented by color. Color and size can be used together to represent independent metrics, for example number of unique hosts and packet count for a location.
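The aggregation and glyph-sizing rules just described can be illustrated with a short Python sketch. It is not the tool's implementation: the function names are hypothetical, and mapping the minimum value to zero area is an assumption about how the scale between minimum and maximum is anchored.

```python
import math
from collections import defaultdict

def aggregate_by_pixel(points):
    """Sum the values of all (x, y, value) points that land on the same pixel."""
    totals = defaultdict(float)
    for x, y, value in points:
        totals[(int(x), int(y))] += value
    return dict(totals)

def circle_radius(value, vmin, vmax, max_radius, log_scale=False):
    """Radius chosen so that the circle AREA scales with the value.

    The fraction between vmin and vmax is linear or logarithmic; the log
    variant requires value and vmin to be strictly positive.
    """
    if vmax <= vmin:
        return float(max_radius)
    if log_scale:
        frac = (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))
    else:
        frac = (value - vmin) / (vmax - vmin)
    # Area grows with radius squared, so take the square root of the fraction.
    return max_radius * math.sqrt(frac)

def circle_draw_order(totals):
    """Largest circles first, so smaller ones are drawn on top of them."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Taking the square root is what makes area, rather than radius, proportional to the value; scaling the radius directly would visually exaggerate large values.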
4 A view of network traffic
We augment the geographic view just described with a time-series graph plotting statistics of network traffic observed per time interval, each of which corresponds to one frame of the animation. For each interval we sum the values of one metric (e.g., host or packet count), across all locations, and use these per-frame values to generate a time-series. For example, in the sipscan frame shown in Fig. 6 (discussed further in Sect. 6), the graph in the bottom left corner plots the number of unique source IP addresses observed per 320-s interval, globally across the duration of the scan. Because each y-value in the graph corresponds to a single frame, we use a vertical yellow line (bottom left graph of Fig. 6) to indicate which time interval maps to the current frame being displayed in the geographic view above the traffic graph. This feature allows the viewer to easily track the current position of the animation relative to the overall event duration, and to correlate changes observed in different views.
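As a rough illustration of how such a per-frame series could be derived, the Python sketch below bins packets into fixed intervals and counts unique source addresses per bin. The input format (an iterable of timestamp and source-address pairs) and the function name are assumptions for the example; the 320-s default simply mirrors the interval quoted above.

```python
from collections import defaultdict

def unique_sources_per_bin(packets, start, bin_seconds=320):
    """Count unique source addresses in each fixed-size time bin.

    packets: iterable of (timestamp_in_seconds, source_ip) pairs (assumed format).
    Returns a list whose index i is the value plotted for animation frame i.
    """
    bins = defaultdict(set)
    for timestamp, source in packets:
        if timestamp < start:
            continue  # ignore traffic before the event window
        bins[int((timestamp - start) // bin_seconds)].add(source)
    frame_count = max(bins) + 1 if bins else 0
    # Bins without traffic still produce a frame, with a count of zero.
    return [len(bins.get(i, ())) for i in range(frame_count)]
```

Because list index and frame index coincide, the position of the vertical "current frame" marker follows directly from the frame number.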
5 Visualizing the address space
As described in Sect. 2, we use Wessels’ ipv4-heatmap tool to visualize the IP address space. By default this tool creates a \(4096\times 4096\) pixel image of the whole IPv4 address space, with each pixel representing a /24 sub-network. It can also be configured to render a smaller portion of the overall address space, which is how we use it to visualize traffic coming to our /8 darknet (i.e., each pixel represents an IP address).
Each point in the image is colored with one of 256 colors in the range from blue (1) to red (255), or black (0). Typically this range is used to indicate the fraction of the subnet that belongs to the observed population. For example, to visualize which IPv4 addresses respond to ping, a red pixel would mean that all 256 possible addresses in the /24 network segment responded. In this typical type of heatmap, a color’s hue conveys the magnitude of the value corresponding to the network segment. Hot colors such as yellow and red represent high values, and cooler colors such as blue represent low values. These color assignments effectively highlight the relationship between points in a single image.
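One way to picture this color assignment is the sketch below, which maps a per-/24 address count to an index in 0..255 and then to an RGB color by sweeping hue from blue to red. The exact palette used by ipv4-heatmap is not given in the text, so the hue sweep and both function names are assumptions for illustration only.

```python
import colorsys

def color_index(count, max_count=256):
    """Map an address count to 0..255: 0 stays 0 (black), 1..max_count -> 1..255."""
    if count <= 0:
        return 0
    count = min(count, max_count)
    return 1 + round((count - 1) * 254 / (max_count - 1))

def index_to_rgb(index):
    """Assumed palette: index 1 is pure blue, index 255 is pure red."""
    if index == 0:
        return (0, 0, 0)
    # Hue 240 degrees (blue) down to 0 degrees (red), passing green and yellow.
    hue = (240.0 / 360.0) * (255 - index) / 254
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

With this mapping, a /24 in which every address responded gets index 255 and is drawn red, matching the ping example above.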
By leveraging the fractal nature of the Hilbert curve, coupled with its property of grouping addresses within a prefix into a rectangle, we can zoom into a specific network within the address space and show a detailed view of the addresses that comprise it. Figure 4 shows only the 220.127.116.11/8 network delegated to AfriNIC using a Hilbert curve of order 12. Each light-colored pixel represents a /24 network from which Conficker-like packets (TCP packets destined to port 445) were received by the UCSD Network Telescope during the hour (February 2, 2011, 13:00–14:00 UTC) represented by the frame. The shaded dark blue areas are networks we inferred to be in Egypt at the time of the outage, using both the MaxMind GeoLite Country database and the AfriNIC delegations to Egyptian companies, as described in our earlier analysis of the outage. By combining this frame with others representing different time periods, we build up an animation of the reduction in the number of unique source addresses located in Egyptian networks during the blackout, as observed from the UCSD Network Telescope.
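The prefix-grouping property can be checked with a small Python sketch of the standard distance-to-coordinate Hilbert mapping. It is not the ipv4-heatmap code; the curve orientation, the function names, and the 10.0.0.0/8 block used in the usage note are arbitrary choices for illustration.

```python
import ipaddress

def hilbert_d2xy(order, d):
    """Convert position d along a Hilbert curve of the given order to (x, y)."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def slash24_pixel(addr, order=12):
    """Pixel of the /24 containing addr on an order-12 (4096 x 4096) map."""
    index = int(ipaddress.IPv4Address(addr)) >> 8  # drop the host byte
    return hilbert_d2xy(order, index)
```

In this sketch every address of a /24 lands on the same pixel, consecutive /24s are always adjacent pixels, and the 65,536 /24s of any /8 (for example 10.0.0.0/8) fill one aligned 256 x 256 square, which is what makes zooming into a single /8 possible.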
6 Multiple coordinated views
For each view, we divide the duration of the event into fixed time intervals, aggregating data for each interval into a single frame. Using our composition tool we render a frame set, which graphically merges several sub-frames showing different views. All views are rendered with identical bin sizes, so sub-frames with the same index refer to the same time interval.
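The frame-by-frame composition step can be sketched without any imaging library by computing where each sub-frame goes. The slot layout, the aspect-preserving fit, and all names below are hypothetical; the actual composition tool is only described at a high level in the text.

```python
def fit_into(src_w, src_h, box_w, box_h):
    """Scale a sub-frame to fit inside a box while preserving its aspect ratio."""
    scale = min(box_w / src_w, box_h / src_h)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))

def place_views(view_sizes, slots):
    """Return name -> (x, y, width, height) for each view inside its slot.

    view_sizes: name -> (width, height) of the rendered sub-frame.
    slots: name -> (x, y, width, height) region of the dashboard (assumed layout).
    """
    placements = {}
    for name, (src_w, src_h) in view_sizes.items():
        box_x, box_y, box_w, box_h = slots[name]
        w, h = fit_into(src_w, src_h, box_w, box_h)
        # Center the scaled sub-frame inside its slot.
        placements[name] = (box_x + (box_w - w) // 2, box_y + (box_h - h) // 2, w, h)
    return placements

def dashboard_frames(frames_by_view):
    """Group sub-frames by index; all views must cover the same intervals."""
    lengths = {len(frames) for frames in frames_by_view.values()}
    if len(lengths) != 1:
        raise ValueError("views must have the same number of frames")
    count = lengths.pop()
    return [{name: frames[i] for name, frames in frames_by_view.items()}
            for i in range(count)]
```

The length check encodes the requirement stated above: grouping by index is only meaningful because every view is rendered with identical bin sizes.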
The coordinated view also enables visual correlation. The network traffic graph may show behavior phase-shifts during the observed event, e.g., a sudden large drop in the number of hosts generating traffic. Such traffic changes can be visually correlated with patterns in the address space animation and/or associated with specific geographical regions, especially when the geographic map incorporates a traffic-related parameter shown in the traffic graph, e.g., circle size or color.
We next apply this multiple coordinated view technique to analyze two large-scale Internet events: the globally distributed and coordinated “sipscan”, and the Egyptian Internet blackout during the Arab Spring uprising, both of which we measured and analyzed in previous work. For each example we customize the configuration of each view to highlight interesting aspects of the event. Both animations are available as supplemental data.
6.1 Sipscan animation
The sipscan was a botnet-orchestrated stealth scan of the entire IPv4 address space that occurred over approximately 12 days (31 Jan–12 Feb) in early 2011. We identified this scan by analyzing traffic collected at the UCSD Network Telescope and isolated its probing packets through a payload-based signature. We used the multiple coordinated view visualization to confirm and analyze the extraordinarily stealthy scanning behavior of the botnet.
Figure 6 shows an example frame from the animation of the sipscan corresponding to a 320-s interval during the scan. The prominent world map at the top of the frame displays the inferred geographic location of hosts sending probing packets that reached the UCSD Network Telescope. The size of the circles represents the number of unique source IP addresses from this location, and the color represents the number of packets received from all observed IP addresses in this location. We represent both values because we found that hosts send probing packets at different rates (i.e., the number of packets from a geographical location is not directly proportional to the number of hosts).
The graph in the lower left of the image plots traffic-related statistics over time, in this case showing the total number of unique source IP addresses sending sipscan probes observed in each 320-s interval. The two black squares in the bottom center and bottom right are address space maps showing the addresses of the /8 network of the UCSD Network Telescope that have been probed up to this point in the scan. While the address map in the center shows the original IP addresses, the one on the right is constructed with the IP addresses in reverse byte order. This different representation allowed us to examine and validate our hypothesis (based on manual inspection of packets) that the scan selected target IP addresses by incrementing them in reverse byte order. The animation clearly shows this behavior: the center address map displays an apparently random filling pattern, whereas the one on the right shows a progression that closely follows the Hilbert curve. The strict observance of this pattern visible in the address map, combined with the geographical view, illustrates a strongly coordinated behavior of the bots participating in the scan.
Using multiple coordinated views during the interval containing the abrupt decrease in hosts (between days 6 and 11) facilitated our discovery of another property of the scan: even when the number of scanning bots plummets, the botmaster adjusts the commands sent to the bots to maintain the same pattern. As reflected in the animation, during this period the rightmost address-space map shows a much slower progression that still strictly follows the Hilbert curve. In our earlier analysis of the scan we concluded that the progression in reverse-byte order, combined with other properties, was part of a strategy to make the scan stealthy. Such findings about the scan progression and the high degree of coordination, which this visualization helped demonstrate, were among the most relevant results of that analysis.
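The effect of the reverse-byte-order representation can be reproduced with a few lines of Python. This is only a sketch of the idea, with hypothetical function names and a simple integer counter standing in for whatever target-selection logic the bots actually used.

```python
import ipaddress

def reverse_bytes(addr):
    """Return the IPv4 address obtained by reversing the four bytes of addr."""
    packed = ipaddress.IPv4Address(addr).packed
    return str(ipaddress.IPv4Address(packed[::-1]))

def reverse_byte_scan(start, count):
    """Targets probed by a scanner that increments a counter in reversed byte order."""
    for counter in range(start, start + count):
        yield reverse_bytes(counter)
```

For example, `list(reverse_byte_scan(0, 3))` yields `0.0.0.0`, `1.0.0.0`, `2.0.0.0`: consecutive probes land in different /8 networks, so any single network sees widely spaced targets. Applying `reverse_bytes` to the observed targets recovers the sequential counter, which is why the right-hand map fills in along the curve while the center map appears random.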
6.2 Egyptian censorship animation
The second case study we consider is the Egyptian Internet blackout. Beginning the evening of January 27, most BGP routes to Egyptian networks were progressively withdrawn by governmental order, denying Internet access to the vast majority of the population. This state of no connectivity was maintained for approximately 5 days, until the morning of February 2. In previous work we analyzed the outage using multiple measurement data sources and techniques, including traffic collected at the UCSD Network Telescope, which is the data used in this case study. Specifically, we observed the effect of the blackout on malware-infected Egyptian PCs by analyzing the volume of Conficker-like packets received by the telescope and geolocated to Egypt.
The network traffic statistics time-series in the bottom left of the frame plots the number of unique hosts that sent packets to the UCSD Network Telescope per hour. We counted only hosts geolocated to Egypt in order to avoid obscuring the outage signal with data from unaffected countries. We delineated the blackout period with red lines to enhance the function of the yellow “now” marker, allowing the viewer to correlate temporal proximity to the outage with features of the other views.
As with the sipscan example, we customize the two IPv4 address space views to shed light on details of the outage. The bottom middle image represents the full IPv4 address space, highlighting /24 networks from which packets were received in the current time interval. The bottom right image provides the same data, but limited to /24 networks within the 18.104.22.168/8 address block delegated to AfriNIC. In this image the networks delegated to organizations in Egypt are shaded with a dark blue color.
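The selection and shading logic for the zoomed view can be sketched with Python's standard `ipaddress` module. The function names and the example prefixes in the usage note are placeholders, not the blocks or delegations used in the study.

```python
import ipaddress

def slash24(addr):
    """The /24 network that contains addr."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)

def classify_sources(sources, block, highlighted_prefixes):
    """Map each observed /24 inside `block` to True if it should be shaded.

    sources: source IP addresses seen in the current time interval.
    block: the prefix shown in the zoomed view.
    highlighted_prefixes: prefixes to shade (e.g. delegations to one country).
    """
    block_net = ipaddress.ip_network(block)
    highlighted = [ipaddress.ip_network(p) for p in highlighted_prefixes]
    result = {}
    for net in {slash24(a) for a in sources}:
        if net.subnet_of(block_net):
            result[str(net)] = any(net.subnet_of(h) for h in highlighted)
    return result
```

For instance, `classify_sources(["10.20.3.7", "10.99.1.1", "192.0.2.5"], "10.0.0.0/8", ["10.20.0.0/16"])` keeps the two /24s inside the placeholder block and marks only `10.20.3.0/24` for shading.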
The coordinated view in the animation clearly shows the amount of Conficker-like traffic from Egypt dropping, while the surrounding countries continue to generate the same amount of traffic. The gradual decrease visible in the first three days of the outage reflects the progressive disconnection of ISPs that were initially left untouched (probably because they were considered strategically important, such as the ISP serving the Egyptian stock exchange). The zoomed view of the address space confirms this staged process, showing how traffic from address blocks delegated to such ISPs suddenly disappears from the telescope.
We applied several well-known techniques in information visualization to the problem of visualizing multiple aspects of large-scale Internet events. Using multiple coordinated views, we integrated and implemented this combination of techniques. We then animated these coordinated views to study the temporal evolution of an event along different dimensions, including geographic spread, topological (address space) coverage, and traffic impact. We used two case studies, a large-scale outage and an Internet-wide address space scan, to illustrate the power of the tool both to reveal new insights about Internet events and to illustrate their characteristics. We designed our toolchain to be general enough to incorporate other types of views, and we hope to integrate it into the UCSD Network Telescope reporting system, which will provide views such as these in near real time.
We use “Conficker-like” to refer to TCP packets destined to port 445, publicized during the Conficker episode but a target of scanning activity for many years.
Support for the UCSD Network Telescope operations and data collection, curation, analysis, and sharing is provided by DHS S&T NBCHC070133 and NSF CNS-1059439. The author’s efforts on this project were supported by DHS S&T NBCHC070133 (KC), NSF CNS-1059439, and NSF CNS-1228994.
- 1. AfriNIC. The Registry of Internet Number Resources for Africa (2012) http://www.afrinic.net
- 2. Cuttlefish (2012) http://www.caida.org/tools/visualization/cuttlefish/
- 3. GD graphics library (2012) http://www.boutell.com/gd/
- 4. MaxMind GeoLite Country (2012) http://www.maxmind.com/app/geolitecountry
- 5. OpenStreetMap (2012) http://www.openstreetmap.org
- 6. UCSD Network Telescope (2010) http://www.caida.org/data/passive/network_telescope.xml
- 7. Brown J, McGregor A, Braun H-W (2000) Network performance visualization: insight through animation. In: Proceedings of Passive and Active Measurement Workshop, Hamilton
- 8. CAIDA. Supplemental data (2012) http://www.caida.org/publications/papers/2012/coordinated_view_internet_events/supplemental/
- 9. Dainotti A, King A, Claffy K, Papale F, Pescapé A (2012) Analysis of a “/0” Stealth Scan from a Botnet. In: ACM Internet Measurement Conference, Boston
- 10. Dainotti A, Squarcella C, Aben E, Claffy KC, Chiesa M, Russo M, Pescapé A (2011) Analysis of country-wide internet outages caused by censorship. In: ACM SIGCOMM Internet measurement conference, Berlin
- 11. Wessels D (2009) Gallery of IPv4 Heat Maps. http://maps.measurement-factory.com/gallery/
- 12. Wessels D (2009) IPv4 Heatmap tool. http://maps.measurement-factory.com/software/
- 13. Fuller V, Li T (2006) RFC 4632 (Best current practice)
- 14. Heidemann J, Pradkin Y, Govindan R, Papadopoulos C, Bartlett G, Bannister J (2008) Census and survey of the visible internet. In: ACM SIGCOMM Internet measurement Conference, Vouliagmeni
- 15. Huffaker B, Fomenkov M, Claffy kc (2011) Geocompare: a comparison of public and commercial geolocation databases. In: Network Mapping and Measurement Conference (NMMC). http://www.caida.org/publications/papers/2011/geocompare-tr/
- 16. Huffaker B, Jung J, Wessels D, Claffy k (1998) Visualization of the growth and topology of the NLANR caching hierarchy. In: 3rd International WWW Caching Workshop, Manchester
- 17. Lamm S, Reed D, Scullin W (1996) Real-time geographic visualization of world wide web traffic. In: Proceedings of Fifth International World Wide Web Conference (WWW5), Paris, pp 1457–1468
- 18. Munroe R (2006) xkcd: MAP of the INTERNET. http://blog.xkcd.com/2006/12/11/the-map-of-the-internet/
- 19. Munzner T, Hoffman E, Claffy k, Fenner B (1996) Visualizing the global topology of the MBone. In: IEEE Symposium on Information Visualization, San Francisco
- 20. Oberheide J, Goff M, Karir M (2006) Flamingo: visualizing internet traffic. In: Network Operations and Management Symposium. NOMS 2006. 10th IEEE/IFIP, Vancouver
- 21. Papadakakis N, Markatos E, Palantir AP (1998) A visualization tool for the world wide web. In: INET’98 Proceedings, Geneva
- 22. Rus J (2012) Hsl-hsv models.svg. http://en.wikipedia.org/wiki/File:Hsl-hsv_models.svg
- 23. Shannon AN, Spires V (2003) Exhaustive search system and method using space-filling curves. Patent, 10 2003. US 6636847
- 24. Shirley P, Marschner S (2009) Fundamentals of computer graphics, 3rd edn. A.K Peters Ltd, Wellesley
- 26. Wang Baldonado MQ, Woodruff A, Kuchinsky A (2000) Guidelines for using multiple views in information visualization. In: Conference on Advanced visual interfaces, AVI ’00. ACM, New York