1 Introduction

Contact tracing is a public health tool used in the fight against infectious disease, and is based on the assumption that disease is transmitted via close personal contact. From patients’ contact history, healthcare workers attempt to break the chain of transmission by first tracing the source of infection and then identifying other potential patients exposed to the disease so that they may be monitored and, if necessary, treated (Eames and Keeling, 2003; Rothenberg et al., 2003). Since contact tracing requires intensive manual effort in interviewing patients and collecting their contact records, contact tracing is most effective when the number of infected cases or reproductive ratio of the disease is low (Eames and Keeling, 2003). Contact tracing has been applied to the control of Sexually Transmitted Diseases (STDs), Tuberculosis (TB), and some newly emerging diseases, such as Severe Acute Respiratory Syndrome (SARS) in 2003.

The effect of social networks on STD transmission has long been recognized and has triggered the development of control measures for STDs. In the 1960s, for instance, Dr. Havlak suggested that if several syphilis patients share a common sexual contact, their contact tracing records should be kept in one folder and analyzed as a unit or lot (Rothenberg et al., 2003). This “lot system” has facilitated the identification of potential STD patients for targeted screening, and its basic premise is similar to the concept of clusters in Social Network Analysis (SNA) (Rothenberg and Narramore, 1996). However, the use of SNA to enhance contact tracing was not considered until the emergence of Acquired Immunodeficiency Syndrome (AIDS) in the 1980s; its rapid spread was believed to be related to fast-growing sexual networks augmented by the ease of long-distance travel. Auerbach et al. (1984) initiated a contact investigation of 19 patients in California to assess the role of sexual relationships in AIDS transmission. They eventually linked 40 patients across ten cities in the USA in a network graph and supported the long-held hypothesis that AIDS is transmitted via pathogens.

Klovdahl (1985) formally established the connection between contact tracing and SNA, using the same dataset from the Auerbach et al. study to demonstrate how SNA could be applied to examine two causal criteria of transmission: exposure and temporality. In addition, he recapped the relationship between an STD’s spread and the structure of social networks, and he introduced the potential usage of centrality measures in SNA to identify prominent individuals in STD transmission. Klovdahl et al. (1994) further proved the concept of incorporating SNA into disease investigation with a large-scale study in Colorado Springs, Colorado, in which over 600 individuals were directly or indirectly connected to each other in one network.

In the more than 20 years following Klovdahl’s 1985 paper, SNA has been successfully applied to the study of several STD outbreaks. The epidemiological insights that SNA can provide have also evolved from the static identification of core groups to the investigation of transmission dynamics. In this chapter, we review the development of SNA in the field of epidemiology and present a case study of the Taiwan SARS outbreak in 2003 to discuss the role of geographical contacts in disease investigation.

The remainder of this chapter is organized as follows. We first review two important SNA tools for contact tracing: network visualization and measures. Then we discuss how SNA is applied in order to identify prominent individuals in disease pathways and study the dynamics of disease transmission. Finally, we present the case study and conclusions.

2 Network Visualization and Measures in SNA

In any society, individuals develop their relationships with others and form their own personal networks through social activities. From these networks, they may seek advice for important decisions, obtain resources useful for their jobs, and create alliances for supporting their beliefs. Based on the observation of how individuals act in a society, instead of supporting the idea that people are autonomous, SNA proposes that people’s behavior is better explained by seeing them as embedded in a network of relationships. By reconstructing a social network, SNA researchers seek to understand people’s behavior and organizational structures from their linkages with each other.

In SNA, the relationship of individuals is described as a socio-matrix (Scott, 2000; Wasserman and Faust, 1994). It creates a one-to-one mapping between participants, and each cell indicates whether a relationship exists between its row and column persons (1 for existence and 0 otherwise). A socio-matrix can also be visualized as a socio-gram or social network in which individuals are symbolized as nodes and connected to each other with edges or ties for their relationships. Figure 15-1 shows a sample friendship network of ten individuals. In this network, Persons A and E are considered the most active or “popular” persons since they are linked to the largest number of people. Person F is also important although he/she doesn’t have as many connections as Persons A and E: Person F bridges two different groups of friends. Without Person F, these two groups of people may not have the chance to establish relationships with each other in the future. In SNA, these three people are said to be central or prominent within the sample network.

Figure 15-1. A sample friendship network of ten individuals.

Centrality measures are quantitative indicators for finding those “central” individuals from a network, originally developed in communication scenarios. From a topological perspective, people who are able to receive or control the mainstream of message flow typically stand in a position similar to the central point of a star (Freeman, 1978/79), such as the location of Person A in the network above. Various centrality measures, such as degree and betweenness, can be employed to determine the importance of a node within a network. For example, the degree is the number of edges that a node has. Since the central point of a star has the largest number of edges connecting it to the other nodes, a node with a higher degree is topologically considered to be more central to its network (Freeman, 1978/79; Wasserman and Faust, 1994). The betweenness measures “the extent to which a particular node lies between the various other nodes” (Scott, 2000) because the central point also sits between pairs. The higher betweenness a node has, the more potential it has to be a gatekeeper controlling the connections (such as communications) between the others (Scott, 2000). Table 15-1 lists the degree and betweenness of nodes in our sample friendship network. From this table we can see how these measures can reveal the prominence of people in a network.

Table 15-1. Degree and betweenness of nodes in the sample friendship network.
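The two centrality measures can be sketched in a few lines of Python. The edge list below is a hypothetical ten-person network, chosen only to match the verbal description of the sample network (Persons A and E have the most ties, Person F bridges two groups, ten edges in total); it is not the actual data behind Figure 15-1, so its values need not match Table 15-1. Betweenness is computed here with Brandes' algorithm, a standard technique that the chapter itself does not name.

```python
from collections import deque

# Hypothetical edge list (an assumption, not the actual Figure 15-1 data).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "F"), ("B", "C"),
         ("E", "F"), ("E", "G"), ("E", "H"), ("E", "I"), ("I", "J")]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Degree = number of edges attached to each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search, counting shortest paths (sigma) from source.
        order = []
        preds = {node: [] for node in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes inward.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: value / 2 for node, value in score.items()}


adjacency = build_adjacency(EDGES)
deg = degree(adjacency)
btw = betweenness(adjacency)
for node in sorted(adjacency):
    print(node, deg[node], btw[node])
```

In this hypothetical network, Person F has only two ties yet a betweenness as high as Person A's, which illustrates how a node with few connections can still act as a gatekeeper between groups.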

The centrality measures are categorized as micro-level measures and focus on the status of individual nodes in a social network. In contrast, macro-level measures, such as the number of components and network density, reflect a network’s overall structure and are usually used for network-to-network comparison. A component in graph theory is defined as a maximal connected subgraph. Two nodes belong to the same connected component if they are connected directly with an edge or indirectly through other nodes. The number of components consequently shows the number of connected subgraphs and reflects the degree to which people are grouped in a network (Scott, 2000). The number of components in our sample friendship network is 1. If we remove Person F from the network, its number of components would become 2. Network density is calculated as the proportion of existing edges to the maximum possible edges among nodes (Wasserman and Faust, 1994). If two social networks have the same number of nodes, the network density can differentiate their interaction intensity. According to combinatorics, the maximum possible number of edges in our sample network is (10 × 9)/2 = 45. It has 10 existing edges. Therefore, its network density is 10/45 = 0.2222. The frequently used macro- and micro-level measures are summarized in Table 15-2. Note that on some occasions the average value of a micro-level measure can also serve as a macro-level measure. For example, the average degree of nodes can also indicate network participants’ interaction intensity and be used in place of the network density.

Table 15-2. Summary of frequently used network measures.
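The two macro-level measures discussed above, the number of components and network density, can be checked with a short Python sketch. As an assumption, it uses a hypothetical ten-node, ten-edge network that matches the description of the sample network but is not the actual Figure 15-1 data; the function names are illustrative only.

```python
# Hypothetical node and edge lists (assumed, not the actual Figure 15-1 data).
NODES = list("ABCDEFGHIJ")
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "F"), ("B", "C"),
         ("E", "F"), ("E", "G"), ("E", "H"), ("E", "I"), ("I", "J")]


def count_components(nodes, edges):
    """Count maximal connected subgraphs; edges to missing nodes are ignored."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first walk over one component
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def density(nodes, edges):
    """Existing edges divided by the maximum possible n(n-1)/2 edges."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)


def average_degree(nodes, edges):
    """Each edge contributes to the degree of two nodes."""
    return 2 * len(edges) / len(nodes)


without_f = [node for node in NODES if node != "F"]
print(count_components(NODES, EDGES))      # 1
print(count_components(without_f, EDGES))  # 2
print(round(density(NODES, EDGES), 4))     # 0.2222
print(average_degree(NODES, EDGES))        # 2.0
```

Passing an explicit node list, rather than deriving nodes from the edges, keeps isolated individuals in the component count.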

3 SNA in Epidemiology

When applied to epidemiology, a social network is called a contact network. It represents accumulated linkages among patients with their potential contacts of infection in a period of time. Therefore, unlike the actual route of transmission which is a one-to-one mapping between patients for their infection, a contact network typically depicts a many-to-many relationship. From a contact network, disease investigators can visualize the potential scenarios or social factors that triggered an outbreak and propose corresponding containment strategies to control it.

3.1 Static Analysis of Linkage in a Contact Network for STDs

The main strength of SNA in disease analysis is its ability, through centrality measures, to identify key individuals in an outbreak. For STDs, those key individuals are referred to as the core group and bridges (Thomas and Tucker, 1996; Wasserheit and Aral, 1996). The concept of a core group was introduced by Yorke et al. (1978) and postulates that epidemics or endemics of an STD are maintained by a small group of sexually active individuals who persistently infect other healthy people. Because of their active sexual life, those core group members inevitably behave like the central point of a star connecting to a large number of others in a contact network and exhibit high values in the degree and betweenness measures. However, the wide spread of an STD requires individuals who, acting as bridges, transfer the disease from one subpopulation to another (Rothenberg and Narramore, 1996; Wasserheit and Aral, 1996). These bridge people may not have many sexual partners, but accidentally channel the disease to a different subpopulation (e.g., a different economic class) via their purchase of sexual services. Therefore, they may exhibit low degree values but high betweenness.

In epidemiology, the central questions in SNA studies usually surround which group rather than which person facilitates a disease’s spread. Therefore, investigators need to categorize patients into several groups according to their demographic characteristics and then calculate the average values of centrality measures for each group. In the Colorado Springs study, Rothenberg et al. (1995) estimated the relationship of centrality rankings with the perceived risk of AIDS and categorized the behaviors of their participants into six categories: prostitutes, paying and nonpaying partners, injection drug users and their partners, and other. They reported that prostitutes and nonpaying partners who ranked highest in information centrality were more likely to engage in high-risk sexual activities, such as anal sex, and know someone with AIDS. In a separate study of a syphilis outbreak, Rothenberg et al. (1998b) found that people with syphilis were more central within the outbreak network based on their significantly higher betweenness. From the network visualization, they further uncovered that a group of young girls served as the core group of the outbreak by connecting two different ethnic groups of men.

3.2 Transmission Dynamics of STDs

A contact network is analogous to a snapshot which captures the process of disease distribution within a given period of time. Comparing a series of contact networks with macro-measures enables the study of transmission dynamics by examining the change in transmission patterns over time. In the literature, there are two major perspectives in studying transmission dynamics with SNA: risky behavior and epidemic phases. Rothenberg et al. (1998a) presented results from a longitudinal study in Colorado Springs as an example of the risky behavior perspective. Ninety-six AIDS patients were repeatedly interviewed for 3 years about their contacts with others, including sexual contact, drug use, and needle sharing. For each type of contact, the researchers constructed three serial contact networks at 1-year intervals and compared the structure of those serial networks to assess network stability and changes in risky behavior. According to the study results, one group of patients showed a significant decrease in needle sharing, as indicated by the gradually smaller average degree and component size in the group’s contact networks.

The dynamic topology of transmission proposed by Wasserheit and Aral (1996) provides a theoretical ground for using SNA to identify the epidemic phases of STDs. Wasserheit and Aral extended the core group theory and suggested that STD transmission is determined not only by the change rate of sexual partners but also by interaction with healthcare programs. According to their dynamic topology as shown in Figure 15-2, in an early phase of transmission or a growth phase, an STD must first enter a sexual network in which the change rate of sex partners is high enough to allow the STD to establish itself and grow within a subpopulation. With a consistent increase of infected individuals, the disease eventually expands to other subpopulations via bridges: people who have sexual contact with more than one subpopulation.

Figure 15-2. Wasserheit and Aral’s dynamic topology, adapted from Wasserheit and Aral (1996).

When the STD starts to spread simultaneously in various subpopulations, this is described as a hyperendemic phase. At this point, healthcare workers would begin to notice the disease, initiate an investigation, and develop intervention programs and curative therapies. If these measures were effective, the number of incidents would gradually decrease, thereby transitioning to a decline phase. The STD eventually would arrive at an endemic phase and reside in a marginalized subpopulation where the number of sexual partners may be high but contact with healthcare systems is restricted or minimal (Wasserheit and Aral, 1996).

According to Wasserheit and Aral’s topology, Potterat et al. (2002a) suggested that the structure of sexual contact networks is more accurate than secular trend data for indicating epidemic phases. To prove their concept, they constructed a sexual contact network of chlamydial patients in Colorado Springs from 1996 to 1999. They found that while the number of reported cases increased by 55% during this period of time, the network was relatively fragmented and lacked cyclic structures in comparison with an outbreak contact network. These circumstances indicated that the chlamydial transmission was in either a stable or a declining phase. Cunningham et al. (2004) further examined the structural characteristics of a contact network associated with epidemic phases. They compared the structures of two contact networks which respectively represented the periods during and after an epidemic. They reported that after the epidemic, the overall network centrality declined but the component density increased. This finding is consistent with Wasserheit and Aral’s topology that in the decline phase the disease would be restrained in sexual networks that have intensive sexual exchange but limited access to the healthcare system.

3.3 From STDs to Tuberculosis

Before the year 2000, SNA studies for disease outbreaks all emerged from the study of STDs. One reason for this could have been the availability of contact tracing data. Compared to other infectious diseases, such as influenza, STDs are heavily dependent on personal connections for transmission and hence can be controlled by contact tracing and taking appropriate intervention actions. Another reason may be related to the capability of network presentation. Since SNA was originally developed to study social phenomena via person-to-person linkage, its network presentation is inherently used to portray the relationships between people and contains only individuals as actors in the network graph. This kind of presentation may be sufficient for STDs but is not sophisticated enough to describe the scenarios of indirect-contact or airborne transmission. Klovdahl et al. (2001) addressed this limitation with their investigation of a tuberculosis (TB) outbreak in Houston, Texas. They first used the conventional presentation of SNA and constructed a person-to-person contact network to analyze the outbreak. However, only 12 personal contacts were identified among the 29 patients. Through further collaboration with local healthcare workers, they found that geographical contact was more important than personal contact in understanding the outbreak. By including places such as bars and restaurants in their contact network, they were finally able to connect those 29 patients directly or indirectly in a network (Klovdahl et al., 2001).

Since then, several outbreak studies have adopted the same approach of incorporating geographical contacts into SNA (Abernethy, 2005; Andre et al., 2007; De et al., 2004; McElroy et al., 2003). McElroy et al. (2003) included clubs as nodes in their networks and showed the potential connections among 17 TB patients between 1994 and 2001 in Wichita, Kansas. De et al. (2004) also found a positive relationship between attendance at a motel bar and a gonorrhea infection in Alberta, Canada, in 1999 and used a contact network with the motel bar to demonstrate this connection. Based on these studies, many researchers believe that it is important to examine the social context of disease transmission in a contact network. Geographical locations are places of aggregation and create opportunities for social interaction. Including geographical locations in contact networks can not only help to reveal potential places for indirect or casual transmission contact, but can also help to identify the social context that groups people and facilitates pathogen transfer.

3.4 Summary of SNA Studies in Epidemiology

Table 15-3 summarizes several SNA epidemiology studies in chronological order. Although Klovdahl’s conceptual paper was published in 1985, the application of SNA in STD investigation did not start until the Colorado Springs study in 1994. Through the Colorado Springs study, SNA not only empirically demonstrated its ability to support contact tracing but also examined structural evolution of contact networks. Since then, STD with sexual contact has been the focus of analysis. In 2001, SNA was further applied to TB. Including geographical contact in the contact network was proposed to demonstrate airborne and casual contact transmission in public places. Because of the rich insights it provides, the inclusion of geographical contacts gradually became a standard practice for both TB and STDs to show the potential connection of patients via their daily activities.

Table 15-3. Summary of SNA studies in epidemiology.

Nonetheless, SNA has limitations, like any other analytical tool. First, the accuracy of analysis depends on the quality of contact tracing data (Blanchard, 2002; Ghani et al., 1997). If contact tracing is not well executed and some key patients are not identified, a constructed contact network could be fragmented and fail to present a complete picture of transmission scenarios. All the analyses based on the contact network consequently could be misleading. Second, the qualitative visualization and quantitative measures of SNA are just tools for disease investigators to explore the phenomenon. To understand an outbreak with SNA, the investigators still need to consider many factors, including environmental and social contexts, patient demographics, and disease pathogen characteristics. In addition, they need to interpret those data with their own domain expertise and insights (Rothenberg and Narramore, 1996).

4 A Case Study: The SARS Outbreak in Taiwan

For our case study, we investigated the role of geographical contacts in disease analysis. In this section, we first review the Taiwan SARS outbreak of 2003 and introduce its contact tracing dataset. Then we present the two analyses, connectivity and topology analyses, used in our investigation.

4.1 Taiwan SARS Outbreak and Contact Tracing Dataset

SARS is an infectious disease caused by a novel coronavirus named SARS-associated coronavirus (SARS-CoV) (CDC, 2003; Lipsitch et al., 2003). Its first human case was identified in Guangdong Province, China, on November 16, 2002 (Chu et al., 2005). In February 2003, a medical doctor from Guangdong Province went to Hong Kong and infected at least 17 other guests during his stay at a hotel, initiating a global epidemic of SARS (Donnelly et al., 2003; Peiris et al., 2003). The epidemic ended in July 2003, with more than 24 countries reporting suspected or probable cases, including Canada, Singapore, and Taiwan.

SARS caused great public health concern because of its rapid international spread, high case fatality rate, and unusual nosocomial infection. The majority of SARS patients were infected in healthcare and hospital settings (Peiris et al., 2003). SARS is highly contagious and transmitted primarily via close personal contact, through exposure to infectious respiratory droplets or body fluids. Some studies have also suggested that SARS may be transmitted via indirect contact based on infection incidents in transportation vehicles, hospitals, or communities (Chen et al., 2004; Peiris et al., 2003; Yu et al., 2004).

In Taiwan, a series of hospital outbreaks caused the number of SARS cases to increase dramatically to over 300 between April and June 2003 (Chu et al., 2005). The outbreaks started when a municipal hospital in Taipei received a SARS patient without a known source of infection in the middle of April. A week after her admission, several healthcare workers gradually developed symptoms. The hospital was reported as having a hospital outbreak on April 22 and closed on April 24. Seven hospitals subsequently reported incidents of nosocomial infection and some suspended their emergency room operations, including a teaching hospital in Taipei. This series of outbreaks was suspected to have been triggered by inter-hospital transfer and the movement of SARS patients (Chu et al., 2005). On July 5, 2003, Taiwan was officially removed from the World Health Organization (WHO) list of SARS-affected areas.

The Taiwan SARS data was collected by the Graduate Institute of Epidemiology at National Taiwan University during the SARS period. It contains the contact tracing records of 961 suspected and confirmed SARS patients in Taiwan and their treatment histories. The records fall into two main categories: personal and geographical contacts. The personal contacts are recognized interactions with known SARS patients in household, workplace, and hospital settings. The geographical contacts include visits to high-risk areas of infection, such as SARS-affected countries and hospitals. Table 15-4 summarizes the numbers of records and patients involved in each type of contact. It should be noted that a patient may have multiple records within a type and across types of contacts.

Table 15-4. Summary of the Taiwan SARS dataset.

4.2 Contact Network Construction

In order to present both personal and geographical contacts at one time, we adopted a two-mode network approach to construct a SARS contact network. This kind of approach has been taken in several studies, such as the Houston tuberculosis study by Klovdahl et al. (2001) and the Alberta gonorrhea study by De et al. (2004). The network contains two types of nodes, patients and geographical locations. We linked two patient nodes with an edge if they were family members or had an identified interaction. We connected a patient node to a location node, such as a hospital or foreign country, if the patient had been there during the SARS period. The construction of a contact network is demonstrated in Figure 15-3.

Figure 15-3. Example of contact network construction.
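The construction rule described above (patient-to-patient edges for personal contacts, patient-to-location edges for geographical contacts) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the patient identifiers, and the location names are all hypothetical and do not come from the Taiwan SARS dataset, and location names are assumed never to collide with patient identifiers.

```python
# Hypothetical contact records (illustrative only, not real SARS data).
PERSONAL = [("P1", "P2"), ("P3", "P4")]          # patient-to-patient
GEOGRAPHICAL = [("P2", "Hospital H1"),           # patient-to-location
                ("P3", "Hospital H1"),
                ("P5", "Country X")]


def build_two_mode(personal, geographical):
    """Return (kind, adj): node types and an undirected adjacency map."""
    kind = {}  # node -> "patient" or "location"
    adj = {}   # node -> set of neighbouring nodes

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for a, b in personal:
        kind[a] = kind[b] = "patient"
        link(a, b)
    for patient, place in geographical:
        kind[patient] = "patient"
        kind[place] = "location"
        link(patient, place)
    return kind, adj


kind, adj = build_two_mode(PERSONAL, GEOGRAPHICAL)
patients = sorted(n for n, k in kind.items() if k == "patient")
locations = sorted(n for n, k in kind.items() if k == "location")
print(patients)
print(locations)
print(sorted(adj["Hospital H1"]))
```

Keeping the node type in a separate map makes it straightforward to lay out or filter patients and locations differently later on.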

4.3 Connectivity Analysis

Connectivity is the degree to which a type of contact can link individual patients in a network; it can be measured by the number of components. To understand how SARS spread, connectivity analysis can show the relative importance of geographical contacts based on their ability to connect patients. If a type of contact has relatively high connectivity, adding it to the network should reduce the number of components well below the total number of patient nodes. The types of contacts we investigated in this analysis are listed in Table 15-5.

Table 15-5. Types of contacts in the investigation.

Table 15-6 shows our results for the two main categories of contacts. After applying all available records, we can reduce the number of components in the network from 961 to 10. If we use the personal contacts alone for construction, the number of components decreases to 847 and the network is too sparse to get a comprehensive picture of how SARS spread in those patients. In contrast, the geographical contacts reduce the number of components to 82. This suggests that the majority of patients had been to the same place or places before the onset of their symptoms, indicating that knowing and analyzing the geographical contacts is important for understanding this outbreak.

Table 15-6. Results of connectivity analysis for main categories.
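The connectivity comparison can be sketched as a component count over the patients, using only the selected types of contact to join them. The records, contact type names, and function below are hypothetical, and the sketch assumes that components are counted over patient nodes, with location nodes acting only as connectors; the chapter does not spell out this detail.

```python
# Hypothetical records: (contact_type, patient, other_end). Locations are
# tagged tuples so they cannot collide with patient identifiers.
H1, H2 = ("loc", "Hospital H1"), ("loc", "Hospital H2")
PATIENTS = ["P1", "P2", "P3", "P4", "P5", "P6"]
CONTACTS = [
    ("household", "P1", "P2"),
    ("workplace", "P5", "P6"),
    ("hospital_visit", "P2", H1),
    ("hospital_visit", "P3", H1),
    ("hospital_visit", "P4", H1),
    ("hospital_visit", "P6", H1),
    ("hospital_visit", "P5", H2),
]
PERSONAL_TYPES = {"household", "workplace"}
GEOGRAPHICAL_TYPES = {"hospital_visit"}


def patient_components(patients, contacts, types):
    """Count groups of patients joined by contacts of the given types."""
    parent = {p: p for p in patients}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for contact_type, a, b in contacts:
        if contact_type not in types:
            continue
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(p) for p in patients})


print(patient_components(PATIENTS, CONTACTS, set()))               # 6
print(patient_components(PATIENTS, CONTACTS, PERSONAL_TYPES))      # 4
print(patient_components(PATIENTS, CONTACTS, GEOGRAPHICAL_TYPES))  # 3
print(patient_components(PATIENTS, CONTACTS,
                         PERSONAL_TYPES | GEOGRAPHICAL_TYPES))     # 1
```

A larger drop from the starting count (one component per patient) indicates higher connectivity for that type of contact.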

We further examined the connectivity of each type of contact, with Table 15-7 showing the results. Hospital-related contacts are the top 3 contacts in connectivity, consistent with the fact that SARS patients were primarily infected in the hospital setting.

Table 15-7. Connectivity analysis of the nine types of contacts.

4.4 Topology Analysis

A traditional social network, or one-mode network, consists of only one set of nodes and describes person-to-person relationships. A two-mode network, on the other hand, has the ability to portray micro and macro relations simultaneously. In topology analysis, the goal is to investigate the value of a two-mode contact network for deducing potential disease pathways.

Since a two-mode network contains two sets of nodes in different layers, personal and geographical, it emphasizes the relationships between patients and their visits to high-risk locations. Figure 15-4 shows the large number of patients who have had contact with hospitals with outbreaks of nosocomial infection, such as Heping Hospital; the nodes representing patients surround each hospital. Through patients’ visits and admissions, there are unusually complex linkages formed among the hospitals. These linkages may explain the series of hospital outbreaks in Taiwan.

Figure 15-4. Two-mode SARS contact network.

Since a one-mode network consists of only patient nodes, we have to degrade geographical relations to person-to-person ones. To do this, we connect two patients if they have been to the same geographical location. Figure 15-5 shows the transformed one-mode network. Generally, geographical contacts are collected to indicate potential occasions for infection when personal contacts are not traceable. After degrading, the linkage among patients was unnecessarily amplified to such a degree that meaningful patterns from the contact network could no longer be identified. In contrast, a two-mode contact network preserves important clues about the outbreaks from both person-to-person and person-to-location relations, even when hundreds of patients are involved in the graph.

Figure 15-5. One-mode SARS contact network.
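The degrading step, and the amplification of linkage it causes, can be illustrated with a short sketch. The visit records below are hypothetical. The point is only that a location visited by k patients contributes k visit edges to the two-mode network but k(k − 1)/2 patient-to-patient edges after projection.

```python
from itertools import combinations

# Hypothetical visit records: (patient, location). Not real SARS data.
VISITS = [("P1", "H1"), ("P2", "H1"), ("P3", "H1"), ("P4", "H1"),
          ("P5", "H1"), ("P5", "H2"), ("P6", "H2")]


def project_to_one_mode(visits):
    """Link every pair of patients who visited the same location."""
    by_place = {}
    for patient, place in visits:
        by_place.setdefault(place, set()).add(patient)
    edges = set()
    for patients in by_place.values():
        # Sorting gives each pair one canonical (a, b) orientation.
        for a, b in combinations(sorted(patients), 2):
            edges.add((a, b))
    return edges


one_mode_edges = project_to_one_mode(VISITS)
print(len(VISITS))          # 7 patient-to-location edges
print(len(one_mode_edges))  # 11 patient-to-patient edges
```

With hundreds of patients around a single hospital, the quadratic growth in projected edges is what obscures patterns in the one-mode layout.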

The two-mode network stresses person-to-location relationships and presents patients as clusters around high-risk areas. In this type of layout, patients acting as bridges among major clusters are easily seen and identified. Figure 15-6 shows the potential bridges among the major hospitals with nosocomial infection.

Figure 15-6. Potential bridges among hospitals and households.

When investigating a hospital outbreak, including geographical contacts in the network is also useful for seeing possible disease transmission scenarios. Figure 15-7 demonstrates the evolution of a small contact network at Heping Hospital through the onset dates of symptoms. On April 16, Mr. L., a laundry worker in Heping Hospital, had a fever and was reported as a suspected SARS patient. On April 16 and 17, Nurse C took care of Mr. L. On April 21, Ms. N, another laundry worker, and Nurse C began to have symptoms. On April 24, Heping Hospital was reported to have a hospital outbreak. On May 1, Nurse C’s daughter had a fever. From the evolution of the network, development of the hospital outbreak can be readily discerned.

Figure 15-7. Example of network evolution through the onset dates of symptoms.
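The idea of viewing a contact network through onset dates can be sketched as a series of snapshots. The onset dates below are taken from the narrative above; the edge list is an assumption about how those individuals might be linked (work at the hospital, patient care, household) and is not a transcription of Figure 15-7. The function name is illustrative.

```python
from datetime import date

# Onset dates as described in the text (year 2003).
ONSET = {
    "Mr. L": date(2003, 4, 16),
    "Ms. N": date(2003, 4, 21),
    "Nurse C": date(2003, 4, 21),
    "Nurse C's daughter": date(2003, 5, 1),
}
LOCATION = "Heping Hospital"
# Assumed edges (not taken from the actual figure).
EDGES = [
    ("Mr. L", LOCATION),               # laundry worker at the hospital
    ("Ms. N", LOCATION),               # laundry worker at the hospital
    ("Nurse C", LOCATION),             # nurse at the hospital
    ("Nurse C", "Mr. L"),              # Nurse C took care of Mr. L
    ("Nurse C's daughter", "Nurse C"), # household contact
]


def snapshot(as_of):
    """Patients with onset on or before `as_of`, plus the edges among them."""
    visible = {person for person, onset in ONSET.items() if onset <= as_of}
    nodes = visible | {LOCATION}
    edges = [(u, v) for u, v in EDGES if u in nodes and v in nodes]
    return visible, edges


for day in (date(2003, 4, 16), date(2003, 4, 21), date(2003, 5, 1)):
    patients, edges = snapshot(day)
    print(day.isoformat(), sorted(patients), len(edges))
```

Stepping through the dates shows the network growing outward from the hospital to a household, which is the kind of evolution the figure is meant to convey.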

5 Conclusions

SNA has been demonstrated to be a good supplemental tool for contact tracing investigations. Compared to the traditional process of reviewing contact records one by one, SNA provides healthcare workers with a more efficient method of integrating and visualizing the relevant records in a contact network to discern potential linkages among patients, thus revealing disease pathways. Network measures, especially centrality measures, enable investigators to examine the context of transmission and develop effective intervention programs by identifying important individuals who may cause or exacerbate an outbreak. In addition, some studies have used SNA to study the dynamics of disease transmission, suggesting that the structure of a contact network is a more accurate indicator of epidemic phases than the traditional secular trend data.

Incorporating geographical contact information in SNA allows disease investigators to analyze infectious diseases other than STDs. While personal contact provides direct evidence for the causality of infection, geographical contact captures the factors of human aggregation in disease transmission and provides potential leads to indirect or casual infection. In our case study, the role of a type of contact in disease transmission can be potentially identified by its ability to join patients together. Including geographical locations can significantly aid in establishing linkages among patients. Because these locations can play an important role in facilitating the transfer of pathogens, they require the attention of epidemiologists and other investigators of infectious disease.

6 Acknowledgements

This work is supported by the National Science Foundation Information Technology Research Program, ITR, through Grant # IIS-0428241.

7 Questions for Discussion

  1. Contact tracing is an important control measure in the fight against an infectious disease. If you want to use contact tracing to control a developing outbreak, what kinds of data will you collect during the interview with confirmed patients? Discuss the question from two perspectives: disease control and outbreak analysis.

  2. A contact network depicts the potential pathways of disease propagation among patients. Discuss the strengths and weaknesses of a contact network in outbreak investigations.

  3. Assume that you have a set of STD contact tracing data. It includes patients’ sexual contacts, patronized bars and motels, and demographic information, such as patients’ residency, gender, age, occupation, and income level. Discuss the kinds of analysis that can potentially be performed with this dataset and list your steps to investigate them using SNA.

  4. Geographical contact information provides additional insights but can also create some problems when you include it in your disease analysis. Discuss the downsides of including geographical contacts in disease analysis and ways to reduce or eliminate them.