DATA, DATA, AND MORE DATA

  • Ana Pastore y Piontti
  • Nicola Perra
  • Luca Rossi
  • Nicole Samay
  • Alessandro Vespignani

Given the decades of success achieved in weather forecasts and the phenomenal results in the computational modeling of new material properties, it is natural to wonder why we are at a very different stage in the quantitative forecasting of the next pandemic or the progression of seasonal influenza. The main difference is that spreading and contagion models start with assumptions about the behavior of humans and society instead of the physical laws governing fluid and gas masses. In other words, while it is possible to produce a complete temperature analysis of the sea surface and satellite images of atmospheric turbulence, we do not yet have large-scale knowledge of commuting patterns or precise maps of contact among individuals.

In recent years, however, tremendous progress in data collection and information technology has been lifting the limits we have faced for decades in gathering quantitative social, demographic, and behavioral data. Improved techniques and methodologies support the interlinkage and integration of digitized datasets with geocoding information and with economic and transportation databases.

Data-driven epidemic modeling relies on a multitude of datasets, ranging from population movement records to social and behavioral data, as well as census and health-related information. These data are the bricks that build the pillars of any computational approach to the analysis of infectious diseases. This chapter illustrates the extraordinarily high-quality datasets that can be leveraged to build those pillars.

POPULATION

The geographical distribution of the population is the first key ingredient of any epidemiological model. To model the spread of an infectious disease that is transmitted from person to person, it is crucial to have detailed data on where individuals live. Creating high-quality maps of the human population at the worldwide scale is in itself a scientific challenge. Different projects have tackled this problem using advanced techniques that merge different datasets, ranging from routinely collected census data to satellite imagery analysis. Oak Ridge National Laboratory’s LandScan project, Columbia University’s Gridded Population of the World, and the WorldPop project all make worldwide maps available at different granularities and resolutions. More precisely, in these works the surface of the world is divided into a grid of cells that can have different resolution levels, and each cell is assigned an estimated population value. Data from the Gridded Population of the World project are shown in FIGURE 2.1. The bright regions are associated with highly populated locations. Not surprisingly, China, India, and Mexico are extremely visible. These maps make clear how heterogeneously the population is distributed around the globe. Australia and China are two extreme cases. The first country has just a few highly populated regions. The second, instead, is the most populated country in the world, and its population is spread across a large part of the country. Historical, geographical, environmental, and political factors are at the root of these heterogeneities, which play an important role in the geospatial diffusion of infectious diseases.
Figure 2.1 |

Mapping the population of the world

Global population distribution from the Gridded Population of the World project.

Source: Columbia University’s Gridded Population of the World
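
The gridded representation described above can be sketched in a few lines. This is a minimal illustration, not the data format of any of the projects named; the cell values and the `coarsen` helper are invented for the example. It shows how a fine grid of population cells could be aggregated to a coarser resolution while conserving the total population.

```python
def coarsen(grid, factor):
    """Aggregate a 2D population grid into blocks of factor x factor cells.

    Each coarse cell holds the sum of the fine cells it covers, so the
    total population is conserved. Assumes the grid dimensions are
    multiples of `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block_sum = sum(
                grid[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            )
            coarse_row.append(block_sum)
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 grid: estimated residents per cell (invented values).
fine_grid = [
    [0, 10, 200, 350],
    [5, 20, 400, 900],
    [0, 0, 30, 60],
    [0, 1, 15, 25],
]

coarse_grid = coarsen(fine_grid, 2)
```

Summing within blocks, rather than averaging, keeps the population count meaningful at every resolution.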

THE AGE STRUCTURE OF THE WORLD POPULATION

The accurate characterization of the structure of social contacts in mathematical and computational models of infectious disease transmission is another key element in the assessment of the impact of epidemic outbreaks and in the evaluation of effective control measures. For instance, the transmissibility potential of a disease and the final epidemic size depend on mixing patterns between individuals in the population, which in turn depend on sociodemographic parameters. A critical feature affecting disease spread is the age structure of the population of each country. Furthermore, individuals of different ages interact differently according to household size, the fraction of workers and students in the population, etc. All of these indicators vary considerably across the world. Here, we consider 10 age brackets: 0–4, 5–9, 10–14, 15–19, 20–29, 30–39, 40–49, 50–59, 60–69, and 70+ years old. More developed countries typically have larger numbers of elderly people and smaller numbers of children and young adults, while the picture reverses for developing countries, where we observe a larger proportion of children than of the elderly. This is clear from inspecting the two maps showing Europe and Africa (FIGURE 2.2). Indeed, the density of individuals over 70 is visibly much higher in Europe than in Africa; conversely, the density of children is evidently higher in Africa than in Europe.
Figure 2.2 |

Age structure heterogeneity

Population distribution for the age bracket 0–4 (left) and 70+ (right), in Europe, Middle East, and Africa.

Source: Columbia University’s Gridded Population of the World
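
As a rough sketch of how the 10 age brackets listed above could be applied to individual-level data, the snippet below maps ages to bracket labels and computes the fraction of a sample in each bracket. The helper names and the sample ages are hypothetical; only the bracket boundaries come from the text.

```python
from bisect import bisect_right

# Lower bounds of the 10 age brackets used in the text.
BRACKET_STARTS = [0, 5, 10, 15, 20, 30, 40, 50, 60, 70]
BRACKET_LABELS = [
    "0-4", "5-9", "10-14", "15-19", "20-29",
    "30-39", "40-49", "50-59", "60-69", "70+",
]


def age_bracket(age):
    """Return the label of the bracket containing `age` (in whole years)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return BRACKET_LABELS[bisect_right(BRACKET_STARTS, age) - 1]


def age_distribution(ages):
    """Return the fraction of individuals falling in each bracket."""
    if not ages:
        raise ValueError("need at least one age")
    counts = {label: 0 for label in BRACKET_LABELS}
    for age in ages:
        counts[age_bracket(age)] += 1
    return {label: counts[label] / len(ages) for label in BRACKET_LABELS}


# Hypothetical sample of ages (invented, for illustration only).
sample_ages = [2, 7, 7, 19, 20, 34, 45, 61, 70, 88]
distribution = age_distribution(sample_ages)
```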

HUMAN MOBILITY

As we go about our daily lives, we come in contact with individuals who can carry viruses, bacteria, and other pathogens, several of which are capable of causing disease and infection. Whenever the conditions are favorable, they may be able to infect us, turning us into a new vehicle for the spreading of the disease. It is not surprising, then, that the geographical diffusion of pathogens is intrinsically intertwined with human mobility. Every time a person moves from home to work, or travels to another city or country, an opportunity arises for the pathogen to spread to a new population of potential hosts.

As technology evolves, so do our traveling habits: automobiles, trains, and airplanes help to shorten physical and temporal distances, for humans and for pathogens.

Long-Range Mobility

Over the course of 100 years, flying went from an occupation of the eccentric, to a luxury of the wealthy and finally to a necessity in the life of millions of passengers every year.

This shift was brought about by the increasing affordability and convenience of air transportation. Once-distant cities are now just a few hours apart, whether they are separated by hundreds or thousands of miles. This worldwide network evolved organically during the last century through the decentralized decisions of hundreds of different companies and individuals. As a result, its global structure is extremely complex, with many nontrivial properties (FIGURE 2.3). According to the Official Airline Guide (OAG), the global airline network includes over 4,000 airports connected by tens of thousands of direct flights. Chaining different flights through a major hub creates connections between airports that are not directly connected. Cities like New York, Atlanta, Frankfurt, or London have large airports (or even multiple airports in close proximity) that serve as the hubs connecting destinations that do not have direct flights. The top 10 cities by airline traffic and number of connections in 2012 are listed in FIGURE 2.4.
Figure 2.3 |

Global long-range mobility network

Each node represents a major transportation hub; its size is proportional to the population served. Each link represents a direct flight connection; its width is proportional to the number of available seats.

Source: Official Airline Guide

Figure 2.4 |

Top cities by airline traffic and number of connections

List of the top 10 cities by the number of connections (left) and available seats per year (right); data refers to 2012.

Source: Official Airline Guide
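
A ranking like the one in FIGURE 2.4 could, in principle, be derived from a list of direct flights. The sketch below is only an illustration under assumed inputs: the flight records and seat counts are invented, not OAG data, and `rank_cities` is a hypothetical helper. It counts, for each city, the number of distinct direct connections and the total seats on links touching it.

```python
from collections import defaultdict

# Hypothetical direct-flight records: (city_a, city_b, seats_per_year).
# City pairs and seat counts are invented for illustration.
direct_flights = [
    ("New York", "London", 900),
    ("New York", "Atlanta", 700),
    ("New York", "Frankfurt", 400),
    ("London", "Frankfurt", 600),
    ("Atlanta", "Miami", 300),
]


def rank_cities(flights, top=3):
    """Rank cities by number of direct connections and by available seats."""
    neighbors = defaultdict(set)
    seats = defaultdict(int)
    for a, b, n_seats in flights:
        # Treat each link as undirected: it counts for both endpoints.
        neighbors[a].add(b)
        neighbors[b].add(a)
        seats[a] += n_seats
        seats[b] += n_seats
    # Ties are broken alphabetically so the ranking is deterministic.
    by_connections = sorted(neighbors, key=lambda c: (-len(neighbors[c]), c))
    by_seats = sorted(seats, key=lambda c: (-seats[c], c))
    return by_connections[:top], by_seats[:top]


top_by_connections, top_by_seats = rank_cities(direct_flights)
```

The two rankings need not coincide, which is why the figure reports them side by side.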

Airline data also include the so-called origin-destination flows between airports. Origin-destination datasets report the actual number of travelers between airports, regardless of intermediate stops or connecting flights. The origin-destination flow network has many more links than the airline network, which only considers nonstop connections. This network accounts for the number of individuals traveling from place to place, thus providing a more accurate picture of the dispersion of individuals potentially carrying diseases.
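
The distinction between the two kinds of airline data can be made concrete with a toy example. In the sketch below, the itineraries and passenger counts are invented, and `build_networks` is a hypothetical helper: it derives both the passenger load on each nonstop leg and the origin-destination flow, which ignores intermediate stops.

```python
from collections import Counter

# Hypothetical itineraries: (sequence of airports visited, passengers).
# Airport codes are used as plain labels; the counts are invented.
itineraries = [
    (["BOS", "JFK", "NRT"], 120),   # Boston to Tokyo via New York
    (["BOS", "JFK"], 300),          # nonstop trip
    (["JFK", "NRT"], 500),          # nonstop trip
    (["ATL", "JFK", "LHR"], 80),    # Atlanta to London via New York
]


def build_networks(trips):
    """Return (segment_load, od_flow) as Counters keyed by airport pairs.

    segment_load counts passengers on each nonstop leg flown.
    od_flow counts passengers between the first and last airport of a
    trip, regardless of intermediate stops.
    """
    segment_load = Counter()
    od_flow = Counter()
    for airports, passengers in trips:
        for leg in zip(airports, airports[1:]):
            segment_load[leg] += passengers
        od_flow[(airports[0], airports[-1])] += passengers
    return segment_load, od_flow


segment_load, od_flow = build_networks(itineraries)

# Origin-destination pairs with no nonstop link between them.
od_only_links = sorted(set(od_flow) - set(segment_load))
```

Here the pairs in `od_only_links` exist only in the origin-destination network, which is the sense in which that network carries links the nonstop network lacks.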

Short-Range Mobility

Trains, public transportation, and personal automobiles have impacted short-distance travel in much the same way that accessible airline transportation has affected long-distance spreading patterns. Now, more than ever, we can easily cover tens of miles just to go from home to the office or school and return home at the end of the day. In doing so, we greatly increase the number of people we come in contact with daily and with it the opportunities for a disease to spread.

The cyclical nature of our commuting patterns tightly couples neighboring cities that lie within a few hours of each other. In this way, infections that first arrive in a city through airline connections are quickly diffused and spread locally. Such coupling is so evident that, by simply plotting the commuting patterns between neighboring cities, one is able to quickly identify the major metropolitan areas, even in the absence of any other information, as these naturally generate stronger flows. In FIGURE 2.5, we show two examples: the United States and Italy. In the maps, each node is a municipality and the weighted links between them are the flows of commuters. In the United States, commuting is more concentrated on the East Coast, where the density of municipalities is higher. Not surprisingly, large urban areas such as New York City, Washington D.C., Atlanta, San Francisco, and Seattle are clearly visible and contain the majority of traffic. The same behavior is observed in Italy, where Milan and Turin (north) and Rome and Naples (center and south) are the most densely connected areas.
Figure 2.5 |

Short-range mobility network in continental United States and Italy

Each node represents the center of a population cell. Each link represents a flow of commuters connecting two nodes. The width of each link is proportional to the traffic on that connection.

MOBILITY PATTERNS AND EPIDEMIC SPREADING

It is interesting, and extremely relevant in understanding the spread of infectious diseases, to notice the differences between the topologies of long-range and short-range mobility networks. Indeed, in the air transportation network, only the nodes are embedded in the earth’s surface; links are not. They cross each other, connecting a node with several others, possibly at any distance. In the commuting network, by contrast, links are also embedded in the earth’s surface: a node is only connected to its geographically closest points. Understanding the differences between the two topologies is crucial for the characterization of the spreading of diseases. While through air transportation a disease can spread directly over very long distances, in the case of commuting it is constrained to move in space from one node to the next closest one, so that the epidemic spreads like a wave in water. These differences explain why pandemics behave differently in modern times than they did before mass air transportation. FIGURE 2.6 illustrates the differences in the case of New York: in orange are the connections to the nearest neighbors in the air transportation network and (inset) in the commuting network. The dissimilarities between the two are extremely clear. By using the flight network, an infected individual can move thousands of miles in less than one day, reaching, for example, Japan, while an individual who commutes is limited to distances on the order of a hundred miles.
Figure 2.6 |

Direct flight connections to and from New York

The locations reachable from New York via one single flight. Inset: the cells within a circle of 100 miles from New York connected by short-range mobility.
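
The difference in spatial reach between the two networks can be quantified with great-circle distances. The following is a minimal sketch using the haversine formula; the coordinates are approximate city-center values, and the choice of Newark as an example of a nearby commuting destination is our own assumption, not taken from the figure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-center coordinates (latitude, longitude in degrees).
new_york = (40.71, -74.01)
tokyo = (35.68, 139.69)
newark = (40.74, -74.17)  # assumed example of a nearby commuting destination

flight_link_km = great_circle_km(*new_york, *tokyo)
commute_link_km = great_circle_km(*new_york, *newark)
```

With these inputs, the flight link spans roughly ten thousand kilometers, while the commuting link stays well inside the 100-mile (about 160 km) circle shown in the inset.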

DISEASES

Needless to say, knowledge of the pathogenic agent itself is a critical input in the analysis of infectious disease spreading. No single model fits all diseases. Their basic features and properties have critical effects on the spreading process, and each model must be tailored to the particular illness under study. In this book, we focus on a particular subset of infectious diseases, which are caused by viruses and are transmitted through human-to-human interactions. We explicitly consider influenza viruses, coronaviruses, and filoviruses. A different virus characterizes each of these diseases. Their biological features define the natural history, mortality, and transmissibility of the disease. In the following sections, we provide a basic description of each of these diseases, summarizing their biological characteristics and historical spreading.

DISEASES: INFLUENZA

Every year, the flu spreads across countries during the winter season, usually in mild forms, nevertheless killing between 250,000 and 500,000 individuals worldwide. The biological structure of influenza viruses usually changes little from year to year, leaving the majority of people with immunity acquired from previously circulating strains. Occasionally, however, the biological structure changes enough that a new strain emerges, against which the population has no immunity. This might happen through different mechanisms, often involving flu viruses circulating among birds or pigs. Pandemics occur when a new flu virus is able to spread from person to person in an efficient and sustained way, quickly affecting a large fraction of the population that has no immunity to the disease. On average, pandemics take place a few times every century, with reports of flu pandemics seeming to start in ancient Greece more than 2,500 years ago.¹ In FIGURE 2.7, we show the largest influenza pandemics since 1900. As is clear from the number of deaths, influenza viruses are a serious threat. The first influenza virus was isolated in the laboratory in the early 1930s. Since then, our knowledge of its biological structure and dynamics has greatly increased. FIGURE 2.8 shows the geographical impact of the latest flu pandemic, that of 2009.
Figure 2.7 |

Largest flu pandemics

Death tolls of the largest influenza pandemics since 1900

Source: Centers for Disease Control and Prevention

Figure 2.8 |

Geographical impact of influenza A (H1N1)

The number of influenza cases in each country around the world with circles indicating the number of deaths due to it.

Source: FluNet, World Health Organization

VIRUS STRUCTURE

There are three types of influenza viruses that affect humans: type A, B, and C. The first type is the most widespread and is responsible for the majority of epidemics and pandemics in our collective history. Types B and C are rarer and associated with localized outbreaks. Influenza A viruses are further divided into subtypes based on their biological structure. They are made of eight single-stranded RNA segments with two major glycoproteins on the surface: hemagglutinin (HA) and neuraminidase (NA). There are at least 16 different HA and 9 different NA. The predominant subtypes found in humans are H1, H2, H3, and N1, N2. The combination of HA and NA defines the subtype of the virus. In recent history, we have been subjected to H1N1, H2N2, and H3N2 pandemics. Each of these subtypes can be further divided into different strains defined by the specific combination of genes and proteins.
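
As a small worked example of the naming scheme, the counts quoted above (16 HA and 9 NA) allow 16 × 9 = 144 distinct HxNy labels. The sketch below simply enumerates those labels; it makes no claim about which combinations are actually observed in nature.

```python
from itertools import product

N_HA = 16  # hemagglutinin variants mentioned in the text ("at least 16")
N_NA = 9   # neuraminidase variants mentioned in the text

# Every HxNy label that can be formed from the counts above.
subtypes = [
    f"H{h}N{n}"
    for h, n in product(range(1, N_HA + 1), range(1, N_NA + 1))
]

# Subtypes the text associates with recent pandemics.
pandemic_subtypes = {"H1N1", "H2N2", "H3N2"}
```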

VIRUS TRANSMISSION

The virulence and fatality rate change from strain to strain. Influenza viruses affect animals as well and can sometimes be transferred from animals to humans. In particular, viruses from swine (H1 and H3), ducks (H7), and chickens (H5 and H9) have been shown to infect humans. There are three mechanisms behind the diffusion of influenza viruses: droplet, airborne, and contact transmission. In the first case, an infected individual coughs or sneezes, diffusing large droplets that reach the conjunctival or mucous membranes of a susceptible person. In the second case, there is no direct contact between the droplets and the susceptible person; the droplets can be vaporized in the air and become breathable by susceptible individuals. In the last case, the virus can be transmitted from person to person through contact with the secretions of an infected individual or through physical contact between individuals. After being in contact with the virus, susceptible individuals might become infected. If this is the case, the virus starts reproducing inside the new host. Typically, the symptoms arise after the first day. Infectiousness increases as the virus reproduces and peaks after 2–3 days on average, after which the viral load starts to decline thanks to the reaction of the immune system.

DISEASES: CORONAVIRUS

Coronaviruses are extremely widespread among humans and, more generally, among mammals and birds. They are the cause of a large percentage of all common colds in human adults. In contrast to influenza, just two human coronaviruses were known for many years. In 2003, a new coronavirus was identified as being responsible for the severe acute respiratory syndrome (SARS) outbreak that spread around the globe. SARS is one of the best examples of a new emerging disease. After AIDS in the early 1980s, this was the first new virus able to reach a global scale. Indeed, it spread to more than 30 countries across 5 continents in 2003. While the number of confirmed cases was limited, around 8,000 in total, the associated mortality was much higher than that of a typical influenza strain. It ranged from 3% to 10%, creating large concerns that led to global efforts for its containment. The map of FIGURE 2.9 highlights the countries affected during this outbreak. In 2012, a new type of coronavirus was discovered in the Middle East, dubbed MERS-CoV. Then, in May 2015, an outbreak of MERS-CoV occurred in the Republic of Korea, one of the largest MERS-CoV outbreaks outside of the Middle East region.²
Figure 2.9 |

Geographical impact of SARS

Total number of confirmed SARS cases in the 2003 outbreak.

Source: World Health Organization

VIRUS STRUCTURE

Coronaviruses belong to the subfamily Coronavirinae in the family Coronaviridae. They are enveloped viruses with a positive-sense, single-stranded RNA genome. Several proteins contribute to the biological structure of coronaviruses, in particular the spike, envelope, membrane, and nucleocapsid proteins.

VIRUS TRANSMISSION

The transmission mechanisms of coronaviruses are typical of influenza-like illnesses (ILI). However, other characteristics are quite different. For example, in the case of SARS, the proportion of asymptomatic infections was relatively small, and maximal infectiousness occurred about 7 days after the onset of symptoms. The virus responsible for SARS is different from all other known coronaviruses. It appears to have originally been an animal virus that crossed to humans. Indeed, the virus has been isolated from civet cats in the Guangdong Province. In this region, there are many markets in which civet cats and other exotic animals are sold. A large fraction of workers in these markets were found to be seropositive for SARS. However, it is not clear if civet cats or other animals are the natural reservoir of the virus in the wild.

DISEASES: FILOVIRUSES

Ebola and Marburg viruses belong to the family of filoviruses and cause severe hemorrhagic fevers in humans and nonhuman primates. The first filovirus was identified in 1967, during an outbreak in Germany and the former Yugoslavia.³ In 1976, the Ebola virus was identified for the first time when two outbreaks occurred, one in Zaire (now the Democratic Republic of the Congo) and the other in Sudan. These outbreaks involved two different species of the virus, named after the two countries. Ebola viruses are highly lethal, with up to 90% (Zaire) and 50% (Sudan) of cases being fatal. In 2014, the world faced the largest outbreak yet of Ebola (Zaire), when Guinea, Liberia, and Sierra Leone were affected, with 28,000+ cases and 11,000+ deaths. FIGURE 2.10 shows a timeline for filoviruses from the first isolation of the virus to the 2014 EVD outbreak in West Africa.
Figure 2.10 |

Ebola virus and Marburg hemorrhagic fever timeline

Source: Centers for Disease Control and Prevention, World Health Organization
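
The case and death counts quoted for the 2014 outbreak allow a back-of-the-envelope estimate of the crude case fatality ratio. The helper below is a hypothetical sketch; since both figures in the text are rounded lower bounds ("28,000+" and "11,000+"), the result of roughly 39% should be read as approximate.

```python
def case_fatality_ratio(deaths, cases):
    """Fraction of reported cases that ended in death."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Rounded figures quoted in the text for the 2014 West Africa outbreak.
# Both are lower bounds, so the ratio is only approximate.
west_africa_cfr = case_fatality_ratio(deaths=11_000, cases=28_000)
```

This crude ratio is consistent with the "up to 90%" figure quoted for the Zaire species, which is an upper bound across outbreaks rather than a typical value.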

VIRUS STRUCTURE

Filoviruses appear in a variety of “threadlike” virions (infectious viral particles) and encode their genome in negative-sense RNA. Common shapes include long, branching filaments; shorter filaments shaped like a “6” or a “U”; and even circles.

VIRUS TRANSMISSION

It is unknown how the virus is transmitted from its natural reservoir to humans. Once a human is infected, ways of transmission include close personal contact with infected individuals or their bodily fluids. Caregivers are at a higher risk of becoming infected due to close contact with the infectious individual. During outbreaks, the isolation of patients and the use of protective clothing and disinfection procedures are crucial to interrupt the transmission of filoviruses. In 2015, the first vaccine was developed to protect against the Ebola virus, and field trials began in West Africa shortly thereafter.

HEALTH INFRASTRUCTURES

Some countries are better prepared than others to deal with the risks associated with epidemic diseases. Different indicators are commonly used to evaluate each country’s capacity to respond: for instance, the number of physicians per capita and the number of hospital beds per 10,000 individuals. It is clear that countries with a good health infrastructure will be better at combating the spreading of an epidemic as well as reducing its impact on the population. Understanding and mapping health infrastructures and other socioeconomic indicators are extremely important in building models that can provide insight into the disease burden across different countries and measure the risk associated with emerging pathogens.

From the map of FIGURE 2.11, it is clear that more developed countries have a larger number of physicians per capita. TABLE 2.1 reports the top ten and bottom ten countries ranked according to physicians per capita and hospital beds per 10,000 individuals. We consider the six WHO regions: Africa (AFRO), Americas (AMRO), Eastern Mediterranean (EMRO), Europe (EURO), South-East Asia (SEARO), and Western Pacific (WPRO). As shown in the rankings, European countries are in many of the top positions. The large majority of African countries are instead in the last positions. Indeed, concerning the density of physicians, nine of the last ten are in the AFRO region.
Figure 2.11 |

Mapping health infrastructures

Geographical distribution of the number of physicians per 10,000 people

Table 2.1 |

Distribution of health infrastructures

List of the top and bottom 10 countries by number of physicians and hospital beds per 10,000 people.

Source: Global Health Observatory data repository, World Health Organization, 2009
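
The indicators in TABLE 2.1 are rates per 10,000 people, which can be computed from absolute counts and population sizes. The sketch below uses entirely invented country names and numbers, and the `per_10k` helper is hypothetical; it only illustrates the normalization and the ranking step.

```python
def per_10k(count, population):
    """Convert an absolute count into a rate per 10,000 inhabitants."""
    return 10_000 * count / population


# Hypothetical countries: (name, physicians, hospital beds, population).
# All names and numbers are invented for illustration.
records = [
    ("Country A", 40_000, 60_000, 10_000_000),
    ("Country B", 1_500, 9_000, 15_000_000),
    ("Country C", 90_000, 150_000, 60_000_000),
    ("Country D", 400, 4_000, 8_000_000),
]

physician_density = {
    name: per_10k(doctors, pop) for name, doctors, _, pop in records
}
bed_density = {name: per_10k(beds, pop) for name, _, beds, pop in records}

# Countries sorted from highest to lowest physician density.
ranking = sorted(physician_density, key=physician_density.get, reverse=True)
```

Normalizing by population matters: in this toy example, Country C has the most physicians in absolute terms but ranks second once population is taken into account.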

The same pattern is observed for the number of beds (FIGURE 2.12). The precise ranking is a bit different, especially for the top ten, but the geographical distributions of the two quantities are well correlated. Indeed, both indicators are correlated with the gross domestic product (GDP) of each country.
Figure 2.12 |

Mapping health infrastructures

Geographical distribution of the number of hospital beds per 10,000 people
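
One simple way to check a relationship like the one described between health indicators and GDP is the Pearson correlation coefficient. The sketch below implements it directly; the five data points are invented for illustration and are not taken from the WHO repository, so the resulting coefficients say nothing about the real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values for five hypothetical countries (arbitrary GDP units).
gdp = [1_000, 5_000, 12_000, 30_000, 45_000]
physicians_per_10k = [0.5, 4.0, 12.0, 28.0, 35.0]
beds_per_10k = [5.0, 12.0, 25.0, 40.0, 60.0]

r_physicians_gdp = pearson(gdp, physicians_per_10k)
r_beds_gdp = pearson(gdp, beds_per_10k)
```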

In the next chapter, we discuss how all of these data can be incorporated within the framework of epidemic modeling.

Footnotes

  1. World Health Organization
  2. Centers for Disease Control and Prevention
  3. Centers for Disease Control and Prevention

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Ana Pastore y Piontti (1)
  • Nicola Perra (2)
  • Luca Rossi (3)
  • Nicole Samay (1)
  • Alessandro Vespignani (1)

  1. Northeastern University, Boston, USA
  2. University of Greenwich, London, UK
  3. Institute for Scientific Interchange, Torino, Italy
