A Method for Identifying Public Transportation Super Spreaders Considering Community Structure

Chen, Jun; Li, Zaiqi; Zhang, Zixuan; Li, Xiaowei

doi:10.1007/978-981-97-5814-2_40


Part of the book series: Lecture Notes in Civil Engineering (LNCE, volume 603)

Included in the following conference series:

Conference on Sustainable Traffic and Transportation Engineering


Abstract

Due to variations in passengers’ travel behaviours, not all passengers exhibit the same epidemiological transmission ability when they are infected. Public transportation super spreaders are passengers who can cause more extensive infections when they are infected. This study utilizes multi-source public transit data to construct a weighted passenger contact network and proposes the Gravity Hub Bridge method (GHB) for node identification based on the gravity model and the community structure. Compared to other identification methods, GHB exhibits the largest transmission range difference at low, medium, and high epidemiological levels. In other words, the public transportation super spreaders identified by GHB possess a higher epidemiological transmission ability.



1 Introduction

Efficient urban public transportation plays a crucial role in facilitating human mobility and sustaining economic development. However, its shared, confined, and high-density travel spaces create favourable conditions for the transmission of diseases among passengers [1]. Infected passengers may transmit the disease to non-infected passengers during travel. Due to variations in passenger behaviours, some individuals can cause more extensive infections when they are infected; these individuals are public transportation super spreaders [2]. Identifying them can assist relevant authorities in implementing targeted measures, such as travel restrictions, vaccination programs, health monitoring, and other disease control strategies.

Currently, the identification methods [3,4,5] for super spreaders are not fully developed and do not take into account the strong community structure of the passenger contact network [6, 7]. In complex networks, community structure facilitates spread within communities while limiting spread between communities, thus affecting the speed and extent of transmission. Therefore, when the community structure is strong, node identification methods that consider it prove more effective than traditional identification methods. Because existing approaches overlook this structure, they lack insight into relationships within communities and represent epidemiological interactions among passengers inaccurately. Consequently, they fail to capture the transmission abilities of individual passengers precisely, which affects the final identification of public transportation super spreaders and may bias decisions on epidemiological control strategies. Considering the community structure of the passenger contact network may therefore yield a more accurate identification of passenger groups with higher transmission capabilities.

2 Research Method

2.1 GHB Method

Passenger Contact Network Construction

The passenger contact network is defined as a weighted undirected graph G = {V, E, W, C}, where V = {v_i | i = 1, 2, ⋅⋅⋅, n} is the set of nodes, with each node v_i representing a passenger using public transportation. E = {e_ij | i, j = 1, 2, ⋅⋅⋅, n, i ≠ j} is the set of edges, where e_ij represents the connection between two passengers, v_i and v_j, who are present in the same vehicle at the same time during their travel. W = {w_ij | i, j = 1, 2, ⋅⋅⋅, n, i ≠ j} is the set of edge weights, where the weight w_ij of edge e_ij is the duration for which passengers v_i and v_j are in the same vehicle. C = {C_k | k = 1, 2, ⋅⋅⋅, m} is the set of communities obtained through a community detection algorithm, with each community C_k containing N_k nodes.
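As a minimal sketch of this definition, the weighted undirected graph can be held as a nested dictionary in which adj[i][j] stores w_ij. The function name, the record layout (passenger, passenger, hours), and the choice to sum the durations of repeated contacts between the same pair are illustrative assumptions, not details given in this paper.

```python
from collections import defaultdict


def build_contact_network(contacts):
    """Build a weighted undirected graph as a nested dict: adj[i][j] = w_ij.

    contacts: iterable of (passenger_i, passenger_j, shared_hours) records
    (a hypothetical input layout chosen for this sketch).
    """
    adj = defaultdict(dict)
    for i, j, hours in contacts:
        if i == j or hours <= 0:
            continue  # skip self-loops and empty contacts (i != j in the definition)
        # Assumption: repeated contacts between the same pair are summed.
        weight = adj[i].get(j, 0.0) + hours
        adj[i][j] = weight
        adj[j][i] = weight  # undirected: keep both directions in sync
    return dict(adj)


# Toy records: a and b share a vehicle twice, b and c once.
contacts = [("a", "b", 0.5), ("b", "c", 0.25), ("a", "b", 0.25)]
adj = build_contact_network(contacts)
```

Here adj["a"]["b"] holds the total shared time of passengers a and b, and the same value is stored under adj["b"]["a"].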

Community Division

To identify public transport super spreaders while considering the community structure of the passenger contact network, it is necessary to employ a community detection algorithm. This algorithm partitions the network into several communities based on the network's topology, aiming for strong connections within communities and weak connections between them. The Infomap algorithm is efficient, stable, and applicable to community detection in large networks. Therefore, this study utilizes the Infomap algorithm to perform community detection on the weighted passenger contact network.

The Infomap algorithm, based on the minimization of code length, employs a random walk approach to identify communities with the shortest path encoding length [8]. The description length L(M) for the random walk paths generated by partitioning the network's n nodes into m communities using partition method M is represented by Eq. (1). Initially, the Infomap algorithm treats each node as an individual community and progressively merges adjacent communities to maximize the reduction of the objective function L(M) until the reduction becomes negligible.

$$ L(M) = q_{\rm{ \curvearrowright }} H(Q) + \sum_{i = 1}^m {p^i_{\rm{ \circlearrowright }} H(P^i )} $$

(1)

$$ H(X) = - \sum_{i = 1}^n {p_i \log p_i } $$

(2)

where $q_{\rm{ \curvearrowright }}$ is the probability that the random walk switches to another community at any given step; H(Q) is the information entropy of movements between communities; H(Pⁱ) is the information entropy of movements within community i; and $p^i_{\rm{ \circlearrowright }}$ is the sum of the probabilities of visiting each node in community i and the probability of exiting community i.
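The description length in Eq. (1) can be evaluated directly for a given partition. The sketch below does so for an undirected weighted graph, under the assumptions that the stationary visit probability of a node is its strength divided by the total strength, that the exit probability of a community is the share of total strength carried by its boundary edges, and that logarithms are base 2. It only scores a fixed partition; it is not the Infomap search procedure, and the function names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def map_equation(adj, partition):
    """Description length L(M) of Eq. (1) for an undirected weighted graph.

    adj: dict node -> dict neighbour -> weight (symmetric).
    partition: dict node -> community label.
    """
    total = sum(w for nbrs in adj.values() for w in nbrs.values())  # twice the edge weight
    visit = {v: sum(nbrs.values()) / total for v, nbrs in adj.items()}
    exit_p = {}
    for v, nbrs in adj.items():
        c = partition[v]
        leaving = sum(w for u, w in nbrs.items() if partition[u] != c)
        exit_p[c] = exit_p.get(c, 0.0) + leaving / total
    q = sum(exit_p.values())
    # First term: movements between communities (zero when nothing ever exits).
    length = q * entropy([qi / q for qi in exit_p.values()]) if q > 0 else 0.0
    # Second term: movements within each community, including its exit code.
    for c, qi in exit_p.items():
        members = [visit[v] for v in adj if partition[v] == c]
        p_circ = qi + sum(members)
        length += p_circ * entropy([qi / p_circ] + [p / p_circ for p in members])
    return length


# Two triangles joined by the single edge 2-3, all weights 1.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
one_community = map_equation(adj, {v: "A" for v in adj})
two_communities = map_equation(adj, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
```

On this toy graph the two-triangle partition yields a shorter description length than the single-community partition, which is the kind of reduction the merging step looks for.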

Weighted Interconnection Density Calculation

As the community structure of the network plays a role in promoting transmission within communities while inhibiting transmission between communities, the level of connectivity between communities has significant implications. When a community has fewer connections to other communities, the nodes within that community primarily transmit the disease to their neighboring nodes within the same community. Their impact on other communities is relatively minimal, making the hub nodes within the community particularly crucial. On the other hand, when a community has numerous connections to other communities, the nodes within that community possess the ability to spread the disease to neighboring communities. In this scenario, identifying bridge nodes between communities becomes essential. Therefore, there is a need to quantify the interaction between communities and distinguish the contribution of nodes to transmission within their own community and transmission to other communities. For community C_k, in combination with all edges involving nodes within and outside C_k, along with their strengths, the weighted interconnection density of community C_k is defined as follows:

$$ \rho_{C_k } = \frac{1}{N_k } \sum\limits_{v_i \in C_k } {\frac{S_{in} (i)}{S_{in} (i) + S_{out} (i)}} $$

(3)

where S_in (i) represents the internal community weight, which is the sum of edge weights between node v_i and neighboring nodes within the community. S_out (i) represents the external community weight, which is the sum of edge weights between node v_i and neighboring nodes outside the community.
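Eq. (3) can be illustrated with a short sketch that computes S_in, S_out, and the weighted interconnection density of each community from a nested-dictionary graph. The function names are illustrative, and treating a node with no edges as contributing 0 is an assumption, since Eq. (3) is undefined in that case.

```python
def internal_external_strength(adj, partition):
    """Return (S_in, S_out): per-node edge weight inside / outside its community."""
    s_in, s_out = {}, {}
    for v, nbrs in adj.items():
        s_in[v] = sum(w for u, w in nbrs.items() if partition[u] == partition[v])
        s_out[v] = sum(w for u, w in nbrs.items() if partition[u] != partition[v])
    return s_in, s_out


def interconnection_density(adj, partition):
    """Weighted interconnection density rho of every community, following Eq. (3)."""
    s_in, s_out = internal_external_strength(adj, partition)
    ratios = {}
    for v in adj:
        total = s_in[v] + s_out[v]
        # Assumption: a node without edges contributes 0 to the average.
        ratio = s_in[v] / total if total > 0 else 0.0
        ratios.setdefault(partition[v], []).append(ratio)
    return {c: sum(r) / len(r) for c, r in ratios.items()}


# Two triangles joined by the single edge 2-3, all weights 1.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
rho = interconnection_density(adj, partition)
```

In community A, nodes 0 and 1 have ratio 1 and node 2 has ratio 2/3, so rho["A"] = (1 + 1 + 2/3) / 3 = 8/9. A value close to 1 indicates a community whose contacts are mostly internal.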

Node GHB Value Calculation

The relationships between internal and external connections within communities, as well as the impact of community size, are reflected through weighted interconnection density and the number of community nodes. If node v_i belongs to community C_k and its neighboring community is C_l, considering the reciprocal of edge weights as the distance between nodes, along with the internal and external weights of node v_i and its neighboring node v_j, as well as the weighted interconnection density of their respective communities and community size, we calculate the centrality of node v_i using the calculation method of the gravity model. The GHB (Gravity Hub Bridge) value of node v_i is calculated as follows:

$$ GHB(i) = \rho_{C_k } H(i) + (1 - \rho_{C_l } )B(i) $$

(4)

$$ H(i) = N_k \sum_{v_j \in I(i),v_j \in C_k } {\frac{{S_{in} (i)S_{in} (j)}}{{1/w_{ij}^2 }}} $$

(5)

$$ B(i) = \sum_{v_j \in I(i),v_j \notin C_k } {N_l } \frac{{S_{out} (i)S_{in} (j)}}{{1/w_{ij}^2 }} $$

(6)

where I(i) is the set of neighboring nodes of node v_i.
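Eqs. (4) to (6) can be sketched as follows. Dividing by 1/w_ij² is the same as multiplying by w_ij², which the code uses directly. One point is an interpretation rather than something stated in the text: a node may have neighbours in several other communities, so the factor (1 − ρ_{C_l}) is applied per neighbour inside the bridge sum, using the density of that neighbour's own community. The function name is illustrative.

```python
from collections import Counter


def ghb_scores(adj, partition):
    """GHB value of every node, a sketch of Eqs. (4)-(6).

    adj: dict node -> dict neighbour -> weight (symmetric).
    partition: dict node -> community label.
    """
    s_in = {v: sum(w for u, w in nb.items() if partition[u] == partition[v])
            for v, nb in adj.items()}
    s_out = {v: sum(w for u, w in nb.items() if partition[u] != partition[v])
             for v, nb in adj.items()}
    size = Counter(partition[v] for v in adj)  # N_k for each community
    # Weighted interconnection density of each community, Eq. (3).
    ratio_sum = Counter()
    for v in adj:
        total = s_in[v] + s_out[v]
        ratio_sum[partition[v]] += s_in[v] / total if total > 0 else 0.0
    rho = {c: ratio_sum[c] / size[c] for c in size}

    scores = {}
    for i, nbrs in adj.items():
        k = partition[i]
        hub, bridge = 0.0, 0.0
        for j, w in nbrs.items():
            l = partition[j]
            if l == k:
                hub += s_in[i] * s_in[j] * w ** 2  # term of Eq. (5)
            else:
                # Assumption: (1 - rho) of the neighbour's community is applied per neighbour.
                bridge += (1 - rho[l]) * size[l] * s_out[i] * s_in[j] * w ** 2  # Eq. (6)
        scores[i] = rho[k] * size[k] * hub + bridge  # Eq. (4)
    return scores


# Two triangles joined by the single edge 2-3, all weights 1.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
scores = ghb_scores(adj, partition)
```

On this toy graph the two bridge nodes (2 and 3) score higher than the other nodes, because they receive the same hub contribution plus a bridge contribution from the edge that links the communities.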

2.2 Baseline Method

Existing research has used degree, strength, and k-shell decomposition to identify super spreaders in public transportation. Because the passenger contact network is weighted, this study uses strength (NS) and the s-shell decomposition method (s-shell) [9], an extension of k-shell decomposition to weighted networks, as baseline methods. Weighted betweenness centrality (WBC) [10], weighted eigenvector centrality (WEC) [11], and the weighted gravity model (Gravity) [12] are also selected as baseline methods for comparison with GHB.

1) NS: The NS of node v_i is the sum of the edge weights connected to it.

$$ NS(i) = \sum_{j \in I(i)} {w_{ij} } $$

(7)

2) WBC: The WBC of node v_i measures the share of shortest paths in the weighted network that pass through this node, reflecting its role as a transmission hub in the network.

$$ WBC(i) = \sum_{i \ne s,i \ne j,s \ne j} {\frac{{d_{sj} (i)}}{{d_{sj} }}} $$

(8)

where d_sj is the number of all shortest paths from node v_s to node v_j in the weighted network, and d_sj (i) is the number of shortest paths passing through node v_i in d_sj.

3) WEC: The WEC evaluates the importance of a node from the importance of its neighboring nodes and is computed from the weighted adjacency matrix of the network.

$$ WEC(i) = \lambda^{ - 1} \sum_{j = 1}^n {w_{ij} e_j } $$

(9)

where λ is the largest eigenvalue of the weighted adjacency matrix W, and the corresponding eigenvector is denoted as e = (e₁, e₂, …, e_n)^T.

4) s-shell: Remove the nodes with the lowest NS from the network; all nodes removed in this step are assigned an s-shell value of 1, denoted as wk_s = 1. Apply the same removal to the remaining sub-network and assign the removed nodes an s-shell value of 2, denoted as wk_s = 2. Repeat this step until no nodes remain in the network.

5) Gravity: In [12], the degree of each node is regarded as its mass, and the shortest path distance between two nodes is regarded as the distance between them, and an index of gravitational centrality is proposed to identify influential spreaders in complex networks. For the weighted network, this paper replaces the indicators in the formula with the indicators in the weighted network.

$$ Gravity(i) = \sum_{d_{ij} \in \Psi_i } {\frac{NS(i)*NS(j)}{{d_{ij}^2 }}} $$

(10)

where Ψ_i is the set of nodes whose shortest path distance d_ij to node v_i is less than or equal to a given value, set to 3 in [12].
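Two of the baselines above, NS in Eq. (7) and WEC in Eq. (9), can be sketched in a few lines. The WEC routine is a plain power iteration and assumes a connected, non-empty graph; it iterates on W + I rather than W, which leaves the leading eigenvector unchanged but avoids oscillation on bipartite graphs. The iteration count and the scaling to a maximum of 1 are illustrative choices, not settings from this paper.

```python
def node_strength(adj):
    """NS(i): sum of the weights of the edges attached to node i, Eq. (7)."""
    return {v: sum(nbrs.values()) for v, nbrs in adj.items()}


def weighted_eigenvector_centrality(adj, iterations=200):
    """Leading eigenvector of the weighted adjacency matrix by power iteration.

    Assumes a connected, non-empty graph. Scores are scaled so the largest is 1.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One step of (W + I) x: the shift keeps the same leading eigenvector.
        nxt = {v: x[v] + sum(w * x[u] for u, w in adj[v].items()) for v in adj}
        norm = max(nxt.values())
        x = {v: val / norm for v, val in nxt.items()}
    return x


# Star graph: centre "c" connected to three leaves with weight 1.
star = {"c": {"a": 1, "b": 1, "d": 1}, "a": {"c": 1}, "b": {"c": 1}, "d": {"c": 1}}
ns = node_strength(star)
wec = weighted_eigenvector_centrality(star)
```

For the star graph the largest eigenvalue is √3, so each leaf converges to 1/√3 of the centre's score, consistent with Eq. (9).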

2.3 Evaluation Method

Weighted SIR Model

We use the SIR model to capture the transmission capability of the super spreaders. The SIR model categorizes nodes into three health states: susceptible (S), infected (I), and recovered (R). Infected nodes transmit the disease to their neighbors with an infection probability λ, and infected nodes recover and gain immunity at a recovery rate β (in this paper, β = 1). The calculation formula for the infection probability λ is as follows:

$$ \lambda_{ij} = m\lambda_t w_{ij} $$

(11)

where m controls the spread of the epidemic and w_ij represents the edge weight between node v_i and node v_j. In this paper, m is set to 0.2, 0.5, and 0.8 to represent low, medium, and high levels of epidemiological transmission ability, respectively. According to reference [5], λ_t = 8.17 × 10^-4 h^-1.

For each identification method, the top-ranked proportion p of nodes is selected as the initial infected nodes, and the SIR model is run 100 times on the network. R_m denotes the resulting average final number of recovered nodes for the tested identification method. This value is taken as the transmission capability of the super spreaders identified by that method.
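The evaluation procedure can be sketched as a discrete-time simulation. Several details are assumptions made for illustration: time advances in synchronous steps, β = 1 is read as every infected node recovering after one step, λ_ij is capped at 1 so that it remains a probability, edge weights are in hours, and p is a fraction between 0 and 1. The function names and the seeding of the random generator are also illustrative.

```python
import random

LAMBDA_T = 8.17e-4  # per hour, the value quoted from reference [5]


def simulate_sir(adj, seeds, m, rng, lambda_t=LAMBDA_T):
    """Run one weighted SIR outbreak and return the final number of recovered nodes."""
    infected, recovered = set(seeds), set()
    while infected:
        newly = set()
        for i in infected:
            for j, w in adj[i].items():
                if j in infected or j in recovered or j in newly:
                    continue  # only susceptible neighbours can be infected
                # Eq. (11), capped at 1 (assumption) so it stays a probability.
                if rng.random() < min(1.0, m * lambda_t * w):
                    newly.add(j)
        recovered |= infected  # beta = 1: infected nodes recover after one step
        infected = newly
    return len(recovered)


def mean_final_size(adj, ranking, p, m, runs=100, seed=0):
    """Average final outbreak size when the top fraction p of `ranking` is seeded."""
    rng = random.Random(seed)
    k = max(1, round(p * len(ranking)))
    seeds = ranking[:k]
    return sum(simulate_sir(adj, seeds, m, rng) for _ in range(runs)) / runs


# Toy chain a-b-c with very long contacts, so every transmission succeeds.
chain = {"a": {"b": 1e6}, "b": {"a": 1e6, "c": 1e6}, "c": {"b": 1e6}}
r_full = mean_final_size(chain, ["a", "b", "c"], p=0.34, m=0.2, runs=5)
```

Passing the node ranking produced by an identification method as `ranking` gives the quantity R_m for that method.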

Transmission Range Difference

To facilitate the comparison of the transmission capability among different methods, we selected NS, a widely used and easily comprehensible metric, as a reference method. We calculated the transmission range difference, denoted as r, between the other five methods and NS.

$$ r = \frac{R_m - R_s }{{R_s }} $$

(12)

where R_s stands for the average final number of recovered nodes for the NS method.
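Eq. (12) is a relative difference, which the following sketch computes. The outbreak sizes in the example are hypothetical numbers chosen only to show the arithmetic; they are not results from this study.

```python
def transmission_range_difference(r_m, r_s):
    """Relative transmission range difference r of Eq. (12).

    r_m: average final number of recovered nodes for the tested method.
    r_s: the same quantity for the NS reference method.
    """
    if r_s <= 0:
        raise ValueError("the reference outbreak size R_s must be positive")
    return (r_m - r_s) / r_s


# Hypothetical outbreak sizes: 1100 recovered nodes versus 1000 for NS.
r = transmission_range_difference(1100.0, 1000.0)
```

With these hypothetical inputs r = (1100 − 1000) / 1000 = 0.1, meaning the tested method's seeds infect 10% more nodes than the NS seeds; the NS method compared with itself gives r = 0.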

3 Algorithm Experiment

We used MySQL to extract passenger contact relationships and Python to build the passenger contact network and perform community division.

3.1 Passenger Contact Network Construction

We collected one week of raw data for the bus routes in a city, including smart card data, GPS data, transit operation records, and bus stop coordinates. The passenger boarding stations were determined by linking these data sources. The alighting stations were determined using the trip chain method, based on the next-trip, last-trip, and return-trip assumptions. Transfer behaviours were identified using an independent threshold-based public transit transfer model. The methods for determining boarding and alighting points, as well as transfer judgments, are detailed in references [13, 14]. Individual passenger trips with the same travel purpose were combined into public transit Origin-Destination (OD) pairs, and data cleaning was performed to obtain the weekly public transit OD data. Based on the passengers' travel chain data, an algorithm was designed to determine contact among passengers within the same vehicle, as illustrated in Fig. 1.
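The contact rule can be sketched as an interval overlap test. This is only an illustration under stated assumptions and not the algorithm of Fig. 1: each trip leg is assumed to be a record (passenger, vehicle run, boarding time, alighting time) with times in hours, two passengers are in contact when their riding intervals on the same vehicle run overlap, and the overlap length is the contact duration. The field layout and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def contacts_from_trips(trips):
    """Return {(passenger_a, passenger_b): shared_hours} from trip-leg records.

    trips: iterable of (passenger, vehicle_run, board_hour, alight_hour),
    a hypothetical layout chosen for this sketch.
    """
    by_run = defaultdict(list)
    for passenger, run, board, alight in trips:
        by_run[run].append((passenger, board, alight))

    weights = defaultdict(float)
    for riders in by_run.values():
        for (pa, board_a, alight_a), (pb, board_b, alight_b) in combinations(riders, 2):
            # Overlap of the two riding intervals on the same vehicle run.
            overlap = min(alight_a, alight_b) - max(board_a, board_b)
            if pa != pb and overlap > 0:
                weights[tuple(sorted((pa, pb)))] += overlap
    return dict(weights)


trips = [
    ("p1", "run-1", 8.0, 8.5),
    ("p2", "run-1", 8.25, 9.0),  # overlaps p1 for a quarter of an hour
    ("p3", "run-2", 8.0, 9.0),   # different vehicle run: no contact
]
contact_weights = contacts_from_trips(trips)
```

Each key of the result is one edge e_ij of the passenger contact network and each value is its weight w_ij.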

3.2 Community Division Result

The passenger contact network is divided into communities, and the modularity is shown in Table 1. The passenger contact network exhibits a strong community structure. However, a minority of passengers travel on less popular routes during off-peak hours, resulting in some independent communities.

Table 1. Community division results


3.3 Comparison of Methods

Table 2. Transmission range difference table


Different super spreader identification methods yield different node rankings for the passenger contact network. Therefore, it is essential to compare the transmission ranges produced by the identification results of each method. To facilitate comparisons, the transmission range difference for each method is calculated using Eq. (12). Taking Monday as an example, Table 2 shows the transmission range difference of the super spreaders selected by each method at different epidemiological levels and in different proportions. As NS is the reference method, its transmission range difference is 0. The transmission range difference is also shown in Fig. 2, where the horizontal axis represents the super spreader ratio p, and the vertical axis indicates the method's relative transmission range difference r. A positive value indicates that the public transportation super spreaders identified by the method have a higher transmission capability than those identified by NS.

Combining Table 2 and Fig. 2, the GHB method shows the largest transmission range difference among all methods. The super spreaders it identifies therefore produce the widest spread, indicating that GHB identifies public transportation super spreaders more effectively than the other methods. For epidemics at low, medium, and high levels, the transmission range difference for GHB falls within 0.037–0.085, 0.058–0.178, and 0.067–0.254, respectively. The smallest value, 0.037, occurs in low-level epidemics with a super spreader ratio p of 0.5. Conversely, the largest value, 0.254, occurs in high-level epidemics with a super spreader ratio p of 0.05. As p decreases and the epidemiological level increases, the transmission range difference for GHB increases, which implies that the identified super spreaders possess a stronger transmission capability. Moreover, as p increases, the node sets selected by the various methods overlap significantly. At lower epidemiological levels, the transmission ability of super spreaders is more constrained. In these cases, the GHB method remains more effective, although the difference is not particularly pronounced.

The methods perform differently because each captures a different property of the network. The WEC method yields relatively poor transmission capability in its identification results. This is primarily due to the presence of nodes with exceptionally high degree in the network, which causes the centrality scores to concentrate around these high-degree nodes. As a consequence, the scores of the other nodes become much harder to tell apart. Similarly, the s-shell method demonstrates limited transmission capability in its identification results. This can be attributed to a drawback it shares with the k-shell method, namely, the inability to distinguish between nodes within the same shell. Consequently, the s-shell method cannot quantify the differing transmission capabilities of nodes located within the same shell. In contrast, the WBC method evaluates the significance of nodes as pivotal transmission points in a weighted network. A higher WBC value implies a greater likelihood of disease transmission occurring through that particular node. The Gravity method takes into account both the NS of nodes and their neighboring nodes, as well as the distance between them. However, it neglects the network's community structure and does not achieve the desired identification results. Conversely, the GHB method takes into account the community structure inherent in the passenger contact network. This approach offers a more accurate representation of the epidemiological interactions among passengers. As a result, its identification results exhibit enhanced transmission ability at all three epidemiological levels.

4 Conclusion

The GHB method identifies public transport super spreaders with higher epidemiological transmission ability. As the identification proportion of super spreaders decreases and the epidemiological level increases, the GHB method becomes more effective.

However, this paper does not consider further infection caused by the virus spreading inside the bus after the infected passenger has exited the bus, or exposure caused by the infected passenger while waiting on the platform. More comprehensive pathways of epidemic spread will be considered in our future studies.

References

1. Morawska, L., Cao, J.: Airborne transmission of SARS-CoV-2: The world should face the reality. Environ. Int. 139, 105730 (2020)
2. Liu, Y., et al.: Characterizing super-spreading in microblog: An epidemic-based information propagation model. Physica A 463, 202–218 (2016)
3. Kang, L., Ling, Y., Zhanwu, M., Fan, Z., Juanjuan, Z.: Investigating physical encounters of individuals in urban metro systems with large-scale smart card data. Physica A: Stat. Mech. Appl. 545 (2020)
4. Mo, B., et al.: Modeling epidemic spreading through public transit using time-varying encounter network. Transp. Res. Part C: Emerg. Technol. 122, 102893 (2021)
5. Qian, X., Sun, L., Ukkusuri, S.V.: Scaling of contact networks for epidemic spreading in urban transit systems. Sci. Rep. 11 (2021)
6. Hajdu, L., Bóta, A., Krész, M., Khani, A., Gardner, L.M.: Discovering the hidden community structure of public transportation networks. Netw. Spat. Econ. 20, 209–231 (2020)
7. Kumar, P., Khani, A., Lind, E., Levin, J.: Estimation and mitigation of epidemic risk on a public transit route using automatic passenger count data. Transp. Res. Rec. 2675 (2021)
8. Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105, 1118–1123 (2008)
9. Eidsaa, M., Almaas, E.: s-core network decomposition: a generalization of k-core analysis to weighted networks. Phys. Rev. E 88, 062819 (2013)
10. Opsahl, T., Agneessens, F., Skvoretz, J.: Node centrality in weighted networks: generalizing degree and shortest paths. Social Networks 32, 245–251 (2010)
11. Newman, M.E.: Analysis of weighted networks. Phys. Rev. E 70, 056131 (2004)
12. Ma, L., Ma, C., Zhang, H., Wang, B.: Identifying influential spreaders in complex networks based on gravity formula. Physica A 451, 205–212 (2016)
13. Zhao, J., Rahbee, A., Wilson, N.H.M.: Estimating a rail passenger trip origin-destination matrix using automatic data collection systems. Comput.-Aided Civil Infrastruct. Eng. 22, 376–387 (2007)
14. Chen, J., Yang, D.: Estimating smart card commuters origin-destination distribution based on APTS data. J. Transp. Syst. Eng. Inf. Technol. 13, 47–53 (2013)


Author information

Authors and Affiliations

School of Civil Engineering, Xi’an University of Architecture and Technology, No. 13, Yanta Road, Beilin, Xi’an, China
Jun Chen, Zaiqi Li, Zixuan Zhang & Xiaowei Li


Corresponding author

Correspondence to Jun Chen.

Editor information

Editors and Affiliations

North Minzu University, Yinchuan, Ningxia, China
Andrii Bieliatynskyi
North Minzu University, Yinchuan, China
Dmytro Komyshev
Northeast University School of Resources, Northeastern University, Shenyang, China
Wen Zhao

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Chen, J., Li, Z., Zhang, Z., Li, X. (2024). A Method for Identifying Public Transportation Super Spreaders Considering Community Structure. In: Bieliatynskyi, A., Komyshev, D., Zhao, W. (eds) Proceedings of Conference on Sustainable Traffic and Transportation Engineering in 2023. CSTTE 2023. Lecture Notes in Civil Engineering, vol 603. Springer, Singapore. https://doi.org/10.1007/978-981-97-5814-2_40


DOI: https://doi.org/10.1007/978-981-97-5814-2_40
Published: 31 July 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5813-5
Online ISBN: 978-981-97-5814-2
eBook Packages: Engineering, Engineering (R0)
