Clustering Activity–Travel Behavior Time Series using Topological Data Analysis

Chen, Renjie; Zhang, Jingyue; Ravishanker, Nalini; Konduri, Karthik

doi:10.1007/s42421-019-00008-6

Original Paper
Published: 23 October 2019

Journal of Big Data Analytics in Transportation, Volume 1, pages 109–121 (2019)

Renjie Chen (ORCID: orcid.org/0000-0003-1670-1947)¹, Jingyue Zhang², Nalini Ravishanker¹ & Karthik Konduri²

Abstract

Over the last few years, the volume of traffic data has grown rapidly and the transportation discipline has entered the era of big data. This growth brings new opportunities for data-driven analysis, but it also challenges traditional analytic methods. This paper proposes a new divide-and-combine approach to K-means clustering of activity–travel behavior time series, using features derived with tools from time series analysis and topological data analysis. We illustrate the approach with a case study in which each individual’s daily activity–travel behavior is characterized as a categorical time series with three levels. Clustering data from five waves of the National Household Travel Survey, ranging from 1990 to 2017, suggests that the activity–travel patterns of individuals over the last three decades can be grouped into three clusters. The results also provide evidence in support of recent claims about differences in the activity–travel patterns of different survey cohorts. The proposed method is generally applicable and is not limited to activity–travel behavior analysis in transportation studies: driving behavior, travel mode choice, and household vehicle ownership, when characterized as categorical time series, can all be analyzed using the proposed method.

Notes

A brief review is provided in the appendix. For details on TDA, see Edelsbrunner and Harer (2010) and Wang et al. (2018).

References

Bubenik P (2015) Statistical topological data analysis using persistence landscapes. J Mach Learn Res 16(1):77–102
Calabrese F, Diao M, Lorenzo GD, Ferreira J, Ratti C (2013) Understanding individual mobility patterns from urban sensing data: a mobile phone trace example. Transp Res Part C Emerg Technol 26:301–313
Candia J, González MC, Wang P, Schoenharl T, Madey G, Barabási AL (2008) Uncovering individual and collective human dynamics from mobile phone records. J Phys A: Math Theor 41(22):224015
Carlsson G (2009) Topology and data. Bull Am Math Soc 46(2):255–308
Edelsbrunner H, Harer J (2010) Computational topology: an introduction. American Mathematical Society
Figueiras P, Silva R, Ramos A, Guerreiro G, Costa R, Jardim-Goncalves R (2016) Big data processing and storage framework for ITS: a case study on dynamic tolling. ASME 2016 International Mechanical Engineering Congress and Exposition
Goulias KG (1999) Longitudinal analysis of activity and travel pattern dynamics using generalized mixed Markov latent class models. Transp Res Part B Methodol 33(8):535–558
Huang J, Levinson D, Wang J, Zhou J, Wang ZJ (2018) Tracking job and housing dynamics with smartcard data. Proc Natl Acad Sci 115(50):12710–12715
Jandui Silva LLVSFF Bárbara França (2015) Towards smart traffic lights using big data to improve urban traffic. SMART 2015: The Fourth International Conference on Smart Systems, Devices and Technologies
Joh CH, Arentze T, Timmermans H (2001) Pattern recognition in complex activity travel patterns: comparison of Euclidean distance, signal-processing theoretical, and multidimensional sequence alignment methods. Transp Res Record J Transp Res Board 1752:16–22
Ketchen DJ, Shook CL (1996) The application of cluster analysis in strategic management research: an analysis and critique. Strateg Manag J 17(6):441–458
Kwan MP (2000) Interactive geovisualization of activity-travel patterns using three dimensional geographical information systems: a methodological exploration with a large data set. Transp Res Part C Emerg Technol 8:185–203
Pas EI (1988) Weekly travel-activity behavior. Transportation 15(1):89–109
Recker WW, McNally MG, Root GS (1985) Travel/activity analysis: pattern recognition, classification and interpretation. Transp Res Part A Gen 19(4):279–296
Shanks JL (1969) Computation of the fast Walsh–Fourier transform. IEEE Trans Comput 18(5):457–459
Federal Highway Administration (2017) 2017 National Household Travel Survey. U.S. Department of Transportation, Washington, DC
Shoval N, Isaacson M (2007) Sequence alignment as a method for human activity analysis in space and time. Ann Assoc Am Geogr 97:282–297
Stoffer DS (1991) Walsh–Fourier analysis and its statistical applications. J Am Stat Assoc 86(414):461–479
Stolz BJ, Harrington HA, Porter MA (2017) Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos Interdiscip J Nonlinear Sci 27(4):47410
Thorndike RL (1953) Who belongs in the family. Psychometrika pp 267–276
Wang Y, Ombao H, Chung MK (2018) Topological data analysis of single-trial electroencephalographic signals. Ann Appl Stat 12(3):1506
Wilson C (2001) Activity patterns of Canadian women: application of ClustalG sequence alignment software. Transp Res Record 1777(1):55–67
Zhang A, Kang JE, Axhausen K, Kwon C (2018) Multi-day activity-travel pattern sampling based on single-day data. Transp Res Part C Emerg Technol 89:96–112

Acknowledgements

The authors are grateful to the editor and anonymous reviewers whose suggestions helped enhance this paper.

Author information

Authors and Affiliations

Department of Statistics, University of Connecticut, Mansfield, USA
Renjie Chen & Nalini Ravishanker
Department of Civil and Environmental Engineering, University of Connecticut, Mansfield, USA
Jingyue Zhang & Karthik Konduri

Corresponding author

Correspondence to Renjie Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: TDA and the First-order Persistence Landscape

We start with a brief review of topological data analysis (TDA), an emerging area for analyzing big data with complex structures. Using computational homology, TDA aims to analyze the topological features of data and to represent these features using low-dimensional representations (Carlsson 2009). The input to TDA is often a set of data points (a point cloud) or a function, and persistent homology distills essential topological features in the data, which can then be used together with suitable dissimilarity measures to identify patterns in the data sets. We discuss TDA on functions, which is the approach developed in Sections 2 and 3.

Computational Procedure for TDA on Functions

We look at the method to construct persistence diagrams on functions using the sublevel set filtration. Figure 10 shows the simple procedure of extracting a persistence diagram from a function. Suppose $y_j = f(j), j=1,\dots , 10$ and let the sublevel set be $L_r = \{y_j|y_j \le r\}$. TDA is used to construct the persistence diagram based on $L_r$.

(i) When $r = 0$, a connected component is identified (marked as a blue dot; this is the oldest connected component). The vertical slash line in the second plot records the “birth time $= 0$” and the horizontal slash line indicates $r$. There is no point on the birth/death plot, since no connected component has died at $r=0$.
(ii) When $r = 0.5$, two more connected components appear (indicated in blue); the blue dot in the middle, with a blue line connecting it to the dark green dot, indicates that the oldest connected component “enlarges” and is “still alive”. The other black vertical slash line in the second plot gives the “birth time” of the two new connected components. No connected component has died yet, and hence no points are shown on the birth/death plot.
(iii) When $r = 1$, all old components “enlarge” and one newer component is “killed” by the older one. Therefore, a black dot with birth $= 0.5$ and death $= 1$ is shown on the second plot.
(iv) When $r = 2$, the last component is “killed” (birth $= 0$, death $= 2$), which gives the black dot at location (0, 2). The other black dot, at (0.5, 1.5) in the second plot, records the birth and death of another connected component.
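
The steps above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it computes the connected-component (birth, death) pairs of a sublevel set filtration for a function sampled on a line, using a union-find structure. The function name `sublevel_persistence` and the sample values in `y_example` are hypothetical; the values are chosen only so that the resulting points match those described in steps (i) to (iv), and they are not the data behind Figure 10. Following the convention used in this appendix, the oldest component is assigned a death time equal to the maximum of the function.

```python
def sublevel_persistence(y):
    """Birth/death pairs of connected components for the sublevel set
    filtration of a function sampled at consecutive integer points."""
    n = len(y)
    parent = [None] * n          # None: point not yet in the sublevel set
    birth = {}                   # component root -> birth value

    def find(j):
        # Follow parent links to the component root (with path halving).
        while parent[j] != j:
            parent[j] = parent[parent[j]]
            j = parent[j]
        return j

    pairs = []
    # Raise the threshold r through the sorted function values.
    for j in sorted(range(n), key=lambda idx: y[idx]):
        parent[j] = j
        birth[j] = y[j]
        for k in (j - 1, j + 1):             # neighbours on the line
            if 0 <= k < n and parent[k] is not None:
                rj, rk = find(j), find(k)
                if rj == rk:
                    continue
                # The component born later is "killed" by the older one.
                old, young = (rj, rk) if birth[rj] <= birth[rk] else (rk, rj)
                if birth[young] < y[j]:      # skip zero-persistence pairs
                    pairs.append((birth[young], y[j]))
                parent[young] = old
    # Convention assumed here: the oldest component dies at the maximum.
    pairs.append((min(y), max(y)))
    return sorted(pairs)


# Hypothetical sample values (not the data of Figure 10).
y_example = [0.5, 1.0, 0.0, 1.5, 0.5, 2.0]
diagram = sublevel_persistence(y_example)
print(diagram)  # [(0.0, 2.0), (0.5, 1.0), (0.5, 1.5)]
```

The printed pairs correspond to the three points discussed above: (0.5, 1) from step (iii), and (0.5, 1.5) and (0, 2) from step (iv).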

First-Order Persistence Landscape

First, in the persistence diagram obtained using the sublevel set filtration, the furthest point away from the diagonal line is always born at the minimum value of the function and dies at the maximum value of the function.

Second, referring to the definition of persistence landscape in Section 2.3 from Bubenik (2015), given a persistence diagram $\{(b_i, d_i), \forall i\}$, the first-order persistence landscape is

$$\begin{aligned} \text{ PL }(\ell ) = \max _i\{ \min (\ell -b_i, d_i-\ell )_+ \}, \end{aligned}$$

where $ \ell $ is a real number. Because the persistence diagram uses a sublevel set filtration, it has the point $(d_{\min }, d_{\max })$. For all $(b_i, d_i)$ that belong to the persistence diagram, $d_{\min }\le b_i \le d_i \le d_{\max }$. Therefore, for any real number $\ell $, $\ell -d_{\min } \ge \ell -b_i$ and $d_{\max } - \ell \ge d_i - \ell $, which implies that

$$\begin{aligned} \min (\ell -d_{\min }, d_{\max }-\ell )_+ \ge \min (\ell -b_i, d_i-\ell )_+ , \end{aligned}$$

which in turn implies that

$$\begin{aligned} \text{PL}(\ell ) &= \max _i\{ \min (\ell -b_i, d_i-\ell )_+ \} \\ &= \min (\ell -d_{\min }, d_{\max }-\ell )_+. \end{aligned}$$
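
As a quick numerical check of this identity, the sketch below evaluates the first-order persistence landscape both ways: as the maximum over all points of the diagram, and as the single "tent" function determined by $(d_{\min }, d_{\max })$. The function names and the diagram `pairs` are hypothetical and chosen for illustration only; the diagram is assumed to contain the point $(d_{\min }, d_{\max })$, as is the case under the sublevel set filtration described above.

```python
def landscape_first_order(pairs, ell):
    """PL(ell) = max_i min(ell - b_i, d_i - ell)_+ over all diagram points."""
    return max(max(min(ell - b, d - ell), 0.0) for b, d in pairs)


def tent(d_min, d_max, ell):
    """Closed form min(ell - d_min, d_max - ell)_+ from the derivation."""
    return max(min(ell - d_min, d_max - ell), 0.0)


# Hypothetical diagram that contains the point (d_min, d_max) = (0, 2).
pairs = [(0.0, 2.0), (0.5, 1.0), (0.5, 1.5)]
d_min = min(b for b, _ in pairs)
d_max = max(d for _, d in pairs)

# Compare the two expressions on a grid of real numbers from -0.5 to 2.5.
grid = [i / 10 for i in range(-5, 26)]
agree = all(
    abs(landscape_first_order(pairs, ell) - tent(d_min, d_max, ell)) < 1e-12
    for ell in grid
)
print(agree)  # True
```

The two expressions agree because every other point of the diagram produces a tent that lies under the one produced by $(d_{\min }, d_{\max })$.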

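Finally, let $(d_{\min }, d_{\max }) \subset (D_{\min }, D_{\max })$ and take the grid points $\{D_{\min }+\frac{(\ell -1) (D_{\max }-D_{\min })}{L-1},\ \ell =1, 2, \ldots , L\}$, so that $\ell $ now indexes the grid. Evaluating the landscape at the $\ell $th grid point, we have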

$$\begin{aligned} \text{ PL }(\ell ) = \min (V_1 (\ell ), V_2 (\ell ))_{+}, \end{aligned}$$

where,

$$\begin{aligned} V_1(\ell ) &= D_{\min } + \frac{(\ell -1) (D_{\max }-D_{\min })}{L-1} - d_{\min }, \\ V_2(\ell ) &= d_{\max } - D_{\min } - \frac{(\ell -1) (D_{\max } - D_{\min })}{L-1}. \end{aligned}$$
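
The grid evaluation can be sketched as follows. This is an illustrative reading of the expressions for $V_1$ and $V_2$, not the authors' code; the function name `landscape_on_grid` and the numerical values in the example are hypothetical.

```python
def landscape_on_grid(d_min, d_max, D_min, D_max, L):
    """Evaluate PL at the L equally spaced grid points on [D_min, D_max],
    using PL(ell) = min(V_1(ell), V_2(ell))_+ for ell = 1, ..., L."""
    step = (D_max - D_min) / (L - 1)
    values = []
    for ell in range(1, L + 1):
        point = D_min + (ell - 1) * step   # the ell-th grid point
        v1 = point - d_min                 # V_1(ell)
        v2 = d_max - point                 # V_2(ell)
        values.append(max(min(v1, v2), 0.0))
    return values


# Hypothetical values: function range [0, 2], common grid range [-1, 3].
features = landscape_on_grid(d_min=0.0, d_max=2.0, D_min=-1.0, D_max=3.0, L=5)
print(features)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

Using the same $(D_{\min }, D_{\max })$ and $L$ for every series yields vectors of equal length $L$, which is what makes them comparable across series.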

These expressions will be used on the WFT function obtained from each time series $n=1,\ldots ,N$ in Section 2.

About this article

Cite this article

Chen, R., Zhang, J., Ravishanker, N. et al. Clustering Activity–Travel Behavior Time Series using Topological Data Analysis. J. Big Data Anal. Transp. 1, 109–121 (2019). https://doi.org/10.1007/s42421-019-00008-6

Received: 07 June 2019
Revised: 30 September 2019
Accepted: 03 October 2019
Published: 23 October 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s42421-019-00008-6
