Abstract
Over the last few years, the volume of traffic data has grown rapidly and the transportation discipline has entered the era of big data. This brings new opportunities for data-driven analysis, but it also challenges traditional analytic methods. This paper proposes a new divide-and-combine approach to K-means clustering of activity–travel behavior time series, using features derived with tools from time series analysis and topological data analysis. We illustrate the approach with a case study in which each individual’s daily activity–travel behavior is characterized as a categorical time series with three levels. Clustering data from five waves of the National Household Travel Survey, ranging from 1990 to 2017, suggests that the activity–travel patterns of individuals over the last three decades can be grouped into three clusters. The results also provide evidence in support of recent claims about differences in the activity–travel patterns of different survey cohorts. The proposed method is generally applicable and is not limited to activity–travel behavior analysis in transportation studies. Driving behavior, travel mode choice, and household vehicle ownership, when characterized as categorical time series, can all be analyzed using the proposed method.
References
Bubenik P (2015) Statistical topological data analysis using persistence landscapes. J Mach Learn Res 16(1):77–102
Calabrese F, Diao M, Lorenzo GD, Ferreira J, Ratti C (2013) Understanding individual mobility patterns from urban sensing data: a mobile phone trace example. Transp Res Part C: Emerg Technol 26:301–313
Candia J, González MC, Wang P, Schoenharl T, Madey G, Barabási AL (2008) Uncovering individual and collective human dynamics from mobile phone records. J Phys A: Math Theor 41(22):224015
Carlsson G (2009) Topology and data. Bull Am Math Soc 46(2):255–308
Edelsbrunner H, Harer J (2010) Computational Topology: An Introduction. American Mathematical Society
Figueiras P, Silva R, Ramos A, Guerreiro G, Costa R, Jardim-Goncalves R (2016) Big data processing and storage framework for ITS: a case study on dynamic tolling. ASME 2016 International Mechanical Engineering Congress and Exposition
Goulias KG (1999) Longitudinal analysis of activity and travel pattern dynamics using generalized mixed Markov latent class models. Transp Res Part B: Methodol 33(8):535–558
Huang J, Levinson D, Wang J, Zhou J, Wang ZJ (2018) Tracking job and housing dynamics with smartcard data. Proc Natl Acad Sci 115(50):12710–12715
Jandui Silva LLVSFF Bárbara França (2015) Towards smart traffic lights using big data to improve urban traffic. SMART 2015: The Fourth International Conference on Smart Systems, Devices and Technologies
Joh CH, Arentze T, Timmermans H (2001) Pattern recognition in complex activity travel patterns: comparison of Euclidean distance, signal-processing theoretical, and multidimensional sequence alignment methods. Transp Res Record J Transp Res Board 1752:16–22
Ketchen DJ, Shook CL (1996) The application of cluster analysis in strategic management research: an analysis and critique. Strateg Manag J 17(6):441–458
Kwan MP (2000) Interactive geovisualization of activity-travel patterns using three dimensional geographical information systems: a methodological exploration with a large data set. Transp Res Part C: Emerg Technol 8:185–203
Pas EI (1988) Weekly travel-activity behavior. Transportation 15(1):89–109
Recker WW, McNally MG, Root GS (1985) Travel/activity analysis: pattern recognition, classification and interpretation. Transp Res Part A Gen 19(4):279–296
Shanks JL (1969) Computation of the fast Walsh–Fourier transform. IEEE Trans Comput 18(5):457–459
Federal Highway Administration (2017) 2017 National Household Travel Survey. U.S. Department of Transportation, Washington, DC
Shoval N, Isaacson M (2007) Sequence alignment as a method for human activity analysis in space and time. Ann Assoc Am Geogr 97:282–297
Stoffer DS (1991) Walsh–Fourier analysis and its statistical applications. J Am Stat Assoc 86(414):461–479
Stolz BJ, Harrington HA, Porter MA (2017) Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos Interdiscip J Nonlinear Sci 27(4):047410
Thorndike RL (1953) Who belongs in the family? Psychometrika 18(4):267–276
Wang Y, Ombao H, Chung MK (2018) Topological data analysis of single-trial electroencephalographic signals. Ann Appl Stat 12(3):1506
Wilson C (2001) Activity patterns of Canadian women: application of ClustalG sequence alignment software. Transp Res Record 1777(1):55–67
Zhang A, Kang JE, Axhausen K, Kwon C (2018) Multi-day activity-travel pattern sampling based on single-day data. Transp Res Part C: Emerg Technol 89:96–112
Acknowledgements
The authors are grateful to the editor and anonymous reviewers whose suggestions helped enhance this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix: TDA and the First-order Persistence Landscape
We start with a brief review of topological data analysis (TDA), an emerging area for analyzing big data with complex structure. Using computational homology, TDA aims to extract the topological features of data and to summarize them in low-dimensional representations (Carlsson 2009). The input to TDA is often a set of data points (a point cloud) or a function; persistent homology distills the essential topological features of the data, which can then be used together with suitable dissimilarity measures to identify patterns in data sets. We discuss TDA on functions, which is the approach developed in Sections 2 and 3.
Computational Procedure for TDA on Functions
We describe how to construct persistence diagrams on functions using the sublevel set filtration. Figure 10 illustrates the procedure for extracting a persistence diagram from a function. Suppose \(y_j = f(j), j=1,\dots , 10\), and let the sublevel set at level \(r\) be \(L_r = \{y_j|y_j \le r\}\). The persistence diagram is constructed by tracking the connected components of \(L_r\) as \(r\) increases.
- (i) When \(r = 0\), a connected component is identified (marked as a blue dot, which is the oldest connected component). The vertical slash line of the second plot records the “birth time \(= 0\)” and the horizontal slash line indicates \(r\). There is no point on the birth/death plot, since no connected component has died at \(r=0\).
- (ii) When \(r = 0.5\), two more connected components appear (indicated in blue); the blue dot in the middle, with a blue line connecting it to the dark green dot, indicates that the oldest connected component “enlarges” and is “still alive”. The other black vertical slash line in the second plot gives the “birth time” of the two new connected components. No connected component has died yet, and hence no points are shown on the birth/death plot.
- (iii) When \(r = 1\), all old components “enlarge” and one newer component is “killed” by the older one. Therefore, a “black dot with birth \(= 0.5\) and death \(= 1\)” is shown on the second plot.
- (iv) When \(r = 2\), the last component is “killed, birth \(= 0\), death \(= 2\)”, which is the black dot at location (0, 2). The other black dot, at (0.5, 1.5) on the second plot, records the “birth and death” of another connected component.
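The procedure in steps (i) to (iv) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `sublevel_persistence` and the ten sample values in `y` are hypothetical (they are not the function plotted in Figure 10, only chosen so that the resulting diagram matches the three points described above). Neighbouring indices \(j\) and \(j+1\) are assumed to be connected, and the oldest component is assigned a death time equal to the maximum of the function.

```python
def sublevel_persistence(values):
    """0-dimensional persistence pairs of a sequence under the sublevel set filtration.

    Vertices enter in increasing order of value; when two components meet,
    the younger one (larger birth value) dies (the "elder rule").
    """
    n = len(values)
    parent = [None] * n   # None means the vertex is not yet in the sublevel set
    birth = {}            # root index -> birth value of its component

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for v in sorted(range(n), key=lambda i: values[i]):
        parent[v] = v
        birth[v] = values[v]
        for u in (v - 1, v + 1):
            if 0 <= u < n and parent[u] is not None:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                # Skip pairs with zero persistence (born and killed at the same level).
                if values[v] > birth[young]:
                    pairs.append((birth[young], values[v]))
                parent[young] = old

    # Assumption: the oldest component is closed off at the maximum value.
    root = find(0)
    pairs.append((birth[root], max(values)))
    return sorted(pairs)


# Hypothetical sample values, not the function shown in Figure 10.
y = [2.0, 0.5, 1.0, 0.0, 1.0, 1.5, 0.5, 1.0, 1.5, 2.0]
diagram = sublevel_persistence(y)
print(diagram)  # [(0.0, 2.0), (0.5, 1.0), (0.5, 1.5)]
```

The point (0.0, 2.0) is the oldest component, born at the minimum and dying at the maximum; the other two points are the components born at \(r = 0.5\).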
First-Order Persistence Landscape
First, in the persistence diagram obtained using the sublevel set filtration, the furthest point away from the diagonal line is always born at the minimum value of the function and dies at the maximum value of the function.
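Second, referring to the definition of persistence landscape in Section 2.3 from Bubenik (2015), given a persistence diagram \(\{(b_i, d_i), \forall i\}\), the first-order persistence landscape is
\[ \lambda_1(\ell ) = \max _i \, \min (\ell - b_i,\; d_i - \ell )_+, \]
where \( \ell \) is a real number and \(c_+ = \max (c, 0)\). Because the persistence diagram uses a sublevel set filtration, it has the point \((d_{\min }, d_{\max })\). For all \((b_i, d_i)\) that belong to the persistence diagram, \(d_{\min }\le b_i \le d_i \le d_{\max }\). Therefore, for any real number \(\ell \), \(\ell -d_{\min } \ge \ell -b_i\) and \(d_{\max } - \ell \ge d_i - \ell \), which implies that
\[ \min (\ell - d_{\min },\; d_{\max } - \ell ) \ge \min (\ell - b_i,\; d_i - \ell ) \quad \text {for all } i, \]
which in turn implies that
\[ \lambda_1(\ell ) = \min (\ell - d_{\min },\; d_{\max } - \ell )_+. \]
Finally, let \((d_{\min }, d_{\max }) \subset (D_{\min }, D_{\max })\) and take the grid \(\{D_{\min }+\frac{(\ell -1)(D_{\max }-D_{\min })}{L-1},\ell =1, 2, \ldots , L\}\). We then have
\[ \lambda_1(t_\ell ) = \min (t_\ell - d_{\min },\; d_{\max } - t_\ell )_+, \quad \ell = 1, 2, \ldots , L, \]
where
\[ t_\ell = D_{\min }+\frac{(\ell -1)(D_{\max }-D_{\min })}{L-1}. \]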
These expressions will be used on the WFT function obtained from each time series \(n=1,\ldots ,N\) in Section 2.
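As a numerical check of the argument above, the following sketch evaluates the first-order landscape on a grid in two ways: from the full diagram, taking the landscape to be \(\max _i \min (\ell - b_i, d_i - \ell )_+\) as in Bubenik (2015), and from the single point \((d_{\min }, d_{\max })\). The function names, the sample diagram, and the grid bounds are illustrative assumptions and do not come from the paper.

```python
def make_grid(D_min, D_max, L):
    # Grid points D_min + (l - 1) * (D_max - D_min) / (L - 1), l = 1, ..., L.
    return [D_min + (l - 1) * (D_max - D_min) / (L - 1) for l in range(1, L + 1)]


def landscape_from_diagram(diagram, grid):
    # First-order landscape: max over all (b, d) of the positive part of min(t - b, d - t).
    return [max(max(0.0, min(t - b, d - t)) for b, d in diagram) for t in grid]


def landscape_from_extremes(d_min, d_max, grid):
    # Shortcut that uses only the point (d_min, d_max) of the diagram.
    return [max(0.0, min(t - d_min, d_max - t)) for t in grid]


# Hypothetical sublevel set diagram; (0.0, 2.0) is the (d_min, d_max) point.
diagram = [(0.0, 2.0), (0.5, 1.0), (0.5, 1.5)]
grid = make_grid(-1.0, 3.0, 9)

full = landscape_from_diagram(diagram, grid)
short = landscape_from_extremes(0.0, 2.0, grid)
print(full)   # [0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0]
print(full == short)  # True
```

Under these assumptions, the shortcut needs only the minimum and maximum of the function, so the full persistence diagram does not have to be computed when only the first-order landscape is used as a feature.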
Cite this article
Chen, R., Zhang, J., Ravishanker, N. et al. Clustering Activity–Travel Behavior Time Series using Topological Data Analysis. J. Big Data Anal. Transp. 1, 109–121 (2019). https://doi.org/10.1007/s42421-019-00008-6