Abstract
Ridership modeling is a growing field critical for intelligent transportation. Accurate traffic prediction and early surge detection are vital components in designing public transit dispatching systems. However, modeling spatio-temporal traffic at a small geographic scale and fine time granularity is challenging because mobility network data are sparse, have a low signal-to-noise ratio, and are high-dimensional. To tackle these challenges, we propose a framework for edge-level traffic prediction that addresses the curse of dimensionality through a pipeline of appropriate network aggregation, nonlinear modeling, and final edge-level disaggregation. We then show that the residuals of the model in the low-dimensional aggregated space are better suited for anomaly detection than raw ridership data. The framework is evaluated on the for-hire vehicle and taxi ridership dataset from the two airports in New York City, experimenting with different network aggregation techniques and modeling paradigms. The results confirm that the proposed pipeline outperforms single-model methods in both ridership prediction and anomaly detection, and they support scenario design for transportation simulation and planning.
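The aggregate, model, disaggregate pipeline and the residual-based anomaly score described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: edges are aggregated by summing over a fixed, hypothetical group assignment, a per-phase seasonal mean stands in for the nonlinear model, disaggregation uses each edge's historical share of its group, and anomalies are scored as standardized residuals in the aggregated space. The function names and the toy counts are likewise hypothetical.

```python
from statistics import mean, pstdev

def aggregate(rows, groups, n_groups):
    """Sum edge-level counts into group-level counts, one row per time step."""
    out = []
    for row in rows:
        agg = [0.0] * n_groups
        for value, g in zip(row, groups):
            agg[g] += value
        out.append(agg)
    return out

def fit(train_rows, groups, n_groups, period):
    """Fit the toy pipeline on edge-level training counts."""
    agg = aggregate(train_rows, groups, n_groups)
    steps = len(agg)
    # Stand-in model (assumption): mean aggregated count per phase of the cycle.
    profile = [
        [mean(agg[t][g] for t in range(p, steps, period)) for g in range(n_groups)]
        for p in range(period)
    ]
    # Spread of training residuals per group; fall back to 1.0 if it is zero.
    sigma = [
        pstdev([agg[t][g] - profile[t % period][g] for t in range(steps)]) or 1.0
        for g in range(n_groups)
    ]
    # Each edge's historical share of its group's total, used to disaggregate.
    totals = [sum(row[g] for row in agg) for g in range(n_groups)]
    shares = [
        sum(row[e] for row in train_rows) / totals[g] if totals[g] else 0.0
        for e, g in enumerate(groups)
    ]
    return {"groups": groups, "n_groups": n_groups, "period": period,
            "profile": profile, "sigma": sigma, "shares": shares}

def predict_edges(model, t):
    """Predict in the aggregated space, then split back to edges by share."""
    group_pred = model["profile"][t % model["period"]]
    return [group_pred[g] * s for g, s in zip(model["groups"], model["shares"])]

def anomaly_scores(model, t, observed_row):
    """Standardized residuals of the aggregated observation at time t."""
    agg = aggregate([observed_row], model["groups"], model["n_groups"])[0]
    pred = model["profile"][t % model["period"]]
    return [(a - p) / s for a, p, s in zip(agg, pred, model["sigma"])]

# Toy data (hypothetical): 4 edges in 2 groups, 4 time steps, cycle length 2.
groups = [0, 0, 1, 1]
train = [[2, 2, 1, 3], [4, 4, 2, 6], [3, 3, 1, 3], [5, 5, 2, 6]]
model = fit(train, groups, n_groups=2, period=2)
edge_pred = predict_edges(model, t=4)
scores = anomaly_scores(model, t=4, observed_row=[10, 10, 1, 3])
flags = [abs(z) > 3.0 for z in scores]  # threshold of 3 is an arbitrary choice
```

In this toy setting the surge on the first two edges shows up as a large standardized residual for their group, while the second group stays at zero. Scoring in the aggregated space keeps the number of monitored series small, which is the motivation the abstract gives for working with low-dimensional residuals.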
Data Availability Statement
This manuscript has associated data in a data repository. [Authors’ comment: The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request. The sources of the original datasets are publicly available; please refer to Sect. 3.]
Acknowledgements
This work is partly funded by the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, under Award Number DE-EE0008524. The authors are solely responsible for the findings in this paper.
Cite this article
He, M., Muaz, U., Jiang, H. et al. Ridership prediction and anomaly detection in transportation hubs: an application to New York City. Eur. Phys. J. Spec. Top. 231, 1655–1671 (2022). https://doi.org/10.1140/epjs/s11734-022-00551-4
DOI: https://doi.org/10.1140/epjs/s11734-022-00551-4