Ridership prediction and anomaly detection in transportation hubs: an application to New York City

Abstract

Ridership modeling is a growing field that is critical for intelligent transportation. Accurate traffic prediction and early surge detection are vital components in designing public transit dispatching systems. However, modeling spatio-temporal traffic at a small geographic scale and fine time granularity is challenging because mobility network data are sparse, have a low signal-to-noise ratio, and are high-dimensional. To tackle these challenges, we propose a framework for edge-level traffic prediction that addresses the curse of dimensionality through a pipeline of appropriate network aggregation, nonlinear modeling, and final edge-level disaggregation. We then show that the model residuals in the low-dimensional aggregated space are better suited for anomaly detection than raw ridership data. We evaluate the framework on for-hire vehicle and taxi ridership data from the two airports in New York City, experimenting with different network aggregation techniques and modeling paradigms. The results confirm the superiority of the proposed pipeline over single-model methods in ridership prediction and anomaly detection, and support scenario design for transportation simulation and planning.
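The pipeline described in the abstract (aggregate, model, disaggregate, then inspect residuals in the aggregated space) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the synthetic Poisson counts, the zone-to-group mapping, the per-slot seasonal mean used as a simple stand-in for the nonlinear model, the historical-share disaggregation rule, and the z-score threshold are all hypothetical choices. The fit is also in-sample, which a real evaluation would avoid.

```python
import numpy as np

def aggregate(edge_counts, zone_to_group, n_groups):
    """Sum edge-level counts (T, n_zones) into coarser groups (T, n_groups)."""
    n_zones = edge_counts.shape[1]
    membership = np.zeros((n_zones, n_groups))
    membership[np.arange(n_zones), zone_to_group] = 1.0
    return edge_counts @ membership

def fit_seasonal_profile(agg, period):
    """Stand-in model (assumption): mean of each slot in the daily cycle."""
    slots = np.arange(agg.shape[0]) % period
    return np.stack([agg[slots == s].mean(axis=0) for s in range(period)])

def predict(profile, n_steps, period):
    """Repeat the fitted profile over n_steps time steps."""
    return profile[np.arange(n_steps) % period]

def disaggregate(agg_pred, edge_counts, zone_to_group):
    """Split each group's prediction across its zones by historical shares."""
    zone_totals = edge_counts.sum(axis=0)
    group_totals = np.bincount(zone_to_group, weights=zone_totals,
                               minlength=agg_pred.shape[1])
    shares = zone_totals / np.maximum(group_totals[zone_to_group], 1e-12)
    return agg_pred[:, zone_to_group] * shares

def flag_anomalies(agg, agg_pred, z_threshold=3.0):
    """Flag time steps whose standardized aggregated residual is extreme."""
    resid = agg - agg_pred
    z = (resid - resid.mean(axis=0)) / (resid.std(axis=0) + 1e-12)
    return np.abs(z).max(axis=1) > z_threshold

# Synthetic demo: 28 days of hourly counts from one hub to 12 zones.
rng = np.random.default_rng(0)
period, n_steps, n_zones, n_groups = 24, 24 * 28, 12, 3
zone_to_group = np.arange(n_zones) % n_groups          # hypothetical mapping
base = rng.uniform(1.0, 5.0, n_zones)                  # sparse hourly rates
hour = np.arange(n_steps) % period
rate = base * (1.0 + 0.5 * np.sin(2 * np.pi * hour / period))[:, None]
edge_counts = rng.poisson(rate)
edge_counts[300, zone_to_group == 0] += 40             # injected surge

agg = aggregate(edge_counts, zone_to_group, n_groups)
agg_pred = predict(fit_seasonal_profile(agg, period), n_steps, period)
edge_pred = disaggregate(agg_pred, edge_counts, zone_to_group)
flags = flag_anomalies(agg, agg_pred)
print("flagged time steps:", np.flatnonzero(flags))
```

Because the shares within each group sum to one, re-aggregating `edge_pred` recovers `agg_pred`, so the edge-level forecast stays consistent with the aggregated one. In practice the seasonal mean would be replaced by whichever nonlinear model is being evaluated.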

Data Availability Statement

This manuscript has associated data in a data repository. [Authors’ comment: The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request. The original datasets’ sources are publicly available; please refer to Sect. 3.]

Acknowledgements

This work is partly funded by the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, under Award Number DE-EE0008524. The authors are solely responsible for the findings in this paper.

Author information

Corresponding author

Correspondence to Stanislav Sobolevsky.

Appendices

Appendix A Tables

See Tables 5, 6, 7, and 8.

Appendix B Figures

See Figs. 4, 5, 6, 7, 8, 9, 10, and 11.

Fig. 4 Boroughs New York

Fig. 5 Taxi zones New York

Fig. 6 Taxi zones to 6 communities

Fig. 7 Taxi zones to 24 communities

Fig. 8 JFK threshold of anomalies in mean absolute error

Fig. 9 LGA threshold of anomalies in mean absolute error

Fig. 10 Normalized residual distribution at LGA

Fig. 11 Consistency distribution at JFK

About this article

Cite this article

He, M., Muaz, U., Jiang, H. et al. Ridership prediction and anomaly detection in transportation hubs: an application to New York City. Eur. Phys. J. Spec. Top. 231, 1655–1671 (2022). https://doi.org/10.1140/epjs/s11734-022-00551-4

  • DOI: https://doi.org/10.1140/epjs/s11734-022-00551-4