Optimization Models for Estimating Transit Network Origin–Destination Flows with Big Transit Data

Liu, Xinyu; Van Hentenryck, Pascal; Zhao, Xilei

doi:10.1007/s42421-021-00050-3

Original Paper
Published: 29 October 2021

Volume 3, pages 247–262 (2021)
Journal of Big Data Analytics in Transportation

Abstract

The increasing adoption of automatic vehicle location and automatic passenger count technologies by transit agencies produces passenger boarding and alighting count data on a continuous basis. These data provide new opportunities for origin–destination (O–D) flow estimation. However, state-of-the-art methodologies generate flows within individual routes and barely consider linked trips. This paper proposes optimization models that identify transfers and approximate network-level O–D flows using three approaches: a quadratic integer program (QIP), a feasible rounding procedure for the quadratic convex programming (QCP) relaxation of the QIP, and an integer program (IP). A case study of the Ann Arbor–Ypsilanti area in Michigan suggests that the IP model outperforms the QCP approach in accuracy and, unlike the QIP, remains computationally tractable. Its O–D estimation achieves an R-squared of \(95.57\%\) at the traffic analysis zone level and \(92.39\%\) at the stop level when compared with the ground truth inferred from state-of-the-practice trip-chaining methods.

Notes

The terms "transfer probabilities" and "transfer rates" both refer to the proportions of transfers.

Acknowledgements

This research is funded by the Michigan Institute of Data Science (MIDAS) and by Grant 7F-30154 from the Department of Energy. The authors would like to thank Forest Yang from the AAATA for his assistance in providing the data. Findings presented in this paper do not necessarily represent the views of the funding agencies.

Author information

Authors and Affiliations

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Xinyu Liu & Pascal Van Hentenryck
Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, USA
Xilei Zhao

Corresponding author

Correspondence to Xinyu Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The appended figures visualize the spatial distribution of the stop-level accuracy of the model estimation, using the IP approach on the Go!Pass data as an example. Figures 6 and 10 illustrate the benchmark counts of trips originating from and terminating at each stop, respectively, with each of the two transit centers treated as a single stop. Figures 7 and 11 depict the corresponding inferred counts of trips originating from and terminating at each stop.

Figures 8 and 12 visualize the total deviation of the IP model estimation from the benchmark, again in terms of the trip counts at each stop as an origin and as a destination, respectively. Both figures show the absolute differences obtained by comparing Figs. 7 and 11 against Figs. 6 and 10. Specifically, let \(\text{OD}^*\) denote the benchmark matrix, \(\text{OD}\) denote the estimated matrix, n denote the total number of stops, and \((i,j) \in \{1,\ldots ,n\}^2\) denote the indices (or the coordinates) of the origin and destination pairs. For each stop i as the origin, the data presented in Fig. 8 were calculated as \(\big | \sum _{j=1}^{n} \text{OD}^*_{i,j} - \sum _{j=1}^{n} \text{OD}_{i,j} \big |\). For each stop j as the destination, the data presented in Fig. 12 were calculated as \(\big | \sum _{i=1}^{n} \text{OD}^*_{i,j} - \sum _{i=1}^{n} \text{OD}_{i,j} \big |\).

Figures 9 and 13 depict the L2-norm as the distance measure between the benchmark matrix and the estimation, also at the stop level. Specifically, let the vector \(\text{OD}^*_{i,\cdot }\) denote the ith row of the benchmark matrix, and the vector \(\text{OD}_{i,\cdot }\) denote the ith row of the estimation matrix. Likewise, let the vector \(\text{OD}^*_{\cdot ,j}\) denote the jth column of the benchmark matrix, and the vector \(\text{OD}_{\cdot ,j}\) denote the jth column of the estimation matrix. For each stop i as the origin, the data presented in Fig. 9 were calculated as \(\left\Vert \text{OD}^*_{i,\cdot } - \text{OD}_{i, \cdot }\right\Vert _2 = \sqrt{\sum _{j=1}^n (\text{OD}^*_{i,j} - \text{OD}_{i,j})^2}\). For each stop j as the destination, the data presented in Fig. 13 were calculated as \(\left\Vert \text{OD}^*_{\cdot ,j} - \text{OD}_{\cdot ,j}\right\Vert _2 = \sqrt{\sum _{i=1}^n (\text{OD}^*_{i,j} - \text{OD}_{i,j})^2}\).

The data in Figs. 8, 9, 12 and 13 have the same unit as those in Figs. 6, 7, 10, and 11, and were plotted on the same scale for comparison. In Figs. 6, 7, 10, and 11, the size of the red circles depicts the volume of flows originating from or terminating at each stop. In Figs. 8, 9, 12 and 13, larger red circles correspond to larger differences between the estimation and the benchmark, and hence to poorer model performance.
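The two stop-level measures described in this appendix can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names and the 3-stop matrices are hypothetical, and the L2-norm is taken in its standard form (the square root of the sum of squared differences), which keeps the result in the same unit as the trip counts.

```python
import math

def origin_total_deviation(od_star, od):
    """Per-origin |sum_j OD*_ij - sum_j OD_ij| (the quantity described for Fig. 8)."""
    return [abs(sum(row_star) - sum(row)) for row_star, row in zip(od_star, od)]

def destination_total_deviation(od_star, od):
    """Per-destination |sum_i OD*_ij - sum_i OD_ij| (the quantity described for Fig. 12)."""
    # zip(*matrix) transposes a list-of-rows matrix into its columns.
    return [abs(sum(col_star) - sum(col))
            for col_star, col in zip(zip(*od_star), zip(*od))]

def origin_l2_distance(od_star, od):
    """Per-origin L2 distance between benchmark and estimated rows (Fig. 9)."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(row_star, row)))
            for row_star, row in zip(od_star, od)]

def destination_l2_distance(od_star, od):
    """Per-destination L2 distance between benchmark and estimated columns (Fig. 13)."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(col_star, col)))
            for col_star, col in zip(zip(*od_star), zip(*od))]

# Hypothetical 3-stop example; these numbers are made up for illustration only.
OD_STAR = [[0, 4, 2],
           [1, 0, 3],
           [2, 2, 0]]   # benchmark matrix
OD_EST = [[0, 3, 3],
          [1, 0, 1],
          [4, 2, 0]]    # estimated matrix

if __name__ == "__main__":
    print(origin_total_deviation(OD_STAR, OD_EST))
    print(destination_total_deviation(OD_STAR, OD_EST))
    print(origin_l2_distance(OD_STAR, OD_EST))
    print(destination_l2_distance(OD_STAR, OD_EST))
```

In this toy example the first stop has a total deviation of zero as an origin, because its row sums agree, yet its L2 distance is positive, because the trips are distributed differently across destinations. This suggests why the two measures are complementary.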

About this article

Cite this article

Liu, X., Van Hentenryck, P. & Zhao, X. Optimization Models for Estimating Transit Network Origin–Destination Flows with Big Transit Data. J. Big Data Anal. Transp. 3, 247–262 (2021). https://doi.org/10.1007/s42421-021-00050-3

Received: 24 June 2021
Revised: 14 September 2021
Accepted: 10 October 2021
Published: 29 October 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s42421-021-00050-3
