Optimization Models for Estimating Transit Network Origin–Destination Flows with Big Transit Data

Abstract

The increasing adoption of automatic vehicle location (AVL) and automatic passenger count (APC) technologies by transit agencies produces passenger boarding and alighting count data on a continuous basis. These data provide new opportunities for origin–destination (O–D) flow estimation. However, state-of-the-art methods estimate flows within individual routes and rarely consider linked trips. This paper proposes optimization models that identify transfers and approximate network-level O–D flows: a quadratic integer program (QIP), a feasible rounding procedure for the quadratic convex programming (QCP) relaxation of the QIP, and an integer program (IP). A case study of the Ann Arbor–Ypsilanti area in Michigan suggests that the IP model is more accurate than the QCP approach and, unlike the QIP, remains computationally tractable. Its O–D estimates achieve an R-squared of \(95.57\%\) at the traffic analysis zone level and \(92.39\%\) at the stop level, compared with ground truths inferred from state-of-the-practice trip-chaining methods.
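The R-squared values above compare an estimated O–D matrix against a benchmark at two spatial resolutions, stops and traffic analysis zones. The following is a minimal sketch of how such a comparison could be computed, assuming that R-squared is the coefficient of determination taken over all O–D cells and that every stop belongs to exactly one zone; the paper may define the metric differently, and the names `aggregate_to_zones`, `r_squared` and `stop_to_zone` are hypothetical.

```python
import numpy as np


def aggregate_to_zones(od_stop: np.ndarray, stop_to_zone: np.ndarray, n_zones: int) -> np.ndarray:
    """Sum a stop-level O-D matrix into a zone-level O-D matrix.

    stop_to_zone[k] is the zone index of stop k (hypothetical mapping).
    """
    n_stops = od_stop.shape[0]
    # Incidence matrix: m[k, z] = 1 when stop k lies in zone z.
    m = np.zeros((n_stops, n_zones))
    m[np.arange(n_stops), stop_to_zone] = 1.0
    # Entry (a, b) sums OD[i, j] over stops i in zone a and stops j in zone b.
    return m.T @ od_stop @ m


def r_squared(od_true: np.ndarray, od_est: np.ndarray) -> float:
    """Coefficient of determination over all O-D cells (assumed definition)."""
    y = od_true.ravel().astype(float)
    y_hat = od_est.ravel().astype(float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("benchmark matrix has no variance; R-squared is undefined")
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Toy data (not from the study): stops 0 and 1 in zone 0, stop 2 in zone 1.
    od_true = np.array([[0, 5, 2], [3, 0, 4], [1, 6, 0]])
    od_est = np.array([[0, 4, 3], [3, 0, 4], [2, 5, 0]])
    zones = np.array([0, 0, 1])
    print("stop level:", r_squared(od_true, od_est))
    print("zone level:", r_squared(aggregate_to_zones(od_true, zones, 2),
                                   aggregate_to_zones(od_est, zones, 2)))
```

In this toy example the stop-level value is 1 − 4/42 ≈ 0.905 and the zone-level value is 1 − 2/38.75 ≈ 0.948, partly because the two opposite-sign errors in the third row fall in the same zone pair and cancel after aggregation.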

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. The terms "transfer probabilities" and "transfer rates" both refer to the proportions of transfers.

Acknowledgements

This research is funded by the Michigan Institute for Data Science (MIDAS) and by Grant 7F-30154 from the Department of Energy. The authors would like to thank Forest Yang of the Ann Arbor Area Transportation Authority (AAATA) for his assistance in providing the data. Findings presented in this paper do not necessarily represent the views of the funding agencies.

Author information

Corresponding author

Correspondence to Xinyu Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The appended figures visualize the spatial distribution of the stop-level accuracy of the model estimates, using the IP approach on the Go!Pass data as an example. Figures 6 and 10 show the benchmark counts of trips originating from and terminating at each stop, respectively; the two transit centers are each treated as a single stop. Figures 7 and 11 show the corresponding inferred counts of trips originating from and terminating at each stop. Figures 8 and 12 visualize the total deviation of the IP estimates from the benchmark, again in terms of the trip counts at each stop as an origin and as a destination; they show the absolute differences obtained when comparing Figs. 7 and 11 against Figs. 6 and 10. Specifically, let \(\text{OD}^*\) denote the benchmark matrix, \(\text{OD}\) the estimated matrix, n the total number of stops, and \((i,j) \in \{1,\ldots ,n\}^2\) the indices (or coordinates) of the origin–destination pairs. For each stop i as an origin, the value shown in Fig. 8 is \(\big | \sum _{j=1}^{n} \text{OD}^*_{i,j} - \sum _{j=1}^{n} \text{OD}_{i,j} \big |\); for each stop j as a destination, the value shown in Fig. 12 is \(\big | \sum _{i=1}^{n} \text{OD}^*_{i,j} - \sum _{i=1}^{n} \text{OD}_{i,j} \big |\).

Figures 9 and 13 use the L2-norm as the distance between the benchmark and estimated matrices, also at the stop level. Let the vectors \(\text{OD}^*_{i,\cdot }\) and \(\text{OD}_{i,\cdot }\) denote the ith rows of the benchmark and estimated matrices, and let \(\text{OD}^*_{\cdot ,j}\) and \(\text{OD}_{\cdot ,j}\) denote their jth columns. For each stop i as an origin, the value shown in Fig. 9 is \(\left\Vert \text{OD}^*_{i,\cdot } - \text{OD}_{i, \cdot }\right\Vert _2 = \sqrt{\sum _{j=1}^n (\text{OD}^*_{i,j} - \text{OD}_{i,j})^2}\); for each stop j as a destination, the value shown in Fig. 13 is \(\left\Vert \text{OD}^*_{\cdot ,j} - \text{OD}_{\cdot ,j}\right\Vert _2 = \sqrt{\sum _{i=1}^n (\text{OD}^*_{i,j} - \text{OD}_{i,j})^2}\). The data in Figs. 8, 9, 12 and 13 have the same unit (trips) as those in Figs. 6, 7, 10 and 11 and are plotted on the same scale for comparison. In Figs. 6, 7, 10 and 11, the size of each red circle depicts the volume of flows originating from or terminating at that stop. In Figs. 8, 9, 12 and 13, larger red circles correspond to larger differences between the estimate and the benchmark, i.e., poorer model performance.
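The four per-stop measures defined in this appendix can be computed directly from the two matrices. A minimal NumPy sketch, assuming the benchmark and the estimate are dense n × n arrays with origins as rows and destinations as columns; the function name `stop_level_deviations` and the toy matrices are hypothetical. The L2 measures use the Euclidean norm, i.e., the square root of the sum of squared cell differences, so they are expressed in trips like the counts themselves.

```python
import numpy as np


def stop_level_deviations(od_bench: np.ndarray, od_est: np.ndarray) -> dict:
    """Per-stop deviation measures between a benchmark O-D matrix and an estimate.

    Rows index origin stops i, columns index destination stops j.
    """
    diff = od_bench.astype(float) - od_est.astype(float)
    return {
        # |sum_j OD*_{i,j} - sum_j OD_{i,j}| per origin stop (Fig. 8)
        "origin_abs": np.abs(diff.sum(axis=1)),
        # |sum_i OD*_{i,j} - sum_i OD_{i,j}| per destination stop (Fig. 12)
        "dest_abs": np.abs(diff.sum(axis=0)),
        # Euclidean norm of each row difference, per origin stop (Fig. 9)
        "origin_l2": np.linalg.norm(diff, axis=1),
        # Euclidean norm of each column difference, per destination stop (Fig. 13)
        "dest_l2": np.linalg.norm(diff, axis=0),
    }


if __name__ == "__main__":
    # Toy 3-stop matrices (illustrative values, not data from the study).
    bench = np.array([[0, 5, 2], [3, 0, 4], [1, 6, 0]])
    est = np.array([[0, 4, 3], [3, 0, 4], [2, 5, 0]])
    for name, values in stop_level_deviations(bench, est).items():
        print(name, values)
```

In the toy example every origin has zero marginal error, because the cell errors within each row cancel, yet the first and third rows still have an L2 distance of √2. This shows how the row and column L2-norms can reveal errors that the marginal totals alone would hide.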

Fig. 6

Benchmark counts of trips originating from each stop

Fig. 7

Inferred counts of trips originating from each stop

Fig. 8

Total difference between the inferred and the benchmark counts of trips originating from each stop

Fig. 9

L2-norm of the difference between the inferred and the benchmark counts of trips originating from each stop

Fig. 10

Benchmark counts of trips terminating at each stop

Fig. 11

Inferred counts of trips terminating at each stop

Fig. 12

Total difference between the inferred and the benchmark counts of trips terminating at each stop

Fig. 13

L2-norm of the difference between the inferred and the benchmark counts of trips terminating at each stop

Cite this article

Liu, X., Van Hentenryck, P. & Zhao, X. Optimization Models for Estimating Transit Network Origin–Destination Flows with Big Transit Data. J. Big Data Anal. Transp. 3, 247–262 (2021). https://doi.org/10.1007/s42421-021-00050-3

  • DOI: https://doi.org/10.1007/s42421-021-00050-3

Keywords

  • Network-level origin–destination (O–D) matrix
  • Automatic vehicle location (AVL) data
  • Automatic passenger count (APC) data
  • Transfer identification
  • Integer programming