Abstract
Predicting transportation mode choice is a critical component of forecasting travel demand. Recently, machine learning methods have become increasingly popular for this task. Class association rules (CARs) have been applied to transportation mode choice, but using the imputed rules for prediction remains a long-standing challenge. Building on CARs, this paper proposes a new rule merging approach, called CARM, to improve predictive accuracy. In the proposed approach, CARs are first imputed from the frequent pattern tree (FP-tree) using the frequent pattern growth (FP-growth) algorithm. Next, the rules are pruned based on the concept of pessimistic error rate. Finally, the rules are merged to form new rules without increasing predictive error. Using the 2015 Dutch National Travel Survey, the performance of the proposed model is compared with that of CARIG, which uses the information gain statistic to generate new rules, class-based association rules (CBA), decision trees (DT) and the multinomial logit (MNL) model. In addition, the proposed model is assessed using ten-fold cross-validation. The results show that the proposed model achieves an accuracy of 91.1%, outperforming CARIG, CBA, DT and the MNL model.
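To make the pruning step more concrete, the following is a minimal sketch of a pessimistic error rate estimate. It assumes the C4.5-style upper confidence bound on a rule's observed error rate, which is the form commonly used in CBA-type pruning; the abstract does not state the exact formula used in CARM, so the function name `pessimistic_error`, the default confidence factor of 0.25, and the example counts are illustrative assumptions only.

```python
from math import sqrt
from statistics import NormalDist


def pessimistic_error(errors: int, covered: int, cf: float = 0.25) -> float:
    """Upper confidence bound on the error rate of a rule.

    errors  -- training records covered by the rule but misclassified
    covered -- training records covered by the rule
    cf      -- confidence factor (assumed C4.5 default of 0.25)
    """
    if covered == 0:
        return 1.0  # a rule that covers nothing is treated as worst case
    z = NormalDist().inv_cdf(1.0 - cf)  # one-sided normal quantile
    f = errors / covered                # observed error rate
    n = covered
    numerator = f + z * z / (2 * n) + z * sqrt(f * (1 - f) / n + z * z / (4 * n * n))
    return numerator / (1 + z * z / n)


# Hypothetical counts: a specific rule and a more general rule obtained
# by dropping one of its conditions.
specific = pessimistic_error(errors=1, covered=8)
general = pessimistic_error(errors=3, covered=40)
# One possible pruning decision: keep the general rule if it is no worse.
keep_general = general <= specific
```

Under this estimate a rule supported by few records is penalised more heavily than one with the same observed error rate but wider coverage, which is why pruning tends to favour shorter, more general rules.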
Acknowledgements
This work was supported by the China Scholarship Council.
Author information
Contributions
The authors contributed to the paper as follows: study design: J. Zhang, T. Feng, H.J.P. Timmermans and Z. Lin; computer programming and analysis: J. Zhang; interpretation of results: J. Zhang, T. Feng, H.J.P. Timmermans and Z. Lin; draft manuscript preparation: J. Zhang and H.J.P. Timmermans. All authors reviewed the draft manuscript and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Appendices
Appendix 1
An example dataset with 3 classes contains 60 observations, where the training dataset runs from \({x}_{1}\) to \({x}_{45}\) and the test dataset from \({x}_{46}\) to \({x}_{60}\). Further details are listed in Table 7.
Appendix 2
Table 8 lists the 152 rules (CARs) obtained from the FP-tree, where slashes mark redundant rules. The rules have been ordered in accordance with the requirements.
Appendix 3
After pruning, 47 rules remain, as listed in Table 9. These rules are applied to the training dataset to obtain \(cCAR\_record\) and \(wCAR\_record\), from which \(cCAR\), \(wCAR\) and \(wcCAR\) are derived.
Appendix 4
\(FrequentRules\) and \(RareRules\) are listed in Tables 10 and 11, respectively. An entry such as 2 + 13 indicates that the conditions of rules 2 and 13 in Table 9 are merged to form a new rule, whose class label is the same as that of rule 13. For each new rule, Local_Confidence and Local_Support are calculated.
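The merging notation can be illustrated with a small sketch. It assumes that merging takes the union of the two rules' attribute-value conditions and keeps the class label of the second rule, as in the 2 + 13 example. It then computes ordinary support and confidence on a toy training set; the exact definitions of Local_Confidence and Local_Support are not given in this appendix and may differ. The attribute names, the toy records, and the helper names `merge_rules` and `support_confidence` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Rule = Tuple[Dict[str, str], str]  # (conditions, class label)


def merge_rules(first: Rule, second: Rule) -> Optional[Rule]:
    """Combine the conditions of two rules and keep the second rule's label."""
    merged = dict(first[0])
    for attr, value in second[0].items():
        if merged.get(attr, value) != value:
            return None  # conflicting values for one attribute: no merge (assumption)
        merged[attr] = value
    return merged, second[1]


def support_confidence(rule: Rule, data: List[Tuple[Record, str]]) -> Tuple[float, float]:
    """Plain support and confidence of a rule on labelled records."""
    conditions, label = rule
    covered = [y for x, y in data
               if all(x.get(a) == v for a, v in conditions.items())]
    correct = sum(1 for y in covered if y == label)
    support = correct / len(data) if data else 0.0
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


# Toy training records (hypothetical attributes and classes).
train = [
    ({"distance": "short", "car": "no"}, "bike"),
    ({"distance": "short", "car": "yes"}, "car"),
    ({"distance": "short", "car": "no"}, "bike"),
    ({"distance": "long", "car": "no"}, "train"),
]
rule_a: Rule = ({"distance": "short"}, "car")
rule_b: Rule = ({"car": "no"}, "bike")

merged = merge_rules(rule_a, rule_b)  # label taken from rule_b
support, confidence = support_confidence(merged, train)
```

In this toy case the merged rule covers the two short-distance records without a car, both labelled "bike", so it is more specific than either parent rule while keeping the label of the second.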
Appendix 5
Cite this article
Zhang, J., Feng, T., Timmermans, H. et al. Improved imputation of rule sets in class association rule modeling: application to transportation mode choice. Transportation 50, 63–106 (2023). https://doi.org/10.1007/s11116-021-10238-9
DOI: https://doi.org/10.1007/s11116-021-10238-9