Abstract
A regression tree method for analyzing rank data is proposed. A key ingredient of the methodology is converting ranks into scores via paired comparisons. We then apply the GUIDE tree method to the resulting score vectors to identify preference patterns in the data. The method is free of selection bias, and simulation results show that it performs well in selecting split variables and, in some cases, achieves better prediction accuracy than the two other methods investigated. Furthermore, it is applicable to complex data that may contain incomplete ranks and missing covariate values. We demonstrate its usefulness in two real-data studies.
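The following minimal Python sketch makes the rank-to-score step concrete. The abstract does not state the exact scoring rule, so the convention here is an assumption: for each item pair (i, j) with i < j, the score is +1 if item i is ranked ahead of item j (smaller rank number), -1 if it is ranked behind, and 0 if the two items are tied or either rank is missing (an incomplete ranking). The function name `rank_to_scores` is hypothetical. Each respondent's score vector, with one entry per item pair, would then serve as the multivariate response for the tree.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def rank_to_scores(ranks: Sequence[Optional[int]]) -> List[int]:
    """Map one respondent's rank vector to paired-comparison scores.

    ranks[k] is the rank given to item k (1 = most preferred), or None if
    item k was not ranked. One score is produced per pair (i, j), i < j,
    in itertools.combinations order.

    Assumed (hypothetical) convention, not necessarily the paper's:
      +1 if item i is ranked ahead of item j,
      -1 if item j is ranked ahead of item i,
       0 if the items are tied or either rank is missing.
    """
    scores: List[int] = []
    for i, j in combinations(range(len(ranks)), 2):
        ri, rj = ranks[i], ranks[j]
        if ri is None or rj is None or ri == rj:
            scores.append(0)   # no preference information for this pair
        elif ri < rj:
            scores.append(1)   # item i preferred to item j
        else:
            scores.append(-1)  # item j preferred to item i
    return scores


if __name__ == "__main__":
    # Three items: item 1 ranked first, item 0 second, item 2 third.
    print(rank_to_scores([2, 1, 3]))     # [-1, 1, 1]
    # Incomplete ranking: item 1 was not ranked.
    print(rank_to_scores([1, None, 2]))  # [0, 1, 0]
```

With m items this yields m(m - 1)/2 scores per respondent, so four items give six paired-comparison scores. Mapping a pair with missing information to 0 keeps every score vector the same length, which is one simple way to accommodate incomplete ranks; other treatments, such as placing unranked items behind all ranked ones, are also possible.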
Notes
CHAID, a classical tree method (Kass 1980), used Pearson’s chi-square test of independence with Bonferroni-adjusted p values.
References
Alvo M, Yu PLH (2014) Statistical methods for ranking data. Springer, New York
Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39:324–345
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
Cattelan M (2012) Models for paired comparison data: a review with emphasis on dependent data. Stat Sci 27:412–433
Cheng W, Hühn J, Hüllermeier E (2009) Decision tree and instance-based learning for label ranking. In: International conference on machine learning, Montreal
Critchlow DE (1985) Metric methods for analyzing partially ranked data. Springer, New York
D’Ambrosio A, Heiser WJ (2016) A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika 81:774–794
Davidson RR (1970) On extending the Bradley–Terry model to accommodate ties in paired comparison experiments. J Am Stat Assoc 65:317–328
De’ath G (2002) Multivariate regression trees: a new technique for modeling species-environment relationships. Ecology 83:1105–1117
Diaconis P (1988) Group representations in probability and statistics. Institute of Mathematical Statistics, Hayward
Emond EJ, Mason DW (2002) A new rank correlation coefficient with application to the consensus ranking problem. J Multi-Criteria Decis Anal 11:17–28
Francis B, Dittrich R, Hatzinger R, Penn R (2002) Analysing partial ranks by using smoothed paired comparison methods: an investigation of value orientation in Europe. J R Stat Soc Ser C (Appl Stat) 51:319–336
Francis B, Dittrich R, Hatzinger R, Humphreys L (2014) A mixture model for longitudinal partially ranked data. Commun Stat Theory Methods 43:722–734
Hatzinger R, Dittrich R (2012) prefmod: an R package for modeling preferences based on paired comparisons, rankings, or ratings. J Stat Softw 48:1–31
Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15:651–674
Hsiao WC, Shih YS (2007) Splitting variable selection for multivariate regression trees. Stat Probab Lett 77:265–271
Inglehart R (1977) The silent revolution: changing values and political styles among western publics. Princeton University Press, Princeton
Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. Appl Stat 29:119–127
Kemeny JG, Snell JL (eds) (1962) Preference rankings: an axiomatic approach. In: Mathematical models in the social sciences. The MIT Press, Cambridge, pp 9–23
Kung YH, Lin CT, Shih YS (2012) Split variable selection for tree modeling on rank data. Comput Stat Data Anal 56:2830–2836
Lee PH, Yu PLH (2010) Distance-based tree models for ranking data. Comput Stat Data Anal 54:1672–1682
Liu KH, Shih YS (2016) Score-scale decision tree for paired comparison data. Statistica Sinica 26:429–444
Loh WY (2014) Fifty years of classification and regression trees (with discussion). Int Stat Rev 82:329–370
Loh WY, Zheng W (2013) Regression trees for longitudinal and multiresponse data. Ann Appl Stat 7:495–522
Marden JI (1995) Analyzing and modeling rank data. Chapman & Hall, London
Qinglong L (2015) StatMethRank: statistical methods for ranking data. R package version 1.3
Strobl C, Wickelmaier F, Zeileis A (2011) Accounting for individual differences in Bradley–Terry models by means of recursive partitioning. J Educ Behav Stat 36:135–153
Vermunt J (2003) Multilevel latent class models. Sociol Methodol 33:213–239
Yandell BS (1997) Practical data analysis for designed experiments. Chapman & Hall, Boca Raton
Yu PLH, Wan WM, Lee PH (2010) Decision tree modeling for ranking data. In: Fürnkranz J, Hüllermeier E (eds) Preference learning. Springer, New York, pp 83–106
Yu PLH, Lee PH, Cheung SF, Lau EYY, Mok DSY, Hui HC (2016) Logit tree models for discrete choice data with application to advice-seeking preferences among Chinese Christians. Comput Stat 31:799–827
Zeileis A, Hornik K (2007) Generalized M-fluctuation tests for parameter instability. Statistica Neerlandica 61:488–508
Zeileis A, Hothorn T, Hornik K (2008) Model-based recursive partitioning. J Comput Graph Stat 17:492–514
Acknowledgements
The authors are very grateful to the two reviewers and the editors for many helpful comments and suggestions.
Additional information
This research is supported in part by Taiwan MOST Grant 106-2118-M-194-002.
About this article
Cite this article
Shih, YS., Liu, KH. Regression trees for detecting preference patterns from rank data. Adv Data Anal Classif 13, 683–702 (2019). https://doi.org/10.1007/s11634-018-0332-3