In the framework of preference rankings, the goal is often to find which predictors, and which interactions among them, explain the observed preference structures, because preference decisions usually depend on the characteristics of both the judges and the objects being judged. This work proposes a univariate decision tree for ranking data based on weighted distances for complete and incomplete rankings, and uses the area under the ROC curve both for pruning and for model assessment. Two well-known real datasets, the SUSHI preference data and the University ranking data, are used to illustrate the performance of the methodology.
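The abstract does not spell out the distance or the splitting criterion. The following is a minimal sketch that assumes the Kemeny-Snell score-matrix form of the Kemeny distance, extended with a hypothetical symmetric matrix of pair weights (the weighting used in the paper, or in García-Lapresta and Pérez-Román (2010), may differ). It also assumes a hypothetical node impurity, the mean pairwise distance among the rankings in a node, which a distance-based split would try to reduce. Rankings are rank vectors (position i holds the rank of item i, smaller is preferred, equal values are ties). The names `score_matrix`, `weighted_kemeny`, `node_impurity`, and `split_gain` are illustrative only.

```python
from itertools import combinations
from typing import Optional, Sequence

import numpy as np


def score_matrix(ranks: Sequence[float]) -> np.ndarray:
    """Kemeny-Snell score matrix of a rank vector (smaller rank = preferred).

    Entry [i, j] is +1 if item i is preferred to item j, -1 if j is preferred
    to i, and 0 if the two items are tied (the diagonal is always 0).
    """
    r = np.asarray(ranks, dtype=float)
    # M[i, j] = r[j] - r[i] > 0 exactly when item i has the better (smaller) rank
    return np.sign(r[None, :] - r[:, None])


def weighted_kemeny(a: Sequence[float], b: Sequence[float],
                    weights: Optional[np.ndarray] = None) -> float:
    """0.5 * sum_ij w_ij * |a_ij - b_ij| over the two score matrices.

    `weights` is a hypothetical symmetric matrix of pair weights; with
    weights=None every pair gets weight 1 (the usual Kemeny distance).
    """
    A, B = score_matrix(a), score_matrix(b)
    W = np.ones_like(A) if weights is None else np.asarray(weights, dtype=float)
    return 0.5 * float(np.sum(W * np.abs(A - B)))


def node_impurity(rankings, weights=None) -> float:
    """Hypothetical impurity: mean pairwise weighted Kemeny distance in a node."""
    if len(rankings) < 2:
        return 0.0
    d = [weighted_kemeny(x, y, weights) for x, y in combinations(rankings, 2)]
    return float(np.mean(d))


def split_gain(parent, left, right, weights=None) -> float:
    """Drop in size-weighted impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    child = (len(left) * node_impurity(left, weights)
             + len(right) * node_impurity(right, weights)) / n
    return node_impurity(parent, weights) - child


# Toy rankings of 3 items by 4 judges (illustrative, not from the paper).
judges = [[1, 2, 3], [1, 2, 3], [3, 2, 1], [3, 1, 2]]
print(weighted_kemeny([1, 2, 3], [3, 2, 1]))  # 6.0
print(split_gain(judges, judges[:2], judges[2:]))
```

With unit weights, two complete rankings in reverse order are at distance n(n-1), and a single tie that the other judge breaks costs 1, which is how this form of the distance handles incomplete (tied) rankings.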
Preference rankings can be represented through either rank vectors (as in this paper) or order vectors (D’Ambrosio et al. 2015).
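To make the two representations concrete, here is a minimal sketch restricted to complete rankings without ties (order vectors need extra notation for tied groups). It assumes items are labelled 1..n and that rank 1 is the most preferred; the function names are hypothetical.

```python
from typing import List, Sequence


def rank_to_order(ranks: Sequence[int]) -> List[int]:
    """Rank vector (ranks[i] = rank of item i+1, 1 = best) to order vector.

    The order vector lists the 1-based item labels from most to least preferred.
    """
    return [item + 1 for item in sorted(range(len(ranks)), key=lambda i: ranks[i])]


def order_to_rank(order: Sequence[int]) -> List[int]:
    """Order vector (1-based item labels, best first) back to a rank vector."""
    ranks = [0] * len(order)
    for position, item in enumerate(order, start=1):
        ranks[item - 1] = position
    return ranks


# Item 3 is ranked first, item 1 second, item 2 third.
print(rank_to_order([2, 3, 1]))  # [3, 1, 2]
print(order_to_rank([3, 1, 2]))  # [2, 3, 1]
```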
This is not the tree of optimal size; rather, we chose to prune the tree to a size that ensures a reasonable trade-off between predictive accuracy and complexity.
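The note does not say how the size was chosen. As a hedged illustration only, the sketch below assumes that a held-out AUC is available for each candidate subtree size and picks the smallest size whose AUC lies within a tolerance `tol` of the best one, a rule similar in spirit to the one-standard-error rule used with CART. The binary AUC is computed as the Mann-Whitney probability that a positive case outscores a negative one; the paper cites Hand and Till (2001) for the multi-class generalisation, which is not reproduced here. The function names, `tol`, and the toy numbers are hypothetical.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Binary AUC = P(score of a positive > score of a negative), ties counting 1/2."""
    total = 0.0
    for p in scores_pos:
        for q in scores_neg:
            total += 1.0 if p > q else 0.5 if p == q else 0.0
    return total / (len(scores_pos) * len(scores_neg))


def pick_tree_size(sizes: Sequence[int], aucs: Sequence[float], tol: float = 0.01) -> int:
    """Smallest tree size whose AUC is within `tol` of the best AUC (hypothetical rule)."""
    best = max(aucs)
    return min(s for s, a in zip(sizes, aucs) if a >= best - tol)


print(auc([0.9, 0.8], [0.1, 0.8]))  # 0.875
# Toy validation AUCs per number of leaves (made up for illustration).
print(pick_tree_size([2, 4, 6, 8, 10], [0.70, 0.78, 0.80, 0.805, 0.80]))  # 6
```

Raising `tol` favours smaller trees at some cost in AUC; `tol=0` simply returns the size with the highest AUC.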
Data are available at http://www.kamishima.net/asset/sushi3.tgz.
Amodio S, D’Ambrosio A, Siciliano R (2016) Accurate algorithms for identifying the median ranking when dealing with weak and partial rankings under the Kemeny axiomatic approach. Eur J Oper Res 249(2):667–676. https://doi.org/10.1016/j.ejor.2015.08.048
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, London
Chen J, Li Y, Feng L (2012) A new weighted Spearman’s footrule as a measure of distance between rankings. CoRR. arXiv:1207.2541
Cheng W, Hühn J, Hüllermeier E (2009) Decision tree and instance-based learning for label ranking. In: Bottou L, Littman M (eds) Proceedings of the 26th international conference on machine learning. Omnipress, Montreal, pp 161–168
Cook WD (2006) Distance based and ad hoc consensus models in ordinal preference ranking. Eur J Oper Res 172:369–385. https://doi.org/10.1016/j.ejor.2005.03.048
Cook W, Kress M, Seiford LM (1986) An axiomatic approach to distance on partial orderings. Rev Française Autom Informatique Rech Opérationnelle Rech Opérationnelle 20(2):115–122
D’Ambrosio A (2007) Tree based methods for data editing and preference rankings. Ph.D. thesis, Università degli Studi di Napoli “Federico II”
D’Ambrosio A, Amodio S (2015) ConsRank: compute the median ranking(s) according to the Kemeny’s axiomatic approach. R package version 1.0.2. http://CRAN.R-project.org/package=ConsRank
D’Ambrosio A, Amodio S, Iorio C (2015) Two algorithms for finding optimal solutions of the Kemeny rank aggregation problem for full rankings. Electron J Appl Stat Anal 8(2). http://siba-ese.unisalento.it/index.php/ejasa/article/view/14986
Dittrich R, Hatzinger R, Katzenbeisser W (1998) Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. J R Stat Soc Ser C (Appl Stat) 47(4):511–525. https://doi.org/10.1111/1467-9876.00125
Edmond EJ, Mason DW (2000) A new technique for high level decision support. Technical Report DOR (CAM) Project Report 2000/13. https://doi.org/10.1002/meda.313
Edmond EJ, Mason DW (2002) A new rank correlation coefficient with application to the consensus ranking problem. J Multi-criteria Decision Anal 11:17–28
Farnoud F, Touri B, Milenkovic O (2012) Novel distance measures for vote aggregation. arXiv:1203.6371
Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010
García-Lapresta JL, Pérez-Román D (2010) Consensus measures generated by weighted Kemeny distances on weak orders. In: Proceedings of the 10th international conference on intelligent systems design and applications, Cairo
Good IJ (1980) The number of orderings of n candidates when ties and omissions are both allowed. J Stat Comput Simul 10(2):159–159. https://doi.org/10.1080/00949658008810357
Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186. https://doi.org/10.1023/A:1010920819831
Hüllermeier E, Fürnkranz J, Cheng W, Brinker K (2008) Label ranking by learning pairwise preferences. Artif Intell 172(16):1897–1916
Kamishima T (2003) Nantonac collaborative filtering: recommendation based on order responses. In: 9th international conference on knowledge discovery and data mining, KDD2003, pp 583–588
Kemeny JG, Snell JL (1962) Preference rankings: an axiomatic approach. MIT Press, Cambridge
Kumar R, Indrayan A (2011) Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatrics 48(4):277–287. https://doi.org/10.1007/s13312-011-0055-4
Kumar R, Vassilvitskii S (2010) Generalized distances between rankings. In: Proceedings of the 19th international conference on World Wide Web, WWW ’10, New York, NY, USA. ACM, pp 571–580. https://doi.org/10.1145/1772690.1772749
Lee PH, Yu PL (2010) Distance-based tree models for ranking data. Comput Stat Data Anal 54(6):1672–1682. https://doi.org/10.1016/j.csda.2010.01.027
Marcus P (2013) Comparison of heterogeneous probability models for ranking data. Master thesis. http://www.math.leidenuniv.nl/scripties/1MasterMarcus.pdf
Piccarreta R (2010) Binary trees for dissimilarity data. Comput Stat Data Anal 54(6):1516–1524. https://doi.org/10.1016/j.csda.2009.12.011
Ripley B (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Sciandra M, Plaia A, Capursi V (2016) Classification trees for multivariate ordinal response: an application to student evaluation teaching. Qual Quant. https://doi.org/10.1007/s11135-016-0430-2
Shih Y-S (2001) Selecting the best splits for classification trees with categorical variables. Stat Probab Lett 54(4):341–345. https://doi.org/10.1016/S0167-7152(00)00188-7
Strobl C, Wickelmaier F, Zeileis A (2011) Accounting for individual differences in Bradley-Terry models by means of recursive partitioning. J Educ Behav Stat 36(2):135–153. https://doi.org/10.3102/1076998609359791
Therneau T (2015) User written splitting functions for rpart. Mayo Clinic. https://cran.r-project.org/web/packages/rpart/vignettes/usercode.pdf
Therneau T, Atkinson B, Ripley B (2015) rpart: recursive partitioning and regression trees. R package version 4.1-10. http://CRAN.R-project.org/package=rpart
Yu PL, Wan WM, Lee PH (2010) Decision tree modeling for ranking data. In: Preference learning. Springer, Berlin, pp 83–106
We would like to thank Antonio D’Ambrosio for suggesting the SUSHI preference data set.
Cite this article
Plaia, A., Sciandra, M. Weighted distance-based trees for ranking data. Adv Data Anal Classif 13, 427–444 (2019). https://doi.org/10.1007/s11634-017-0306-x
- Decision tree
- Distance-based methods
- Ranking data
- Kemeny distance
- SUSHI data
- University ranking data