Weighted distance-based trees for ranking data

Abstract

Within the framework of preference rankings, interest may lie in finding which predictors and which interactions explain the observed preference structures, because preference decisions usually depend on the characteristics of both the judges and the objects being judged. This work proposes a univariate decision tree for ranking data based on weighted distances for complete and incomplete rankings, and uses the area under the ROC curve for both pruning and model assessment. Two well-known real datasets, the SUSHI preference data and the University ranking data, are used to illustrate the performance of the methodology.
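
The abstract refers to weighted distances between rankings, and the keywords list the Kemeny distance. As a point of reference only, the following minimal Python sketch computes the classical unweighted Kemeny distance between two rank vectors, with ties allowed. The function names `pair_score` and `kemeny_distance` are illustrative assumptions, and the weighting scheme used in the paper is not reproduced here.

```python
from itertools import combinations


def pair_score(rank_i, rank_j):
    """Return +1 if item i is ranked ahead of item j, -1 if behind, 0 if tied."""
    return (rank_i < rank_j) - (rank_i > rank_j)


def kemeny_distance(a, b):
    """Unweighted Kemeny distance between two rank vectors of equal length.

    a[i] and b[i] are the ranks given to item i (1 = most preferred).
    Summing |a_ij - b_ij| over unordered pairs equals half the sum over
    all ordered pairs, which is the usual definition.
    """
    if len(a) != len(b):
        raise ValueError("rank vectors must have the same length")
    total = 0
    for i, j in combinations(range(len(a)), 2):
        total += abs(pair_score(a[i], a[j]) - pair_score(b[i], b[j]))
    return total


if __name__ == "__main__":
    print(kemeny_distance([1, 2, 3], [3, 2, 1]))  # fully reversed ranking
    print(kemeny_distance([1, 2, 3], [1, 2, 2]))  # one pair becomes tied
```

For full rankings without ties, each discordant pair contributes 2 to this distance, so it is twice the number of pairwise disagreements.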

Notes

  1. Preference rankings can be represented through either rank vectors (as in this paper) or order vectors (D’Ambrosio et al. 2015).

  2. This is not the optimal tree in terms of size; instead, we prune the tree to a size that ensures a good trade-off between predictive accuracy and complexity.

  3. Data are available at http://www.kamishima.net/asset/sushi3.tgz.
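
The two representations mentioned in the first note can be illustrated with a minimal Python sketch, assuming full rankings without ties. The helper names `rank_to_order` and `order_to_rank` and the item labels are hypothetical and serve only as an illustration.

```python
def rank_to_order(ranks, items):
    """Rank vector -> order vector: list the items from most to least preferred."""
    paired = sorted(zip(ranks, items), key=lambda pair: pair[0])
    return [item for _, item in paired]


def order_to_rank(order, items):
    """Order vector -> rank vector: the rank of each item, in the order of `items`."""
    position = {item: k + 1 for k, item in enumerate(order)}
    return [position[item] for item in items]


if __name__ == "__main__":
    items = ["A", "B", "C", "D"]
    ranks = [2, 4, 1, 3]  # A is 2nd, B is 4th, C is 1st, D is 3rd
    order = rank_to_order(ranks, items)
    print(order)                        # ['C', 'A', 'D', 'B']
    print(order_to_rank(order, items))  # [2, 4, 1, 3]
```

A rank vector is indexed by item and stores positions, while an order vector is indexed by position and stores items, so the two conversions are inverses of each other.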

References

  1. Amodio S, D’Ambrosio A, Siciliano R (2016) Accurate algorithms for identifying the median ranking when dealing with weak and partial rankings under the Kemeny axiomatic approach. Eur J Oper Res 249(2):667–676. https://doi.org/10.1016/j.ejor.2015.08.048

  2. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, London

  3. Chen J, Li Y, Feng L (2012) A new weighted Spearman’s footrule as a measure of distance between rankings. CoRR. arXiv:1207.2541

  4. Cheng W, Hühn J, Hüllermeier E (2009) Decision tree and instance-based learning for label ranking. In Bottou L, Littman M (eds) Proceedings of the 26th international conference on machine learning. Omnipress, Montreal, pp 161–168

  5. Cook WD (2006) Distance based and ad hoc consensus models in ordinal preference ranking. Eur J Oper Res 172:369–385. https://doi.org/10.1016/j.ejor.2005.03.048

  6. Cook W, Kress M, Seiford LM (1986) An axiomatic approach to distance on partial orderings. Rev Française Autom Informatique Rech Opérationnelle Rech Opérationnelle 20(2):115–122

  7. D’Ambrosio A (2007) Tree based methods for data editing and preference rankings. Ph.D. thesis, Università degli Studi di Napoli “Federico II”

  8. D’Ambrosio A, Amodio S (2015) ConsRank: compute the median ranking(s) according to the Kemeny’s axiomatic approach. R package version 1.0.2. http://CRAN.R-project.org/package=ConsRank

  9. D’Ambrosio A, Amodio S, Iorio C (2015) Two algorithms for finding optimal solutions of the Kemeny rank aggregation problem for full rankings. Electron J Appl Stat Anal 8(2). http://siba-ese.unisalento.it/index.php/ejasa/article/view/14986

  10. Dittrich R, Hatzinger R, Katzenbeisser W (1998) Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. J R Stat Soc Ser C (Appl Stat) 47(4):511–525. https://doi.org/10.1111/1467-9876.00125

  11. Edmond EJ, Mason DW (2000) A new technique for high level decision support. Technical Report DOR (CAM) Project Report 2000/13. https://doi.org/10.1002/meda.313

  12. Edmond EJ, Mason DW (2002) A new rank correlation coefficient with application to the consensus ranking problem. J Multi-criteria Decision Anal 11:17–28

  13. Farnoud F, Touri B, Milenkovic O (2012) Novel distance measures for vote aggregation. arXiv:1203.6371

  14. Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010

  15. García-Lapresta JL, Pérez-Román D (2010) Consensus measures generated by weighted Kemeny distances on weak orders. In: Proceedings of the 10th international conference on intelligent systems design and applications, Cairo

  16. Good IJ (1980) The number of orderings of n candidates when ties and omissions are both allowed. J Stat Comput Simul 10(2):159–159. https://doi.org/10.1080/00949658008810357

  17. Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186. https://doi.org/10.1023/A:1010920819831

  18. Hüllermeier E, Fürnkranz J, Cheng W, Brinker K (2008) Label ranking by learning pairwise preferences. Artif Intell 172(16):1897–1916

  19. Kamishima T (2003) Nantonac collaborative filtering: recommendation based on order responses. In: 9th international conference on knowledge discovery and data mining, KDD2003, pp 583–588

  20. Kemeny JG, Snell JL (1962) Preference rankings: an axiomatic approach. MIT Press, Cambridge

  21. Kumar R, Indrayan A (2011) Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatrics 48(4):277–287. https://doi.org/10.1007/s13312-011-0055-4

  22. Kumar R, Vassilvitskii S (2010) Generalized distances between rankings. In: Proceedings of the 19th international conference on World Wide Web, WWW ’10, New York, NY, USA. ACM, pp 571–580. https://doi.org/10.1145/1772690.1772749

  23. Lee PH, Yu PL (2010) Distance-based tree models for ranking data. Comput Stat Data Anal 54(6):1672–1682. https://doi.org/10.1016/j.csda.2010.01.027

  24. Marcus P (2013) Comparison of heterogeneous probability models for ranking data. Master thesis. http://www.math.leidenuniv.nl/scripties/1MasterMarcus.pdf

  25. Piccarreta R (2010) Binary trees for dissimilarity data. Comput Stat Data Anal 54(6):1516–1524. https://doi.org/10.1016/j.csda.2009.12.011

  26. Ripley B (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge

  27. Sciandra M, Plaia A, Capursi V (2016) Classification trees for multivariate ordinal response: an application to student evaluation teaching. Qual Quant. https://doi.org/10.1007/s11135-016-0430-2

  28. Shih Y-S (2001) Selecting the best splits for classification trees with categorical variables. Stat Probab Lett 54(4):341–345. https://doi.org/10.1016/S0167-7152(00)00188-7

  29. Strobl C, Wickelmaier F, Zeileis A (2011) Accounting for individual differences in Bradley-Terry models by means of recursive partitioning. J Educ Behav Stat 36(2):135–153. https://doi.org/10.3102/1076998609359791

  30. Therneau T, Clinic M (2015) User written splitting functions for Rpart. https://cran.r-project.org/web/packages/rpart/vignettes/usercode.pdf

  31. Therneau T, Atkinson B, Ripley B (2015) Rpart: recursive partitioning and regression trees. R package version 4.1-10. http://CRAN.R-project.org/package=rpart

  32. Yu PL, Wan WM, Lee PH (2010) Decision tree modeling for ranking data. In: Preference learning. Springer, Berlin, pp 83–106

Acknowledgements

We would like to thank Antonio D’Ambrosio for suggesting the SUSHI Preference data set.

Author information

Corresponding author

Correspondence to Antonella Plaia.

About this article

Cite this article

Plaia, A., Sciandra, M. Weighted distance-based trees for ranking data. Adv Data Anal Classif 13, 427–444 (2019). https://doi.org/10.1007/s11634-017-0306-x

Keywords

  • Decision tree
  • Distance-based methods
  • Ranking data
  • Kemeny distance
  • SUSHI data
  • University ranking data