Weighted distance-based trees for ranking data

  • Antonella Plaia
  • Mariangela Sciandra
Regular Article

Abstract

Within the framework of preference rankings, interest often lies in identifying which predictors and which interactions explain the observed preference structures, since preference decisions usually depend on the characteristics of both the judges and the objects being judged. This work proposes a univariate decision tree for ranking data based on weighted distances for complete and incomplete rankings, and uses the area under the ROC curve for both pruning and model assessment. Two well-known real datasets, the SUSHI preference data and the University ranking data, are used to illustrate the performance of the methodology.
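As a point of reference for the distance-based approach, the following is a minimal sketch of the classical unweighted Kemeny distance between two rankings. It is not the weighted distance proposed in the article, whose definition is not given in this abstract. The function names `sign` and `kemeny_distance` are hypothetical, and rankings are assumed to be rank vectors (entry i is the position of item i, lower is better, equal values denote ties).

```python
from itertools import combinations


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kemeny_distance(rank_a, rank_b):
    """Unweighted Kemeny distance between two rank vectors.

    For each pair of items (i, j) the pairwise score is +1 if i is
    preferred to j, -1 if j is preferred to i, and 0 if they are tied.
    The distance sums the absolute score differences over all
    unordered pairs. Illustrative sketch only (assumption: both
    rankings cover the same items, with no missing entries).
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same items")
    total = 0
    for i, j in combinations(range(len(rank_a)), 2):
        a_ij = sign(rank_a[j] - rank_a[i])  # +1 if i ranks above j in a
        b_ij = sign(rank_b[j] - rank_b[i])  # +1 if i ranks above j in b
        total += abs(a_ij - b_ij)
    return total
```

With this convention, a fully reversed ranking of n items is at the maximum distance n(n - 1), and each pair that is tied in one ranking but ordered in the other contributes 1.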

Keywords

Decision tree; Distance-based methods; Ranking data; Kemeny distance; SUSHI data; University ranking data

Notes

Acknowledgements

We would like to thank Antonio D’Ambrosio for suggesting the SUSHI Preference data set.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Scienze Economiche, Aziendali e Statistiche, University of Palermo, Palermo, Italy
