Regression trees for detecting preference patterns from rank data

  • Regular Article
Advances in Data Analysis and Classification

Abstract

A regression tree method for analyzing rank data is proposed. A key ingredient of the methodology is the conversion of ranks into scores by paired comparison. The GUIDE tree method is then applied to the score vectors to identify preference patterns in the data. The method is free of selection bias, and simulation results show that it selects split variables well and, in some cases, achieves better prediction accuracy than the two other methods investigated. Furthermore, it is applicable to complex data that may contain incomplete ranks and missing covariate values. We demonstrate its usefulness in two real data studies.
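The conversion of ranks into scores by paired comparison can be illustrated with a minimal sketch. The exact scoring rule used in the article is not shown in this preview, so the rule below is an assumption: within each pair of ranked items, the preferred item gains one point and the other loses one, while pairs that involve an unranked item or a tie contribute nothing. The function name `ranks_to_scores` and the use of `None` for an unranked item are hypothetical choices made for illustration only.

```python
from typing import List, Optional, Sequence


def ranks_to_scores(ranks: Sequence[Optional[int]]) -> List[int]:
    """Convert one judge's ranks (1 = most preferred) into a score vector.

    Assumed rule (not taken from the article): for every pair of ranked
    items, the preferred item gets +1 and the other gets -1. Pairs with an
    unranked item (None) or with tied ranks contribute 0 to both items.
    """
    scores = [0] * len(ranks)
    for i, r_i in enumerate(ranks):
        if r_i is None:
            continue  # unranked item: no information from its comparisons
        for j, r_j in enumerate(ranks):
            if i == j or r_j is None:
                continue
            if r_i < r_j:
                scores[i] += 1  # item i is preferred to item j
            elif r_i > r_j:
                scores[i] -= 1  # item j is preferred to item i
    return scores


# A complete ranking of four items.
print(ranks_to_scores([1, 2, 3, 4]))        # [3, 1, -1, -3]
# An incomplete ranking: only the first and third items are ranked.
print(ranks_to_scores([2, None, 1, None]))  # [-1, 0, 1, 0]
```

Under this assumed rule each score vector sums to zero, and an incomplete ranking still yields a vector of full length, which is what allows a multiresponse regression tree to be fitted to the scores.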

Notes

  1. Pearson’s Chi-Square test of independence and its Bonferroni-adjusted p value were used in CHAID, a classical tree method (Kass 1980).

References

  • Alvo M, Yu PLH (2014) Statistical methods for ranking data. Springer, New York

  • Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39:324–345

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont

  • Cattelan M (2012) Models for paired comparison data: a review with emphasis on dependent data. Stat Sci 27:412–433

  • Cheng W, Hühn J, Hüllermeier E (2009) Decision tree and instance-based learning for label ranking. In: International conference on machine learning, Montreal

  • Critchlow DE (1985) Metric methods for analyzing partially ranked data. Springer, New York

  • D’Ambrosio A, Heiser WJ (2016) A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika 81:774–794

  • Davidson RR (1970) On extending the Bradley–Terry model to accommodate ties in paired comparison experiments. J Am Stat Assoc 65:317–328

  • De’ath G (2002) Multivariate regression trees: a new technique for modeling species-environment relationships. Ecology 83:1105–1117

  • Diaconis P (1988) Group representations in probability and statistics. Institute of Mathematical Statistics, Hayward

  • Emond EJ, Mason DW (2002) A new rank correlation coefficient with application to the consensus ranking problem. J Multi-Criteria Decis Anal 11:17–28

  • Francis B, Dittrich R, Hatzinger R, Penn R (2002) Analysing partial ranks by using smoothed paired comparison methods: an investigation of value orientation in Europe. J R Stat Soc Ser C (Appl Stat) 51:319–336

  • Francis B, Dittrich R, Hatzinger R, Humphreys L (2014) A mixture model for longitudinal partially ranked data. Commun Stat Theory Methods 43:722–734

  • Hatzinger R, Dittrich R (2012) prefmod: an R package for modeling preferences based on paired comparisons, rankings, or ratings. J Stat Softw 48:1–31

  • Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15:651–674

  • Hsiao WC, Shih YS (2007) Splitting variable selection for multivariate regression trees. Stat Probab Lett 77:265–271

  • Inglehart R (1977) The silent revolution: changing values and political styles among western publics. Princeton University Press, Princeton

  • Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. Appl Stat 29:119–127

  • Kemeny JG, Snell JL (eds) (1962) Preference rankings: an axiomatic approach. In: Mathematical models in the social sciences. The MIT Press, Cambridge, pp 9–23

  • Kung YH, Lin CT, Shih YS (2012) Split variable selection for tree modeling on rank data. Comput Stat Data Anal 56:2830–2836

  • Lee PH, Yu PLH (2010) Distance-based tree models for ranking data. Comput Stat Data Anal 54:1672–1682

  • Liu KH, Shih YS (2016) Score-scale decision tree for paired comparison data. Statistica Sinica 26:429–444

  • Loh WY (2014) Fifty years of classification and regression trees (with discussion). Int Stat Rev 82:329–370

  • Loh WY, Zheng W (2013) Regression trees for longitudinal and multiresponse data. Ann Appl Stat 7:495–522

  • Marden JI (1995) Analyzing and modeling rank data. Chapman & Hall, London

  • Qinglong L (2015) StatMethRank: statistical methods for ranking data. R package version 1.3

  • Strobl C, Wickelmaier F, Zeileis A (2011) Accounting for individual differences in Bradley–Terry models by means of recursive partitioning. J Educ Behav Stat 36:135–153

  • Vermunt J (2003) Multilevel latent class models. Sociol Methodol 33:213–239

  • Yandell BS (1997) Practical data analysis for designed experiments. Chapman & Hall, Boca Raton

  • Yu PLH, Wan WM, Lee PH (2010) Decision tree modeling for ranking data. In: Fürnkranz J, Hüllermeier E (eds) Preference learning. Springer, New York, pp 83–106

  • Yu PLH, Lee PH, Cheung SF, Lau EYY, Mok DSY, Hui HC (2016) Logit tree models for discrete choice data with application to advice-seeking preferences among Chinese Christians. Comput Stat 31:799–827

  • Zeileis A, Hornik K (2007) Generalized M-fluctuation tests for parameter instability. Statistica Neerlandica 61:488–508

  • Zeileis A, Hothorn T, Hornik K (2008) Model-based recursive partitioning. J Comput Graph Stat 17:492–514

Acknowledgements

The authors are very grateful to the two reviewers and the editors for many helpful comments and suggestions.

Author information

Corresponding author

Correspondence to Yu-Shan Shih.

Additional information

This research is supported in part by Taiwan MOST Grant 106-2118-M-194-002.

About this article

Cite this article

Shih, YS., Liu, KH. Regression trees for detecting preference patterns from rank data. Adv Data Anal Classif 13, 683–702 (2019). https://doi.org/10.1007/s11634-018-0332-3

  • DOI: https://doi.org/10.1007/s11634-018-0332-3
