Model tree pruning

  • Xinlei Zhou
  • Dasen Yan
Original Article


A model tree is a decision tree in which a specified model, such as a linear regression or naive Bayes model, is built on some of the leaf nodes. Compared with a typical decision tree, in which every leaf node is assigned a class label, a model tree has several advantages: the flexibility to handle mixed attributes, a simplified tree structure, and good potential for processing big data. This paper investigates a model tree in which an extreme learning machine (ELM) model is applied to some leaf nodes of the tree, and compares two fundamental strategies for generating model trees, namely prepruning and postpruning, in terms of training complexity and generalization ability. The experimental results and algorithmic analysis show that, for the ELM model tree, postpruning achieves better performance than prepruning, which has previously been regarded as one of the most popular decision tree generation strategies.
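The two strategies named above can be contrasted on a toy problem. The following is a minimal sketch under stated assumptions, not the paper's ELM-Tree algorithm: leaves hold majority class labels rather than ELM models, the split criterion is Gini gain, prepruning is a gain threshold (`min_gain`), and postpruning is plain reduced-error pruning. All function names, the threshold value, and the XOR data set with one flipped label are hypothetical choices made for illustration.

```python
import copy
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label; ties go to the smallest label so runs are deterministic."""
    counts = Counter(labels)
    return max(sorted(counts), key=counts.get)

def grow(rows, labels, min_gain):
    """Grow a tree over binary features. Prepruning: stop when the best gain < min_gain."""
    node = {"label": majority(labels)}
    if len(set(labels)) == 1:
        return node
    best = None
    for f in range(len(rows[0])):
        sides = [[i for i, r in enumerate(rows) if r[f] == v] for v in (0, 1)]
        if not sides[0] or not sides[1]:
            continue  # feature is constant here, nothing to split
        weighted = sum(len(s) * gini([labels[i] for i in s]) for s in sides) / len(rows)
        gain = gini(labels) - weighted
        if best is None or gain > best[0]:
            best = (gain, f, sides)
    if best is None or best[0] < min_gain:
        return node
    _, f, sides = best
    node["feature"] = f
    node["children"] = [
        grow([rows[i] for i in s], [labels[i] for i in s], min_gain) for s in sides
    ]
    return node

def predict(node, row):
    """Follow the splits down to a leaf and return its label."""
    while "children" in node:
        node = node["children"][row[node["feature"]]]
    return node["label"]

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

def postprune(node, rows, labels):
    """Reduced-error pruning, bottom-up: replace a subtree by a leaf whenever the
    leaf is at least as accurate on the pruning rows that reach this node."""
    if "children" not in node:
        return node
    f = node["feature"]
    for v in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        node["children"][v] = postprune(
            node["children"][v], [rows[i] for i in idx], [labels[i] for i in idx]
        )
    subtree_hits = sum(predict(node, r) == y for r, y in zip(rows, labels))
    leaf_hits = sum(node["label"] == y for y in labels)
    return {"label": node["label"]} if leaf_hits >= subtree_hits else node

def leaves(node):
    """Number of leaf nodes, used here as a rough measure of tree size."""
    return 1 if "children" not in node else sum(leaves(c) for c in node["children"])

# Toy data (assumption): label = XOR of the first two features, third is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
clean = [a ^ b for a, b, _ in rows]
noisy = list(clean)
noisy[1] = 1 - noisy[1]  # flip one training label to simulate noise

pre_tree = grow(rows, noisy, min_gain=0.05)             # prepruning
full_tree = grow(rows, noisy, min_gain=float("-inf"))   # no early stopping
post_tree = postprune(copy.deepcopy(full_tree), rows, clean)  # postpruning

for name, tree in [("prepruned", pre_tree), ("full", full_tree), ("postpruned", post_tree)]:
    print(name, leaves(tree), accuracy(tree, rows, clean))
```

On this deliberately constructed data, no single split looks useful at the root, so the gain threshold stops growth immediately, whereas growing fully and then pruning keeps the two XOR splits and removes only the split that fit the flipped label. For brevity the sketch scores the trees on the same clean rows used for pruning, which is optimistic; a separate held-out set would be needed for a fair comparison, and the outcome on real data depends on the data and the threshold.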


Model tree; Pruning; Decision tree; Extreme learning machine; ELM-Tree



We would like to express our gratitude to all those who helped us during the writing of this paper. We gratefully acknowledge the help of our supervisor, Prof. XiZhao Wang, who offered us valuable suggestions for revising and improving this paper. This work was supported in part by the National Natural Science Foundation of China (Grant 61772344 and Grant 61732011) and in part by the Natural Science Foundation of SZU (Grant 827-000140, Grant 827-000230 and Grant 2017060).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. College of Computer Science and Software Engineering, Guangdong Key Lab of Intelligent Information Processing, Shenzhen University, Guangdong, China
