Abstract
Decision trees represent a fundamentally different approach to machine learning compared to options like neural networks or support vector machines. Those approaches deal with strictly numerical data that may increase or decrease monotonically, and the equations that define them are designed to work only with numerical data. The theory of decision trees, however, does not rely on the assumption of numerical data. In this chapter, we will study the theory of decision trees along with some advanced topics, such as ensemble methods. We will focus on bagging and boosting as the two main types of ensemble methods and learn how they work and what their advantages and disadvantages are.
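As a minimal sketch of the bagging idea mentioned above, the following pure-Python example trains several weak learners (one-split decision stumps) on bootstrap samples and combines them by majority vote. The toy dataset, the stump rule, and all names here are illustrative assumptions, not material from the chapter:

```python
import random
from collections import Counter

random.seed(0)

# Toy one-dimensional dataset: class 0 clusters near x = 1, class 1 near x = 9.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (8.0, 1), (9.0, 1), (9.5, 1), (10.0, 1)]

def train_stump(sample):
    """Weak learner: a decision stump that splits at the mean of the
    sample's feature values and predicts class 1 above the split."""
    threshold = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x > threshold else 0

def bagged_ensemble(data, n_trees=25):
    """Bagging: each stump sees only a bootstrap sample (drawn with
    replacement), so no single learner is exposed to the full data set."""
    stumps = []
    for _ in range(n_trees):
        sample = [random.choice(data) for _ in data]
        stumps.append(train_stump(sample))
    def predict(x):
        votes = Counter(stump(x) for stump in stumps)
        return votes.most_common(1)[0][0]  # majority vote across stumps
    return predict

predict = bagged_ensemble(data)
print(predict(1.2), predict(9.2))  # → 0 1
```

Each stump alone is a poor classifier, but the majority vote over many bootstrap-trained stumps recovers the correct decision boundary; boosting differs in that learners are trained sequentially, with later learners reweighting the samples earlier ones got wrong.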
Notes
- 1.
The words weak and strong have a different meaning in this context. A weak learner is a decision tree that is trained using only a fraction of the total data and is neither capable nor expected of producing metrics close to the desired ones. The theoretical definition of a weak learner is one whose performance is only slightly better than pure random chance. A strong learner is a single decision tree that uses all the data and is capable of producing reasonably good metrics. In ensemble methods, each individual tree is always a weak learner, as it is not exposed to the full data set.
- 2.
Outliers represent an important concept in the theory of machine learning. Although their meaning is obvious, their impact on learning is not trivial. An outlier is a sample in the training data that does not represent the generic trends in the data. From a mathematical standpoint, the distance of an outlier from the other samples in the data is typically large. Such large distances can pull a machine learning model significantly away from the desired behavior. In other words, a small set of outliers can adversely affect the learning of a model and reduce its metrics significantly. It is thus an important property of a machine learning model to be resilient to a reasonable number of outliers.
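A minimal numerical illustration of this note (the data values are invented for the example, not taken from the chapter): a single outlier drags a distance-sensitive estimate like the mean far from the bulk of the data, while a rank-based estimate like the median barely moves. Resilient models behave more like the latter:

```python
# Five typical samples near 10, plus one outlier far from the rest.
values = [10.0, 11.0, 9.0, 10.5, 9.5, 500.0]

def mean(xs):
    """Average of the values; every sample pulls on it with full weight."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value (or midpoint of the middle pair); only the ranks matter,
    so a far-away outlier has almost no influence."""
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

print(mean(values))    # → 91.666…, dragged far from the typical samples
print(median(values))  # → 10.25, essentially unaffected by the outlier
```

The same contrast carries over to models: an estimator whose loss grows with distance to every sample is easily thrown off by outliers, whereas rank- or vote-based procedures, such as the majority vote in a bagged ensemble, are naturally more resilient.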
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Joshi, A.V. (2020). Decision Trees. In: Machine Learning and Artificial Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-26622-6_6
Print ISBN: 978-3-030-26621-9
Online ISBN: 978-3-030-26622-6