Cost-Sensitive Decision Tree Learning for Forensic Classification

Davis, Jason V.; Ha, Jungwoo; Rossbach, Christopher J.; Ramadan, Hany E.; Witchel, Emmett

doi:10.1007/11871842_60

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4212)

Included in the following conference series: European Conference on Machine Learning

Abstract

In some learning settings, the cost of acquiring features for classification must be paid up front, before the classifier is evaluated. In this paper, we introduce the forensic classification problem and present a new algorithm for building decision trees that maximizes classification accuracy while minimizing total feature costs. By expressing the ID3 decision tree algorithm in an information theoretic context, we derive our algorithm from a well-formulated problem objective. We evaluate our algorithm across several datasets and show that, for a given level of accuracy, our algorithm builds cheaper trees than existing methods. Finally, we apply our algorithm to a real-world system, Clarify. Clarify classifies unknown or unexpected program errors by collecting statistics during program runtime which are then used for decision tree classification after an error has occurred. We demonstrate that if the classifier used by the Clarify system is trained with our algorithm, the computational overhead (equivalently, total feature costs) can decrease by many orders of magnitude with only a slight (<1%) reduction in classification accuracy.
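
The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, which the abstract does not spell out. It only shows ID3-style information gain combined with one possible cost penalty. The scoring rule (gain minus a weight times feature cost), the rule that a feature already used elsewhere in the tree incurs no further cost, and the names `pick_split`, `tradeoff`, `acquired`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """ID3 information gain from splitting on one discrete feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def pick_split(rows, labels, costs, acquired, tradeoff):
    """Pick the feature with the best gain-minus-cost score (assumed rule).

    Features in `acquired` are treated as already paid for, reflecting the
    up-front cost model: a feature collected once adds no further cost.
    """
    def score(feature):
        extra = 0.0 if feature in acquired else costs[feature]
        return information_gain(rows, labels, feature) - tradeoff * extra

    return max(costs, key=score)


# Hypothetical toy data: "expensive" separates the classes perfectly,
# "cheap" only partially.
rows = [
    {"cheap": 0, "expensive": 0},
    {"cheap": 0, "expensive": 0},
    {"cheap": 1, "expensive": 1},
    {"cheap": 0, "expensive": 1},
]
labels = ["ok", "ok", "err", "err"]
costs = {"cheap": 1.0, "expensive": 100.0}

no_penalty = pick_split(rows, labels, costs, set(), tradeoff=0.0)
with_penalty = pick_split(rows, labels, costs, set(), tradeoff=0.01)
already_paid = pick_split(rows, labels, costs, {"expensive"}, tradeoff=0.01)
```

With no penalty the split falls on the perfectly informative but costly feature. With a small penalty the cheaper feature wins. Once the costly feature has been acquired elsewhere in the tree, reusing it is free under this assumed rule, so it is chosen again.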

Author information

Authors and Affiliations

Dept. of Computer Sciences, The University of Texas at Austin,
Jason V. Davis, Jungwoo Ha, Christopher J. Rossbach, Hany E. Ramadan & Emmett Witchel

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt,
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

Cite this paper

Davis, J.V., Ha, J., Rossbach, C.J., Ramadan, H.E., Witchel, E. (2006). Cost-Sensitive Decision Tree Learning for Forensic Classification. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_60

DOI: https://doi.org/10.1007/11871842_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science, Computer Science (R0)
