
Identify Error-Sensitive Patterns by Decision Tree

Conference paper
Advances in Data Mining: Applications and Theoretical Aspects (ICDM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9165)


Abstract

When errors are inevitable in data classification, identifying the particular parts of a classification model that are more susceptible to error than others, rather than searching for the model's Achilles' heel in a casual way, may help uncover specific error-sensitive value patterns and lead to additional error-reduction measures. As an initial phase of this investigation, the study narrows the scope of the problem by focusing on decision trees as a pilot model and develops a simple, effective tagging method that digitizes the individual nodes of a binary decision tree for node-level analysis. The method links and tracks classification statistics for each node in a transparent way, facilitates the identification and examination of the potentially "weakest" nodes and error-sensitive value patterns in the tree, and thereby supports cause analysis and the development of enhancements.

This digitization method is not an attempt to re-develop or transform the existing decision tree model. Rather, it is a pragmatic node-ID formulation that crafts numeric values to reflect the tree structure and decision-making paths, extending post-classification analysis to the level of individual nodes. Initial experiments have successfully located potentially high-risk attribute and value patterns, an encouraging sign that this line of study is worth further exploration.
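The abstract does not specify the exact ID formulation, so the sketch below assumes a conventional heap-style numbering (root 1, left child 2n, right child 2n + 1), in which the binary digits of an ID after the leading 1 spell out the left/right decisions from the root. Per-node correct/error counters then let the potentially "weakest" leaves be ranked by error rate; all names here are illustrative, not the paper's own.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    node_id: int                       # numeric tag encoding the path from the root
    feature: Optional[str] = None      # split attribute (None for leaves)
    threshold: Optional[float] = None  # split point: go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None        # predicted class at a leaf
    correct: int = 0                   # classification statistics tracked per node
    errors: int = 0

def tag(node: Node, node_id: int = 1) -> None:
    """Assign heap-style IDs: root 1, left child 2n, right child 2n + 1."""
    node.node_id = node_id
    if node.left:
        tag(node.left, 2 * node_id)
    if node.right:
        tag(node.right, 2 * node_id + 1)

def classify(node: Node, x: dict, y: int):
    """Route instance x to a leaf, updating that leaf's error statistics."""
    if node.label is not None:
        hit = (node.label == y)
        node.correct += hit
        node.errors += not hit
        return node.node_id, node.label
    child = node.left if x[node.feature] <= node.threshold else node.right
    return classify(child, x, y)

def weakest_nodes(root: Node, min_count: int = 1):
    """Rank leaves by error rate to surface error-sensitive value patterns."""
    ranked, stack = [], [root]
    while stack:
        n = stack.pop()
        total = n.correct + n.errors
        if n.label is not None and total >= min_count:
            ranked.append((n.errors / total, n.node_id))
        stack += [c for c in (n.left, n.right) if c]
    return sorted(ranked, reverse=True)
```

With a one-split tree (attribute "a", threshold 0.5), tagging yields leaf IDs 2 and 3, and after classifying a few labelled instances `weakest_nodes` surfaces the leaf with the highest error rate first, mirroring the node-level analysis the abstract describes.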



Author information

Correspondence to William Wu.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wu, W. (2015). Identify Error-Sensitive Patterns by Decision Tree. In: Perner, P. (ed.) Advances in Data Mining: Applications and Theoretical Aspects. ICDM 2015. Lecture Notes in Computer Science, vol. 9165. Springer, Cham. https://doi.org/10.1007/978-3-319-20910-4_7


  • DOI: https://doi.org/10.1007/978-3-319-20910-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20909-8

  • Online ISBN: 978-3-319-20910-4
