Evaluation of Error-Sensitive Attributes

Wu, William; Zhang, Shichao

doi:10.1007/978-3-642-40319-4_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7867)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Numerous attribute selection frameworks have been developed to improve performance and results in machine learning and data classification (Guyon & Elisseeff 2003; Saeys, Inza & Larranaga 2007). Most of this effort has focused on performance and cost factors, with the primary aim of examining and enhancing the logic and sophistication of the underlying components and methods of specific classification models, such as the wrapper, filter and cluster algorithms for feature selection that work as a data pre-processing step or are embedded as an integral part of a specific classification process. Taking a different approach, our research studies the relationship between classification errors and data attributes not before or during classification but after the fact, evaluating the risk levels of attributes and identifying those that may be more prone to errors, based on a post-classification analysis and a proposed attribute-risk evaluation routine. Possible benefits of this research are to help develop error-reduction measures and to investigate specific relationships between attributes and errors more efficiently and effectively. Initial experiments have shown some supportive results, and the unsupportive results can also be explained by a hypothesis extended from this evaluation proposal.
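
The post-classification idea can be illustrated with a minimal sketch. It is not the authors' attribute-risk evaluation routine, which the abstract does not specify. As an assumed stand-in, it scores each numeric attribute by the gap between its mean over misclassified rows and its mean over correctly classified rows, scaled by the attribute's overall standard deviation. The function names, the scoring rule, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def attribute_error_scores(rows, y_true, y_pred):
    """Score attributes after classification (assumed heuristic, not the paper's routine).

    rows   : list of dicts mapping attribute name -> numeric value
    y_true : true class labels, one per row
    y_pred : labels predicted by any already-trained classifier
    """
    wrong = [r for r, t, p in zip(rows, y_true, y_pred) if t != p]
    right = [r for r, t, p in zip(rows, y_true, y_pred) if t == p]
    if not wrong or not right:
        raise ValueError("need at least one correct and one misclassified row")

    scores = {}
    for name in rows[0]:
        spread = pstdev([r[name] for r in rows])
        gap = abs(mean([r[name] for r in wrong]) - mean([r[name] for r in right]))
        # A constant attribute cannot separate errors from non-errors.
        scores[name] = gap / spread if spread > 0 else 0.0
    return scores


def rank_attributes(scores):
    """Return attribute names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy data: the classifier predicts class 0 for every row,
# so the two class-1 rows are the misclassified ones.
rows = [
    {"glucose": 90, "age": 30},
    {"glucose": 100, "age": 40},
    {"glucose": 150, "age": 35},
    {"glucose": 160, "age": 35},
]
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 0]

scores = attribute_error_scores(rows, y_true, y_pred)
print(rank_attributes(scores))
```

In this toy case the misclassified rows differ from the correct ones in glucose but not in mean age, so glucose ranks first. A high score only flags an attribute for closer inspection; it does not by itself show that the attribute causes the errors.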

References

Alpaydin, E.: Introduction to Machine Learning. The MIT Press, London (2004)
Bredensteiner, E.J., Bennett, K.P.: Feature Minimization within Decision Trees. Computational Optimization and Applications 10(2), 111–126 (1998)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth International Group, Belmont (1984)
Carpenter, G.A., Markuzon, N.: ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases. Neural Networks 11, 323–336 (1998)
Frank, A., Asuncion, A.: UCI Machine Learning Repository. University of California, Irvine (2010), http://archive.ics.uci.edu/ml
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. The Journal of Machine Learning Research 3, 1157–1182 (2003)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1) (2009)
Han, J., Kamber, M.: Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)
Kayaer, K., Yıldırım, T.: Medical diagnosis on Pima Indian diabetes using General Regression Neural Networks. Paper presented to the International Conference on Artificial Neural Networks/International Conference on Neural Information Processing, Istanbul, Turkey (2003)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the Ninth International Workshop on Machine Learning, pp. 249–256. Morgan Kaufmann Publishers Inc. (1992)
Kittler, J.: Feature set search algorithms. Pattern recognition and signal processing 41, 60 (1978)
Liu, H., Motoda, H., Setiono, R.: Feature Selection: An Ever Evolving Frontier in Data Mining. Journal of Machine Learning Research: Workshop and Conference Proceedings 10, 10 (2010)
Mangasarian, O.L., Street, W.N., Wolberg, W.H.: Breast Cancer Diagnosis and Prognosis via Linear Programming, Mathematical Programming Technical Report (1994)
Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann (1993)
Raymer, M.L., Doom, T.E., Kuhn, L.A., Punch, W.L.: Knowledge Discovery in Medical and Biological Datasets Using a Hybrid Bayes Classifier/Evolutionary Algorithm. In: Proceedings of the IEEE 2nd International Symposium on Bioinformatics and Bioengineering Conference, pp. 236–245 (2001)
Saeys, Y., Inza, I., Larranaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
Shannon, C.E.: A Mathematical Theory of Communication. The Bell System Technical Journal 27, 379–423, 623–656 (1948)
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., Johannes, R.S.: Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus. In: Proc. Annu. Symp. Comput. Appl. Med. Care., vol. 9, pp. 261–265 (1988)
Taylor, J.R.: An Introduction to error analysis: The Study of uncertainties in physical measurements, 2nd edn. University Science Books, Sausalito (1996)
Wei, L., Altman, R.B.: An Automated System for Generating Comparative Disease Profiles and Making Diagnoses. IEEE Transactions on Neural Networks 15 (2004)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Wolberg, W.H., Mangasarian, O.L.: Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences 87, 9193–9196 (1990)
Yoon, K.: The propagation of errors in multiple-attribute decision analysis: A practical approach. Journal of the Operational Research Society 40(7), 681–686 (1989)

Author information

Authors and Affiliations

Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
William Wu & Shichao Zhang

Editor information

Editors and Affiliations

School of Information Technology and Mathematical Sciences, University of South Australia, 1 Mawson Lakes Boulevard, 5095, Adelaide, SA, Australia
Jiuyong Li
Advanced Analytics Institute, University of Technology, 2-12 Blackfriars Street, Chippendale, Blackfriars Campus, 2008, Sydney, NSW, Australia
Longbing Cao & Can Wang
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, 117576, Singapore, Singapore
Kay Chen Tan
School of Automation, Guangdong University of Technology, No. 100 Waihuan Xi Road, Panyu District, 510006, Guangzhou, China
Bo Liu
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, University Road, 701, Tainan, Taiwan
Vincent S. Tseng

About this paper

Cite this paper

Wu, W., Zhang, S. (2013). Evaluation of Error-Sensitive Attributes. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_25

DOI: https://doi.org/10.1007/978-3-642-40319-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science, Computer Science (R0)
