Abstract
Numerous attribute selection frameworks have been developed to improve performance and results in machine learning and data classification (Guyon & Elisseeff 2003; Saeys, Inza & Larranaga 2007). The majority of this effort has focused on performance and cost factors, with the primary aim of examining and enhancing the logic and sophistication of the underlying components and methods of specific classification models, such as the variety of wrapper, filter and cluster algorithms for feature selection, which work either as a data pre-processing step or as an integral part of a specific classification process. Taking a different approach, our research studies the relationship between classification errors and data attributes not before, not during, but after the fact, in order to evaluate the risk levels of attributes and identify those that may be more prone to errors, based on such a post-classification analysis and a proposed attribute-risk evaluation routine. Possible benefits of this research are to help develop error reduction measures and to investigate specific relationships between attributes and errors more efficiently and effectively. Initial experiments have shown some supportive results, and the unsupportive results can also be explained by a hypothesis extended from this evaluation proposal.
References
Alpaydin, E.: Introduction to Machine Learning. The MIT Press, London (2004)
Bredensteiner, E.J., Bennett, K.P.: Feature Minimization within Decision Trees. Computational Optimization and Applications 10(2), 111–126 (1998)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth International Group, Belmont (1984)
Carpenter, G.A., Markuzon, N.: ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases. Neural Networks 11, 323–336 (1998)
Frank, A., Asuncion, A.: UCI Machine Learning Repository. University of California, Irvine (2010), http://archive.ics.uci.edu/ml
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. The Journal of Machine Learning Research 3, 1157–1182 (2003)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1) (2009)
Han, J., Kamber, M.: Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)
Kayaer, K., Yıldırım, T.: Medical diagnosis on Pima Indian diabetes using General Regression Neural Networks. Paper presented to the International Conference on Artificial Neural Networks/International Conference on Neural Information Processing, Istanbul, Turkey (2003)
Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Proceedings of the Ninth International Workshop on Machine Learning, pp. 249–256. Morgan Kaufmann Publishers Inc. (1992)
Kittler, J.: Feature set search algorithms. Pattern Recognition and Signal Processing, pp. 41–60 (1978)
Liu, H., Motoda, H., Setiono, R.: Feature Selection: An Ever Evolving Frontier in Data Mining. Journal of Machine Learning Research: Workshop and Conference Proceedings 10, 10 (2010)
Mangasarian, O.L., Street, W.N., Wolberg, W.H.: Breast Cancer Diagnosis and Prognosis via Linear Programming, Mathematical Programming Technical Report (1994)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
Raymer, M.L., Doom, T.E., Kuhn, L.A., Punch, W.L.: Knowledge Discovery in Medical and Biological Datasets Using a Hybrid Bayes Classifier/Evolutionary Algorithm. In: Proceedings of the IEEE 2nd International Symposium on Bioinformatics and Bioengineering Conference, pp. 236–245 (2001)
Saeys, Y., Inza, I., Larranaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
Shannon, C.E.: A Mathematical Theory of Communication. The Bell System Technical Journal 27, 379–423, 623–656 (1948)
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., Johannes, R.S.: Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus. In: Proc. Annu. Symp. Comput. Appl. Med. Care., vol. 9, pp. 261–265 (1988)
Taylor, J.R.: An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, 2nd edn. University Science Books, Sausalito (1996)
Wei, L., Altman, R.B.: An Automated System for Generating Comparative Disease Profiles and Making Diagnoses. IEEE Transactions on Neural Networks 15 (2004)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Wolberg, W.H., Mangasarian, O.L.: Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences 87, 9193–9196 (1990)
Yoon, K.: The propagation of errors in multiple-attribute decision analysis: A practical approach. Journal of the Operational Research Society 40(7), 681–686 (1989)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, W., Zhang, S. (2013). Evaluation of Error-Sensitive Attributes. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_25
DOI: https://doi.org/10.1007/978-3-642-40319-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science, Computer Science (R0)