Optimizing text classification through efficient feature selection based on a quality metric
Abstract
Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with respect to their associated data. In this paper we show that a simple adaptation of this metric yields a highly efficient feature selection and feature contrasting model for supervised classification. The method is evaluated on several types of textual datasets. The paper shows that the proposed method provides a very significant performance increase over state-of-the-art methods in all the studied cases, even when a single bag-of-words model is used for data description. Interestingly, the largest performance gain is obtained when classifying highly unbalanced, highly multidimensional and noisy data with a high degree of similarity between the classes.
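To make the idea concrete, below is a minimal sketch of feature-maximization-based selection and contrasting, assuming the definition commonly used in the feature maximization literature: a feature's F-measure in a class is the harmonic mean of its feature recall (the share of the feature's total weight falling in that class) and its feature precision (the share of the class's total weight carried by that feature). The selection threshold and all names (`feature_fmeasure`, `select_and_contrast`, `W`, `y`) are illustrative, not the paper's exact procedure.

```python
# Illustrative reading of feature maximization; the thresholding rule
# below is an assumption, not necessarily the authors' exact method.
import numpy as np

def feature_fmeasure(W, y):
    """Per-class feature F-measure (feature maximization metric).

    W : (n_docs, n_features) nonnegative weights (e.g. TF-IDF)
    y : (n_docs,) integer class labels
    Returns a (n_classes, n_features) matrix FF where FF[c, f] is the
    harmonic mean of feature recall and feature precision of f in class c.
    """
    classes = np.unique(y)
    # Total weight of each feature inside each class
    S = np.vstack([W[y == c].sum(axis=0) for c in classes])
    eps = 1e-12  # guards against division by zero
    # Feature recall: share of the feature's overall weight captured by the class
    FR = S / np.maximum(S.sum(axis=0, keepdims=True), eps)
    # Feature precision: share of the class's overall weight carried by the feature
    FP = S / np.maximum(S.sum(axis=1, keepdims=True), eps)
    return 2.0 * FR * FP / np.maximum(FR + FP, eps)

def select_and_contrast(W, y):
    """Keep features whose F-measure beats the overall mean in at least one
    class; the contrast matrix re-weights each kept feature by how far its
    class-level F-measure sits above its own average across classes."""
    FF = feature_fmeasure(W, y)
    threshold = FF[FF > 0].mean()            # global selection threshold
    keep = (FF > threshold).any(axis=0)      # boolean mask over features
    contrast = FF / np.maximum(FF.mean(axis=0, keepdims=True), 1e-12)
    return keep, contrast

# Toy usage: 4 documents, 3 features, 2 classes; the uniform third
# feature is discarded, the two class-specific features are kept.
W = np.array([[2.0, 0.0, 1.0],
              [3.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 3.0, 1.0]])
y = np.array([0, 0, 1, 1])
keep, contrast = select_and_contrast(W, y)
```

In this reading, contrasting amounts to boosting each retained feature in proportion to how much its class-level F-measure exceeds its average F-measure, which emphasizes features that are discriminative for a class rather than merely frequent.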
Keywords
Feature maximization · Clustering quality index · Feature selection · Supervised learning · Unbalanced data · Text