A Survey on Filter Techniques for Feature Selection in Text Mining

Bharti, Kusum Kumari; Singh, Pramod kumar

doi:10.1007/978-81-322-1602-5_154

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 236)

Abstract

A large portion of a document usually consists of irrelevant features. Rather than helping to identify the actual context of the document, such features increase the dimensionality of the representation model and the computational complexity of the underlying algorithm, and hence degrade performance. This makes it necessary to select the relevant features from the given feature space, and feature selection plays a key role in removing irrelevant features from the original feature space. Feature selection methods are broadly categorized into three groups: filter, wrapper, and embedded. Filter methods are widely used in text mining because of their simplicity, low computational complexity, and efficiency. In this article, we provide a brief survey of filter feature selection methods along with some recent developments in this area.
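To make concrete what a filter method does, here is a minimal Python sketch of one widely used filter criterion, information gain (IG). Each term is scored by how much knowing its presence or absence reduces the entropy of the class label, IG(t) = H(C) − [P(t)·H(C | t) + P(¬t)·H(C | ¬t)], and the k highest-scoring terms are kept without training any classifier. This is an illustration, not the authors' method: the function names (`entropy`, `information_gain`, `select_top_k`), the set-of-tokens document representation, and the toy corpus are all hypothetical assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], using term presence."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (
        len(without_t) / n
    ) * entropy(without_t)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Filter step: score each term independently of any classifier, keep the k best."""
    vocab = sorted(set().union(*docs))
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    # Sort by descending score; break ties alphabetically for a stable result.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"goal", "match", "the"},
    {"team", "match", "the"},
    {"vote", "election", "the"},
    {"vote", "party", "the"},
]
labels = ["sport", "sport", "politics", "politics"]

top, scores = select_top_k(docs, labels, 2)
print(top)  # ['match', 'vote']
```

In this toy corpus "the" occurs in every document and gets an IG of 0, while "match" and "vote" each separate the two classes perfectly and get the maximum score of 1 bit. Because every term is scored on its own, the cost is one pass over the labels per vocabulary term. That independence is the source of the simplicity attributed to filter methods above, but it also means redundancy between selected terms is ignored, which is the issue criteria such as max-relevance min-redundancy (Peng et al. 2005 in the reference list) aim to address.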

References

Chen, J., Huang, H., Tian, S., Qu, Y.: Feature selection for text classification with Naïve Bayes. Expert Syst. Appl. 36(3), 5432–5435 (2009)
Chen, X.: An improved branch and bound algorithm for feature selection. Pattern Recogn. Lett. 24(12), 1925–1933 (2003)
Chuang, L.Y., Tsai, S.W., Yang, C.H.: Improved binary particle swarm optimization using catfish effect for feature selection. Expert Syst. Appl. 38(10), 12699–12707 (2011)
Chuang, L.Y., Yang, C.H., Wu, K.C., Yang, C.H.: A hybrid feature selection method for DNA microarray data. Comput. Biol. Med. 41(4), 228–237 (2011)
Church, K.W., Hanks, P.: Word association norms, mutual information, and lexicography. Comput. Linguist. 16(1), 22–29 (1990)
Deerwester, S.: Improving information retrieval with latent semantic indexing. In: Proceedings of the 51st Annual Meeting of the American Society for Information Science, Vol. 25, pp. 36–40 (1988)
Ding, C., Peng, H.: Minimum redundancy feature selection from microarray gene expression data. J. Bioinf. Comput. Biol. 185–205 (2005)
Ferreira, A.J., Figueiredo, M.A.T.: Efficient feature selection filters for high-dimensional data. Pattern Recogn. Lett. 33(13), 1794–1804 (2012)
Hall, M.A.: Correlation-based feature selection for machine learning. Ph.D. Thesis. Department of Computer Science, University of Waikato (1999)
Hsu, H.H., Hsieh, C.W., Lu, M.D.: Hybrid feature selection by combining filters and wrappers. Expert Syst. Appl. 38(7), 8144–8150 (2011)
Li, B., Zhang, P., Ren, G., Xing, Z.: A two stage feature selection method for gear fault diagnosis using ReliefF and GA-wrapper. In: Proceedings of the International Conference on Measuring Technology and Mechatronics Automation, pp. 578–581 (2009)
Liu, L., Kang, J., Yu, J., Wang, Z.: A comparative study on unsupervised feature selection methods for text clustering. In: Proceedings of Natural Language Processing and Knowledge Engineering, pp. 59–601 (2005)
Liu, Y., Qin, Z., Xu, Z., He, X.: Feature selection with particle swarms. In: Computational and Information Science, pp. 425–430. Springer, Heidelberg (2004)
Liu, Y., Wang, G., Chen, H., Dong, H., Zhu, X., Wang, S.: An improved particle swarm optimization for feature selection. J. Bionic Eng. 8(2), 191–200 (2011)
Meng, J., Lin, H., Yu, Y.: A two-stage feature selection method for text categorization. Comput. Math. Appl. 62(7), 2793–2800 (2011)
Mitra, P., Murthy, C., Pal, S.: Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 301–312 (2002)
Ng, H.T., Goh, W.B., Low, K.L.: Feature selection, perceptron learning, and a usability case study for text categorization. In: Proceedings of the 20th ACM International Conference on Research and Development in Information Retrieval, pp. 67–73 (1997)
Pearson, K.: On lines and planes of closest fit to systems of points in space. Phil. Mag. 2(11), 559–572 (1901)
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recogn. Lett. 15(11), 1119–1125 (1994)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
Shang, W., Huang, H., Zhu, H., Lin, Y., Qu, Y., Wang, Z.: A novel feature selection algorithm for text clustering. Expert Syst. Appl. 33(1), 1–5 (2007)
Shevade, S., Keerthi, S.: A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics 19(17), 2246–2253 (2003)
Song, W., Park, S.C.: Genetic algorithm for text clustering based on latent semantic indexing. Comput. Math. Appl. 57(11–12), 1901–1907 (2009)
Tu, C.J., Chuang, L.Y., Chang, J.Y., Yang, C.H.: Feature selection using PSO-SVM. In: Proceedings of the Multiconference of Engineers, pp. 138–143 (2006)
Uguz, H.: A hybrid system based on information gain and principal component analysis for the classification of transcranial Doppler signals. Comput. Methods Programs Biomed. 107(3), 598–609 (2012)
Uguz, H.: A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl.-Based Syst. 24(7), 1024–1032 (2011)
Unler, A., Murat, A., Chinnam, R.B.: mr²PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf. Sci. 181(20), 4625–4641 (2011)
Yang, C.H., Chuang, L.Y., Yang, C.H.: IG-GA: a hybrid filter/wrapper method for feature selection of microarray data. J. Med. Biol. Eng. 30(1), 23–28 (2009)

Author information

Authors and Affiliations

Computational Intelligence and Data Mining Research Lab, ABV-Indian Institute of Information Technology and Management Gwalior, Morena Link Road, Gwalior, Madhya Pradesh, India
Kusum Kumari Bharti & Pramod kumar Singh

Corresponding author

Correspondence to Kusum Kumari Bharti.

Editor information

Editors and Affiliations

Institute of Engineering and Technology, JK Lakshmipat University, Jaipur, Rajasthan, India
B. V. Babu
Department of Computer Science, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Kusum Deep
Department of Paper Technology, Indian Institute of Technology Roorkee, Roorkee, India
Millie Pant
Department of Applied Mathematics, South Asian University, New Delhi, India
Jagdish Chand Bansal
Institute of Engineering and Technology, JK Lakshmipat University, Jaipur, Rajasthan, India
Kanad Ray
Institute of Engineering and Technology, JK Lakshmipat University, Jaipur, Rajasthan, India
Umesh Gupta

About this paper

Cite this paper

Bharti, K.K., Singh, P.k. (2014). A Survey on Filter Techniques for Feature Selection in Text Mining. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_154

DOI: https://doi.org/10.1007/978-81-322-1602-5_154
Published: 26 February 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1601-8
Online ISBN: 978-81-322-1602-5
eBook Packages: Engineering; Engineering (R0)