Fuzzy Rough Set-Based Feature Selection for Text Categorization

Fuzzy, Rough and Intuitionistic Fuzzy Set Approaches for Data Handling

Part of the book series: Forum for Interdisciplinary Mathematics (FFIM)

Abstract

Recent technological advances have led to the accumulation of large volumes of data in digital repositories. Mining such repositories for information retrieval is challenging in terms of both dimensionality and sample size. Text mining in particular is confronted with very high-dimensional data, so reducing that dimensionality becomes necessary. Fuzzy rough set feature selection techniques have proved highly efficient at dimension reduction: they can handle data dependencies and reduce data dimensionality without compromising the performance of classification and clustering. This chapter reviews major developments in fuzzy rough set-based feature selection over a period of 20 years and discusses the potential of these techniques for text categorization. A hybrid feature selection technique is proposed that combines large-scale spectral clustering with landmark-based representation and fuzzy rough feature selection; it is found to work efficiently in memory-constrained environments. Moreover, the proposed technique substantially reduces data dimensionality on the considered datasets while retaining an acceptable degree of clustering accuracy.
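To make the idea of dependency-driven feature selection concrete, the following is a minimal sketch of a greedy, QuickReduct-style search guided by a fuzzy-rough dependency degree. It is not the hybrid technique proposed in the chapter, whose details are not given in the abstract. The similarity relation (one minus the range-normalised distance), the minimum t-norm used to combine features, the crisp class labels, and the function names `fuzzy_similarity`, `dependency_degree` and `quick_reduct` are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_similarity(column):
    """Assumed relation: R_a(x, z) = 1 - |a(x) - a(z)| / range(a) for one feature."""
    spread = column.max() - column.min()
    if spread == 0:
        # A constant feature cannot distinguish any pair of samples.
        return np.ones((column.size, column.size))
    diff = np.abs(column[:, None] - column[None, :])
    return 1.0 - diff / spread


def dependency_degree(X, y, subset):
    """Fuzzy-rough dependency of the crisp labels y on the features in subset."""
    n = X.shape[0]
    # Combine the per-feature relations with the minimum t-norm.
    relation = np.ones((n, n))
    for j in subset:
        relation = np.minimum(relation, fuzzy_similarity(X[:, j]))
    # With crisp classes, the membership of x in the fuzzy positive region is
    # the minimum of 1 - R(x, z) over all samples z carrying a different label.
    different = y[:, None] != y[None, :]
    penalties = np.where(different, 1.0 - relation, 1.0)
    return float(penalties.min(axis=1).mean())


def quick_reduct(X, y, tol=1e-12):
    """Greedily add the feature that most increases the dependency degree."""
    n_features = X.shape[1]
    full = dependency_degree(X, y, list(range(n_features)))
    selected, best = [], dependency_degree(X, y, [])
    while best < full - tol:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [dependency_degree(X, y, selected + [j]) for j in candidates]
        top = int(np.argmax(scores))
        if scores[top] <= best + tol:
            break  # no remaining feature improves the dependency
        selected.append(candidates[top])
        best = scores[top]
    return selected, best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is
# constant, and feature 2 is a rescaled copy of feature 0.
X = np.array([
    [0.0, 3.0, 0.0],
    [0.1, 3.0, 0.2],
    [1.0, 3.0, 2.0],
    [0.9, 3.0, 1.8],
])
y = np.array([0, 0, 1, 1])

selected, gamma = quick_reduct(X, y)
print(selected, gamma)
```

On this toy input the search stops as soon as the selected subset reaches the dependency degree of the full feature set, so redundant and constant features are never added. The pairwise relation needs memory quadratic in the number of samples, which is one reason such methods are often paired with a compact representation of the data on large text collections.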

Author information

Corresponding author

Correspondence to Shahin Ara Begum.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Gupta, A., Begum, S.A. (2023). Fuzzy Rough Set-Based Feature Selection for Text Categorization. In: Som, T., Castillo, O., Tiwari, A.K., Shreevastava, S. (eds) Fuzzy, Rough and Intuitionistic Fuzzy Set Approaches for Data Handling. Forum for Interdisciplinary Mathematics. Springer, Singapore. https://doi.org/10.1007/978-981-19-8566-9_4
