Feature Evaluation by Filter, Wrapper, and Embedded Approaches

Part of the book series: Studies in Computational Intelligence ((SCI,volume 584))

Abstract

The variables that make up a set of characteristic features relevant to classification can be chosen in three ways: in a process external to the classification system employed in pattern recognition, in a process that depends on the performance of that system, or through an inherent mechanism built into the system itself. These three types of approaches correspond, respectively, to the three categories of methodologies typically exploited in feature selection and reduction: filters, wrappers, and embedded solutions. They are used when domain knowledge is unavailable or insufficient for an informed choice, or to support expert knowledge in order to achieve higher efficiency, enhanced classification, or reduced classifier size. The chapter illustrates combinations of the three approaches for feature evaluation in binary classification with balanced classes, applied to the task of authorship attribution, which belongs to the stylometric analysis of texts.
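
The distinction between the three categories can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the chapter's own procedure: the synthetic data, the mean-difference filter score, the nearest-centroid classifier used by the wrapper, and the L1-penalised logistic regression used as the embedded method are all hypothetical stand-ins chosen so that the example runs with the Python standard library alone. All function names and parameter values are invented for this illustration.

```python
import math
import random
from statistics import mean, pstdev


def make_data(n=60, seed=0):
    """Toy balanced binary task (an assumption): feature 0 is strongly
    informative, feature 1 weakly informative, feature 2 pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so both classes get n/2 samples
        X.append([rng.gauss(3.0 * label, 1.0),
                  rng.gauss(0.5 * label, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def filter_scores(X, y):
    """Filter: score each feature on its own, without any classifier."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12))
    return scores


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    hits = 0
    for k in range(len(X)):
        dist = {}
        for lab in (0, 1):
            rows = [X[i] for i in range(len(X)) if i != k and y[i] == lab]
            centroid = [mean(r[j] for r in rows) for j in subset]
            dist[lab] = sum((X[k][j] - c) ** 2 for j, c in zip(subset, centroid))
        hits += min(dist, key=dist.get) == y[k]
    return hits / len(X)


def wrapper_forward(X, y):
    """Wrapper: greedy forward selection driven by classifier accuracy."""
    chosen, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        acc, j = max((loo_accuracy(X, y, chosen + [j]), j) for j in remaining)
        if acc <= best:
            break  # no candidate improves the classifier, so stop
        chosen.append(j)
        remaining.remove(j)
        best = acc
    return chosen, best


def embedded_weights(X, y, l1=0.05, lr=0.1, epochs=300):
    """Embedded: L1-penalised logistic regression; the penalty shrinks the
    weights of unhelpful features towards zero as part of training."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for row, lab in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                gw[j] += (p - lab) * row[j] / len(X)
            gb += (p - lab) / len(X)
        b -= lr * gb
        for j in range(d):
            v = w[j] - lr * gw[j]
            # soft-thresholding: shrink towards zero, never across it
            w[j] = math.copysign(max(abs(v) - lr * l1, 0.0), v)
    return w


if __name__ == "__main__":
    X, y = make_data()
    print("filter scores :", [round(s, 2) for s in filter_scores(X, y)])
    print("wrapper subset:", wrapper_forward(X, y))
    print("embedded |w|  :", [round(abs(v), 2) for v in embedded_weights(X, y)])
```

The filter never consults a classifier, the wrapper re-evaluates a classifier for every candidate subset, and the embedded method obtains feature weights as a by-product of training. Combining them, for example by letting a filter ranking decide the order in which a wrapper tries features, is one possible reading of the combinations the abstract refers to.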

References

  1. Ahonen, H., Heinonen, O., Klemettinen, M., Verkamo, A.: Applying data mining techniques in text analysis. Technical Report C-1997-23. Department of Computer Science, University of Helsinki, Finland (1997)

  2. Argamon, S., Karlgren, J., Shanahan, J.: Stylistic analysis of text for information access. In: Proceedings of the 28th International ACM Conference on Research and Development in Information Retrieval, Brazil (2005)

  3. Argamon, S., Burns, K., Dubnov, S. (eds.): The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Springer, Berlin (2010)

  4. Baayen, H., van Halteren, H., Tweedie, F.: Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Lit. Linguist. Comput. 11(3), 121–132 (1996)

  5. Bayardo Jr, R., Agrawal, R.: Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 145–154 (1999)

  6. Berber Sardinha, T.: Using key words in text analysis: practical aspects. Available on-line from ftp://ftp.liv.ac.uk/pub/linguistics (1999)

  7. Craig, H.: Stylistic analysis and authorship studies. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)

  8. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1, 131–156 (1997)

  9. Dash, M., Liu, H.: Consistency-based search in feature selection. Artif. Intell. 151, 155–176 (2003)

  10. Düntsch, I., Gediga, G.: Rough Set Data Analysis: A Road to Noninvasive Knowledge Discovery. Methodos Publishers, Bangor (2000)

  11. Fiesler, E., Beale, R.: Handbook of Neural Computation. Oxford University Press, Oxford (1997)

  12. Greco, S., Matarazzo, B., Słowiński, R.: Rough set theory for multicriteria decision analysis. Eur. J. Oper. Res. 129(1), 1–47 (2001)

  13. Greco, S., Matarazzo, B., Słowiński, R.: Dominance-based rough set approach as a proper way of handling graduality in rough set theory. Trans. Rough Sets 7, 36–52 (2007)

  14. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

  15. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)

  16. Jelonek, J., Krawiec, K., Stefanowski, J.: Comparative study of feature subset selection techniques for machine learning tasks. In: Proceedings of the 7th Workshop on Intelligent Information Systems (1998)

  17. Jensen, R., Shen, Q.: Computational Intelligence and Feature Selection. Wiley, Hoboken (2008)

  18. John, G., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Cohen, W., Hirsh, H. (eds.) Machine Learning: Proceedings of the 11th International Conference, pp. 121–129. Morgan Kaufmann Publishers (1994)

  19. Kavzoglu, T., Mather, P.: Assessing artificial neural network pruning algorithms. In: Proceedings of the 24th Annual Conference and Exhibition of the Remote Sensing Society, pp. 603–609. Greenwich (2011)

  20. Khmelev, D., Tweedie, F.: Using Markov chains for identification of writers. Lit. Linguist. Comput. 16(4), 299–307 (2001)

  21. Kingston, G., Maier, H., Lambert, M.: A statistical input pruning method for artificial neural networks used in environmental modelling. In: Transactions of the 2nd Biennial Meeting of the International Environmental Modelling and Software Society, pp. 87–92. Osnabrueck, Germany (2004)

  22. Kohavi, R., John, G.: Wrappers for feature subset selection. Artif. Intell. 97(1–2), 273–324 (1997)

  23. Lal, T., Chapelle, O., Weston, J., Elisseeff, A.: Embedded methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L. (eds.) Feature Extraction: Foundations and Applications. Studies in Fuzziness and Soft Computing, vol. 207, pp. 137–165. Springer, Berlin (2006)

  24. Lynam, T., Clarke, C., Cormack, G.: Information extraction with term frequencies. In: Proceedings of the Human Language Technology Conference, pp. 1–4. San Diego (2001)

  25. Moshkov, M., Piliszczuk, M., Zielosko, B.: On partial covers, reducts and decision rules with weights. Trans. Rough Sets 6, 211–246 (2006)

  26. Moshkov, M., Skowron, A., Suraj, Z.: On covering attribute sets by reducts. In: Kryszkiewicz, M., Peters, J., Rybinski, H., Skowron, A. (eds.) Rough Sets and Emerging Intelligent Systems Paradigms. LNCS (LNAI), vol. 4585, pp. 175–180. Springer, Berlin (2007)

  27. Novaković, J., Strbac, P., Bulatović, D.: Toward optimal feature selection using ranking methods and classification algorithms. Yugosl. J. Oper. Res. 21(1), 119–135 (2011)

  28. Pawlak, Z.: Computing, artificial intelligence and information technology: rough sets, decision algorithms and Bayes’ theorem. Eur. J. Oper. Res. 136, 181–189 (2002)

  29. Pawlak, Z.: Rough sets and intelligent data analysis. Inf. Sci. 147, 1–12 (2002)

  30. Peng, R.: Statistical aspects of literary style. Bachelor’s Thesis, Yale University (1999)

  31. Peng, R., Hengartner, H.: Quantitative analysis of literary styles. Am. Stat. 56(3), 15–38 (2002)

  32. Sikora, M.: Rule quality measures in creation and reduction of data rule models. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H., Słowiński, R. (eds.) Rough Sets and Current Trends in Computing. Lecture Notes in Computer Science, vol. 4259, pp. 716–725. Springer (2006)

  33. Słowiński, R., Greco, S., Matarazzo, B.: Dominance-based rough set approach to reasoning about ordinal data. LNCS (LNAI) 4585, 5–11 (2007)

  34. Stańczyk, U.: Dominance-based rough set approach employed in search of authorial invariants. In: Kurzyński, M., Woźniak, M. (eds.) Computer Recognition Systems 3. AISC, vol. 57, pp. 315–323. Springer, Berlin (2009)

  35. Stańczyk, U.: DRSA decision algorithm analysis in stylometric processing of literary texts. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) Rough Sets and Current Trends in Computing. LNCS (LNAI), vol. 6086, pp. 600–609. Springer, Berlin (2010)

  36. Stańczyk, U.: Rough set-based analysis of characteristic features for ANN classifier. In: Grana Romay, M., Corchado, E., Garcia-Sebastian, M. (eds.) Hybrid Artificial Intelligence Systems Part 1. LNCS (LNAI), vol. 6076, pp. 565–572. Springer, Berlin (2010)

  37. Stańczyk, U.: On performance of DRSA-ANN classifier. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds.) Hybrid Artificial Intelligence Systems Part 2. LNCS (LNAI), vol. 6679, pp. 172–179. Springer, Berlin (2011)

  38. Stańczyk, U.: Rule-based approach to computational stylistics. In: Bouvry, P., Kłopotek, M., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) Security and Intelligent Information Systems. LNCS (LNAI), vol. 7053, pp. 168–179. Springer, Berlin (2012)

  39. Stańczyk, U.: On preference order of DRSA conditional attributes for computational stylistics. In: Decker, H., Lhotska, L., Link, S., Basl, J., Tjoa, A. (eds.) Database and Expert Systems Applications. LNCS, vol. 8056, pp. 26–33. Springer, Berlin (2013)

  40. Stańczyk, U.: Relative reduct-based estimation of relevance for stylometric features. In: Catania, B., Guerrini, G., Pokorny, J. (eds.) Advances in Databases and Information Systems. LNCS, vol. 8133, pp. 135–147. Springer, Berlin (2013)

  41. Stańczyk, U.: Rough set and artificial neural network approach to computational stylistics. In: Ramanna, S., Howlett, R., Jain, L. (eds.) Emerging Paradigms in Machine Learning, Smart Innovation, Systems and Technologies, vol. 13, pp. 441–470. Springer, Berlin (2013)

  42. Stańczyk, U.: Weighting of attributes in an embedded rough approach. In: Gruca, A., Czachórski, T., Kozielski, S. (eds.) Man-Machine Interactions. AISC, vol. 242, pp. 475–483. Springer, Berlin (2013)

  43. Sun, Y., Wu, D.: A RELIEF based feature extraction algorithm. In: Proceedings of the SIAM International Conference on Data Mining, pp. 188–195 (2008)

Acknowledgments

All texts used in the experiments are available for on-line reading and download thanks to Project Gutenberg (http://www.gutenberg.org). The 4eMka software used in DRSA processing [13, 33] was developed at the Laboratory of Intelligent Decision Support Systems (http://www-idss.cs.put.poznan.pl/), Poznan University of Technology, Poland. ANN simulations were run with the California Scientific Brainmaker software package. Features were ranked with the Relief algorithm as implemented in the WEKA software [15].

Author information

Corresponding author

Correspondence to Urszula Stańczyk.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Stańczyk, U. (2015). Feature Evaluation by Filter, Wrapper, and Embedded Approaches. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_3

  • DOI: https://doi.org/10.1007/978-3-662-45620-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45619-4

  • Online ISBN: 978-3-662-45620-0

  • eBook Packages: Engineering, Engineering (R0)