AIFSA: A New Approach for Feature Selection and Weighting

  • Conference paper
Informatics Engineering and Information Science (ICIEIS 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 252)

Abstract

Feature selection is a typical search problem in which each state in the search space represents a candidate subset of features. Out of n features, 2^n subsets can be constructed; hence, an exhaustive search of all subsets becomes infeasible when n is relatively large. Feature selection is therefore done with a heuristic search algorithm that tries to reach the optimal feature subset. Here, we propose a new wrapper feature selection and weighting algorithm called the Artificial Immune Feature Selection Algorithm (AIFSA), which is based on the metaphors of the Clonal Selection Algorithm (CSA). AIFSA is not itself a classification algorithm; rather, it uses well-known classifiers to evaluate and promote candidate feature subsets. Experiments were performed on textual datasets such as WebKB and the Syskill&Webert web page ratings. The results showed that AIFSA performs competitively against traditional, well-known filter feature selection approaches as well as some wrapper approaches from the literature.
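
To make the wrapper idea concrete, the following is a minimal sketch of a clonal-selection-style search over feature subsets. It is not the authors' AIFSA: the abstract does not give the algorithm's details, so the clone counts, mutation rates, the leave-one-out 1-nearest-neighbour scorer, the toy data, and every function name here (accuracy, mutate, clonal_search, toy_data) are assumptions made for illustration. It also covers subset selection only, not feature weighting.

```python
import random

def accuracy(mask, X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that sees
    only the features switched on in `mask` (a stand-in wrapped classifier)."""
    idx = [k for k, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[k] - xj[k]) ** 2 for k in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)

def mutate(mask, rate, rng):
    """Flip each bit of the subset mask independently with probability `rate`."""
    return tuple(bit ^ (rng.random() < rate) for bit in mask)

def clonal_search(X, y, pop_size=8, generations=10, seed=0):
    """Assumed clonal-selection loop: rank, clone, hypermutate, reselect."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        # Each distinct subset is scored by the classifier only once.
        if mask not in cache:
            cache[mask] = accuracy(mask, X, y)
        return cache[mask]

    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: (-fitness(m), m))
        clones = []
        for rank, mask in enumerate(ranked):
            n_clones = pop_size - rank          # fitter subsets get more clones
            rate = (rank + 1) / (2 * pop_size)  # and a lower mutation rate
            clones += [mutate(mask, rate, rng) for _ in range(n_clones)]
        # Parents stay in the pool, so the best score never decreases.
        pool = set(ranked) | set(clones)
        pop = sorted(pool, key=lambda m: (-fitness(m), m))[:pop_size]
    return pop[0], fitness(pop[0])

def toy_data(seed=1, n_samples=20, n_features=6):
    """Hypothetical data: feature 0 tracks the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        noise = [rng.gauss(0, 1) for _ in range(n_features - 1)]
        X.append([label + rng.gauss(0, 0.1)] + noise)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = toy_data()
    best_mask, best_score = clonal_search(X, y)
    print("search space size:", 2 ** len(X[0]))
    print("best subset mask:", best_mask, "accuracy:", best_score)
```

With only 6 features the 2^n = 64 subsets could be enumerated outright; the heuristic search only pays off when n is large enough that enumeration is out of reach, as with the term features of a text collection.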

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fouad, W., Badr, A., Farag, I. (2011). AIFSA: A New Approach for Feature Selection and Weighting. In: Abd Manaf, A., Zeki, A., Zamani, M., Chuprat, S., El-Qawasmeh, E. (eds) Informatics Engineering and Information Science. ICIEIS 2011. Communications in Computer and Information Science, vol 252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25453-6_49

  • DOI: https://doi.org/10.1007/978-3-642-25453-6_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25452-9

  • Online ISBN: 978-3-642-25453-6

  • eBook Packages: Computer Science, Computer Science (R0)