AIFSA: A New Approach for Feature Selection and Weighting

Fouad, Walid; Badr, Amr; Farag, Ibrahim

doi:10.1007/978-3-642-25453-6_49

Part of the book series: Communications in Computer and Information Science (CCIS, volume 252)

Included in the following conference series:

International Conference on Informatics Engineering and Information Science

Abstract

Feature selection is a typical search problem in which each state in the search space represents a candidate subset of features. Out of n features, 2^n subsets can be constructed; hence, an exhaustive search of all subsets becomes infeasible when n is relatively large. Feature selection is therefore done by employing a heuristic search algorithm that tries to reach the optimal feature subset. Here, we propose a new wrapper feature selection and weighting algorithm called the Artificial Immune Feature Selection Algorithm (AIFSA); the algorithm is based on the metaphors of the Clonal Selection Algorithm (CSA). AIFSA is not itself a classification algorithm; rather, it uses well-known classifiers to evaluate and promote candidate feature subsets. Experiments were performed on textual datasets such as WebKB and the Syskill&Webert web page ratings. Experimental results showed that AIFSA performs competitively against traditional, well-known filter feature selection approaches as well as some wrapper approaches from the literature.
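
The search-space framing in the abstract can be illustrated with a small sketch. It is not the authors' AIFSA, whose details the abstract does not give; it is a generic clonal-selection-style wrapper under stated assumptions. The synthetic data, the nearest-centroid evaluator, the function names (`make_data`, `centroid_accuracy`, `clonal_select`) and all parameter values are hypothetical, and the sketch covers binary selection only, not feature weighting. Each candidate ("antibody") is a boolean mask over n features, so the full space has 2^n masks; fitness is the hold-out accuracy of a classifier trained on the masked features, which is what makes the approach a wrapper.

```python
import random

def make_data(n_samples=60, n_features=8, informative=(0, 1), seed=0):
    """Synthetic two-class data (assumption): only `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(mask, X, y):
    """Wrapper fitness: hold-out accuracy of a nearest-centroid classifier
    that sees only the features selected by `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    half = len(X) // 2
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == y[i]
    return correct / (len(X) - half)

def clonal_select(X, y, pop_size=10, n_clones=5, generations=15, seed=1):
    """Clonal-selection-style search over feature masks (illustrative only)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return centroid_accuracy(mask, X, y)

    pop = [tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        clones = []
        for rank, antibody in enumerate(ranked[: pop_size // 2]):
            # Hypermutation: better-ranked antibodies get fewer bit flips.
            flips = 1 + rank * n // pop_size
            for _ in range(n_clones):
                clone = list(antibody)
                for j in rng.sample(range(n), flips):
                    clone[j] = not clone[j]
                clones.append(tuple(clone))
        # Elitist replacement: keep the best of parents and clones.
        pop = sorted(set(ranked) | set(clones), key=fitness, reverse=True)[:pop_size]
    best = pop[0]
    return best, fitness(best)

X, y = make_data()
best_mask, best_acc = clonal_select(X, y)
print("search space size:", 2 ** len(X[0]))
print("selected features:", [j for j, keep in enumerate(best_mask) if keep])
print("hold-out accuracy:", best_acc)
```

With 8 features the space has only 256 masks and could be enumerated directly; the heuristic search matters when n reaches the thousands, as is typical for text data. Any classifier could replace the nearest-centroid evaluator, since the search only needs a fitness score per mask.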

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Computers and Information, Cairo University, Egypt
Walid Fouad, Amr Badr & Ibrahim Farag

Editor information

Editors and Affiliations

Advanced Informatics School (UTM AIS), UTM International Campus, 54100, Kuala Lumpur, Malaysia
Azizah Abd Manaf, Akram Zeki, Mazdak Zamani & Suriayati Chuprat
Information Systems Department, King Saud University, Riyadh, Saudi Arabia
Eyas El-Qawasmeh

About this paper

Cite this paper

Fouad, W., Badr, A., Farag, I. (2011). AIFSA: A New Approach for Feature Selection and Weighting. In: Abd Manaf, A., Zeki, A., Zamani, M., Chuprat, S., El-Qawasmeh, E. (eds) Informatics Engineering and Information Science. ICIEIS 2011. Communications in Computer and Information Science, vol 252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25453-6_49

DOI: https://doi.org/10.1007/978-3-642-25453-6_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25452-9
Online ISBN: 978-3-642-25453-6
eBook Packages: Computer Science, Computer Science (R0)
