Abstract
Feature selection is a typical search problem in which each state in the search space represents a candidate subset of features. From n features, 2^n subsets can be constructed, so an exhaustive search of all subsets becomes infeasible when n is relatively large. Feature selection is therefore performed with a heuristic search algorithm that tries to reach the optimal feature subset. Here, we propose a new wrapper feature selection and weighting algorithm called the Artificial Immune Feature Selection Algorithm (AIFSA), which is based on the metaphors of the Clonal Selection Algorithm (CSA). AIFSA is not itself a classification algorithm; rather, it uses well-known classifiers to evaluate and promote candidate feature subsets. Experiments were performed on textual datasets such as WebKB and the Syskill&Webert web page ratings. The experimental results showed that AIFSA performs competitively against traditional, well-known filter feature selection approaches as well as some wrapper approaches from the literature.
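To make the idea of a wrapper search over feature subsets concrete, the following is a minimal sketch of a clonal-selection-style search over binary feature masks. It is not the authors' AIFSA: the abstract does not give the algorithm's details, so the population sizes, the rank-based mutation rate, the toy data generator, and the nearest-centroid classifier standing in for the "well-known classifiers" are all assumptions made for illustration. Every function name here is hypothetical, and the sketch covers only selection, not weighting.

```python
import random

def make_data(rng, n_samples=60, n_features=8, informative=(0, 1)):
    """Toy two-class data (assumed): only the `informative` columns depend on the label."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(mask, X_train, y_train, X_test, y_test):
    """Wrapper fitness: test accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_test)

def mutate(mask, rate, rng):
    """Flip each bit of the mask independently with probability `rate`."""
    return tuple(bit != (rng.random() < rate) for bit in mask)

def clonal_search(fitness, n_features, rng, pop_size=10, n_select=4,
                  n_clones=5, generations=15):
    """Clonal-selection-style search: clone the best masks, mutate, keep the fittest."""
    population = [tuple(rng.random() < 0.5 for _ in range(n_features))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        pool = list(ranked)  # parents stay in the pool, so the best mask is never lost
        for rank, antibody in enumerate(ranked[:n_select]):
            rate = 0.1 * (rank + 1)  # fitter masks are mutated less (assumed schedule)
            pool.extend(mutate(antibody, rate, rng) for _ in range(n_clones))
        unique = list(dict.fromkeys(pool))  # drop duplicates, keep order
        population = sorted(unique, key=fitness, reverse=True)[:pop_size]
    best = max(population, key=fitness)
    return best, fitness(best)

rng = random.Random(0)
X, y = make_data(rng)
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

def fitness(mask):
    return centroid_accuracy(mask, X_train, y_train, X_test, y_test)

best, score = clonal_search(fitness, n_features=8, rng=rng)
print("subsets in the full search space:", 2 ** 8)
print("selected mask:", best, "accuracy:", score)
```

With 8 features the full space has only 256 subsets and could be enumerated, but the same loop applies unchanged when n is large enough that enumeration is infeasible, which is the case the abstract describes.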
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fouad, W., Badr, A., Farag, I. (2011). AIFSA: A New Approach for Feature Selection and Weighting. In: Abd Manaf, A., Zeki, A., Zamani, M., Chuprat, S., El-Qawasmeh, E. (eds) Informatics Engineering and Information Science. ICIEIS 2011. Communications in Computer and Information Science, vol 252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25453-6_49
DOI: https://doi.org/10.1007/978-3-642-25453-6_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25452-9
Online ISBN: 978-3-642-25453-6
eBook Packages: Computer Science, Computer Science (R0)