Abstract
Keyword extraction plays an increasingly crucial role in text-related research. Applications that rely on feature word selection include text mining, web page retrieval, text clustering, and text categorization. Methods for computing the keywords of a document have evolved through a series of refinements. Nevertheless, existing methods do not perform well in very high-dimensional state spaces, and they are inefficient because they depend heavily on human input, which makes them unsuitable for several applications. This paper presents a technique that extracts keywords without any manual support. A genetic-based extraction computes the list of key terms for each document. Irrespective of text size, the proposed method performs the required computation with a high level of performance. Calculations use information taken from a structured document. The document is then converted into a numerical representation by assigning a numerical weight to each distinct word. The proposed method applies an iterative computation with a genetic algorithm to discover the optimal key terms; the evolutionary technique undergoes gradual changes that ensure the survival of the fittest. Experiments were conducted on three different data sets. The proposed method shows a high degree of correlation when its performance is compared against the existing methods: weighted term standard deviation, the differential text categorizer method, and the discourse method.
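The pipeline outlined in the abstract (weight each distinct word, then let a genetic algorithm iteratively select key terms) can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, the weighting scheme, or the fitness function, so everything below is an assumption made for illustration: a bit-string encoding over the vocabulary, normalised term frequency as the weight, a fitness that sums the selected weights and penalises selections larger than `k`, and the hypothetical names `term_weights`, `fitness`, and `extract_keywords`. It is not the chapter's actual method.

```python
import random
import re
from collections import Counter


def term_weights(text):
    """Assumed weighting: normalised term frequency for each distinct word."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def fitness(chromosome, vocab, weights, k):
    """Hypothetical fitness: sum of selected weights, minus a penalty
    of one maximum weight for every selected term beyond k."""
    chosen = [w for bit, w in zip(chromosome, vocab) if bit]
    score = sum(weights[w] for w in chosen)
    excess = max(0, len(chosen) - k)
    return score - excess * max(weights.values())


def extract_keywords(text, k=3, pop_size=30, generations=60,
                     mutation_rate=0.05, seed=0):
    """Evolve bit strings over the vocabulary and return the selected terms."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    weights = term_weights(text)
    vocab = sorted(weights)
    n = len(vocab)
    if n < 2:                          # nothing to evolve
        return vocab

    def score(chromosome):
        return fitness(chromosome, vocab, weights, k)

    # Each chromosome marks, per vocabulary word, whether it is a key term.
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Selection: keep the fitter half ("survival of the fittest").
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability mutation_rate per gene.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = survivors + children

    best = max(population, key=score)
    selected = [w for bit, w in zip(best, vocab) if bit]
    return sorted(selected, key=weights.get, reverse=True)
```

Calling `extract_keywords(document_text, k=5)` returns the words selected by the fittest chromosome, ordered by weight. Because the search is stochastic, the result is not guaranteed to contain exactly `k` terms; a practical system would also remove stop words and use a richer weighting than raw term frequency.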
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Rose, J.D., Dev, D.D., Robin, C.R.R. (2014). An Improved Genetic Based Keyword Extraction Technique. In: Terrazas, G., Otero, F., Masegosa, A. (eds) Nature Inspired Cooperative Strategies for Optimization (NICSO 2013). Studies in Computational Intelligence, vol 512. Springer, Cham. https://doi.org/10.1007/978-3-319-01692-4_12
DOI: https://doi.org/10.1007/978-3-319-01692-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01691-7
Online ISBN: 978-3-319-01692-4
eBook Packages: Engineering, Engineering (R0)