An Improved Genetic Based Keyword Extraction Technique

Rose, J. Dafni; Dev, Divya D.; Robin, C. R. Rene

doi:10.1007/978-3-319-01692-4_12

Part of the book series: Studies in Computational Intelligence (SCI, volume 512)

Abstract

Keyword extraction plays an increasingly crucial role in many text-related research areas. Applications that rely on feature word selection include text mining, web page retrieval, text clustering and text categorization. Methods for computing the keywords of a document have evolved through a series of refinements; however, they do not perform well in very high-dimensional state spaces. They are also inefficient because they depend heavily on human input, which is undesirable in many applications. This paper presents a technique that extracts keywords without any manual support. The genetic-based extraction method computes the list of key terms for each document and achieves a higher level of performance irrespective of text size. Calculations start from a structured document, which is converted into a numerical representation by assigning each distinct word a numerical weight. The proposed method then applies a genetic algorithm iteratively to discover the optimal key terms; the evolutionary process makes gradual changes that ensure the survival of the fittest. Experiments were conducted on three different data sets. When its performance was compared with the existing weighted term standard deviation method, the Differential Text Categorizer method and the discourse method, the proposed method showed a high degree of correlation.
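The abstract describes two stages: each distinct word receives a numerical weight, and a genetic algorithm then iteratively searches for the best set of key terms. The chapter's actual weighting scheme, chromosome encoding, fitness function and genetic operators are not given here, so the following Python sketch fills them in with clearly labeled assumptions: TF-IDF weights with the smoothed IDF form that scikit-learn's `TfidfVectorizer` uses by default, fixed-length chromosomes of k distinct terms, a fitness equal to the summed weight of the chosen terms, tournament selection, union-based crossover, single-gene mutation and elitism. All function names (`term_weights`, `ga_keywords`, and so on) and parameter values are hypothetical. It illustrates the general shape of a genetic keyword search, not the authors' implementation.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the chapter's preprocessing is not described here.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_weights(doc, corpus):
    """Give every distinct word of `doc` a TF-IDF weight (assumed weighting scheme)."""
    tf = Counter(tokenize(doc))
    total = sum(tf.values())
    doc_sets = [set(tokenize(d)) for d in corpus]
    weights = {}
    for term, count in tf.items():
        df = sum(1 for s in doc_sets if term in s)
        # Smoothed IDF: terms that occur in every document keep a small positive weight.
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1
        weights[term] = (count / total) * idf
    return weights


def fitness(chrom, weights):
    """Assumed fitness: the summed weight of the candidate keywords."""
    return sum(weights[t] for t in chrom)


def select(population, weights, rng, size=3):
    """Tournament selection: the fittest of `size` randomly drawn individuals wins."""
    return max(rng.sample(population, size), key=lambda c: fitness(c, weights))


def crossover(p1, p2, k, rng):
    """Build a child by sampling k distinct terms from the union of both parents."""
    pool = sorted(set(p1) | set(p2))
    return tuple(sorted(rng.sample(pool, k)))


def mutate(chrom, vocab, rate, rng):
    """With probability `rate`, replace one gene with a term not yet in the chromosome."""
    genes = list(chrom)
    if rng.random() < rate:
        unused = [t for t in vocab if t not in genes]
        if unused:
            genes[rng.randrange(len(genes))] = rng.choice(unused)
    return tuple(sorted(genes))


def ga_keywords(weights, k=3, pop_size=20, generations=40, mutation_rate=0.2, seed=0):
    """Evolve sets of k keywords; return the best set and the best fitness per generation."""
    rng = random.Random(seed)
    vocab = sorted(weights)
    population = [tuple(sorted(rng.sample(vocab, k))) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=lambda c: fitness(c, weights))
        history.append(fitness(best, weights))
        # Elitism: the current best chromosome is copied unchanged into the next generation.
        next_pop = [best]
        while len(next_pop) < pop_size:
            child = crossover(select(population, weights, rng),
                              select(population, weights, rng), k, rng)
            next_pop.append(mutate(child, vocab, mutation_rate, rng))
        population = next_pop
    best = max(population, key=lambda c: fitness(c, weights))
    return list(best), history


# Toy usage: weight the first document against a three-document corpus, then evolve 3 keywords.
corpus = [
    "Genetic algorithms evolve candidate keyword sets for each document.",
    "Keyword extraction supports text clustering and text categorization.",
    "Web page retrieval ranks documents by query terms.",
]
weights = term_weights(corpus[0], corpus)
keywords, history = ga_keywords(weights, k=3)
print(keywords)
```

Because this toy fitness is additive, simply taking the k highest-weighted terms would reach the same optimum. A genetic search becomes worthwhile when the fitness couples terms, for example through co-occurrence or redundancy penalties, which this sketch does not model.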

References

Abdelmalek, A., Zakaria, E., Ladjel, B., Michel, S., Mimoun, M.: Concept-Based Clustering of Textual Documents Using SOM. In: Computer Systems and Applications (AICCSA), pp. 156–163 (2008)
Berend, G., Farkas, R.: Feature engineering for keyphrase extraction. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 186–189. ACL, Uppsala (2010)
Bracewell, D.B., Ren, F., Kuroiwa, S.: Multilingual single document keyword extraction for information retrieval. In: IEEE International Conference on Natural Language Processing and Knowledge Engineering, pp. 517–522 (2005)
Khalessizadeh, S.M., Zaefarian, R., Nasseri, S.H., Ardil, E.: Genetic Mining: Using Genetic Algorithm for Topic based on Concept Distribution Word. World Academy of Science, Engineering and Technology
Kian, H.H., Zahedi, M.: An efficient approach for keyword selection: improving accessibility of web contents by general search engines. International Journal of Web & Semantic Technology 2(4) (2011)
Zhang, K., Xu, H., Tang, J., Li, J.: Keyword Extraction Using Support Vector Machine, pp. 85–96. Springer, Berlin (2006)
Matsuo, Y., Ishizuka, M.: Keyword Extraction from a single document using word co-occurrence statistical information. Int. J. Artificial Intelligence Tools 13 (2004)
Murugeshan, M.S., Lakshmi, K., Mukerjee, S.: A negative category based approach for Wikipedia document classification. Int. J. Knowledge Engineering and Data Mining 1(1), 84–97 (2010)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing & Management 24(5), 513–523 (1988)
Srinivas, M., Patnaik, L.M.: Adaptive Probabilities of Crossover and Mutation in Genetic Algorithm. IEEE Transactions on Systems, Man and Cybernetics 24(4) (1994)
Weng, S.S., Lin, Y.-J.: A Study on searching for document based on multiple concepts and distribution of concepts. Expert Systems with Applications, pp. 355–368. Elsevier (2003)
Kawahara, T., Hasegawa, M., Shitaoka, K., Kitade, T., Nanjo, H.: Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers. IEEE Transactions on Speech and Audio Processing 12, 409–419 (2004)
You, W., Fontaine, D., Barthes, J.-P.: An automatic Key phrase extraction system for scientific documents. In: Knowledge and Information Systems. Springer-Verlag London Limited (2012), doi:10.1007/s10115-012-0480-2
Xue, X.-B., Zhou, Z.-H.: Distributional Features for Text Categorization. IEEE Transactions on Knowledge and Data Engineering
Zhang, K., Xu, H., Tang, J., Li, J.: Keyword extraction using support vector machine. In: Yu, J.X., Kitsuregawa, M., Leong, H.-V. (eds.) WAIM 2006. LNCS, vol. 4016, pp. 85–96. Springer, Heidelberg (2006)
Li, Z., Zhou, D., Juan, Y.F., Han, J.: Keyword Extraction for Social Snippets. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1143–1144 (2010)

Author information

Authors and Affiliations

St. Joseph’s Institute of Technology, Chennai, 119, India
J. Dafni Rose
St. Joseph’s College of Engineering, Chennai, 119, India
Divya D. Dev
Jerusalem College of Engineering, Chennai, 100, India
C. R. Rene Robin

Corresponding author

Correspondence to J. Dafni Rose.

Editor information

Editors and Affiliations

School of Computer Science, University of Nottingham, Nottingham, United Kingdom
German Terrazas
School of Computing, University of Kent, Canterbury, Kent, United Kingdom
Fernando E. B. Otero
Center for Research on ICT, University of Granada, Granada, Spain
Antonio D. Masegosa

About this chapter

Cite this chapter

Rose, J.D., Dev, D.D., Robin, C.R.R. (2014). An Improved Genetic Based Keyword Extraction Technique. In: Terrazas, G., Otero, F., Masegosa, A. (eds) Nature Inspired Cooperative Strategies for Optimization (NICSO 2013). Studies in Computational Intelligence, vol 512. Springer, Cham. https://doi.org/10.1007/978-3-319-01692-4_12

DOI: https://doi.org/10.1007/978-3-319-01692-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01691-7
Online ISBN: 978-3-319-01692-4
eBook Packages: Engineering, Engineering (R0)
