
Part of the book series: Studies in Computational Intelligence (SCI, volume 512)

Abstract

Keyword extraction plays an increasingly important role in many text-related research areas. Applications that rely on feature-word selection include text mining, web page retrieval, text clustering, and text categorization. Methods for computing the keywords of a document have evolved considerably, but they still perform poorly in very high-dimensional state spaces. They are also inefficient because they depend heavily on human input, which makes them unsuitable for many applications. This paper presents a technique that extracts keywords without any manual support. A genetic-algorithm-based extraction computes the list of key terms for each document and, irrespective of text size, performs the required computation with better performance. The computation uses information taken from a structured document: the document is first converted into a numerical representation by assigning each distinct word a numerical weight, and the proposed method then applies an iterative genetic algorithm to discover the optimal key terms. The evolutionary process undergoes gradual changes that ensure the survival of the fittest. Experiments were carried out on three different data sets. The proposed method shows a high degree of correlation when its performance is compared against three existing methods: weighted term standard deviation, the Differential Text Categorizer, and the discourse method.
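The abstract names two steps (weight every distinct word, then let a genetic algorithm search for the optimal key terms) but does not give the weighting scheme, the chromosome encoding, or the fitness function. The following is a minimal Python sketch of that two-step pipeline under stated assumptions, not the authors' method: words are weighted with smoothed TF-IDF, a chromosome is a bit string over the document's vocabulary (1 means "this word is a keyword"), and the fitness, tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter values (`k`, `pop_size`, `generations`, `p_cross`, `p_mut`) are hypothetical choices made for illustration.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(doc_tokens, corpus_tokens):
    """Assign each distinct word of one document a numerical weight.

    Assumption: smoothed TF-IDF; the chapter's actual weighting may differ.
    """
    doc_sets = [set(d) for d in corpus_tokens]
    n_docs = len(doc_sets)
    counts = Counter(doc_tokens)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # always >= 1
        weights[term] = (count / len(doc_tokens)) * idf
    return weights


def fitness(chrom, terms, weights, k):
    """Hypothetical fitness: total weight of selected terms, penalised away from k terms."""
    chosen = [t for bit, t in zip(chrom, terms) if bit]
    penalty = 2 * max(weights.values()) * abs(len(chosen) - k)
    return sum(weights[t] for t in chosen) - penalty


def ga_keywords(weights, k=3, pop_size=30, generations=60,
                p_cross=0.8, p_mut=0.05, seed=0):
    """Evolve bit-string chromosomes (1 = term is a keyword); return the fittest terms."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    terms = sorted(weights)
    n = len(terms)

    def score(c):
        return fitness(c, terms, weights, k)

    def tournament(pop):
        # Survival of the fittest: the better of two random individuals reproduces.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    # Random initial population, each bit set with probability about k/n.
    pop = [[int(rng.random() < k / n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=score)  # elitism: the best individual survives unchanged
        children = [elite[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < p_cross and n > 1:  # one-point crossover
                cut = rng.randrange(1, n)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    chosen = [t for bit, t in zip(best, terms) if bit]
    return sorted(chosen, key=lambda t: -weights[t])


# Usage on a toy corpus (hypothetical data, not from the chapter's experiments).
corpus = [
    "genetic algorithms evolve keywords; genetic operators select keywords",
    "text clustering groups documents by shared words",
    "search engines index web documents by keywords",
]
tokens = [tokenize(d) for d in corpus]
w = tfidf_weights(tokens[0], tokens)
top_terms = ga_keywords(w, k=3)
print(top_terms)
```

With this particular toy fitness the global optimum is simply the `k` highest-weighted words, so the GA only rediscovers a top-k selection; the evolutionary search becomes worthwhile once the fitness scores interactions between terms (for example co-occurrence or position), which is where a real implementation would differ from this sketch.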

Author information

Correspondence to J. Dafni Rose.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Rose, J.D., Dev, D.D., Robin, C.R.R. (2014). An Improved Genetic Based Keyword Extraction Technique. In: Terrazas, G., Otero, F., Masegosa, A. (eds) Nature Inspired Cooperative Strategies for Optimization (NICSO 2013). Studies in Computational Intelligence, vol 512. Springer, Cham. https://doi.org/10.1007/978-3-319-01692-4_12

  • DOI: https://doi.org/10.1007/978-3-319-01692-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01691-7

  • Online ISBN: 978-3-319-01692-4

  • eBook Packages: Engineering, Engineering (R0)
