An Improved Genetic Based Keyword Extraction Technique

  • J. Dafni Rose
  • Divya D. Dev
  • C. R. Rene Robin
Part of the Studies in Computational Intelligence book series (SCI, volume 512)

Abstract

Keyword extraction plays an increasingly important role in text-related research. Applications that rely on feature word selection include text mining, web page retrieval, text clustering, and text categorization. Methods for computing the keywords of a document have evolved through a series of refinements. Nevertheless, they do not perform well in very high-dimensional state spaces, and they are inefficient because they depend heavily on human input, which makes them unsuitable for many applications. This paper presents a technique that extracts keywords without any manual support. The genetic-based extraction computes a list of key terms for each document and, irrespective of text size, performs the required computation with a high level of performance. Calculations use information taken from a structured document. The document is then converted into a numerical representation by assigning each distinct word a numerical weight. The proposed method applies iterative computation with a genetic algorithm to discover the optimal key terms; the evolutionary technique proceeds through gradual changes that ensure the survival of the fittest. Experiments were conducted on three different data sets. The proposed method shows a high degree of correlation when its performance is checked against the existing weighted term standard deviation method, the Differential Text Categorizer method, and the discourse method.
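
The pipeline outlined in the abstract (weight the distinct words, then evolve candidate keyword sets through selection, crossover, and mutation) can be illustrated with a minimal sketch. It is not the authors' implementation: the normalized term-frequency weighting, the sum-of-weights fitness, the set-based crossover, and every function and parameter name (`term_weights`, `extract_keywords`, `pop_size`, `mutation_rate`, and so on) are assumptions made for illustration only. The sketch also skips stop-word filtering, so very common words may dominate on real text.

```python
import random
import re
from collections import Counter


def term_weights(text):
    """Map each distinct word to a normalized frequency (an assumed weighting)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def fitness(chromosome, vocab, weights):
    """Assumed fitness: sum of the weights of the selected terms."""
    return sum(weights[vocab[i]] for i in chromosome)


def crossover(a, b, k, rng):
    """Child draws k distinct genes from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, k)))


def mutate(chromosome, n_vocab, rate, rng):
    """With probability `rate`, swap a gene for a term not yet selected."""
    genes = set(chromosome)
    for g in chromosome:
        if rng.random() < rate:
            candidates = [i for i in range(n_vocab) if i not in genes]
            if candidates:
                genes.remove(g)
                genes.add(rng.choice(candidates))
    return tuple(sorted(genes))


def extract_keywords(text, k=3, pop_size=20, generations=40,
                     mutation_rate=0.1, seed=0):
    """Evolve sets of k term indices and return the fittest set as words."""
    rng = random.Random(seed)
    weights = term_weights(text)
    vocab = sorted(weights)
    k = min(k, len(vocab))
    population = [tuple(sorted(rng.sample(range(len(vocab)), k)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Survival of the fittest: keep the better half unchanged.
        population.sort(key=lambda c: fitness(c, vocab, weights), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, k, rng)
            children.append(mutate(child, len(vocab), mutation_rate, rng))
        population = survivors + children
    best = max(population, key=lambda c: fitness(c, vocab, weights))
    return sorted((vocab[i] for i in best), key=lambda w: -weights[w])


if __name__ == "__main__":
    sample = ("genetic algorithm genetic keyword keyword keyword "
              "extraction mutation crossover genetic")
    print(extract_keywords(sample, k=3, seed=1))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases between generations. A fixed `seed` makes runs reproducible, which helps when comparing weighting schemes.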

Keywords

Genetic algorithms; Weighted Term Standard Deviation; Genetic based algorithm; Mutation; Crossover

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • J. Dafni Rose (1)
  • Divya D. Dev (2)
  • C. R. Rene Robin (3)
  1. St. Joseph’s Institute of Technology, Chennai, India
  2. St. Joseph’s College of Engineering, Chennai, India
  3. Jerusalem College of Engineering, Chennai, India
