Search Word Extraction Using Extended PageRank Calculations

Part of the Studies in Computational Intelligence book series (SCI, volume 391)


This paper describes a new method for determining characteristic terms in texts by weighting them with extended PageRank calculations. The method also clusters the semantic term relations it finds and assigns each term a level of specificity, which makes it possible to distinguish between general and specific terms. It can also tell apart terms of different semantic orientation within the same specificity level. The experiments show which terms can be used to retrieve semantically similar documents from large corpora such as the World Wide Web through automatic query formulation. Choosing query terms from a different specificity level is also useful in interactive document retrieval, because it lets the user express how similar the documents to be found should be. A further advantage of the method is that it does not rely on third-party datasets and works on single texts.
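The core idea of weighting terms by their centrality in a term graph can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the chapter's method. It uses plain weighted PageRank instead of the extended PageRank calculation described in the chapter. Edge weights are raw sentence-level co-occurrence counts rather than a significance measure, and the stop-word list is a tiny hypothetical one. The names `cooccurrence_graph`, `weighted_pagerank` and `query_terms` are illustrative. The specificity clustering is not attempted.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "by", "on", "for", "as"}


def cooccurrence_graph(text):
    """Build an undirected term graph; edge weight = number of sentences in which
    both terms occur. Assumption: raw counts stand in for a significance measure."""
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in re.split(r"[.!?]+", text.lower()):
        terms = sorted({w for w in re.findall(r"[a-z]+", sentence) if w not in STOP_WORDS})
        for a, b in combinations(terms, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Plain weighted PageRank by power iteration (not the chapter's extended variant).
    Every node has at least one edge, so there are no dangling nodes."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            # Distribute u's rank to its neighbours in proportion to edge weight.
            for v, w in graph[u].items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


def query_terms(text, k=3):
    """Hypothetical helper: the k highest-ranked terms form a search query."""
    ranks = weighted_pagerank(cooccurrence_graph(text))
    return sorted(ranks, key=lambda t: (-ranks[t], t))[:k]
```

On the sample text "PageRank ranks pages. PageRank ranks terms in a graph. A term graph links terms.", `query_terms(text, k=2)` returns `graph` and `terms`, the two best-connected words. Such terms could then be sent to a search engine as an automatically formulated query.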







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Faculty of Mathematics and Computer Science, FernUniversity in Hagen, Hagen, Germany
