Search Word Extraction Using Extended PageRank Calculations

  • Chapter

Part of the Studies in Computational Intelligence book series (SCI, volume 391)

Abstract

This paper describes a new method for determining characteristic terms in texts by weighting them with extended PageRank calculations. The method also clusters the semantic term relations it finds and assigns each term a level of specificity, which makes it possible to distinguish general terms from specific ones. In this way, terms of different semantic orientations within the same specificity level can also be told apart. The experiments show which terms can be used to retrieve semantically similar documents automatically from large corpora such as the World Wide Web through automatic query formulation. Selecting query terms of a different specificity level is also a useful instrument in interactive document retrieval for expressing the intended similarity of the documents to be found. An added advantage of this method is that it does not rely on third-party datasets and works on single texts.
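
As a rough illustration of the general idea of weighting terms with PageRank, the following minimal sketch ranks the terms of a single text. It is not the extended calculation described in the chapter: it assumes plain PageRank with a damping factor of 0.85, an unweighted sentence-level co-occurrence graph, and whitespace tokenization. The function names `cooccurrence_graph`, `pagerank`, and `top_terms` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Link every pair of terms that appear in the same sentence (assumption:
    sentence-level co-occurrence, lowercase whitespace tokens)."""
    graph = defaultdict(set)
    for sentence in sentences:
        terms = set(sentence.lower().split())
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected, unweighted graph.
    This is the standard formulation, not the chapter's extended variant."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour m passes on an equal share of its current rank.
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def top_terms(sentences, k=3):
    """Return the k highest-ranked terms (ties broken alphabetically)."""
    ranks = pagerank(cooccurrence_graph(sentences))
    return sorted(ranks, key=lambda t: (-ranks[t], t))[:k]
```

Calling `top_terms(sentences, k)` on the sentences of one document yields candidate query terms. A term that co-occurs with many others collects rank from all of them, so it ends up at the top of the list.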

Keywords

  • Query Term
  • Semantic Context
  • Similar Document
  • Characteristic Term
  • Document Corpus

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • Chapter length: 13 pages

Author information

Corresponding author

Correspondence to Mario Kubek.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kubek, M., Unger, H. (2012). Search Word Extraction Using Extended PageRank Calculations. In: Unger, H., Kyamakya, K., Kacprzyk, J. (eds) Autonomous Systems: Developments and Trends. Studies in Computational Intelligence, vol 391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24806-1_25

  • DOI: https://doi.org/10.1007/978-3-642-24806-1_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24805-4

  • Online ISBN: 978-3-642-24806-1

  • eBook Packages: Engineering, Engineering (R0)