
Extended Document Representation for Search Result Clustering

Intelligent Tools for Building a Scientific Information Platform

Part of the book series: Studies in Computational Intelligence ((SCI,volume 390))

Abstract

Organizing query results into clusters facilitates quick navigation through search results and helps users specify their search intentions. Most meta-search engines group documents based on short fragments of source text called snippets. In many cases, this data representation model proves insufficient to reflect semantic correlations between documents. In this paper, we discuss a framework for extending document descriptions that utilizes domain knowledge and semantic similarity. Our idea is to apply the Tolerance Rough Set Model, together with semantic information extracted from the source text and a domain ontology, to approximate the concepts associated with documents and to enrich their vector representation.
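To make the idea of enriching a snippet's representation concrete, the following is a minimal sketch of the term-extension step as the Tolerance Rough Set Model is commonly formulated in the document clustering literature. It is not taken from this chapter: the co-occurrence-based tolerance relation, the threshold `theta`, the function names, and the toy snippets are all assumptions made for illustration, and the ontology-based enrichment mentioned above is not modelled.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents.
    (Assumed co-occurrence-based tolerance relation.)"""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class overlaps the document's terms."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Hypothetical toy snippets, already tokenized.
snippets = [
    ["rough", "set", "clustering"],
    ["rough", "set", "approximation"],
    ["rough", "clustering", "search"],
]
classes = tolerance_classes(snippets, theta=2)
extended = upper_approximation(snippets[2], classes)
print(sorted(extended))
```

In this toy run the third snippet gains the term "set", because "set" co-occurs with "rough" in two snippets. The extended term set could then be weighted and used in place of the original snippet vector before clustering.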

The authors are supported by grant N N516 077837 from the Ministry of Science and Higher Education of the Republic of Poland and by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 within the strategic scientific research and experimental development program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.



Author information

Corresponding author

Correspondence to S. Hoa Nguyen.

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

Nguyen, S.H., Świeboda, W., Jaśkiewicz, G. (2012). Extended Document Representation for Search Result Clustering. In: Bembenik, R., Skonieczny, L., Rybiński, H., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24809-2_6

  • DOI: https://doi.org/10.1007/978-3-642-24809-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24808-5

  • Online ISBN: 978-3-642-24809-2

  • eBook Packages: Engineering, Engineering (R0)
