Extended Document Representation for Search Result Clustering

Nguyen, S. Hoa; Świeboda, Wojciech; Jaśkiewicz, Grzegorz

doi:10.1007/978-3-642-24809-2_6

Part of the book series: Studies in Computational Intelligence (SCI, volume 390)

Abstract

Organizing query results into clusters facilitates quick navigation through search results and helps users specify their search intentions. Most meta-search engines group documents based on short fragments of source text called snippets. In many cases, such a data representation proves insufficient to reflect semantic correlations between documents. In this paper, we discuss a framework for extending document descriptions that utilizes domain knowledge and semantic similarity. Our idea is to apply the Tolerance Rough Set Model, together with semantic information extracted from the source text and a domain ontology, to approximate the concepts associated with documents and to enrich their vector representation.
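The abstract names the Tolerance Rough Set Model (TRSM) as the basis for enriching short snippet vectors. As a rough illustration only, the Python sketch below builds tolerance classes from document co-occurrence counts and adds the terms in a document's upper approximation to its vector. It covers only the tolerance-based part: the semantic-similarity and ontology components are not modeled. The threshold `theta`, the function names, and the weighting formula (modeled on a scheme described in the TRSM clustering literature) are assumptions for illustration, not the chapter's exact procedure.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical helpers sketching TRSM-style enrichment; the names and the
# weighting scheme are illustrative assumptions, not the chapter's method.

def tolerance_classes(docs, theta):
    """Return I_theta(t): t plus every term co-occurring with t in >= theta documents."""
    pair_counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1
    classes = {t: {t} for doc in docs for t in doc}
    for (a, b), n in pair_counts.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes

def upper_approximation(doc, classes):
    """Terms whose tolerance class overlaps the document's own terms."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}

def enriched_vector(doc, docs, classes):
    """Weight the document's own terms by tf-idf, then add upper-approximation
    terms with a damped weight no larger than the smallest original weight.
    Assumes doc is one of docs, so every term has a nonzero document frequency."""
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    tf = Counter(doc)
    weights = {t: (1 + math.log(tf[t])) * math.log(n_docs / df[t]) for t in tf}
    base = min(weights.values())
    for t in upper_approximation(doc, classes) - set(tf):
        idf = math.log(n_docs / df[t])
        weights[t] = base * idf / (1 + idf)
    # Normalize to unit length (skipped if every weight is zero).
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights

if __name__ == "__main__":
    snippets = [
        ["rough", "set", "cluster"],
        ["rough", "set", "model"],
        ["rough", "tolerance"],
        ["search", "cluster"],
    ]
    classes = tolerance_classes(snippets, theta=2)
    print(enriched_vector(snippets[2], snippets, classes))
```

With `theta=2`, only "rough" and "set" co-occur often enough to tolerate each other, so the third snippet, which never mentions "set", gains that term with a small positive weight. Keeping added terms below the weakest original term is one way to let enrichment connect related snippets without letting it dominate the terms a snippet actually contains.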

The authors are supported by grant N N516 077837 from the Ministry of Science and Higher Education of the Republic of Poland and by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.

Author information

Authors and Affiliations

Faculty of Mathematics, Informatics, and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
S. Hoa Nguyen, Wojciech Świeboda & Grzegorz Jaśkiewicz

Corresponding author

Correspondence to S. Hoa Nguyen.

Editor information

Editors and Affiliations

Institute of Computer Science, Faculty of Electronics and Information, Warsaw University of Technology, Ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
Robert Bembenik
Institute of Computer Science, Faculty of Electronics and Information, Warsaw University of Technology, Ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
Lukasz Skonieczny
Institute of Computer Science, Faculty of Computer Science and, Warsaw University of Technology, ul. Zolnierska 49, Warsaw, 00-665, Poland
Henryk Rybiński
Interdisciplinary Centre for Mathematica, University of Warsaw, Ul. Pawińskiego 5a, Warsaw, 02-106, Poland
Marek Niezgodka

About this chapter

Cite this chapter

Nguyen, S.H., Świeboda, W., Jaśkiewicz, G. (2012). Extended Document Representation for Search Result Clustering. In: Bembenik, R., Skonieczny, L., Rybiński, H., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24809-2_6

DOI: https://doi.org/10.1007/978-3-642-24809-2_6
Published: 24 January 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24808-5
Online ISBN: 978-3-642-24809-2
eBook Packages: Engineering, Engineering (R0)
