Implementation of Web Search Result Clustering System

Hanumanthappa, M.; Prakash, B. R.

doi:10.1007/978-81-322-0740-5_94


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 174)


Abstract

Web search results clustering is an increasingly popular technique for grouping web search results into useful categories. This paper introduces a prototype web search results clustering engine based on the Modified Furthest Point First (M-FPF) algorithm, which uses random sampling and medoids instead of centroids to improve clustering quality. Cluster labeling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. M-FPF is compared against two established web document clustering algorithms, Suffix Tree Clustering (STC) and Lingo, both provided by the free, open-source Carrot2 Document Clustering Workbench. Cluster quality is measured in terms of precision, recall and relevance. Results from tests on different datasets show good clustering quality.
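The clustering and labeling steps summarized in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. Documents are assumed to be bag-of-words `Counter` vectors compared by cosine distance. Gonzalez's furthest-point-first (FPF) procedure picks k centers from a random sample of about √(n·k) documents (the sample size is an assumption), every document is assigned to its nearest center, and each cluster is then represented by its medoid rather than a centroid. For labels, the most frequent terms in each cluster (intra-cluster step) are re-ranked by a plain binary information gain against the rest of the collection (inter-cluster step); this stands in for the paper's own variant, which is not specified here. The helper names `fpf`, `mfpf_sketch`, `info_gain` and `label_clusters` are hypothetical.

```python
import math
import random
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse term-frequency vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def fpf(points, k, dist):
    """Gonzalez's furthest-point-first: indices of up to k centers."""
    centers = [0]  # arbitrary first center
    closest = [dist(p, points[0]) for p in points]  # distance to nearest center
    closest[0] = 0.0  # a center is at distance 0 from itself
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: closest[i])
        if closest[far] == 0:  # every remaining point coincides with a center
            break
        centers.append(far)
        closest = [min(closest[i], dist(points[i], points[far]))
                   for i in range(len(points))]
        closest[far] = 0.0
    return centers


def medoid(members, points, dist):
    """Member with the smallest total distance to the other members."""
    return min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members))


def mfpf_sketch(points, k, dist=cosine_distance, seed=0):
    """Hypothetical M-FPF-style clustering: FPF on a random sample, then medoids."""
    rng = random.Random(seed)
    n = len(points)
    size = min(n, max(k, int(math.sqrt(n * k))))  # ASSUMPTION: sample ~ sqrt(n*k)
    sample = rng.sample(range(n), size)
    centers = [sample[i] for i in fpf([points[i] for i in sample], k, dist)]
    clusters = {c: [] for c in centers}
    for i in range(n):  # assign each point to its nearest center
        nearest = min(centers, key=lambda c: dist(points[i], points[c]))
        clusters[nearest].append(i)
    # represent each non-empty cluster by its medoid instead of a centroid
    return [(medoid(m, points, dist), m) for m in clusters.values() if m]


def _entropy(*counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def info_gain(term_docs, cluster_docs, n):
    """Binary IG of term presence vs. 'in cluster' / 'rest' (not the paper's variant)."""
    a = len(term_docs & cluster_docs)  # has term, in cluster
    b = len(term_docs) - a             # has term, outside cluster
    c = len(cluster_docs) - a          # lacks term, in cluster
    d = n - a - b - c                  # lacks term, outside cluster
    cond = 0.0
    if a + b:
        cond += (a + b) / n * _entropy(a, b)
    if c + d:
        cond += (c + d) / n * _entropy(c, d)
    return _entropy(a + c, b + d) - cond


def label_clusters(docs, clusters, top_intra=10, n_labels=3):
    """docs: token lists; clusters: lists of member indices."""
    n = len(docs)
    postings = {}
    for i, toks in enumerate(docs):
        for t in set(toks):
            postings.setdefault(t, set()).add(i)
    labels = []
    for members in clusters:
        member_set = set(members)
        freq = Counter(t for i in members for t in set(docs[i]))
        candidates = [t for t, _ in freq.most_common(top_intra)]  # intra-cluster step
        ranked = sorted(candidates, reverse=True,                 # inter-cluster step
                        key=lambda t: info_gain(postings[t], member_set, n))
        labels.append(ranked[:n_labels])
    return labels


docs = [s.split() for s in [
    "jaguar car engine speed", "jaguar car dealer price", "car engine repair",
    "jaguar cat jungle habitat", "big cat jungle prey", "cat habitat rainforest"]]
clusters = mfpf_sketch([Counter(d) for d in docs], k=2)
labels = label_clusters(docs, [m for _, m in clusters])
```

The sketch computes exact medoids once, after assignment, which is quadratic in cluster size but cheap for snippet-sized collections; it does not try to reproduce any incremental medoid-update rule the paper may use. Ranking frequent terms by information gain favors words that are common inside a cluster but rare outside it, which is the intuition behind combining intra-cluster and inter-cluster extraction.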


References

Zamir, O., Etzioni, O.: Web document clustering: A feasibility demonstration. In: Proceedings of the 21st Annual International SIGIR Conference on Research and Development in Information Retrieval (1998)
Hanumanthappa, M., Prakash, B.R., Mamatha, M.: Improving the efficiency of document clustering and labeling using Modified FPF algorithm. In: Proceedings of the International Conference on Problem Solving and Soft Computing (2011)
Geraci, F., Leoncini, M., Montangero, M., Pellegrini, M., Renda, M.E.: FPF-SB: A Scalable Algorithm for Microarray Gene Expression Data Clustering. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 606–615. Springer, Heidelberg (2007)
Osinski, S., Weiss, D.: A concept-driven algorithm for clustering search results. IEEE Intelligent Systems 20(3), 48–54 (2005)
Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the 34th Annual ACM Symposium on the Theory of Computing, STOC 2002, Montreal, CA, pp. 380–388 (2002)
Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of the 14th International Conference on Machine Learning, ICML 1997, Nashville, US, pp. 412–420 (1997)
Ferragina, P., Gulli, A.: A personalized search engine based on Web-snippet hierarchical clustering. In: Special Interest Tracks and Poster Proceedings of the 14th International Conference on the World Wide Web, WWW 2005, Chiba, JP, pp. 801–810 (2005)
Crabtree, D., Gao, X., Andreae, P.: Standardized evaluation method for web clustering results. In: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (2005)
Matsumoto, T., Hung, E.: Fuzzy Clustering and Relevance Ranking of Web Search Results with Differentiating Cluster Label Generation
Geraci, F., Pellegrini, M., Maggini, M., Sebastiani, F.: Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 25–36. Springer, Heidelberg (2006)
Geraci, F., Pellegrini, M., Pisati, P., Sebastiani, F.: A scalable algorithm for high-quality clustering of Web snippets. In: Proceedings of the 21st ACM Symposium on Applied Computing, SAC 2006, Dijon, FR, pp. 1058–1062 (2006)
Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38(2/3), 293–306 (1985)


Author information

Authors and Affiliations

Bangalore University, Bangalore, India
M. Hanumanthappa & B. R. Prakash


Corresponding author

Correspondence to M. Hanumanthappa.

Editor information

Editors and Affiliations

M. S. Ramaiah Institute of Technology, Bengaluru, India
Aswatha Kumar M.
M. S. Ramaiah Institute of Technology, Bengaluru, India
Selvarani R.
M. S. Ramaiah Institute of Technology, Bengaluru, India
T V Suresh Kumar


About this paper

Cite this paper

Hanumanthappa, M., Prakash, B.R. (2013). Implementation of Web Search Result Clustering System. In: Kumar M., A., R., S., Kumar, T. (eds) Proceedings of International Conference on Advances in Computing. Advances in Intelligent Systems and Computing, vol 174. Springer, New Delhi. https://doi.org/10.1007/978-81-322-0740-5_94


DOI: https://doi.org/10.1007/978-81-322-0740-5_94
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-0739-9
Online ISBN: 978-81-322-0740-5
eBook Packages: Engineering, Engineering (R0)
