Implementation of Web Search Result Clustering System

Conference paper
Proceedings of International Conference on Advances in Computing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 174)

Abstract

Web search results clustering is an increasingly popular technique for organizing web search results into useful groups. This paper introduces a prototype web search results clustering engine based on the Modified Furthest Point First (M-FPF) algorithm, which uses random sampling together with medoids instead of centroids to improve clustering quality. Cluster labeling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. M-FPF is compared against two established web document clustering algorithms, Suffix Tree Clustering (STC) and Lingo, both provided by the free, open-source Carrot2 Document Clustering Workbench. We measure cluster quality in terms of precision, recall and relevance. Results from tests on different datasets show considerable clustering quality.
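The clustering step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes snippets are sparse term-weight dictionaries compared with cosine distance, that Gonzalez's furthest-point-first (FPF) procedure runs on a random sample of about √(nk) snippets (a sample size chosen here as an assumption), and that each sampled center is then replaced by the medoid of its cluster before a final reassignment. The names `m_fpf_sketch`, `furthest_point_first`, `medoid` and `cosine_distance` are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def furthest_point_first(points, k, dist):
    """Gonzalez's FPF: each new center is the point furthest from all
    centers chosen so far. Returns positions within `points`."""
    centers = [0]
    nearest = [dist(points[0], p) for p in points]
    while len(centers) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:  # remaining points coincide with centers
            break
        centers.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(points[nxt], p))
    return centers


def medoid(members, points, dist):
    """Member with the smallest total distance to the other members."""
    return min(members, key=lambda m: sum(dist(points[m], points[o]) for o in members))


def m_fpf_sketch(points, k, dist=cosine_distance, seed=0):
    """Hypothetical M-FPF-style clustering: FPF on a random sample,
    assignment of every point, then medoids as cluster representatives."""
    n = len(points)
    # Assumption: the sample holds about sqrt(n * k) points (at least k).
    size = min(n, max(k, math.ceil(math.sqrt(n * k))))
    sample = random.Random(seed).sample(range(n), size)
    picked = furthest_point_first([points[i] for i in sample], k, dist)
    centers = [sample[c] for c in picked]

    def assign(reps):
        # Index of the nearest representative for every point.
        return [min(range(len(reps)), key=lambda j: dist(points[i], points[reps[j]]))
                for i in range(n)]

    labels = assign(centers)
    # Replace each sampled center by the medoid of its cluster, reassign once.
    medoids = []
    for j, center in enumerate(centers):
        members = [i for i in range(n) if labels[i] == j] or [center]
        medoids.append(medoid(members, points, dist))
    return assign(medoids), medoids
```

Using medoids means each cluster representative is an actual snippet rather than an averaged, denser vector. Note that the exact medoid computation in this sketch is quadratic in cluster size.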

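The labeling step can be illustrated with the standard information gain between term presence and cluster membership, which for a binary split is the mutual information IG(t, c) = Σ P(t', c') log2(P(t', c') / (P(t') P(c'))) summed over t' ∈ {t, ¬t} and c' ∈ {c, ¬c}. The paper uses a variant of this measure whose details are not reproduced here, so the combination below is an assumption: candidate terms are drawn from inside the cluster (intra-cluster) and ranked by how well they separate it from the other clusters (inter-cluster). `information_gain` and `label_candidates` are hypothetical names.

```python
import math


def information_gain(docs, labels, term, cluster):
    """Information gain (mutual information, in bits) between the events
    'document contains term' and 'document belongs to cluster'."""
    n = len(docs)
    gain = 0.0
    for has_term in (True, False):
        p_t = sum(1 for d in docs if (term in d) == has_term) / n
        for in_cluster in (True, False):
            p_c = sum(1 for lab in labels if (lab == cluster) == in_cluster) / n
            joint = sum(1 for d, lab in zip(docs, labels)
                        if (term in d) == has_term and (lab == cluster) == in_cluster) / n
            if joint > 0.0:
                gain += joint * math.log2(joint / (p_t * p_c))
    return gain


def label_candidates(docs, labels, cluster, top=3):
    """Hypothetical labeler: terms occurring inside the cluster (intra-cluster),
    ranked by how well they separate it from other clusters (inter-cluster)."""
    terms = {t for d, lab in zip(docs, labels) if lab == cluster for t in d}
    ranked = sorted(terms, key=lambda t: (-information_gain(docs, labels, t, cluster), t))
    return ranked[:top]
```

For four snippets returned for the ambiguous query "jaguar", split into a car cluster and an animal cluster, the term "jaguar" occurs in every snippet and scores zero, while "car" perfectly separates the car cluster and scores one bit, so it is ranked first as that cluster's label.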

Author information

Correspondence to M. Hanumanthappa.

Copyright information

© 2013 Springer India

About this paper

Cite this paper

Hanumanthappa, M., Prakash, B.R. (2013). Implementation of Web Search Result Clustering System. In: Kumar M., A., R., S., Kumar, T. (eds) Proceedings of International Conference on Advances in Computing. Advances in Intelligent Systems and Computing, vol 174. Springer, New Delhi. https://doi.org/10.1007/978-81-322-0740-5_94

  • DOI: https://doi.org/10.1007/978-81-322-0740-5_94

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-0739-9

  • Online ISBN: 978-81-322-0740-5

  • eBook Packages: Engineering, Engineering (R0)