Improving Web Search by Categorization, Clustering, and Personalization

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

This research combines Web snippet categorization, clustering, and personalization techniques to recommend relevant results to users. A Recommender Intelligent Browser (RIB) is to be developed that categorizes Web snippets using a socially constructed Web directory such as the Open Directory Project (ODP). The Web snippets are organized into a hierarchy by comparing them with the semantics of each ODP category, as represented by its category-documents. Meanwhile, the Web snippets are clustered to boost the quality of the categorization. Based on an automatically formed user profile that takes into account desktop computer information and concept drift, the proposed search strategy recommends relevant search results to users. This research also intends to verify text categorization, clustering, and feature selection algorithms in the setting where only Web snippets are available.
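The categorization step described in the abstract, matching each Web snippet to the ODP category whose category-document it most resembles, can be illustrated with a minimal sketch. The abstract does not say which similarity measure or term weighting RIB uses, so the raw term-count cosine similarity below is an assumption, and the function names (`term_vector`, `cosine`, `categorize`), the category labels, and the sample category-documents are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Return lowercased bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def categorize(snippet, category_docs):
    """Assign a snippet to the category whose category-document is most similar.

    Assumption: plain term counts and cosine similarity stand in for whatever
    weighting and measure the actual system uses.
    """
    vec = term_vector(snippet)
    scores = {name: cosine(vec, term_vector(doc)) for name, doc in category_docs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical category-documents: text gathered to represent each category.
category_docs = {
    "Computers/Programming": "java programming language compiler software code",
    "Recreation/Travel": "java island indonesia travel tourism beach",
}

label, score = categorize("Learn the Java programming language and write code", category_docs)
print(label, round(score, 3))
```

In this toy setup the ambiguous term "java" appears in both category-documents, and the remaining snippet terms decide the category. A real directory would have a deep hierarchy and far larger category-documents, and snippets are short enough that term weighting and feature selection would likely matter more than they do here.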



Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, D., Dreher, H. (2008). Improving Web Search by Categorization, Clustering, and Personalization. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_69

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_69

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)
