Abstract
This research combines Web snippet categorization, clustering, and personalization techniques to recommend relevant results to users. The Recommender Intelligent Browser (RIB), which categorizes Web snippets using a socially constructed Web directory such as the Open Directory Project (ODP), is to be developed. The Web snippets are organized into a hierarchy by comparing their semantics with those of each ODP category, as represented by its category-documents. Meanwhile, the Web snippets are clustered to boost the quality of the categorization. Based on an automatically formed user profile that takes into consideration desktop computer information and concept drift, the proposed search strategy recommends relevant search results to users. This research also intends to verify text categorization, clustering, and feature selection algorithms in the context where only Web snippets are available.
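The categorization step described in the abstract, comparing each Web snippet with the category-documents that represent ODP categories, can be illustrated with a minimal sketch. The abstract does not say which similarity measure RIB uses, so the TF-IDF weighting and cosine similarity below are assumptions, as are the function names (`tokenize`, `tfidf_vectors`, `cosine`, `categorize`) and the two toy category-documents, which are made up for illustration and are not taken from ODP.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def categorize(snippets, category_docs):
    """Assign each snippet to the category whose document is most similar."""
    names = list(category_docs)
    texts = [tokenize(category_docs[n]) for n in names]
    texts += [tokenize(s) for s in snippets]
    vectors = tfidf_vectors(texts)
    cat_vecs, snip_vecs = vectors[:len(names)], vectors[len(names):]
    results = []
    for snippet, vec in zip(snippets, snip_vecs):
        scores = [cosine(vec, cv) for cv in cat_vecs]
        best = max(range(len(names)), key=scores.__getitem__)
        results.append((snippet, names[best], scores[best]))
    return results


# Hypothetical category-documents keyed by an ODP-style category path.
category_docs = {
    "Computers/Programming/Languages/Python":
        "python programming language interpreter scripting code software",
    "Science/Biology/Zoology/Reptiles/Snakes":
        "python snake reptile species constrictor habitat wildlife",
}

# Hypothetical Web snippets returned for the ambiguous query "python".
snippets = [
    "Download the python interpreter and write code",
    "The python is a large constrictor snake",
]

results = categorize(snippets, category_docs)
for snippet, category, score in results:
    print(f"{category}  <-  {snippet}  (similarity {score:.2f})")
```

Because the category names are paths, grouping the assigned snippets by path prefix would yield the kind of hierarchy the abstract mentions. Snippets are very short, so in practice a similarity threshold would probably be needed to leave poorly matching snippets uncategorized; that is not shown here.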
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, D., Dreher, H. (2008). Improving Web Search by Categorization, Clustering, and Personalization. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_69
DOI: https://doi.org/10.1007/978-3-540-88192-6_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)