Search personalization through query and page topical analysis

  • Sofia Stamou
  • Alexandros Ntoulas
Original Paper

Abstract

Thousands of users issue keyword queries to Web search engines to find information on a wide range of topics. Since users may have diverse backgrounds and different expectations for a given query, some search engines try to personalize their results to better match the overall interests of each individual user. This task involves two major challenges. First, the search engine must be able to identify user interests effectively and build a profile for every individual user. Second, once such a profile is available, the search engine must rank the results in a way that matches the interests of a given user. In this article, we present our work towards a personalized Web search engine and discuss how we addressed each of these challenges. Since users are typically unwilling to provide information on their personal preferences, for the first challenge we attempt to determine such preferences by examining each user's click history. In particular, we leverage a topical ontology to estimate a user's topic preferences from her past searches, i.e., previously issued queries and the pages visited for those queries. We then explore the semantic similarity between the user's current query and the query-matching pages in order to identify the user's current topic preference. For the second challenge, we have developed a ranking function that uses the learned past and current topic preferences to rank the search results so that they better match the preferences of a given user. Our experimental evaluation on the Google query stream of human subjects over a one-month period shows that user preferences can be learned accurately through our topical ontology, and that our ranking function, which takes the learned user preferences into account, yields significant improvements in the quality of the search results.
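The abstract does not give the authors' formulas, so the following is only a minimal sketch of the overall pipeline under stated assumptions. A toy term-to-topic dictionary (`TOPIC_ONTOLOGY`, hypothetical) stands in for the topical ontology; cosine similarity between topic distributions stands in for the semantic-similarity measure; and a linear blend with hypothetical weights `alpha`, `beta` and `gamma` stands in for the ranking function. The past preference is aggregated from (query, clicked page) pairs. The current preference is the mix of result-page topics, weighted by each page's similarity to the current query.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for a topical ontology: maps terms to topic labels.
# The ontology actually used by the authors is not reproduced here.
TOPIC_ONTOLOGY = {
    "jaguar": ["Animals", "Cars"],
    "habitat": ["Animals"],
    "wildlife": ["Animals"],
    "engine": ["Cars"],
    "dealer": ["Cars"],
    "sedan": ["Cars"],
}


def normalize(counts):
    """Turn topic weights into a distribution that sums to 1 (empty if no weight)."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def topic_vector(text):
    """Distribution over ontology topics for the known terms in text."""
    counts = Counter()
    for term in text.lower().split():
        for topic in TOPIC_ONTOLOGY.get(term, []):
            counts[topic] += 1
    return normalize(counts)


def cosine(u, v):
    """Cosine similarity of two sparse topic vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def past_preferences(history):
    """Past topic preference from (previous query, visited page text) pairs."""
    counts = Counter()
    for query, page_text in history:
        for topic, w in topic_vector(query + " " + page_text).items():
            counts[topic] += w
    return normalize(counts)


def current_preferences(query, pages):
    """Result-page topics, each page weighted by its similarity to the query."""
    q = topic_vector(query)
    counts = Counter()
    for page_text in pages:
        p = topic_vector(page_text)
        sim = cosine(q, p)
        for topic, w in p.items():
            counts[topic] += sim * w
    return normalize(counts) or q


def rerank(query, results, history, alpha=0.3, beta=0.5, gamma=0.2):
    """Re-rank (url, page_text) results; the original order gives a baseline score."""
    past = past_preferences(history)
    current = current_preferences(query, [text for _, text in results])
    n = len(results)
    scored = []
    for rank, (url, text) in enumerate(results):
        p = topic_vector(text)
        baseline = (n - rank) / n  # 1.0 for the engine's top result
        score = alpha * baseline + beta * cosine(past, p) + gamma * cosine(current, p)
        scored.append((score, url))
    scored.sort(key=lambda s: -s[0])  # stable sort: ties keep the original order
    return [url for _, url in scored]


if __name__ == "__main__":
    car_user = [("jaguar engine", "sedan dealer engine")]
    results = [("zoo.example/jaguar", "jaguar habitat wildlife"),
               ("cars.example/jaguar", "jaguar sedan engine dealer")]
    print(rerank("jaguar", results, car_user))
    # prints ['cars.example/jaguar', 'zoo.example/jaguar']
```

In this toy run, a user whose history concerns cars sees the car page promoted for the ambiguous query "jaguar". With an empty history, the past-preference term contributes nothing and the engine's original order is kept. The linear blend is just one simple way to combine the signals, and the weights would need to be tuned on real click data.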

Keywords

Personalized search, Web search, User preferences, Topical ontology, Topic-specific rankings

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. Department of Computer Engineering and Informatics, Patras University, Patras, Greece
  2. Microsoft Research, Mountain View, USA
