Spying Out Accurate User Preferences for Search Engine Adaptation

  • Lin Deng
  • Wilfred Ng
  • Xiaoyong Chai
  • Dik-Lun Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3932)


Most existing search engines employ static ranking algorithms that do not adapt to the specific needs of users. Recently, some researchers have studied the use of clickthrough data to adapt a search engine’s ranking function. Clickthrough data record, for each query, the results that users click. As a kind of implicit relevance feedback, clickthrough data can easily be collected by a search engine. However, clickthrough data are sparse and incomplete, so discovering accurate user preferences from them is a challenge. In this paper, we propose a novel algorithm called “Spy Naïve Bayes” (SpyNB) to identify user preferences from clickthrough data. First, we treat the result items clicked by the users as sure positive examples and those not clicked by the users as unlabelled data. Then, we plant the sure positive examples (the spies) into the unlabelled set of result items and apply naïve Bayes classification to identify reliable negative examples. These positive and negative examples allow us to discover user preferences more accurately. Finally, we employ the SpyNB algorithm with a ranking SVM optimizer to build an adaptive metasearch engine. Our experimental results show that, compared with the original ranking, SpyNB significantly improves the average rank of users’ clicks by 20%.
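The spy step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, single-pass version of the general spy technique for learning from positive and unlabelled data, written in plain Python with a hand-rolled multinomial naïve Bayes. The function names (train_nb, prob_positive, spy_reliable_negatives), the toy token lists, and the choice of the lowest spy posterior as the threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(pos_docs, neg_docs):
    """Fit a two-class multinomial naive Bayes model with Laplace smoothing."""
    vocab = {w for d in pos_docs + neg_docs for w in d}
    total = len(pos_docs) + len(neg_docs)
    model = {}
    for label, docs in (("pos", pos_docs), ("neg", neg_docs)):
        counts = Counter(w for d in docs for w in d)
        denom = sum(counts.values()) + len(vocab)
        model[label] = {
            "log_prior": math.log(len(docs) / total),
            "log_like": {w: math.log((counts[w] + 1) / denom) for w in vocab},
        }
    return model


def prob_positive(model, doc):
    """Posterior probability that doc belongs to the positive class."""
    scores = {
        label: p["log_prior"] + sum(p["log_like"][w] for w in doc if w in p["log_like"])
        for label, p in model.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    return exp["pos"] / (exp["pos"] + exp["neg"])


def spy_reliable_negatives(clicked, unclicked, spy_indices):
    """Hypothetical helper: plant some clicked items as spies among the
    unclicked ones, train naive Bayes, and keep the unclicked items that
    score below every spy as reliable negatives."""
    spies = [clicked[i] for i in spy_indices]
    remaining = [d for i, d in enumerate(clicked) if i not in spy_indices]
    if not remaining or not spies:
        raise ValueError("need at least one spy and one remaining positive")
    # Spies are treated as negatives during training, alongside unclicked items.
    model = train_nb(remaining, unclicked + spies)
    # Assumed threshold rule: the lowest posterior observed among the spies.
    threshold = min(prob_positive(model, s) for s in spies)
    return [d for d in unclicked if prob_positive(model, d) < threshold]


# Toy data (invented): each result item is a list of tokens.
clicked = [
    ["python", "tutorial", "code"],
    ["python", "code", "example"],
    ["python", "tutorial", "example"],
]
unclicked = [
    ["python", "code", "tutorial"],
    ["snake", "zoo", "reptile"],
    ["snake", "pet", "care"],
]

reliable = spy_reliable_negatives(clicked, unclicked, {2})
print(reliable)
```

On this toy data the two snake-related items fall below the spy's posterior and are returned as reliable negatives, while the unclicked item that resembles the clicked ones is left unlabelled. A real system would work on features extracted from result titles, abstracts, and URLs rather than hand-written token lists.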


Search Engine, User Preference, Ranking Function, Vote Procedure, Preference Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lin Deng (1)
  • Wilfred Ng (1)
  • Xiaoyong Chai (1)
  • Dik-Lun Lee (1)
  1. Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong
