Comparing Web Logs: Sensitivity Analysis and Two Types of Cross-Analysis

  • Nikolai Buzikashvili
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4182)


Different Web log studies calculate the same metrics from the logs of different search engines, sampled during different observation periods and processed under different values of two controllable variables peculiar to Web log analysis: a client discriminator, used to exclude clients that are agents, and a temporal cut-off, used to segment logged client transactions into temporal sessions. How much do the results depend on these variables? We analyze the sensitivity of the results to the two controllable variables. The sensitivity analysis shows that the metric values vary significantly with these variables. In particular, the metrics vary by up to 30-50% across the commonly assigned values. The differences caused by the controllable variables are therefore of the same order of magnitude as the differences between the metrics reported in different studies. Thus, direct comparison of the reported results is unreliable and leads to artifactual conclusions. To overcome this method-dependency, we introduce and use a cross-analysis technique that compares the logs directly. In addition, we propose an alternative, easily accessible comparison of the reported metrics, which corrects the reported values according to the controllable variables used in the studies.
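The dependence of a metric on the two controllable variables can be illustrated with a minimal Python sketch. It is not the procedure used in the paper; everything in it is an assumption made for illustration: the function names, the made-up toy log, the form of the client discriminator (a hypothetical cap on the number of transactions per client, above which the client is treated as an agent), and the choice of metric (mean number of transactions per temporal session).

```python
from collections import defaultdict

# Made-up toy log of (client_id, timestamp_in_seconds) pairs; not real data.
TOY_LOG = [
    ("a", 0), ("a", 600), ("a", 2400),
    ("b", 0), ("b", 100), ("b", 200), ("b", 300), ("b", 400), ("b", 500),
]

def split_sessions(timestamps, cutoff):
    """Segment one client's timestamps into temporal sessions.

    A gap strictly longer than `cutoff` seconds starts a new session.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= cutoff:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

def mean_session_length(log, cutoff, max_transactions):
    """Mean transactions per session under given controllable variables.

    `max_transactions` is an ASSUMED client discriminator: clients with
    more logged transactions than this are excluded as probable agents.
    """
    by_client = defaultdict(list)
    for client, ts in log:
        by_client[client].append(ts)

    sessions = []
    for stamps in by_client.values():
        if len(stamps) > max_transactions:
            continue  # excluded by the (hypothetical) discriminator
        sessions.extend(split_sessions(stamps, cutoff))

    if not sessions:
        return 0.0
    return sum(len(s) for s in sessions) / len(sessions)

def sensitivity(log, cutoffs, discriminators):
    """Evaluate the metric on a grid of (cutoff, discriminator) values."""
    return {
        (c, d): mean_session_length(log, c, d)
        for c in cutoffs
        for d in discriminators
    }

if __name__ == "__main__":
    grid = sensitivity(TOY_LOG, cutoffs=[900, 1800], discriminators=[5, 10])
    for (cutoff, cap), value in sorted(grid.items()):
        print(f"cutoff={cutoff}s cap={cap}: {value:.2f}")
```

Even on this tiny log the same metric changes with each setting, which is the kind of method-dependency that makes a direct comparison of values reported under different settings unreliable.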





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nikolai Buzikashvili
    • Institute of System Analysis, Russian Academy of Science, Moscow, Russia
