Web Usage Data Pre-processing

  • Gaston L’Huillier
  • Juan D. Velásquez
Part of the Studies in Computational Intelligence book series (SCI, volume 452)


End users leave traces of their behavior all over the Web at all times. From explicit or implicit feedback on a multimedia document, or a comment in an online social network, to a simple click on a relevant link in a search engine result, the information that we as users pour into the Web defines its actual representation, which is specific to each user. Our usage can be captured from different sources of data, each of which calls for its own collection strategy as well as its own techniques for merging and cleaning Web usage data. Even once the data is properly preprocessed, identifying an individual user within the Web can be a complex task. Understanding the whole life of a user within a session on a Web site, and the path that was followed, involves advanced data modeling and a set of assumptions that are revised every day as new ways to interact with online content are created. The objective is to understand the behavior and preferences of a Web user, even when privacy issues are involved for which, as of today, it is not clear how they should be properly addressed. This chapter discusses all of these topics regarding the processing of Web usage data extensively.


Keywords: Relevance Feedback, User Session, Implicit Feedback, Search Engine Result, Explicit Relevance Feedback
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Groupon, Inc., Palo Alto, USA
  2. Department of Industrial Engineering, University of Chile, Santiago, Chile
