
Web Usage Data Pre-processing

  • Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 452)

Abstract

End users constantly leave traces of their behavior all over the Web. From explicit or implicit feedback on a multimedia document or a comment in an online social network to a simple click on a relevant link in a search engine result, the information that we as users pour into the Web defines its actual representation, which is specific to each user. This usage can be captured from several data sources, each requiring its own collection strategy as well as suitable techniques for merging and cleaning Web usage data. Even once the data is properly preprocessed, identifying an individual user on the Web can be a complex task. Understanding a user's whole activity within a session on a Web site, and the navigation path followed, requires advanced data modeling and a set of assumptions that must be revised continually as new ways of interacting with online content emerge. The objective is to understand the behavior and preferences of a Web user, including when privacy issues are involved; how to address those issues properly remains, as of today, an open question. This chapter discusses all of these topics concerning the processing of Web usage data in depth.
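The abstract names three preprocessing steps for server-side usage data: cleaning, user identification, and session reconstruction. The sketch below chains them for illustration only. It is not the chapter's method. The `LogRecord` fields, the suffix and robot filter lists, and the use of an (IP, user agent) pair as a user key are all hypothetical simplifications. The 30-minute inactivity timeout is a commonly used heuristic, chosen here as an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical, simplified log record; real server logs must be parsed first.
@dataclass
class LogRecord:
    ip: str
    user_agent: str
    timestamp: datetime
    url: str
    status: int

# Illustrative cleaning rules (assumptions, far from a complete filter).
IGNORED_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
ROBOT_HINTS = ("bot", "crawler", "spider")

def clean(records):
    """Keep successful page requests from agents that do not look like robots."""
    kept = []
    for r in records:
        if not (200 <= r.status < 300):
            continue  # drop errors and redirects
        if r.url.lower().endswith(IGNORED_SUFFIXES):
            continue  # drop embedded resources such as stylesheets and images
        if any(hint in r.user_agent.lower() for hint in ROBOT_HINTS):
            continue  # drop self-identified crawlers
        kept.append(r)
    return kept

def identify_users(records):
    """Group records by an (IP, user agent) pair used as a proxy user id."""
    users = defaultdict(list)
    for r in records:
        users[(r.ip, r.user_agent)].append(r)
    return users

def sessionize(records, timeout=timedelta(minutes=30)):
    """Split one user's records into sessions at gaps longer than `timeout`."""
    sessions, current = [], []
    for r in sorted(records, key=lambda rec: rec.timestamp):
        if current and r.timestamp - current[-1].timestamp > timeout:
            sessions.append(current)
            current = []
        current.append(r)
    if current:
        sessions.append(current)
    return sessions

def preprocess(records, timeout=timedelta(minutes=30)):
    """Return {user_key: [[url, ...], ...]} with one URL list per session."""
    return {
        user: [[r.url for r in s] for s in sessionize(recs, timeout)]
        for user, recs in identify_users(clean(records)).items()
    }

# Usage example with made-up records.
t0 = datetime(2012, 1, 1, 10, 0)
logs = [
    LogRecord("10.0.0.1", "Mozilla/5.0", t0, "/index.html", 200),
    LogRecord("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=5), "/style.css", 200),
    LogRecord("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=10), "/products.html", 200),
    LogRecord("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=55), "/contact.html", 200),
    LogRecord("10.0.0.2", "Googlebot/2.1", t0, "/index.html", 200),
]
print(preprocess(logs))
# {('10.0.0.1', 'Mozilla/5.0'): [['/index.html', '/products.html'], ['/contact.html']]}
```

The 45-minute gap before `/contact.html` starts a second session. The crawler and the stylesheet request are filtered out. The (IP, user agent) key is a deliberately weak proxy: it merges distinct users behind a shared proxy and misses robots that do not identify themselves. These limitations are part of why the abstract calls user identification a complex task.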

Author information

Corresponding author

Correspondence to Gaston L’Huillier.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

L’Huillier, G., Velásquez, J.D. (2013). Web Usage Data Pre-processing. In: Velásquez, J., Palade, V., Jain, L. (eds) Advanced Techniques in Web Intelligence-2. Studies in Computational Intelligence, vol 452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33326-2_2

  • DOI: https://doi.org/10.1007/978-3-642-33326-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33325-5

  • Online ISBN: 978-3-642-33326-2

  • eBook Packages: Engineering, Engineering (R0)