Semantic Analysis for Data Preparation of Web Usage Mining

  • Jason J. Jung
  • Geun-Sik Jo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3029)


As web usage patterns from clients become more complex, simple sessionization based on time-oriented and navigation-oriented heuristics limits the range of rule-discovery methods that can be exploited. In this paper, we present a semantic analysis approach based on semantic session reconstruction, which works by finding semantic outliers in web log data. A web directory service is applied to enrich web logs with semantics by categorizing the logged requests into all possible hierarchical category paths. To detect the candidate set of session identifiers, we establish semantic factors such as the semantic mean, the semantic deviation, and a semantic distance matrix. Each semantic session is then obtained by nested repetition of a top-down partitioning step and an evaluation step. In our experiments, we applied these ontology-oriented heuristics to sessionize one week of access log files from IRCache. Compared with time-oriented heuristics, semantic outlier analysis detected more than 48% additional sessions. This means that we can conceptually track the behavior of users who tend to change their intentions and interests easily, or who search for several kinds of information on the web simultaneously.
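The abstract names the semantic factors but not their formulas, so the following is only a minimal Python sketch of the idea under stated assumptions, not the authors' method. It assumes each request has already been mapped by the web directory to a single category path (e.g. `("Computers", "Internet", "WWW")`), and it uses a normalized Levenshtein edit distance over path components as the semantic distance (the keyword list mentions Levenshtein edit distance, but this exact normalization is assumed). It treats a boundary as a semantic outlier when its distance exceeds the segment's mean plus `k` deviations. For brevity it uses only distances between consecutive requests rather than a full distance matrix. The names `split_session`, `k`, and `min_len` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# Hypothetical representation: one web-directory category path per request.
Path = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of category labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def semantic_distance(a: Path, b: Path) -> float:
    """Assumed measure: edit distance normalized by the longer path, in [0, 1]."""
    return edit_distance(a, b) / (max(len(a), len(b)) or 1)


def split_session(paths: List[Path], k: float = 1.0,
                  min_len: int = 4) -> List[List[Path]]:
    """Top-down partitioning: cut at the largest jump between consecutive
    requests if it lies more than k deviations above the segment's mean
    distance, then repeat the same evaluation on each part."""
    if len(paths) < min_len:
        return [paths]  # too few requests for a meaningful mean/deviation
    dists = [semantic_distance(p, q) for p, q in zip(paths, paths[1:])]
    mu, sigma = mean(dists), pstdev(dists)
    cut = max(range(len(dists)), key=dists.__getitem__)
    if sigma == 0 or dists[cut] <= mu + k * sigma:
        return [paths]  # no semantic outlier: keep one session
    return (split_session(paths[: cut + 1], k, min_len)
            + split_session(paths[cut + 1:], k, min_len))


# Hypothetical click trail that drifts from news to computing topics.
trail = [("News", "World"), ("News", "World", "Asia"), ("News", "Sports"),
         ("Computers", "Internet", "WWW"), ("Computers", "Internet", "Searching")]
# Two sessions: the three News requests, then the two Computers requests.
print(split_session(trail))
```

A purely time-oriented heuristic would keep this trail as one session if the requests arrived close together; the sketch splits it because the jump from `("News", "Sports")` to `("Computers", "Internet", "WWW")` is an outlier relative to the other distances in the trail. The `min_len` guard is a design choice of this sketch: with only two distances, the larger one always sits exactly one population deviation above the mean, so the test would be decided by rounding noise.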


Keywords (machine-generated, not supplied by the authors): Semantic Distance, Semantic Factor, Cache Server, Levenshtein Edit Distance, Semantic Session





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jason J. Jung (1)
  • Geun-Sik Jo (1)
  1. Intelligent E-Commerce Systems Laboratory, School of Computer Engineering, Inha University, Incheon, Korea
