Journal of Classification, Volume 33, Issue 2, pp 298–324

Analysis of Web Visit Histories, Part I: Distance-Based Visualization of Sequence Rules

  • Roberta Siciliano
  • Antonia D’Ambrosio
  • Massimo Aria
  • Sonia Amodio

Abstract

This paper is Part I of a contribution to the analysis of web visit histories through a new methodological framework. First, web usage mining and web structure mining are treated as a single mining process that detects the latent structure of navigation across the sections of a single portal. We extend association rule theory to web data by defining new concepts of web (pattern) association and preference matrices, as well as (indirect and direct) sequence rules. We identify the most significant rules by means of a multiple testing procedure. In the literature, web usage patterns are typically visualized in non-distance-based graphs that describe navigation behavior across web pages with sequential arrows. Here we introduce a geometrical visualization of sequence rules at any click of the web navigation. In particular, we provide two distance-based visualization methods: one for the static analysis of the whole data set, and one for the dynamic analysis that discovers the most significant web paths click by click. A real-world case study is used throughout the methodological description.
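
As a rough illustration of the ideas named in the abstract, the following minimal sketch computes support and confidence for direct sequence rules (page B visited immediately after page A) from a list of sessions, and a Bonferroni-adjusted significance level for testing many rules at once. It is not the authors' procedure: the function names, the per-session counting convention, and the definitions of support and confidence used here are assumptions made for the example.

```python
from collections import Counter


def direct_sequence_rules(sessions):
    """Support and confidence for direct rules A -> B.

    Assumed definitions (illustrative, not taken from the paper):
      support(A -> B)    = share of sessions in which B directly follows A
      confidence(A -> B) = that count divided by the number of sessions
                           that contain A at all
    Each page and each transition is counted at most once per session.
    """
    n_sessions = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        page_count.update(set(session))
        # zip(session, session[1:]) yields consecutive page pairs
        pair_count.update(set(zip(session, session[1:])))

    rules = {}
    for (a, b), count in pair_count.items():
        rules[(a, b)] = {
            "support": count / n_sessions,
            "confidence": count / page_count[a],
        }
    return rules


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests
```

With such a sketch, each candidate rule would be tested at level `bonferroni_alpha(alpha, len(rules))` rather than at `alpha`, which keeps the family-wise error rate at or below `alpha` when many rules are examined together.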

Keywords

Association rules; Sequence rules; Bonferroni inequality; Multidimensional scaling; Non-symmetric correspondence analysis

Copyright information

© Classification Society of North America 2016

Authors and Affiliations

  • Roberta Siciliano (1)
  • Antonia D’Ambrosio (1)
  • Massimo Aria (1)
  • Sonia Amodio (1)

  1. Department of Industrial Engineering, University of Naples Federico II, Naples, Italy
