Journal of Classification, Volume 34, Issue 3, pp. 473–493

Analysis of Web Visit Histories, Part II: Predicting Navigation by Nested STUMP Regression Trees

  • Roberta Siciliano
  • Antonio D’Ambrosio
  • Massimo Aria
  • Sonia Amodio

Abstract

This paper is Part II of a contribution to the analysis of web visit histories through a new methodological framework for web usage-structure mining based on association rules theory. The aim is to explore, through a tree structure, the sequence of direct rules (i.e., paths) that characterizes a web navigator who stays longer on a web page, compared with the paths that characterize navigators who leave the web earlier. A novel tree-based structure is introduced to account for the fact that the learning sample changes click by click, since navigators who drop off from the web after any given click are left out. The response variable at each time point is the number of clicks remaining before leaving the web. Splits are induced by the predictors that describe the preferred web sections. The resulting methodology is a Nested Stump Regression Tree, i.e., a hierarchy of stump trees, where a stump is a tree with only one split or, equivalently, with only two terminal nodes. Suitable properties are outlined. As in Part I of the contribution to the analysis of web visit histories, the methodology is illustrated on a web portal with a fixed set of web sections, namely a data set from the UCI Machine Learning Repository.
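The click-by-click learning sample and the single-split stump described in the abstract can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' Nested Stump Regression Tree: each session is assumed to be the ordered list of web sections a navigator clicked, the predictors are assumed to be binary indicators of whether each section has been visited up to click t, and each stump is chosen with the CART least-squares criterion. How the paper nests the stumps into a hierarchy is not reproduced here; the toy sessions and all function names (`learning_sample`, `fit_stump`, `fit_stump_sequence`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Set, Tuple

# Hypothetical toy sessions (not data from the paper): each list is the ordered
# sequence of web sections a navigator clicked before leaving the portal.
sessions: List[List[str]] = [
    ["news", "sports", "news", "weather"],
    ["sports"],
    ["news", "news"],
    ["weather", "sports", "news"],
    ["news"],
]


@dataclass
class Stump:
    click: int              # time point t (1-based click index)
    n: int                  # size of the learning sample at click t
    section: Optional[str]  # splitting predictor; None if no split reduces the SSE
    mean_left: float        # mean remaining clicks: section NOT visited by click t
    mean_right: float       # mean remaining clicks: section visited by click t
    sse: float              # within-node sum of squares after the split


def sum_sq(values: List[int]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return float(sum((v - m) ** 2 for v in values))


def learning_sample(sessions: List[List[str]], t: int) -> List[Tuple[Set[str], int]]:
    """Navigators still on the web at click t, paired with their remaining clicks.

    Sessions shorter than t have already dropped off and are left out,
    so the learning sample shrinks click by click.
    """
    return [(set(s[:t]), len(s) - t) for s in sessions if len(s) >= t]


def fit_stump(sample: List[Tuple[Set[str], int]], sections: List[str], t: int) -> Stump:
    """Best single binary split (least-squares criterion) on one learning sample."""
    y = [r for _, r in sample]
    best = Stump(t, len(sample), None, mean(y), mean(y), sum_sq(y))
    for sec in sections:
        left = [r for visited, r in sample if sec not in visited]
        right = [r for visited, r in sample if sec in visited]
        if not left or not right:
            continue  # this predictor does not split the current sample
        total = sum_sq(left) + sum_sq(right)
        if total < best.sse:
            best = Stump(t, len(sample), sec, mean(left), mean(right), total)
    return best


def fit_stump_sequence(sessions: List[List[str]]) -> List[Stump]:
    """Fit one stump per click until no navigator is left in the sample."""
    sections = sorted({sec for s in sessions for sec in s})
    stumps: List[Stump] = []
    t = 1
    while True:
        sample = learning_sample(sessions, t)
        if not sample:
            break
        stumps.append(fit_stump(sample, sections, t))
        t += 1
    return stumps


if __name__ == "__main__":
    for stump in fit_stump_sequence(sessions):
        print(stump)
```

At each click the stump reports which section best separates navigators who will keep clicking from those about to leave, computed only on the navigators still active at that click. A one-split tree keeps every step directly interpretable, which is the motivation the abstract gives for stumps.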

Keywords

Web path; Sequence rules; Recursive partitioning; Web usage-structure mining

References

  1. AGRAWAL, R., and SRIKANT, R. (1994), “Fast Algorithms for Mining Association Rules”, Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, Vol. 1215, pp. 487–499.
  2. BLANC, E., and GIUDICI, P. (2002), “Sequence Rules for Web Clickstream Analysis”, in Advances in Data Mining, Berlin, Heidelberg: Springer, pp. 1–14.
  3. BREIMAN, L. (1996), “Bagging Predictors”, Machine Learning, 24(2), 123–140.
  4. BREIMAN, L. (2001), “Random Forests”, Machine Learning, 45(1), 5–32.
  5. BREIMAN, L., FRIEDMAN, J., OLSHEN, R.A., and STONE, C.J. (1984), Classification and Regression Trees, Boca Raton: CRC Press.
  6. CAPPELLI, C., MOLA, F., and SICILIANO, R. (2002), “A Statistical Approach to Growing a Reliable Honest Tree”, Computational Statistics and Data Analysis, 38(3), 285–299.
  7. CHAKRABARTI, S. (2002), Mining the Web: Discovering Knowledge from Hypertext Data, The Netherlands: Elsevier.
  8. COOLEY, R., MOBASHER, B., and SRIVASTAVA, J. (1999), “Data Preparation for Mining World Wide Web Browsing Patterns”, Knowledge and Information Systems, 1(1), 5–32.
  9. D’AMBROSIO, A., ARIA, M., and SICILIANO, R. (2012), “Accurate Tree-Based Missing Data Imputation and Data Fusion Within the Statistical Learning Paradigm”, Journal of Classification, 29(2), 227–258.
  10. D’AMBROSIO, A., and PECORARO, M. (2011), “Multidimensional Scaling as Visualization Tool of Web Sequence Rules”, in Classification and Multivariate Analysis for Complex Data Structures, Berlin, Heidelberg: Springer, pp. 309–316.
  11. D’AMBROSIO, A., PECORARO, M., and SICILIANO, R. (2008), “Web Preferences Visualization Through Multidimensional Scaling and Trees”, in DATAVIZ VI International Conference: Statistical Graphics: Data and Information Visualization in Today’s Multimedia Society, Bremen, June 25–28, 2008.
  12. DIETTERICH, T.G. (2000), “Ensemble Methods in Machine Learning”, in Multiple Classifier Systems, Berlin: Springer, pp. 1–15.
  13. ETZIONI, O. (1996), “The World-Wide Web: Quagmire or Gold Mine?”, Communications of the ACM, 39(11), 65–68.
  14. FOKKEMA, M., SMITS, N., ZEILEIS, A., HOTHORN, T., and KELDERMAN, H. (2015), “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”, Working Papers, Faculty of Economics and Statistics, University of Innsbruck, ftp://ftp.repec.org/opt/ReDIF/RePEc/inn/wpaper/2015-10.pdf.
  15. FREUND, Y., and SCHAPIRE, R.E. (1997), “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”, Journal of Computer and System Sciences, 55(1), 119–139.
  16. FU, W., and SIMONOFF, J.S. (2015), “Unbiased Regression Trees for Longitudinal and Clustered Data”, Computational Statistics and Data Analysis, 88, 53–74.
  17. GIUDICI, P., and FIGINI, S. (2009), Applied Data Mining: Statistical Methods for Business and Industry, New York: John Wiley and Sons.
  18. HASTIE, T., TIBSHIRANI, R., and FRIEDMAN, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Berlin: Springer.
  19. IBA, W., and LANGLEY, P. (1992), “Induction of One-Level Decision Trees”, in Proceedings of the Ninth International Conference on Machine Learning, pp. 233–240.
  20. KOSALA, R., and BLOCKEEL, H. (2000), “Web Mining Research: A Survey”, ACM SIGKDD Explorations, 2, 1–15.
  21. LINOFF, G.S., and BERRY, M.J. (2001), Mining the Web: Transforming Customer Data into Customer Value, New York: John Wiley and Sons, Inc.
  22. MOLA, F., and SICILIANO, R. (1997), “A Fast Splitting Procedure for Classification and Regression Trees”, Statistics and Computing, 7, 208–216.
  23. PECORARO, M., and SICILIANO, R. (2008), “Statistical Methods for User Profiling in Web Usage Mining”, in Handbook of Research on Text and Web Mining Technologies, eds. M. Song and Y.B. Wu, Hershey PA: Idea Group Inc., pp. 359–368.
  24. SICILIANO, R., D’AMBROSIO, A., ARIA, M., and AMODIO, S. (2016), “Analysis of Web Visit Histories, Part I: Distance-Based Visualization of Sequence Rules”, Journal of Classification, 33(2), 298–324.
  25. SICILIANO, R., and MOLA, F. (1996), “A Fast Regression Tree Procedure”, in Proceedings of the 11th International Workshop on Statistical Modeling, eds. A. Forcina, G.M. Marchetti, R. Hatzinger, and G. Galmacci, Citta’ di Castello IT: Graphos, pp. 332–340.
  26. SICILIANO, R., and MOLA, F. (2000), “Multivariate Data Analysis Through Classification and Regression Trees”, Computational Statistics and Data Analysis, 32, 285–301.
  27. SRIVASTAVA, J., COOLEY, R., DESHPANDE, M., and TAN, P.-N. (2000), “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data”, ACM SIGKDD Explorations Newsletter, 1(2), 12–23.
  28. VEZZOLI, M. (2011), “Exploring the Facets of Overall Job Satisfaction Through a Novel Ensemble Learning”, Electronic Journal of Applied Statistical Analysis, 4(1), 23–38.
  29. ZHANG, C., and ZHANG, S. (2002), Association Rule Mining: Models and Algorithms, Heidelberg: Springer.

Copyright information

© Classification Society of North America 2017

Authors and Affiliations

  • Roberta Siciliano (1)
  • Antonio D’Ambrosio (1)
  • Massimo Aria (1)
  • Sonia Amodio (2)
  1. Department of Industrial Engineering, University of Naples Federico II, Naples, Italy
  2. Leiden University Medical Center, Leiden, The Netherlands

Personalised recommendations