Analysis of Web Visit Histories, Part II: Predicting Navigation by Nested STUMP Regression Trees

Siciliano, Roberta; D’Ambrosio, Antonio; Aria, Massimo; Amodio, Sonia

doi:10.1007/s00357-017-9239-5

Published: 07 October 2017

Journal of Classification, Volume 34, pages 473–493 (2017)

Abstract

This paper is Part II of a contribution to the analysis of web visit histories through a new methodological framework for web usage-structure mining based on association rules theory. The aim is to explore, through a tree structure, the sequences of direct rules (i.e., paths) that characterize navigators who keep browsing longer, as opposed to the paths that characterize navigators who leave the web earlier. A novel tree-based structure is introduced to account for the fact that the learning sample changes click by click, since navigators who drop off after any click are left out. The response variable at each time point is the number of clicks remaining before the navigator leaves the web. The split is induced by the predictors that describe the preferred web sections. The resulting method is a Nested Stump Regression Tree, that is, a hierarchy of stump trees, where a stump is a tree with only one split or, equivalently, with only two terminal nodes. Suitable properties are outlined. As in the first part of the contribution, the methodology is illustrated on a web portal with a fixed set of web sections, using a data set from the UCI Machine Learning Repository.
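The click-by-click construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure, whose splitting criterion and nesting rules are given only in the full article. The sketch assumes that each stump is fitted by least squares on a one-vs-rest split of the section visited at click t, and that the sample at click t contains only the sessions still active at that click. The function names and the toy sessions are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(sections, remaining):
    """Assumed criterion: best one-vs-rest split by within-node sum of squares."""
    best = None
    for s in sorted(set(sections)):
        left = [y for x, y in zip(sections, remaining) if x == s]
        right = [y for x, y in zip(sections, remaining) if x != s]
        if not right:  # only one section present: no split is possible
            continue
        cost = sse(left) + sse(right)
        if best is None or cost < best["cost"]:
            best = {"section": s, "cost": cost,
                    "mean_if_equal": mean(left), "mean_otherwise": mean(right)}
    return best

def nested_stumps(sessions, max_clicks):
    """Fit one stump per click on the shrinking sample of active navigators."""
    stumps = {}
    for t in range(1, max_clicks + 1):
        # Navigators who already left are dropped from the learning sample.
        active = [s for s in sessions if len(s) >= t]
        if len(active) < 2:
            break
        sections = [s[t - 1] for s in active]        # predictor: section at click t
        remaining = [len(s) - t for s in active]     # response: clicks still to come
        stump = fit_stump(sections, remaining)
        if stump is not None:
            stump["n"] = len(active)
            stumps[t] = stump
    return stumps

# Hypothetical toy sessions (not the UCI data set used in the paper).
sessions = [
    ["news", "sports", "weather", "news"],
    ["news", "sports"],
    ["sports"],
    ["sports", "news"],
    ["news", "weather", "sports"],
]
stumps = nested_stumps(sessions, max_clicks=4)
for t, stump in stumps.items():
    print(t, stump["section"], stump["n"])
```

Each entry of `stumps` holds the splitting section, the mean number of remaining clicks on either side of the split, and the size of the sample that was still active at that click.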

References

AGRAWAL, R., and SRIKANT, R. (1994), “Fast Algorithms for Mining Association Rules”, Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, Vol. 1215, pp. 487–499.
BLANC, E., and GIUDICI, P. (2002), “Sequence Rules for Web Clickstream Analysis”, in Advances in Data Mining, Berlin, Heidelberg: Springer, pp. 1–14.
BREIMAN, L. (1996), “Bagging Predictors”, Machine Learning, 24(2), 123–140.
BREIMAN, L. (2001), “Random Forests”, Machine Learning, 45(1), 5–32.
BREIMAN, L., FRIEDMAN, J., OLSHEN, R.A. and STONE, C.J. (1984), Classification and Regression Trees, Boca Raton: CRC Press.
CAPPELLI, C., MOLA, F., and SICILIANO, R. (2002), “A Statistical Approach to Growing a Reliable Honest Tree”, Computational Statistics and Data Analysis, 38(3), 285–299.
CHAKRABARTI, S. (2002), Mining the Web: Discovering Knowledge from Hypertext Data, The Netherlands: Elsevier.
COOLEY, R., MOBASHER, B., and SRIVASTAVA, J. (1999), “Data Preparation for Mining World Wide Web Browsing Patterns”, Knowledge and Information Systems, 1(1), 5–32.
D’AMBROSIO, A., ARIA, M., and SICILIANO, R. (2012), “Accurate Tree-Based Missing Data Imputation and Data Fusion Within the Statistical Learning Paradigm”, Journal of Classification, 29(2), 227–258.
D’AMBROSIO, A., and PECORARO, M. (2011), “Multidimensional Scaling as Visualization Tool of Web Sequence Rules”, in Classification and Multivariate Analysis for Complex Data Structures, Berlin, Heidelberg: Springer, pp. 309–316.
D’AMBROSIO, A., PECORARO, M., and SICILIANO, R. (2008), “Web Preferences Visualization Through Multidimensional Scaling and Trees”, in DATAVIZ VI International Conference: Statistical Graphics: Data and Information Visualization in Today’s Multimedia Society, Bremen, June 25–28, 2008.
DIETTERICH, T.G. (2000), “Ensemble Methods in Machine Learning”, in Multiple Classifier Systems, Berlin: Springer, pp. 1–15.
ETZIONI, O. (1996), “The World-Wide Web: Quagmire or Gold Mine?”, Communications of the ACM, 39(11), 65–68.
FOKKEMA, M., SMITS, N., ZEILEIS, A., HOTHORN, T., and KELDERMAN, H. (2015), “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”, Working Papers, Faculty of Economics and Statistics, University of Innsbruck, ftp://ftp.repec.org/opt/ReDIF/RePEc/inn/wpaper/2015-10.pdf.
FREUND, Y., and SCHAPIRE, R.E. (1997), “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”, Journal of Computer and System Sciences, 55(1), 119–139.
FU, W., and SIMONOFF, J.S. (2015), “Unbiased Regression Trees for Longitudinal and Clustered Data”, Computational Statistics and Data Analysis, 88, 53–74.
GIUDICI, P., and FIGINI, S. (2009), Applied Data Mining: Statistical Methods for Business and Industry, New York: John Wiley and Sons.
HASTIE, T., TIBSHIRANI, R., and FRIEDMAN, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Berlin: Springer.
IBA, W., and LANGLEY, P. (1992), “Induction of One-Level Decision Trees”, in Proceedings of the Ninth International Conference on Machine Learning, pp. 233–240.
KOSALA, R., and BLOCKEEL, H. (2000), “Web Mining Research: A Survey”, ACM SIGKDD Explorations, 2, 1–15.
LINOFF, G.S., and BERRY, M.J. (2001), Mining the Web: Transforming Customer Data into Customer Value, New York: John Wiley and Sons, Inc.
MOLA, F., and SICILIANO, R. (1997), “A Fast Splitting Procedure for Classification and Regression Trees”, Statistics and Computing, 7, 208–216.
PECORARO, M., and SICILIANO, R. (2008), “Statistical Methods for User Profiling in Web Usage Mining”, in Handbook of Research on Text and Web Mining Technologies, eds. M. Song and Y.B. Wu, Hershey PA: Idea Group Inc., pp. 359–368.
SICILIANO, R., D’AMBROSIO, A., ARIA, M., and AMODIO, S. (2016), “Analysis of Web Visit Histories, Part I: Distance-Based Visualization of Sequence Rules”, Journal of Classification, 33(2), 298–324.
SICILIANO, R., and MOLA, F. (1996), “A Fast Regression Tree Procedure”, in Proceedings of the 11th International Workshop on Statistical Modeling, eds. A. Forcina, G.M. Marchetti, R. Hatzinger, and G. Galmacci, Citta’ di Castello IT: Graphos, pp. 332–340.
SICILIANO, R., and MOLA, F. (2000), “Multivariate Data Analysis Through Classification and Regression Trees”, Computational Statistics and Data Analysis, 32, 285–301.
SRIVASTAVA, J., COOLEY, R., DESHPANDE, M., and TAN, P.-N. (2000), “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data”, ACM SIGKDD Explorations Newsletter, 1(2), 12–23.
VEZZOLI, M. (2011), “Exploring the Facets of Overall Job Satisfaction Through a Novel Ensemble Learning”, Electronic Journal of Applied Statistical Analysis, 4(1), 23–38.
ZHANG, C., and ZHANG, S. (2002), Association Rule Mining: Models and Algorithms, Heidelberg: Springer.

Author information

Authors and Affiliations

Department of Industrial Engineering, University of Naples Federico II, Corso Umberto I, 80138, Naples, Italy
Roberta Siciliano, Antonio D’Ambrosio & Massimo Aria
Leiden University Medical Center, Leiden, The Netherlands
Sonia Amodio

Corresponding author

Correspondence to Roberta Siciliano.

About this article

Cite this article

Siciliano, R., D’Ambrosio, A., Aria, M. et al. Analysis of Web Visit Histories, Part II: Predicting Navigation by Nested STUMP Regression Trees. J Classif 34, 473–493 (2017). https://doi.org/10.1007/s00357-017-9239-5

Issue Date: October 2017
