A Methodological Approach for Time Series Analysis and Forecasting of Web Dynamics

  • Maria Carla Calzarossa
  • Marco L. Della Vedova
  • Luisa Massari
  • Giuseppe Nebbione
  • Daniele Tessera
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11610)

Abstract

The web is a complex information ecosystem that provides a wide variety of content, which changes over time as a result of the combined effects of management policies, user interactions and external events. These highly dynamic scenarios challenge the technologies that deal with the discovery, management and retrieval of web content. In this paper, we address the problem of modeling and predicting web dynamics within the framework of time series analysis and forecasting. We present a general methodological approach that identifies the patterns describing the behavior of a time series, formulates suitable models and uses these models to predict its future behavior. To improve the forecasts, we also propose a method for detecting and modeling the spiky patterns that might be present in a time series. To test our approach, we analyze the temporal patterns of page uploads to the Reuters news agency website over one year. We find that the upload process exhibits a diurnal behavior, with a much larger number of uploads on weekdays than on weekend days. We also identify several sudden spikes and a daily periodicity. The overall model of the upload process, obtained as a superposition of the models of its individual components, accurately fits the data, including most of the spikes.
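
The abstract does not detail the models, although the keywords point to ARMA models. As a rough illustration of the superposition idea only (not the authors' code or method), the following Python sketch splits an hourly series into three components: a weekly hour-of-week profile that captures the diurnal cycle and the weekday/weekend gap, spikes flagged by a robust median/MAD threshold, and an AR(1) model, the simplest member of the ARMA family, for the remaining residuals. Forecasts are the sum of the profile and a decaying AR(1) term. The synthetic data, all function names, the 6-MAD threshold and the AR(1) order are assumptions chosen for the example, not values taken from the paper.

```python
import math
import random
import statistics

HOURS_PER_WEEK = 168  # hourly samples: 24 hours x 7 days


def synthetic_uploads(weeks=8, spikes=(100, 700, 1200), seed=1):
    """Hypothetical hourly upload counts (synthetic, NOT the Reuters data)."""
    rng = random.Random(seed)
    series, noise = [], 0.0
    for t in range(weeks * HOURS_PER_WEEK):
        hour, day = t % 24, (t // 24) % 7
        diurnal = 1.0 + 0.5 * math.sin(2 * math.pi * (hour - 6) / 24)  # peaks at noon
        level = 60.0 if day < 5 else 25.0  # assume days 0-4 are weekdays
        noise = 0.6 * noise + rng.gauss(0.0, 3.0)  # AR(1) disturbance
        series.append(max(0.0, level * diurnal + noise))
    for t in spikes:
        series[t] += 150.0  # sudden burst of uploads
    return series


def weekly_profile(series):
    """Median of each hour-of-week slot; the median keeps isolated spikes out of the profile."""
    slots = [[] for _ in range(HOURS_PER_WEEK)]
    for t, y in enumerate(series):
        slots[t % HOURS_PER_WEEK].append(y)
    return [statistics.median(s) for s in slots]


def detect_spikes(residuals, k=6.0):
    """Indices whose residual lies more than k robust standard deviations from the median."""
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # scaled MAD estimates the std. dev. under Gaussian noise
    return [t for t, r in enumerate(residuals) if abs(r - center) > k * scale]


def fit_ar1(residuals):
    """Least-squares estimate of phi in r_t = phi * r_{t-1} + e_t (zero-mean, no intercept)."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(b * b for b in residuals[:-1])
    return num / den


def model_and_forecast(series, horizon=24):
    """Superpose weekly profile + spikes + AR(1) residuals; forecast `horizon` steps ahead."""
    profile = weekly_profile(series)
    residuals = [y - profile[t % HOURS_PER_WEEK] for t, y in enumerate(series)]
    spikes = detect_spikes(residuals)
    for t in spikes:
        residuals[t] = 0.0  # spikes are a separate component, not AR noise
    phi = fit_ar1(residuals)
    n, last = len(series), residuals[-1]
    # h-step forecast: periodic profile at time n-1+h plus phi^h times the last residual
    forecasts = [profile[(n + h - 1) % HOURS_PER_WEEK] + phi ** h * last
                 for h in range(1, horizon + 1)]
    return profile, spikes, phi, forecasts


if __name__ == "__main__":
    series = synthetic_uploads()
    profile, spikes, phi, forecasts = model_and_forecast(series, horizon=72)
    print("spike indices:", spikes)
    print("AR(1) coefficient:", round(phi, 2))
    print("next 3 hours:", [round(f, 1) for f in forecasts[:3]])
```

On this synthetic series the flagged indices should coincide with the injected spikes and `phi` should come out positive and close to the 0.6 used to generate the noise. The median-based profile is a deliberate choice: a mean would let each spike leak into its hour-of-week slot and bias the periodic component.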

Keywords

Web dynamics, Temporal patterns, Time series analysis, Forecasting, Performance modeling, Search engines, ARMA models

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Maria Carla Calzarossa (1)
  • Marco L. Della Vedova (2)
  • Luisa Massari (1)
  • Giuseppe Nebbione (1)
  • Daniele Tessera (2)

  1. Dipartimento di Ingegneria Industriale e dell’Informazione, Università di Pavia, Pavia, Italy
  2. Dipartimento di Matematica e Fisica, Università Cattolica del Sacro Cuore, Brescia, Italy
