A Soft Computing Prefetcher to Mitigate Cache Degradation by Web Robots

  • Ning Xie
  • Kyle Brown
  • Nathan Rude
  • Derek Doran
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10261)


This paper investigates the feasibility of a resource prefetcher able to predict future requests made by web robots, which are software programs rapidly overtaking human users as the dominant source of web server traffic. Such a prefetcher is a crucial first line of defense for web caches and content management systems that must service many requests while maintaining good performance. Our prefetcher marries a deep recurrent neural network with a Bayesian network to combine prior global data with local data about specific robots. Experiments with traffic logs from web servers across two universities demonstrate improved predictions over a traditional dependency graph approach. Finally, preliminary evaluation of a hypothetical caching system that incorporates our prefetching scheme is discussed.
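For context, the dependency graph baseline named in the abstract can be sketched roughly as follows. This is a minimal illustration of the general technique and not the authors' implementation. The function names, the lookahead `window`, and the `threshold` value are all assumptions made for the example. An arc from resource a to resource b is weighted by how often b was requested within `window` requests after a, divided by the number of requests for a.

```python
from collections import defaultdict


def build_dependency_graph(sessions, window=2):
    """Count requests per resource and how often b follows a within `window` requests.

    `sessions` is a list of request sequences (lists of resource paths).
    Names and defaults here are hypothetical, chosen for illustration only.
    """
    node_counts = defaultdict(int)
    arc_counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for i, a in enumerate(session):
            node_counts[a] += 1
            # Count each distinct successor at most once per occurrence of a.
            for b in set(session[i + 1 : i + 1 + window]):
                if b != a:
                    arc_counts[a][b] += 1
    return node_counts, arc_counts


def predict(node_counts, arc_counts, current, threshold=0.5):
    """Return (resource, confidence) pairs whose arc weight meets `threshold`."""
    total = node_counts.get(current, 0)
    if total == 0:
        return []  # Resource never seen in training: nothing to prefetch.
    successors = arc_counts.get(current, {})
    scored = [(b, count / total) for b, count in successors.items()]
    candidates = [(b, conf) for b, conf in scored if conf >= threshold]
    # Highest confidence first; ties broken by resource name for determinism.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))
```

A prefetcher built this way would load the returned resources into the cache after each robot request. It uses only pairwise co-occurrence counts, which is the kind of limitation that a sequence model combined with per-robot information aims to address.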


Keywords: LSTM, Deep learning, Bayesian model, Web caching, Resource prediction



The authors thank Logan Rickert for data processing support, Maria-Carla Calzarossa for data from the University of Pavia, and Mark Anderson for data from Wright State University. This paper is based on work supported by the National Science Foundation (NSF) under Grant No. 1464104. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ning Xie (1)
  • Kyle Brown (1)
  • Nathan Rude (1)
  • Derek Doran (1)
  1. Department of Computer Science and Engineering, Kno.e.sis Research Center, Wright State University, Dayton, USA
