A Soft Computing Prefetcher to Mitigate Cache Degradation by Web Robots
This paper investigates the feasibility of a resource prefetcher that predicts future requests made by web robots, software programs that are rapidly overtaking human users as the dominant source of web server traffic. Such a prefetcher is a crucial first line of defense for web caches and content management systems that must service many requests while maintaining good performance. Our prefetcher couples a deep recurrent neural network with a Bayesian network to combine prior global data with local data about specific robots. Experiments with traffic logs from web servers at two universities demonstrate improved predictions over a traditional dependency graph approach. Finally, we discuss a preliminary evaluation of a hypothetical caching system that incorporates our prefetching scheme.
Keywords: LSTM, Deep learning, Bayesian model, Web Caching, Resource prediction
The authors thank Logan Rickert for data processing support, Maria-Carla Calzarossa for data from the University of Pavia, and Mark Anderson for data from Wright State University. This paper is based on work supported by the National Science Foundation (NSF) under Grant No. 1464104. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.