Abstract
This paper addresses weblog data cleaning within the scope of Web Usage Mining. Weblog data record end-user clicks and the underlying user-agent hits, as logged by web servers. Because Web Usage Mining is concerned with end-user behavior, user-agent hits are treated as noise to be removed before mining. The most widely cited and implemented cleaning methods are conventional cleaning and advanced cleaning. Both are content-centric filtering heuristics based on the requested-resource attribute of the weblog database. In the context of the dynamic and responsive web, these methods are limited in relevancy, workability, and cost. To address these constraints, this contribution introduces a rule-based cleaning method that relies on the rules of the logging structure. Experiments with the rule-based method show significant advantages over the content-centric methods.
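To make the idea of content-centric filtering concrete, the following is a minimal Python sketch of a conventional-style cleaning pass that drops log entries according to the file suffix of the requested resource. It is an illustration under stated assumptions, not the method evaluated in the paper: the suffix list `NOISE_SUFFIXES`, the entry layout (a dict with a `resource` key), and the function names are all hypothetical, and the paper's rule-based method is not shown because the abstract does not specify its rules.

```python
from urllib.parse import urlsplit
import posixpath

# Hypothetical suffix list (an assumption for illustration only):
# resources typically fetched by the user agent rather than clicked.
NOISE_SUFFIXES = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg"}


def is_noise(requested_resource: str) -> bool:
    """Return True if the resource path ends with a suffix in NOISE_SUFFIXES."""
    path = urlsplit(requested_resource).path        # strip any query string
    suffix = posixpath.splitext(path)[1].lower()    # e.g. ".PNG" -> ".png"
    return suffix in NOISE_SUFFIXES


def conventional_clean(entries):
    """Keep only the entries whose requested resource is not flagged as noise."""
    return [e for e in entries if not is_noise(e["resource"])]


# Made-up sample entries; a real log would carry more attributes.
sample_log = [
    {"ip": "192.0.2.1", "resource": "/index.html"},
    {"ip": "192.0.2.1", "resource": "/static/site.css?v=3"},
    {"ip": "192.0.2.1", "resource": "/img/logo.PNG"},
    {"ip": "192.0.2.1", "resource": "/api/products?page=2"},
]

cleaned = conventional_clean(sample_log)
for entry in cleaned:
    print(entry["resource"])
```

In this sketch the request to `/api/products` is kept because it has no file suffix, even though on a dynamic site such a request might be issued by a script rather than by a click. That is one plausible reading of the relevancy limitation the abstract attributes to content-centric heuristics.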
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ganibardi, A., Ali, C.A. (2018). Web Usage Data Cleaning. In: Ordonez, C., Bellatreche, L. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2018. Lecture Notes in Computer Science, vol 11031. Springer, Cham. https://doi.org/10.1007/978-3-319-98539-8_15
DOI: https://doi.org/10.1007/978-3-319-98539-8_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98538-1
Online ISBN: 978-3-319-98539-8
eBook Packages: Computer Science, Computer Science (R0)