Abstract
Web mining is the application of data mining techniques to huge web data repositories. Before web mining techniques can be applied, the data in the web log has to be preprocessed, integrated and transformed. As the World Wide Web continues to grow rapidly, web miners need intelligent tools to find, extract, filter and evaluate the desired information. Data preprocessing is the most important phase of web mining, and it is both critical and complex for the successful extraction of useful data. Because the web log is incremental in nature, conventional data preprocessing techniques, which assume that the data is static, have proved unsuitable. Web logs are also distributed, and handling them with conventional methods is neither scalable nor practical. Hence a comprehensive learning algorithm is required to obtain the desired information.
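The cleaning part of web log preprocessing can be illustrated with a minimal Python sketch. It is not the system described in the paper. It assumes the access log is in the NCSA Combined Log Format and applies common data-cleaning rules: drop non-GET requests, error responses, and requests for embedded images, stylesheets and scripts. It also tags likely crawler traffic with a simple hypothetical heuristic (a request for `/robots.txt` or a crawler-like substring in the user agent). The names `parse_line`, `preprocess`, `CRAWLER_HINTS` and `IGNORED_SUFFIXES` are illustrative only. Because `preprocess` consumes lines lazily, newly appended log records can be fed through it without reprocessing the whole file, which loosely reflects the incremental nature of web logs.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed input: NCSA Combined Log Format (host ident user [time] "request" status size "referer" "agent").
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Requests for embedded resources, typically removed during data cleaning.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Hypothetical heuristic: substrings common in crawler user-agent strings.
CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")


class Entry(NamedTuple):
    host: str
    time: str
    path: str
    status: int
    is_crawler: bool


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None if it is malformed or cleaned out."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    path = m["path"]
    status = int(m["status"])
    # Keep only successful (2xx/3xx) GET requests.
    if m["method"] != "GET" or not 200 <= status < 400:
        return None
    # Drop embedded resources such as images, stylesheets and scripts.
    if path.lower().endswith(IGNORED_SUFFIXES):
        return None
    agent = m["agent"].lower()
    # Hypothetical crawler tag: robots.txt request or crawler-like user agent.
    is_crawler = path == "/robots.txt" or any(h in agent for h in CRAWLER_HINTS)
    return Entry(m["host"], m["time"], path, status, is_crawler)


def preprocess(lines: Iterable[str]) -> Iterator[Entry]:
    """Lazily yield cleaned entries, so new log lines can be streamed in."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            yield entry
```

A rule-based tag like this is only a starting point; the abstract describes a learning-based system for separating human and search engine accesses, whose details are not given here.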
This paper introduces an intelligent system capable of preprocessing web logs efficiently. It distinguishes accesses made by human users from those made by web search engines, and does so in less time. The system reduces the error rate and significantly improves the performance of the learning algorithm. The goodness of each split is evaluated using popular measures such as entropy and the Gini index. Experimental results supporting these claims are presented in the paper.
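The abstract names entropy and the Gini index as goodness-of-split measures but does not give their formulas or the learner that uses them. The sketch below assumes the standard decision-tree definitions: impurity is computed from class proportions, and the goodness of a split is the parent's impurity minus the size-weighted impurity of the child partitions (information gain when entropy is used). The session labels and the split on a `/robots.txt` request are hypothetical illustration data, not results from the paper.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence

Labels = Sequence[Hashable]


def entropy(labels: Labels) -> float:
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Labels) -> float:
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent: Labels, children: Sequence[Labels],
               impurity: Callable[[Labels], float]) -> float:
    """Goodness of split: parent impurity minus size-weighted child impurity."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical sessions, split on "did the client request /robots.txt?"
parent = ["human"] * 4 + ["crawler"] * 4
left = ["crawler"] * 3                   # requested /robots.txt
right = ["crawler"] * 1 + ["human"] * 4  # did not

print(round(split_gain(parent, [left, right], entropy), 3))  # 0.549
print(round(split_gain(parent, [left, right], gini), 3))     # 0.3
```

A pure child partition contributes zero impurity, so splits that isolate one class score higher under either measure. Entropy and the Gini index usually rank candidate splits similarly; the Gini index is slightly cheaper because it avoids the logarithm.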
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rao, V.V.R.M., Kumari, V.V., Raju, K.V.S.V.N. (2011). An Intelligent System for Web Usage Data Preprocessing. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. CCSIT 2011. Communications in Computer and Information Science, vol 131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17857-3_47
DOI: https://doi.org/10.1007/978-3-642-17857-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17856-6
Online ISBN: 978-3-642-17857-3
eBook Packages: Computer Science, Computer Science (R0)