An Intelligent System for Web Usage Data Preprocessing

Rao, V. V. R. Maheswara; Kumari, V. Valli; Raju, K. V. S. V. N.

doi:10.1007/978-3-642-17857-3_47

Part of the book series: Communications in Computer and Information Science (CCIS, volume 131)

Included in the following conference series:

International Conference on Computer Science and Information Technology

Abstract

Web mining applies data mining techniques to very large web data repositories. Before web mining techniques can be applied, the data in the web log has to be preprocessed, integrated and transformed. Because the World Wide Web is growing continuously and rapidly, web miners need intelligent tools to find, extract, filter and evaluate the desired information. Data preprocessing is the most important phase of web mining, and it is both critical and complex for the successful extraction of useful data. The web log is incremental in nature, so conventional data preprocessing techniques, which assume that the data is static, have proved unsuitable. Applied to web logs, which are distributed in nature, such techniques are non-scalable and impractical. Hence a comprehensive learning algorithm is required to obtain the desired information.

This paper introduces an intelligent system capable of preprocessing web logs efficiently. It distinguishes accesses by human users from accesses by web search engines, and does so in less time. The system reduces the error rate and significantly improves the performance of the learning algorithm. The work assesses the goodness of split using popular measures such as Entropy and the Gini index. Experimental results supporting these claims are given in this paper.
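The abstract names Entropy and the Gini index as goodness-of-split measures but does not show how they are computed. The following is a minimal sketch of the two standard textbook definitions, applied to a toy split of log records labelled "human" or "robot". It is not the authors' implementation: the record layout, the `requested_robots_txt` feature and the helper `split_impurity` are hypothetical names chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(records, predicate, measure):
    """Size-weighted impurity of the two partitions induced by `predicate`.

    `records` is a list of (features, label) pairs; a lower value
    indicates a better split under the chosen `measure`.
    """
    left = [label for feats, label in records if predicate(feats)]
    right = [label for feats, label in records if not predicate(feats)]
    n = len(records)
    return (len(left) / n) * measure(left) + (len(right) / n) * measure(right)


# Hypothetical toy data: each record is (features, label).
records = [
    ({"requested_robots_txt": True}, "robot"),
    ({"requested_robots_txt": True}, "robot"),
    ({"requested_robots_txt": False}, "human"),
    ({"requested_robots_txt": False}, "human"),
]

labels = [label for _, label in records]
parent_entropy = entropy(labels)
parent_gini = gini(labels)

# Assumed split rule for illustration: did the client fetch robots.txt?
rule = lambda feats: feats["requested_robots_txt"]
child_entropy = split_impurity(records, rule, entropy)
child_gini = split_impurity(records, rule, gini)

print("entropy before/after split:", parent_entropy, child_entropy)
print("gini before/after split:", parent_gini, child_gini)
```

Under either measure, a candidate split is preferred when the weighted impurity of its partitions falls well below the impurity of the unsplit records; how the paper's system selects and combines splits is not stated in the abstract.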

Author information

Authors and Affiliations

Department of Computer Applications, Shri Vishnu Engineering College for Women, Bhimavaram, W.G. Dt, Andhra Pradesh, India
V. V. R. Maheswara Rao
Department of Computer Science & Systems Engineering, College of Engineering, Andhra University, Visakhapatnam, Andhra Pradesh, India
V. Valli Kumari & K. V. S. V. N. Raju

Editor information

Editors and Affiliations

Jackson State University, 39217, Jackson, MS, USA
Natarajan Meghanathan
Deptt. of Electronics and Computer Engg., Indian Institute of Technology, Roorkee, India
Brajesh Kumar Kaushik
Wireilla Net Solutions PTY Ltd, Melbourne, Victoria, Australia
Dhinaharan Nagamalai

About this paper

Cite this paper

Rao, V.V.R.M., Kumari, V.V., Raju, K.V.S.V.N. (2011). An Intelligent System for Web Usage Data Preprocessing. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. CCSIT 2011. Communications in Computer and Information Science, vol 131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17857-3_47

DOI: https://doi.org/10.1007/978-3-642-17857-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17856-6
Online ISBN: 978-3-642-17857-3
eBook Packages: Computer Science, Computer Science (R0)
