An Intelligent System for Web Usage Data Preprocessing

  • Conference paper
Advances in Computer Science and Information Technology (CCSIT 2011)

Abstract

Web mining applies data mining techniques to huge web data repositories. Before these techniques can be applied, the data in the web log has to be preprocessed, integrated and transformed. Because the World Wide Web is growing continuously and rapidly, web miners need intelligent tools to find, extract, filter and evaluate the desired information. Data preprocessing is the most important phase of web mining; it is both critical and complex, and the successful extraction of useful data depends on it. The web log is incremental in nature, so conventional data preprocessing techniques, which assume that the data is static, have proved unsuitable. Web logs are also distributed in nature, which makes such techniques non-scalable and impractical. Hence a comprehensive learning algorithm is required to obtain the desired information.

This paper introduces an intelligent system capable of preprocessing web logs efficiently. It can distinguish accesses by human users from accesses by web search engines intelligently and in less time. The system reduces the error rate and significantly improves the learning performance of the learning algorithm. The work ensures the goodness of split by using popular measures such as entropy and the Gini index. Experimental results supporting this claim are given in the paper.
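The abstract names entropy and the Gini index as goodness-of-split measures but does not show how they are computed. The following Python sketch illustrates the standard textbook definitions on a toy set of labelled log entries. It is not the authors' implementation: the class labels ("human", "robot"), the splitting attribute (whether a session requested robots.txt) and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(partitions, measure):
    """Size-weighted average impurity of the partitions produced by a split."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * measure(p) for p in partitions)


def information_gain(parent, partitions):
    """Reduction in entropy achieved by splitting parent into partitions."""
    return entropy(parent) - split_impurity(partitions, entropy)


# Hypothetical toy data: eight sessions, half human and half robot.
parent = ["human"] * 4 + ["robot"] * 4

# Hypothetical split on "session requested robots.txt" (yes / no).
by_robots_txt = [
    ["robot", "robot", "robot", "human"],  # requested robots.txt
    ["human", "human", "human", "robot"],  # did not request it
]

print("entropy before split:", entropy(parent))
print("Gini index before split:", gini(parent))
print("weighted Gini after split:", split_impurity(by_robots_txt, gini))
print("information gain:", information_gain(parent, by_robots_txt))
```

A lower weighted impurity after the split (equivalently, a higher information gain) indicates a better split. A candidate attribute that separates human and robot sessions perfectly would drive both measures to zero.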

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rao, V.V.R.M., Kumari, V.V., Raju, K.V.S.V.N. (2011). An Intelligent System for Web Usage Data Preprocessing. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. CCSIT 2011. Communications in Computer and Information Science, vol 131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17857-3_47

  • DOI: https://doi.org/10.1007/978-3-642-17857-3_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17856-6

  • Online ISBN: 978-3-642-17857-3

  • eBook Packages: Computer Science (R0)
