An Effective System for Mining Web Log

  • Conference paper
Frontiers of WWW Research and Development - APWeb 2006 (APWeb 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

The WWW provides a simple yet effective medium for users to search, browse, and retrieve information on the Web. Web log mining is a promising tool for studying user behavior, which can in turn help website designers improve site organization and services. Although many existing systems can analyze the traversal paths of website visitors, their performance is still far from satisfactory. In this paper, we propose an effective Web log mining system consisting of data preprocessing, sequential pattern mining, and visualization. In particular, we propose an efficient sequential pattern mining algorithm, LAPIN_WEB (LAst Position INduction for WEB log), an extension of the previously proposed LAPIN algorithm, to extract user access patterns from the traversal paths recorded in Web logs. Our experimental results and performance studies demonstrate that LAPIN_WEB is very efficient and outperforms the well-known PrefixSpan algorithm by up to an order of magnitude on real Web log datasets. Moreover, we implement a visualization tool to help interpret the mining results and predict users’ future requests.
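
The abstract describes LAPIN_WEB only at a high level. As an illustration, here is a minimal sketch of the last-position test that the LAPIN family is named after. It rests on several assumptions. Each preprocessed session is a list of single page IDs, and patterns are grown depth-first in the style of PrefixSpan. The function name `mine_sequential_patterns` is hypothetical.

The key observation is that a page can extend a prefix whose leftmost match ends at position p in a session only if that page's last position in the session is greater than p. Sorting each session's pages by last position, latest first, therefore lets the counting loop stop at the first page that fails the test instead of rescanning the suffix. This is not the authors' LAPIN_WEB implementation and omits any optimizations it uses for large logs.

```python
from bisect import bisect_right
from collections import defaultdict


def mine_sequential_patterns(sessions, min_support):
    """Hypothetical sketch: depth-first sequential pattern mining over sessions of
    single page IDs, pruning extension candidates with a last-position test."""
    # Per session: page -> sorted list of positions where it was requested.
    occurrences = []
    for session in sessions:
        positions = defaultdict(list)
        for pos, page in enumerate(session):
            positions[page].append(pos)
        occurrences.append(dict(positions))

    # Per session: (last position, page) pairs, latest first. Positions are
    # unique within a session, so the sort never has to compare page IDs.
    last_positions = [
        sorted(((pos_list[-1], page) for page, pos_list in occ.items()), reverse=True)
        for occ in occurrences
    ]

    patterns = {}

    def grow(prefix, prefix_end):
        # prefix_end maps session id -> position where the leftmost match of prefix ends.
        counts = defaultdict(int)
        for sid, end in prefix_end.items():
            for last_pos, page in last_positions[sid]:
                if last_pos <= end:
                    break  # this page and all remaining ones never occur after the prefix
                counts[page] += 1  # each page is counted at most once per session
        for page, support in counts.items():
            if support < min_support:
                continue
            extended = prefix + (page,)
            patterns[extended] = support
            new_end = {}
            for sid, end in prefix_end.items():
                pos_list = occurrences[sid].get(page)
                if pos_list and pos_list[-1] > end:
                    # First occurrence of page strictly after the current prefix end.
                    new_end[sid] = pos_list[bisect_right(pos_list, end)]
            grow(extended, new_end)

    # The empty prefix "ends" before position 0 in every session.
    grow((), {sid: -1 for sid in range(len(sessions))})
    return patterns


if __name__ == "__main__":
    sessions = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
    for pattern, support in sorted(mine_sequential_patterns(sessions, 2).items()):
        print(pattern, support)
```

With `min_support=2`, the three toy sessions yield the patterns (a), (b), (c), (a, c), and (b, c). The patterns (a, b) and (b, a) each occur in only one session and are pruned.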

References

  1. Google Website, http://www.google.com

  2. Wu, K., Yu, P.S., Ballman, A.: Speedtracer: A Web usage mining and analysis tool. IBM Systems Journal 37(1), 89–105 (1998)

  3. Ishikawa, H., Ohta, M., Yokoyama, S., Nakayama, J., Katayama, K.: On the Effectiveness of Web Usage Mining for Page Recommendation and Restructuring. In: 2nd Annual International Workshop of the Working Group “Web and Databases” of the German Informatics Society (October 2002)

  4. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential Pattern Mining using A Bitmap Representation. In: 8th ACM SIGKDD Int’l Conf. Knowledge Discovery in Databases (KDD 2002), Alberta, Canada, pp. 429–435 (July 2002)

  5. Hong, J.I., Landay, J.A.: WebQuilt: A Framework for Capturing and Visualizing the Web Experience. In: 10th Int’l Conf. on the World Wide Web (WWW 2001), Hong Kong, China, pp. 717–724 (May 2001)

  6. Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining access pattern efficiently from web logs. In: Terano, T., Chen, A.L.P. (eds.) PAKDD 2000. LNCS, vol. 1805, pp. 396–407. Springer, Heidelberg (2000)

  7. Pei, J., Han, J., Behzad, M.A., Pinto, H.: PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In: 17th Int’l Conf. of Data Engineering (ICDE 2001), Heidelberg, Germany (April 2001)

  8. Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: Mining Sequential Patterns by Pattern-growth: The PrefixSpan Approach. IEEE Transactions on Knowledge and Data Engineering 16(11), 1424–1440 (2004)

  9. Pitkow, J., Bharat, K.: WebViz: A Tool for World-Wide Web Access Log Analysis. In: 1st Int’l Conf. on the World Wide Web (WWW 1994), Geneva, Switzerland (May 1994)

  10. Zaki, M.J.: SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning 40, 31–60 (2001)

  11. Spiliopoulou, M., Faulstich, L.C.: WUM: A Web Utilization Miner. In: Atzeni, P., Mendelzon, A.O., Mecca, G. (eds.) WebDB 1998. LNCS, vol. 1590. Springer, Heidelberg (1999)

  12. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: 20th Int’l Conf. on Very Large Databases (VLDB 1994), Santiago, Chile, pp. 487–499 (September 1994)

  13. Agrawal, R., Srikant, R.: Mining sequential patterns. In: 11th Int’l Conf. of Data Engineering (ICDE 1995), Taipei, Taiwan, pp. 3–14 (March 1995)

  14. Cooley, R., Mobasher, B., Srivastava, J.: Data Preparation for Mining World Wide Web Browsing Patterns. J. Knowledge and Information Systems 1(1), 5–32 (1999)

  15. Kosala, R., Blockeel, H.: Web Mining Research: A Survey. SIGKDD Explorations 2(1), 1–15 (2000)

  16. Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 13–17. Springer, Heidelberg (1996)

  17. Yan, X., Han, J., Afshar, R.: CloSpan: Mining closed sequential patterns in large datasets. In: 3rd SIAM Int’l Conf. Data Mining (SDM 2003), San Francisco, CA, pp. 166–177 (May 2003)

  18. Yang, Z., Wang, Y., Kitsuregawa, M.: LAPIN: Effective Sequential Pattern Mining Algorithms by Last Position Induction. Technical Report (TR050617), Info. and Comm. Eng. Dept., Tokyo University, Japan (June 2005), http://www.tkl.iis.u-tokyo.ac.jp/~yangzl/Document/LAPIN.pdf

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, Z., Wang, Y., Kitsuregawa, M. (2006). An Effective System for Mining Web Log. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_5

  • DOI: https://doi.org/10.1007/11610113_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31142-3

  • Online ISBN: 978-3-540-32437-9

  • eBook Packages: Computer Science, Computer Science (R0)
