
Nonlinear Dynamics, Volume 61, Issue 3, pp 347–361

CAS based clustering algorithm for Web users

  • Miao Wan
  • Lixiang Li
  • Jinghua Xiao
  • Yixian Yang
  • Cong Wang
  • Xiaolei Guo

Abstract

This article devises a clustering technique for detecting groups of Web users from Web access logs. In this technique, Web users are clustered by a new clustering algorithm based on an analysis of the mechanism of the chaotic ant swarm (CAS). This CAS-based clustering algorithm, called CAS-C, solves clustering problems from the perspective of chaotic optimization. The performance of CAS-C in detecting Web user clusters is compared with that of the popular k-means algorithm. Clustering quality is evaluated by calculating the average intra-cluster and inter-cluster distances. Experimental results demonstrate that CAS-C is an effective clustering technique, with a larger average intra-cluster distance and a smaller average inter-cluster distance than the k-means algorithm. Statistical analysis of the resulting distances also shows that the CAS-C-based Web user clustering algorithm is more stable. To demonstrate its utility, the proposed approach is applied to a prefetching task that predicts user requests, with encouraging results.
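The abstract evaluates clusterings by their average intra-cluster and inter-cluster distances but does not spell out the formulas. The sketch below is a minimal illustration under stated assumptions, not the paper's definitions: each Web user is assumed to be a vector of page-visit counts, distances are assumed Euclidean, the intra-cluster value is taken as the mean distance from each user to its own cluster centroid, and the inter-cluster value as the mean distance over all pairs of centroids. The function names are hypothetical.

```python
import math
from itertools import combinations

# Assumed representation (hypothetical): each Web user is a vector of
# page-visit counts; a clustering is a list of non-empty clusters,
# each cluster being a list of such vectors.

def centroid(points):
    """Coordinate-wise mean of the vectors in one non-empty cluster."""
    return [sum(coord) / len(points) for coord in zip(*points)]

def avg_intra_cluster_distance(clusters):
    """Assumed definition: mean Euclidean distance from each user vector
    to the centroid of its own cluster."""
    total, count = 0.0, 0
    for points in clusters:
        c = centroid(points)
        for p in points:
            total += math.dist(p, c)
            count += 1
    return total / count

def avg_inter_cluster_distance(clusters):
    """Assumed definition: mean Euclidean distance over all pairs of
    cluster centroids. Requires at least two clusters."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = [centroid(points) for points in clusters]
    pairs = list(combinations(cents, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    # Toy data: four users over three pages, already split into two clusters.
    clusters = [
        [[1, 0, 0], [3, 0, 0]],
        [[0, 2, 0], [0, 2, 2]],
    ]
    print(avg_intra_cluster_distance(clusters))  # 1.0
    print(avg_inter_cluster_distance(clusters))  # 3.0
```

The same two functions can score the output of any clustering method (for example, k-means or a CAS-based search) on identical user vectors, which is the kind of side-by-side comparison the abstract describes; the paper's exact distance definitions may differ from the ones assumed here.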

Keywords

Clustering; Chaotic ant swarm (CAS); Web access logs; Web user clustering


Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Miao Wan (1, 2, 3)
  • Lixiang Li (1, 2, 3)
  • Jinghua Xiao (1, 4)
  • Yixian Yang (1, 2, 3)
  • Cong Wang (1, 2, 3)
  • Xiaolei Guo (1, 2, 3)

  1. Information Security Center, State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
  2. Key Laboratory of Network and Information Attack & Defence Technology of MOE, Beijing University of Posts and Telecommunications, Beijing, China
  3. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing, China
  4. School of Science, Beijing University of Posts and Telecommunications, Beijing, China
