The SKM Algorithm: A K-Means Algorithm for Clustering Sequential Data

  • José G. Dias
  • Maria João Cortinhal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5290)

Abstract

This paper introduces a new algorithm for clustering sequential data. The SKM algorithm is a K-Means-type algorithm suited for identifying groups of objects with similar trajectories and dynamics. We provide a simulation study to show the good properties of the SKM algorithm. Moreover, a real application to website users’ search patterns shows its usefulness in identifying groups with heterogeneous behavior. We identify two distinct clusters with different styles of website search.
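The abstract does not spell out the SKM procedure, so the following is only a rough illustration of what a K-Means-type loop over sequential data can look like. It rests on several assumptions that are ours, not confirmed by the abstract. Each sequence is summarized by its first-order Markov transition matrix. The keyword "KL distance" is read as the Kullback-Leibler divergence summed over the rows of these matrices. Centroids are re-estimated from the pooled transition counts of each cluster's members. The helper names, the smoothing constant `alpha`, the random initialization, and the rule for empty clusters are all hypothetical choices made for this sketch.

```python
import math
import random

# Hypothetical sketch: NOT the authors' code. Assumes first-order Markov
# summaries of each sequence and a row-wise Kullback-Leibler distance.

def transition_counts(seq, n_states):
    """Count first-order transitions s -> t in a single sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for s, t in zip(seq, seq[1:]):
        counts[s][t] += 1
    return counts

def transition_probs(counts, alpha=0.5):
    """Row-normalize a count matrix with additive smoothing (alpha is an assumption)."""
    probs = []
    for row in counts:
        total = sum(row) + alpha * len(row)
        probs.append([(c + alpha) / total for c in row])
    return probs

def kl_distance(p, q):
    """Sum over rows s of KL(p[s] || q[s]); smoothing keeps every entry positive."""
    return sum(pi * math.log(pi / qi)
               for prow, qrow in zip(p, q)
               for pi, qi in zip(prow, qrow))

def skm_sketch(seqs, n_states, k, n_iter=50, seed=0):
    """K-Means-style loop: assign by KL distance, re-estimate centroids from pooled counts."""
    rng = random.Random(seed)
    counts = [transition_counts(s, n_states) for s in seqs]
    probs = [transition_probs(c) for c in counts]
    # Initialize centroids from k randomly chosen sequences (an assumed choice).
    centroids = [probs[i] for i in rng.sample(range(len(seqs)), k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the KL distance.
        new_labels = [min(range(k), key=lambda j: kl_distance(p, centroids[j]))
                      for p in probs]
        if new_labels == labels:
            break  # assignments unchanged, so the loop has converged
        labels = new_labels
        # Update step: pool the transition counts of each cluster's members.
        for j in range(k):
            members = [c for c, lab in zip(counts, labels) if lab == j]
            if not members:
                continue  # keep the previous centroid if a cluster empties
            pooled = [[sum(m[s][t] for m in members) for t in range(n_states)]
                      for s in range(n_states)]
            centroids[j] = transition_probs(pooled)
    return labels, centroids

if __name__ == "__main__":
    # Two made-up groups: sequences that mostly stay in state 0 or mostly in state 1.
    seqs = [[0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0],
            [1, 1, 1, 1, 0, 1, 1, 1], [1, 1, 1, 0, 1, 1, 1, 1]]
    labels, _ = skm_sketch(seqs, n_states=2, k=2)
    print(labels)
```

On the toy data, the first two sequences end up in one cluster and the last two in the other. A count-weighted variant, which multiplies each row's divergence by the number of transitions leaving that state, is an equally plausible reading of a "KL distance" between sequences. Which variant SKM actually uses would have to be checked against the full paper.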

Keywords

clustering sequential data; K-Means algorithm; KL distance



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • José G. Dias (1)
  • Maria João Cortinhal (2)
  1. Department of Quantitative Methods, ISCTE Business School and UNIDE, Lisboa, Portugal
  2. Department of Quantitative Methods, ISCTE Business School and CIO, Lisboa, Portugal
