An Efficient Similarity Measure for Clustering of Categorical Sequences

  • Sang-Kyun Noh
  • Yong-Min Kim
  • DongKook Kim
  • Bong-Nam Noh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4304)


In this paper, we propose an efficient similarity measure as a pre-processing method for clustering data with categorical and sequential attributes. The measure is based on a new dynamic programming algorithm that computes a sequence comparison score from a gap penalty matrix; the similarity value is obtained by normalizing this score. We evaluate the proposed measure through clustering experiments, since clustering is an unsupervised learning method whose results are strongly influenced by the similarity measure between clusters. The experiments use Tcpdump data from the DARPA 1999 Intrusion Detection Evaluation Data Sets, which consist of sequential packet data from a network. Finally, we discuss the results of the comparison experiments.
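
As a rough illustration of the general idea, and not the algorithm proposed in the paper, the sketch below scores two categorical sequences with a standard global-alignment dynamic program and normalizes the result by the longer sequence length. The function names, the scoring constants (match = 1, mismatch = 0, gap = 0) and the packet labels in the usage note are assumptions made for this example; with these constants the raw score equals the length of the longest common subsequence, so the normalized value lies in [0, 1].

```python
def alignment_score(a, b, match=1.0, mismatch=0.0, gap=0.0):
    """Global alignment score of two categorical sequences (assumed scoring scheme)."""
    n, m = len(a), len(b)
    # prev[j] is the best score for aligning the first i-1 items of a with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            # Align a[i-1] with b[j-1], or insert a gap in either sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


def normalized_similarity(a, b):
    """Alignment score with the default constants, scaled into [0, 1]."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sequences are identical
    return alignment_score(a, b) / max(len(a), len(b))
```

For example, `normalized_similarity(["SYN", "SYN-ACK", "ACK", "FIN"], ["SYN", "ACK", "FIN"])` gives 0.75 under these assumptions: three items align and the longer sequence has four. A matrix of such pairwise values could then be passed to a clustering algorithm as its similarity input.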


Keywords: Similarity measure; Dynamic programming; Sequence clustering





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sang-Kyun Noh (1)
  • Yong-Min Kim (2)
  • DongKook Kim (3)
  • Bong-Nam Noh (3)
  1. Interdisciplinary Program of Information Security, Chonnam National University, Korea
  2. Dept. of Electronic Commerce, Chonnam National University, Korea
  3. Div. of Electronics Computer Engineering, Chonnam National University, Korea
