An Efficient Similarity Measure for Clustering of Categorical Sequences
In this paper, we propose an efficient similarity measure as a pre-processing method for clustering categorical and sequential attributes. The measure is based on a new dynamic programming algorithm that computes a sequence comparison score from the gap penalty matrix; the similarity itself is obtained by normalizing this score. We evaluate the proposed measure through clustering experiments, since clustering is an unsupervised learning method whose results are greatly influenced by the similarity measure between clusters. The experiments use Tcpdump data from the DARPA 1999 Intrusion Detection Evaluation Data Sets, which consist of sequences of packets transmitted over a network. Finally, the results of the comparison experiments are discussed.
Keywords: Similarity measure, Dynamic programming, Sequence clustering
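The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea: a dynamic programming comparison score with a gap penalty, normalized into a similarity value. It uses a standard Needleman-Wunsch-style global alignment rather than the paper's new algorithm. The function names, the `match`/`mismatch`/`gap` values, the min-max normalization, and the packet labels are all assumptions chosen for illustration.

```python
from typing import Hashable, Sequence


def alignment_score(a: Sequence[Hashable], b: Sequence[Hashable],
                    match: float = 1.0, mismatch: float = -1.0,
                    gap: float = -1.0) -> float:
    """Global alignment score with a linear gap penalty (assumed scoring)."""
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
        prev = curr
    return prev[m]


def normalized_similarity(a: Sequence[Hashable], b: Sequence[Hashable],
                          match: float = 1.0, mismatch: float = -1.0,
                          gap: float = -1.0) -> float:
    """Min-max normalize the raw score into [0, 1].

    Assumes match > 0 and mismatch, gap <= 0, so the raw score lies between
    longest * min(mismatch, gap) and longest * match.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    raw = alignment_score(a, b, match, mismatch, gap)
    best = longest * match
    worst = longest * min(mismatch, gap)
    return (raw - worst) / (best - worst)


# Hypothetical categorical sequences (protocol labels of successive packets).
s1 = ["tcp", "udp", "tcp", "icmp"]
s2 = ["tcp", "tcp", "icmp"]
print(normalized_similarity(s1, s2))
```

A normalized value of this kind can be computed for every pair of sequences and passed to a clustering algorithm as a precomputed similarity matrix, which is the pre-processing role the abstract describes.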