Scientific and Statistical Database Management

Lecture Notes in Computer Science, Volume 5566, pp. 461-477

Finding Structural Similarity in Time Series Data Using Bag-of-Patterns Representation

  • Jessica Lin, Computer Science Department, George Mason University
  • Yuan Li, Computer Science Department, George Mason University

For more than a decade, time series similarity search has received a great deal of attention from data mining researchers. As a result, many time series representations and distance measures have been proposed. However, most existing work on time series similarity search focuses on finding shape-based similarity. While some existing approaches work well for short time series, they typically fail to produce satisfactory results on long sequences. For long sequences, it is more appropriate to measure similarity based on higher-level structure. In this work, we present a histogram-based representation for time series data, similar to the "bag of words" approach that is widely accepted in the text mining and information retrieval communities. We show that our approach outperforms existing methods in clustering, classification, and anomaly detection on several real datasets.
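To make the "bag of words" analogy concrete, here is a minimal sketch of a histogram-based representation. It rests on several assumptions that the abstract does not state: each sliding window is z-normalized, reduced with piecewise aggregate approximation (PAA), and mapped to a short symbolic "word" using SAX-style Gaussian breakpoints; consecutive repeated words are counted once; and two series are compared by the Euclidean distance between their word histograms. The function names (`bag_of_patterns`, `bop_distance`), the alphabet size of 4, and the window and segment parameters are hypothetical choices for illustration, not the paper's settings.

```python
from collections import Counter
import math

# Assumed SAX-style breakpoints splitting N(0, 1) into 4 equiprobable regions.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"

def znorm(xs):
    """Z-normalize a window; a near-constant window becomes all zeros."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    if sd < 1e-8:
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]

def paa(xs, segments):
    """Piecewise aggregate approximation: mean of equal-length segments."""
    if len(xs) % segments:
        raise ValueError("window length must be divisible by segments")
    step = len(xs) // segments
    return [sum(xs[i * step:(i + 1) * step]) / step for i in range(segments)]

def symbol(v):
    """Map a PAA value to a letter using the breakpoints above."""
    for i, b in enumerate(BREAKPOINTS):
        if v < b:
            return ALPHABET[i]
    return ALPHABET[-1]

def bag_of_patterns(series, window, segments):
    """Histogram of symbolic words over all sliding windows (hypothetical helper)."""
    bag = Counter()
    prev = None
    for start in range(len(series) - window + 1):
        reduced = paa(znorm(series[start:start + window]), segments)
        word = "".join(symbol(v) for v in reduced)
        if word != prev:  # assumed numerosity reduction: skip consecutive repeats
            bag[word] += 1
        prev = word
    return bag

def bop_distance(a, b):
    """Euclidean distance between two word histograms (missing words count 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

# Usage: a repeating ramp versus a flat signal.
periodic = bag_of_patterns([0, 1, 2, 3] * 3, window=4, segments=2)
flat = bag_of_patterns([5] * 8, window=4, segments=2)
print(periodic, bop_distance(periodic, flat))
```

Because the histogram discards where each word occurs, two long series that contain the same local patterns in a different order end up close together. This is the kind of structural, rather than point-by-point shape, similarity the abstract argues for.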


Keywords: Data mining, Time series, Similarity search