Group SAX: Extending the Notion of Contrast Sets to Time Series and Multimedia Data

  • Jessica Lin
  • Eamonn Keogh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)


In this work, we take the traditional notion of contrast sets and extend it to other data types, in particular time series and, by extension, images. In the traditional sense, contrast-set mining identifies attributes, values, and instances that differ significantly across groups, and helps users understand the differences between groups of data. We reformulate the notion of contrast sets for time series data, defining a contrast set as the key pattern(s) in one set of data that are maximally different from the other set. We propose a fast and exact algorithm to find the contrast sets, and demonstrate its utility in several diverse domains, ranging from industry to anthropology. We show that our algorithm achieves a speedup of three orders of magnitude over the brute-force algorithm while still producing exact solutions.
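
As a rough illustration of the reformulated definition, the brute-force baseline mentioned in the abstract can be sketched as a nested scan: for every window of one series, find its nearest neighbor among the windows of the other series, and report the window whose nearest neighbor is farthest away. This is only a sketch under stated assumptions, not the fast algorithm the paper proposes. The fixed window length `m`, the use of raw Euclidean distance without normalization, and the function names `windows` and `brute_force_contrast` are all assumptions made here for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def windows(series, m):
    """All contiguous subsequences of length m (hypothetical helper)."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]


def brute_force_contrast(t, s, m):
    """Return (start, score) for the window of t that is most unlike s.

    Assumption: "maximally different" is read as "largest distance to its
    nearest neighbor among the windows of s". Cost is quadratic in the
    number of windows, which is why a faster method is desirable.
    """
    s_windows = windows(s, m)
    best_start, best_score = -1, -1.0
    for start, w in enumerate(windows(t, m)):  # outer loop over t
        nearest = min(euclidean(w, v) for v in s_windows)  # inner loop over s
        if nearest > best_score:
            best_start, best_score = start, nearest
    return best_start, best_score


# Example: t contains a short burst that never occurs in s.
t = [0, 0, 0, 0, 5, 5, 0, 0, 0, 0]
s = [0] * 10
start, score = brute_force_contrast(t, s, 2)
```

Calling `brute_force_contrast(t, s, m)` returns the start index of the candidate pattern in `t` together with its nearest-neighbor distance to `s`. Swapping the arguments gives the pattern in `s` that is most unlike `t`.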


Keywords: Time Series, Outer Loop, Hash Table, Association Rule Mining, Power Demand



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jessica Lin (1)
  • Eamonn Keogh (2)
  1. Information and Software Engineering, George Mason University
  2. Department of Computer Science & Engineering, University of California, Riverside
