Discovering pan-correlation patterns from time course data sets by efficient mining algorithms
Time-course correlation patterns can be positive or negative, and they can be time-lagged with gaps. Mining all of these correlation patterns helps to gain broad insight into variable dependencies. Here, we prove that diverse types of correlation patterns can be represented by a generalized form of positive correlation patterns. We prove a correspondence between positive correlation patterns and sequential patterns, and present an efficient single-scan algorithm for mining the correlations. Evaluations on synthetic time-course data sets and yeast cell cycle gene expression data sets indicate that: (1) the running time of the algorithm increases linearly with the number of variables; (2) negative correlation patterns are abundant in real-world data sets; and (3) correlation patterns with time lags and gaps are abundant. Existing methods have discovered only incomplete forms of many of these patterns, and have missed some important patterns completely.
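The idea that negative and time-lagged correlations can be viewed as positive correlations of transformed series can be illustrated with a minimal sketch. It assumes the Pearson coefficient as the correlation measure, which is a stand-in and not necessarily the pattern definition used in the paper; the helper names `pearson`, `lagged_pearson`, and `negate` and the toy series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumption: used here only as a simple stand-in correlation measure.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def lagged_pearson(x, y, lag):
    """Correlation between x[t] and y[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def negate(series):
    """Flip the sign of every value in the series."""
    return [-v for v in series]


# Toy series (illustrative values only, not data from the paper).
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y_neg = [10.0 - v for v in x]   # moves exactly opposite to x
y_lag = [0.0] + x[:-1]          # x delayed by one time step

# A negative correlation becomes a positive one after negating a series.
r_negative = pearson(x, y_neg)
r_as_positive = pearson(x, negate(y_neg))

# A time-lagged correlation becomes an ordinary one after shifting a series.
r_lagged = lagged_pearson(x, y_lag, 1)

print(r_negative, r_as_positive, r_lagged)
```

Under these assumptions, a miner that handles only positive correlations could in principle be applied to negated or shifted copies of the variables to cover the other pattern types; how the paper formalizes this is given by its own proofs, not by this sketch.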
Keywords: Pan-correlation pattern; Time-course data; Positive correlation patterns; Negative correlation patterns; Time-lagged positive correlation patterns; Time-lagged negative correlation patterns
Mathematics Subject Classification: 68R01 (General)