Computing, Volume 100, Issue 4, pp 421–437

Discovering pan-correlation patterns from time course data sets by efficient mining algorithms

  • Qian Liu
  • Shameek Ghosh
  • Jinyan Li
  • Limsoon Wong
  • Kotagiri Ramamohanarao

Abstract

Time-course correlation patterns can be positive or negative, and can be time-lagged with gaps. Mining all of these correlation patterns helps to gain broad insight into variable dependencies. Here, we prove that diverse types of correlation patterns can be represented by a generalized form of positive correlation patterns. We prove a correspondence between positive correlation patterns and sequential patterns, and present an efficient single-scan algorithm for mining the correlations. Evaluations on synthetic time-course data sets and yeast cell cycle gene expression data sets indicate that: (1) the running time of the algorithm increases linearly with the number of variables; (2) negative correlation patterns are abundant in real-world data sets; and (3) correlation patterns with time lags and gaps are abundant. Existing methods have discovered only incomplete forms of many of these patterns, and have missed some important patterns completely.
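
The idea that negative and time-lagged correlations can be viewed as a generalized form of positive correlation can be illustrated with a small sketch. This is not the paper's single-scan mining algorithm, which the abstract does not spell out. The trend encoding (up/down/flat steps) and the names `trend_symbols` and `trend_agreement` are assumptions introduced here for illustration only.

```python
def trend_symbols(series):
    """Encode a numeric series as steps: +1 (up), -1 (down), 0 (flat)."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def trend_agreement(x, y, lag=0, negate=False):
    """Fraction of steps where the trend of x matches the trend of y `lag` steps later.

    With negate=True the trend of y is flipped first, so a negative
    correlation between x and y is scored as a positive one.
    (Hypothetical helper, not taken from the paper.)
    """
    sx = trend_symbols(x)
    sy = trend_symbols(y)
    if negate:
        sy = [-s for s in sy]
    pairs = list(zip(sx, sy[lag:]))  # align x at step t with y at step t + lag
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


x = [1, 3, 2, 5, 4, 6]
y_neg = [9, 7, 8, 5, 6, 4]      # moves opposite to x at every step
y_lag = [0, 1, 3, 2, 5, 4, 6]   # x delayed by one time point

print(trend_agreement(x, y_neg))               # 0.0
print(trend_agreement(x, y_neg, negate=True))  # 1.0
print(trend_agreement(x, y_lag, lag=1))        # 1.0
```

In this toy setting, flipping or shifting one series reduces both the negative and the time-lagged case to an ordinary positive trend match, which is the intuition behind treating all of them under one pattern type.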

Keywords

Pan-correlation pattern; Time-course data; Positive correlation patterns; Negative correlation patterns; Time-lagged positive correlation patterns; Time-lagged negative correlation patterns

Mathematics Subject Classification

68R01 (General) 

Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2018

Authors and Affiliations

  • Qian Liu (1)
  • Shameek Ghosh (1)
  • Jinyan Li (1)
  • Limsoon Wong (2)
  • Kotagiri Ramamohanarao (3)

  1. Advanced Analytics Institute, University of Technology Sydney, Broadway, Australia
  2. School of Computing, National University of Singapore, Singapore, Singapore
  3. Department of Computing and Information Systems, The University of Melbourne, Parkville, Australia
