Order Preserving Clustering by Finding Frequent Orders in Gene Expression Data

  • Li Teng
  • Laiwan Chan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4774)

Abstract

This paper concerns the discovery of Order Preserving Clusters (OP-Clusters) in gene expression data, in each of which a subset of genes induces a similar linear ordering along a subset of conditions. After converting each gene vector into an ordered label sequence, the problem is transformed into finding frequent orders appearing in the sequence set. We propose an algorithm for finding the frequent orders by iteratively Combining the most Frequent Prefixes and Suffixes (CFPS) in a statistical way. We also define the significance of an OP-Cluster. Our method scales well with the dimension of the dataset and the size of the cluster. An experimental study on both synthetic datasets and a real gene expression dataset shows that our approach is very effective and efficient.
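
The conversion step described in the abstract can be illustrated with a small sketch. This is not the authors' CFPS algorithm: it only shows, under stated assumptions, how a gene's expression vector might be turned into an ordered sequence of condition labels and how the support of a candidate order could be counted by brute force. The function names (`to_order_sequence`, `supports`, `support_count`), the use of column indices as labels, the ascending sort, and the toy matrix are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def to_order_sequence(expression: Sequence[float]) -> List[int]:
    """Return condition indices sorted by ascending expression value.

    Assumption: labels are column indices; ties keep their original
    left-to-right position because Python's sort is stable.
    """
    return sorted(range(len(expression)), key=lambda c: expression[c])


def supports(sequence: Sequence[int], order: Sequence[int]) -> bool:
    """True if `order` occurs in `sequence` as a (possibly gapped) subsequence."""
    remaining = iter(sequence)
    # `label in remaining` consumes the iterator up to the first match,
    # so later labels must be found after earlier ones.
    return all(label in remaining for label in order)


def support_count(matrix: Sequence[Sequence[float]], order: Sequence[int]) -> int:
    """Count the genes (rows) whose label sequence contains `order`."""
    return sum(supports(to_order_sequence(row), order) for row in matrix)


# Toy data (made up for illustration): 3 genes x 4 conditions.
matrix = [
    [0.1, 0.9, 0.4, 0.7],
    [1.0, 5.0, 2.0, 3.0],
    [3.0, 1.0, 2.0, 0.5],
]

if __name__ == "__main__":
    for row in matrix:
        print(to_order_sequence(row))
    print(support_count(matrix, [0, 3, 1]))
```

In this toy matrix the first two genes share the same ordering of conditions, so any order consistent with that ordering is supported by both of them. A frequent-order miner such as the one the abstract proposes would search for such orders without enumerating every candidate.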

Keywords

Gene Expression Data; Association Rule; Brca2 Mutation; Minimum Support; Synthetic Dataset
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Li Teng (1)
  • Laiwan Chan (1)
  1. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
