Information Systems Frontiers, Volume 11, Issue 4, pp 433–447

Efficient mining of multilevel gene association rules from microarray and gene ontology

  • Vincent S. Tseng
  • Hsieh-Hui Yu
  • Shih-Chiang Yang

Abstract

Recent studies have shown that association rules can reveal interactions between genes that traditional analysis methods such as clustering might miss. However, existing studies consider only association rules among individual genes. In this paper, we propose a new data mining method named MAGO for discovering multilevel gene association rules from gene microarray data and the concept hierarchy of Gene Ontology (GO). The proposed method can efficiently find relations between GO terms by analyzing gene expression data together with the GO hierarchy. For example, with the biological process in GO, rules such as Process A (up) → Process B (up) can be discovered, which indicates that the genes involved in Process B of GO are likely to be up-regulated whenever those involved in Process A are up-regulated. Moreover, we also propose a constrained mining method named CMAGO for discovering multilevel gene expression rules under user-specified constraints. Through empirical evaluation, the proposed methods are shown to have excellent performance in discovering hidden multilevel gene association rules.
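To make the rule form Process A (up) → Process B (up) concrete, the sketch below computes the standard support and confidence of such a rule on toy data. It is only an illustration and not the MAGO or CMAGO algorithm: the samples, the gene names, the gene-to-term mapping, and the choice to call a term "up" when any of its annotated genes is up-regulated are all assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical toy data: each sample is the set of genes called
# up-regulated in one microarray experiment.
SAMPLES: List[Set[str]] = [
    {"g1", "g3"},
    {"g1", "g2", "g4"},
    {"g2"},
    {"g3", "g4"},
    {"g5"},
]

# Hypothetical gene-to-term annotation (not real GO data).
GENE_TO_TERM: Dict[str, str] = {
    "g1": "Process A",
    "g2": "Process A",
    "g3": "Process B",
    "g4": "Process B",
    "g5": "Process C",
}


def to_term_items(sample: Set[str]) -> Set[str]:
    """Generalize gene-level items to term-level items.

    Assumption for this sketch: a term counts as "up" in a sample
    if at least one gene annotated to it is up-regulated.
    """
    return {f"{GENE_TO_TERM[g]} (up)" for g in sample if g in GENE_TO_TERM}


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Set[str]]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Term-level view of the samples, then the two rule measures.
TERM_TRANSACTIONS = [to_term_items(s) for s in SAMPLES]
RULE_SUPPORT = support({"Process A (up)", "Process B (up)"}, TERM_TRANSACTIONS)
RULE_CONFIDENCE = confidence({"Process A (up)"}, {"Process B (up)"},
                             TERM_TRANSACTIONS)

print(f"support={RULE_SUPPORT:.2f} confidence={RULE_CONFIDENCE:.2f}")
```

On this toy data, Process A is up in three of the five samples and Process B is also up in two of those three, so the rule has support 2/5 and confidence 2/3. A multilevel miner would repeat this kind of generalization at several depths of the hierarchy; the sketch shows a single level only.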

Keywords

Data mining, Microarray, Gene expression analysis, Association rules mining, Multi-level association rules, Gene ontology

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Vincent S. Tseng (1, 2)
  • Hsieh-Hui Yu (1)
  • Shih-Chiang Yang (1)

  1. Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, ROC
  2. Institute of Medical Informatics, National Cheng Kung University, Taiwan, ROC
