Mining Association Rule Bases from Integrated Genomic Data and Annotations

  • Ricardo Martinez
  • Nicolas Pasquier
  • Claude Pasquier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5488)

Abstract

During the last decade, several clustering and association rule mining techniques have been applied to highlight groups of co-regulated genes in gene expression data. Integrating these data and biological knowledge into a single framework has since become a major challenge, both to improve the relevance of mined patterns and to simplify their interpretation by biologists. GenMiner was developed for mining association rules from such integrated datasets. It combines a new normalized discretization method, called NorDi, with the JClose algorithm to extract condensed representations for association rules. Experimental results show that GenMiner requires less memory than Apriori-based approaches and that it improves the relevance of extracted rules. Moreover, the association rules obtained reveal significant co-annotated and co-expressed gene patterns showing important biological relationships supported by recent biological literature.
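As an illustration only, the ideas named in the abstract (discretizing expression values into items, then reasoning about closed itemsets and rule confidence over genes described by both expression and annotation items) can be sketched on toy data. This is not the NorDi or JClose implementation: the z-score cutoff stands in for NorDi, whose details are not given here, and every function name, item label, and data value below is a hypothetical assumption.

```python
from statistics import mean, pstdev

def discretize(values, cutoff=1.0):
    """Label values 'over', 'under', or None by z-score (assumed stand-in for NorDi)."""
    mu, sigma = mean(values), pstdev(values)
    labels = []
    for v in values:
        z = (v - mu) / sigma if sigma else 0.0
        labels.append("over" if z > cutoff else "under" if z < -cutoff else None)
    return labels

def support(itemset, transactions):
    """Fraction of transactions (genes) that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def closure(itemset, transactions):
    """Items shared by all transactions containing the itemset (its closure)."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset.intersection(*covering)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Hypothetical genes, each described by annotation and discretized expression items.
genes = [
    frozenset({"GO:ribosome", "cond1:under"}),
    frozenset({"GO:ribosome", "cond1:under", "KEGG:translation"}),
    frozenset({"GO:ribosome", "cond1:under"}),
    frozenset({"GO:glycolysis", "cond1:over"}),
]

expression_labels = discretize([0.1, 0.0, -0.1, 2.5, -2.4])
closed = closure({"cond1:under"}, genes)
rule_confidence = confidence({"GO:ribosome"}, {"cond1:under"}, genes)
```

In this toy dataset the closure of `{"cond1:under"}` also contains `"GO:ribosome"`, because every gene under-expressed in the hypothetical condition carries that annotation. Condensed representations build on this observation: itemsets with the same closure are supported by the same genes, so rules can be generated from closed itemsets rather than from every frequent itemset.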

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ricardo Martinez (1)
  • Nicolas Pasquier (1)
  • Claude Pasquier (2)
  1. Laboratoire I3S, Université de Nice Sophia-Antipolis/CNRS UMR-6070, Sophia, France
  2. IDBC, Université de Nice Sophia-Antipolis/CNRS UMR-6543, Parc Valrose, Nice, France
