Association Analysis Techniques for Bioinformatics Problems

  • Conference paper
Bioinformatics and Computational Biology (BICoB 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5462)

Abstract

Association analysis is one of the most popular analysis paradigms in data mining. Despite the solid foundation of association analysis and its potential applications, this group of techniques is not as widely used as classification and clustering, especially in the domain of bioinformatics and computational biology. In this paper, we present different types of association patterns and discuss some of their applications in bioinformatics. We present a case study showing the usefulness of association analysis-based techniques for pre-processing protein interaction networks for the task of protein function prediction. Finally, we discuss some of the challenges that need to be addressed to make association analysis-based techniques more applicable for a number of interesting problems in bioinformatics.
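As a minimal illustration of the kind of pattern the abstract refers to, the sketch below computes support and confidence, the two standard measures for frequent itemsets and association rules, on a toy data set. The gene names, the transactions, and the helper functions are hypothetical and are not taken from the paper; the brute-force enumeration is for exposition only and is not how a practical miner such as Apriori works.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is the set of genes
# observed as over-expressed in one sample.
transactions = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneB", "geneC"},
    {"geneA", "geneB", "geneC"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(antecedent <= t for t in transactions)
    if n_antecedent == 0:
        return 0.0  # the rule never fires, so report zero confidence
    n_both = sum(both <= t for t in transactions)
    return n_both / n_antecedent


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force enumeration of itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


if __name__ == "__main__":
    print(support({"geneA", "geneB"}, transactions))
    print(confidence({"geneA"}, {"geneB"}, transactions))
    for itemset, s in frequent_itemsets(transactions, min_support=0.6).items():
        print(sorted(itemset), s)
```

In this toy data the pair {geneA, geneB} appears in three of five samples, and geneB appears in three of the four samples that contain geneA, so the rule geneA -> geneB has support 0.6 and confidence 0.75.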

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Atluri, G., Gupta, R., Fang, G., Pandey, G., Steinbach, M., Kumar, V. (2009). Association Analysis Techniques for Bioinformatics Problems. In: Rajasekaran, S. (eds) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_1

  • DOI: https://doi.org/10.1007/978-3-642-00727-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00726-2

  • Online ISBN: 978-3-642-00727-9

  • eBook Packages: Computer Science, Computer Science (R0)
