A New Approach for Association Rule Mining and Bi-clustering Using Formal Concept Analysis

  • Kartick Chandra Mondal
  • Nicolas Pasquier
  • Anirban Mukhopadhyay
  • Ujjwal Maulik
  • Sanghamitra Bandhopadyay
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7376)

Abstract

Association rule mining and bi-clustering are data mining tasks that have become very popular in many application domains, particularly in bioinformatics. However, to our knowledge, no algorithm has been introduced for performing these two tasks in a single process. We propose a new approach called FIST for jointly extracting bases of extended association rules and conceptual bi-clusters. This approach is based on the frequent closed itemsets framework and requires a single scan of the database. It uses a new suffix tree based data structure that reduces memory usage, improves extraction efficiency, and allows parallel processing of the tree branches. Experiments conducted to assess its applicability to very large datasets show that FIST's memory requirements and execution times are, in most cases, equivalent to those of frequent closed itemset based algorithms and lower than those of frequent itemset based algorithms.
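The abstract's central idea is that a frequent closed itemset, paired with the set of objects that contain it, is at once a conceptual bi-cluster (a maximal all-ones submatrix, i.e. a formal concept) and a building block for association rule bases. The sketch below illustrates that idea under stated assumptions. It is not FIST: it enumerates itemsets by brute force over a hypothetical toy dataset (`DATA`) instead of using the paper's single database scan and suffix tree structure, and the helper names (`conceptual_biclusters`, `exact_rules`, `approximate_rules`) and thresholds are illustrative assumptions. Exact rules map minimal generators to their closures, and approximate rules link comparable closed itemsets, in the spirit of closed-itemset-based rule bases but not necessarily the extended bases defined in the paper.

```python
from itertools import combinations

# Hypothetical toy dataset (not from the paper): object -> set of items.
DATA = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"a", "c"},
    "o4": {"a", "b", "c"},
}


def extent(itemset, data):
    """Objects that contain every item of `itemset`."""
    return frozenset(o for o, items in data.items() if itemset <= items)


def intent(objects, data):
    """Items shared by all `objects` (all items if `objects` is empty)."""
    all_items = set().union(*data.values())
    return frozenset(all_items.intersection(*(data[o] for o in objects)))


def frequent_itemsets(data, minsupp):
    """Brute force: every itemset whose support (extent size) is >= minsupp."""
    items = sorted(set().union(*data.values()))
    result = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            ext = extent(frozenset(combo), data)
            if len(ext) >= minsupp:
                result[frozenset(combo)] = ext
    return result


def conceptual_biclusters(data, minsupp):
    """Frequent closed itemsets mapped to their extents (formal concepts).
    Each (objects, items) pair is a maximal all-ones submatrix."""
    concepts = {}
    for ext in frequent_itemsets(data, minsupp).values():
        # The closure of an itemset has the same extent as the itemset itself.
        concepts[intent(ext, data)] = ext
    return concepts


def exact_rules(data, minsupp):
    """Rules G -> closure(G) - G with confidence 1, for non-empty minimal generators G."""
    freq = frequent_itemsets(data, minsupp)
    rules = []
    for g, ext in freq.items():
        if not g:
            continue
        # G is a minimal generator if dropping any single item increases support.
        if all(len(freq[g - {i}]) > len(ext) for i in g):
            closure = intent(ext, data)
            if closure != g:
                rules.append((g, closure - g, 1.0))
    return rules


def approximate_rules(data, minsupp, minconf):
    """Rules X -> Y - X between closed itemsets X < Y, confidence supp(Y) / supp(X)."""
    concepts = conceptual_biclusters(data, minsupp)
    rules = []
    for x, ext_x in concepts.items():
        for y, ext_y in concepts.items():
            if x < y:
                conf = len(ext_y) / len(ext_x)
                if conf >= minconf:
                    rules.append((x, y - x, conf))
    return rules


if __name__ == "__main__":
    concepts = conceptual_biclusters(DATA, 2)
    for items, objs in sorted(concepts.items(), key=lambda c: len(c[0])):
        print(sorted(objs), "x", sorted(items))
    for lhs, rhs, conf in exact_rules(DATA, 2) + approximate_rules(DATA, 2, 0.6):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

On this toy data the closed itemsets are {a}, {a, b}, {a, c} and {a, b, c}; for example, {b} is a minimal generator whose closure is {a, b}, which yields the exact rule b -> a. Because support is anti-monotone, every subset of a frequent itemset is itself in `freq`, so the minimal-generator test only needs to check immediate subsets. The exhaustive enumeration is exponential in the number of items and only serves to make the concepts concrete.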

Keywords

Association Rules, Bi-clustering, Closure Lattice, Frequent Closed Itemsets, Suffix Tree Data Structures

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kartick Chandra Mondal (1)
  • Nicolas Pasquier (1)
  • Anirban Mukhopadhyay (2)
  • Ujjwal Maulik (3)
  • Sanghamitra Bandhopadyay (4)

  1. Laboratoire I3S (CNRS UMR-7271), Université de Nice Sophia-Antipolis, France
  2. Department of Computer Science and Engineering, University of Kalyani, India
  3. Department of Computer Science and Engineering, University of Jadavpur, India
  4. Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India