Efficiently Mining Approximate Models of Associations in Evolving Databases

  • Adriano Veloso
  • Bruno Gusmão
  • Wagner Meira Jr.
  • Marcio Carvalho
  • Srini Parthasarathy
  • Mohammed Zaki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2431)

Abstract

Much of the existing work in machine learning and data mining has relied on devising efficient techniques to build accurate models from the data. Research on how the accuracy of a model changes as a function of dynamic updates to the databases is very limited. In this work we show that extracting this information (knowing which aspects of the model are changing, and how they are changing as a function of data updates) can be very effective for interactive data mining purposes, where response time is often more important than model quality as long as model quality is not too far off the best (exact) model.

In this paper we consider the problem of generating approximate models within the context of association mining, a key data mining task. We propose a new approach to incrementally generate approximate models of associations in evolving databases. Our approach is able to detect how patterns evolve over time (an interesting result in its own right), and uses this information in generating approximate models with high accuracy at a fraction of the cost (of generating the exact model). Extensive experimental evaluation on real databases demonstrates the effectiveness and advantages of the proposed approach.
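
The general idea of an approximate association model can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which the abstract does not detail. It is a hypothetical toy that assumes itemset supports were recorded at earlier snapshots of a growing database, predicts the next support by simple linear extrapolation, and measures how closely the predicted set of frequent itemsets matches the exact one. All function names (`exact_supports`, `extrapolate`, `approximate_model`, `accuracy`) and the choice of linear extrapolation are assumptions made for illustration.

```python
from itertools import combinations


def exact_supports(transactions, max_size=2):
    """Relative support of every itemset of up to max_size items (full scan)."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] = counts.get(itemset, 0) + 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items()}


def extrapolate(history):
    """Assumed trend model: linear extrapolation from the last two supports."""
    if len(history) < 2:
        return history[-1]
    predicted = history[-1] + (history[-1] - history[-2])
    return min(1.0, max(0.0, predicted))  # supports stay within [0, 1]


def approximate_model(snapshots, min_support):
    """Predict the frequent itemsets of the next snapshot without a rescan.

    snapshots: list of {itemset: support} dicts, oldest first.
    """
    itemsets = set().union(*snapshots)
    model = {}
    for s in itemsets:
        history = [snap.get(s, 0.0) for snap in snapshots]
        estimate = extrapolate(history)
        if estimate >= min_support:
            model[s] = estimate
    return model


def accuracy(approx, exact_frequent):
    """Jaccard overlap between approximate and exact frequent itemsets."""
    a, e = set(approx), set(exact_frequent)
    return len(a & e) / len(a | e) if a | e else 1.0


# Toy evolving database: each version appends transactions to the previous one.
db1 = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"]]
db2 = db1 + [["a", "b"], ["a", "b", "c"]]
db3 = db2 + [["a", "b"], ["a", "b"]]

snapshots = [exact_supports(db1), exact_supports(db2)]
approx = approximate_model(snapshots, min_support=0.6)
exact = {s: v for s, v in exact_supports(db3).items() if v >= 0.6}
print(sorted(approx), accuracy(approx, exact))
```

In this sketch the approximate model for the third version is produced from stored supports alone, and the exact model is computed only to check the result. A real system would still need a policy for deciding when the approximation has drifted too far and a full recomputation is warranted.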

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Adriano Veloso (1)
  • Bruno Gusmão (1)
  • Wagner Meira Jr. (1)
  • Marcio Carvalho (1)
  • Srini Parthasarathy (2)
  • Mohammed Zaki (3)

  1. Computer Science Department, Universidade Federal de Minas Gerais, Brazil
  2. Department of Computer and Information Science, The Ohio State University, USA
  3. Computer Science Department, Rensselaer Polytechnic Institute, USA
