
Mining Databases and Data Streams with Query Languages and Rules

  • Carlo Zaniolo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3933)

Abstract

Among data-intensive applications that are beyond the reach of traditional Database Management Systems (DBMS), data mining stands out because of its practical importance and the complexity of the research problems that must be solved before the vision of Inductive DBMS can become a reality. In this paper, we first discuss the technical developments that have occurred since the notion of Inductive DBMS emerged from the seminal papers authored by Imielinski and Mannila a decade ago. The research progress achieved since then can be subdivided into three main problem subareas: (i) language, (ii) optimization, and (iii) representation. We discuss the problems in these three areas and the different approaches to Inductive DBMS that are made possible by recent technical advances. Then, we pursue a language-centric solution and introduce simple SQL extensions that have proven very effective at supporting data mining. Finally, we turn our attention to the related problem of supporting data stream mining using Data Stream Management Systems (DSMS) and introduce the notion of Inductive DSMS. In addition to continuous query languages, DSMS provide support for synopses, sampling, load shedding, and other built-in functions that are needed for data stream mining. Moreover, we show that Inductive DSMS can be achieved by generalizing DSMS to ensure that their continuous query languages efficiently support data stream mining applications. Thus, DSMS extended with inductive capabilities will provide a uniquely supportive environment for data stream mining applications.

Keywords

Data Stream, Association Rule, Query Language, Association Rule Mining, Query Optimization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Imielinski, T.: A database perspective on knowledge discovery. In: The First International Conference on Knowledge Discovery and Data Mining (KDD 1995) (1995)
  2. Imielinski, T., Mannila, H.: A database perspective on knowledge discovery. Communications of the ACM 39(11), 58–64 (1996)
  3. Imielinski, T., Virmani, A.: MSQL: A query language for database mining. Data Mining and Knowledge Discovery 3, 373–408 (1999)
  4. Han, J., Fu, Y., Wang, W., Koperski, K., Zaiane, O.R.: DMQL: A data mining query language for relational databases. In: Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), Montreal, Canada, pp. 27–33 (June 1996)
  5. Meo, R., Psaila, G., Ceri, S.: A new SQL-like operator for mining association rules. In: VLDB, Bombay, India, pp. 122–133 (1996)
  6. Botta, M., Boulicaut, J.-F., Masson, C., Meo, R.: Query languages supporting descriptive rule mining: A comparative study. In: Database Support for Data Mining Applications, pp. 24–51 (2004)
  7. Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Examiner: Optimized level-wise frequent pattern mining with monotone constraint. In: ICDM, pp. 11–18 (2003)
  8. Lee, S.D., De Raedt, L.: An algebra for inductive query evaluation. In: ICDM, pp. 147–154 (2003)
  9. Bonchi, F., Lucchese, C.: Pushing tougher constraints in frequent pattern mining. In: Ho, T.-B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS (LNAI), vol. 3518, pp. 114–124. Springer, Heidelberg (2005)
  10. Jeudy, B., Boulicaut, J.-F.: Constraint-based discovery and inductive queries: Application to association rule mining. In: Pattern Detection and Discovery, pp. 110–124 (2002)
  11. IBM: DB2 Intelligent Miner, http://www-306.ibm.com/software/data/iminer
  12. Oracle: Oracle Data Miner Release 10gR2, http://www.oracle.com/technology/products/bi/odm
  13. Tang, Z., Maclennan, J., Kim, P.P.: Building data mining solutions with OLE DB for DM and XML for Analysis. SIGMOD Record 34(2), 80–85 (2005)
  14. Data Mining Group (DMG): Predictive Model Markup Language (PMML), http://sourceforge.net/projects/pmml
  15. Sarawagi, S., Thomas, S., Agrawal, R.: Integrating association rule mining with relational database systems: Alternatives and implications. In: SIGMOD (1998)
  16. Siebes, A.: Where is the mining in KDID (invited talk). In: Bonchi, F., Boulicaut, J.-F. (eds.) KDID 2005. LNCS, vol. 3933. Springer, Heidelberg (2006)
  17. Law, Y.-N., Wang, H., Zaniolo, C.: Data models and query language for data streams. In: VLDB, pp. 492–503 (2004)
  18. Wang, H., Zaniolo, C.: ATLaS: A native extension of SQL for data mining. In: Proceedings of Third SIAM Int. Conference on Data Mining, pp. 130–141 (2003)
  19. Weka 3: Data mining with open source machine learning software in Java, http://www.cs.waikato.ac.nz
  20. Johnson, T., Lakshmanan, L.V.S., Ng, R.T.: The 3W model and algebra for unified data mining. In: VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, pp. 21–32. Morgan Kaufmann, San Francisco (2000)
  21. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: PODS (2002)
  22. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: SIGKDD, pp. 97–106. ACM Press, San Francisco (2001)
  23. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: KDD, pp. 226–235 (2003)
  24. Chu, F., Wang, Y., Zaniolo, C.: An adaptive learning approach for noisy data streams. In: ICDM, pp. 351–354 (2004)
  25. Golab, L., Ozsu, M.T.: Issues in data stream management. ACM SIGMOD Record 32(2), 5–14 (2003)
  26. Johnson, T., Muthukrishnan, S., Rozenbaum, I.: Sampling algorithms in a stream operator. In: SIGMOD Conference, pp. 1–12 (2005)
  27. Abadi, D., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: A new model and architecture for data stream management. VLDB Journal 12(2), 120–139 (2003)
  28. Cranor, C., Gao, Y., Johnson, T., Shkapenyuk, V., Spatscheck, O.: Gigascope: High performance network monitoring with an SQL interface. In: SIGMOD, p. 623. ACM Press, New York (2002)
  29. Arasu, A., Babu, S., Widom, J.: CQL: A language for continuous queries over streams and relations. In: Lausen, G., Suciu, D. (eds.) DBPL 2003. LNCS, vol. 2921, pp. 1–19. Springer, Heidelberg (2004)
  30. Gaber, M.M., Zaslavsky, A.B., Krishnaswamy, S.: Mining data streams: A review. SIGMOD Record 34(2), 18–26 (2005)
  31. Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528 (2003)
  32. Toivonen, H.: Sampling large databases for association rules. In: Vijayaraman, T.M., Buchmann, A.P., Mohan, C., Sarda, N.L. (eds.) VLDB 1996, Proceedings of 22nd International Conference on Very Large Data Bases, Mumbai (Bombay), India, September 3-6, pp. 134–145. Morgan Kaufmann, San Francisco (1996)
  33. Tumer, K., Ghosh, J.: Error correlation and error reduction in ensemble classifiers. Connect. Sci. 8(3), 385–404 (1996)
  34. Law, Y.-N., Zaniolo, C.: Improving the accuracy of continuous aggregates and mining queries (2005) (Submitted for publication)
  35. Tatbul, N., Çetintemel, U., Zdonik, S.B., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: VLDB, pp. 309–320 (2003)
  36. Ahmad, Y., Berg, B., Çetintemel, U., Humphrey, M., Hwang, J.-H., Jhingran, A., Maskey, A., Papaemmanouil, O., Rasin, A., Tatbul, N., Xing, W., Xing, Y., Zdonik, S.B.: Distributed operation in the Borealis stream processing engine. In: SIGMOD Conference, pp. 882–884 (2005)
  37.
  38. Luo, C., Thakkar, H., Wang, H., Zaniolo, C.: A native extension of SQL for mining data streams, pp. 873–875 (2005)
  39. Kriegel, H.-P., Ester, M., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD 1996, pp. 226–231 (1996)
  40. Bai, Y., Chang, L., Thakkar, H., Zhou, X., Zaniolo, C.: Efficient support for time series queries in data stream management systems. In: Chaudhry, K.S.N., Abdelguerfi, M. (eds.) Stream Data Management, ch. 6. Kluwer Academic Publishers, Dordrecht (2005)
  41. Zhou, X., Thakkar, H., Zaniolo, C.: Unifying the processing of XML streams and relational data streams. In: The 22nd International Conference on Data Engineering, Atlanta, GA, April 3-7 (2006)
  42. Tang, Z., Maclennan, J., Kim, P.P.: Building data mining solutions with OLE DB for DM and XML for Analysis. SIGMOD Record 34(2), 80–85 (2005)
  43.
  44. Giannotti, F., Manco, G., Pedreschi, D., Turini, F.: Experiences with a logic-based knowledge discovery support environment. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD) (1999)
  45. Giannotti, F., Manco, G., Pedreschi, D., Turini, F.: Experiences with a logic-based knowledge discovery support environment. In: AI*IA, pp. 202–213 (1999)
  46. Arni, F., Ong, K., Tsur, S., Wang, H., Zaniolo, C.: The deductive database system LDL++. TPLP 3(1), 61–94 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Carlo Zaniolo (1)
  1. Computer Science Department, UCLA, Los Angeles, USA
