Parallel data mining for very large relational databases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1067)

Abstract

Data mining, or Knowledge Discovery in Databases (KDD), is of little benefit to commercial enterprises unless it can be carried out efficiently on realistic volumes of data. Operational factors also dictate that KDD should be performed within the context of standard DBMS. Fortunately, relational DBMS have a declarative query interface (SQL) that has allowed designers of parallel hardware to exploit data parallelism efficiently. Thus, an effective approach to the problem of efficient KDD consists of arranging that KDD tasks execute on a parallel SQL server. In this paper we devise generic KDD primitives, map these to SQL and present some results of running these primitives on a commercially-available parallel SQL server.
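To make the idea of mapping a KDD primitive to SQL concrete, here is a minimal sketch. It assumes, hypothetically, that one generic primitive is a count of tuples per attribute value and class, a statistic that many rule-induction algorithms need; the abstract does not spell out the paper's actual primitives. The table `customers`, its columns, and the functions `count_by_value_and_class` and `build_demo_db` are invented for illustration, and an in-memory SQLite database stands in for the parallel SQL server.

```python
import sqlite3


def count_by_value_and_class(conn, table, attribute, class_column):
    """Return {(attribute_value, class_value): count} using one GROUP BY query.

    Assumption: this counting query is an illustrative stand-in for a KDD
    primitive, not the primitive defined in the paper.
    """
    # SQL identifiers cannot be bound as parameters, so they are quoted and
    # interpolated here; callers must pass trusted table and column names.
    query = (
        f'SELECT "{attribute}", "{class_column}", COUNT(*) '
        f'FROM "{table}" '
        f'GROUP BY "{attribute}", "{class_column}"'
    )
    return {(value, cls): n for value, cls, n in conn.execute(query)}


def build_demo_db():
    """Create a tiny in-memory table with hypothetical data for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (region TEXT, income_band TEXT, churned TEXT)"
    )
    rows = [
        ("north", "low", "yes"),
        ("north", "low", "no"),
        ("north", "high", "no"),
        ("south", "low", "yes"),
        ("south", "low", "yes"),
        ("south", "high", "no"),
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn
```

Calling `count_by_value_and_class(build_demo_db(), "customers", "income_band", "churned")` returns the class distribution for each income band. Because the whole computation is a single declarative aggregate, a parallel server would be free to partition the table and combine partial counts, which is the kind of data parallelism the abstract refers to.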

Supported by the Brazilian government's CNPq, grant number 200384/93-7.

Editor information

Heather Liddell, Adrian Colbrook, Bob Hertzberger, Peter Sloot

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Freitas, A.A., Lavington, S.H. (1996). Parallel data mining for very large relational databases. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1996. Lecture Notes in Computer Science, vol 1067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61142-8_542

  • DOI: https://doi.org/10.1007/3-540-61142-8_542

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61142-4

  • Online ISBN: 978-3-540-49955-8

  • eBook Packages: Springer Book Archive
