Parallel data mining for very large relational databases

Freitas, Alex Alves; Lavington, Simon H.

doi:10.1007/3-540-61142-8_542

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1067)

Abstract

Data mining, or Knowledge Discovery in Databases (KDD), is of little benefit to commercial enterprises unless it can be carried out efficiently on realistic volumes of data. Operational factors also dictate that KDD should be performed within the context of a standard DBMS. Fortunately, relational DBMSs have a declarative query interface (SQL) that has allowed designers of parallel hardware to exploit data parallelism efficiently. Thus, an effective approach to the problem of efficient KDD is to arrange for KDD tasks to execute on a parallel SQL server. In this paper we devise generic KDD primitives, map these to SQL, and present some results of running these primitives on a commercially available parallel SQL server.
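
The abstract does not spell out the primitives themselves, so the following is only an illustrative sketch of what "mapping a KDD primitive to SQL" could look like. It assumes a counting primitive that tallies tuples per (attribute value, class value) pair and expresses it as a single GROUP BY query. The function name count_by_class, the customers table and its columns are hypothetical, and SQLite is used here purely as a stand-in for a SQL interface; it is a serial engine, not the parallel SQL server the paper evaluates.

```python
import sqlite3


def count_by_class(conn, table, attribute, class_attr):
    """Count tuples per (attribute value, class value) pair in one query.

    Hypothetical primitive for illustration; not taken from the paper.
    Identifiers are interpolated directly, so this sketch assumes the
    table and column names come from a trusted source.
    """
    query = (
        f"SELECT {attribute}, {class_attr}, COUNT(*) "
        f"FROM {table} "
        f"GROUP BY {attribute}, {class_attr} "
        f"ORDER BY {attribute}, {class_attr}"
    )
    return conn.execute(query).fetchall()


# Toy relation standing in for a very large table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age_band TEXT, buys TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("young", "yes"),
        ("young", "no"),
        ("young", "yes"),
        ("old", "no"),
        ("old", "no"),
    ],
)

counts = count_by_class(conn, "customers", "age_band", "buys")
print(counts)
```

Because the query is declarative, the same statement could in principle be handed unchanged to a server that partitions the table across processors and merges the partial counts, which is the kind of data parallelism the abstract refers to.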

Supported by Brazilian government's CNPq, grant number 200384/93-7.


Author information

Authors and Affiliations

Dept. of Computer Science, University of Essex, CO4 3SQ, Wivenhoe Park, Colchester, UK
Alex Alves Freitas & Simon H. Lavington

Editor information

Heather Liddell, Adrian Colbrook, Bob Hertzberger, Peter Sloot

About this paper

Cite this paper

Freitas, A.A., Lavington, S.H. (1996). Parallel data mining for very large relational databases. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1996. Lecture Notes in Computer Science, vol 1067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61142-8_542

DOI: https://doi.org/10.1007/3-540-61142-8_542
Published: 18 August 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61142-4
Online ISBN: 978-3-540-49955-8
eBook Packages: Springer Book Archive
