Abstract
Knowledge discovery in databases, or data mining, is the semi-automated analysis of large volumes of data, looking for relationships and knowledge that are implicit in the data and are ‘interesting’ in the sense of impacting an organization’s practice. Data mining and knowledge discovery on large amounts of data can benefit from the use of parallel computers, both to improve performance and to improve the quality of data selection. This paper presents and discusses different forms of parallelism that can be exploited in data mining techniques and algorithms. For the main data mining techniques, such as rule induction, clustering algorithms, decision trees, genetic algorithms, and neural networks, the possible ways to exploit parallelism are presented and discussed in detail. Finally, some promising research directions in parallel data mining are outlined.
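As an illustration of one way parallelism can be exploited in a clustering algorithm, the sketch below shows a data-parallel k-means step. This is an assumed example, not the method described in the paper. The dataset is split into partitions, and each worker assigns its points to the nearest centroid and computes per-cluster partial sums and counts. A reduction step then combines these partial results into new centroids. The names `partial_sums` and `parallel_kmeans` are hypothetical. Threads are used only to keep the example self-contained. Because of CPython's global interpreter lock, a real speedup would need separate processes or message passing (for example MPI).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def partial_sums(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Local step (hypothetical helper): assign each point of one data
    partition to its nearest centroid and return per-cluster coordinate
    sums and point counts for that partition only."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        # Nearest centroid by squared Euclidean distance.
        best = min(range(k),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts


def parallel_kmeans(points: List[Point], centroids: List[Point],
                    n_workers: int = 4, iterations: int = 10) -> List[Point]:
    """Data-parallel k-means sketch: partition the data, compute partial
    sums in parallel, then reduce them to new centroids."""
    centroids = [tuple(c) for c in centroids]
    if not points:
        return centroids
    size = -(-len(points) // n_workers)  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    k, dim = len(centroids), len(centroids[0])
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(iterations):
            # Parallel phase: every worker gets one partition and the current centroids.
            results = list(pool.map(partial_sums, chunks, [centroids] * len(chunks)))
            # Reduction phase: add up the partial sums and counts from all workers.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for sums, counts in results:
                for j in range(k):
                    total_counts[j] += counts[j]
                    for d in range(dim):
                        total_sums[j][d] += sums[j][d]
            new = [tuple(s / total_counts[j] for s in total_sums[j])
                   if total_counts[j] else centroids[j]  # keep empty cluster's centroid
                   for j in range(k)]
            if new == centroids:  # converged
                break
            centroids = new
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
result = parallel_kmeans(points, [(0.0, 0.0), (10.0, 10.0)], n_workers=3)
print(result)
```

In this design, only the small per-cluster sums and counts move between the workers and the reduction step. The data partitions stay where they are. This is why the data-parallel form scales with the size of the dataset and not with the number of clusters.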
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Talia, D. (2002). Parallelism in Knowledge Discovery Techniques. In: Fagerholm, J., Haataja, J., Järvinen, J., Lyly, M., Råback, P., Savolainen, V. (eds) Applied Parallel Computing. PARA 2002. Lecture Notes in Computer Science, vol 2367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48051-X_14
DOI: https://doi.org/10.1007/3-540-48051-X_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43786-4
Online ISBN: 978-3-540-48051-8
eBook Packages: Springer Book Archive