Abstract
Today’s world has seen an explosion of data with distinctive characteristics such as high dimensionality and heterogeneity. Automated, data-driven techniques have become necessary to extract useful information from these huge and diverse data sets. Data mining is an important step in the process of knowledge discovery in databases (KDD) and focuses on discovering hidden information in data that goes beyond simple analysis. Traditional data mining methods are often inefficient and unsuitable for analyzing today’s data sets because of their heterogeneity, massive size, and high dimensionality. Parallelizing traditional data mining algorithms has therefore become almost unavoidable, yet it remains challenging given the available hardware and software solutions. The main objective of this paper is to examine the need for, and the limitations of, parallelizing data mining algorithms and to identify ways to achieve the best results. In this comparative study, we look at different parallel computer architectures, well-proven parallelization methods, and the programming language of choice.
Copyright information
© 2017 Springer Science+Business Media Singapore
About this paper
Cite this paper
Mondal, K.C., Bhattacharya, S., Sarkar, A. (2017). Comparative Study of Parallelism on Data Mining. In: Mandal, J., Satapathy, S., Sanyal, M., Bhateja, V. (eds) Proceedings of the First International Conference on Intelligent Computing and Communication. Advances in Intelligent Systems and Computing, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-10-2035-3_21
DOI: https://doi.org/10.1007/978-981-10-2035-3_21
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2034-6
Online ISBN: 978-981-10-2035-3
eBook Packages: Engineering, Engineering (R0)