Comparative Study of Parallelism on Data Mining

Kartick Chandra Mondal, Sayan Bhattacharya, Anindita Sarkar

doi:10.1007/978-981-10-2035-3_21

Conference paper
First Online: 23 November 2016

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 458)

Abstract

Today’s world has seen a massive explosion of data of many kinds, with distinctive characteristics such as high dimensionality and heterogeneity. Automated, data-driven techniques have become necessary to extract useful information from these huge and diverse data sets. Data mining is an important step in the process of knowledge discovery in databases (KDD) and focuses on discovering hidden information in data that goes beyond simple analysis. Traditional data mining methods are often inefficient and unsuitable for analyzing today’s data sets because of their heterogeneity, massive size, and high dimensionality. Parallelizing traditional data mining algorithms has therefore become almost inevitable, yet it remains challenging given the available hardware and software solutions. The main objective of this paper is to examine the need for, and the limitations of, parallelizing data mining algorithms and to find ways to achieve the best results. In this comparative study, we look at different parallel computer architectures, well-proven parallelization methods, and the choice of programming language.

Author information

Authors and Affiliations

Department of Information Technology, Jadavpur University, Kolkata, India
Kartick Chandra Mondal & Sayan Bhattacharya
School of Mobile Computing, Jadavpur University, Kolkata, India
Anindita Sarkar

Corresponding author

Correspondence to Kartick Chandra Mondal.

Editor information

Editors and Affiliations

Department of CSE, University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Department of Computer Science & Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, Andhra Pradesh, India
Suresh Chandra Satapathy
J K Mandal, Dept of CSE, University of Kalyani, Kalyani, West Bengal, India
Manas Kumar Sanyal
Department of ECE, Sri Ramswaroop Memorial College of Engineering and Management Lucknow, Lucknow, Uttar Pradesh, India
Vikrant Bhateja

About this paper

Cite this paper

Mondal, K.C., Bhattacharya, S., Sarkar, A. (2017). Comparative Study of Parallelism on Data Mining. In: Mandal, J., Satapathy, S., Sanyal, M., Bhateja, V. (eds) Proceedings of the First International Conference on Intelligent Computing and Communication. Advances in Intelligent Systems and Computing, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-10-2035-3_21

DOI: https://doi.org/10.1007/978-981-10-2035-3_21
Published: 23 November 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2034-6
Online ISBN: 978-981-10-2035-3
eBook Packages: Engineering, Engineering (R0)
