Comparative Study of Parallelism on Data Mining

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 458)

Abstract

Recent years have seen a massive explosion in data of many kinds, much of it high-dimensional and heterogeneous. Automated, data-driven techniques have become necessary to extract useful information from these huge and diverse data sets. Data mining is an important step in the process of knowledge discovery in databases (KDD) and focuses on discovering hidden information in data that goes beyond simple analysis. Traditional data mining methods are often inefficient and unsuitable for analyzing today’s data sets because of their heterogeneity, massive size, and high dimensionality. Parallelizing traditional data mining algorithms has therefore become almost inevitable, yet it remains challenging given the available hardware and software solutions. The main objective of this paper is to examine the need for, and the limitations of, parallelizing data mining algorithms and to find ways to get the best results. In this comparative study, we look at different parallel computer architectures, well-proven parallelization methods, and the choice of programming language.

Author information

Corresponding author

Correspondence to Kartick Chandra Mondal.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this paper

Cite this paper

Mondal, K.C., Bhattacharya, S., Sarkar, A. (2017). Comparative Study of Parallelism on Data Mining. In: Mandal, J., Satapathy, S., Sanyal, M., Bhateja, V. (eds) Proceedings of the First International Conference on Intelligent Computing and Communication. Advances in Intelligent Systems and Computing, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-10-2035-3_21

  • DOI: https://doi.org/10.1007/978-981-10-2035-3_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2034-6

  • Online ISBN: 978-981-10-2035-3

  • eBook Packages: Engineering, Engineering (R0)