
Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 255)

Abstract

Distributed systems, represented by Hadoop, are becoming an essential component of large-scale data mining systems. This paper therefore carries out a data mining task on a Hadoop distributed system: its main purpose is to build a distributed cluster computing environment with Hadoop and to perform data mining tasks in that environment. The paper studies the Hadoop system architecture, with particular attention to the distributed file system HDFS and to the principle and implementation of the MapReduce parallel programming model. We establish systematic control of the data mining process, adapt traditional data mining algorithms to the MapReduce programming model, study the implementation of these algorithms on the Hadoop platform, and analyze mainly their execution efficiency and scalability.
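To make the idea of expressing a traditional data mining algorithm in the MapReduce model concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not say which algorithms were ported, so the choice of frequent-itemset counting (the support-counting pass used by Apriori-style miners) is an assumption. The code simulates the map, shuffle, and reduce phases in a single Python process rather than calling any Hadoop API, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `frequent_itemsets`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """Map: emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key.

    On a real cluster the framework performs this step between map and
    reduce; here it is simulated with an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(itemset, counts, min_support):
    """Reduce: sum the counts and keep itemsets that meet the threshold."""
    total = sum(counts)
    if total >= min_support:
        yield itemset, total


def frequent_itemsets(transactions, k=1, min_support=2):
    """Run one simulated MapReduce pass and return {itemset: support}."""
    mapped = (pair for t in transactions for pair in map_phase(t, k))
    grouped = shuffle(mapped)
    result = {}
    for itemset, counts in grouped.items():
        for key, total in reduce_phase(itemset, counts, min_support):
            result[key] = total
    return result


if __name__ == "__main__":
    # Illustrative toy data, not taken from the paper.
    data = [["milk", "bread"], ["milk", "eggs"], ["bread", "milk", "eggs"]]
    print(frequent_itemsets(data, k=1, min_support=2))
    print(frequent_itemsets(data, k=2, min_support=2))
```

Because each transaction is mapped independently and each key is reduced independently, the same decomposition can in principle be spread across many nodes, which is the property that makes execution efficiency and scalability worth measuring.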




Corresponding author

Correspondence to Jianwei Guo.


Copyright information

© 2014 Springer India

About this paper

Cite this paper

Guo, J., Li, Y., Du, L., Zhao, G., Jiang, J. (2014). Research on Distributed Data Mining System Based on Hadoop Platform. In: Patnaik, S., Li, X. (eds) Proceedings of International Conference on Computer Science and Information Technology. Advances in Intelligent Systems and Computing, vol 255. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1759-6_72


  • DOI: https://doi.org/10.1007/978-81-322-1759-6_72

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-1758-9

  • Online ISBN: 978-81-322-1759-6

  • eBook Packages: Engineering, Engineering (R0)
