An Efficient Data Integration Framework in Cloud Using MapReduce

Part of the SpringerBriefs in Applied Sciences and Technology book series (BRIEFSFOMEBI)

Abstract

In big data applications, securing massive data sets is a major challenge, because working with such data requires large-scale resources that must be supplied by a cloud service provider. This paper demonstrates a cloud implementation built on big data technologies and discusses how to protect the data using hashing and how to authenticate users. In particular, it discusses big data technologies such as the Apache Hadoop project, which provides parallel, distributed analysis and processing of petabytes of data, and gives a summarized view of monitoring and usage of a Hadoop cluster. An algorithm called FNV hashing is introduced to ensure the integrity of data that the user has outsourced to the cloud; data within the Hadoop cluster can be accessed and verified using this hash. The approach addresses many new security challenges that arise in cloud environments using the Hadoop Distributed File System. Cluster performance can be monitored with the Ganglia monitoring tool. The paper also designs a cloud evaluation model that provides quantitative results for regularly checking accuracy and cost. The experimental results show that this model is more accurate, cheaper, and able to respond in real time.
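The hash-based integrity check described in the abstract can be illustrated with a minimal sketch. The abstract does not say which FNV variant or digest width the chapter uses, so the 64-bit FNV-1a variant below is an assumption, and the helper names (fnv1a_64, map_block, find_corrupted) are hypothetical rather than taken from the chapter. The per-block hashing step imitates a map phase in plain Python; it does not use the Hadoop API.

```python
# Assumption: 64-bit FNV-1a; the chapter's actual variant is not stated.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of data."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte                        # fold the next byte into the state
        h = (h * FNV64_PRIME) & MASK64   # multiply by the prime, keep 64 bits
    return h


def map_block(block_id, block):
    """Map step (hypothetical): emit a (block id, digest) pair for one block."""
    return block_id, fnv1a_64(block)


def find_corrupted(blocks, expected):
    """Return the sorted ids whose current digest differs from the stored one.

    blocks maps block id -> bytes as currently held in storage;
    expected maps block id -> digest recorded when the data was uploaded.
    A block that is missing from storage is also reported.
    """
    actual = dict(map_block(i, b) for i, b in blocks.items())
    return sorted(i for i in expected if actual.get(i) != expected[i])


if __name__ == "__main__":
    blocks = {0: b"alpha", 1: b"beta"}
    digests = {i: fnv1a_64(b) for i, b in blocks.items()}  # recorded at upload
    blocks[1] = b"bet4"                                    # simulate tampering
    print(find_corrupted(blocks, digests))                 # prints [1]
```

In a sketch like this the user would keep the recorded digests and later recompute them over the stored blocks; any mismatch flags a block whose contents changed. FNV is a fast, non-cryptographic hash, so this kind of check is best suited to detecting modification cheaply across many blocks.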

Keywords

  • Big data
  • Hadoop
  • MapReduce
  • Cloud computing
  • Accuracy
  • Consumption

Chapter length: 9 pages

Corresponding author

Correspondence to P. Srinivasa Rao.

Copyright information

© 2015 The Author(s)

About this chapter

Cite this chapter

Srinivasa Rao, P., Krishna Prasad, M.H.M., Thammi Reddy, K. (2015). An Efficient Data Integration Framework in Cloud Using MapReduce. In: Muppalaneni, N., Gunjan, V. (eds) Computational Intelligence Techniques for Comparative Genomics. SpringerBriefs in Applied Sciences and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-287-338-5_11

  • DOI: https://doi.org/10.1007/978-981-287-338-5_11

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-287-337-8

  • Online ISBN: 978-981-287-338-5

  • eBook Packages: Engineering, Engineering (R0)