A Survey of Different Technologies and Recent Challenges of Big Data

Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 44)


Big Data, a term that has recently attracted worldwide attention, refers to large-scale data characterized by huge volume, wide variety, and complex structure. Over the last few years, internet technology and the computing world have seen rapid growth in the popularity of cloud computing. As a consequence, cloud applications are continually generating big data. Several pressing research problems are associated with big data, such as how to store, analyze, and visualize it in order to derive further results. This paper first surveys recently developed information technologies in the field of big data. It then outlines the key open problems concerning big data: proper load balancing, storage and processing of small files, and de-duplication.


Keywords: Big data, Key technologies, Hadoop, Load balancing, Storage



Copyright information

© Springer India 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Silchar, Assam, India
