
A general perspective of Big Data: applications, tools, challenges and trends

The Journal of Supercomputing

Abstract

Big Data has become a very popular term. It refers to the enormous amounts of structured, semi-structured and unstructured data that are generated at an exponentially growing rate by high-performance applications in many domains: biochemistry, genetics, molecular biology, physics, astronomy and business, to mention a few. Because the Big Data literature has grown significantly in recent years, an overview of the state of the art has become necessary. This paper aims to provide a comprehensive review of the Big Data literature of the last 4 years in order to identify the main challenges, areas of application, tools and emergent trends of Big Data. To meet this objective, we analyzed and classified 457 papers concerning Big Data. The review gives practitioners and researchers relevant information about the main trends in the research and application of Big Data in different technical domains, as well as a reference overview of Big Data tools.

Acknowledgments

The authors are very grateful to the National Technological Institute of Mexico (Tecnológico Nacional de México) for supporting this work. This research was also sponsored by the National Council of Science and Technology (CONACYT) and by the Secretariat of Public Education (SEP) through PRODEP.

Corresponding author

Correspondence to Giner Alor-Hernández.

Cite this article

Rodríguez-Mazahua, L., Rodríguez-Enríquez, CA., Sánchez-Cervantes, J.L. et al. A general perspective of Big Data: applications, tools, challenges and trends. J Supercomput 72, 3073–3113 (2016). https://doi.org/10.1007/s11227-015-1501-1

  • DOI: https://doi.org/10.1007/s11227-015-1501-1
