The Journal of Supercomputing

Volume 72, Issue 8, pp 3073–3113

A general perspective of Big Data: applications, tools, challenges and trends

  • Lisbeth Rodríguez-Mazahua
  • Cristian-Aarón Rodríguez-Enríquez
  • José Luis Sánchez-Cervantes
  • Jair Cervantes
  • Jorge Luis García-Alcaraz
  • Giner Alor-Hernández

Abstract

Big Data has become a very popular term. It refers to the enormous amounts of structured, semi-structured and unstructured data that are generated at an exponentially growing rate by high-performance applications in many domains, including biochemistry, genetics, molecular biology, physics, astronomy and business. Because the literature on Big Data has grown significantly in recent years, an overview of the state of the art has become necessary. This paper aims to provide a comprehensive review of the Big Data literature of the last 4 years in order to identify the main challenges, application areas, tools and emerging trends of Big Data. To meet this objective, we analyzed and classified 457 papers concerning Big Data. This review gives practitioners and researchers relevant information about the main trends in research and application of Big Data in different technical domains, as well as a reference overview of Big Data tools.

Keywords

Application domains, Classification, Big Data, Literature review

Notes

Acknowledgments

The authors are very grateful to the National Technological Institute of Mexico for supporting this work. This research was also sponsored by the National Council of Science and Technology (CONACYT) and by the Secretariat of Public Education (SEP) through PRODEP.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Lisbeth Rodríguez-Mazahua (1)
  • Cristian-Aarón Rodríguez-Enríquez (1)
  • José Luis Sánchez-Cervantes (1)
  • Jair Cervantes (2)
  • Jorge Luis García-Alcaraz (3)
  • Giner Alor-Hernández (1)

  1. Division of Research and Postgraduate Studies, Instituto Tecnológico de Orizaba, Orizaba, Mexico
  2. Centro Universitario UAEM Texcoco, Universidad Autónoma del Estado de México, Texcoco, Mexico
  3. Departamento de Ingeniería Industrial y Manufactura, Instituto de Ingeniería y Tecnología, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Mexico
