A Unified Approach to Data Modeling and Management in Big Data Era

  • Catalin Negru
  • Florin Pop
  • Mariana Mocanu
  • Valentin Cristea

Abstract

The emergence of the big data paradigm, together with developments in areas such as cyber-infrastructures, smart cities, e-health, social media, and Web 3.0, has led to the production of huge volumes of data. Moreover, these data are often unstructured or semi-structured and highly heterogeneous. Information is now an essential factor in supporting decision-making, which is why heterogeneous data must be integrated and analyzed to provide a unique view of information for many types of applications. This chapter addresses the problem of modeling and integrating heterogeneous data that come from multiple heterogeneous sources in the context of cyber-infrastructure systems and big data platforms. It also analyzes different heterogeneous data models in relation to sources such as sensors, mobile users, the web, and public open data sources (e.g., regulatory institutions). A CyberWater case study is presented for the modeling, integration, and operation of these data in order to provide a unified approach and a unique view. The case study aims to support different processes inside the CyberWater platform, such as monitoring, analysis, and control of natural water resources, with the goal of preserving water quality.

Keywords

Big data; Data modeling; Data management; Cloud computing; Heterogeneous distributed systems; Cyber-infrastructure; Natural resources

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Catalin Negru (1)
  • Florin Pop (1), corresponding author
  • Mariana Mocanu (1)
  • Valentin Cristea (1)

  1. Computer Science Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
