Big data challenges in ocean observation: a survey

  • Original Article
  • Personal and Ubiquitous Computing

Abstract

Ocean observation plays an essential role in ocean exploration. Ocean science is entering the big data era with the exponential growth of information technology and advances in ocean observatories. Ocean observatories are collections of platforms capable of carrying sensors to sample the ocean over appropriate spatiotemporal scales. Data collected by these platforms help answer a range of fundamental and applied research questions. Many countries are spending a considerable amount of resources on ocean observing programs for various purposes. Given their huge volume, diverse types, sustained measurement, and potential uses, ocean observing data are a typical kind of big data, namely marine big data. The traditional data-centric infrastructure is insufficient to deal with the new challenges arising in ocean science. A new distributed, large-scale, modern infrastructure backbone is urgently required. This paper discusses some possible strategies for addressing marine big data challenges in the phases of data storage, data computing, and analysis. Some applications in physics, chemistry, geology, and biology illustrate the significant uses of marine big data. Finally, we highlight some challenges and key issues in marine big data.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61572448, 61379127, and 61673357, and by the Shandong Provincial Natural Science Foundation, China, under Grant No. ZR2014JL043.

Author information

Corresponding author

Correspondence to Yingjian Liu.

About this article

Cite this article

Liu, Y., Qiu, M., Liu, C. et al. Big data challenges in ocean observation: a survey. Pers Ubiquit Comput 21, 55–65 (2017). https://doi.org/10.1007/s00779-016-0980-2

  • DOI: https://doi.org/10.1007/s00779-016-0980-2
