Abstract
Observational measurements and model output data acquired or generated across the various research areas within the geosciences (also known as Earth science) span spatial scales of tens of thousands of kilometers and temporal scales ranging from seconds to millions of years. Here, geosciences refers to the study of the atmosphere, hydrosphere, oceans, and biosphere, as well as the Earth's core. Rapid advances in sensor deployments, computational capacity, and data storage density have resulted in dramatic increases in the volume and complexity of geoscience data. Geoscientists now see data-intensive computing as part of their knowledge discovery process, alongside the traditional theoretical, experimental, and computational paradigms [1]. Data-intensive computing poses unique challenges to the geoscience community, challenges that are exacerbated by the sheer size of the datasets involved.
Keywords
- Geospatial Data
- Hadoop Distributed File System
- Open Geospatial Consortium
- Global Telecommunication System
- Parallel File System
References
T. Hey, et al., The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, Washington: Microsoft Corporation, 2009.
F. M. Hoffman, et al., “Multivariate Spatio-Temporal Clustering (MSTC) as a data mining tool for environmental applications,” in the iEMSs Fourth Biennial Meeting: International Congress on Environmental Modelling and Software Society (iEMSs 2008), 2008, pp. 1774–1781.
F. M. Hoffman, et al., “Data Mining in Earth System Science,” in the International Conference on Computational Science (ICCS), 2011, pp. 1450–1455.
O. J. Reichman, et al., “Challenges and opportunities of open data in ecology,” Science, vol. 331, pp. 703–705, 2011.
M. Keller, et al., “A continental strategy for the National Ecological Observatory Network,” Front. Ecol. Environ Special Issue on Continental-Scale Ecology, vol. 5, pp. 282–284, 2008.
D. Schimel, et al., “NEON: A hierarchically designed national ecological network,” Front. Ecol. Environ, vol. 2, 2007.
The Open Geospatial Consortium (OGC). (June 17, 2011). Available: http://www.opengeospatial.org/
G. Percivall and C. Reed, “OGC Sensor Web Enablement Standards,” Sensors and Transducers Journal, vol. 71, pp. 698–706, 2006.
MTPE EOS Reference Handbook. EOS Project Science Office, Code 900, NASA Goddard Space Flight Center, 1995.
The Global Telecommunication System. Available: http://www.wmo.int/pages/prog/www/TEM/GTS/index_en.html
National Center for Environmental Prediction (NCEP). Available: http://www.ncep.noaa.gov/
Panasas: Parallel File System for HPC Storage. Available: http://www.panasas.com/
M. M. Kuhn, et al., “Dynamic file system semantics to enable metadata optimizations in PVFS,” Concurrency and Computation: Practice and Experience, vol. 21, 2009.
P. J. Braam, “Lustre: a scalable high-performance file system,” 2002.
F. B. Schmuck and R. L. Haskin, “GPFS: A Shared-Disk File System for Large Computing Clusters,” in the Conference on File and Storage Technologies, 2002, pp. 231–244.
J. Lofstead, et al., “Managing Variability in the IO Performance of Petascale Storage Systems,” presented at the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
Message Passing Interface Forum, “MPI-2: Extensions to the Message-Passing Interface,” 1997.
S. Ghemawat, et al., “The Google File System,” ACM SIGOPS Operating Systems Review, vol. 37, 2003.
HDF-5. Available: http://hdf.ncsa.uiuc.edu/products/hdf5/
NetCDF4. Available: http://www.hdfgroup.org/projects/netcdf-4/
J. Li, et al., “Parallel netCDF: A high-performance scientific I/O interface,” in ACM Supercomputing (SC03), 2003.
H. Abbasi, et al., “DataStager: scalable data staging services for petascale applications,” in ACM international Symposium on High Performance Distributed Computing, 2009.
C. Upson, et al., “The Application Visualization System: A computational environment for scientific visualization,” IEEE Computer Graphics and Applications, pp. 30–42, 1989.
VisIt Visualization Tool. Available: https://wci.llnl.gov/codes/visit/home.html
R. Daley, Atmospheric Data Analysis. Cambridge Atmospheric and Space Science Series. Cambridge University Press, 1993.
O. Wildi, Data Analysis in Vegetation Ecology. Wiley, 2010.
P. Rigaux, et al., Spatial Databases: With Application to GIS. Morgan Kaufmann, 2002.
S. Shekhar and S. Chawla, Spatial Databases: A Tour. Prentice Hall, 2002.
P. Longley, et al., Geographic Information Systems and Science, 3rd ed. John Wiley & Sons, 2011.
R. Rew and G. Davis, “NetCDF: an interface for scientific data access,” IEEE Computer Graphics and Applications, vol. 10, pp. 76–82, 1990.
Common Data Model. Available: http://www.unidata.ucar.edu/software/netcdf-java/CDM/
P. Cudre-Mauroux, et al., “A Demonstration of SciDB: A Science-Oriented DBMS,” in Proceedings of the VLDB Endowment, 2009.
J. Buck, et al., “SciHadoop: Array-based Query Processing in Hadoop,” Technical Report, University of California, Santa Cruz, 2011.
The HDF Group. (2010). Hierarchical Data Format, Version 5. Available: http://www.hdfgroup.org/HDF5
FITS Support Office. (2011). Available: http://fits.gsfc.nasa.gov/
D. C. Wells, et al., “FITS: A Flexible Image Transport System,” Astronomy & Astrophysics, vol. 44, pp. 363–370, 1981.
P. Cornillon, et al., “OPeNDAP: Accessing data in a distributed, heterogeneous environment,” Data Science Journal, vol. 2, pp. 164–174, 2003.
D. M. Karl, et al., “Building the long-term picture: U.S. JGOFS Time-series Programs,” Oceanography, pp. 6–17, 2001.
P. Ramsey, PostGIS Manual. Refractions Research.
A. Guttman, “R-trees: a dynamic index structure for spatial searching,” in Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts: ACM, 1984, pp. 47–57.
S. Tilak, et al., “The Ring Buffer Network Bus (RBNB) DataTurbine Streaming Data Middleware for Environmental Observing Systems,” in IEEE e-Science, 2007, pp. 125–133.
D. N. Williams, et al., “The Earth System Grid: Enabling Access to Multi-Model Climate Simulation Data,” Bulletin of the American Meteorological Society, vol. 90, pp. 195–205, 2009.
B. Domenico, et al., “Thematic Real-time Environmental Distributed Data Services (THREDDS): Incorporating Interactive Analysis Tools into NSDL,” Journal of Interactivity in Digital Libraries, vol. 2, 2002.
A. Shoshani, et al., “Storage Resource Managers (SRM) in the Earth System Grid,” Earth System Grid, 2009.
G. Khanna, et al., “A Dynamic Scheduling Approach for Coordinated Wide-Area Data Transfers using GridFTP,” in the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), 2008.
Globus Online | Reliable File Transfer. No IT Required. Available: https://www.globusonline.org/
P. G. Brown, “Overview of SciDB: large scale array storage, processing and analysis,” in Proceedings of the 2010 International Conference on Management of Data, Indianapolis, Indiana, USA: ACM, 2010, pp. 963–968.
M. Stonebraker, et al., “Requirements for Science Data Bases and SciDB,” in CIDR, 2009.
J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, pp. 107–113, 2008.
A. Akdogan, et al., “Voronoi-Based Geospatial Query Processing with MapReduce,” in Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2010, pp. 9–16.
Y. Wang and S. Wang, “Research and implementation on spatial data storage and operation based on Hadoop platform,” in Proceedings of the 2nd IITA International Conference on Geoscience and Remote Sensing (IITA-GRS), vol. 2, 2010, pp. 275–278.
Apache Hadoop. Available: http://hadoop.apache.org/
Hadoop Distributed File System. Available: http://hadoop.apache.org/hdfs/
J. Wang, et al., “Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems,” in Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, Portland, Oregon: ACM, 2009, pp. 12:1–12:8.
© 2011 Springer Science+Business Media, LLC

Cite this chapter:
Lee Pallickara, S., Malensek, M., Pallickara, S. (2011). On the Processing of Extreme Scale Datasets in the Geosciences. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_20
DOI: https://doi.org/10.1007/978-1-4614-1415-5_20
Publisher: Springer, New York, NY
Print ISBN: 978-1-4614-1414-8
Online ISBN: 978-1-4614-1415-5