Abstract
Explosive growth of raster data volumes in numerical simulations, remote sensing, and other fields stimulates the development of new efficient data processing techniques. For example, the in situ approach queries data in diverse file formats, avoiding a time-consuming import phase. However, once data are read from a file, their further processing always relies on code developed almost from scratch. Standalone command line tools are among the most popular means of in situ processing of raster files. Decades of development and feedback have resulted in numerous feature-rich, elaborate, free, and quality-assured tools, optimized mostly for a single machine. This paper reports the current development state and first performance evaluation results of ChronosServer, a distributed system that partially delegates in situ raster data processing to external tools. The new delegation approach is expected to readily provide a rich collection of raster operations at scale. On a single machine, ChronosServer already outperforms a state-of-the-art array DBMS by up to 193×.
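The delegation idea can be pictured as a split-apply-combine step. Each worker runs an existing command line tool on the raster files it holds locally, and a coordinator merges the partial results. The Python below is a minimal sketch under stated assumptions, not ChronosServer's implementation. The worker names and file paths are hypothetical, and the delegated tool is assumed to be NCO's `ncwa`. `run` stands in for whatever executes the command on a worker and reads back its scalar result, for example `subprocess.run` followed by reading the output file with a netCDF library.

```python
"""Sketch: delegating an in situ raster reduction to an external CLI tool.

Assumptions (hypothetical, not ChronosServer's code): workers hold files
locally, the tool is NCO's ncwa, and `run` executes a command on a worker
and returns the scalar it produced.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Command = List[str]


def max_command(in_path: str, out_path: str, var: str) -> Command:
    # Without -a, ncwa reduces over all dimensions; -y max selects the
    # maximum instead of the default mean; -O overwrites out_path.
    return ["ncwa", "-O", "-y", "max", "-v", var, in_path, out_path]


def delegate_max(
    files_by_worker: Dict[str, Sequence[str]],
    var: str,
    run: Callable[[str, Command], float],
) -> float:
    """Apply the tool to every file on its worker, then combine the results."""
    jobs = [
        (worker, max_command(path, path + ".max.nc", var))
        for worker, paths in files_by_worker.items()
        for path in paths
    ]
    # Apply: one external tool invocation per file, run concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda job: run(*job), jobs))
    # Combine: max is associative, so partial maxima merge in any order.
    return max(partials)
```

Because the maximum is associative, the coordinator can merge partial results in any order. A reduction such as the mean would instead need partial sums and counts from each worker.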
Acknowledgements
This work was partially supported by the Russian Foundation for Basic Research (grant #16-37-00416).
Appendices
A Appendix. ChronosServer Queries
- Max U-wind speed (Sect. 5.3):
- Calculate wind speed (Sect. 5.4):
- Alter chunk shape to 10×10×8 (Sect. 5.5):
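The ChronosServer query listings are not reproduced in this appendix. As an illustration of what the first two operations compute, the sketch below works on flat lists of grid cells. It assumes the standard definition of horizontal wind speed from the U (zonal) and V (meridional) components, sqrt(u² + v²). It also treats "max U-wind speed" as a reduction over the whole array; the paper's query may instead aggregate along selected dimensions. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def max_value(u: Sequence[float]) -> float:
    # Reduce the whole U-wind array to its single maximum (assumed global).
    return max(u)


def wind_speed(u: Sequence[float], v: Sequence[float]) -> List[float]:
    # Cell-wise horizontal wind speed sqrt(u^2 + v^2); u and v must list
    # the same grid cells in the same order.
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of cells")
    return [math.hypot(a, b) for a, b in zip(u, v)]
```

`math.hypot` is used rather than `math.sqrt(a * a + b * b)` because it avoids intermediate overflow or underflow for extreme values.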
B Appendix. SciDB Queries
- Initial SciDB array for U-wind speed:
- Max U-wind speed (Sect. 5.3):
- Calculate wind speed (Sect. 5.4):
- Alter chunk shape to 10×10×8 (Sect. 5.5):
According to the SciDB developers' answer on their forum (to a question posted by the author of this paper in August 2016), the above query is currently the fastest way to alter the chunk size in SciDB: http://forum.paradigm4.com/t/fastest-way-to-alter-chunk-size/.
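Changing the chunk shape generally means redistributing every cell into a new chunk layout, which is why the operation is costly. The arithmetic below sketches two things for a 10×10×8 chunk shape: how many chunks the new layout has, and which chunk a given zero-based cell falls into. The array shape and dimension order are hypothetical, and this is not SciDB's implementation.

```python
import math
from typing import Sequence, Tuple


def chunk_grid(shape: Sequence[int], chunk: Sequence[int]) -> Tuple[int, ...]:
    # Chunks per dimension (ceiling division); edge chunks may be partial.
    return tuple((n + c - 1) // c for n, c in zip(shape, chunk))


def chunk_of(cell: Sequence[int], chunk: Sequence[int]) -> Tuple[int, ...]:
    # Coordinates of the chunk that contains a zero-based cell index.
    return tuple(i // c for i, c in zip(cell, chunk))


def total_chunks(shape: Sequence[int], chunk: Sequence[int]) -> int:
    # Total number of chunks in the new layout.
    return math.prod(chunk_grid(shape, chunk))


# Hypothetical 73 x 144 x 100 array re-chunked to 10 x 10 x 8.
print(chunk_grid((73, 144, 100), (10, 10, 8)))    # (8, 15, 13)
print(total_chunks((73, 144, 100), (10, 10, 8)))  # 1560
print(chunk_of((72, 143, 99), (10, 10, 8)))       # (7, 14, 12)
```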
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Rodriges Zalipynis, R.A. (2016). ChronosServer: Fast In Situ Processing of Large Multidimensional Arrays with Command Line Tools. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2016. Communications in Computer and Information Science, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-319-55669-7_3
DOI: https://doi.org/10.1007/978-3-319-55669-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55668-0
Online ISBN: 978-3-319-55669-7
eBook Packages: Computer Science, Computer Science (R0)