ChronosServer: Fast In Situ Processing of Large Multidimensional Arrays with Command Line Tools

  • Conference paper
Supercomputing (RuSCDays 2016)

Abstract

Explosive growth of raster data volumes in numerical simulations, remote sensing and other fields stimulates the development of new, efficient data processing techniques. For example, the in-situ approach queries data in diverse file formats, avoiding a time-consuming import phase. However, after data are read from a file, their further processing always takes place with code developed almost from scratch. Standalone command line tools are one of the most popular ways of processing raster files in situ. Decades of development and feedback have resulted in numerous feature-rich, elaborate, free and quality-assured tools optimized mostly for a single machine. The paper reports the current state of development and first performance evaluation results of ChronosServer, a distributed system that partially delegates in-situ raster data processing to external tools. The new delegation approach is anticipated to readily provide a rich collection of raster operations at scale. On a single machine, ChronosServer already outperforms state-of-the-art array DBMS by up to 193×.
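
The delegation idea can be illustrated with a minimal sketch. This is not ChronosServer's implementation; it only shows, under stated assumptions, how a coordinator might hand each raster file to an external command line tool and collect the outputs. The names `build_command`, `delegate` and `COPY_TOOL` are hypothetical, and the "tool" here is a stand-in Python one-liner that copies its input to its output so that the example runs without any raster software installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(tool_argv, src, dst):
    """Return the argv for one external-tool invocation on one file.

    Assumes the tool takes the input path and then the output path as its
    last two arguments (a convention chosen for this sketch).
    """
    return [*tool_argv, str(src), str(dst)]


def delegate(tool_argv, sources, out_dir, workers=4):
    """Run the external tool once per input file; return the output paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def run_one(src):
        src = Path(src)
        dst = out_dir / src.name
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(build_command(tool_argv, src, dst), check=True)
        return dst

    # Threads are enough here: the real work happens in child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, sources))


# Hypothetical stand-in tool: copies the input file to the output file.
COPY_TOOL = [
    sys.executable,
    "-c",
    "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])",
]
```

In a real deployment, `tool_argv` would name an actual raster tool (for example an NCO operator) together with its options, and the files would live on the node that runs the tool, so that the data are processed in place rather than imported first.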

Acknowledgements

This work was partially supported by Russian Foundation for Basic Research (grant #16-37-00416).

Author information

Corresponding author

Correspondence to Ramon Antonio Rodriges Zalipynis.

Appendices

A Appendix. ChronosServer Queries

  • Max U-wind speed (Sect. 5.3):

  • Calculate wind speed (Sect. 5.4):

  • Alter chunk shape to 10×10×8 (Sect. 5.5):

B Appendix. SciDB Queries

  • Initial SciDB array for U-wind speed:

  • Max U-wind speed (Sect. 5.3):

  • Calculate wind speed (Sect. 5.4):

  • Alter chunk shape to 10×10×8 (Sect. 5.5):

According to the SciDB developers' answer on their forum (to a question posted by the author of this paper in August 2016), the above query is currently the fastest way to alter chunk size in SciDB: http://forum.paradigm4.com/t/fastest-way-to-alter-chunk-size/.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Rodriges Zalipynis, R.A. (2016). ChronosServer: Fast In Situ Processing of Large Multidimensional Arrays with Command Line Tools. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2016. Communications in Computer and Information Science, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-319-55669-7_3

  • DOI: https://doi.org/10.1007/978-3-319-55669-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55668-0

  • Online ISBN: 978-3-319-55669-7

  • eBook Packages: Computer Science (R0)
