Array DBMS and Satellite Imagery: Towards Big Raster Data in the Cloud

  • Ramon Antonio Rodriges Zalipynis (email author)
  • Evgeniy Pozdeev
  • Anton Bryukhov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10716)

Abstract

Satellite imagery has always been “big” data. An array DBMS is one of the tools that streamline raster data processing. However, raster data are usually stored in files, not in databases. Command line tools have long been developed to process these files. Most of these tools are feature-rich and free but optimized for a single machine. An approach that partially delegates in situ raster data processing to such tools has recently been proposed. The approach includes a new formal N-d array data model that abstracts from the files and the tools, as well as new formal distributed algorithms based on the model. ChronosServer is a distributed array DBMS under development into which the approach is being integrated. This paper extends the approach with a new algorithm for reshaping (tiling) arbitrary N-d arrays into a set of overlapping N-d arrays with a fixed shape. Cutting arrays with an overlap makes it possible to perform a broad range of large imagery processing operations in a distributed shared-nothing fashion. Currently, ChronosServer provides a rich collection of raster operations at scale and outperforms SciDB by up to 80× on Landsat data. SciDB is the only freely available distributed array DBMS to date. Experiments were carried out on 8- and 16-node clusters in Microsoft Azure Cloud.
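
To illustrate what cutting an N-d array into overlapping pieces can look like, here is a minimal Python sketch. It is not the algorithm from the paper, which is not reproduced on this page. The function name `overlapping_tiles`, its parameters, and the decision to clip edge tiles at the array boundary are all assumptions made for illustration only.

```python
from itertools import product


def overlapping_tiles(shape, tile_shape, overlap):
    """Yield one tuple of (start, stop) index ranges per tile.

    Hypothetical helper: neighbouring tiles share `overlap` cells along
    each axis; tiles at the upper edge are clipped to the array bounds.
    """
    if not (len(shape) == len(tile_shape) == len(overlap)):
        raise ValueError("all arguments must have the same number of dimensions")
    per_axis = []
    for size, tile, ov in zip(shape, tile_shape, overlap):
        if not 0 <= ov < tile:
            raise ValueError("overlap must satisfy 0 <= overlap < tile size")
        step = tile - ov  # distance between the starts of adjacent tiles
        ranges = []
        start = 0
        while True:
            stop = min(start + tile, size)  # clip the last tile on this axis
            ranges.append((start, stop))
            if stop == size:
                break
            start += step
        per_axis.append(ranges)
    # The N-d tiling is the Cartesian product of the per-axis ranges.
    yield from product(*per_axis)


tiles_1d = list(overlapping_tiles((10,), (4,), (1,)))
# tiles_1d == [((0, 4),), ((3, 7),), ((6, 10),)]
tiles_2d = list(overlapping_tiles((5, 5), (3, 3), (1, 1)))
# four tiles: each axis is split into (0, 3) and (2, 5)
```

Because every tile carries a copy of the cells it shares with its neighbours, an operation with a bounded neighbourhood (for example, a small moving window) could in principle be applied to each tile independently, which is the shared-nothing property the abstract refers to.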

Keywords

ChronosServer; SciDB; Cloud computing; Array DBMS; Satellite imagery; In situ; Command line tools; Big data; Landsat

Notes

Acknowledgments

This work was partially supported by the Russian Foundation for Basic Research (grant №16-37-00416).

Contributions

Rodriges: all text, figures, design and implementation of algorithms and ChronosServer, ChronosServer data model, Azure management code, SciDB import code, experimental setup. Pozdeev: SciDB cluster deployment. Bryukhov: adapted SciDB import code to Landsat data. All authors: experiments.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Ramon Antonio Rodriges Zalipynis (1), email author
  • Evgeniy Pozdeev (1)
  • Anton Bryukhov (1)
  1. National Research University Higher School of Economics, Moscow, Russia
