BBoxDB: a distributed and highly available key-bounding-box-value store

  • Jan Kristof Nidzwetzki
  • Ralf Hartmut Güting

Abstract

BBoxDB is a distributed and highly available key-bounding-box-value store designed to handle multi-dimensional big data. To handle large amounts of data, the software splits the stored data into multi-dimensional shards and spreads them across a cluster of nodes. Unlike existing key-value stores, BBoxDB stores each value together with an n-dimensional, axis-parallel bounding box. The bounding box describes the spatial location of the value in an n-dimensional space. Multi-dimensional data can be retrieved using range queries, which are efficiently supported by indices. A space partitioner (e.g., a K-D Tree, a Quad-Tree or a Grid) is used to split the n-dimensional space into disjoint regions (distribution regions). Distribution regions are created dynamically, based on the stored data. BBoxDB can handle growing and shrinking datasets. The data redistribution is performed in the background and does not affect the availability of the system; read and write access remains possible at any time. BBoxDB works with distribution groups: the data of all tables in a distribution group are distributed in the same way (co-partitioned). Spatial joins on co-partitioned tables can be executed efficiently without data shuffling between nodes. BBoxDB supports spatial joins out of the box using the bounding boxes of the stored data. The joins are supported by a spatial index and executed in a distributed and parallel manner on the nodes of the cluster.
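The key-bounding-box-value model described above can be illustrated with a minimal Python sketch. It is not BBoxDB's API: the names `BoundingBox`, `StoredTuple`, `range_query` and `bbox_join` are hypothetical, and it assumes closed intervals, so boxes that only touch count as intersecting. The sketch scans lists linearly, whereas the abstract states that BBoxDB uses indices and executes joins in parallel across nodes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-parallel box: one closed (low, high) interval per dimension."""
    intervals: Tuple[Tuple[float, float], ...]

    def intersects(self, other: "BoundingBox") -> bool:
        # Two axis-parallel boxes overlap only if their intervals
        # overlap in every single dimension.
        if len(self.intervals) != len(other.intervals):
            raise ValueError("dimension mismatch")
        return all(
            lo_a <= hi_b and lo_b <= hi_a
            for (lo_a, hi_a), (lo_b, hi_b) in zip(self.intervals, other.intervals)
        )


@dataclass(frozen=True)
class StoredTuple:
    """Hypothetical stored record: a key, its bounding box, and the value."""
    key: str
    bbox: BoundingBox
    value: bytes


def range_query(table: List[StoredTuple], query: BoundingBox) -> List[StoredTuple]:
    # Linear scan for illustration only; a real store would consult an index.
    return [t for t in table if t.bbox.intersects(query)]


def bbox_join(
    left: List[StoredTuple], right: List[StoredTuple]
) -> List[Tuple[StoredTuple, StoredTuple]]:
    # Nested-loop join on bounding boxes. With co-partitioned tables, each
    # node could run this on its own region without exchanging data.
    return [(l, r) for l in left for r in right if l.bbox.intersects(r.bbox)]


# Made-up two-dimensional sample data (x interval, y interval).
roads = [
    StoredTuple("road_1", BoundingBox(((0.0, 4.0), (0.0, 1.0))), b"road one"),
    StoredTuple("road_2", BoundingBox(((10.0, 12.0), (10.0, 11.0))), b"road two"),
]
forests = [
    StoredTuple("forest_1", BoundingBox(((3.0, 6.0), (0.5, 5.0))), b"forest one"),
    StoredTuple("forest_2", BoundingBox(((20.0, 30.0), (20.0, 30.0))), b"forest two"),
]

hits = range_query(roads, BoundingBox(((0.0, 5.0), (0.0, 5.0))))
pairs = bbox_join(roads, forests)
print([t.key for t in hits])
print([(l.key, r.key) for l, r in pairs])
```

Because a bounding box only approximates the stored object, a join like this yields candidate pairs. An application that needs exact geometry would still have to refine those candidates against the actual values.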

Keywords

Distributed data store, Storage engine, Key-bounding-box-value store, Multi-dimensional data store

Notes

Acknowledgements

We are grateful for the free license of JProfiler, which ej-technologies GmbH provided for the BBoxDB open source project. The profiler helped us to speed up the implementation of BBoxDB significantly.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Mathematics and Computer Science, FernUniversität Hagen, Hagen, Germany