
Distributed secondo: an extensible and scalable database management system

Distributed and Parallel Databases

Abstract

This paper describes a novel method to couple a standalone database management system (DBMS) with a highly scalable key-value store. The system employs Apache Cassandra for data storage and the extensible DBMS Secondo as the query processing engine. The resulting system is a distributed, general-purpose DBMS that is highly scalable and fault tolerant. The logical ring of Cassandra is used to split the input data into smaller units of work (UOWs), which can be processed independently. A decentralized algorithm is responsible for assigning the UOWs to query processing nodes. In case of a node failure, UOWs are recalculated on a different node. All the data models (e.g. relational, spatial, and spatio-temporal) and functions (e.g. filter, aggregates, joins, and spatial joins) implemented in Secondo can be used in a scalable way without changing their implementation. Many aspects of the distribution are hidden from the user, and existing sequential queries can be easily converted into parallel ones.
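The ring-based partitioning described above can be illustrated with a small sketch. This is a minimal, hypothetical model of a hash ring split into token ranges; the function and node names (token, units_of_work, qpn1, ...) and the ring size are illustrative assumptions, not Distributed Secondo's actual implementation:

```python
import hashlib

def token(key: str, ring_size: int = 2**16) -> int:
    """Place a key on the logical ring by hashing it (ring size
    shrunk here for readability; Cassandra uses a far larger space)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

def units_of_work(node_tokens):
    """Split the ring into token ranges; each range is one UOW that
    can be processed independently of the others."""
    ts = sorted(node_tokens)
    ranges = []
    for i, start in enumerate(ts):
        end = ts[(i + 1) % len(ts)]  # last range wraps around the ring
        ranges.append((start, end))
    return ranges

# Three query processing nodes placed on the ring by hashing their names.
nodes = {name: token(name) for name in ("qpn1", "qpn2", "qpn3")}
uows = units_of_work(nodes.values())
assert len(uows) == 3  # one token range per node

# If a node fails, its range can simply be recomputed elsewhere:
survivors = [t for n, t in nodes.items() if n != "qpn3"]
assert len(units_of_work(survivors)) == 2
```

Because every token range is a self-contained UOW, a failed node's ranges can be reassigned and recalculated on any surviving node, which is the basis of the fault tolerance claimed in the abstract.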



Notes

  1. Some special functions, such as the interaction with other distributed systems, are excluded.

  2. In Secondo, nested lists are used at some points to interchange structured data. For example: ((value1 value2) (value3)).

  3. The two cases \(begin = p_0\) and \(end = p_n\) are ignored in the description to keep the examples clear.

  4. Phase 3 is influenced by the speculative task execution of Hadoop [8, p. 3]. The table system_pending prevents all idle QPNs from processing the same UOW at the same time. Without it, hot spots (parts of the logical ring that are read or written by many nodes simultaneously) would arise and query processing times would increase.
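The role of the system_pending table can be sketched as follows. The table name comes from the paper; the function names and the fallback to speculative execution on an already-pending UOW are hypothetical simplifications:

```python
import random

def pick_uow(idle_node, open_uows, pending):
    """Pick a UOW no other node is currently working on; fall back to
    speculatively re-executing a pending one only if none is left."""
    free = [u for u in open_uows if u not in pending]
    choice = random.choice(free) if free else random.choice(list(open_uows))
    pending[choice] = idle_node  # register in the (hypothetical) pending table
    return choice

pending = {}
open_uows = ["uow-a", "uow-b", "uow-c"]
first = pick_uow("qpn1", open_uows, pending)
second = pick_uow("qpn2", open_uows, pending)
assert first != second  # two idle nodes never start on the same UOW
```

Steering idle nodes away from UOWs already marked pending is what keeps many QPNs from reading the same part of the logical ring simultaneously, which would create the hot spots described above.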

  5. The part of the logical ring which is read is determined by the UOW that is processed at the moment.

  6. Each line contains 5000 characters + 4 field separators (e.g. »,«) + 1 newline character (e.g. »\n«) = 5005 bytes per line. By creating 10,000,000 lines with 5005 bytes each, 46.61 GB of data in total is generated.

  7. The data generator creates 46.61 GB of data, of which 38.84 GB needs to be transferred. With a 1 Gbit/s network link, the transfer takes 333.63 s.
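The sizes in notes 6 and 7 are consistent if "GB" is read as GiB (2^30 bytes), a common convention; the following check is an illustration under that assumption:

```python
# Note 6: 10,000,000 lines x 5005 bytes per line
total_bytes = 10_000_000 * 5005
assert round(total_bytes / 2**30, 2) == 46.61  # ~46.61 GiB

# Note 7: 38.84 GiB transferred over a 1 Gbit/s link
transfer_secs = 38.84 * 2**30 * 8 / 1e9  # GiB -> bits, divided by 10^9 bit/s
assert round(transfer_secs, 2) == 333.63  # ~333.63 s
```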

  8. The parallel version executes multiple Secondo threads on one hardware node. This is the reason why the parallel version cannot use 6 GB of memory for each thread. However, UOWs are small, and with 1.5 GB of memory only one MMR-tree needs to be created. As a consequence, the second relation needs to be analyzed only once.

References

  1. Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, A.: Hadoopdb: an architectural hybrid of mapreduce and dbms technologies for analytical workloads. Proc. VLDB Endow. 2(1), 922–933 (2009)


  2. Apache license, version 2.0. http://www.apache.org/licenses/ (2004). Accessed 30 Jul 2015

  3. Ceri, S., Pelagatti, G.: Distributed Databases Principles and Systems. McGraw-Hill Inc, New York (1984)


  4. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, OSDI’06, vol. 7, pp. 15–15. USENIX Association, Berkeley (2006)

  5. Corbett, J.C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J.J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., Hsieh, W., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L., Saito, Y., Szymaniak, M., Taylor, C., Wang, R., Woodford, D.: Spanner: Google’s globally-distributed database. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 251–264. USENIX Association, Berkeley (2012)

  6. Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, OSDI’04, vol. 6, pp. 10. USENIX Association, Berkeley (2004)

  7. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: Amazon’s highly available key-value store. SIGOPS Oper. Syst. Rev. 41(6), 205–220 (2007)


  8. Dinu, F., Ng, T.S.E.: Understanding the effects and implications of compute node related failures in hadoop. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC’12, pp. 187–198. ACM, New York (2012)

  9. Dittrich, J.P., Seeger, B.: Data redundancy and duplicate detection in spatial join processing. In: ICDE, pp. 535–546 (2000)

  10. Düntgen, C., Behr, T., Güting, R.H.: Berlinmod: a benchmark for moving object databases. VLDB J. 18(6), 1335–1368 (2009)


  11. Eldawy, A., Mokbel, M.F.: Pigeon: a spatial mapreduce language. In: IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31–April 4, 2014, pp. 1242–1245 (2014)

  12. Eldawy, A., Mokbel, M.F.: SpatialHadoop: a mapreduce framework for spatial data. In: 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, pp. 1352–1363, 13–17 April 2015

  13. Gantz, J.F., Reinsel, D.: The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east. In: IDC (2012)

  14. George, L.: HBase: The Definitive Guide. O’Reilly Media Inc, Sebastopol (2011)


  15. Ghemawat, S., Gobioff, H., Leung, S.T.: The google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. SOSP’03, pp. 29–43. ACM, New York (2003)

  16. Güting, R.H.: Operator Based Query Progress Estimation. FernUniversität in Hagen, Hagen (2008)


  17. Güting, R.H., Behr, T., Düntgen, C.: Secondo: a platform for moving objects database research and for publishing and integrating research implementations. IEEE Data Eng. Bull. 33(2), 56–63 (2010)


  18. Idreos, S., Liarou, E., Koubarakis, M.: Continuous multi-way joins over distributed hash tables. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology. EDBT’08, pp. 594–605. ACM, New York (2008)

  19. Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., Lewin, D.: Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, STOC’97, pp. 654–663. ACM, New York (1997)

  20. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)


  21. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)


  22. Leach, P., Mealling, M., Salz, R.: RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace (2005)

  23. Lu, J., Güting, R.H.: Parallel secondo: boosting database engines with hadoop. In: 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pp. 738–743 (2012)

  24. Nidzwetzki, J.K.: Entwicklung eines skalierbaren und verteilten Datenbanksystems. Springer, Berlin (2016)


  25. Nidzwetzki, J.K., Güting, R.H.: Distributed SECONDO: a highly available and scalable system for spatial data processing. In: Advances in spatial and temporal databases—14th international symposium, SSTD 2015, Hong Kong, China, pp. 491–496, 26–28 August 2015

  26. Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. SIGMOD’08, pp. 1099–1110. ACM, New York (2008)

  27. Özsu, M.T., Valduriez, P. (eds.): Principles of Distributed Database Systems, vol. 3. Springer, New York (2011)

  28. Palma, W., Akbarinia, R., Pacitti, E., Valduriez, P.: Distributed processing of continuous join queries using DHT networks. In: Proceedings of the 2009 EDBT/ICDT Workshops. EDBT/ICDT’09, pp. 34–41. ACM, New York (2009)

  29. Patel, J.M., DeWitt, D.J.: Partition based spatial-merge join. SIGMOD Rec. 25(2), 259–270 (1996)


  30. Rothnie, J.B., Goodman, N.: A survey of research and development in distributed database management. In: Proceedings of the Third International Conference on Very Large Data Bases, VLDB’77, vol. 3, pp. 48–62. VLDB Endowment (1977)

  31. Rothnie, J.B., Bernstein, P.A., Fox, S., Goodman, N., Hammer, M., Landers, T.A., Reeve, C., Shipman, D.W., Wong, E.: Introduction to a system for distributed databases (SDD-1). ACM Trans. Database Syst. 5(1), 1–17 (1980)


  32. Shute, J., Oancea, M., Ellner, S., Handy, B., Rollins, E., Samwel, B., Vingralek, R., Whipkey, C., Chen, X., Jegerlehner, B., Littlefield, K., Tong, P.: F1: the fault-tolerant distributed RDBMS supporting Google’s ad business. In: SIGMOD, 2012. Talk given at SIGMOD (2012)

  33. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput. Commun. Rev. 31(4), 149–160 (2001)


  34. Tanenbaum, A.S., van Steen, M.: Distributed Systems: Principles and Paradigms, vol. 2. Prentice-Hall, Inc., Upper Saddle River (2006)


  35. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. 2(2), 1626–1629 (2009)


  36. Transaction Processing Performance Council. TPC BENCHMARK H (Decision Support) Standard Specification. http://www.tpc.org/tpch/. Accessed 15 May 2015

  37. Vogels, W.: Eventually consistent. Commun. ACM 52(1), 40–44 (2009)


  38. Website of Apache Drill. http://drill.apache.org (2015). Accessed 20 July 2015

  39. Website of Apache Spark. http://spark.apache.org/ (2015). Accessed 20 Jul 2015

  40. Website of cpp-driver for Cassandra. https://github.com/datastax/cpp-driver (2015). Accessed 15 Sept 2015

  41. Website of Distributed Secondo. http://dna.fernuni-hagen.de/secondo/DSecondo/DSECONDO-Website/index.html (2015). Accessed 15 Nov 2015

  42. Website of the Open Street Map Project. http://www.openstreetmap.org (2015). Accessed 09 July 2015

  43. White, T.: Hadoop: The Definitive Guide, 1st edn. O’Reilly Media Inc., Sebastopol (2009)


  44. Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In: Proceedings of the 2016 International Conference on Management of Data. SIGMOD’16, pp. 1071–1085. ACM, New York (2016)

  45. You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. Technical Report http://www-cs.ccny.cuny.edu/~jzhang/papers/spatial_cc_tr.pdf (2016). Accessed 14 Mar 2017

  46. Zhang, S., Han, J., Liu, Z., Wang, K., Xu, Z.: SJMR: parallelizing spatial join with mapreduce on clusters. In: Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31–September 4, 2009, New Orleans, Louisiana, USA, pp. 1–8 (2009)


Author information


Corresponding author

Correspondence to Jan Kristof Nidzwetzki.

Queries of the experiments




About this article


Cite this article

Nidzwetzki, J.K., Güting, R.H. Distributed secondo: an extensible and scalable database management system. Distrib Parallel Databases 35, 197–248 (2017). https://doi.org/10.1007/s10619-017-7198-9
