An Effective Scalable SQL Engine for NoSQL Databases

  • Ricardo Vilaça
  • Francisco Cruz
  • José Pereira
  • Rui Oliveira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7891)

Abstract

NoSQL databases were initially devised to support a few concrete extreme-scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code, their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available, there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, for most of them, hand-crafted query code is not an enticing trade-off.

In this paper, we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop HBase, leading to an ANSI SQL-compliant database. Under a standard TPC-C workload, our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase.

Keywords

SQL, NoSQL, Cloud Computing, Middleware

Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  • Ricardo Vilaça (1)
  • Francisco Cruz (1)
  • José Pereira (1)
  • Rui Oliveira (1)
  1. HASLab - High-Assurance Software Laboratory, INESC TEC and Universidade do Minho, Braga, Portugal
