Journal of Grid Computing, Volume 10, Issue 1, pp 109–132

Performance Evaluation of Range Queries in Key Value Stores

  • Pouria Pirzadeh
  • Junichi Tatemura
  • Oliver Po
  • Hakan Hacıgümüş
Article

Abstract

Recently, the number of Key-Value stores supporting data storage and applications in cloud environments has increased considerably. While all of these solutions try to offer highly available and scalable services on the cloud, they differ significantly from each other in their architecture and in the types of applications they try to support. In this paper we consider three widely used systems, Cassandra, HBase and Voldemort, and compare them in terms of their support for different types of query workloads. We focus mainly on range queries. Unlike HBase and Cassandra, which have built-in support for range queries, Voldemort does not support this type of query via its available API. For this reason, we present practical techniques on top of Voldemort to support range queries. Our performance evaluation is based on mixed query workloads, in the sense that they contain a combination of short and long range queries, besides other typical queries on key-value stores such as lookups and updates. We show that there are trade-offs between the performance of the selected system and scheme and the types of query workloads that can be processed efficiently.

Keywords

Key-value store, Range query, Range index, Performance study

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Pouria Pirzadeh (1)
  • Junichi Tatemura (2)
  • Oliver Po (2)
  • Hakan Hacıgümüş (2)

  1. Department of Computer Science, University of California Irvine, Irvine, USA
  2. NEC Laboratories America, Inc., Cupertino, USA
