A Correlation-Aware Data Placement Strategy for Key-Value Stores

  • Conference paper
Distributed Applications and Interoperable Systems (DAIS 2011)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 6723)

Abstract

Key-value stores hold an unprecedented volume of the data produced by applications such as social networks. For these systems, scalability and availability requirements often outweigh the cost of sacrificing richer data and processing models, and even elementary data consistency. Moreover, existing key-value stores offer only random or order-based data placement strategies.
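
As a rough illustration of the two existing strategies, the following Python sketch contrasts random (hash-based) placement, where a hash of the key picks the node, with order-based (range) placement, where contiguous key ranges are assigned to nodes. The node count, the split points, and the function names are hypothetical, and plain modulo hashing stands in for the consistent-hashing ring that production stores typically use.

```python
import bisect
import hashlib

NUM_NODES = 4                    # hypothetical cluster size
SPLIT_POINTS = ("g", "n", "t")   # hypothetical range boundaries (NUM_NODES - 1 of them)


def hash_placement(key: str, num_nodes: int = NUM_NODES) -> int:
    """Random placement: a uniform hash of the key selects the node,
    so keys that are adjacent in sort order end up scattered across nodes.
    (Modulo hashing here; real stores typically use a consistent-hashing ring.)"""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def range_placement(key: str, splits: tuple[str, ...] = SPLIT_POINTS) -> int:
    """Order-based placement: node i holds the keys that sort between
    split point i-1 and split point i, so key ranges stay contiguous."""
    return bisect.bisect_right(splits, key)
```

For example, `range_placement("user:001")` and `range_placement("user:002")` both return 3, keeping adjacent keys on the same node, whereas `hash_placement` offers no such guarantee. Neither strategy, however, knows anything about application-level relations between keys that are far apart in sort order.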

In this paper, we exploit arbitrary data relations, easily expressed by the application, to foster data locality and improve the performance of the complex queries common in read-intensive social network workloads.
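
One way to make the locality argument concrete is to count how many nodes a multi-key read must contact. The sketch below is an illustration under stated assumptions, not the paper's mechanism: it compares hashing each key independently with a hypothetical rule that places keys by a shared tag prefix (keys shaped like `alice:post1`), so that related items co-locate. The key format, node count, and function names are all assumptions.

```python
import hashlib
from typing import Callable, Iterable

NUM_NODES = 8  # hypothetical cluster size


def hash_node(value: str, num_nodes: int = NUM_NODES) -> int:
    """Map a string to a node by hashing it (modulo stand-in for a hash ring)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def by_key(key: str) -> int:
    """Correlation-oblivious placement: every key is hashed independently."""
    return hash_node(key)


def by_tag(key: str) -> int:
    """Hypothetical correlation-aware placement: for keys shaped
    "<tag>:<rest>", only the tag is hashed, so items sharing a tag co-locate."""
    tag = key.split(":", 1)[0]
    return hash_node(tag)


def fanout(keys: Iterable[str], placement: Callable[[str], int]) -> int:
    """Number of distinct nodes a multi-key read has to contact."""
    return len({placement(k) for k in keys})
```

Reading twenty posts tagged `alice` touches a single node under `by_tag`, while `by_key` will usually spread them over several of the eight nodes. The trade-off is that grouping everything by one tag can concentrate load on a single node.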

We present a novel data placement strategy, based on multidimensional locality-preserving mappings, that supports dynamic tags. We compare it with the placement strategies used in existing key-value stores under the workload of a typical social network application, and show that the proposed correlation-aware strategy offers a major improvement in the system’s overall response time and network requirements.
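
To illustrate what a multidimensional locality-preserving mapping can look like, the sketch below uses a Hilbert space-filling curve, one well-known mapping of this kind, to turn a 2-D point into a 1-D position, and then range-partitions that 1-D order across nodes. This is an assumed stand-in, not necessarily the mapping used in the paper; the grid size, the node count, and the premise that each item's tags have already been converted to integer coordinates are hypothetical.

```python
GRID_SIDE = 16  # hypothetical: coordinates per dimension (must be a power of two)
NUM_NODES = 4   # hypothetical cluster size (should divide GRID_SIDE ** 2)


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve inside it has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(x: int, y: int, n: int = GRID_SIDE) -> int:
    """Map a 2-D point (0 <= x, y < n) to its position along a Hilbert curve.
    Consecutive positions are always neighbouring grid cells."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # skip the quadrants visited earlier
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def place(x: int, y: int, n: int = GRID_SIDE, num_nodes: int = NUM_NODES) -> int:
    """Order-based placement over the Hilbert order: each node owns one
    contiguous, equally sized segment of the curve."""
    segment = (n * n) // num_nodes
    return hilbert_index(x, y, n) // segment
```

For instance, items at (3, 4) and (4, 3) both fall in the first curve segment and therefore on node 0, whereas (15, 0) lands on node 3. Because the curve preserves locality, items that are close in the tag space tend to share a node, which is what lets a correlated multi-item query contact fewer nodes.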

Partially funded by the Portuguese Science Foundation (FCT) under project Stratus – A Layered Approach to Data Management in the Cloud (PTDC/EIA-CCO/115570/2009) and grant SFRH/BD/38529/2007.

Copyright information

© 2011 IFIP International Federation for Information Processing

About this paper

Cite this paper

Vilaça, R., Oliveira, R., Pereira, J. (2011). A Correlation-Aware Data Placement Strategy for Key-Value Stores. In: Felber, P., Rouvoy, R. (eds) Distributed Applications and Interoperable Systems. DAIS 2011. Lecture Notes in Computer Science, vol 6723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21387-8_17

  • DOI: https://doi.org/10.1007/978-3-642-21387-8_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21386-1

  • Online ISBN: 978-3-642-21387-8

  • eBook Packages: Computer Science (R0)
