Data Similarity-Aware Computation Infrastructure for the Cloud

  • Yu Hua
  • Xue Liu
Chapter

Abstract

The cloud is emerging as a platform for scalable and efficient services. To handle massive data and reduce data migration, the computation infrastructure requires efficient data placement and proper management of cached data. We propose an efficient and cost-effective multilevel caching scheme, called MERCURY, as the computation infrastructure of the cloud. The idea behind MERCURY is to explore and exploit data similarity to support efficient data placement. To capture data similarity accurately and efficiently, we leverage low-complexity Locality-Sensitive Hashing (LSH). In our design, we identify that a conventional LSH scheme suffers not only from space inefficiency but also from homogeneous data placement. To address these two problems, we design a novel Multicore-enabled LSH (MC-LSH) that accurately captures the differentiated similarity across data. The similarity-aware MERCURY hence partitions data into L1 cache, L2 cache, and main memory based on their distinct localities, which helps optimize cache utilization and minimize pollution in the last-level cache. Besides extensive evaluation through simulations, we also implemented MERCURY in a system. Experimental results based on real-world applications and datasets demonstrate the efficiency and efficacy of our proposed schemes (© 2014 IEEE. Reprinted, with permission, from Ref. [1].).
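As a rough illustration of how LSH can capture data similarity, the following Python sketch uses the standard p-stable (Gaussian projection) hash family, under which nearby vectors are likely to receive the same signature. It is not the MC-LSH design described in the chapter; the function names `make_lsh` and `group_by_signature`, the parameter values, and the sample vectors are all assumptions made for this example.

```python
import math
import random


def make_lsh(dim, num_hashes, bucket_width, seed=0):
    """Return a signature function from the p-stable (Gaussian) LSH family.

    Hypothetical helper for illustration; parameters are not from the chapter.
    """
    rng = random.Random(seed)
    # One random Gaussian projection vector and one random offset per hash.
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                   for _ in range(num_hashes)]
    offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]

    def signature(vec):
        # Each hash is h(v) = floor((a . v + b) / w).
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, vec)) + b)
                       / bucket_width)
            for a, b in zip(projections, offsets)
        )

    return signature


def group_by_signature(items, signature):
    """Group item names whose vectors share the same LSH signature."""
    buckets = {}
    for name, vec in items.items():
        buckets.setdefault(signature(vec), []).append(name)
    return buckets


if __name__ == "__main__":
    sig = make_lsh(dim=3, num_hashes=4, bucket_width=4.0, seed=42)
    items = {
        "a": [1.0, 2.0, 3.0],
        "a_copy": [1.0, 2.0, 3.0],   # identical to "a", so it shares a bucket
        "far": [900.0, -700.0, 500.0],
    }
    for key, names in group_by_signature(items, sig).items():
        print(key, names)
```

Items that land in the same bucket are candidates for being placed together. A similarity-aware cache could, in principle, use such groupings when deciding which data belongs at which level of the hierarchy, although the chapter's actual placement policy is not reproduced here.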

References

  1. Y. Hua, X. Liu, D. Feng, Data similarity-aware computation infrastructure for the cloud. IEEE Trans. Comput. (TC) 63(1), 3–16 (2014)
  2. IDC iView, Extracting Value from Chaos (2011)
  3. Science Staff, Dealing with data - challenges and opportunities. Science 331(6018), 692–693 (2011)
  4. M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
  5. S. Bykov, A. Geller, G. Kliot, J. Larus, R. Pandya, J. Thelin, Orleans: cloud computing for everyone, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
  6. S. Wu, F. Li, S. Mehrotra, B. Ooi, Query optimization for massively parallel data processing, in Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
  7. L. Soares, D. Tam, M. Stumm, Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer, in Proceedings of the MICRO (2009), pp. 258–269
  8. S. Biswas, D. Franklin, A. Savage, R. Dixon, T. Sherwood, F. Chong, Multi-execution: multicore caching for data-similar executions, in Proceedings of the ISCA (2009)
  9. M. Chaudhuri, Pagenuca: selected policies for page-grain locality management in large shared chip-multiprocessor caches, in Proceedings of the HPCA (2009), pp. 227–238
  10. S. Srikantaiah, R. Das, A.K. Mishra, C.R. Das, M. Kandemir, A case for integrated processor-cache partitioning in chip multiprocessors, in Proceedings of the SC (2009)
  11. X. Ding, K. Wang, X. Zhang, SRM-buffer: an OS buffer management technique to prevent last level cache from thrashing in multicores, in Proceedings of the EuroSys (2011)
  12. Y. Chen, S. Byna, X. Sun, Data access history cache and associated data prefetching mechanisms, in Proceedings of the SC (2007)
  13. J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Enabling software management for multicore caches with a lightweight hardware support, in Proceedings of the SC (2009)
  14. D. Zhan, H. Jiang, S.C. Seth, STEM: spatiotemporal management of capacity for intra-core last level caches, in Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2010)
  15. D. Zhan, H. Jiang, S.C. Seth, Locality & utility co-optimization for practical capacity management of shared last level caches, in Proceedings of the ACM International Conference on Supercomputing (2012)
  16. J. Stuecheli, D. Kaseridis, D. Daly, H. Hunter, L. John, The virtual write queue: coordinating DRAM and last-level cache policies, in Proceedings of the ISCA (2010)
  17. Y. Hua, X. Liu, D. Feng, MERCURY: a scalable and similarity-aware scheme in multi-level cache hierarchy, in Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) (2012)
  18. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the STOC (1998)
  19. A. Forin, B. Neekzad, N. Lynch, Giano: the two-headed system simulator, Technical Report MSR-TR-2006-130 (Microsoft Research, Redmond, 2006)
  20. S. Biswas, D. Franklin, T. Sherwood, F. Chong, Conflict-avoidance in multicore caching for data-similar executions, in Proceedings of the ISPAN (2009)
  21.
  22. R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: minimizing cache conflicts in multi-core processors for databases. Proc. VLDB 2(1), 373–384 (2009)
  23. B. Bershad, D. Lee, T. Romer, B. Chen, Avoiding conflict misses dynamically in large direct-mapped caches, in Proceedings of the ASPLOS (1994)
  24. Y. Yan, X. Zhang, Z. Zhang, Cacheminer: a runtime approach to exploit cache locality on smp. IEEE Trans. Parallel Distrib. Syst. 11(4), 357–374 (2000)
  25. K. Zhang, Z. Wang, Y. Chen, H. Zhu, X. Sun, Pac-plru: a cache replacement policy to salvage discarded predictions from hardware prefetchers, in Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (2011), pp. 265–274
  26. G. Suh, S. Devadas, L. Rudolph, Analytical cache models with applications to cache partitioning, in Proceedings of the ACM ICS (2001)
  27. The Forest CoverType dataset, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/Covertype
  28. D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of the FAST (2003)
  29. E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002)
  30.
  31. S. Carr, K. Kennedy, Compiler blockability of numerical algorithms, in Proceedings of the Supercomputing Conference (1992)
  32. M.S. Lam, E.E. Rothberg, M.E. Wolf, The cache performance and optimizations of blocked algorithms, in Proceedings of the ASPLOS (1991)
  33. T.C. Mowry, M.S. Lam, A. Gupta, Design and evaluation of a compiler algorithm for prefetching, in Proceedings of the ASPLOS (1992)
  34. J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems, in Proceedings of the HPCA (2008)
  35. Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of the VLDB (2007), pp. 950–961
  36. R. Shinde, A. Goel, P. Gupta, D. Dutta, Similarity search and locality sensitive hashing using ternary content addressable memories, in Proceedings of the SIGMOD (2010), pp. 375–386
  37. A. Joly, O. Buisson, A posteriori multi-probe locality sensitive hashing, in Proceedings of the ACM International Conference on Multimedia (2008)
  38. G. Taylor, P. Davies, M. Farmwald, The TLB slice-a low-cost high-speed address translation mechanism, in Proceedings of the ISCA (1990)
  39. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
  40. L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)
  41. B. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
  42. Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of the SIGMOD (2009)
  43. Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of the ICPP (2008), pp. 644–651
  44.
  45. Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. 61(6), 817–830 (2012)
  46. Z. Zhang, Z. Zhu, X. Zhang, Cached dram for ilp processor memory access latency reduction. IEEE Micro 21(4), 22–32 (2001)
  47. S. Byna, Y. Chen, X. Sun, R. Thakur, W. Gropp, Parallel I/O prefetching using MPI file caching and I/O signatures, in Proceedings of the SC (2008)
  48. Z. Zhang, Z. Zhu, X. Zhang, Design and optimization of large size and low overhead off-chip caches. IEEE Trans. Comput. 53(7), 843–855 (2004)
  49. N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki, Near-optimal cache block placement with reactive nonuniform cache architectures. IEEE Micro 30(1), 20–28 (2010)
  50. J. Torrellas, A. Tucker, A. Gupta, Benefits of cache-affinity scheduling in shared-memory multiprocessors: a summary, in Proceedings of the ACM SIGMETRICS (1993)
  51. H. Lee, S. Cho, B. Childers, Cloudcache: expanding and shrinking private caches, in Proceedings of the HPCA (2011), pp. 219–230
  52. X. Zhang, S. Dwarkadas, K. Shen, Hardware execution throttling for multi-core resource management, in Proceedings of the USENIX Annual Technical Conference (2009)
  53. A. Basu, N. Kirman, M. Kirman, M. Chaudhuri, J. Martinez, Scavenger: a new last level cache architecture with global block priority, in Proceedings of the MICRO (2007), pp. 421–432
  54. J. Chhugani, A. Nguyen, V. Lee, W. Macy, M. Hagog, Y. Chen, A. Baransi, S. Kumar, P. Dubey, Efficient implementation of sorting on multi-core SIMD CPU architecture, in Proceedings of the VLDB (2008)
  55. S. Park, T. Kim, J. Park, J. Kim, H. Im, Parallel skyline computation on multicore architectures, in Proceedings of the ICDE (2009)
  56. S. Das, S. Antony, D. Agrawal, A. El Abbadi, Thread cooperation in multicore architectures for frequency counting over multiple data streams, in Proceedings of the VLDB (2009)
  57. J. Cieslewicz, K. Ross, Adaptive aggregation on chip multiprocessors, in Proceedings of the VLDB (2007)
  58. L. Qiao, V. Raman, F. Reiss, P. Haas, G. Lohman, Main-memory scan sharing for multi-core CPUs, in Proceedings of the VLDB (2008)
  59. W. Han, J. Lee, Dependency-aware reordering for parallelizing query optimization in multi-core CPUs, in Proceedings of the SIGMOD (2009)
  60. S. Tatikonda, S. Parthasarathy, Mining tree-structured data on multicore systems, in Proceedings of the VLDB (2009)
  61. C. Kim, T. Kaldewey, V. Lee, E. Sedlar, A. Nguyen, N. Satish, J. Chhugani, A. Di Blas, P. Dubey, Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs, in Proceedings of the VLDB (2009)
  62. M. Kleanthous, Y. Sazeides, CATCH: a mechanism for dynamically detecting cache-content-duplication and its application to instruction caches, in Proceedings of the DATE (2008)
  63. A. Alameldeen, D. Wood, Adaptive cache compression for high-performance processors, in Proceedings of the ISCA (2004)
  64. J. Chang, G. Sohi, Cooperative caching for chip multiprocessors, in Proceedings of the ISCA (2006)
  65. C. Kim, D. Burger, S. Keckler, An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, in Proceedings of the ASPLOS (2002)
  66. Z. Chishti, M. Powell, T. Vijaykumar, Distance associativity for high-performance energy-efficient non-uniform cache architectures, in Proceedings of the MICRO (2003)
  67. R. Manikantan, K. Rajan, R. Govindarajan, Nucache: an efficient multicore cache organization based on next-use distance, in Proceedings of the HPCA (2011), pp. 243–253
  68. N. Lakshminarayana, J. Lee, H. Kim, Age based scheduling for asymmetric multiprocessors, in Proceedings of the ACM/IEEE Supercomputing Conference (2009)
  69. J. Zhou, J. Cieslewicz, K. Ross, M. Shah, Improving database performance on simultaneous multithreading processors, in Proceedings of the VLDB (2005)
  70. S. Boyd-Wickizer, R. Morris, M.F. Kaashoek, Reinventing scheduling for multicore systems, in Proceedings of the HotOS (2009)
  71. L. Shalev, J. Satran, E. Borovik, M. Ben-Yehuda, IsoStack: highly efficient network processing on dedicated cores, in Proceedings of the USENIX Annual Technical Conference (2010)

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Huazhong University of Science and Technology, Wuhan, China
  2. McGill University, Montreal, Canada