Semantic-Aware Data Cube for Cloud Networks

Searchable Storage in Cloud Computing

Abstract

Today’s cloud data centers contain millions of servers and offer high bandwidth. A fundamental problem is how to significantly improve the scalability of such large-scale systems so that they can interconnect a large number of servers while supporting various online services in cloud computing. One approach is to address the potential mismatch between the network architecture and the data placement. To this end, we present ANTELOPE, a scalable, distributed, data-centric scheme for cloud data centers that systematically takes into account both the properties of the network architecture and the optimization of data placement. The basic idea behind ANTELOPE is to leverage a precomputation-based data cube to support online cloud services. Because constructing a data cube incurs the high cost of full materialization, we use a semantic-aware partial materialization solution to significantly reduce the operation and space overheads. Extensive experiments on real system implementations demonstrate the efficacy and efficiency of the proposed scheme (© 2014 IEEE. Reprinted, with permission, from Ref. [1]).
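To make the cost argument concrete, the following is a minimal sketch of a generic data cube in the sense of Ref. [31], contrasting full materialization (every group-by combination is precomputed) with partial materialization (only some are). The records, dimension names, and the hand-picked set of groupings are hypothetical; in particular, the sketch does not reproduce ANTELOPE's semantic-aware selection, which the abstract does not detail.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (region, service, hour, requests).
DIMS = ("region", "service", "hour")
ROWS = [
    ("eu", "search", 9, 120),
    ("eu", "mail", 9, 30),
    ("us", "search", 9, 200),
    ("us", "search", 10, 50),
]


def cuboid(rows, dims, group_by):
    """Sum the measure (last column) grouped by the chosen dimensions."""
    idx = [dims.index(d) for d in group_by]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


def all_groupings(dims):
    """Every subset of dimensions: 2**len(dims) cuboids in a full cube."""
    return [g for k in range(len(dims) + 1) for g in combinations(dims, k)]


def materialize(rows, dims, groupings):
    """Precompute one cuboid per requested grouping."""
    return {g: cuboid(rows, dims, g) for g in groupings}


def answer(cube, rows, dims, group_by):
    """Serve from a precomputed cuboid if present, else scan the raw rows."""
    if group_by in cube:
        return cube[group_by]
    return cuboid(rows, dims, group_by)


# Full materialization: all 2**3 = 8 cuboids.
full = materialize(ROWS, DIMS, all_groupings(DIMS))

# Partial materialization: only groupings assumed to be queried often
# (hand-picked here, not chosen by any semantic analysis).
partial = materialize(ROWS, DIMS, [("region",), ("region", "service")])
```

With d dimensions the full cube holds 2^d cuboids, so its space and update costs grow exponentially in d, whereas the partial cube trades some query-time scanning (the fallback in `answer`) for a much smaller precomputed footprint.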

References

  1. Y. Hua, X. Liu, H. Jiang, ANTELOPE: a semantic-aware data cube scheme for cloud data center networks. IEEE Trans. Comput. (TC) 63(9), 2146–2159 (2014)

  2. IDC iView, The Digital Universe Decade - Are You Ready?, May 2010

  3. A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. Sen Sarma, R. Murthy, H. Liu, Data warehousing and analytics infrastructure at Facebook, in Proceedings of the SIGMOD (2010), pp. 1013–1020

  4. Science Staff, Dealing with data - challenges and opportunities. Science 331(6018), 692–693 (2011)

  5. J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

  6. Hadoop, http://hadoop.apache.org/

  7. M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, Dryad: distributed data-parallel programs from sequential building blocks, in Proceedings of the ACM SIGOPS/EuroSys (2007), pp. 59–72

  8. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig latin: a not-so-foreign language for data processing, in Proceedings of the ACM SIGMOD (2008), pp. 1099–1110

  9. A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, R. Murthy, Hive - a petabyte scale data warehouse using Hadoop, in Proceedings of the ICDE (2010)

  10. G. Bell, T. Hey, A. Szalay, Beyond the data deluge. Science 323(5919), 1297–1298 (2009)

  11. J. Dai, J. Huang, S. Huang, B. Huang, Y. Liu, HiTune: dataflow-based performance analysis for big data cloud, in Proceedings of the USENIX Annual Technical Conference (2011)

  12. R. Katz, Tech titans building boom. IEEE Spectr. 46(2), 40–54 (2009)

  13. A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, Cutting the electric bill for internet-scale systems. ACM SIGCOMM Comput. Commun. Rev. 39(4), 123–134 (2009)

  14. J. Dean, Evolution and future directions of large-scale storage and computation systems at Google, in Keynote in ACM Symposium on Cloud Computing (ACM SOCC) (2010)

  15. J. Sobel, Building Facebook: performance at massive scale, in Keynote in ACM Symposium on Cloud Computing (ACM SOCC) (2010)

  16. D. Kossmann, How new is the cloud?, in Keynotes in ICDE (2010)

  17. C. Lu, G. Alvarez, J. Wilkes, Aqueduct: online data migration with performance guarantees, in Proceedings of the FAST (2002), pp. 219–230

  18. C. Pu, A. Leff, Replica control in distributed systems: an asynchronous approach. ACM SIGMOD Rec. 20(2), 377–386 (1991)

  19. N. Mysore et al., PortLand: a scalable fault-tolerant layer 2 data center network fabric, in Proceedings of the ACM SIGCOMM (2009)

  20. D. Li, C. Guo, H. Wu, K. Tan, Y. Zhang, S. Lu, FiConn: using backup port for server interconnection in data centers, in Proceedings of the IEEE INFOCOM (2009)

  21. A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, S. Sengupta, VL2: a scalable and flexible data center network, in Proceedings of the ACM SIGCOMM (2009)

  22. C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, S. Lu, DCell: a scalable and fault-tolerant network structure for data centers, in Proceedings of the ACM SIGCOMM (2008)

  23. C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, S. Lu, BCube: a high performance, server-centric network architecture for modular data centers, in Proceedings of the ACM SIGCOMM (2009)

  24. J. Mudigonda, P. Yalagandula, M. Al-Fares, J. Mogul, SPAIN: COTS data-center ethernet for multipathing over arbitrary topologies, in Proceedings of the USENIX NSDI (2010)

  25. M. Al-Fares, A. Loukissas, A. Vahdat, A scalable, commodity data center network architecture, in Proceedings of the ACM SIGCOMM 2008 (2008)

  26. A. Shieh, S. Kandula, A. Greenberg, C. Kim, B. Saha, Sharing the data center network, in Proceedings of the USENIX NSDI (2011)

  27. K. Chen, C. Guo, H. Wu, J. Yuan, Z. Feng, Y. Chen, S. Lu, W. Wu, Generic and automatic address configuration for data center networks, in Proceedings of the ACM SIGCOMM (2010)

  28. A. Viswanathan, A. Hussain, J. Mirkovic, S. Schwab, J. Wroclawski, A semantic framework for data analysis in networked systems, in Proceedings of the USENIX NSDI (2011)

  29. S. Ghemawat, H. Gobioff, S. Leung, The Google file system. ACM SIGOPS Oper. Syst. Rev. 37(5), 43 (2003)

  30. F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, R. Gruber, Bigtable: a distributed storage system for structured data, in Proceedings of the OSDI (2006)

  31. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, H. Pirahesh, Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)

  32. J. Hamilton, Internet scale storage, in Keynote in SIGMOD (2011)

  33. J. Larus, The cloud will change everything, in Keynote in ASPLOS (2011)

  34. R. Weber, H. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in Proceedings of the VLDB (1998), pp. 194–205

  35. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the ACM Symposium on Theory of Computing (1998), pp. 604–613

  36. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)

  37. Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of the International Conference on Parallel Processing (ICPP) (2008), pp. 644–651

  38. Los Alamos National Lab (LANL) File System Data, http://institute.lanl.gov/data/archive-data/

  39. E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002)

  40. S. Kavalanekar, B. Worthington, Q. Zhang, V. Sharda, Characterization of storage workload traces from production Windows servers, in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC) (2008)

  41. J.L. Hellerstein, Google Cluster Data, http://googleresearch.blogspot.com/2010/01/google-cluster-data.html, Jan 2010

  42. B. Babcock, S. Chaudhuri, G. Das, Dynamic sample selection for approximate query processing, in Proceedings of the ACM SIGMOD (2003)

  43. R. Missaoui, C. Goutte, A. Choupo, A. Boujenoui, A probabilistic model for data cube compression and query approximation, in Proceedings of the ACM Data Warehousing and OLAP (2007), pp. 33–40

  44. J. Shanmugasundaram, U. Fayyad, P. Bradley, Compressed data cubes for OLAP aggregate query approximation on continuous dimensions, in Proceedings of the ACM SIGKDD (1999), pp. 223–232

  45. D. Barbara, X. Wu, Loglinear-based quasi cubes. J. Intell. Inf. Syst. 16(3), 255–276 (2001)

  46. T. Wu, D. Xin, J. Han, ARCube: supporting ranking aggregate queries in partially materialized data cubes, in Proceedings of the ACM SIGMOD (2008), pp. 79–92

  47. D. Xin, J. Han, H. Cheng, X. Li, Answering top-k queries with multi-dimensional selections: the ranking cube approach, in Proceedings of the VLDB (2006), pp. 463–474

  48. M. Riedewald, D. Agrawal, A. El Abbadi, pCube: update-efficient online aggregation with progressive feedback and error bounds, in Proceedings of the SSDBM (2000), pp. 95–108

  49. W. Lu, J. Yu, Condensed cube: an effective approach to reducing data cube size, in Proceedings of the ICDE (2002), pp. 155–165

  50. Y. Feng, D. Agrawal, A. El Abbadi, A. Metwally, Range cube: efficient cube computation by exploiting data correlation, in Proceedings of the ICDE (2004), pp. 658–669

  51. X. Jin, J. Han, L. Cao, J. Luo, B. Ding, C. Lin, Visual cube and on-line analytical processing of images, in Proceedings of the 19th ACM International Conference on Information and Knowledge Management (2010), pp. 849–858

  52. P. Zhao, X. Li, D. Xin, J. Han, Graph cube: on warehousing and OLAP multidimensional networks, in Proceedings of the SIGMOD (2011), pp. 853–864

  53. B. Ding, B. Zhao, C. Lin, J. Han, C. Zhai, Topcells: keyword-based search of top-k aggregated documents in text cube, in Proceedings of the ICDE (2010), pp. 381–384

  54. Y. Yu, C. Lin, Y. Sun, C. Chen, J. Han, B. Liao, T. Wu, C. Zhai, D. Zhang, B. Zhao, iNextCube: information network-enhanced text cube, in Proceedings of the VLDB (2009)

  55. B. Bi, S. Lee, B. Kao, R. Cheng, CubeLSI: an effective and efficient method for searching resources in social tagging systems, in Proceedings of the ICDE (2011), pp. 27–38

  56. M. Liu, E. Rundensteiner, K. Greenfield, C. Gupta, S. Wang, I. Ari, A. Mehta, E-cube: multi-dimensional event sequence processing using concept and pattern hierarchies, in Proceedings of the ICDE (2010), pp. 1097–1100

  57. J. Lee, S. Hwang, Z. Nie, J. Wen, Product entitycube: a recommendation and navigation system for product search, in Demonstrations in ICDE (2010)

  58. G. Salton, A. Wong, C. Yang, A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)

  59. J. Hartigan, M. Wong, Algorithm AS 136: a K-means clustering algorithm. Appl. Stat. 28(1), 100–108 (1979)

  60. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)

  61. M.W. Berry, S. Dumais, G. O’Brien, Using linear algebra for intelligent information retrieval. SIAM Rev. 37, 573–595 (1995)

  62. C. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)

  63. G. Golub, C. Van Loan, Matrix Computations (Johns Hopkins University Press, 1996)

  64. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, Semantic-aware metadata organization paradigm in next-generation file systems. IEEE Trans. Parallel Distrib. Syst. 23(2), 337–344 (2012)

  65. C. Tang, S. Dwarkadas, Z. Xu, On scaling latent semantic indexing for large peer-to-peer systems, in Proceedings of the ACM SIGIR (2004), pp. 112–121

  66. S. Lee, S. Chun, D. Kim, J. Lee, C. Chung, Similarity search for multidimensional data sequences, in Proceedings of the ICDE (2000)

  67. A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of the ACM SIGMOD (1984), pp. 47–57

  68. G. Salton, A. Wong, C. Yang, A vector space model for information retrieval. J. Am. Soc. Inf. Retr. 613–620, (1975)

  69. S. Weil, S.A. Brandt, E.L. Miller, D.D.E. Long, C. Maltzahn, Ceph: a scalable, high-performance distributed file system, in Proceedings of the OSDI (2006)

  70. C. Tang, Z. Xu, S. Dwarkadas, Peer-to-peer information retrieval using self-organizing semantic overlay networks, in Proceedings of the SIGCOMM (2003)

  71. Z. Xu, C. Tang, Z. Zhang, Building topology-aware overlays using global soft-state, in Proceedings of the ICDCS (2003)

  72. C. Buckley, Implementation of the smart information retrieval system. Technical Report, Cornell University (1985)

  73. M.W. Berry, Large-scale sparse singular value computations. Int. J. Supercomput. Appl. 6(1), 13–49 (1992)

  74. G.H. Golub, C. Reinsch, Singular value decomposition and least squares solutions. Numer. Math. 14(5), 403–420 (1970)

  75. L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)

  76. H. Wu, G. Lu, D. Li, C. Guo, Y. Zhang, MDCube: a high performance network structure for modular data center interconnection, in Proceedings of the CoNEXT (2009), pp. 25–36

  77. M. Casado, D. Erickson, I.A. Ganichev, R. Griffith, B. Heller, N. Mckeown, D. Moon, T. Koponen, S. Shenker, K. Zarifis, Ripcord: a modular platform for data center networking. Technical Report No. UCB/EECS-2010-93, EECS Department, University of California, Berkeley (2010)

  78. D. Li, M. Xu, H. Zhao, X. Fu, Building mega data center from heterogeneous containers, in Proceedings of the IEEE ICNP (2011)

  79. D. Li, H. Cui, Y. Hu, Y. Xia, X. Wang, Scalable data center multicast using multi-class Bloom filter, in Proceedings of the IEEE ICNP (2011)

  80. J. Mudigonda, P. Yalagandula, J.C. Mogul, Taming the flying cable monster: A topology design and optimization framework for data-center networks, in Proceedings of the USENIX Annual Technical Conference (2011)


Corresponding author

Correspondence to Yu Hua.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). Semantic-Aware Data Cube for Cloud Networks. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_8

  • DOI: https://doi.org/10.1007/978-981-13-2721-6_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2720-9

  • Online ISBN: 978-981-13-2721-6

  • eBook Packages: Computer Science, Computer Science (R0)
