Abstract
Today’s cloud data centers contain millions of servers and offer high bandwidth. A fundamental problem is how to scale such systems so that they interconnect a large number of servers while supporting various online services in cloud computing. One approach is to address the potential mismatch between the network architecture and the data placement. To address this challenge, we present ANTELOPE, a scalable distributed data-centric scheme for cloud data centers, in which we systematically take into account both the properties of the network architecture and the optimization of data placement. The basic idea behind ANTELOPE is to leverage a precomputation-based data cube to support online cloud services. Since constructing a data cube suffers from the high cost of full materialization, we use a semantic-aware partial materialization solution to significantly reduce the operation and space overheads. Extensive experiments on real system implementations demonstrate the efficacy and efficiency of the proposed scheme (© 2014 IEEE. Reprinted, with permission, from Ref. [1]).
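To make the trade-off between full and partial materialization concrete, the following is a minimal generic sketch of a data cube in Python. It is not ANTELOPE's algorithm: the toy fact table, the dimension names (server, service, hour), the "bytes" measure, and the helper functions are all hypothetical, and the choice of which cuboids to keep is hard-coded here rather than made by any semantic-aware analysis.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact table: three dimensions and one measure ("bytes").
DIMS = ("server", "service", "hour")
ROWS = [
    {"server": "s1", "service": "web", "hour": 9, "bytes": 10},
    {"server": "s1", "service": "db", "hour": 9, "bytes": 5},
    {"server": "s2", "service": "web", "hour": 9, "bytes": 7},
    {"server": "s2", "service": "web", "hour": 10, "bytes": 3},
]


def aggregate(rows, dims, measure="bytes"):
    """Group rows by the given dimensions and sum the measure (one cuboid)."""
    cuboid = defaultdict(int)
    for row in rows:
        cuboid[tuple(row[d] for d in dims)] += row[measure]
    return dict(cuboid)


def full_cube(rows):
    """Full materialization: one cuboid per subset of DIMS (2^n cuboids)."""
    return {
        dims: aggregate(rows, dims)
        for k in range(len(DIMS) + 1)
        for dims in combinations(DIMS, k)
    }


def roll_up(cuboid, have, want):
    """Derive the cuboid over `want` from a finer cuboid over `have`."""
    idx = [have.index(d) for d in want]
    out = defaultdict(int)
    for key, total in cuboid.items():
        out[tuple(key[i] for i in idx)] += total
    return dict(out)


def query(materialized, want):
    """Answer a group-by from the smallest materialized cuboid covering it."""
    covering = [dims for dims in materialized if set(want) <= set(dims)]
    best = min(covering, key=lambda dims: len(materialized[dims]))
    return roll_up(materialized[best], best, want)


# Partial materialization: keep only the base cuboid and one coarser view.
partial = {dims: aggregate(ROWS, dims) for dims in (DIMS, ("server", "service"))}
```

With three dimensions the full cube stores 8 cuboids, whereas `partial` stores 2 and derives the rest at query time by rolling up. That is the space-versus-query-cost trade-off that partial materialization targets.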
References
Y. Hua, X. Liu, H. Jiang, ANTELOPE: a semantic-aware data cube scheme for cloud data center networks. IEEE Trans. Comput. (TC) 63(9), 2146–2159 (2014)
IDC iView, The Digital Universe Decade - Are You Ready?, May 2010
A. Thusoo, Z. Shao, S. Anthony, D. Borthakur, N. Jain, J. Sen Sarma, R. Murthy, H. Liu, Data warehousing and analytics infrastructure at Facebook, in Proceedings of the SIGMOD (2010), pp. 1013–1020
Science Staff, Dealing with data - challenges and opportunities. Science 331(6018), 692–693 (2011)
J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Hadoop, http://hadoop.apache.org/
M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, Dryad: distributed data-parallel programs from sequential building blocks, in Proceedings of the ACM SIGOPS/EuroSys (2007), pp. 59–72
C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig Latin: a not-so-foreign language for data processing, in Proceedings of the ACM SIGMOD (2008), pp. 1099–1110
A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, R. Murthy, Hive - a petabyte scale data warehouse using Hadoop, in Proceedings of the ICDE (2010)
G. Bell, T. Hey, A. Szalay, Beyond the data deluge. Science 323(5919), 1297–1298 (2009)
J. Dai, J. Huang, S. Huang, B. Huang, Y. Liu, HiTune: dataflow-based performance analysis for big data cloud, in Proceedings of the USENIX Annual Technical Conference (2011)
R. Katz, Tech titans building boom. IEEE Spectr. 46(2), 40–54 (2009)
A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, Cutting the electric bill for internet-scale systems. ACM SIGCOMM Comput. Commun. Rev. 39(4), 123–134 (2009)
J. Dean, Evolution and future directions of large-scale storage and computation systems at Google, in Keynote in ACM Symposium on Cloud Computing (ACM SOCC) (2010)
J. Sobel, Building Facebook: performance at massive scale, in Keynote in ACM Symposium on Cloud Computing (ACM SOCC) (2010)
D. Kossmann, How new is the cloud?, in Keynotes in ICDE (2010)
C. Lu, G. Alvarez, J. Wilkes, Aqueduct: online data migration with performance guarantees, in Proceedings of the FAST (2002), pp. 219–230
C. Pu, A. Leff, Replica control in distributed systems: an asynchronous approach. ACM SIGMOD Rec. 20(2), 377–386 (1991)
N. Mysore et al., PortLand: a scalable fault-tolerant layer 2 data center network fabric, in Proceedings of the ACM SIGCOMM (2009)
D. Li, C. Guo, H. Wu, K. Tan, Y. Zhang, S. Lu, FiConn: using backup port for server interconnection in data centers, in Proceedings of the IEEE INFOCOM (2009)
A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, S. Sengupta, VL2: a scalable and flexible data center network, in Proceedings of the ACM SIGCOMM (2009)
C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, S. Lu, DCell: a scalable and fault-tolerant network structure for data centers, in Proceedings of the ACM SIGCOMM (2008)
C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, S. Lu, BCube: a high performance, server-centric network architecture for modular data centers, in Proceedings of the ACM SIGCOMM (2009)
J. Mudigonda, P. Yalagandula, M. Al-Fares, J. Mogul, SPAIN: COTS data-center ethernet for multipathing over arbitrary topologies, in Proceedings of the USENIX NSDI (2010)
M. Al-Fares, A. Loukissas, A. Vahdat, A scalable, commodity data center network architecture, in Proceedings of the ACM SIGCOMM (2008)
A. Shieh, S. Kandula, A. Greenberg, C. Kim, B. Saha, Sharing the data center network, in Proceedings of the USENIX NSDI (2011)
K. Chen, C. Guo, H. Wu, J. Yuan, Z. Feng, Y. Chen, S. Lu, W. Wu, Generic and automatic address configuration for data center networks, in Proceedings of the ACM SIGCOMM (2010)
A. Viswanathan, A. Hussain, J. Mirkovic, S. Schwab, J. Wroclawski, A semantic framework for data analysis in networked systems, in Proceedings of the USENIX NSDI (2011)
S. Ghemawat, H. Gobioff, S. Leung, The Google file system. ACM SIGOPS Oper. Syst. Rev. 37(5), 43 (2003)
F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, R. Gruber, Bigtable: a distributed storage system for structured data, in Proceedings of the OSDI (2006)
J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, H. Pirahesh, Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)
J. Hamilton, Internet scale storage, in Keynote in SIGMOD (2011)
J. Larus, The cloud will change everything, in Keynote in ASPLOS (2011)
R. Weber, H. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in Proceedings of the VLDB (1998), pp. 194–205
P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the ACM Symposium on Theory of Computing (1998), pp. 604–613
A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of the International Conference on Parallel Processing (ICPP) (2008), pp. 644–651
Los Alamos National Lab (LANL) File System Data, http://institute.lanl.gov/data/archive-data/
E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002)
S. Kavalanekar, B. Worthington, Q. Zhang, V. Sharda, Characterization of storage workload traces from production Windows servers, in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC) (2008)
J.L. Hellerstein, Google Cluster Data, http://googleresearch.blogspot.com/2010/01/google-cluster-data.html, Jan 2010
B. Babcock, S. Chaudhuri, G. Das, Dynamic sample selection for approximate query processing, in Proceedings of the ACM SIGMOD (2003)
R. Missaoui, C. Goutte, A. Choupo, A. Boujenoui, A probabilistic model for data cube compression and query approximation, in Proceedings of the ACM Data Warehousing and OLAP (2007), pp. 33–40
J. Shanmugasundaram, U. Fayyad, P. Bradley, Compressed data cubes for OLAP aggregate query approximation on continuous dimensions, in Proceedings of the ACM SIGKDD (1999), pp. 223–232
D. Barbara, X. Wu, Loglinear-based quasi cubes. J. Intell. Inf. Syst. 16(3), 255–276 (2001)
T. Wu, D. Xin, J. Han, ARCube: supporting ranking aggregate queries in partially materialized data cubes, in Proceedings of the ACM SIGMOD (2008), pp. 79–92
D. Xin, J. Han, H. Cheng, X. Li, Answering top-k queries with multi-dimensional selections: the ranking cube approach, in Proceedings of the VLDB (2006), pp. 463–474
M. Riedewald, D. Agrawal, A. El Abbadi, pCube: update-efficient online aggregation with progressive feedback and error bounds, in Proceedings of the SSDBM (2000), pp. 95–108
W. Lu, J. Yu, Condensed cube: an effective approach to reducing data cube size, in Proceedings of the ICDE (2002), pp. 155–165
Y. Feng, D. Agrawal, A. El Abbadi, A. Metwally, Range cube: efficient cube computation by exploiting data correlation, in Proceedings of the ICDE (2004), pp. 658–669
X. Jin, J. Han, L. Cao, J. Luo, B. Ding, C. Lin, Visual cube and on-line analytical processing of images, in Proceedings of the 19th ACM International Conference on Information and Knowledge Management (2010), pp. 849–858
P. Zhao, X. Li, D. Xin, J. Han, Graph cube: on warehousing and OLAP multidimensional networks, in Proceedings of the SIGMOD (2011), pp. 853–864
B. Ding, B. Zhao, C. Lin, J. Han, C. Zhai, Topcells: keyword-based search of top-k aggregated documents in text cube, in Proceedings of the ICDE (2010), pp. 381–384
Y. Yu, C. Lin, Y. Sun, C. Chen, J. Han, B. Liao, T. Wu, C. Zhai, D. Zhang, B. Zhao, iNextCube: information network-enhanced text cube, in Proceedings of the VLDB (2009)
B. Bi, S. Lee, B. Kao, R. Cheng, CubeLSI: an effective and efficient method for searching resources in social tagging systems, in Proceedings of the ICDE (2011), pp. 27–38
M. Liu, E. Rundensteiner, K. Greenfield, C. Gupta, S. Wang, I. Ari, A. Mehta, E-cube: multi-dimensional event sequence processing using concept and pattern hierarchies, in Proceedings of the ICDE (2010), pp. 1097–1100
J. Lee, S. Hwang, Z. Nie, J. Wen, Product entitycube: a recommendation and navigation system for product search, in Demonstrations in ICDE (2010)
G. Salton, A. Wong, C. Yang, A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
J. Hartigan, M. Wong, Algorithm AS 136: a K-means clustering algorithm. Appl. Stat. 28(1), 100–108 (1979)
S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)
M.W. Berry, S. Dumais, G. O'Brien, Using linear algebra for intelligent information retrieval. SIAM Rev. 37, 573–595 (1995)
C. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)
G. Golub, C. Van Loan, Matrix Computations (Johns Hopkins University Press, 1996)
Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, Semantic-aware metadata organization paradigm in next-generation file systems. IEEE Trans. Parallel Distrib. Syst. 23(2), 337–344 (2012)
C. Tang, S. Dwarkadas, Z. Xu, On scaling latent semantic indexing for large peer-to-peer systems, in Proceedings of the ACM SIGIR (2004), pp. 112–121
S. Lee, S. Chun, D. Kim, J. Lee, C. Chung, Similarity search for multidimensional data sequences, in Proceedings of the ICDE (2000)
A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of the ACM SIGMOD (1984), pp. 47–57
G. Salton, A. Wong, C. Yang, A vector space model for information retrieval. J. Am. Soc. Inf. Retr. 613–620 (1975)
S. Weil, S.A. Brandt, E.L. Miller, D.D.E. Long, C. Maltzahn, Ceph: a scalable, high-performance distributed file system, in Proceedings of the OSDI (2006)
C. Tang, Z. Xu, S. Dwarkadas, Peer-to-peer information retrieval using self-organizing semantic overlay networks, in Proceedings of the SIGCOMM (2003)
Z. Xu, C. Tang, Z. Zhang, Building topology-aware overlays using global soft-state, in Proceedings of the ICDCS (2003)
C. Buckley, Implementation of the smart information retrieval system. Technical Report, Cornell University (1985)
M.W. Berry, Large-scale sparse singular value computations. Int. J. Supercomput. Appl. 6(1), 13–49 (1992)
G.H. Golub, C. Reinsch, Singular value decomposition and least squares solutions. Numer. Math. 14(5), 403–420 (1970)
L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
H. Wu, G. Lu, D. Li, C. Guo, Y. Zhang, MDCube: a high performance network structure for modular data center interconnection, in Proceedings of the CoNEXT (2009), pp. 25–36
M. Casado, D. Erickson, I.A. Ganichev, R. Griffith, B. Heller, N. McKeown, D. Moon, T. Koponen, S. Shenker, K. Zarifis, Ripcord: a modular platform for data center networking. Technical Report No. UCB/EECS-2010-93, EECS Department, University of California, Berkeley (2010)
D. Li, M. Xu, H. Zhao, X. Fu, Building mega data center from heterogeneous containers, in Proceedings of the IEEE ICNP (2011)
D. Li, H. Cui, Y. Hu, Y. Xia, X. Wang, Scalable data center multicast using multi-class Bloom filter, in Proceedings of the IEEE ICNP (2011)
J. Mudigonda, P. Yalagandula, J.C. Mogul, Taming the flying cable monster: A topology design and optimization framework for data-center networks, in Proceedings of the USENIX Annual Technical Conference (2011)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Hua, Y., Liu, X. (2019). Semantic-Aware Data Cube for Cloud Networks. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_8
DOI: https://doi.org/10.1007/978-981-13-2721-6_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2720-9
Online ISBN: 978-981-13-2721-6
eBook Packages: Computer Science (R0)