Abstract
A big data stream processing (BDSP) application may comprise many processing units deployed as VMs. These VMs are tightly coupled by the data streams, since the output of one may be the input of another. Consequently, networking has a deeper impact on the performance and efficiency of BDSP than it has on batch data processing. In addition, virtualized network functions (VNFs), also deployed as VMs, can be added to stream processing. For example, all data streams may be required to pass through a deep packet inspection (DPI) VM before actual processing. How to manage these VMs and the communication between them in data centers is critical to cost-efficient BDSP.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Zeng, D., Gu, L., Guo, S. (2015). A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers. In: Cloud Networking for Big Data. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-24720-5_5
DOI: https://doi.org/10.1007/978-3-319-24720-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24718-2
Online ISBN: 978-3-319-24720-5
eBook Packages: Computer Science (R0)