A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers

Chapter in: Cloud Networking for Big Data

Part of the book series: Wireless Networks (WN)

Abstract

For a big data stream processing (BDSP) application, there may be many different processing units in the form of VMs. These VMs are tightly coupled through the data streams, since the output of one VM may be the input of another. Consequently, networking has a deeper impact on the performance and efficiency of BDSP than on batch data processing. In addition, virtualized network functions (VNFs), also deployed as VMs, can be inserted into stream processing. For example, we may require every data stream to first pass through a deep packet inspection (DPI) VM before the actual processing. How to manage these VMs, as well as the communication between them, in data centers is critical to the cost efficiency of BDSP.
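To make the cost structure described above concrete, here is a minimal Python sketch under explicitly hypothetical assumptions; it is not the chapter's formulation or algorithm. It assumes a linear chain of VMs (`source`, a `dpi` VNF, `parse`, `aggregate`, `sink`), made-up stream rates, three made-up data centers (`dc_a`, `dc_b`, `dc_c`) with per-unit inter-data-center transfer prices, a per-data-center VM capacity, pinned endpoints, and free intra-data-center traffic. Exhaustive search is used only because the toy instance is tiny; realistic instances need a proper optimization model.

```python
from itertools import product

# Hypothetical stream topology: each edge (u, v, rate) means VM u's output
# stream feeds VM v at `rate` units of data per second (made-up numbers).
# "dpi" is the VNF that every stream must traverse before processing.
EDGES = [
    ("source", "dpi", 10.0),
    ("dpi", "parse", 10.0),
    ("parse", "aggregate", 4.0),
    ("aggregate", "sink", 1.0),
]
VMS = ["source", "dpi", "parse", "aggregate", "sink"]

# Hypothetical per-unit transfer cost between data centers; traffic that
# stays inside one data center is assumed to be free.
DCS = ["dc_a", "dc_b", "dc_c"]
LINK_COST = {
    ("dc_a", "dc_b"): 1.0,
    ("dc_a", "dc_c"): 3.0,
    ("dc_b", "dc_c"): 2.0,
}
CAPACITY = {"dc_a": 2, "dc_b": 2, "dc_c": 2}  # max VMs hosted per data center
PINNED = {"source": "dc_a", "sink": "dc_c"}   # assumed fixed ingress/egress sites


def link_cost(a, b):
    """Symmetric per-unit cost of moving data from data center a to b."""
    if a == b:
        return 0.0
    return LINK_COST.get((a, b), LINK_COST.get((b, a)))


def communication_cost(placement):
    """Sum of rate * per-unit inter-DC cost over every stream edge."""
    return sum(rate * link_cost(placement[u], placement[v]) for u, v, rate in EDGES)


def feasible(placement):
    """Check pinned VMs and per-data-center capacity limits."""
    for vm, dc in PINNED.items():
        if placement[vm] != dc:
            return False
    for dc, cap in CAPACITY.items():
        if sum(1 for d in placement.values() if d == dc) > cap:
            return False
    return True


def best_placement():
    """Exhaustive search; only viable for toy instances (len(DCS) ** len(VMS) candidates)."""
    best, best_cost = None, float("inf")
    for choice in product(DCS, repeat=len(VMS)):
        placement = dict(zip(VMS, choice))
        if not feasible(placement):
            continue
        cost = communication_cost(placement)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


if __name__ == "__main__":
    placement, cost = best_placement()
    print(placement, cost)
```

With these made-up numbers, the search keeps `source` and `dpi` together in `dc_a`, co-locates `parse` and `aggregate` in `dc_b`, and pays only for the 10-unit `dpi`-to-`parse` stream and the 1-unit `aggregate`-to-`sink` stream, for a total of 12.0. The example also shows why an inserted VNF matters: the DPI VM sits on the full-rate path, so its placement affects communication cost as much as that of any processing VM.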

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Zeng, D., Gu, L., Guo, S. (2015). A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers. In: Cloud Networking for Big Data. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-24720-5_5

  • DOI: https://doi.org/10.1007/978-3-319-24720-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24718-2

  • Online ISBN: 978-3-319-24720-5

  • eBook Packages: Computer Science, Computer Science (R0)
