Computing infrastructure for big data processing

Abstract

With computing systems having undergone a fundamental transformation, from the single-processor devices of the turn of the century to today's ubiquitous networked devices and warehouse-scale computing via the cloud, parallelism has become ubiquitous at many levels. At the micro level, parallelism is being explored from the underlying circuits, through pipelining and instruction-level parallelism, to multiple or many cores on a chip and in a machine. At the macro level, parallelism is being promoted from multiple machines on a rack and many racks in a data center to the globally shared infrastructure of the Internet. With the push of big data, we are entering a new era of parallel computing driven by novel and groundbreaking research innovation on elastic parallelism and scalability. In this paper, we give an overview of computing infrastructure for big data processing, focusing on the architectural, storage, and networking challenges of supporting such processing. We briefly discuss emerging computing infrastructure and technologies that are promising for improving data parallelism and task parallelism and for encouraging vertical and horizontal computation parallelism.

Author information

Corresponding author

Correspondence to Ling Liu.

Additional information

Ling LIU is a full professor in the School of Computer Science at Georgia Institute of Technology. She directs the research programs in the Distributed Data Intensive Systems Lab (DiSL). Prof. Liu’s research interests are in the areas of cloud computing, big data and big data analytics, distributed computing, and Internet services, with a focus on performance, availability, fault tolerance, security, and privacy. She has published over 300 international journal and conference papers. She is a recipient of the 2012 IEEE Computer Society Technical Achievement Award and of an Outstanding Doctoral Thesis Advisor Award in 2012 from Georgia Institute of Technology. She is a co-Editor-in-Chief of the five-volume Encyclopedia of Database Systems (Springer 2010), the Editor-in-Chief of IEEE Transactions on Services Computing, and is on the editorial board of over a dozen international journals. Her research is primarily supported by grants from the U.S. National Science Foundation (NSF) and industrial companies such as IBM and Intel.

About this article

Cite this article

Liu, L. Computing infrastructure for big data processing. Front. Comput. Sci. 7, 165–170 (2013). https://doi.org/10.1007/s11704-013-3900-x

Keywords

  • big data
  • cloud computing
  • data analytics
  • elastic scalability
  • heterogeneous computing
  • GPU
  • PCM
  • big data processing