Using Memory in the Right Way to Accelerate Big Data Processing

Yan, Dong; Yin, Xu-Sen; Lian, Cheng; Zhong, Xiang; Zhou, Xin; Wu, Gan-Sha

doi:10.1007/s11390-015-1502-9

Regular Paper
Published: 25 January 2015

Volume 30, pages 30–41, (2015)

Journal of Computer Science and Technology

Dong Yan¹,
Xu-Sen Yin²,
Cheng Lian²,
Xiang Zhong²,
Xin Zhou² &
Gan-Sha Wu²

Abstract

Big data processing is becoming a major part of data center computation. However, recent research indicates that big data workloads cannot make full use of modern memory systems. We find that the dramatic inefficiency of big data processing stems from an enormous number of cache misses and from stalls caused by dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is the slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second is direct-memory-access, which reforms the data structure used in key/value storage. These optimizations are evaluated with both micro-benchmarks and the real-world benchmark HiBench. The micro-benchmark results clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the HiBench results show a 1.21X average speedup at the application level. Both results illustrate that careful hardware/software co-design will improve the memory efficiency of big data processing. Our work has already been integrated into the Intel distribution for Apache Hadoop.
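
The abstract names the slice-and-merge strategy without giving its details, so the following is only a minimal sketch of the general idea as we read it: sort small slices that could each fit in cache, then merge the sorted slices. It is not the authors' implementation. The function name, the `Record` layout, and the `slice_size` parameter are assumptions made for illustration, and pure Python will not reproduce the cache behaviour measured in the paper.

```python
import heapq
from typing import List, Tuple

# Hypothetical record layout: a (key, value) pair of byte strings.
Record = Tuple[bytes, bytes]


def slice_and_merge_sort(records: List[Record], slice_size: int = 4096) -> List[Record]:
    """Sort records by key by sorting fixed-size slices, then merging them.

    slice_size is an assumed tuning knob: in a native implementation it
    would be chosen so that one slice fits in the target cache level.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")

    # Step 1: sort each slice independently, so each sort touches
    # only a small, contiguous working set.
    slices = [
        sorted(records[i:i + slice_size], key=lambda r: r[0])
        for i in range(0, len(records), slice_size)
    ]

    # Step 2: k-way merge of the sorted slices; each slice is read
    # sequentially from front to back.
    return list(heapq.merge(*slices, key=lambda r: r[0]))
```

A caller would pass the unsorted key/value records and a slice size, for example `slice_and_merge_sort(records, slice_size=1024)`. The right slice size depends on record size and cache capacity, neither of which the abstract specifies.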

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Dong Yan
Intel Labs China, Beijing, 100091, China
Xu-Sen Yin, Cheng Lian, Xiang Zhong, Xin Zhou & Gan-Sha Wu

Corresponding author

Correspondence to Dong Yan.

About this article

Cite this article

Yan, D., Yin, XS., Lian, C. et al. Using Memory in the Right Way to Accelerate Big Data Processing. J. Comput. Sci. Technol. 30, 30–41 (2015). https://doi.org/10.1007/s11390-015-1502-9

Received: 14 July 2014
Revised: 16 December 2014
Published: 25 January 2015
Issue Date: January 2015
DOI: https://doi.org/10.1007/s11390-015-1502-9
