Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1):107–113.
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein J M. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1006.4990, 2010. http://arxiv.org/abs/1006.4990, Dec. 2014.
Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin M, Shenker S, Stoica I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. the 9th USENIX Conference on Networked Systems Design and Implementation, April 2012, pp.15–28.
Shafer J, Rixner S, Cox A L. The Hadoop distributed filesystem: Balancing portability and performance. In Proc. IEEE International Symposium on Performance Analysis of Systems and Software, March 2010, pp.122-133.
Wang Y, Xu C, Li X, Yu W. JVM-bypass for efficient Hadoop shuffling. In Proc. the 27th IEEE International Symposium on Parallel and Distributed Processing, May 2013, pp.569–578.
Hardavellas N, Ferdman M, Falsafi B, Ailamaki A. Toward dark silicon in servers. IEEE Micro, 2011, 31(4): 6–15.
Horowitz M, Alon E, Patil D, Naffziger S, Kumar R, Bernstein K. Scaling, power, and the future of CMOS. In Proc. IEEE Int. Electron Devices Meeting, December 2005, pp.7–15.
Ferdman M, Adileh A, Kocberber O, Volos S, Alisafaee M, Jevdjic D, Kaynak C, Popescu A D, Ailamaki A, Falsafi B. Quantifying the mismatch between emerging scale-out applications and modern processors. ACM Transactions on Computer Systems, 2012, 30(4): Article No. 15.
Huang S, Huang J, Dai J, Xie T, Huang B. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In Proc. the 26th IEEE International Conference on Data Engineering Workshops, March 2010, pp.41–51.
Yang D, Zhong X, Yan D, Dai F, Yin X, Lian C, Zhu Z, Jiang W, Wu G. NativeTask: A Hadoop compatible framework for high performance. In Proc. IEEE International Conference on Big Data, October 2013, pp.94–101.
Chen R, Chen H, Zang B. Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling. In Proc. the 19th International Conference on Parallel Architectures and Compilation Techniques, September 2010, pp.523-534.
Brodal G S, Fagerberg R, Vinther K. Engineering a cacheoblivious sorting algorithm. Journal of Experimental Algorithmics, 2008, 12(2): Article No. 22.
Levinthal D. Cycle accounting analysis on Intel® Core™ 2 processors. Intel Corp., 2012. https://software.intel.com/sites/products/collateral/hpc/vtune/cycle accounting analysis.pdf, Dec. 2014.
Levinthal D. Performance analysis guide for Intel® Core™i7 processor and Intel® XeonTM 5500 processors. Intel Performance Analysis Guide, 2009. https://software.intel.com/sites/products/collateral/hpc/vtune/performanceanalysis guide.pdf, Dec. 2014.
Ranger C, Raghuraman R, Penmetsa A, Bradski G, Kozyrakis C. Evaluating MapReduce for multi-core and multiprocessor systems. In Proc. the 13th IEEE International Symposium on High Performance Computer Architecture, February 2007, pp.13–24.
Yoo R M, Romano A, Kozyrakis C. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system. In Proc. IEEE International Symposium on Workload Characterization, October 2009, pp.198–207.
He B, Fang W, Luo Q, Govindaraju N K, Wang T. Mars: A MapReduce framework on graphics processors. In Proc. the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008, pp.260–269.
Hong C, Chen D, Chen W, Zheng W, Lin H. MapCG: Writing parallel program portable between CPU and GPU. In Proc. the 19th International Conference on Parallel Architectures and Compilation Techniques, September 2010, pp.217–226.