A Join Optimization Method for CPU/MIC Heterogeneous Systems

  • Kailai Zhou
  • Hong Chen
  • Hui Sun
  • Cuiping Li
  • Tianzhen Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9659)


In recent years, heterogeneous systems consisting of general-purpose CPUs and many-core coprocessors have become the main trend in high-performance computing because of their powerful parallel computing capabilities and superior energy efficiency. Join is one of the most important operations in database systems. To exploit the advantages of each type of hardware in a heterogeneous system, this paper focuses on optimizing the join algorithm for hybrid CPU/MIC systems. We design a join method in which the CPU and the MIC work collaboratively to carry out the join operation. To fully utilize the MIC’s parallel computing power, we also propose a Sort-Scatter-Join (SSJ) algorithm that runs on the MIC and generates the join index. By turning the traditional process of comparison and matching into a process of computing and scattering, SSJ benefits more from thread-level parallelism and SIMD data parallelism. Experimental results show that, compared with the traditional parallel sort-merge join algorithm, the peak performance of SSJ running on the MIC improves by around 26%.
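
For orientation, the baseline named in the abstract, a sort-merge join that produces a join index, might look roughly like the sketch below. It is a minimal single-threaded illustration and not the paper's SSJ algorithm, whose details the abstract does not give. The function name `sort_merge_join_index` and the representation of the join index as a list of `(r_pos, s_pos)` row-position pairs are assumptions made for this example.

```python
def sort_merge_join_index(r_keys, s_keys):
    """Return a join index: (i, j) pairs with r_keys[i] == s_keys[j].

    Hypothetical helper for illustration; single-threaded baseline only.
    """
    # Sort row positions by key so the original positions survive sorting.
    r_order = sorted(range(len(r_keys)), key=lambda i: r_keys[i])
    s_order = sorted(range(len(s_keys)), key=lambda j: s_keys[j])

    join_index = []
    i = j = 0
    while i < len(r_order) and j < len(s_order):
        rk = r_keys[r_order[i]]
        sk = s_keys[s_order[j]]
        if rk < sk:
            i += 1          # advance the side with the smaller key
        elif rk > sk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(r_order) and r_keys[r_order[i_end]] == rk:
                i_end += 1
            j_end = j
            while j_end < len(s_order) and s_keys[s_order[j_end]] == rk:
                j_end += 1
            # Emit the cross product of the two runs as position pairs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    join_index.append((r_order[a], s_order[b]))
            i, j = i_end, j_end
    return join_index


pairs = sort_merge_join_index([3, 1, 2, 3], [3, 4, 1])
```

The merge loop is dominated by data-dependent comparisons and branches, which is the "comparison and matching" step that the abstract says SSJ replaces with computing and scattering in order to benefit from thread-level and SIMD parallelism.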


Join Optimization CPU-MIC Sort-merge join 



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Kailai Zhou (1, 2)
  • Hong Chen (1), corresponding author
  • Hui Sun (1)
  • Cuiping Li (1)
  • Tianzhen Wu (1)
  1. Key Lab of Data Engineering and Knowledge Engineering of MOE, and School of Information, Renmin University of China, Beijing, China
  2. School of Computer and Information, Southwest Forestry University, Kunming, China
