Blocking optimized SIMD tree search on modern processors

  • Information Technology
Journal of Shanghai University (English Edition)

Abstract

Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews studies that exploit the single instruction multiple data (SIMD) capability of processors to improve tree search performance, and proposes several improvements to previously reported SIMD tree search algorithms. Building on a blocked tree structure, blocking for memory alignment and dynamic blocking prefetch are proposed to reduce memory access overhead. Furthermore, search branch unwinding, a form of non-linear loop unrolling, shows that the number of branches in the SIMD search algorithm can exceed the data width of the SIMD instructions. The experiments suggest that the blocking optimized SIMD tree search algorithm responds 1.6 times faster than the unoptimized algorithm.
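As a rough illustration of the blocked tree structure the abstract refers to, the sketch below searches a sorted key set one block at a time. It is not the algorithm of this paper: the names `LANES`, `build_levels`, `count_less` and `blocked_search` are hypothetical, the block size of four keys is an assumption, and the Python loop in `count_less` merely stands in for a single SIMD compare followed by a count of the set mask bits. Memory alignment, prefetching and branch unwinding are not modelled.

```python
from bisect import bisect_left

LANES = 4  # assumed number of keys compared per step (e.g. four 32-bit keys)


def build_levels(sorted_keys, lanes=LANES):
    """Return the tree levels, leaf level first.

    Each upper level stores the largest key of every block of `lanes`
    keys in the level below, so one block acts as one tree node.
    """
    levels = [list(sorted_keys)]
    while len(levels[-1]) > lanes:
        below = levels[-1]
        levels.append([below[min(i + lanes, len(below)) - 1]
                       for i in range(0, len(below), lanes)])
    return levels


def count_less(block, query):
    """Stand-in for one SIMD compare plus a count of the mask bits."""
    return sum(1 for key in block if key < query)


def blocked_search(levels, query, lanes=LANES):
    """Return the index of the first leaf key >= query (lower bound)."""
    pos = 0  # index of the block to inspect in the current level
    for depth in range(len(levels) - 1, -1, -1):
        level = levels[depth]
        start = pos * lanes
        block = level[start:start + lanes]
        pos = start + count_less(block, query)
        if depth > 0:
            # A query above every key still descends into the last block.
            pos = min(pos, len(level) - 1)
    return pos


# Usage: the result agrees with a plain binary search on the same keys.
keys = list(range(0, 100, 3))
levels = build_levels(keys)
assert blocked_search(levels, 42) == bisect_left(keys, 42)
```

Each level is visited once and every visit touches one contiguous block, which is the property that makes a blocked layout a natural fit for aligned loads and prefetching on real hardware.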

Author information

Corresponding author

Correspondence to Wei-min Xu (徐炜民).

Additional information

Project supported by the Shanghai Leading Academic Discipline Project (Grant No. J50103) and the Graduate Student Innovation Foundation of Shanghai University (Grant No. SHUCX112167).

About this article

Cite this article

Zhang, Z., Lu, Yf., Shen, Wf. et al. Blocking optimized SIMD tree search on modern processors. J. Shanghai Univ.(Engl. Ed.) 15, 437–444 (2011). https://doi.org/10.1007/s11741-011-0765-2

  • DOI: https://doi.org/10.1007/s11741-011-0765-2
