Abstract
Modern graphics processing units (GPUs) can perform computation at a much higher rate than CPUs; as a result, they are increasingly used for general-purpose parallel computation. Parallel algorithms can be developed for GPUs using computing platforms such as CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language). Determining the optimal binary search tree is an optimization problem: find the arrangement of nodes in a binary search tree that minimizes the average search time. A dynamic programming algorithm solves this problem in O(n³) time using a workspace of size O(n²). We have developed a fast parallel implementation of this O(n³)-time algorithm on a GPU. Achieving this requires data structures suited to parallel computation of the algorithm, efficient use of the available cache memory, and minimal thread divergence. Our implementation runs in 114.4 s for an instance containing 16384 keys on an NVIDIA GTX 570, whereas a conventional CPU-based implementation takes 48166 s. This gives a speed-up factor of 422 over the conventional CPU-based implementation.
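For orientation, the O(n³)-time, O(n²)-space dynamic program mentioned in the abstract can be sketched sequentially as follows. This is a minimal textbook-style illustration, not the authors' GPU implementation: the function name `optimal_bst_cost`, the use of integer access frequencies for successful searches only, and the half-open index convention are all assumptions made for this example.

```python
def optimal_bst_cost(freq):
    """Minimum total weighted search cost of a BST over sorted keys.

    freq[k] is the (assumed) access frequency of the k-th smallest key.
    The root counts as depth 1, so a tree's cost is the sum of
    freq[k] * depth(k) over all keys.
    """
    n = len(freq)
    # prefix[j] - prefix[i] is the total frequency of keys i..j-1.
    prefix = [0] * (n + 1)
    for k, f in enumerate(freq):
        prefix[k + 1] = prefix[k] + f

    # cost[i][j] is the optimal cost for keys i..j-1 (half-open range);
    # empty ranges cost 0. The table is the O(n^2) workspace.
    cost = [[0] * (n + 1) for _ in range(n + 1)]

    # Fill the table by increasing range length. Entries of equal length
    # depend only on shorter ranges, so they are mutually independent.
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            weight = prefix[j] - prefix[i]
            # Try every key r in the range as the root of the subtree.
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            cost[i][j] = best + weight
    return cost[0][n]


if __name__ == "__main__":
    print(optimal_bst_cost([34, 8, 50]))
```

The three nested loops (range length, start index, candidate root) account for the cubic running time. Because all entries with the same range length can be computed independently of one another, that middle loop is the natural place to look for parallelism, although how the work is actually mapped to GPU threads in the paper is not stated in the abstract.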
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wani, M.A., Ahmad, M. (2017). A Fast GPU Based Implementation of Optimal Binary Search Tree Using Dynamic Programming. In: Kaushik, S., Gupta, D., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2017. Communications in Computer and Information Science, vol 750. Springer, Singapore. https://doi.org/10.1007/978-981-10-6544-6_26
DOI: https://doi.org/10.1007/978-981-10-6544-6_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6543-9
Online ISBN: 978-981-10-6544-6
eBook Packages: Computer Science, Computer Science (R0)