Information Systems Frontiers

Volume 14, Issue 4, pp 909–924

Scaling database performance on GPUs

  • Yue-Shan Chang
  • Ruey-Kai Sheu
  • Shyan-Ming Yuan
  • Jyn-Jie Hsu


The market leaders in Cloud Computing are trying to leverage the parallel-processing capability of GPUs to provide more economical services than traditional CPU-based offerings. As the cornerstone of enterprise applications, database systems are a top priority for performance improvement and design-complexity reduction. The purpose of this paper is to design an in-memory database, called CUDADB, that scales up database performance on the GPU with CUDA. The implementation details and algorithms are presented, and experiences with GPU-enabled CUDA database operations are also shared. For performance evaluation, SQLite is used as the comparison target. The experimental results show that CUDADB outperforms SQLite in most test cases. Surprisingly, CUDADB's performance is independent of the number of data records in a query result set; instead, it is proportional to the total number of data records in the target table. Finally, this paper introduces the concept of a turning point, which represents the difference ratio between CUDADB and SQLite.


Keywords: GPU, CUDA, SQLite, In-Memory Database



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Yue-Shan Chang (1)
  • Ruey-Kai Sheu (2)
  • Shyan-Ming Yuan (3)
  • Jyn-Jie Hsu (4)

  1. Department of Computer Science and Information Engineering, National Taipei University, Taipei, Taiwan
  2. Department of Computer Science, Tunghai University, Taichung, Taiwan
  3. Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
  4. Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
