Data Parallel Bin-Based Indexing for Answering Queries on Multi-core Architectures

  • Luke J. Gosink
  • Kesheng Wu
  • E. Wes Bethel
  • John D. Owens
  • Kenneth I. Joy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5566)

Abstract

The multi-core trend in CPUs and general-purpose graphics processing units (GPUs) offers new opportunities for the database community. The exponential growth in core counts is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. Our approach, the Data Parallel Bin-based Index Strategy (DP-BIS), first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency: each record is evaluated by a single thread, and all threads execute in parallel.

We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture; for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS’s performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly used CPU-based and GPU-based projection indexes. Finally, we show that, because of its data encoding, DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column’s base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).
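
The bin-then-cluster idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_index` and `range_query`, the caller-supplied bin edges, and the half-open range predicate `lo <= v < hi` are all assumptions made for illustration, and the paper's actual encoding and memory layout are not described in this abstract. The sketch runs sequentially; the point is only that every per-record test is independent of the others, which is the property that would let each record be handed to its own thread.

```python
from bisect import bisect_right


def build_index(values, edges):
    """Bin each value and store it, with its row id, in that bin's cluster.

    Assumption: edges are ascending and every value v satisfies
    edges[0] <= v < edges[-1]; bin b covers [edges[b], edges[b + 1]).
    """
    bin_numbers = [bisect_right(edges, v) - 1 for v in values]
    clusters = [[] for _ in range(len(edges) - 1)]
    for row, (b, v) in enumerate(zip(bin_numbers, values)):
        clusters[b].append((row, v))
    return bin_numbers, clusters


def range_query(index, edges, lo, hi):
    """Return sorted row ids of all records with lo <= value < hi."""
    bin_numbers, clusters = index
    n_bins = len(clusters)

    # A bin lying entirely inside the query range needs no value lookup.
    inside = [lo <= edges[b] and edges[b + 1] <= hi for b in range(n_bins)]
    # A boundary bin overlaps the range only partially.
    boundary = [
        not inside[b] and edges[b] < hi and lo < edges[b + 1]
        for b in range(n_bins)
    ]

    # Step 1: examine bin numbers only. Each record's test is independent,
    # so on a parallel device each one could be given its own thread.
    hits = [row for row, b in enumerate(bin_numbers) if inside[b]]

    # Step 2: check the stored values in boundary-bin clusters only.
    # Again, every (row, value) test is independent of the others.
    for b in range(n_bins):
        if boundary[b]:
            hits.extend(row for row, v in clusters[b] if lo <= v < hi)

    return sorted(hits)


values = [0.5, 3.2, 7.9, 4.4, 9.1, 2.0, 5.5]
edges = [0, 2, 4, 6, 8, 10]
index = build_index(values, edges)
print(range_query(index, edges, 3.0, 6.0))  # [1, 3, 6]
```

In this example only the cluster for the bin [2, 4) is read at full precision; records in the bin [4, 6) are accepted from their bin numbers alone, and the remaining bins are never touched. This is one way a binned index can read less data than a scan of the base column.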

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Luke J. Gosink (1)
  • Kesheng Wu (2)
  • E. Wes Bethel (3)
  • John D. Owens (1)
  • Kenneth I. Joy (1)
  1. Institute for Data Analysis and Visualization (IDAV), One Shields Avenue, University of California, Davis, U.S.A.
  2. Scientific Data Management Group, Lawrence Berkeley National Laboratory, Berkeley, U.S.A.
  3. Visualization Group, Lawrence Berkeley National Laboratory, Berkeley, U.S.A.
