Programming and Computer Software, Volume 43, Issue 3, pp. 131–144

Parallel processing of very large databases using distributed column indexes

Abstract

The paper discusses the development and investigation of efficient methods for parallel processing of very large databases on computer clusters using a columnar data representation. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. Column indexes are auxiliary structures that are kept permanently in the distributed main memory of a computer cluster. Surrogate keys are used to match the elements of a column index to the tuples of the original relation. Resource-intensive relational operations are performed on the corresponding column indexes rather than on the original relations of the database, which yields a precomputation table. The DBMS then uses this table to reconstruct the resulting relation. For the basic relational operations on column indexes, parallel decomposition methods are proposed that do not require massive data exchanges between processor nodes. This approach improves the performance of OLAP-class queries by a factor of hundreds.
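
The abstract does not give formal definitions, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It assumes that a column index is a list of (surrogate key, value) pairs, that domain-interval fragmentation places a value in a fragment according to sorted split points, and that an equi-join is computed fragment by fragment to produce a precomputation table of surrogate key pairs. All function names, the sample relations, and the split point are hypothetical.

```python
from bisect import bisect_right

def build_column_index(relation, attr):
    """Pair each tuple's surrogate key (here: its position) with one attribute value."""
    return [(key, row[attr]) for key, row in enumerate(relation)]

def fragment_by_domain_intervals(column_index, boundaries):
    """Split a column index by value interval.

    `boundaries` are sorted split points (an assumption of this sketch):
    fragment i holds values v with boundaries[i-1] <= v < boundaries[i].
    """
    fragments = [[] for _ in range(len(boundaries) + 1)]
    for key, value in column_index:
        fragments[bisect_right(boundaries, value)].append((key, value))
    return fragments

def join_fragment(left_frag, right_frag):
    """Equi-join one pair of fragments; return pairs of surrogate keys."""
    keys_by_value = {}
    for key, value in right_frag:
        keys_by_value.setdefault(value, []).append(key)
    return [(lk, rk)
            for lk, value in left_frag
            for rk in keys_by_value.get(value, [])]

def precomputation_table(left_index, right_index, boundaries):
    """Join two column indexes fragment by fragment.

    Equal values always fall into the same interval, so each pair of
    fragments can be joined independently (e.g. on a separate node)
    without exchanging index entries between fragments.
    """
    left = fragment_by_domain_intervals(left_index, boundaries)
    right = fragment_by_domain_intervals(right_index, boundaries)
    table = []
    for left_frag, right_frag in zip(left, right):
        table.extend(join_fragment(left_frag, right_frag))
    return sorted(table)

def reconstruct(table, left_relation, right_relation):
    """Use the precomputation table to assemble the resulting relation."""
    return [{**left_relation[lk], **right_relation[rk]} for lk, rk in table]

# Hypothetical sample data.
orders = [{"order_id": 10, "cust": 3},
          {"order_id": 11, "cust": 7},
          {"order_id": 12, "cust": 3}]
customers = [{"cust": 3, "name": "Ann"},
             {"cust": 7, "name": "Bob"},
             {"cust": 9, "name": "Eve"}]

pairs = precomputation_table(build_column_index(orders, "cust"),
                             build_column_index(customers, "cust"),
                             boundaries=[5])
result = reconstruct(pairs, orders, customers)
```

In this sketch the expensive matching step touches only the compact column indexes; the original relations are read only at the end, once per row of the precomputation table.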

Copyright information

© Pleiades Publishing, Ltd. 2017

Authors and Affiliations

  1. South Ural State University, Chelyabinsk, Russia