Techniques and guidelines for effective migration from RDBMS to NoSQL

  • Ho-Jun Kim
  • Eun-Jeong Ko
  • Young-Ho Jeon
  • Ki-Hoon Lee


Migration from RDBMS to NoSQL has become an important topic in the big data era. This paper provides comprehensive techniques and guidelines for effective migration from RDBMS to NoSQL. We discuss the challenges faced in translating SQL queries; the effects of denormalization, column families, secondary indexes, join algorithms, and column name length; and decision support for the migration. We focus on HBase, a column-oriented NoSQL database, because it is widely used by many Internet enterprises such as Facebook, Twitter, and LinkedIn. Because HBase does not support SQL, we use Apache Phoenix as an SQL layer on top of HBase. Experimental results using TPC-H show that column-level denormalization with atomicity and grouping columns into column families significantly improve query performance; that secondary indexes on foreign keys are not as effective as in RDBMSs; that the query optimizer of Phoenix is not very sophisticated; that shortened column names significantly reduce the database size and improve query performance; and that an SVM classifier can predict whether migration improves query performance. Important open problems in NoSQL research are supporting complex SQL queries, automatic index selection, and optimizing SQL queries for NoSQL.
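To make the idea of column-level denormalization more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes two miniature tables loosely modeled on the TPC-H customer and orders tables, held as in-memory lists of dicts; the helper names (denormalize_columns, total_price_with_join, total_price_without_join) and the sample rows are hypothetical. Only the single parent column needed by the query predicate is copied into the child table, so the query can be answered without a join.

```python
# Toy illustration of column-level denormalization (hypothetical data and helpers).

# Miniature stand-ins for the TPC-H customer and orders tables.
customer = [
    {"c_custkey": 1, "c_name": "Customer#1", "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_name": "Customer#2", "c_mktsegment": "MACHINERY"},
]
orders = [
    {"o_orderkey": 10, "o_custkey": 1, "o_totalprice": 100.0},
    {"o_orderkey": 11, "o_custkey": 2, "o_totalprice": 250.0},
    {"o_orderkey": 12, "o_custkey": 1, "o_totalprice": 75.0},
]


def total_price_with_join(orders, customer, segment):
    """Normalized schema: each order must look up its customer row."""
    by_key = {c["c_custkey"]: c for c in customer}
    return sum(
        o["o_totalprice"]
        for o in orders
        if by_key[o["o_custkey"]]["c_mktsegment"] == segment
    )


def denormalize_columns(child, parent, fk, pk, columns):
    """Copy only the named parent columns into each child row."""
    by_key = {p[pk]: p for p in parent}
    return [
        {**row, **{col: by_key[row[fk]][col] for col in columns}}
        for row in child
    ]


def total_price_without_join(orders_denorm, segment):
    """Denormalized schema: the predicate column lives in the order row."""
    return sum(
        o["o_totalprice"] for o in orders_denorm if o["c_mktsegment"] == segment
    )


# Duplicate only c_mktsegment, not the whole customer row.
orders_denorm = denormalize_columns(
    orders, customer, fk="o_custkey", pk="c_custkey", columns=["c_mktsegment"]
)

print(total_price_with_join(orders, customer, "BUILDING"))
print(total_price_without_join(orders_denorm, "BUILDING"))
```

Both queries return the same total, but the second reads a single table. The trade-off is that any update to c_mktsegment must now also reach every copied value in the orders rows.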


Migration, RDBMS, NoSQL, Denormalization, Column family, Secondary index, Query optimization, Decision support



This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2015R1C1A1A02036517). The present research has been conducted by the Research Grant of Kwangwoon University in 2017.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer and Information Engineering, Kwangwoon University, Seoul, Republic of Korea
