Abstract
As data exploration has increased rapidly in recent years, the datastore and data processing are getting more and more attention in extracting important information. To find a scalable solution to process the large-scale data is a critical issue in either the relational database system or the emerging NoSQL database. With the inherent scalability and fault tolerance of Hadoop, MapReduce is attractive to process the massive data in parallel. Most of previous researches focus on developing the SQL or SQL-like queries translator with the Hadoop distributed file system. However, it could be difficult to update data frequently in such file system. Therefore, we need a flexible datastore as HBase not only to place the data over a scale-out storage system, but also to manipulate the changeable data in a transparent way. However, the HBase interface is not friendly enough for most users. A GUI composed of SQL client application and database connection to HBase will ease the learning curve. In this paper, we propose the JackHare framework with SQL query compiler, JDBC driver and a systematical method using MapReduce framework for processing the unstructured data in HBase. After importing the JDBC driver to a SQL client GUI, we can exploit the HBase as the underlying datastore to execute the ANSI-SQL queries. Experimental results show that our approaches can perform well with efficiency and scalability.
Similar content being viewed by others
References
Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, A.: HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. In: Proceedings of the VLDB Endowment. VLDB Endowment, Armonk pp. 922–933 (2009)
Afrati, F.N., Ullman, J.D.: Optimizing joins in a map-reduce environment. In: Proceedings of the 13th International Conference on Extending Database Technology, pp. 99–110 (2010)
Apache Hadoop: http://hadoop.apache.org (2013)
Apache HBase: http://hbase.apache.org (2013)
Binnig, C., Rehrmann, R., Faerber, F., Riewe, R.: FunSQL: it is time to make SQL functional. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops, pp. 41–46. ACM, New York (2012)
Blanas, S., Patel, J.M., Ercegovac, V., Rao, J., Shekita, E.J., Tian, Y.: A comparison of join algorithms for log processing in MaPreduce. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 975–986 (2010)
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst. 26(2), 1–26 (2008)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Ghemawat, S., Gobioff, H., Leung, S.: The Google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pp. 29–43. ACM, New York (2003)
Gowraj, N., Ravi, P.V., Sumalatha, M.R.: S2MART: smart sql to map-reduce translators. In: Proceedings of the Web Technologies and Applications. LNCS, vol. 7808, pp. 571–582. Springer, Berlin (2013)
Hive HBase Integration (2013). https://cwiki.apache.org/Hive/hbaseintegration.html
Kaldewey, T., Shekita, E.J., Tata, S.: Clydesdale: structured data processing on MapReduce. In: Proceedings of the 15th International Conference on Extending Database Technology, pp. 15–25. ACM, New York (2012)
Lee, R., Luo, T., Huai, Y., Wang, F., He, Y., Zhang, X.Y.: Yet another SQL-to-MapReduce translator. In: Proceeding of the 2011 31st International Conference on Distributed Computing Systems, Washington, pp. 25–36 (2011)
Okcan, A., Riedewald, M.: Processing theta-joins using MapReduce. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp. 949–960. ACM, New York (2011)
Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1099–1110. ACM, New York (2008)
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 165–178. ACM, New York (2009)
Stonebraker, M., Cattell, R.: 10 rules for scalable performance in ’simple operation’ datastores. Commun. ACM 54(6), 72–80 (2011)
Su, X., Swart, G.: Oracle in-database hadoop: when mapreduce meets RDBMS. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 779–790. ACM, New York (2012)
Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a map-reduce framework. In: Proceedings of the VLDB Endowment. VLDB Endowment, Armonk pp. 1626–1629 (2009)
Xu, Y., Hu, S.: QMapper: a tool for SQL optimization on hive using query rewriting. In: Proceedings of the 22nd International Conference on World Wide Web Companion, pp. 212–221. ACM, Geneva (2013)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Chung, WC., Lin, HP., Chen, SC. et al. JackHare: a framework for SQL to NoSQL translation using MapReduce. Autom Softw Eng 21, 489–508 (2014). https://doi.org/10.1007/s10515-013-0135-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10515-013-0135-x