JackHare: a framework for SQL to NoSQL translation using MapReduce

Chung, Wu-Chun; Lin, Hung-Pin; Chen, Shih-Chang; Jiang, Mon-Fong; Chung, Yeh-Ching

doi:10.1007/s10515-013-0135-x

Published: 28 September 2013

Automated Software Engineering, Volume 21, pages 489–508 (2014)

Abstract

As data exploration has grown rapidly in recent years, data storage and data processing have drawn increasing attention as means of extracting important information. Finding a scalable way to process large-scale data is a critical issue for both relational database systems and the emerging NoSQL databases. Given the inherent scalability and fault tolerance of Hadoop, MapReduce is an attractive way to process massive data in parallel. Most previous research focuses on developing translators for SQL or SQL-like queries on top of the Hadoop Distributed File System. However, frequent data updates can be difficult in such a file system. A flexible datastore such as HBase is therefore needed, not only to place the data on a scale-out storage system but also to manipulate changing data transparently. The HBase interface, however, is not friendly enough for most users. A GUI that combines a SQL client application with a database connection to HBase would ease the learning curve. In this paper, we propose the JackHare framework, which comprises a SQL query compiler, a JDBC driver, and a systematic method that uses the MapReduce framework to process unstructured data in HBase. After importing the JDBC driver into a SQL client GUI, users can execute ANSI-SQL queries with HBase as the underlying datastore. Experimental results show that our approach is efficient and scalable.
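To make the idea of SQL-to-MapReduce translation concrete, the following is a minimal sketch, in plain Python with no Hadoop or HBase dependency, of how a query such as `SELECT customer, SUM(amount) FROM orders GROUP BY customer` could be decomposed into map, shuffle, and reduce steps. It is an illustration only and is not taken from the JackHare implementation; the table contents, the column names, and every function name below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]

# Hypothetical in-memory stand-in for a table; a real system would scan HBase rows.
ORDERS: List[Row] = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 15},
    {"customer": "alice", "amount": 12},
]


def map_phase(rows: Iterable[Row], group_col: str, value_col: str) -> Iterator[Tuple[object, int]]:
    """Emit one (group key, value) pair per row, mirroring the GROUP BY column."""
    for row in rows:
        yield row[group_col], int(row[value_col])


def shuffle(pairs: Iterable[Tuple[object, int]]) -> Dict[object, List[int]]:
    """Collect all values that share a key, as a MapReduce runtime would between phases."""
    groups: Dict[object, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[object, List[int]]) -> Dict[object, int]:
    """Apply the aggregate (here SUM) to each group's values."""
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_sum(rows: Iterable[Row], group_col: str, value_col: str) -> Dict[object, int]:
    """Chain the three steps for SELECT group_col, SUM(value_col) ... GROUP BY group_col."""
    return reduce_phase(shuffle(map_phase(rows, group_col, value_col)))


if __name__ == "__main__":
    print(run_group_by_sum(ORDERS, "customer", "amount"))  # {'alice': 42, 'bob': 15}
```

In this sketch the GROUP BY column becomes the map output key and the aggregate becomes the reduce function; a query compiler of the kind the abstract describes would presumably generate such steps from the parsed SQL rather than have them written by hand.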

Author information

Authors and Affiliations

Dept. of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan
Wu-Chun Chung, Hung-Pin Lin, Shih-Chang Chen & Yeh-Ching Chung
is-land Systems Inc., Hsinchu Science Park, 3F, No.4, Prosperity Rd. 2, Hsinchu, 300, Taiwan
Mon-Fong Jiang

Corresponding author

Correspondence to Shih-Chang Chen.

About this article

Cite this article

Chung, WC., Lin, HP., Chen, SC. et al. JackHare: a framework for SQL to NoSQL translation using MapReduce. Autom Softw Eng 21, 489–508 (2014). https://doi.org/10.1007/s10515-013-0135-x

Received: 15 December 2012
Accepted: 06 September 2013
Published: 28 September 2013
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10515-013-0135-x
