
Automated Software Engineering, Volume 21, Issue 4, pp 489–508

JackHare: a framework for SQL to NoSQL translation using MapReduce

  • Wu-Chun Chung
  • Hung-Pin Lin
  • Shih-Chang Chen (corresponding author)
  • Mon-Fong Jiang
  • Yeh-Ching Chung

Abstract

As data exploration has increased rapidly in recent years, data storage and data processing have received growing attention as means of extracting important information. Finding a scalable way to process large-scale data is a critical issue for both relational database systems and the emerging NoSQL databases. Owing to Hadoop's inherent scalability and fault tolerance, MapReduce is an attractive model for processing massive data in parallel. Most previous research has focused on developing translators for SQL or SQL-like queries over the Hadoop Distributed File System (HDFS). However, frequent data updates can be difficult in such a file system. A flexible datastore such as HBase is therefore needed, not only to place data on a scale-out storage system but also to manipulate changing data transparently. Yet the HBase interface is not user-friendly enough for most users, and a GUI consisting of a SQL client application with a database connection to HBase would ease the learning curve. In this paper, we propose the JackHare framework, which comprises a SQL query compiler, a JDBC driver, and a systematic method that uses the MapReduce framework to process unstructured data in HBase. Once the JDBC driver is imported into a SQL client GUI, HBase can serve as the underlying datastore for executing ANSI-SQL queries. Experimental results show that the proposed approach performs efficiently and scales well.
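The central idea above, executing an ANSI-SQL query as a MapReduce job over HBase, can be illustrated with a small in-memory sketch. It assumes a hypothetical `emp` table whose columns are stored in a single column family `info`, models each HBase row as a sparse map from `family:qualifier` to byte values, and hand-translates `SELECT dept, COUNT(*) FROM emp WHERE salary > 50000 GROUP BY dept` into map, shuffle, and reduce steps. The function names are hypothetical; this is not JackHare's compiler or its actual translation scheme, and it calls no Hadoop or HBase APIs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Assumed HBase-like model: row key -> sparse {"family:qualifier": bytes}.
# A column with no value simply has no cell, as in HBase.
HTable = Dict[bytes, Dict[str, bytes]]


def map_phase(table: HTable,
              predicate: Callable[[Dict[str, bytes]], bool],
              group_col: str) -> Iterator[Tuple[bytes, int]]:
    """Map: scan every row, apply the WHERE predicate, emit (group key, 1)."""
    for cells in table.values():
        if group_col in cells and predicate(cells):
            yield cells[group_col], 1


def shuffle(pairs: Iterable[Tuple[bytes, int]]) -> Dict[bytes, List[int]]:
    """Shuffle: collect intermediate values by key, as the framework would."""
    groups: Dict[bytes, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[bytes, List[int]]) -> Dict[bytes, int]:
    """Reduce: COUNT(*) per group is the sum of the emitted 1s."""
    return {key: sum(values) for key, values in groups.items()}


def select_count_group_by(table: HTable,
                          predicate: Callable[[Dict[str, bytes]], bool],
                          group_col: str) -> Dict[bytes, int]:
    """Hypothetical plan for SELECT g, COUNT(*) ... WHERE p GROUP BY g."""
    return reduce_phase(shuffle(map_phase(table, predicate, group_col)))


def salary_over_50k(cells: Dict[str, bytes]) -> bool:
    """WHERE salary > 50000; an absent cell behaves like SQL NULL (row dropped)."""
    raw = cells.get("info:salary")
    return raw is not None and int(raw) > 50000


# Hypothetical sample data; HBase stores values as bytes.
emp: HTable = {
    b"e001": {"info:dept": b"R&D", "info:salary": b"62000"},
    b"e002": {"info:dept": b"R&D", "info:salary": b"48000"},
    b"e003": {"info:dept": b"Sales", "info:salary": b"55000"},
    b"e004": {"info:dept": b"Sales"},  # sparse row: no salary cell
    b"e005": {"info:dept": b"R&D", "info:salary": b"71000"},
}

result = select_count_group_by(emp, salary_over_50k, "info:dept")
print(result)  # {b'R&D': 2, b'Sales': 1}
```

The sparse row `e004` shows why the translated predicate must treat a missing cell as SQL NULL: HBase keeps no cell for an absent value, so the predicate returns false instead of failing on a lookup.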

Keywords

Cloud computing, Unstructured data processing, MapReduce, NoSQL database, HBase, JDBC, Compiler

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Wu-Chun Chung (1)
  • Hung-Pin Lin (1)
  • Shih-Chang Chen (1), corresponding author
  • Mon-Fong Jiang (2)
  • Yeh-Ching Chung (1)

  1. Dept. of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
  2. is-land Systems Inc., Hsinchu Science Park, Hsinchu, Taiwan

Personalised recommendations