Journal of Signal Processing Systems, Volume 86, Issue 2–3, pp. 191–205

Column Access-aware In-stream Data Cache with Stream Processing Framework

Abstract

In recent years, research has focused on addressing the query bottleneck of big data through approaches such as NoSQL databases, MapReduce, and big data processing frameworks. Although NoSQL databases offer many advantages for On-Line Analytical Processing (OLAP), migrating from a Relational Database Management System (RDBMS) to NoSQL is a substantial undertaking. Therefore, optimizing RDBMSs remains important. In this paper, we construct a Column Access-aware In-stream Data Cache (CAIDC) for relational databases, which integrates an RDBMS with an in-memory cache. Furthermore, we propose a live synchronization approach that uses a stream processing framework to propagate changes from the physical RDBMS to the in-memory data cache. On the one hand, because it is built on a stream processing framework, CAIDC provides low latency while supporting log-based triggers that maintain data consistency in the presence of updates. On the other hand, CAIDC uses column access frequency to move frequently accessed data into a column-oriented in-memory cache, so that heavy-hitter queries can be served from memory. Finally, experimental results show that this approach supports a wide range of big data applications.
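The abstract combines two mechanisms: choosing which columns to hold in a column-oriented in-memory cache according to how often they are accessed, and keeping that cache consistent by replaying log-derived changes through a stream. The Python sketch below illustrates both ideas under simplifying assumptions, and is not the paper's implementation: a dict stands in for the RDBMS table, a plain list of change records stands in for the stream processing framework, and the top-k admission policy with a fixed `capacity` is our own assumption. The names `ColumnAccessCache`, `ChangeEvent`, and `consume` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class ChangeEvent:
    """Hypothetical row-level change decoded from the RDBMS log."""
    op: str                                             # "insert", "update" or "delete"
    key: Any                                            # primary key of the affected row
    row: Dict[str, Any] = field(default_factory=dict)   # new values of the changed columns


class ColumnAccessCache:
    """Sketch of a column-oriented in-memory cache admitted by column access frequency.

    `base` is a dict standing in for a physical RDBMS table: {primary_key: {column: value}}.
    """

    def __init__(self, base: Dict[Any, Dict[str, Any]], capacity: int) -> None:
        self.base = base
        self.capacity = capacity                        # assumed limit on cached columns
        self.freq: Counter = Counter()                  # column name -> access count
        self.columns: Dict[str, Dict[Any, Any]] = {}    # column name -> {key: value}

    def _hot_columns(self) -> List[str]:
        # Top-`capacity` columns by access count; ties keep first-seen order.
        return [col for col, _ in self.freq.most_common(self.capacity)]

    def _refresh(self) -> None:
        hot = set(self._hot_columns())
        for col in list(self.columns):                  # evict columns that are no longer hot
            if col not in hot:
                del self.columns[col]
        for col in hot:                                 # load newly hot columns column-wise
            if col not in self.columns:
                self.columns[col] = {k: r[col] for k, r in self.base.items() if col in r}

    def read(self, key: Any, col: str) -> Tuple[Any, bool]:
        """Return (value, served_from_cache) and record the column access."""
        if col in self.columns:
            result = (self.columns[col].get(key), True)
        else:
            result = (self.base.get(key, {}).get(col), False)  # fall back to the RDBMS
        self.freq[col] += 1
        self._refresh()
        return result

    def apply(self, event: ChangeEvent) -> None:
        """Apply one committed change to the cached columns to keep them consistent."""
        for col, values in self.columns.items():
            if event.op == "delete":
                values.pop(event.key, None)
            elif col in event.row:                      # insert or update touching this column
                values[event.key] = event.row[col]


def consume(cache: ColumnAccessCache, stream: Iterable[ChangeEvent]) -> None:
    """Stand-in for a stream-processing operator: apply log events in commit order."""
    for event in stream:
        cache.apply(event)
```

In this sketch, the first read of a column falls back to the base table and records the access; once the column ranks among the top `capacity` columns, it is loaded column-wise and later reads are served from memory. Committed updates and deletes reach the cache only through `consume`, which mimics log-based triggering; a real deployment would read these events from the database log via the stream processing framework rather than from a list.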

Keywords

Big data; Stream processing; NoSQL; Stream computing; Data cache; Access frequency

Acknowledgments

This work was supported by the Doctoral Fund of University of Jinan (XBS1237), the Shandong Provincial Natural Science Foundation (ZR2014FQ029), the Shandong Provincial Key R&D Program (2015GGX106007), the Teaching Research Project of University of Jinan (J1344), the National Key Technology R&D Program (2012BAF12B07), and the Open Project Funding of Shandong Provincial Key Laboratory of Software Engineering (No. 2015SE03).

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
