Column Access-aware In-stream Data Cache with Stream Processing Framework

Abstract

In recent years, research has focused on the query bottleneck of big data, with approaches such as NoSQL databases, MapReduce, and big data processing frameworks. Although NoSQL databases have many advantages for On-Line Analytical Processing (OLAP), migrating a Relational Database Management System (RDBMS) to NoSQL is a major undertaking. Optimizing the RDBMS therefore remains important. In this paper, we construct a Column Access-aware In-stream Data Cache (CAIDC) for relational databases, combining an RDBMS with an in-memory cache. We also propose a live synchronization approach from the physical RDBMS to the in-memory data cache using a stream processing framework. On the one hand, because it builds on a stream processing framework, CAIDC provides low latency while supporting log-based triggers that maintain data consistency in the presence of updates. On the other hand, CAIDC translates frequently accessed data into a column-oriented in-memory cache according to column access frequency, so that heavy-hitter queries are served from memory. Finally, experimental results show that this approach supports a wide range of big data applications.
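
The idea of promoting data into a column-oriented cache by column access frequency can be illustrated with a minimal Python sketch. This is not the CAIDC implementation: the class name `ColumnAccessCache`, the fixed `threshold`, the in-memory `rows` dictionary standing in for the RDBMS, and the `apply_change` method standing in for a log-based change event are all assumptions made for illustration.

```python
from collections import Counter


class ColumnAccessCache:
    """Toy column-oriented cache that promotes frequently read columns."""

    def __init__(self, rows, threshold=3):
        # rows: dict mapping primary key -> {column name: value}; a
        # hypothetical stand-in for a table in the backing RDBMS.
        self.rows = rows
        self.threshold = threshold      # reads needed before a column is cached
        self.access_counts = Counter()  # column name -> number of reads seen
        self.columns = {}               # cached column -> {primary key: value}

    def read(self, key, column):
        """Return one value, serving from the cache when the column is hot."""
        self.access_counts[column] += 1
        if column in self.columns:
            return self.columns[column][key]
        value = self.rows[key][column]  # fall back to the "database"
        if self.access_counts[column] >= self.threshold:
            # Promote the whole column into the column-oriented cache.
            self.columns[column] = {k: r[column] for k, r in self.rows.items()}
        return value

    def apply_change(self, key, column, value):
        """Apply one update event (assumes the row already exists)."""
        self.rows[key][column] = value
        if column in self.columns:
            self.columns[column][key] = value  # keep the cached copy consistent


rows = {1: {"name": "a", "price": 10}, 2: {"name": "b", "price": 20}}
cache = ColumnAccessCache(rows, threshold=2)
cache.read(1, "price")
cache.read(2, "price")           # second read promotes the "price" column
cache.apply_change(1, "price", 15)
```

In this sketch only the `price` column is cached after two reads, and the later update is propagated to the cached copy. A real system would replace the direct method call with change events delivered by the stream processing framework.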

Acknowledgments

This work was supported by the Doctoral Fund of University of Jinan (XBS1237), the Shandong Provincial Natural Science Foundation (ZR2014FQ029), the Shandong Provincial Key R&D Program (2015GGX106007), the Teaching Research Project of University of Jinan (J1344), the National Key Technology R&D Program (2012BAF12B07), and the Open Project Funding of Shandong Provincial Key Laboratory of Software Engineering (No. 2015SE03).

Author information

Corresponding author

Correspondence to Kun Ma.

About this article

Cite this article

Ma, K., Yang, B. Column Access-aware In-stream Data Cache with Stream Processing Framework. J Sign Process Syst 86, 191–205 (2017). https://doi.org/10.1007/s11265-016-1117-6

Keywords

  • Big data
  • Stream processing
  • NoSQL
  • Stream computing
  • Data cache
  • Access frequency