MapReduce-Based Data Stream Processing over Large History Data

  • Kaiyuan Qi
  • Zhuofeng Zhao
  • Jun Fang
  • Yanbo Han
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7636)


With the growth of Internet of Things applications built on sensor data, processing high-speed data streams over large-scale history data has become a new challenge. This paper proposes a new programming model, RTMR, which improves the real-time capability of traditional batch-oriented MapReduce through preprocessing and caching, together with pipelining and localizing. Furthermore, to adapt the topologies to application characteristics and cluster environments, a method for constructing RTMR clusters based on model analysis is proposed. A benchmark built on an urban vehicle monitoring system shows that RTMR can provide real-time capability and scalability for data stream processing over large-scale data.


data stream processing; large scale data processing; MapReduce



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kaiyuan Qi (1, 2)
  • Zhuofeng Zhao (1)
  • Jun Fang (1)
  • Yanbo Han (1)
  1. Cloud Computing Research Center, North China University of Technology, Beijing, China
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
