Integrating Map-Reduce and Stream-Processing for Efficiency (MRSP)

  • Pedro Martins
  • Maryam Abbasi
  • José Cecílio
  • Pedro Furtado
Conference paper

DOI: 10.1007/978-3-319-58274-0_1

Part of the Communications in Computer and Information Science book series (CCIS, volume 716)
Cite this paper as:
Martins P., Abbasi M., Cecílio J., Furtado P. (2017) Integrating Map-Reduce and Stream-Processing for Efficiency (MRSP). In: Kozielski S., Mrozek D., Kasprowski P., Małysiak-Mrozek B., Kostrzewa D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham

Abstract

Work in the field of data warehousing (DW) does not address the integration of Stream Processing (SP) to provide fresh results (i.e. results that include information not yet stored in the DW) while at the same time relaxing the DW processing load. Previous research focuses mainly on parallelization, for instance adding more hardware resources or parallelizing operators, queries, and storage. A well-known and widely studied approach is to use Map-Reduce to scale horizontally and so gain storage capacity and processing performance. In many contexts, high-rate data needs to be processed in small time windows without storing results (e.g. for near real-time monitoring); in other cases, the objective is to relax data warehouse usage (e.g. keeping results updated for web-page reloads). In both cases, stream processing solutions can be set to work together with the data warehouse (Map-Reduce or not) to keep results available on the fly, avoiding high query execution times and leaving the DW servers more available to process other heavy tasks (e.g. data mining).

In this work, we propose the integration of Stream Processing and Map-Reduce (MRSP) for better query and DW performance. This approach relaxes the data warehouse load and, as a consequence, reduces network usage. The mechanism integrates with the Map-Reduce scalability mechanisms and uses the Map-Reduce nodes to process stream queries.
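As a rough illustration of the idea of fresh results, the sketch below combines totals already stored in the warehouse with a partial aggregate computed over events that have not yet been loaded. It is a minimal sketch under stated assumptions, not the authors' implementation: the event schema, the function names reduce_stream and fresh_result, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event schema: (group key, numeric measure).
Event = Tuple[str, float]


def reduce_stream(events: Iterable[Event]) -> Dict[str, float]:
    """Reduce step on the stream side: sum the measure per key
    for events that have not yet been loaded into the warehouse."""
    partial: Dict[str, float] = defaultdict(float)
    for key, value in events:
        partial[key] += value
    return dict(partial)


def fresh_result(stored: Dict[str, float],
                 partial: Dict[str, float]) -> Dict[str, float]:
    """Merge stored warehouse totals with the stream-side partial
    aggregate, without modifying the stored totals."""
    merged = dict(stored)
    for key, value in partial.items():
        merged[key] = merged.get(key, 0.0) + value
    return merged


# Illustrative data only (assumed, not taken from the paper).
stored_totals = {"pt": 100.0, "es": 40.0}
not_yet_loaded = [("pt", 5.0), ("fr", 2.5), ("pt", 1.5)]

result = fresh_result(stored_totals, reduce_stream(not_yet_loaded))
print(result)
```

In this sketch the warehouse is queried only for the stored totals, while the pending events are aggregated separately, which is one simple way a query could be answered without waiting for the load to finish.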

The results show and compare performance gains on the DW side and the quality of experience (QoE) when executing queries and loading data.

Keywords

Complex event processing; Stream processing; Extraction, transformation and load; Distributed system; Data warehouse; Big data; Small data; Map-Reduce

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Pedro Martins (1)
  • Maryam Abbasi (1)
  • José Cecílio (1)
  • Pedro Furtado (1)

  1. Polytechnic Institute of Viseu, Department of Computer Sciences, University of Coimbra (CISUC Research Group), Coimbra, Portugal
