In the previous chapter, we focused on long-running processing jobs, which run in a Hadoop cluster and leverage YARN or Hive. In this chapter, I would like to introduce you to what I call the 2014 way of processing data: streaming. Indeed, more and more data processing infrastructures rely on streaming or logging architectures that ingest the data, apply some transformations, and then transport the data to a persistence layer.


Keywords: Configuration File, Processing Pipeline, Site Traffic, Hadoop Cluster, Clickstream Data

Copyright information

© Bahaaldine Azarmi 2016

Authors and Affiliations

  • Bahaaldine Azarmi (1)
  1. Saint Cloud, France
