© 2016

Practical Hadoop Ecosystem

A Definitive Guide to Hadoop-Related Frameworks and Tools


Table of contents

  1. Front Matter
    Pages i-xx
  2. Fundamentals

    1. Front Matter
      Pages 1-1
    2. Deepak Vohra
      Pages 3-162
    3. Deepak Vohra
      Pages 163-205
  3. Storing & Querying

    1. Front Matter
      Pages 207-207
    2. Deepak Vohra
      Pages 209-231
    3. Deepak Vohra
      Pages 233-257
  4. Bulk Transferring & Streaming

    1. Front Matter
      Pages 259-259
    2. Deepak Vohra
      Pages 261-286
    3. Deepak Vohra
      Pages 287-300
  5. Serializing

    1. Front Matter
      Pages 301-301
    2. Deepak Vohra
      Pages 303-323
    3. Deepak Vohra
      Pages 325-335
  6. Messaging & Indexing

    1. Front Matter
      Pages 337-337
    2. Deepak Vohra
      Pages 339-347
    3. Deepak Vohra
      Pages 349-376
    4. Deepak Vohra
      Pages 377-414
  7. Back Matter
    Pages 415-421

About this book


This book is a practical guide to using Apache Hadoop ecosystem projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter is a practical tutorial on using an Apache Hadoop ecosystem project. While several books on Apache Hadoop are available, most are based on the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.

What you'll learn
  • How to set up a Linux environment for Hadoop projects using the Cloudera Hadoop distribution, CDH 5
  • How to run a MapReduce job
  • How to store data with Apache Hive and Apache HBase
  • How to index data in HDFS with Apache Solr
  • How to develop a Kafka messaging system
  • How to develop a Mahout User Recommender System
  • How to stream logs to HDFS with Apache Flume
  • How to transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
  • How to create a Hive table over Apache Solr


Hadoop framework, big data, cloud database, HBase, Apache Hadoop, Apache HBase, Apache, open source

Authors and affiliations

  1. White Rock, Canada

About the authors

Deepak Vohra is a coder, developer, programmer, book author, and technical reviewer.

Bibliographic information