Pro Hadoop Data Analytics

Designing and Building Big Data Systems using the Hadoop Ecosystem

  • Kerry Koitzsch

Table of contents

  1. Front Matter
  2. Concepts
  3. Architectures and Algorithms
  4. Components and Systems
  5. Case Studies and Applications
  6. Back Matter

About this book


Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.

In Pro Hadoop Data Analytics, best practices are emphasized to ensure coherent, efficient development. A complete example system is developed using standard third-party components: toolkits, libraries, visualization and reporting code, and the supporting glue code needed to provide a working, extensible end-to-end system.

The book emphasizes four important topics:

  • The importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. Deep-dive topics include Spark, H2O, Vowpal Wabbit, Stanford NLP, and other appropriate toolkits and plugins.
  • Best practices and structured design principles. This includes strategic topics as well as how-to examples.
  • The importance of mix-and-match or hybrid systems, using different analytical components in one application to accomplish application goals. The hybrid approach will be prominent in the examples.
  • Use of existing third-party libraries is key to effective development. Deep-dive examples of the functionality of some of these toolkits are showcased as you develop the example system.

Keywords: Analytics, Data Analytics, Hadoop, Scala, Python, Maven, OpenCV, Apache Mahout, NoSQL, Relational Database, Lucene, Solr, Architecture, Algorithms, Data Visualisation, Machine Learning

Authors and affiliations

  • Kerry Koitzsch, Sunnyvale, USA

Bibliographic information