Chrum: The Tool for Convenient Generation of Apache Oozie Workflows

Chapter

Abstract

Conducting research in an efficient, repeatable, and evaluable way that is also convenient in terms of development has always been a challenge. To satisfy these requirements in the long term while minimizing the costs of the software engineering process, one has to follow a certain set of guidelines. This article describes such guidelines, based on the research environment called the Content Analysis System (CoAnSys), created at the Center for Open Science (CeON). In addition to best practices for working in the Apache Hadoop environment, Chrum, a tool for the convenient generation of Apache Oozie workflows, is presented.
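The abstract does not describe Chrum's own input syntax, so the following is not Chrum. It is a minimal Python sketch of the general idea of generating an Oozie workflow definition programmatically instead of writing the XML by hand. It assumes a purely linear pipeline of Oozie `java` actions. The function name `build_workflow`, the schema version `uri:oozie:workflow:0.4`, and the example class names are illustrative assumptions, and `${jobTracker}` and `${nameNode}` are left as Oozie parameters to be supplied in the job properties.

```python
import xml.etree.ElementTree as ET


def build_workflow(name, steps):
    """Build a linear Oozie workflow: start -> step 1 -> ... -> end.

    Hypothetical helper (not part of Chrum or CoAnSys).
    steps: list of (action_name, main_class) pairs; each becomes a <java> action.
    Every action routes errors to a shared <kill> node named "fail".
    """
    app = ET.Element("workflow-app", {"xmlns": "uri:oozie:workflow:0.4", "name": name})
    # The workflow starts at the first action, or goes straight to "end" if empty.
    ET.SubElement(app, "start", {"to": steps[0][0] if steps else "end"})
    for i, (action_name, main_class) in enumerate(steps):
        # On success, continue with the next step; the last step goes to "end".
        next_node = steps[i + 1][0] if i + 1 < len(steps) else "end"
        action = ET.SubElement(app, "action", {"name": action_name})
        java = ET.SubElement(action, "java")
        ET.SubElement(java, "job-tracker").text = "${jobTracker}"
        ET.SubElement(java, "name-node").text = "${nameNode}"
        ET.SubElement(java, "main-class").text = main_class
        ET.SubElement(action, "ok", {"to": next_node})
        ET.SubElement(action, "error", {"to": "fail"})
    # Shared failure node, followed by the mandatory end node.
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Failed: ${wf:errorMessage(wf:lastErrorNode())}"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical two-step pipeline; the class names are placeholders.
    print(build_workflow("demo", [("extract", "com.example.Extract"),
                                  ("index", "com.example.Index")]))
```

The design choice being illustrated is that the transitions (`ok`/`error` targets) are derived from the order of the steps rather than written manually, which removes a common source of mistakes in hand-written workflow files.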

Keywords

Hadoop; Research environment; Big data; CoAnSys; Text mining

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warszawa, Poland
