The techniques presented in the previous section have been implemented as an open-source library called CLoDSA (which stands for Classification, Localization, Detection, Segmentation Augmentor). CLoDSA is implemented in Python and relies on OpenCV [18] and SciPy [19] to deal with the different augmentation techniques. The CLoDSA library can be used on any operating system, and it is also independent of any particular machine learning framework.
CLoDSA configuration
The CLoDSA augmentation procedure is flexible enough to adapt to different needs, and it is based on six parameters: the dataset of images, the kind of problem, the input annotation mode, the output annotation mode, the generation mode, and the techniques to be applied. The dataset of images is given by the path where the images are located, and the kind of problem is either classification, localization, detection, segmentation, instance segmentation, stack classification, stack detection, or stack segmentation (the first five apply to datasets of 2D images, and the last three to datasets of multi-dimensional images). The other four parameters and how they are managed in CLoDSA deserve a more detailed explanation.
The input annotation mode refers to the way of providing the labels for the images. CLoDSA supports the most widely used formats for annotating classification, localization, detection, semantic segmentation and instance segmentation tasks. For example, for object classification problems, the images can be organized in folders, with the label of an image given by the name of the containing folder; another option for object classification labels is a spreadsheet with two columns that provide, respectively, the path of the image and the label; for object localization and detection there are several formats to annotate images, such as the PASCAL VOC format [21] or the OpenCV format [22]; for semantic segmentation, the annotation images can be given in a dedicated folder or in the same folder as the images; and, for instance segmentation, the COCO format is usually employed [23]. CLoDSA has been designed to manage different alternatives for the different problems, and can be easily extended to include new input modes that might appear in the future. To this aim, several design patterns, like the Factory pattern [24], and software engineering principles, such as dependency inversion or open/closed [25], have been applied. The list of input formats supported by CLoDSA for each kind of problem is given in Table 2; a detailed explanation of the process to include new formats is provided on the project webpage.
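As an illustration of the folder-based input mode for classification, the following minimal sketch derives the label of each image from the name of its containing folder. It is not CLoDSA's reader: the function name labels_from_folders and the set of accepted file extensions are assumptions made for the example.

```python
from pathlib import Path

# Assumed set of image extensions; a real reader may accept more formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def labels_from_folders(dataset_root):
    """Return (image_path, label) pairs for a dataset laid out as root/<label>/<image>.

    The label of each image is the name of the folder that contains it.
    """
    root = Path(dataset_root)
    pairs = []
    for path in sorted(root.glob("*/*")):
        # Skip sub-folders and files that are not images.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            pairs.append((path, path.parent.name))
    return pairs
```

With a layout such as dataset/cats/a.png and dataset/dogs/b.jpg, the function pairs a.png with the label cats and b.jpg with the label dogs.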
Table 2 List of supported annotation formats
The output annotation mode indicates the way of storing the augmented images and their annotations. The first option can be as simple as using the same format or approach used to input the annotations. However, this might have the drawback of storing a large number of images on the hard drive. To deal with this problem, it can be useful to store the augmented dataset using the standard Hierarchical Data Format (HDF5) [26], a format designed to store and organize large amounts of data. Since the final aim of data augmentation is to use the augmented images to train a model, another approach to the storage problem consists in feeding the augmented images directly to the model as batches, as done for instance in Keras [12]. CLoDSA features these three approaches, and has been designed to easily include new methods in the future. The complete list of output formats supported by CLoDSA is given in Table 2.
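The batch-feeding alternative can be pictured with a plain Python generator that produces augmented samples on demand instead of writing them to disk. This is a framework-agnostic sketch under simplifying assumptions (techniques are plain functions from image to image and the label is left unchanged); it reproduces neither the Keras API nor CLoDSA's implementation, and the name augmented_batches is hypothetical.

```python
def augmented_batches(samples, techniques, batch_size):
    """Lazily yield lists of (augmented_image, label) of at most batch_size items.

    samples    -- iterable of (image, label) pairs
    techniques -- list of functions mapping an image to a new image
    """
    batch = []
    for image, label in samples:
        for technique in techniques:
            batch.append((technique(image), label))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Last, possibly smaller, batch.
        yield batch
```

Because the generator holds only one batch at a time, the memory footprint is independent of the size of the augmented dataset.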
The generation mode indicates how the augmentation techniques will be applied. Currently, there are only two possible modes, linear and power; new modes can be included in the future. In the linear mode, given a dataset of n images and a list of m augmentation techniques, each technique is applied to the n images, producing at most n×m new images. The power mode is a pipeline approach where augmentation techniques are chained together. In this approach, the images produced in one step of the pipeline are added to the dataset that will be fed to the next step, so the dataset doubles at each step and the pipeline produces a total of (2^m−1)×n new images (where n is the size of the original dataset and m is the cardinality of the set of techniques of the pipeline).
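The difference between the two modes can be checked with a small sketch. It assumes that every technique is a function producing exactly one new image per input image, and it uses strings as stand-ins for images; the function names linear_mode and power_mode are illustrative and are not CLoDSA's API.

```python
def linear_mode(images, techniques):
    """Apply every technique to every original image: n*m new images."""
    return [technique(image) for technique in techniques for image in images]


def power_mode(images, techniques):
    """Chain the techniques: each step is applied to all images gathered so far."""
    pool = list(images)
    for technique in techniques:
        # The outputs of this step join the pool fed to the next step,
        # so the pool doubles in size at every step.
        pool = pool + [technique(image) for image in pool]
    # Drop the n originals: only the (2**m - 1) * n new images are returned.
    return pool[len(images):]


if __name__ == "__main__":
    images = ["a", "b", "c"]                        # n = 3
    techniques = [lambda s: s + "F",                # m = 3 toy techniques
                  lambda s: s + "R",
                  lambda s: s + "B"]
    print(len(linear_mode(images, techniques)))     # 3 * 3 = 9
    print(len(power_mode(images, techniques)))      # (2**3 - 1) * 3 = 21
```

In the power mode the pool grows from n to 2n, 4n, and so on up to (2^m)×n images after m steps; subtracting the n originals gives the (2^m−1)×n new images.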
Finally, the last but not least important parameter is the set of augmentation techniques to apply; the list of techniques available in CLoDSA is given in Table 1, and a more detailed explanation of the techniques and the parameters to configure them is provided on the project webpage. Depending on the particular problem, CLoDSA users can select the techniques best suited to their needs.
The CLoDSA architecture
In order to implement the methods presented in “Methods” section, we have followed a common pattern applicable to all the cases: the Dependency Inversion pattern [24]. We can distinguish three kinds of classes in our architecture: technique classes, which implement the augmentation techniques; transformer classes, which implement the different strategies presented in “Methods” section; and augmentor classes, which implement the functionality to read and save images and annotations in different formats. We explain the design of these classes as follows.
We have first defined an abstract class called Technique with two abstract subclasses called PositionVariantTechnique and PositionInvariantTechnique — to indicate whether the technique belongs to the position variant or invariant class — and with an abstract method called apply, that given an image produces a new image after applying the transformation technique. Subsequently, we have implemented the list of techniques presented in Table 1 as classes that extend either the PositionVariantTechnique or the PositionInvariantTechnique class, see Fig. 6.
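The hierarchy just described can be sketched with Python's abc module. The three abstract class names and the apply method come from the text above; the two concrete classes (HorizontalFlip and Invert) are hypothetical examples chosen for the sketch, and images are represented as lists of rows to keep it free of dependencies, whereas CLoDSA itself works with OpenCV images.

```python
from abc import ABC, abstractmethod


class Technique(ABC):
    """Root of the hierarchy: a technique maps an image to a new image."""

    @abstractmethod
    def apply(self, image):
        ...


class PositionVariantTechnique(Technique):
    """Techniques that move pixels, so annotations must be moved as well."""


class PositionInvariantTechnique(Technique):
    """Techniques that leave the position of every pixel unchanged."""


class HorizontalFlip(PositionVariantTechnique):
    """Hypothetical example: mirror each row of the image."""

    def apply(self, image):
        return [row[::-1] for row in image]


class Invert(PositionInvariantTechnique):
    """Hypothetical example: invert the intensity of each pixel."""

    def __init__(self, max_value=255):
        self.max_value = max_value

    def apply(self, image):
        return [[self.max_value - pixel for pixel in row] for row in image]
```

Since apply is abstract, neither Technique nor its two abstract subclasses can be instantiated; only concrete techniques can.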
Subsequently, we have defined a generic abstract class [29] called Transformer<T1,T2>, where T1 represents the type of data (2D or multi-dimensional images) to be transformed, and T2 represents the type of the annotation for T1 (for example, a box or a mask); the concrete types are fixed in the concrete classes extending the abstract class. This abstract class has two parameters, an object of type Technique and a function f from label to label, and an abstract method called transform that, given a pair (T1,T2) (for instance, in object detection, an image and a list of boxes indicating the position of the objects in the image), produces a new pair (T1,T2) using one of the augmentation strategies presented in “Methods” section; the strategy is implemented in the subclasses of the Transformer<T1,T2> class. The purpose of the function f is to allow the transform method to change not only the position of the annotations but also their associated class. As we have previously mentioned, data augmentation procedures generally apply techniques that do not change the class of the objects of the image, but there are cases when the transformation technique changes the class (for instance, if we have a dataset of images annotated with two classes, people looking to the left and people looking to the right, applying a vertical flip changes the class). The function f encodes that modification; by default, it is defined as the identity function. This part of the architecture is depicted in Fig. 7.
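A possible shape for this generic class is sketched below, using typing.Generic. Only the names Transformer, transform and f come from the text; MirrorTechnique and ClassificationTransformer are hypothetical, the pair (T1,T2) is passed as two arguments for readability, and a left-right mirror stands in for the class-changing flip of the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Generic, List, Tuple, TypeVar

T1 = TypeVar("T1")  # type of the data (e.g. an image)
T2 = TypeVar("T2")  # type of the annotation (e.g. a label, boxes, a mask)

Image = List[List[int]]  # simplification: an image is a list of rows


class MirrorTechnique:
    """Hypothetical technique: mirror the image left to right."""

    def apply(self, image: Image) -> Image:
        return [row[::-1] for row in image]


class Transformer(ABC, Generic[T1, T2]):
    """Holds a technique and a label-to-label function f (identity by default)."""

    def __init__(self, technique: Any,
                 f: Callable[[Any], Any] = lambda label: label):
        self.technique = technique
        self.f = f

    @abstractmethod
    def transform(self, data: T1, annotation: T2) -> Tuple[T1, T2]:
        ...


class ClassificationTransformer(Transformer[Image, str]):
    """Hypothetical subclass fixing T1 to an image and T2 to a class label."""

    def transform(self, data: Image, annotation: str) -> Tuple[Image, str]:
        return self.technique.apply(data), self.f(annotation)


if __name__ == "__main__":
    # f swaps the two classes because mirroring changes the gaze direction.
    swap = {"left": "right", "right": "left"}.get
    transformer = ClassificationTransformer(MirrorTechnique(), swap)
    print(transformer.transform([[1, 2, 3]], "left"))  # ([[3, 2, 1]], 'right')
```

Leaving f at its default keeps the label untouched, which is the common case where the technique does not alter the class.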
Finally, we have defined an interface called IAugmentor that has three methods: addTransformer, readDataAndAnnotations, and applyAugmentation; see Fig. 8. The classes implementing this interface are in charge of reading the data and annotations in a concrete format (using the readDataAndAnnotations method), applying the augmentation (by means of the applyAugmentation method, using objects of the class Transformer injected through the addTransformer method), and storing the result; the available input and output formats are indicated in Table 2. In order to ensure that the different objects of the architecture are constructed properly (that is, satisfying the required dependencies), the Factory pattern has been employed [24].
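The interface and the factory can be sketched as follows. The interface name and its three method names are taken from the text; everything else is an assumption made for illustration: the in-memory augmentor (which returns the augmented samples instead of storing them in a concrete format), the registry keyed by problem and input mode, and the create_augmentor function, whose signature is not CLoDSA's.

```python
from abc import ABC, abstractmethod


class IAugmentor(ABC):
    @abstractmethod
    def addTransformer(self, transformer):
        ...

    @abstractmethod
    def readDataAndAnnotations(self):
        ...

    @abstractmethod
    def applyAugmentation(self):
        ...


class InMemoryClassificationAugmentor(IAugmentor):
    """Hypothetical augmentor whose 'format' is a list of (image, label) pairs."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.transformers = []

    def addTransformer(self, transformer):
        # Transformers are injected rather than created by the augmentor.
        self.transformers.append(transformer)

    def readDataAndAnnotations(self):
        return self.samples

    def applyAugmentation(self):
        data = self.readDataAndAnnotations()
        return [transformer.transform(image, label)
                for transformer in self.transformers
                for image, label in data]


# Hypothetical registry: (kind of problem, input mode) -> augmentor class.
AUGMENTORS = {("classification", "memory"): InMemoryClassificationAugmentor}


def create_augmentor(problem, input_mode, *args):
    """Factory: build the augmentor registered for the given combination."""
    try:
        augmentor_class = AUGMENTORS[(problem, input_mode)]
    except KeyError:
        raise ValueError(f"unsupported combination: {problem!r}, {input_mode!r}")
    return augmentor_class(*args)
```

Under this design, supporting a new format amounts to writing one class that implements the interface and registering it, without touching the existing augmentors.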
Therefore, using this approach, the functionality of CLoDSA can be easily extended in several ways. It is possible to add new augmentation techniques by adding new classes that extend the Technique class. Moreover, we can also extend the kinds of problems that can be tackled in CLoDSA by adding new classes that extend the Transformer class. Finally, we can manage new input/output formats by providing classes that implement the IAugmentor interface. Several examples showing how to include new functionality in CLoDSA can be found in the project webpage.
Using CLoDSA
We finish this section by explaining the different modes of using CLoDSA. This library can be employed by both expert and non-expert users.
First of all, users who are used to working with Python libraries can import CLoDSA like any other library and use it directly in their own projects. Several examples of how the library can be imported and employed are provided on the project webpage. Such users can easily extend CLoDSA with new augmentation techniques. The second, and probably the most common, kind of CLoDSA users are researchers who know how to employ Python but do not want to integrate CLoDSA with their own code. For them, we have provided several Jupyter notebooks that illustrate how to employ CLoDSA for data augmentation in several contexts; again, the notebooks are provided on the project webpage and also as supplementary materials. An example of this interaction is provided in Appendix A.
CLoDSA can also be employed without any knowledge of Python. To this aim, CLoDSA can be executed as a command line program that is configured by means of a JavaScript Object Notation (JSON) file [30]. Therefore, users who know how to write JSON files can employ this approach. Finally, since the creation of a JSON file might be a challenge for some users, given the great variety of options to configure the library, we have created a step-by-step Java wizard that guides the user through the process of creating the JSON file and invoking the CLoDSA library. In this way, instead of writing a JSON file, users select the different options for augmenting their dataset of images in a simple graphical user interface, and the wizard is in charge of generating the JSON file and executing the augmentation procedure. Besides, since new configuration options might appear in the future for CLoDSA, the Java wizard can include those options by modifying a configuration file; this avoids having to modify the Java wizard every time a new option is included in CLoDSA.
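To give an idea of what such a JSON configuration has to capture, the sketch below validates a file containing the six parameters described in the configuration subsection. The key names and the example values "folders" and "flip" are hypothetical and are not taken from CLoDSA's actual JSON schema, which is documented on the project webpage; only the kinds of problem and the two generation modes are copied from the text.

```python
import json

# Hypothetical key names, one per parameter; NOT CLoDSA's real schema.
REQUIRED_KEYS = ("dataset_path", "problem", "input_annotation_mode",
                 "output_annotation_mode", "generation_mode", "techniques")

# Kinds of problem and generation modes listed in the text.
PROBLEMS = {"classification", "localization", "detection", "segmentation",
            "instance segmentation", "stack classification",
            "stack detection", "stack segmentation"}
GENERATION_MODES = {"linear", "power"}


def load_config(text):
    """Parse a JSON configuration and check the six parameters."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    if config["problem"] not in PROBLEMS:
        raise ValueError(f"unknown kind of problem: {config['problem']!r}")
    if config["generation_mode"] not in GENERATION_MODES:
        raise ValueError(f"unknown generation mode: {config['generation_mode']!r}")
    return config


if __name__ == "__main__":
    example = json.dumps({
        "dataset_path": "images/",
        "problem": "classification",
        "input_annotation_mode": "folders",    # hypothetical value
        "output_annotation_mode": "folders",   # hypothetical value
        "generation_mode": "linear",
        "techniques": ["flip"],                # hypothetical technique name
    })
    print(load_config(example)["generation_mode"])  # linear
```

A wizard such as the one described above would generate a file of this kind from the options chosen in the graphical interface, so that users never write the JSON by hand.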