FormalPara What You Will Learn in This Chapter

The aim of this workflow is to quantify the morphology of pancreatic stem cells lying on a 2D polystyrene substrate from phase contrast microscopy images. For this purpose, the images are first processed with a Deep Learning model trained for semantic segmentation (cell/background); next, the result is refined and individual cell instances are segmented before characterizing their morphology. Through this workflow the readers will learn the nomenclature and understand the principles of Deep Learning applied to image processing. Having followed all the steps in this chapter, the reader is expected to know how to use Google Colaboratory (Bisong, 2019) notebooks, ImageJ/Fiji (Schindelin et al., 2012; Schneider et al., 2012; Rueden et al., 2017), DeepImageJ (Gómez-de Mariscal et al., 2019) and MorpholibJ (Legland et al., 2016). This complete workflow sets the basis to develop further methods in the field of Bioimage Analysis using Deep Learning. All the material needed for this chapter is provided in the following GitHub repository (under chap 4): https://github.com/NEUBIAS/neubias-springer-book-2021.Footnote 1

4.1 Why You Should Know About Deep Learning

The workflow presented in this Chapter extracts binary masks for cells in 2D phase contrast microscopy images, identifies the cells in the image and quantifies their morphology. The central component of the workflow is the step to obtain a binary mask to distinguish the pixels belonging to the cells from the rest of pixels in the image. In particular, we will train a well established Deep Learning architecture called U-Net (Ronneberger et al., 2015; Falk et al., 2019) to perform this task.

Machine Learning and Deep Learning have become common technical terms in the life sciences. They are now large fields of study that have boosted both research and industry. While both are strongly related, they also belong to a larger field called Artificial Intelligence, which pursues mimicking (or even surpassing) human intelligence with a machine (Goodfellow et al., 2016). The techniques used to extract the relevant information from data and exploit it in an intelligent way are what we call Machine Learning (ML). ML techniques are commonly divided into two main groups: supervised and unsupervised methods. Supervised learning is the task of learning a function that maps an input to an output based on sample input-output pairs; that is, such a function is inferred from labeled training data consisting of a set of training examples. When no labels or information about the correct output are given, we are talking about unsupervised learning, and the corresponding function is inferred from the data structure only. All clustering methods are thus included in the latter group.

A simple example of ML is a linear classifier, technically called a perceptron (Rosenblatt, 1961), which is able, for example, to split a set of 2D points into two different classes. In practice, ML classifiers operate on objects of much higher dimension (e.g., images) and solve tasks far more complex than classifying input data into two groups. For this reason, multiple perceptrons are stacked together to build what is known as an Artificial Neural Network (ANN). That is, we define deep architectures that support richer mathematical representations of our data. This, combined with a suitable training schedule, allows the computer to learn the patterns needed to perform the desired task. This approach is called Deep Learning (DL from now on) and, at the moment, it has proven to be among the most powerful frameworks for supervised learning.

What sets DL apart from classical approaches is that the system learns automatically from the data, without any definition or explicit programming of complex heuristic rules. A pioneering work using DL for bioimage analysis is the Convolutional Neural Network (CNN) architecture called U-Net (Ronneberger et al., 2015). It was first introduced to the community in 2015 at the International Symposium on Biomedical Imaging (ISBI) and then published at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference, two of the most important conferences for biomedical image analysis. Since then, a growing number of manuscripts (about 390 in 2020 according to PubMed) related to biomedical image analysis using DL are published every year (Litjens et al., 2017).

Note that DL techniques require not only sophisticated algorithms but also large sets of (manually) annotated images and an enormous amount of computational power. Data collection can be a whole project in itself in Computer Vision (Roh et al., 2021), not only because it is critical for the success of ML techniques, but also because of the complexity, time, and economic cost involved in handling large amounts of data. In contrast with other fields of Computer Vision, the availability of useful, large and robustly annotated datasets in bioimage analysis is still a bottleneck for the use of DL. This is due to the high economic cost of their acquisition and the expertise needed to generate manual annotations. Indeed, preparing manual annotations can be tedious and often not feasible. Some freely available annotation tools are QuPath (Bankhead et al., 2017), 3D Slicer (Kapur et al., 2016), Paintera,Footnote 2 Mastodon,Footnote 3 Catmaid (Saalfeld et al., 2009), TrakEM2 (Cardona et al., 2012), Napari (Sofroniew et al., 2020) and ITK-SNAP (Yushkevich et al., 2006); they offer a wide range of possibilities to simplify the annotation process and make it reasonably efficient. However, there is still a need for a general approach to annotate complex structures in higher dimensions (i.e., 3D, time, multiple channels, multi-modality images). Additionally, the large variability among images acquired with exactly the same setup but in a different laboratory or by a different technician often prevents the direct transfer of trained DL models. For this reason, we want to warn the reader about the necessity of retraining the provided DL model on the target data to be processed. Fortunately, as will be demonstrated, this is quite simple to do with a basic knowledge of Python and libraries such as TensorFlow (Abadi et al., 2016), Keras (Chollet et al., 2015), or Pytorch (Paszke et al., 2019), which release the user from many computational and programming technicalities. Other, even more user-friendly frameworks are Ilastik (Berg et al., 2019), ImJoy (Ouyang et al., 2019), ZeroCostDL4Mic (von Chamier et al., 2020), and the ones integrated in Fiji/ImageJ, CSBDeep (Weigert et al., 2018) and deepImageJ (Gómez-de Mariscal et al., 2019). These tools allow the direct use and/or retraining of DL models using zero code.

(Re)training DL models requires considerable computational power. The use of a graphics processing unit (GPU) such as the ones found in modern graphics boards, or specialized tensor processing units (TPU), is strongly recommended in most cases to speed up the training process. Access to these resources is possible through non-free cloud computing services such as the ones provided by Amazon or Google. Fortunately, there is a free alternative available for Google users through the Google Colaboratory ("Google Colab") framework (Bisong, 2019). It provides serverless Python Jupyter notebooks running on this hardware with pre-installed DL libraries. The use of these resources is limited but most of the time sufficient to train and test bioimage analysis (BIA) models.

4.2 Dataset

Fig. 4.1
figure 1

Example of training data. From left to right: phase contrast microscopy image (scale bar: 150 \(\mu \)m), ground truth (GT) manually annotated cells, corresponding cell-contours, and a mask with 3 labels (background, cell or cell contour)

The original data processed by this workflow can be found on the web page of the Cell Tracking Challenge (CTC) (Maška et al., 2014; Ulman et al., 2017).Footnote 4 It is provided as two independent datasets (training and challenge) since it aims to benchmark (evaluate) cell segmentation and tracking computational methods. The training set is the only one for which Ground TruthFootnote 5 (GT) is publicly available. Additionally, the CTC provides a set called Silver TruthFootnote 6 (ST). The ST set is much larger than the GT set, so it is more suitable for DL tasks. An example of training data is illustrated in Fig. 4.1.

For this work, we will use the training set of the challenge and the ST annotations to train and evaluate our method. The ST is processed to extract the contours of each cell that will be used by the workflow (Fig. 4.1). A ready-to-use dataset is provided.Footnote 7 Note that the data is distributed into three groups (training, validation and test). We will elaborate more on this in the following sections. For the final step of the workflow, we will apply the trained models to unseen data for which manual annotations are not available. For this, we will use the challenge data provided at the CTC web page.Footnote 8 In a real case scenario, the trained models are always applied to unseen data, with no GT available, otherwise we would not need to train any method!

4.3 Tools

Some tools and software packages need to be installed to run the workflow:

  • FijiFootnote 9

    To install Fiji plugins, in Fiji, click on Help > Update... Once the ImageJ Updater opens, click on Manage update sites. There you need to select the IJPB-plugins for MorpholibJ. To install deepImageJ, you need to click on Add update site. Then, fill the fields with Name: DeepImageJ and update site URL. Click on Close and Apply changes.

  • Python Notebooks: they can be executed locally or in Google ColaboratoryFootnote 10 which provides free access to cloud GPU. The latter requires a Google account.

    • Link to the notebook.Footnote 11

    • Link to open the notebook directly in Google Colaboratory.Footnote 12 It is recommended to make a local copy of the Notebook, as it will be editable.

4.4 Workflow

The steps of the workflow covered in this chapter are summarized in Fig. 4.2.

Fig. 4.2
figure 2

Summary of the proposed workflow

4.4.1 Step 1: Setting up a Google Colaboratory Notebook

Fig. 4.3
figure 3

Setting up a Google Colab notebook. (a) Go to "Change runtime type" and (b) make sure to choose GPU hardware

After opening a Google Colab notebook, we configure the hardware needed for its execution. In this case, we set up a GPU runtime (Fig. 4.3). Now we can run the notebook. The way to proceed is by clicking on the "play" button on the left side of each code cell. For example, the first cell installs the correct version of the required DL libraries (TensorFlow and Keras). This is critical for the reproducibility of the results, since the behavior of some functions can differ between versions, or the code may even crash (Fig. 4.4).

4.4.2 Step 2: Download and Split the Data into Training, Validation and Test

When using ML methods, we need to split the available annotated (GT) data into three exclusive sets: training, validation and test. The training set is used to train the method and let it learn the task of interest (e.g., binary segmentation). Such a set needs to be large enough to cover all representative scenarios (e.g., poor signal-to-noise ratio, blurred images) and events visible in the data (e.g., artifacts, debris, mitosis, apoptosis, clusters of cells). The validation set, as indicated by its name, serves to evaluate the performance of the method during training, to ensure that it is learning and to prevent over-fitting.Footnote 13 The test set is used to assess the performance of the method once the training procedure has finished. Both validation and test sets need to be independent of the training set, so that when the accuracy of the model becomes acceptable on the validation set, we can be confident that this is because the model is properly trained and has not over-fit the training set. The evaluation of the model on the test set assesses its ability to generalize to unseen data.

Fig. 4.4
figure 4

Execution of the first code cell. Every piece of code is run by clicking on the play button (red square) of each code cell

The GT data, in this particular case, consists of two independent time-lapse videos (sequences 01 and 02). Some frames from sequence 01 are used as training data, while frames from sequence 02 are used for both validation (frames \(140, 150, \ldots , 250\)) and test (frames \(151, 152,\ldots ,248, 249\)). This data organization is compiled in a zip file that needs to be downloaded and unzipped (in the cloud, if running the workflow in Google Colab). These operations are performed in the second code cell by the following commands:

figure a
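The exact commands are shown in the code figure above. For orientation only, a download-and-unzip cell in Python might look like the following sketch, where the dataset URL is a hypothetical placeholder and not the actual link (see Footnote 7):

```python
# Minimal sketch (Google Colab): download the ready-to-use dataset and unzip it.
# The URL below is a hypothetical placeholder; use the dataset link given in the
# chapter's notebook.
import urllib.request
import zipfile

dataset_url = "https://example.org/dataset.zip"   # placeholder, not the real link
urllib.request.urlretrieve(dataset_url, "dataset.zip")

with zipfile.ZipFile("dataset.zip", "r") as zf:
    zf.extractall(".")                            # creates the 'dataset' folder
```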

After decompression, the new folder called dataset contains three sub-folders (input, binary_masks and contours) for the three different sets.

4.4.3 Step 3: Train a Deep Learning Model for Binary Segmentation

A U-Net DL network is designed and trained to segment the cells in the images. We train the network using the original 2D phase contrast microscopy images as input, and a set of three binary masks as output: (1) a background mask (pixel value 1 for background and 0 elsewhere), (2) a cell mask (1 for cells and 0 elsewhere), and (3) a cell-contour mask (1 for cell contours and 0 elsewhere). In other words, the network will learn to classify each input pixel as belonging to one of three classes: background, foreground or contour.

Since the classification is performed per pixel, this process is called semantic segmentation, as opposed to instance segmentation, for which the model outputs a unique label per object of interest (here, independent cells).

4.4.3.1 Step 3.1: Preparing the Data for Training

Read the images for training and store them into memory by running the following code:

figure b
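The chapter's code is shown in the figure above; a minimal sketch of such an image-loading cell, with illustrative folder and variable names, could be:

```python
# Minimal sketch: load the training images and their annotations into memory.
# Folder layout and variable names are illustrative and may differ from the notebook.
import os
from skimage import io

train_input_filenames = sorted(os.listdir('dataset/input/train'))
# assuming the annotation files share the input file names
train_img      = [io.imread(os.path.join('dataset/input/train', f))
                  for f in train_input_filenames]
train_masks    = [io.imread(os.path.join('dataset/binary_masks/train', f))
                  for f in train_input_filenames]
train_contours = [io.imread(os.path.join('dataset/contours/train', f))
                  for f in train_input_filenames]
print('{} training images loaded.'.format(len(train_img)))
```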

You should get the following message together with the figures from Fig. 4.5.

figure c
Fig. 4.5
figure 5

Output of "Preparing the data for training" code section displaying one training image and corresponding annotations

The U-Net network we are going to train has \(\sim 500,000\) trainable parameters, which requires a large amount of memory. Thus, to reduce memory usage and make the training fit into the hardware offered by Google Colab, we crop small random patches of size \(256\times 256\) pixels from the original images. To do so, we create a function that crops a fixed number of patches from each image. We need to make sure that the patches cropped from the input image and from the output annotations (binary masks) correspond to each other. Then, we use this function to crop patches from the training data in the following code section:

figure d
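A minimal sketch of such a patch-extraction function is given below; the function and variable names are illustrative and not necessarily those used in the chapter's notebook:

```python
# Minimal sketch of random patch extraction. imgs, masks and contours are assumed
# to be lists of 2D NumPy arrays of matching sizes.
import numpy as np

def create_random_patches(imgs, masks, contours, n_patches, size=(256, 256)):
    """Crop n_patches random patches per image, using the same coordinates for
    the input image and its annotations so that they stay aligned."""
    patch_x, patch_y, patch_c = [], [], []
    for img, msk, cnt in zip(imgs, masks, contours):
        h, w = img.shape[:2]
        for _ in range(n_patches):
            top = np.random.randint(0, h - size[0] + 1)
            left = np.random.randint(0, w - size[1] + 1)
            patch_x.append(img[top:top + size[0], left:left + size[1]])
            patch_y.append(msk[top:top + size[0], left:left + size[1]])
            patch_c.append(cnt[top:top + size[0], left:left + size[1]])
    return np.array(patch_x), np.array(patch_y), np.array(patch_c)
```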

We choose to normalize the intensity values of the input and output images between 0.0 and 1.0. This way, a common range of values is set for all the images without changing the differences among them or their properties. It helps the network find optimal parameters that generalize well and, in some cases, it speeds up the training.

Note that the class of each pixel is mathematically written using a one-hot encoding representation, for which we need three binary matrices (one per class) for each image. Hence, a background pixel is encoded as \(\left[ 1,0,0\right] \), a foreground (cell) pixel as \(\left[ 0,1,0\right] \), and a cell-contour pixel as \(\left[ 0,0,1\right] \). This is performed by the following code section:

figure e
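The following sketch illustrates one possible way to perform the normalization and one-hot encoding described above (variable names are illustrative):

```python
# Minimal sketch: intensity normalization to [0, 1] and one-hot encoding of the
# three classes (background, cell, contour). Variable names are illustrative.
import numpy as np

# X: input patches, Y_cells: binary cell masks, Y_cont: binary contour masks
X = X.astype(np.float32)
X = (X - X.min()) / (X.max() - X.min() + 1e-10)    # intensities scaled to [0, 1]

cell       = (Y_cells > 0).astype(np.float32)
contour    = (Y_cont > 0).astype(np.float32)
background = 1.0 - np.clip(cell + contour, 0, 1)

# One-hot encoded output of shape (n, 256, 256, 3): [background, cell, contour]
Y = np.stack([background, cell, contour], axis=-1)
X = X[..., np.newaxis]                             # add channel axis: (n, 256, 256, 1)
```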

Exercise 1

Repeat the same procedure for the validation set. You should obtain two variables X_val and Y_val with shapes \(n\times 256\times 256\times 1\) and \(n\times 256\times 256\times 3\), respectively, n being the total number of patches generated from the validation set. We recommend generating 6 patches for each image, as there are only 11 images in the validation set and only small patches are cropped from them.

4.4.3.2 Step 3.2: Building a U-Net Shaped Convolutional Neural Network

Fig. 4.6
figure 6

(a) Convolution of an image using a kernel of size \(3\times 3\). (b) 2 level encoding of an input image into a feature space using convolutions and downsamplings. (c) 2 level decoding of a set of features into the original spatial dimension. In (b) and (c), the convolutional layers have 3 and 9, and 4 and 3 filters, respectively. All the kernels have size \(3\times 3\) and their weights are trainable parameters that are optimized during the training. Downsampling and upsampling have size \(2\times 2\), so the image size is halved and doubled, respectively

Fig. 4.7
figure 7

Architecture of the U-Net-like convolutional neural network used in the workflow

The key component of any DL method used for image analysis is the convolutional layer, which convolves the input image with a filter kernel (convolution matrix), i.e., a small matrix of coefficients (see Fig. 4.6a). Convolution is a (linear) operation that sums the elements in a local neighbourhood of the image, each weighted by the corresponding kernel coefficient, with the aim of producing an effect on the input image (e.g., blurring, enhancement, edge detection). In the DL context, we use the word kernel when referring to this small matrix, and its coefficients are called the kernel weights. The learning process consists of finding the optimal weights for each convolutional kernel. Most of the time, the features extracted by the convolutional layers alone are not complex enough to represent and analyze the relevant information in the image. A common strategy is to encode the features into a high-dimensional space, process them, and recover the original spatial representation by decoding the processed features. In the encoding path, the number of filters in the convolutional layers is increased while the size of the image is decreased; this way, a higher-dimensional feature space is reached (see Fig. 4.6b). To recover the original spatial representation, the number of filters is decreased as the spatial dimensions are increased (see Fig. 4.6c). The architectures that follow this scheme are called encoder-decoders. A well-established encoder-decoder for biomedical image analysis is the U-Net, which has encoding levels in the contracting path (the encoder), a bottleneck, and decoding levels in the expanding path (the decoder). See Fig. 4.7 for a graphical description of the U-Net-like architecture used in the current workflow.

The layers in Keras can be defined as output = Operation(number of filters, size)(input). Some additional arguments that can be specified are: the type of activation function used in the convolutional layer (activation), the initial distribution of the weights (kernel_initializer), and whether to use zero padding or not to preserve the size of the images after every convolution (padding).

The encoding path of the U-Net can be programmed simply by downsampling the image; here we use AveragePooling2D.Footnote 14 Similarly, the decoding can be achieved by upsampling. However, in this case, we decided to use transposed (sometimes called inverse) convolutions (Conv2DTranspose), whose weights are trained just like those of the convolutional layers. The final configuration is as follows:

figure f
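The code figure above contains the chapter's exact network definition. As an illustration only, a small U-Net-like model built from the same ingredients (convolutions, AveragePooling2D downsampling, Conv2DTranspose upsampling, skip connections and a three-class softmax output) could be assembled in Keras as follows; the number of levels and filters of the actual network (Fig. 4.7) may differ:

```python
# Minimal sketch of a U-Net-like model in Keras (TensorFlow backend). It only
# illustrates how such an architecture is assembled; it is not the chapter's model.
from tensorflow.keras.layers import (Input, Conv2D, AveragePooling2D,
                                     Conv2DTranspose, concatenate)
from tensorflow.keras.models import Model

def conv_block(x, n_filters):
    # Two 3x3 convolutions with ReLU activation and zero padding ('same')
    x = Conv2D(n_filters, (3, 3), activation='relu', padding='same',
               kernel_initializer='he_normal')(x)
    x = Conv2D(n_filters, (3, 3), activation='relu', padding='same',
               kernel_initializer='he_normal')(x)
    return x

inputs = Input(shape=(None, None, 1))          # grayscale input of arbitrary size

# Contracting path (encoder)
c1 = conv_block(inputs, 16)
p1 = AveragePooling2D((2, 2))(c1)              # downsampling: image size halved
c2 = conv_block(p1, 32)
p2 = AveragePooling2D((2, 2))(c2)

# Bottleneck
b = conv_block(p2, 64)

# Expanding path (decoder) with skip connections
u2 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(b)
u2 = concatenate([u2, c2])                     # skip connection from the encoder
c3 = conv_block(u2, 32)
u1 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same')(c3)
u1 = concatenate([u1, c1])
c4 = conv_block(u1, 16)

# One output channel per class (background, cell, contour), softmax per pixel
outputs = Conv2D(3, (1, 1), activation='softmax')(c4)
model = Model(inputs=inputs, outputs=outputs)
model.summary()
```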

Note that the layers are sequentially connected, that is, the output of a layer is the input of the following layer.

4.4.3.3 Step 3.3: Loss and Accuracy Measures

The training schedule is a common optimization process. During each iteration of the training, the output of the CNN is compared with the corresponding GT through a loss function (which summarizes the differences between them as a numerical value). Hence, the learning process consists in minimizing the loss function. To perform this optimization, the gradient of the loss function is computed and the network parameters (the kernel weights) are updated accordingly, taking steps along the gradient direction that decreases the loss, with step sizes proportional to the learning rate.

The most common loss functions are the mean squared error (MSE), the binary cross-entropy (BCE) and the categorical cross-entropy (CCE). MSE is used for regression problems (when the output is not a class but a continuous value), while BCE and CCE are used in classification tasks. Patterson and Gibson (2017) provide further details about loss functions in DL. TensorFlow and Keras also provide many ready-to-use loss functions.Footnote 15 Standard optimizers for neural networks are Stochastic Gradient Descent (SGD) (Kiefer et al., 1952), Root Mean Square propagation (RMSprop)Footnote 16 and Adaptive Moment Estimation (Adam) (Kingma and Ba, 2014). The latter is an optimization algorithm specifically designed for DL.

Here, we use the CCE loss function (Eq. 4.1), and the Adam optimizer with a learning rate set to 0.0003 (experimentally estimated but learning rates are typically in this range of values; see comments in Appendix):

$$\begin{aligned} CCE(y, p) = -\sum \limits _{c=1}^{C}y_{i,c}\log (p_{i,c}) \end{aligned}$$
(4.1)

where y is the GT, p the predicted value, C the total number of classes (\(C=3\) in this case), \(y_{i,c}=1\) if the class of observation i is c and 0 otherwise, and \(p_{i,c}\) is the predicted probability for observation i of being of class c. The values of the loss function are usually difficult to interpret on their own: the lower the value, the better the performance, but there is no fixed scale. The accuracy measure gives an indication of how close the output of the network is to the Ground Truth. This metric is easier to interpret and visualize than the loss value, but it is not suitable to guide the network optimization during training. Its values are limited to the \(\left[ 0,1\right] \) range, 1 being a perfect match between the result and the GT. Some standard accuracy measures for classification are the Jaccard index (also called Intersection over Union (IoU)), the Dice coefficient, the Hausdorff distance and the rate of True or False Positives and Negatives.

In Keras, many standard loss functions are available but we need to define a suitable accuracy measure for the problem at hand. As we deal with a segmentation task, we will use the Jaccard index, a good indicator of the overlap between our predicted and target segmented cells. It is defined for a binary image as:

$$\begin{aligned} J(y, p) = \frac{|y\cap p|}{| y\cup p|} = \frac{TP}{TP + FN + FP} \end{aligned}$$
(4.2)

where y is the GT, p the predicted value, TP the true positives, FN the false negatives and FP the false positives. Note that the Jaccard index measures the overlap between the predicted and GT masks as the ratio of correctly classified foreground pixels over the union of both masks. Although the network output has three channels (background, foreground and object-contours), we compute the accuracy measure as the average Jaccard index of the last two classes (channels). Since many pixels belong to the background class, including it in the computation would produce misleadingly high Jaccard index values. A function computing this metric can be implemented in TensorFlow as follows:

figure g
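A possible sketch of such a metric, averaging the Jaccard index over the cell and contour channels, is shown below; it is not necessarily identical to the chapter's implementation:

```python
# Minimal sketch of the Jaccard index as a Keras metric, averaged over the cell
# and contour channels (channels 1 and 2) and ignoring the background channel.
from tensorflow.keras import backend as K

def jaccard_index(y_true, y_pred):
    jac = 0.0
    for c in [1, 2]:                                  # skip the background channel
        t = K.round(y_true[..., c])
        p = K.round(y_pred[..., c])
        intersection = K.sum(t * p)
        union = K.sum(t) + K.sum(p) - intersection
        jac += intersection / (union + K.epsilon())   # avoid division by zero
    return jac / 2.0
```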

Once the network and all the required functions have been defined, we can compile the model by calling:

figure h
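As a sketch, and assuming the metric defined above is called jaccard_index, the compilation call could look like this (in older TensorFlow/Keras versions the Adam argument is lr instead of learning_rate):

```python
# Minimal sketch: compile the model with the Adam optimizer (learning rate 0.0003),
# the categorical cross-entropy loss and the Jaccard index defined above as metric.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0003),
              loss='categorical_crossentropy',
              metrics=[jaccard_index])
```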

4.4.3.4 Step 3.4: Executing the Training Schedule

We set up the training schedule with a maximum of 100 epochsFootnote 17 and a batch sizeFootnote 18 of 10. The validation accuracy is monitored during the training. If it does not improve for a certain number of epochs (called the patience), the training process is interrupted and the best performing instance of the model is returned. The patience is initially set to 50 using the EarlyStopping callback of Keras.

To execute the training process, we just need to specify the training (X_train and Y_train) and the validation data (X_val and Y_val). During the training, the model (variable model) is automatically updated:

figure i
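A sketch of this training call, assuming the metric is named jaccard_index so that Keras exposes its validation value as val_jaccard_index, could be:

```python
# Minimal sketch of the training call: 100 epochs, batch size 10, early stopping
# on the validation Jaccard index with a patience of 50 epochs.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_jaccard_index', mode='max',
                           patience=50, restore_best_weights=True)

history = model.fit(X_train, Y_train,
                    validation_data=(X_val, Y_val),
                    epochs=100, batch_size=10,
                    callbacks=[early_stop])
```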

It is possible to store the details of the training for each epoch (variable history in the code) and plot them afterwards (Fig. 4.8):

figure j
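A minimal plotting sketch using the history variable (assuming the metric name jaccard_index used above) could be:

```python
# Minimal sketch: plot the loss and Jaccard index stored in `history` for both the
# training and the validation data (compare with Fig. 4.8).
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(history.history['loss'], label='training')
ax1.plot(history.history['val_loss'], label='validation')
ax1.set_xlabel('epoch'); ax1.set_ylabel('CCE loss'); ax1.legend()

ax2.plot(history.history['jaccard_index'], label='training')
ax2.plot(history.history['val_jaccard_index'], label='validation')
ax2.set_xlabel('epoch'); ax2.set_ylabel('Jaccard index'); ax2.legend()
plt.show()
```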
Fig. 4.8
figure 8

Plotting the training loss and Jaccard index per epoch. The training was set to 100 epochs and values stored in the variable history are displayed. Two metrics are calculated: Categorical Cross Entropy (CCE) and Jaccard index, as loss and accuracy. The values for the training data are shown in blue, and for validation in orange

In Fig. 4.8, we can observe that the loss value on the training dataset decreases after each epoch, while the loss on the validation data only decreases until about epoch 40 and then starts to increase slightly. This is a sign that the training cannot further improve the model and could even degrade it by over-fitting to the training dataset. A similar behavior can be observed when looking at the Jaccard index: the method could still improve on the training dataset but not on the validation set. This is a second hint that the model has been optimized as much as possible given the training data.

Exercise 2

Train the network using a smaller number of images. This can easily be done by reducing the file lists train_input_filenames, train_masks_filenames and train_contours_filenames in Step 3.1. You will notice that, when using few images, the accuracy of the network on the validation and test data decreases. We suggest increasing the number of epochs so you can also visualize any over-fitting, or determine whether the network needs a longer training process.

4.4.4 Step 4: Evaluating the Trained Model

Keras enables a simple evaluation of the performance of the method, as long as the same information as used for training (input and GT images) is available for the test dataset. For this, we just need to initialize two variables X_test and Y_test (see Exercise 3):

figure k
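A sketch of this evaluation call could be:

```python
# Minimal sketch: evaluate the trained model on the test set. The returned values
# are the test loss (CCE) and the metric given at compile time (Jaccard index).
test_loss, test_jaccard = model.evaluate(X_test, Y_test, batch_size=1)
print('Test loss (CCE): {:.4f}'.format(test_loss))
print('Test Jaccard index: {:.4f}'.format(test_jaccard))
```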

Exercise 3

As in Exercise 1, read the images in the test folder and create two normalized Numpy arrays X_test and Y_test. However, note that random patches are not used this time, as we want to evaluate the performance on whole images. Additionally, the size of the network input needs to be a multiple of 16 due to the downsampling layers and skip connections (Fig. 4.7). Hence, crop the largest possible (\(560\times 704\) pixels) central patch from each image and its manual annotations. The expected shapes of X_test and Y_test are \(90\times 560\times 704\times 1\) and \(90\times 560\times 704\times 3\), respectively.

4.4.5 Step 5: Building a DeepImageJ Bundled Model to Process New Data

4.4.5.1 Step 5.1: Saving the Trained Model in TensorFlow’s Format

DeepImageJ is a plugin toolset in Fiji/ImageJ designed to load and run TensorFlow models. Next, we show how to store the model in a SavedModel ProtoBuffer format (default file format in TensorFlow), so that deepImageJ can read it and process an image directly loaded from ImageJ using the trained model:

figure l
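A sketch of this export step is shown below; the exact saving call may differ depending on the TensorFlow version used in the notebook:

```python
# Minimal sketch: export the trained Keras model as a TensorFlow SavedModel
# (ProtoBuffer) so that deepImageJ can load it.
import tensorflow as tf

tf.keras.models.save_model(model, 'DeepImageJ-model')
# The 'DeepImageJ-model' folder now contains saved_model.pb and a 'variables' folder.
```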

A new folder called DeepImageJ-model is created with two items inside: saved_model.pb and a folder called variables. We recommend compressing this folder into a DeepImageJ-model.zip file and downloading it so you can work on it locally with Fiji/ImageJ:

figure m
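A sketch of this compression and download step (the files module only works inside Google Colab) could be:

```python
# Minimal sketch (Google Colab): compress the exported model folder and download it.
import shutil
from google.colab import files   # only available inside Google Colab

shutil.make_archive('DeepImageJ-model', 'zip', root_dir='.', base_dir='DeepImageJ-model')
files.download('DeepImageJ-model.zip')
```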

Unzip the file on your local machine. Note that the folder should look exactly like the one we had in the cloud (DeepImageJ-model).

4.4.5.2 Step 5.2: Creating a DeepImageJ Bundled Model

DeepImageJ comprises three different plugins: Run, Explore and Build Bundled Model. First, the TensorFlow model needs to be converted into a deepImageJ bundled model. Click on ImageJ > Plugins > DeepImageJ > Build Bundled Model and open an example image for this processing. We opened the image t199.tif from the test set. A dialog box pops up indicating the steps to follow (see Fig. 4.9).

Fig. 4.9
figure 9

DeepImageJ build bundled model process: (a) Open a test image in Fiji and call Build Bundled Model; (b) Load a model indicating the path to the unzipped DeepImageJ-model folder; (c) Specify input and output dimension order (N: batch number, H: height, W: width, C: channels); and also (d) input size (32) and padding (47); (e) Write the name of the model, authors, credits, citations or any other relevant information; (f, g) Write the pre- and post-processing macro routines needed for the correct image processing; (h) Run the image processing routine and test that you get the desired output; (i) If so, specify a new name for the bundled model and save it under ImageJ’s recently created models folder

Fig. 4.10
figure 10

Example of network output. Given an input image (top-left, scale bar: 150 \(\mu \)m), the output of our U-Net is an image with three channels, each of them indicating the probability of being background, foreground or cell contour (columns 2–4). The color intensity of the three channels is equally calibrated from 0 to 1. Notice these predictions contain continuous values from 0 to 1 so they need to be post-processed in order to get a binary mask for each class as in the GT (last row). Note that the cells touching the image borders are discarded from the CTC GT

The pre-processing ImageJ macroFootnote 19 is used to normalize the input images:

figure n

If no post-processing macro is set, we get the raw output of the network (Fig. 4.10). However, we would like to identify each independent cell in the mask (i.e., instance segmentation). So, a distance transform Watershed routine is included in the post-processing macroFootnote 20 together with some morphological operations to split cell clusters and refine the results:

figure o

4.4.6 Step 6: Process All Images in Fiji Using DeepImageJ and MorpholibJ

We are now reaching the final stage of the workflow! We are ready to quantify the morphology of the cells from the test set. Download the data from the CTC web page (Sect. 4.2) and unzip it. Use the Fiji/ImageJ macro provided in this chapterFootnote 21 to process the new images. Please update the path in the macro to the location of the unzipped CTC images on your computer.

Fig. 4.11
figure 11

Final step. From an ImageJ macro, the images stored in the folder images_to_process are processed using the trained model and for each detected cell, a complete list of morphological features are calculated

With this macro, the individual masks of the cells extracted from the downloaded CTC images will be stored (one label image per input image) together with their corresponding morphological measurements in an easy-to-read comma-separated values (CSV) file (see Fig. 4.11). More precisely, for each segmented cell, the area, perimeter, circularity, Euler number, bounding box, centroid coordinates, equivalent ellipse, ellipse elongation, convexity, maximum Feret diameter, oriented box, oriented box elongation, geodesic diameter, tortuosity, maximum inscribed disc, average thickness and geodesic elongation will be recorded. For a detailed description of each measurement, see the latest version of MorphoLibJ manual.Footnote 22

Take-Home Message

In this chapter, we have presented a complete bioimage analysis workflow leveraging a DL model to segment cells in phase contrast images. The proposed workflow is versatile and meant to be customizable to other image segmentation tasks. As demonstrated, DL models for bioimage processing can be easily used in Fiji/ImageJ. However, trained models generally do not perform as well on new (and different) images unless they are re-trained. That being said, the proposed workflow can be effortlessly applied to new (similar) datasets by simply modifying the input folders and reproducing the steps described in this document.