FormalPara What You Will Learn in This Chapter

Many of the methods developed for the analysis of bioimages focus on microscopy images at the cellular level. However, bioimages can also be used by biologists to assess morphological phenotypes at the non-cellular level. Collecting non-cellular images and developing image workflows for them is similar to working with microscopic images, but it also has its unique challenges. We hope to impart upon the reader the following:

1. Why images and workflows are necessary for improved assessment of subjective phenotypes (e.g. shades of color);

2. Which points to consider when collecting color images;

3. How to incorporate an ilastik segmentation model into an ImageJ macro;

4. One example workflow illustrating how to derive metrics for spatial patterns.

7.1 Introduction

Oftentimes, morphological phenotypes are scored subjectively and manually, based on visual inspection, using a rating scale for either the severity of a characteristic or simply the presence or absence of a feature. However, in cases where images exist, pixel values and patterns of distribution can be measured more accurately using automated algorithms, leading to more fine-scaled analyses; this was the original motivation for the work presented in this chapter.

7.1.1 What Is the Big Deal with Color Images and Fly Eyes?

Color is a phenotype that has been used throughout the history of genetics research. The birth of modern genetics began with Mendel’s systematic experiments with the colors of the flowers and seeds of the pea plant, but even before this, farmers were selecting on color for livestock and crops (Mendel, 1866). In this chapter we will focus on the popular model organism, Drosophila melanogaster, the fruit fly. One of the first phenotypes described in D. melanogaster involved a mutant of the sex-linked gene, white, which encodes for an important intermediate product that leads to the red pigmentation in fly eyes. Mutant flies have white eyes as opposed to wild-type red eyes (Morgan, 1910). Following the discovery of the white gene, a number of other eye-color mutants were also discovered (Morgan, 1911) (See Fig. 7.1).

Fig. 7.1

Images of a fly head showing eyes of different colors (top row), and rectangular eye ’swatches’ cropped from head images of wm4 mutants, which have mottled eye color (bottom row). The darker ’spots’ in these eye images are pigmented eye cells/patches, and a corresponding Likert scale value for the degree of patchiness/pigmentation is indicated on each image

More complicated eye-pigmentation mutants arose as more and more genetic tools were being developed in the fruit fly model. One mutant of note (and the focus of this paper) is the wm4 mutant (Muller, 1930). The wm4 mutant is a classical example of position-effect variegation (PEV). An inversion on the X chromosome relocates the white gene next to pericentric heterochromatin so that the neighboring chromatin state determines whether or not the white gene is expressed. When the neighboring chromatin is in the euchromatic state, white is expressed, whereas in the heterochromatic state, white is silenced. These alternate states are subject to random cellular events during eye development, so that patches of cells with different chromatin states exist in the same eye. The result is eyes with mottled or variegated patterning as shown in Fig. 7.1, with some eye cells expressing white and therefore pigmented red, whereas other eye cells have silenced white and are thus white. Other PEV mutants include bwVDe2, which places pericentric heterochromatin next to the brown (bw) gene, resulting in variegated brown pigment in the eyes (Sass and Henikoff, 1998).

PEV mutants have been used to indirectly assess overall changes in chromatin regulation in genetic experiments, with more pigmented eye cells implying ‘looser’ chromatin, and fewer pigmented eye cells implying ‘tighter’ packed chromatin.

7.1.2 How Is PEV Quantified Now and Potential Issues

Different methods to assess the amount of “variegated-ness” have been developed over time, starting with pigment extraction and quantifying the pigment using spectrophotometry (Ephrussi and Herold, 1944). However, the reliability of this approach has been questioned and the current method proposed by Sass and Henikoff (1998) uses an experimenter-defined ranking system (or Likert scale) for the extent of red pigment based on visual inspection. Additional safeguards for reproducibility were built into the method by establishing the five-rank Likert scale on independent samples of wild-type and mutant flies and adding a second experimenter to help define the scale and independently score the flies in the same order as the first experimenter. Scores from both experimenters are then averaged together for analysis.

The Likert scale (LS) approach has been successfully used in previous studies to quantify modifiers of PEV, and is generally a popular method to quantify the intensity of a visible phenotype. Although there is nothing inherently wrong with this approach, it might not be the most appropriate in all situations. While the scale has an adaptable number of ranks (five, six, ten, or even all the way up to one million), the LS in reality has far fewer “effective” ranks. Users of a LS tend to agree in score at the extreme ends of the scale, but there is less agreement among the middle scores. Although losing “effective” ranks is not catastrophic when phenotype modifiers are one or two genes of large effect, it could prove problematic when phenotype differences must be detected with fine resolution. An example of such a case is when there are many modifiers of small effect and one hopes to identify these modifiers using a quantitative-trait locus (QTL) mapping approach. By the law of large numbers, more precise Likert scores can be achieved by having more independent users rate each specimen, or by increasing the sample size of specimens. Both solutions require considerably more effort and time.

7.1.3 The Fallacy of Human Perception and Why Automated Analysis of Images Is King

Color is a deceptively easy phenotype to score. Human vision, especially perception of color, is itself a highly variable trait determined by the underlying combination of molecular/genetic mechanisms (Deeb, 2005) and neural processing of visual signals (Schlaffke et al., 2015). Defects in the former can result in different forms of color blindness, which can affect a person’s ability to correctly judge a color phenotype. The latter, however, should be more worrying, because people can often be intentionally or unintentionally tricked by their own biology into incorrectly perceiving color.

In the age of digital images, we can bypass the color-scoring biases produced by the subjectivity and variability of human perception, and improve upon the labor-intensive gold standard of scoring PEV by visual inspection using a LS. Using images and automated analysis, we can obtain objective and consistent measurements faster via computer vision. We therefore lay out in this chapter an automated method that can quantify eye color from images of fly heads using a commercially available imaging setup and open-source analysis software. In addition, we explore ways to more precisely quantify position-effect variegation with additional spatial metrics that assess the “patchiness” of variegation. The spatial patterns of the patches might be biologically significant in determining how randomness enters the developmental process.

7.2 Dataset

For this chapter, the dataset includes 20 images taken with brightfield microscopy of the heads of male progeny produced by crossing wm4 Drosophila melanogaster females to either of two different PEV modifier mutant males. Some of the heads are heavily ’patchy’ and others are not (i.e. variable variegation). Figure 7.1 is representative of what the image set looks like.

7.2.1 Imaging Conditions

For illuminating each fly head, as with many forms of imaging, a fixed lighting source should be used, and care needs to be taken that it is the only light source in the environment. Natural light, as well as overhead room lighting, can unintentionally add noise and shadows to images. Depending on the source of fixed lighting, the intensity of the light can also be affected by factors such as lamp warm-up time.

7.2.2 About Image Acquisition, Preprocessing, and Color Normalization

We used the Keyence VHX-1000 microscopy system for our image acquisition. All fly heads were imaged at 400\(\times \) magnification for a 0.6 s exposure. Before processing each batch of specimens, the camera was white balanced with a standard white card (Vello white balance card set). In addition, we acquired both a darkfield and a brightfield background image to perform corrections against camera sensor noise (e.g. “hot pixels”), as well as variable background illumination intensity (i.e. flat-field correction; Landini, 2020). Finally, we imaged an 18% grey card (Vello white balance card set) for color normalization across imaging batches.

Heads are not flat and the depth of field (focus) is shallow when imaging small objects up close, so we take image stacks of each head and apply a focus-stacking process to the stack to produce a single image that has a greater depth of field, and thus the entire head is in focus. On the Keyence VHX-1000 microscope, stack acquisition and focus-stacking are automated, resulting in a single fully-in-focus TIF image of each head. However, any microscope that can take image stacks can be used, and open-source image processing software packages such as ImageJ have components for focus-stacking.

7.2.3 Dataset Download

Image data used in this chapter can be downloaded from our Zenodo repository (http://doi.org/10.5281/zenodo.3975644) or from the GitHub repository associated with this book.

The ZIP package contains several folders. The first one, called "Data", contains two folders corresponding to the two investigated mutants. A second folder, called "Processfiles", contains files used in preprocessing for white balancing and dead-pixel correction, the trained ilastik project file, and the full macro code. Note that, due to limitations of the ilastik plugin, it is important to save the ilastik project file (.ilp) in a folder structure without spaces in the path.

Once the data is downloaded, we can do a quick pre-assignment before we dive into the workflow. Looking at the raw, non-processed sample images, appreciate the variability in patchiness of the fly eyes; as is commonly said, "beauty is in the eye of the beholder." Use the subjective Likert scale approach to create a base set of ’manual scores’: rate the ’patchiness’ of the left and right eye images on a scale of 0 (no patches) to 5 (very patchy). At the end of the chapter, we can compare these subjective evaluations against the objective measurements obtained from the automated processing and analysis macro.

7.3 Tools

7.3.1 ilastik Configuration in ImageJ

The ilastik plugin for ImageJ needs to be installed and kept up-to-date. Before running the macro command, be sure that the plugin is properly configured. In the submenu of the ilastik plugin, select “Configure ilastik Executable location”. You will be asked to choose the ilastik executable (ilastik.exe), the number of threads to use (e.g. 4; use \(-1\) to use the maximum number), and the amount of RAM to dedicate to the task (4 GB is more than enough). As we will work with (relatively small) 2D images, there is no need to allocate more than a few gigabytes of RAM.

7.4 Workflow Overview

Fig. 7.2

Schematic of the workflow, with boxes colored in orange when the code deals mainly with images, in grey when it deals with arrays, and in green when it computes the features that are stored in the output tables and used in the subsequent analysis

The workflow overview (Fig. 7.2) is described below. Fully automatic steps and steps that require user interactions are separately labeled. References to the lines in the full macro code found in the code repository of this book are also provided.

1. Workflow preparation

   • Asking user to input working directory (User Interactive) [line 6–19];

   • Listing all files in user directory, preparing empty arrays for data collection, and finally opening the image of the fly head (Automated) [line 23–30; 209–233];

2. Crop the left and the right eye areas (User Interactive) [line 50–67; 181–207];

3. Use ilastik to perform pixel-based segmentation of the fly eye and get binary masks (Automated) [line 70–74];

4. Use the binary masks to extract relevant information concerning the eye and its patchiness (Automated):

   • Part A: Simple metrics [line 79–128]

     • Analyze Patch Area;

     • Analyze Patch Intensity;

   • Part B: More advanced metrics

     • Assess ’Crowdedness’ via calculating the Maximum Triangle Packing Value [line 137, 346–353];

     • Assess ’Organizedness’ using distances between patches [line 139–179, 268–344];

5. Export the calculated features [line 356–389];

6. Batch processing with multiple folders (application of Chapter 1) [line 23–30; 209–233].

The subsequent sections describe each of the steps included in the workflow in more detail.

7.5 Step 1. Workflow Preparation

7.5.1 Selecting the Working Directory

We start with a very simple user interface that will ask for the directory containing images of the whole fly heads. We present two options: to use the getDirectory function, or the #@ File script parameter.

The getDirectory option works in one step: calling the function directly opens a path-selection window. The #@ File option is more customizable and works in several steps. A user interface opens with a prompt to select a working directory, which can be set using a Browse button. Once a directory is selected, clicking Ok lets the macro continue with it; pressing Cancel stops the macro right away. Both methods are valid, but the former does not allow prompts for the user, and this can get confusing if a workflow requires multiple sources of files and additional input (e.g. output folder, labels, etc.), which can all be handled in one window with #@ File. We therefore find #@ File more intuitive for the user, since it allows us to provide hints about what to select. For more information on script parameters, please refer to the Batch Processing chapter (Ch. 1) in this book.

figure a
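Since the code listing itself is rendered as a figure, a minimal sketch of the two options is given below; the prompt labels are illustrative, and the #@ File parameter must sit at the very top of the macro.

// ----- Option 1: getDirectory() opens a folder-selection dialog directly -----
dir = getDirectory("Choose the folder containing the fly head images");

// ----- Option 2: an #@ File script parameter (must be the first line of the macro) -----
// #@ File (label = "Working directory", style = "directory") workingDir
// dir = workingDir + File.separator;

// Either way we end up with a path that getFileList() can use in the next step
fileList = getFileList(dir);
print("Found " + fileList.length + " files in " + dir);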

We use the working directory to gather the list of files to analyze. For now, all the images are contained in one folder. At the end of this chapter, we will present a solution to navigate subfolders.

Fig. 7.3

Illustration of the Cropping (Step 2) and Segmentation (Step 3). Rectangular areas are cropped from the left and right eyes and ilastik is used to perform the segmentation (1. black \(=\) background, 2. white \(=\) debris, 3. grey \(=\) pigmented patches)

7.6 Step 2. Cropping Left and Right Eye Areas

Once an image is open, the next step is to implement a preprocessing cropping step to sidestep an issue of segmenting the eye from the ’face’ of the fly. Segmentation by thresholding is difficult in cases when the fly eye color is closer to white such that the boundary between the eye and the face is almost indistinguishable. Thus we simplify this step and crop rectangles out from the eye regions for further analysis. This is illustrated in Fig. 7.3. In our provided data set, our rectangular cropped image of an eye is \(184 \times 416\) pixels in size because this was determined to be the largest rectangle that fits in the area of the fly eye in the images of the whole fly head. In general, the cropping could be of any (suitable) size or shape.

Exercise 1: Write a Cropping Macro Using the Help and Hints Below

Write a small function that marks the region of interest (ROI) on the main image before cropping it.

  • Display a rectangular ROI;

  • Allow the user to reposition;

  • Crop the rectangle.

Hint #1: To pause the macro and let the user do the ROI selection, one can use the command waitForUser. We use it as a direct interface between the code and the user, but it could also be used, in principle, as a hacky breakpoint to pause macros for debugging code.

Hint #2: In Fiji, we crop by duplicating ROIs. Thoughtful naming of the duplicated image can help organize and describe what is in the image. We propose to adapt the original file name so that it reflects which eye has been cropped. Therefore the function needs to handle another argument.

Extra credit: Save the duplicated image as a file in the working directory. We can entertain the possibility that the function could check if the cropped image of the eye exists in the working directory. If it does already exist, the function can ask the user if they wish to crop again in case the crop was not satisfactory in the previous attempt.

The solution to this exercise can be found at the end of the chapter.

7.7 Step 3. Segmentation by Using Ilastik

With the cropped images now saved, we can move forward with segmentation by using ilastik. We use here the simple Pixel-based Segmentation option in the ilastik software (Berg et al., 2019), which can be run with the following command:

figure b
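For orientation, a macro call to the ilastik pixel classification plugin could look like the sketch below. The project path is illustrative, and the exact command and parameter names should be confirmed with the macro Recorder, as they may differ between plugin versions.

// Run the pre-trained ilastik pixel classifier on the active (cropped) eye image
orig = getTitle();
ilpPath = "C:/Analysis/Processfiles/EyeSegmentation.ilp";   // hypothetical path; it must not contain spaces
run("Run Pixel Classification Prediction", "projectfilename=[" + ilpPath + "] inputimage=[" + orig + "] pixelclassificationtype=Segmentation");
// The result is a new image in which each pixel holds its class label (0, 1 or 2)
rename("segmented_" + orig);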

It will take a few seconds to get the segmented image, but afterwards is where things get interesting; we can now start extracting information from the images. An example of segmentation is shown in Fig. 7.3.

We are not going to go into details regarding using ilastik itself here, because great tutorials are available on the ilastik website. We do not provide a training set for the performed segmentation, but instead provide a pre-trained ilastik model as part of the accompanying materials to this chapter. We note that training is laborious and requires some Drosophila domain knowledge in recognizing features in the images. However, we will make some general points about the training process itself and how segmentation works in ilastik.

ilastik segmentation is based on machine learning and requires selecting features, and training a model to categorize pixels into different segmentation classes. Training ilastik models requires an annotated training set of images, which can be created by the user within the software. In short, the user assigns labels to pixels using a very intuitive interface where they simply draw or ’color’ on a training image. We found that the accuracy of the annotation was dependent on how familiar the user was with the subject matter in the images, with more experience leading to more accurate annotation. To minimize the effect of variations in the subsequent analysis due to different levels of the domain knowledge (affecting quality of annotations), we provide a trained model as a part of this workflow. In our case, we have three output classes: the background (i.e. white), pigmented patches (i.e. red), and debris (e.g. bristles and flecks that were not successfully cleaned off pre-imaging).

Images segmented using the provided ilastik model will have new pixel values corresponding to their label numbers. The pixels with the value 0 are pixels classified with label 1 (background), those with the value 1 are classified with label 2 (debris), and those with the value 2 are classified with label 3 (pigmented patches). Selecting a specific pixel value (i.e. specific label) is easy with the thresholding tool in ImageJ with the lower and upper bound set to the desired pixel/label value.

figure c
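As a concrete illustration, the pigmented-patch label could be isolated with a snippet along these lines (the window name is illustrative):

// Keep only the pixels of the 'pigmented patch' class (label 3, pixel value 2)
selectWindow("segmented_left_eye");      // illustrative window name
setThreshold(2, 2);                      // lower and upper bounds both set to the label value
// The thresholded label can now be turned into a binary mask or passed to [Analyze Particles...]
run("Convert to Mask");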

Being able to select labels now allows us to extract information on relevant features in the image. We will see how to retrieve intensity information from the original images in a bit.

7.8 Step 4. Extracting Measurements from the Segmented Objects

7.8.1 Part A: Simple Metrics Using [Analyze Particles...]

Running [Analyze Particles...] will give all the information we need for the next parts of the analysis workflow. We first set up the measurements by selecting which metrics we want to extract. For our workflow, the intensity (“Mean Gray Value”), the Area, and the X and Y coordinates (“Centroid”) of each patch are important. When running [Analyze Particles...], we can also select Summarize to display an additional Results table including the number of particles (pigmented patches, in our case with fly eyes), the total area covered by the particles, and the percentage of pigmented pixels. [Analyze Particles...] will also fill the roiManager. When we use [Analyze Particles...] on the segmented image (or ’binary’ masks), it should be noted that the intensity of the segmented particles is not a true reflection of the intensity of ’color’ in the raw image, because the value of pixels in the segmented image (resulting from ilastik) corresponds to the output class label number we defined during image annotation and model training. Therefore, to retrieve the original intensity of each ROI, another measurement must be performed on the raw cropped image, but using the ROIs obtained from the segmented image (the masks obtained from ilastik).

As with many tasks in image analysis, these metrics can also be retrieved in other ways. For example, to get the number of patches, the command roiManager("count") will also work. Additionally, the total area can be calculated by looping through each ROI, calculating its area, and updating a variable that keeps track of the total area. Dividing the total area by the number of ROIs gives the average size of each ROI. The percentage of pigmented pixels is the total area of the ROIs divided by the total size of the labelled image and multiplied by 100 (the size of the labelled image being the size of the cropped rectangle, i.e. 184 × 416 pixels in our particular case).

The code below shows the essence of the use of roiManager to perform measurements iteratively over differently labeled objects (different pigments), and storing measured values in arrays.

figure d
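As a point of reference, a condensed sketch of this part of the workflow (including the two per-patch arrays discussed below) might look as follows; the window names are illustrative.

// Detect the patches on the segmented image and keep their ROIs
run("Set Measurements...", "area mean centroid redirect=None decimal=3");
selectWindow("segmented_left_eye");
setThreshold(2, 2);                                       // keep only the 'pigmented patch' label
roiManager("reset");
run("Analyze Particles...", "size=5-Infinity pixel add summarize");

// Re-measure each ROI on the raw cropped eye image to get true pigment intensities
nPatches = roiManager("count");
patchArea = newArray(nPatches);
patchIntensity = newArray(nPatches);
totalPatchArea = 0;
selectWindow("left_eye_raw");
for (i = 0; i < nPatches; i++) {
    roiManager("select", i);                              // ROI from the mask, applied to the raw image
    patchArea[i] = getValue("Area");
    patchIntensity[i] = getValue("Mean");                 // no longer the label value
    totalPatchArea += patchArea[i];
}
percentPigmented = 100 * totalPatchArea / (184 * 416);    // cropped rectangle size in this dataset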

In addition to what was previously mentioned, we decided to build two arrays with length equal to the total number of patches. We will store the area of each patch in the first one, and the intensity of each patch in the second one. We will discuss the results tables later.

7.8.2 Part B: Crowdedness and Organizedness

Information about the number, average size, and percentage area of patches does not necessarily reveal much about how these patches are distributed in our eye images. We illustrate in Fig. 7.4 a case where these metrics are not enough. Let us take the example of eye images A and B, each with 10 pigmented patches and equal total eye image area. The patches are all equal in size, but in eye A the 10 patches are arranged in a line, whereas in eye B the patches are randomly distributed across the eye. The number (10 patches), average size (100 pixels), and percentage of area covered by patches (2%) are exactly the same between eyes A and B. Using the metrics introduced so far in this chapter, we therefore cannot capture the difference between the distributions of the patches: the straight line in eye A versus the random pattern in eye B. Hence, we need other metrics to describe the spatial distribution of the patches and capture this difference in patch arrangements.

Fig. 7.4

Hypothetical examples of patches where basic metrics are exactly the same, but appearance is vastly different

Understanding spatial patterns of the patches can be biologically significant in assessing the randomness of the developmental mechanism and help us pose stronger hypotheses regarding its underlying nature. Many spatial distribution analysis methods exist already; in this chapter, we walk through how to derive some simple metrics:

  • Crowdedness: This metric will essentially take into account how large the pigmented patches are and how many more we could fit into our total image. This one metric roughly describes what a combination of number, average size, and percentage of pigmented area describes, with an advantage that it is much easier to comprehend one metric than an array of 3 metrics when performing statistical analyses.

  • Organizedness: This metric evaluates Euclidean distances between (the centroids of) the pigmented patches (ROIs) to assess how they are organized in the 2-D space of our image. We can compare what we calculate from our images with theoretical organizedness values from the literature to assess how far the patches are from an ideal organization in 2-D space. We will also present a second way of assessment that uses a statistical method called a permutation test to generate a distribution of simulated organizedness metrics; we can then compare our actual computed organizedness metric with this distribution to get statistical significance (e.g. a p-value) and assess how far the observed organization is from a totally random one.

7.9 Deriving Crowdedness

7.9.1 First Compute the Max Triangle Packing Value

The Max Triangle Packing Value (MTPV) is the maximum number of patches (of the average patch size) that could fit in the studied area. We have already computed the percentage occupancy of the patches in the area in Part A. That number is a proxy for how ’crowded’ the eye is with patches. However, it does not take into account that the eye is a compound eye composed of individual units called ommatidia, and that the distribution of the patches can be influenced by these repeating units and the developmental process that produces them. By solely taking the percentage, we do not capture how these patches are spatially arranged. To take the repeated eye units into account, we assume pigmented patches to have a perfectly circular shape and the same area (and so the same radius), and we "pack" these circles into the cropped eye image, as illustrated in Fig. 7.5.

Fig. 7.5

Triangular packing: what Drosophila eye cells would look like if flattened out in 2D space (left), and a visual depiction of how we pack perfectly round patches into a rectangular space, with length and width expressed in terms of the number of rows m and columns n, and the average radius r of a hypothetically circular patch

This is not a perfect estimate because the eye is three-dimensional. Even though all eye units are equal in size, units viewed from different angles due to the curvature of the eye will be different-sized pixel-wise in the 2-D images. Additionally, we do see in the bottom row of images in Fig. 7.1 that one eye unit can contain several patches. However, we make an additional assumption here that the view plane is flat and one patch equals one eye unit in our first triangle packing approximation. These assumptions can be improved upon in future versions.

The simplified pattern of organizing the maximum number of circles in a rectangle, which is considered the ’optimal’ arrangement, is called triangular packing. The total number of circles that fit is our Max Triangle Packing Value (MTPV). Let us now work through how to arrive at this ’optimal’ arrangement in the following exercise.

Exercise 2: Write the TrianglePacking Function for Our Rectangular Cropped Eye Image That Is 184 \(\times \) 416 Pixels in Size

Hint #1: First, determine, for a given average patch size, what its radius r would be if the patch were a perfect circle. Work backwards using the formula for the area of a circle, \(\pi r^2\), from the average area of a patch that was computed earlier in this chapter (Step 4, Part A), i.e. the TotalPatchArea divided by the number of patches.

Hint #2: With this radius, r, how many circular patches can fit in our region of interest? Break it down to how many circles can fit across both dimensions (width and length-wise). Use the diameter.

7.9.1.1 Calculating Crowdedness

To calculate ’Crowdedness’ we determine the ratio of the number of patches we have segmented (and previously counted with [Analyze Particles...]) to the MTPV:

$$\begin{aligned} Crowdedness = \frac{Number\ of\ patches}{Max\ Triangle\ Packing\ Value} \end{aligned}$$
(7.1)

7.10 Assessing Organizedness

For this particular metric, which we will now dive into, we borrow some theoretical basis for organizedness from Audet et al. (2010), which discusses the mathematical optimality of points arranged in 2D space.

7.10.1 Computing Pairwise Distances

We want to measure the pairwise distance between the patches. For that purpose, we will compute the Euclidean distance between the centroids of the patches. We got the coordinates of the centroid of each patch from [Analyze Particles...].

In the next set of exercises with hints following each prompt, we walk through how to compute the pairwise distances. Solutions can be found at the end of the chapter.

Exercise 3.1 : Write the Steps to Iterate Through All Pairwise Patches

We can use a drawer of n socks to illustrate how we count pairs. How many different pairs of socks can be made from the drawer? (Note: any pair should be counted; there are no "matching" and "non-matching" socks, and no difference between "sock 1" and "sock 2".) To count these in a structured way, we take out one sock and count all the pairs we can make with it and the n-1 socks remaining in the drawer. Then, we set the first sock aside, pull out a second sock, and count the number of pairs we can make with it and the n-2 socks left in the drawer. We move on to the third, fourth, and fifth socks in the same manner, pairing each with the decreasing n-3, n-4, n-5 socks left in the drawer. This pattern leads to the number of possible pairings as the sum (n-1) + (n-2) + (n-3) + ... + 1, which simplifies to n*(n-1)/2 total combinations or, in combinatorics notation, \(\binom{n}{2}\) ways to select two elements out of n. Use the sock drawer algorithm to write down the steps needed to make pairwise combinations of the patches.

Hint: We need two for loops when we are considering pairs of objects. Also, be mindful of the indices!

Exercise 3.2 : Build in a Way to Keep Track of Distance Calculations

Since we know the total number of combinations, the calculated pairwise distances will be put in an array of size n*(n-1)/2. To fill the array progressively, we can use a counter or index variable that increments by one each time a distance is calculated. We would also like to keep track of which two patches were used to calculate each distance. Now write down the steps to achieve this.

Exercise 3.3: Write the Code to Calculate Pairwise Distances

Now translate the step-by-step instructions from the sock drawer algorithm into code in the form of a function named DistanceAnalysis.

7.10.2 Ratio of Maximum and Minimum Distances (r)

For a given number of patches, the way that the patches are organized can be defined as mathematically ’optimal’ if the distance between each pair of patches is maximized as detailed in Audet et al. (2010). Adding to this basis, the ’most optimal’ or most organized arrangement is when the ratio of the maximum distance to the minimum distance between patches reaches a minimum. It is perhaps easier to comprehend this by looking at Fig. 7.6, which gives a few examples of such ’optimal’ organizations with red lines being maximal distances and blue being minimal distances. It should be noted that there is also an additional constraint in our use-case with regards to the arrangement of the patches, since the arrangement must fit within the space and shape of the cropped eye image.

Fig. 7.6

Examples of optimal arrangements of n points, with maximal distances (red) and minimal distances (blue) indicated in different configurations. Their ratios of maximum and minimum distance (r), and a graph presenting the relationship between the ratio r and the number of patches, are also shown. A power-function fit, shown in the plot (bottom right), is used when there are more than 30 patches

Exercise 4.1: Compute a Summary of Organizedness, r

We already have all the distances between all the patches. To get a summary min-max ratio, and with it a sense of the overall ’degree of organization’ of the patches, we will use the average maximum and average minimum distances over the patches. We will build an array containing the maximum distance for each patch and another with the minimum distances, take the average of each array, and then compute the overall min-max ratio. As we previously built a table containing the distances and the indices for each point, this should not be too difficult to implement.

Hint 1: To write the structure to go through all the distances, we need to use a for loop to go through all the patches, and then a second for loop to go through all the computed distances, with the right if statement.

Hint 2: To take the mean of an array, we can sum all the terms and divide by the number of terms. Alternatively, we can use the command/function Array.getStatistics.

Exercise 4.2 : Compare the Min-Max Ratio with Theoretical r Values to Assess Organizedness

Now that we have a summary min-max ratio, we can use it to assess the degree of organization. We compare our min-max ratio to the theoretically ideal ratio values provided by Audet et al. (2010) by looking up the ratio that corresponds to the number of patches we have. For the purpose of our exercise, we hard-coded the theoretical ’optimal’ ratios from Audet et al. (2010) in an array contained in the file Min_MaxMinDistRatio.csv. We can then simply look for the index corresponding to the number of patches and make a comparison by computing the ratio of the observed to the theoretical value. A value close to 1 would indicate a high degree of organization. That being said, the ideal ratio has been derived for only a small number of points. For any number above 30 points (or patches, in our case), we compute an approximation using a power-law function, as it gives the best fit (\(R^{2} = 0.99\)) linking the expected minimal ratio r between the maximum and the minimum distance for a set of points spread in two-dimensional space to the number of points (or patches) in this set. The resulting approximation is \(r = 0.7191 \times [\text{Number of Patches}]^{0.5719}\), which is shown in plot form in Fig. 7.6.

7.10.2.1 Comparison with a Distribution of Randomly Shuffled Patches

There are alternative ways to assess organizedness in addition to looking at theoretical r values. One method we will go through now is to repeatedly shuffle the patches and measure how the min-max distance ratio changes for these different patch distributions. We are essentially checking whether the distribution we have, and the corresponding computed min-max ratio, is an outlier compared to the distribution of min-max ratios of random arrangements in our cropped eye image. If the patches in an eye are in a clustered pattern, that ratio will be significantly different from the ratios of random patch arrangements. In statistics, the method we are using here is called a permutation test. Implementing a random shuffling of objects seems fun, and coding it can be an interesting challenge if some guidance is provided.

Fiji has a command moveROI that takes a desired displacement as an argument. Giving a new set of coordinates for the patches can easily be done by using the command random that returns a number between 0 and 1. Multiplying this random number by the height of our cropped eye image and then repeating the same procedure with another random number and the width, gives random displacement to relocate our patches. However, the new set of coordinates needs to fulfill some constraints for the ROI to be valid.

Exercise 5.1: Come up with the Two Conditions That Must Be Checked When We Move Around ROIs

Hint 1: Think about what happens when the centroid of a ROI which is 5 pixels wide gets moved to coordinate (0, 0).

Hint 2: Now move another ROI to (0, 1), next to the assumed 5 pixel-wide ROI at (0, 0), as mentioned in Hint 1. What do we observe?

Exercise 5.2 : Write the Function Randomize() to Check the Conditions Identified in the Previous Exercise and Then to Randomly Move Around the ROIs

Hint 1: We can use a do...while loop to implement this randomization, where the code is executed as long as the while condition is true. We can write it in such a way that the loop keeps running (i.e. a new set of coordinates is generated) if the new set of coordinates does not satisfy the conditions defined in the previous exercise. This loop needs to be done for each ROI patch.

Hint 2: There is always a risk that, for a given arrangement of patches generated at some point during the randomization, there is no way to place the next patch without violating the overlap and out-of-bounds conditions defined above, and we get stuck in an infinite loop. To remedy this, the function should be aborted if too many attempts are made. We add a testcount variable in the do...while loop to force an exit when it looks like there will be no solution, based on the number of placement attempts. At this point, we restart the randomization process by creating a new empty image and starting over. The hope is that we will not repeat the same arrangement that led us to the infinite loop in the first place.

Exercise 5.3 : Write Down the Steps to Generate a Distribution of Summary Min-Max Ratios

We use the new, randomized set of coordinates of the patches and compute the pairwise distances as well as the degree of organizedness with our previously coded functions, DistanceAnalysis() and MaxMinRatio().

Once we have a randomized distribution of min-max ratios, we can compute a p-value by counting the number of random r values greater than the actual summary r and dividing it by the total number of random r values generated. When we talk about p-values, we enter the realm of hypothesis testing in statistics, which means we are making a call on whether or not to reject a null hypothesis. The null hypothesis in our case is that the patches in the cropped eye image are arranged randomly, i.e. that their min-max ratio is neither particularly large nor particularly small. By generating random arrangements of patches as we just did, we generate a distribution of min-max ratios for ’random’ arrangements, whereas clustered or highly ’organized’ patterns of patches result in extreme min-max ratios. Therefore, if our p-value is small (e.g. \(p < 0.05\)), we reject the null hypothesis and conclude that the patch arrangement in our cropped eye image is unlikely to have arisen from a random arrangement.
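In macro form, this p-value computation could be sketched as below, assuming randomRatios holds the min-max ratios of the shuffled arrangements and measuredR is the ratio measured on the real eye image (both names are illustrative).

// Permutation-test p-value: fraction of shuffled arrangements whose ratio is
// at least as large as the one measured on the real eye image
countGreater = 0;
for (i = 0; i < randomRatios.length; i++) {
    if (randomRatios[i] >= measuredR)
        countGreater++;
}
pValue = countGreater / randomRatios.length;
print("Permutation p-value: " + d2s(pValue, 4));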

7.11 Step 5. Exporting the Calculated Metrics into Tables

We create three different tables containing:

  • the different calculated metrics for the overall cropped eye image (e.g. number of patches, average patch size, crowdedness, etc.);

  • the area of each patch for each analyzed eye in such a way that distribution of areas can be retrieved;

  • the intensity of each patch for each analyzed eye.

There are two ways of making a table. We either add a new row each time an image is analyzed or store the different arrays in such a way that they can all be put in a table as columns at the end, when all the images are processed. We use the former because we work with one eye image at a time and the speed of the analysis is not our main concern here. We will create different functions to fill up the tables. There is nothing difficult there besides keeping track of the indices. These steps are addressed in the main code using the functions fillFinalTable and fillAreaTable.
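For orientation, one possible shape for such a function, using ImageJ's built-in Table functions, is sketched below; the table and column names are illustrative and do not necessarily match the full macro.

// Append one row per analysed eye to a results table called "FinalTable"
function fillFinalTable(row, eyeName, nPatches, avgSize, crowdedness) {
    if (!isOpen("FinalTable"))
        Table.create("FinalTable");
    Table.set("Eye", row, eyeName, "FinalTable");
    Table.set("Number of Patches", row, nPatches, "FinalTable");
    Table.set("Average Size", row, avgSize, "FinalTable");
    Table.set("Crowdedness", row, crowdedness, "FinalTable");
    Table.update("FinalTable");
}

// Example call for the left eye of the i-th head image (row index 2*i, see Sect. 7.12.1):
// fillFinalTable(2 * i, baseName + "_left", nPatches, avgSize, crowdedness);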

7.12 Step 6. Batch Processing and Further Considerations

Now that we have all the building blocks, assembling the full macro should not be too much of an issue, apart from a few additional considerations. To summarize the whole workflow presented in this chapter, please revisit Fig. 7.2.

Exercise 6: Write the Step-by-Step Instructions for Batch Processing, Considering the Comments Collected in the Hint Below

Hint: We present a couple of considerations to keep in mind when stringing the functions together for batch processing.

7.12.1 A Fly Head Has Two Eyes

A very obvious first point: a fly has two eyes and both matter for the analysis. We have one input, the entire fly head image, which will turn into two when we crop the eye regions of interest. We want to ensure that the indices are consistent when filling the final table with the analysis results. In the table, the image of the head of the first fly with the index 0 (in the list of head images) will give two lines in the final table, one for the left and one for the right eye (i.e. rows 0 and 1). The second head image with the index 1 will give two new rows of results (i.e. rows 2 and 3) and so on.

7.12.2 Eyes That Have Zero or One Patch

When we come up with a general solution for a problem, we should also give thought to special cases that could potentially lead to errors, and handle those cases separately. For example, sometimes we encounter eye images that do not have patches or have only one patch. This would present a problem in computing pairwise distances because there cannot be a pair of patches. It therefore makes sense that, when only one patch is found or there are no patches, we do not calculate pairwise distances, or anything downstream that depends on the distances in the workflow.

Keep in mind that we would need to add a NaN or N/A to our results tables for these special cases in which a measurement cannot be computed. Otherwise, we will end up with a mismatch between the index of the image being analysed and the rows where its calculated metrics need to go in the tables.

7.12.3 Batch Processing Across Multiple Folders

Batch processing is rather simple once we have built an array containing the paths of the files to analyse. The command getFileList(path) does exactly that, and a simple for loop going through the array of files will therefore work fine:

figure e

This assumes that the folder contains ONLY images and no other files; otherwise, errors will occur when ImageJ encounters a file type it cannot process. It can be more challenging to analyse files spread over multiple folders. Therefore, it is worth thinking carefully about how to store images when generating them, so that we do not need to spend time arranging and rearranging files before running image processing/analysis workflows. Additional conditions can be added to check whether a file has the right extension, size, etc.

figure f

As we want to build an array with all the paths of the files to analyze, we need to know how many files there are in order to initialize the array properly. We simply use a function with the same structure, but which only counts the number of files to analyze, as shown in the code above. Finally, we can link everything together with the right initialization of the different variables.
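A hedged sketch of such a counting function is given below; the recursion into subfolders and the .tif filter are assumptions about how the dataset is organised.

// Recursively count the TIF files below a root directory, so that an array of
// file paths can later be initialized with the right length
function countImageFiles(dir) {
    list = getFileList(dir);
    n = 0;
    for (i = 0; i < list.length; i++) {
        if (File.isDirectory(dir + list[i]))
            n += countImageFiles(dir + list[i]);
        else if (endsWith(toLowerCase(list[i]), ".tif"))
            n++;
    }
    return n;
}

nFiles = countImageFiles(workingDir);
print("Images to analyse: " + nFiles);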

7.13 Visualizing Results: Presentation and Discussion

At the end of the workflow, we will get three tables:

  • Area Distribution.csv contains the area of each patch for each analysed eye;

  • Intensity Distribution.csv contains the average intensity of each patch for each analysed eye;

  • FinalTable.csv contains the different metrics collected along the way (Number of Patches, Average Size, Average Intensity of the whole cropped eye image, Percentage of Area, Crowdedness, Ideal Ratio of distances, Deviation from Ideal, Deviation from Random Distribution)

The tables are a convenient output from Fiji. However, a graph can speak a thousand words (or numbers). We suggest using plotting packages in R, such as ggplot2, or in Python, such as the seaborn library or, if one is extra ambitious and nitpicky, the matplotlib library. Plotting in Python is covered in Chapter 2 of this book. Graphs are a quick way to see the outcome of the macro we just coded. We can visually compare metrics such as the number of patches, the average size, the ’crowdedness,’ the average intensity, or even the degree of organization (i.e. the deviation from the ideal ratio of the maximum and minimum distances, or the deviation of the measured ratio from our randomized distribution). The distribution of the average intensity per patch shows differences between the two mutants.

In addition to running the workflow developed here, the authors also scored the cropped eye regions by visual inspection using the Likert scale approach, to compare against the metrics generated by the workflow. Figure 7.7 illustrates the metrics generated by our coded workflow. The plots reflect the same patterns (or the inverse pattern, for patch intensity) as our manual scoring (top left strip plot). Having a coded workflow enables us to tweak different parts of the workflow and rerun the code quickly, as many times as we want. This is not reasonable to do manually when the number of images to analyse is large. The plots in Fig. 7.7 validate what our eyes are able to perceive. We successfully translated our subjective perceptions into objective quantities!

Fig. 7.7

Strip plots illustrating the similarity in patterns between manually curated Likert scores (top left) and the various metrics generated from the coded workflow. Mutant 1 is represented in blue, and Mutant 2 in orange. The red dots are the average of the population

Take-Home Message

We were able to extract and utilize relevant information from color (brightfield microscopy) images of fly eyes, to produce quantitative metrics for position-effect variegation.

To do so, we learned to use existing functions and to create our own metrics adapted to our very own problem. We learned useful tricks such as how to pass the content of a variable from a function back to the main body of the code.

This workflow can be adapted for problems beyond fly eyes. For example, understanding the spatial organization of objects in a given space is widely applicable in situations that include the foci in nuclei, lipid droplets in cells, endosomes in the cytoplasm, bacteria in an enclosed environment, etc.

The overall goal of this work was to translate our subjective qualitative observations (i.e. how patchy is this fly eye) into reproducible, quantifiable, fine-scale metrics (e.g. number of patches, intensity, organization,...) so that we can be more objective in measurements. This allows us to run more nuanced quantitative analysis to derive more precise conclusions in future studies.

Solutions to the Exercises

Each solution is included in the respective subfolder in the code repository associated with this chapter and book. The macros are intended for educational purposes and have been slightly tweaked from the full macro code to work as stand-alone exercises.

Exercise 1: Write a Cropping Macro

The First Cropping Function

figure g
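Since the listing itself is reproduced as a figure, a possible shape for this first cropping function is sketched below; it is not the original listing, the starting coordinates are arbitrary, and the 184 × 416 rectangle matches the dataset used in this chapter.

// Let the user position a fixed-size rectangle over one eye of the active head image,
// then duplicate the selection into a new, descriptively named image
function cropEye(side) {
    makeRectangle(10, 10, 184, 416);    // starting position is arbitrary
    waitForUser("Crop " + side + " eye", "Move the rectangle over the " + side + " eye, then click OK.");
    title = getTitle();
    run("Duplicate...", "title=[" + replace(title, ".tif", "") + "_" + side + "]");
}

cropEye("left");   // example call; repeat with "right" for the other eye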

The Second Cropping Function

figure h

Repeat for the other side

figure i

The More Advanced Cropping Function

The first lines of this version are exactly the same as in the second version; the difference appears in the very last lines, where we decide whether to crop, to recrop, or to keep the existing crop.

figure j

Exercise 2 : Write the TrianglePacking Function

figure k
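A minimal sketch of the idea follows; it is a simplified approximation (not necessarily identical to the chapter's solution), and 184 × 416 is the crop size used in this dataset.

// Estimate how many circles of the average patch size would fit into the cropped rectangle
function trianglePacking(totalPatchArea, nPatches, width, height) {
    r = sqrt((totalPatchArea / nPatches) / PI);    // radius of an 'average' circular patch
    nCols = floor(width / (2 * r));                // circles fitting across the width
    nRows = floor(height / (sqrt(3) * r));         // rows sit sqrt(3)*r apart in triangular packing
    return nCols * nRows;
}

MTPV = trianglePacking(totalPatchArea, nPatches, 184, 416);
crowdedness = nPatches / MTPV;                      // Eq. 7.1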

Exercise 3.1 : Write the Steps to Iterate Through All Pairwise Patches

The first loop needs to run n-1 times, and the second loop runs one iteration fewer on each pass, and so on, until the final distance between object n-1 and object n is considered. The second loop needs to start at the index of the first loop plus one.

figure l

Exercise 3.2: Build in a Way to Keep Track of Distance Calculations

figure m

Exercise 3.3 : Write the Code to Calculate Pairwise Distances

figure n
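A compact sketch of the same logic is shown below; the array names are illustrative, and the full solution additionally records, for each distance, the indices of the two patches involved (Exercise 3.2).

// Pairwise Euclidean distances between patch centroids stored in xCoord and yCoord
function distanceAnalysis(xCoord, yCoord) {
    n = xCoord.length;
    distances = newArray(n * (n - 1) / 2);
    counter = 0;
    for (i = 0; i < n - 1; i++) {
        for (j = i + 1; j < n; j++) {            // inner loop starts one past the outer index
            dx = xCoord[i] - xCoord[j];
            dy = yCoord[i] - yCoord[j];
            distances[counter] = sqrt(dx * dx + dy * dy);
            counter++;
        }
    }
    return distances;
}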

Exercise 4.1 : Compute a Summary of Organizedness

figure o
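One way to sketch this summary computation, assuming the arrays distances, indexA, and indexB record, for each pair, the distance and the indices of the two patches involved (Exercise 3.2); all names are illustrative.

// For each patch, find its largest and smallest distance to any other patch,
// then average those values and form the overall max/min ratio
maxPerPatch = newArray(nPatches);
minPerPatch = newArray(nPatches);
for (p = 0; p < nPatches; p++) {
    maxD = 0;
    minD = 1e30;                                   // larger than any possible distance
    for (k = 0; k < distances.length; k++) {
        if (indexA[k] == p || indexB[k] == p) {    // this distance involves patch p
            if (distances[k] > maxD) maxD = distances[k];
            if (distances[k] < minD) minD = distances[k];
        }
    }
    maxPerPatch[p] = maxD;
    minPerPatch[p] = minD;
}
Array.getStatistics(maxPerPatch, minVal, maxVal, meanMax, sd);
Array.getStatistics(minPerPatch, minVal, maxVal, meanMin, sd);
measuredR = meanMax / meanMin;                     // summary min-max ratio for this eye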

Exercise 4.2: Compare the Min-Max Ratio with Theoretical r Values to Assess Organizedness

figure p
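In sketch form, the comparison could look like this, assuming the tabulated values from Min_MaxMinDistRatio.csv were loaded into an array idealRatios (with idealRatios[i] corresponding to i+1 points); the variable names are illustrative.

// Expected 'optimal' max/min distance ratio for nPatches points in 2-D space
if (nPatches <= 30)
    idealR = idealRatios[nPatches - 1];            // tabulated value from Audet et al. (2010)
else
    idealR = 0.7191 * pow(nPatches, 0.5719);       // power-law approximation for larger counts

deviationFromIdeal = measuredR / idealR;           // values close to 1 indicate a highly organized arrangement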

Exercise 5.1 : Come up with the Two Conditions That Must Be Checked When We Move Around ROIs

  • The new ROI will fit in the window

    • If the ROI goes over the boundary of the image, its area will be less than the actual ROI area.

  • The new ROI will not overlap with any other objects.

    • If the ROI overlaps with another region that has already been validated and filled with the maximum value (255 in an 8-bit image), the average intensity measured within this new ROI will no longer be that of the empty background, and it will be easy to conclude that it is not at a valid position.

Exercise 5.2 : Write the Function Randomize() to Check the Conditions Identified in the Previous Exercise and Then to Randomly Move Around the ROIs

figure q
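For reference, a stripped-down sketch of the shuffling loop is shown below. It uses setSelectionLocation rather than the moveROI helper from the full macro, the canvas size matches the 184 × 416 crop, and the attempt limit is illustrative.

// Randomly relocate every ROI inside a blank 8-bit canvas, rejecting positions
// that stick out of the image or overlap an already placed patch
newImage("shuffleCanvas", "8-bit black", 184, 416, 1);
nROIs = roiManager("count");
for (i = 0; i < nROIs; i++) {
    roiManager("select", i);
    originalArea = getValue("Area");
    attempts = 0;
    do {
        x = floor(random * 184);                   // candidate top-left corner
        y = floor(random * 416);
        setSelectionLocation(x, y);
        placedArea = getValue("Area");             // shrinks if the ROI sticks out of the canvas
        overlapMean = getValue("Mean");            // > 0 if it touches an already filled patch
        attempts++;
    } while ((placedArea < originalArea || overlapMean > 0) && attempts < 1000);
    // the full macro restarts with a fresh canvas if the attempt limit is reached (Hint 2)
    setColor(255);
    fill();                                        // mark the accepted position as occupied
    roiManager("update");                          // store the new position back in the ROI Manager
}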

The working code FunctionRandomize.ijm can be found in the subfolder 5_Shuffle.

Exercise 5.3 : Write Down the Steps to Generate a Distribution of Summary Min-Max Ratios

figure r

The actual code can be found in the full macro between the lines 147 and 164.

Exercise 6: Write the Step-by-Step Instructions for Batch Processing

figure s