1 Introduction

Image-based phenotyping has become an integral part of many plant biological studies, assisting researchers in extracting and exploiting information implicit in collected image data. The focus can vary from specific plant organs [1, 2] to whole individual plants [3]. In this work, we are interested in the below-ground portion of the plant, its root system. Plants rely on their roots for water and nutrient uptake, which largely determine their performance and development [4]. We focus on the analysis of multiple interacting plants, as their root systems can facilitate either cooperative or competitive interactions. This is achieved, e.g. by influencing the composition of the bacterial flora in the rhizosphere, which may positively affect the nutrient availability, or by competing for (limited) resources [5].

When roots are to be examined, they are usually either destructively removed from their environment [6] or grown in artificial media [7], which may alter their natural growth behaviour due to the lack of complex biological, chemical and physical properties usually found in soil [8]. An alternative solution that allows roots to be imaged in soil is provided by X-ray micro computed tomography (\(\upmu \)CT), which is becoming increasingly accessible [9]. An additional advantage to its non-disruptive characteristic [10] is the acquisition of three-dimensional volumetric image data, which supports more accurate quantification of root system traits. Plant root systems are complex, highly branched structures, composed of many individual roots of varying size. The recovery of the fine and complex structure of plant roots from \(\upmu \)CT image data is a challenging problem. The process is complicated by the highly heterogeneous growth environment, composed of minerals, soil particles, organic matter, water and air filled pores.

We present a visual object tracking framework that allows the extraction of interacting plant root systems from their soil environment in \(\upmu \)CT image data. The paper both provides a detailed elaboration of previous single root mechanisms [11, 12] and describes their extension to multiple interacting root systems. A given data volume can be horizontally sliced into thin cross-sections to obtain a stack of images. Using a level set method guided by a greyscale intensity distribution model, we identify the boundaries of root cross-sections in each image. When traversing these images in sequence, root objects will appear at slightly different positions due to the root’s slanted growth through the soil environment. The architectural structure of plant root systems is recovered by tracking individual root cross-sections through a sequence of image slices.

Tracking is achieved using an adaptive appearance model and readjusting the interface of the level set function to the new location and outline of the root object. In the presence of multiple root systems, multiple trackers can be used but root cross-sections need to be distinguished from one another in order to allow correct labelling of neighbouring plants. However, because all root objects are likely to have similar greyscale intensity values, their appearance models can be expected to be similar, or even identical. If two or more independently tracked targets interact, their trackers can easily drift away to the object that best fits the model [13]. This can result in uncontrolled behaviour in which trackers switch their targets or follow the same target, while losing hold of others. During root extraction, this can lead to root cross-sections being assigned to incorrect root systems. To address the target coalescence problem, a shape constraint is added to the evolving interface of the level set function during target interactions.

This paper is an extension of [14], in which we present additional experimental work on the recovery of interacting root systems of wheat plants. The resulting data are used to identify spatial characteristics in relation to neighbouring plants, to facilitate the examination of resource competition. In what follows we briefly overview related work on the extraction of root-structure-like networks with a focus on X-ray CT (Sect. 2) and give a detailed description of our proposed method (Sect. 3). The extraction method is first applied to volume data of individual and then of multiple interacting root systems of winter wheat Cordiale (Triticum aestivum L.) (Sect. 4), followed by discussion and conclusions (Sect. 5).

2 Related work

Using a high-energy X-ray CT scanner, Heeraman et al. [15] endeavoured to image and quantify the root systems of plants grown in sand culture. In this they were among the first to show that roots can be separated from non-root material on a computational basis and not just by human assumption of the presence of roots. A number of voxels were manually selected to provide samples of different components (air, roots, sand). These were tested for normality and used to statistically classify the remaining voxels to one of these groups. The method does not guarantee connectivity and outlier voxels can easily be assigned to incorrect components. Seeking to advance imaging and analysis procedures, Lontoc-Roy et al. [16] presented methods and results obtained using X-ray CT for soil-root studies. Roots were segmented from the images by visually choosing lower and upper threshold values. The resulting segmentation included primarily larger roots. In a second step, an iterative three-dimensional region growing method was used, appending voxels connected to the initial extraction, but which also fall within a second, wider, threshold boundary. A similar approach is reported by Perret et al. [17]. A predefined threshold boundary was applied, after which a 26-neighbour connectivity constraint was imposed. While this guarantees connectivity of the root system, thresholding only gives satisfactory results if the greyscale values of different components do not overlap. This is often not the case.

In Pierret et al. [18], image slices were first segmented using a combination of thresholding and a top-hat filter [19]. By superimposing two consecutive images, the resulting root cross-sections were tested for continuity, while roughly defining the roots’ skeleton. Since elliptical objects were prone to artefacts, they were ignored in the analysis, which had the disadvantage of missing horizontally and near-horizontally growing roots. The authors were aware of this limitation, but considered it a reasonable compromise, leaving the method useful for preliminary investigations. Quantitative measurements were made based on the extracted skeletons. To overcome the limitation of thresholding for overlapping greyscale intensity distributions, Kaestner et al. [20] applied a non-linear diffusion filter multiple times with different parameters to smooth out the texture of the surrounding sand. As a result, the intensity distribution of root material was shifted to the tail of the sand distribution, making Rosin’s unimodal thresholding algorithm applicable [21]. To remove misclassified voxels, a dilation by reconstruction operation [22] was applied to eliminate speckles, while preserving thin root segments and enforcing connectivity of the root system. Filtering the data does not always result in the distribution of root material being shifted to the tail of the background distribution. The effect depends on the condition and composition of the soil matrix. While the methods presented by Pierret et al. [18] and Kaestner et al. [20] make use of thresholding to perform an initial crude segmentation, additional rules are applied to help decide whether an extracted object reflects the characteristics of a root segment.

An alternative approach to the non-invasive study of the interaction of root systems of different plants grown in soil was recently presented by [23], who demonstrated the combined use of magnetic resonance imaging (MRI) and positron emission tomography (PET) to image root systems of two maize plants grown in a single soil column. The sample was imaged using MRI to visualise the root systems and to separate them from the surrounding soil. \(^{11}\hbox {CO}_{2}\) was then inducted to the shoot of one of the plants, which was taken up and transferred to the root system, providing a radioactive label. The radioactivity was measured using PET and the resulting data co-registered with the structural description of the root system recovered from MRI. This methodology has the potential to identify and analyse individual root systems in an environment shared by multiple plants.

More recently, Metzner et al. [24] performed a direct comparison of the ability of X-ray CT and MRI to support the extraction of roots of 3-week-old bean plants from their soil environment in a variety of pot sizes. Both imaging methods allowed roots to be extracted. The ability to tune the MRI process to specific materials meant that it provided images with much higher contrast between roots and soil, easing root detection, particularly in larger pots. These images were, however, of much lower resolution than those obtained from X-ray CT, and the acquisition process requires the soil used to be heavily processed, destroying its natural structure. Although the extraction of roots from X-ray CT is challenging, and interactive methods were employed by Metzner et al. [24], a successful automatic solution would provide more accurate descriptions of root system architectures than MRI, in their natural environment.

Using an electron beam X-ray CT scanner, Sonka et al. [25] presented a method able to identify airway trees in lungs, which share a similar structure with plant root systems. Analogous to the method presented by Lontoc-Roy et al. [16], a conservative threshold was employed within a three-dimensional region growing algorithm to recover the primary tree of the airway structure, but typically missed fine, smaller diameter segments. To improve performance on small airways, the image was scaled by a factor of 2 and enhanced using a top-hat transform [26]. Using edge-based region growing, the enhanced image was segmented into airways, vessels and background (corresponding to dark, bright and intermediate greyscale values). A rule-based analysis capturing prior knowledge of the anatomical structure of airways and their relationship with pulmonary vascular trees, was used to refine the segmentation. Although prior knowledge of root system structures could be useful in their recovery, linking root segments to their environment is not straightforward.

An alternative method for the extraction of airways from electron beam X-ray CT image data was presented by Aykac et al. [27]. Their method is based on mathematical morphology, which was also a key component in Kaestner et al.’s method [20]. A greyscale morphological reconstruction was used to identify local minima in cross-sectional images, which are likely to correspond to fine airway segments. The image was then thresholded using a relative value lying between the minimum and maximum greyscale values. This process was repeated a number of times using differently sized morphological structure elements. The union of all candidate regions was used to reconstruct the airway tree. While morphological operations can enhance fine details in the image data, it cannot completely overcome the limitations of threshold-based segmentation. In addition to methods based on region growing [25] or mathematical morphology [27], solutions were proposed that use a tracing strategy [28].

Tracing was also found to be successful in the extraction of three-dimensional and root-structure-like networks outside of X-ray CT imaging. Flasque et al. [29] for instance, used magnetic resonance angiography (MRA) to image cerebral blood vessels and developed a centreline tracing-based method for their extraction. The centreline was traced stepwise, with successive points being estimated by searching within an orientated parallelepiped around previously identified points. Rules, like the definition of a maximum allowable curvature, were imposed on each search area. A rule-based approach allows the specification of a profile that is based on prior knowledge. To deal with the detection of junctions or branches, the number of entry and exit points along the surface of each parallelepiped is noted. By the definition of a continuous vessel, a parallelepiped must have exactly one entry and exit point. If more than one exit point is detected, then the presence of a junction is assumed, and a new starting point is created. In a final step, all traced centreline points are connected using B-spline curves. A common problem when tracing centrelines is the possibility of loops being formed due to interactions with other vessels or irregularities in the image data. An alternative approach was presented by Wilson and Noble [30]. To extract the vascular network from the image data, an adaptive expectation maximization (EM) algorithm was presented that recursively divides the volume into smaller sub-volumes, within which a localised segmentation was performed. The parameters of the distributions identified within a sub-volume indicate which tissues are present, and support the classification of individual voxels. Variation in signal intensity is expected for arteries, but not for the cerebrospinal fluid and brain tissue. In this, the data differ from soil-root samples, where the soil environment is found to be highly heterogeneous. Other complex root-structure-like networks are found, for instance, in neuronal arborescences [31].

3 Method

In this section we give a detailed description of the proposed extraction technique, beginning with the extraction of a single individual root system (i.e. assuming that all root cross-sections belong to the same plant) [11, 12]. We introduce each of the components and show how they are integrated into the tracking framework. A collision detection mechanism is then added to identify the interaction of multiple targets, to which a shape constraint is imposed, allowing the separate extraction of multiple interacting plant root systems [14]. The objectives of the work reported here are to

  • identify the boundaries of root cross-sections,

  • track individual root cross-sections and

  • keep root cross-sections arising from different plants separate.

3.1 Object boundary detection

We adopt the level set framework [32] to search for the boundaries of root cross-sections. We aim at finding the interface

$$\begin{aligned} C (t)=\{(x,y)|\varPhi _{x,y,t} = 0\} \end{aligned}$$

of a time-dependent function \(\varPhi _{x,y,t}\) that separates an object consisting of comparable intensity values from its heterogeneous background. The interface of \(\varPhi _{x,y,t}\) can be implicitly propagated by solving a partial differential equation

$$\begin{aligned} \frac{\partial \varPhi _{x,y,t}}{\partial t} + F |\nabla \varPhi _{x,y,t}|= 0 \end{aligned}$$

which can be approximated and rewritten using a finite forward difference scheme in time

$$\begin{aligned} \frac{\varPhi _{x,y}^{t+1}-\varPhi _{x,y}^{t}}{\varDelta t} + F | \nabla _{x,y}\varPhi _{x,y}^{t}| = 0 \end{aligned}$$

giving a general formulation of the time-discretised level set method, with F being a speed function that defines the motion of the front over time t. One possible way to find the boundary of an arbitrary object is to define a speed function that stops at high image gradients. A solution based on the formulation presented in [33] was tested, but failed to correctly identify root objects: blurred and low contrast boundaries are common in CT data. A solution is therefore proposed that evolves a level set function guided by a greyscale intensity distribution model [11]. Assuming we have the greyscale intensity values of a known root object, we use a kernel density estimator to build a statistical probability density function, which we will refer to as our root appearance model \(p_{m}\)

$$\begin{aligned} p_{m}(x) = \frac{1}{nh} \sum ^{n}_{i=0} K\left( \frac{x - x \left( i \right) }{h} \right) , \end{aligned}$$

where n is the number of data points, x(i) the samples of the greyscale distribution, h the bandwidth and K a Gaussian smoothing kernel \(K(x) = \frac{1}{\sqrt{2\pi }} \mathrm{e}^{-\frac{1}{2} x^{2}}\). Using the Jensen–Shannon (JS) divergence [34] as given in Eq. 5, we compute the distance between a probability density function \(p_{f}\) estimated around the interface of the level set function and our known root model \(p_{m}\)

$$\begin{aligned} \hbox {JS}(p_{f}, p_{m}) = H(w_{1} p_{f} + w_{2} p_{m}) - w_{1}H(p_{f}) - w_{2}H(p_{m})\nonumber \\ \end{aligned}$$

where H is the Shannon entropy function calculated as in Eq. 6 over the range of the probability density function of possible greyscale values n. \(w_{1}\) and \(w_{2}\) are two weighting parameters \(w_{1},w_{2} \ge 0, w_{1}+w_{2}=1\) used to balance the contribution of the two statistical probability density functions and useful for conditional probability studies where the weighting parameters represent prior probabilities. In our case, however, we set \(w_{1}=w_{2}=0.5\).

$$\begin{aligned} H(p) = -\sum ^{n}_{i=0} p_{i}\log _{b}(p_{i}) \end{aligned}$$

The JS divergence is a non-negative and symmetric dissimilarity measure, bounded by \([0, \log _{b} 2]\). Using a logarithm of base 2 results in a distance that is measured within [0, 1], where 0 is considered a complete match between two probability density functions. The higher the value of the JS divergence, the lower is the probability that the data come from the same distribution. These properties, and the fact that the dissimilarity measure is not constrained by the number of samples and their shape of the distribution, make the JS divergence a good choice for our application. Given the above definitions, we can now build them into a level set framework

$$\begin{aligned} \varPhi ^{t+1}_{x,y}= & {} \varPhi ^{t}_{x,y}+ \varDelta t [-(\alpha )(\hbox {JS}_{\beta \vee }\nabla ^{+} +\hbox {JS}_{\beta \wedge }\nabla ^{-})+(1-\alpha )(\kappa )],\nonumber \\ \end{aligned}$$

where \(\hbox {JS}_{\beta \vee } = \max (\lceil \beta -\hbox {JS}\rceil ,0)\) and \(\hbox {JS}_{\beta \wedge } = \min (\lfloor \beta -\hbox {JS}\rfloor ,0)\) are the propagation forces, with \(\beta \in [0,1]\) defining the acceptance distance of the JS divergence between model and data distribution. \(\alpha \in [0,1]\) is a weighting parameter between the propagation force and the curvature dependency \( \kappa = \nabla \cdot \frac{\nabla \varPhi ^{t}_{x,y}}{|\nabla \varPhi ^{t}_{x,y}| }\) of the front. The numerical solution requires a difference scheme to be chosen that propagates information in the direction upwind to the moving interface. This is achieved through \(\nabla ^{+}= [\max (D^{-x},0)^{2}+\min (D^{+x},0)^{2}+\max (D^{-y},0)^{2}+\min (D^{+y},0)^{2}]^{1/2}\) in case of an expanding force and similarly through \(\nabla ^{-}= [\max (D^{+x},0)^{2}+\min (D^{-x},0)^{2}+\max (D^{+y},0)^{2}+\min (D^{-y},0)^{2}]^{1/2}\) for a contracting force, where \(D^{+x} = \frac{\varPhi _{x+\varDelta x, y, t}-\varPhi _{x,y,t}}{\varDelta x}\) is the forward difference operator and \(D^{-x} = \frac{\varPhi _{x, y, t}-\varPhi _{x-\varDelta x,y,t}}{\varDelta x}\) the backward difference operator in x, and respectively, \(D^{+y}\) and \(D^{-y}\) in y. The interface evolves until the front converges to a stationary solution, which is checked by counting the number of sign changes of the level set function. This has further the advantage that the evolution process can be terminated even if the front oscillates [32]. The level set framework is implemented using the narrow band strategy [35] for increased efficiency and the fast sweeping method [36] for re-initialisation. Figure 1 shows a cross-sectional image in which root objects are identified and separated from their complex and heterogeneous soil environment using the method described above.

Fig. 1
figure 1

Cross-sectional image showing a raw data and b raw data with root objects identified

3.2 Tracking root objects

Target objects are selected for tracking by the user manually setting seed points in the first (top) image in the stack. An initial root appearance model is built for each target from the greyscale intensity values within a 5 pixel radius. To provide a good initial root model, seed points should be placed to capture as many valid pixels as possible. Seed points can be placed anywhere within large root cross-sections. The seed points also mark the initial interface of the propagating level set function, which is evolved until the root object boundaries are identified. Since a level set function can implicitly represent multiple interfaces, a classical two-pass connected component algorithm [37] is used to assign a label to each root cross-section. Labels are propagated when constructing the narrow band around an interface and it is therefore possible to evolve the level set function using different appearance models for each root object. This means that we do not have a single model that represents all the root objects in a plant at the same time, but several models that are generated, each representing a single target (root segment).

Once the boundaries of root objects are identified, the aim is to track these target objects through a sequence of horizontal slices, or images, building up a three-dimensional segmentation of the root system. Due to the high resolution of X-ray \(\upmu \)CT data, we assume that corresponding root locations in consecutive images partially overlap, and that their greyscale intensity distributions vary smoothly. Some variation is to be expected due to the heterogeneous environment of varying density materials and the unevenly distributed water content in both the soil and the root system, which can affect the estimated X-ray attenuation values making up the voxel data. Therefore, as root objects are tracked through the image sequence, their assigned root model distribution must be updated to adapt to their changing appearance. This is done by re-computing the root appearance model from the greyscale intensity values enclosed by each of the converged interfaces of the level set function.

Updating the root model is an inevitable step, yet it conceals potential problems. Noise or small areas of background might be included within the interface and so contribute to its probability density function. These errors can accumulate and result in a model that is no longer an appropriate representation of a tracked root object. To reduce the potential of model drift, we use a complex Fourier shape descriptor [38] to compare the shape of a root object in pairs of consecutive images and only update the root appearance model when the sum of squared differences of their filtered and normalised power spectra is below an empirically determined threshold.

A root system is composed of several branching roots. Splitting of a root boundary as it branches throughout the image stack is implicitly dealt with by the level set’s ability to adapt to changing topologies: as the level set interface evolves from one state to another, it can split into multiple disjoint interfaces. When a target object separates, the level set evolves based on the same root model, but will become two distinct objects with their own, independently updating root appearance model after proceeding to the next image slice. Figure 2 shows a sequence of cross-sectional images in which root objects are tracked from manual initialisation on the (single) root stem.

Fig. 2
figure 2

A sequence of cross-sectional images, taken from a single CT stack at 40-slice intervals, with tracked root cross-sections highlighted. Tracking is initialised manually on the (single) root stem. Note the large number of distinct targets tracked by a single level set, and the changing shape of the tracked region (inset images)

The tracker described so far, as it follows a root object through the image stack, will only capture roots that branch and grow downward, along the search direction. Any upward oriented roots will be missed, since they appear in the image stack before they connect to an identified target object. To address this, an additional step is introduced, allowing the tracker to ‘look back’ at the previously analysed image, using any of the currently identified targets to search for new root objects that have not been detected before. If such objects are found, they are temporarily labelled as potential upward growing roots, while the extraction process is continued downwards until the end of the image stack is reached. If, at the end, one or more objects have been labelled as upward oriented roots, then the image stack is reversed and each of the labels picked up by the tracker and followed as if they were downward oriented. Jumping to assigned labels avoids re-examination of the entire stack and further tracking of previously identified roots. This process of alternating direction is repeated until all targets have been examined and no new labels remain [12].

3.3 Multiple interacting objects

To extract multiple root systems, a level set tracker is initialised to each plant and their level set functions evolved simultaneously. In this work we adopt the concept of multiple level set functions as presented in [39]. Let \(\varPhi ^{t}_{A}\) and \(\varPhi ^{t}_{B}\) be two level set functions and their interfaces occupy two different regions at time t. The level set functions evolve separately, based on their individual root appearance models, resulting in a temporary state of \(\varPhi ^{*}_{A}\) and \(\varPhi ^{*}_{B}\). \(\varPhi ^{*}_{A}\) and \(\varPhi ^{*}_{B}\) are then combined to obtain the level set functions \(\varPhi ^{t+1}_{A}\) and \(\varPhi ^{t+1}_{B}\) at time \(t+1\). The combination of the temporary level set functions depends on whether or not the interface of A can penetrate the interface of B, or vice versa, and as such pushes back the adjacent interface. Assuming that A can penetrate B, but B cannot penetrate A, then the new level set function at time step \(t+1\) will be updated according to

$$\begin{aligned} \varPhi _{A}^{t+1}= & {} \varPhi _{A}^{*} \nonumber \\ \varPhi _{B}^{t+1}= & {} \max (\varPhi _{B}^{*},- \varPhi _{A}^{*}). \end{aligned}$$

The following examples show how the rule in Eq. 8 solves the interaction between the level set functions \(\varPhi _{A}^{*}\) and \(\varPhi _{B}^{*}\). To recall, a level set function has negative values inside and positive values outside of its interface. There are different possible situations when updating \(\varPhi _{B}^{t+1}\), taking \(\varPhi _{B}^{*}\) and \(\varPhi _{A}^{*}\) into account. First, a point may appear inside both \(\varPhi _{A}^{*}\) and \(\varPhi _{B}^{*}\), so both level set functions have negative values. As \(\varPhi _{B}^{t+1}\) is obtained by taking the maximum of \(\varPhi _{B}^{*}\) and \(- \varPhi _{A}^{*}\), and the negative value of \(\varPhi _{A}^{*}\) is turned into a positive value, the point is assigned to the outside of the interface \(\varPhi _{B}^{t+1}\), while \(\varPhi _{A}^{t+1}\) remains negative. The effect is that A pushes away the interface of B. Now let us assume a point that is neither part of the interface of A nor B. This means that both values are positive. By placing the minus sign in front of \(-\varPhi _{A}^{*}\), the positive value becomes negative, but because of the maximum operator, the updated value for \(\varPhi _{B}^{t+1}\) remains positive, and therefore is not affected by the level set function \(\varPhi _{A}^{*}\). Finally, let us assume that a point is inside the interface of B but outside of A. Because the value of a level set function represents the distance to its interface, the negative value of \(- \varPhi _{A}^{*}\) will be less or equal the negative value of \(\varPhi _{B}^{*}\), and again, the result for \(\varPhi _{B}^{t+1}\) at that point remains unaffected by the level set function \(\varPhi _{A}^{*}\).

These examples show how interacting level set fronts can be controlled: this rule can be modified to define similar rules such that during an encounter of two level set fronts, neither is allowed to penetrate the other. This will stop them from advancing further and give an exact partition of the two regions at the front of collision. The mechanism of multiple fronts can easily be extended to any number of level set functions using the same principles of combination. Each evolving front in the set must be compared to all other level set functions in the same set. This allows easy identification of any collisions between interfaces and determination of which of the level set functions interact. Figure 3 shows three different scenarios in which two level set functions (front A (red) and front B (blue)) are evolved until their fronts interact with each other, at which point different combination rules are applied. This is a key element in the extraction of multiple interacting root systems, but not sufficient to allow separation of those root systems. While the combination rules allow individual trackers to be separated, the true boundary between touching root cross-sections remains unknown. Although level set functions can penetrate each other’s interface, there is no definition given yet of when these rules are to be applied. For this, shape information is used to estimate the boundary of root objects and so to find the intersecting front between them.

Fig. 3
figure 3

Two level set function A (red) and B (blue) interacting with each other, where a front A penetrates front B, b front B penetrates front A and c neither A nor B is penetrated (color figure online)

While tracking target objects through the image stack, their shape is noted and used to control appearance model updates. We can, therefore, easily recall an object’s outline and store the most recent shape information seen before the interaction with other objects began. This information is kept until the interaction ceases. Let \(U = \lbrace {\mathbf {u}}_{i} | i=1\ldots N_{u}\rbrace \) be a set of data points along the outline of a stored shape and \(V = \lbrace {\mathbf {v}}_{i} | i=1\ldots N_{v}\rbrace \) be a set of data points along a level set’s interface. The rotation matrix \(\mathbf{R }\) and the translation matrix \(\mathbf{T }\) are sought which minimise the root mean squared distance between U and V and therefore find the best alignment of the two point sets. This can be achieved using the iterative closest point (ICP) algorithm [40]. By calculating the centre of mass \({\varvec{\mu }}_{\mathbf{u}}\) and \({\varvec{\mu }}_{\mathbf{v}}\) of the two point clouds, it is possible to determine the cross-covariance matrix \(\hbox {cov}_{uv}=\dfrac{1}{N_{u}}\sum _{i=1}^{N_{u}} [({\mathbf {u}}_{i}-{\varvec{\mu }}_{\mathbf{u}})({\mathbf {v}}_{i}-{\varvec{\mu }}_{\mathbf{v}})^\intercal ]\) for U and V. Using the cyclic components \({\mathbf {a}}=(A_{23},A_{31},A_{12})\) of a matrix \(\mathbf{A }= \hbox {cov}_{uv}- \hbox {cov}_{uv}^\intercal \) allows the definition of a \(4\times 4\) matrix \(\mathbf{Q }\)

$$\begin{aligned} \mathbf{Q }_{4 \times 4}= \left( \begin{array}{c@{\quad }c} \hbox {tr}(\hbox {cov}_{uv}) &{} {\mathbf {a}}^\intercal \\ {\mathbf {a}} &{}\hbox {cov}_{uv}+ \hbox {cov}_{uv}^\intercal -\hbox {tr}(\hbox {cov}_{uv}){\mathbf{I }}_{3} \end{array}\right) \end{aligned}$$

The eigenvector \({\mathbf {r}}=(q_{1} \quad q_{2} \quad q_{3} \quad q_{4})\) of the matrix \(\mathbf{Q }\) with the maximum eigenvalue is used to define the rotation matrix \(\mathbf{R }\)

$$\begin{aligned} \mathbf{R }= \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} q_{1}^{2}+ q_{2}^{2}- q_{3}^{2}- q_{4}^{2} &{} 2(q_{2} q_{3}- q_{1} q_{4}) &{} 2(q_{2} q_{4}+ q_{1} q_{3}) &{} 0 \\ 2(q_{2} q_{3}+ q_{1} q_{4}) &{} q_{1}^{2}+ q_{3}^{2}- q_{2}^{2}- q_{4}^{2}&{} 2(q_{3} q_{4}- q_{1} q_{2}) &{} 0\\ 2(q_{2} q_{4}- q_{1} q_{3}) &{} 2(q_{3} q_{4}+ q_{1} q_{2}) &{} q_{1}^{2}+ q_{4}^{2}- q_{2}^{2}- q_{3}^{2} &{} 0\\ 0 &{} 0 &{} 0 &{} 1 \end{array}\right) \nonumber \\ \end{aligned}$$

The vector \({\mathbf {t}}=({\varvec{\mu }}_{\mathbf{v}}-\mathbf{R }{\varvec{\mu }}_{\mathbf{u}})\) is used to define the translation matrix \(\mathbf{T }\)

$$\begin{aligned} \mathbf{T }= \left( \begin{array}{c@{\qquad }c@{\qquad }c@{\qquad }c} 1 &{} 0 &{} 0 &{} t_{1} \\ 0 &{} 1 &{} 0 &{} t_{2} \\ 0 &{} 0 &{} 1 &{} t_{3} \\ 0 &{} 0 &{} 0 &{} 1 \end{array}\right) \end{aligned}$$

The ICP algorithm is initialised by setting the rotation and translation matrices equal to the identity matrix \(\mathbf{R }=\mathbf{T }=\mathbf{I }\) and begins by identifying for each point \({\mathbf {u}} \in U\) the best match with the shortest distance \(d({\mathbf {u}},V)=\min _{{\mathbf {v}} \in V}||{\mathbf {v}}-{\mathbf {u}}||\). This step can be efficiently performed using a k–d tree [41]. With the set of matching pairs as input, the best registration is calculated using the quaternion-based least square method, determining \(\mathbf{R }\) and \(\mathbf{T }\) which are then applied to U. The whole process is repeated iteratively, finding new matching points and their transformation, until the change in mean squared error falls below a given threshold.

When the interfaces of two level set functions collide, and each is made impenetrable, race conditions are generated, as illustrated in Fig. 4. This, however, can be solved using shape constraints. The ICP algorithm, as described above, is used to find the best alignment of the stored shape to the evolving interface. This leaves each point within the interface in one of two possible states: it is either outside or inside of its aligned region. Let \(S=\{S_{1}\ldots S_{n}\}\) be the enclosed areas of each aligned shape to its corresponding level set function, \(L=\{\varPhi _{1} \ldots \varPhi _{n}\}\) be the set of level set functions at time t and \(L^{*}=\{\varPhi _{1}^{*}\ldots \varPhi _{n}^{*}\}\) the set of their temporary states, then the final value of the level set function \(\varPhi ^{t+1}_{i}\) at time step \(t+1\) and position p is updated accordingly

$$\begin{aligned} \varPhi _{i}^{t+1}= {\left\{ \begin{array}{ll} \varPhi _{i}^{*} &{} \text {if} \left( p \in S_{i} \right) \wedge \left( p \notin \{S \backslash S_{i}\} \right) \\ \max \left( \varPhi _{i}^{*},- \{ L_{j} | p \in S_{j} \} \right) &{} \text {if} \left( p \in S_{i} \right) \wedge \left( p \cap \{S \backslash S_{i}\} \ne \emptyset \right) \\ \max \left( \varPhi _{i}^{*},- \{L^{*}\backslash \varPhi _{i}^{*}\} \right) &{} \text {if} \left( p \cap S_{i} = \emptyset \right) \end{array}\right. }\nonumber \\ \end{aligned}$$

A particular benefit of this solution is that, while it constrains the movement of the front, the selected root object is not required to maintain the registered shape. This allows the detection of lateral roots, since a level set function can still evolve beyond the aligned region. At the same time it prevents the path of a level set function being blocked by faster evolving level sets and allows their interface to be penetrated so that control over its target is maintained. The effect of adding shape constraints to the level set functions is illustrated in Fig. 4. Figure 5 shows a sequence of images in which tracked root cross-sections interact with each other.

Fig. 4
figure 4

Two colliding target objects; a raw (artificial) data, b objects extracted using the conventional level set tracking approach and c when the original method is combined with the ICP algorithm during the period of contact (5–9)

Fig. 5
figure 5

A sequence of cross-sectional images with multiple and interacting target objects tracked and highlighted. Images are selected at irregular intervals of 20–40 slices, to best capture the interactions between roots

4 Experiment

Winter wheat Cordiale (Triticum aestivum L.) were grown in eight columns of 30 mm in diameter filled with soil. The seeds were germinated in Petri dishes on wet filter papers, covered with an aluminium foil to shield them from sunlight, and planted after two days. A single seed was placed in four of the 30 mm columns, of which two were filled with loamy sand and another two with clay loam. Two seeds were placed, approximately 10 mm apart, in the remaining four columns, each filled with loamy sand. The soil was air-dried and sieved to \(<\)2 mm before being packed into the columns. The plants grew in environment-controlled growth rooms with a 16/8 h light cycle at a temperature of 23/18 \(^{\circ }\hbox {C}\) and were scanned 10 days after germination. The water status of the samples at the point of imaging was approximately at field capacity.

The imaging device used in this experiment was a Nanotom (Phoenix X-ray/GE Measurement and Control Systems) X-ray \(\upmu \)CT scanner. Scanning of the 30 mm columns was performed at 120 keV and 250 \(\upmu \)A, taking 1200 projections at an exposure time of 750 ms, using a signal averaging of 3 and 1 skipping per projection. A 0.1 mm Cu filter was used to harden the beam. Samples were placed 200 mm away from the X-ray gun, resulting in a volume with resolution of 25.0 \(\upmu \)m voxel size and an image stack of \(1400\times 1400\times 2200\) voxels. The acquired volume data were saved to a stack of 8-bit images. The tracking framework proposed here was used to recover the root systems from the image data. Seed points were selected manually in the first image of each stack to mark target objects and to initialise separate trackers to each of the root systems. The time needed to recover the root systems from the CT images depends on the size of the data and the number of root objects being tracked. The root systems in this experiment were extracted within four to five hours on an Intel Core i7-3820 3.60 GHz processor, using only a single core due to the implementation. Since the root systems are extracted by analysing one image slice at a time, the allocation of memory is kept to a minimum, even for large data volumes.

Although achieved here via visual tracking, the recovery of plant root systems from X-ray \(\upmu \)CT data is effectively a segmentation task. Segmentation methods are typically evaluated either by quantitative comparison to a ground truth dataset or by assessing their ability to support some higher level task. When seeking to distinguish roots and soil, ground truth data are difficult to obtain. The size and complexity of \(\upmu \)CT scans of these heterogeneous samples mean that manually generated ground truth is expensive and may not be definitive. Similarly, though simulation of CT images is possible, artificially generated data is not entirely representative of real root/soil samples [42]. The practical goal of the work reported here is robust root phenotyping—the recovery of quantitative measurements of root system traits that can be used to assess and guide the development of new, higher performing plant variations. Assessment against a real phenotyping task requires a large number of samples to be produced and analysed against a specific biological question. The methods presented here are a key component of the University of Nottingham’s recently opened Hounsfield Facility [43] where such experiments are now underway. Description of this work is, however, beyond the scope of the current paper. For the present we demonstrate the performance of the proposed techniques through a smaller pilot experiment.

Figure 6 shows rendered images of the extracted root systems of the individually grown wheat plants. The average diameter of roots that were extracted from the data was 15 to 30 pixels. After scanning the samples with X-ray \(\upmu \)CT, they were root-washed free of soil, placed on a water tray and imaged with a flatbed scanner at 400 dpi. The resulting two-dimensional images provide a reference for the 3D descriptions extracted from the X-ray data. Comparison of the figures shows that the architecture of the root systems has been largely recovered, capturing its main shape and structure. It has proven difficult to recover all of the fine lateral roots of the root system at this resolution of scanning. While some might not be visible in the image data, due to their small size, others might be present, but not necessarily shown as connected due to disruptions caused by small image irregularities.

Fig. 6
figure 6

Extracted root systems of wheat grown in a, b loamy sand and c, d clay loam, (x.1) imaged for comparison with a flatbed scanner and (x.2) the rendered root systems extracted from X-ray \(\upmu \)CT data using \(\alpha \), \(\beta \), \(\gamma \) and \(\delta \) for alignment reference. The root systems in (x.1) once extracted from the soil, lost their three-dimensional geometry information, while still preserved in (x.2)

Figure 7 shows images rendered from the data extracted from a scan of two interacting wheat plants, with the root systems highlighted in different colours.

Fig. 7
figure 7

Extracted root systems of two interacting wheat plants highlighted in red and blue, respectively (color figure online)

Although roots came into contact with their neighbouring root system, they were correctly associated with their originating plant. Note that the proposed approach is not limited to two interacting root systems within a sample, but can be applied to an arbitrary number of plants. This number, however, is restricted by the size of column suitable for the X-ray \(\upmu \)CT system used and the ability of its X-ray gun to penetrate the sample. To demonstrate the method on a more sophisticated dataset, we have prepared a slightly larger sample treated under the same conditions as the previous ones, but comprising three wheat plants grown in a column 60 mm in diameter. The scan was performed at 130 keV and 200 \(\upmu \)A, taking 1440 projections at an exposure time of 1000 ms, using a signal averaging of 4 and 1 skipping per projection. A 0.2 mm Cu filter was used to harden the beam. The sample was placed 220 mm away from the X-ray gun, resulting in a volume with resolution of 27.5 \(\upmu \)m voxel size and an image stack of \(2100\times 2100\times 2260\) voxels. The root systems from the sample are shown in Fig. 8 extracted. Due to the larger sample size, the number of projections and exposure time had to be increased to produce reasonable image data, even though the amount of noise in these data was higher compared to the other scans.

Fig. 8
figure 8

Extracted root systems of three interacting wheat plants a all together, b highlighting the first, c the second and d the third root system

4.1 Spatial characteristics

Motivation for the recovery of plant root systems from X-ray \(\upmu \)CT data comes from a pressing need to analyse the spatial characteristics of root systems, particularly in relation to their neighbouring plant(s). Root growth is driven by apical meristems, which are groups of cells found close to root tips, formed either during the embryonic development of primary roots or in the primordium of lateral roots [44]. The root system’s further exploration of the soil environment therefore derives from existing root tips, which confines our attention.

To support the recovery of quantitative data on root system traits and interactions, each root system’s skeleton is extracted from the volumetric segmentation data produced by the methods presented above. An inverted 3D Manhattan distance map, beginning from the root walls and extending over the root cross-sections, is computed for each root system. The local minima of each root cross-section are then determined and used to identify a set of control points which are input to a non-parametric regression model using a linear piecewise curve-fitting function. Optimal smoothing parameters are found through cross-validation, finding a smooth and continuous representation of the root skeleton. Root tips are found at the ends of the skeleton, where root cross-sections are entirely enclosed by root walls (i.e. where all distances of a root cross-section are equal to zero). This ensures that root tips are labelled at the outer layer of the roots and not within the roots, away from their surface boundary. The result is a nested tree structure expressed in Root System Markup Language (RSML) format [45].

A vantage point (VP) tree [46] is generated for each of the skeletonised root systems and used in conjunction with the detected root tips to find the minimum distance from each of the tips to their neighbouring root system(s). Plant root systems that compete for localised accumulated resources are expected to show an increased number of close distances as the majority of roots are likely to grow towards the same location. The opposite is expected for root systems that avoid each other’s presence and would mostly grow towards unoccupied areas in the soil. A more substantive conclusion on plant root competition for resources could be derived from temporally acquired data, as it supports analysis of when and how distances change over time. Analysis of time series data and biologically focused studies of multiple root systems and their characteristic traits will be the subject of future work; for the present we demonstrate only the ability of our methods to provide information on root interactions. To this end, Fig. 9 visualises the computed minimum distances between root systems, represented as translucent spheres centred at root tips.

Fig. 9
figure 9

Extracted root systems of two interacting wheat plants highlighted in red and blue, respectively. The centre of the spheres is determined by the root tips identified from the skeleton and the radius is determined by the minimum distance to their neighbouring root system (color figure online)

5 Discussion and conclusions

We have presented a visual object tracking framework for the extraction of plant root systems grown in soil from X-ray \(\upmu \)CT volume data, allowing the recovery of both individual and multiple interacting plant root systems. The method proposed here uses a modified level set framework that is guided by a greyscale intensity distribution model to find the boundaries of root cross-sections. The appearance model is updated to adapt to variations in the greyscale intensity values of the target object. The interface of a level set function is continuously readjusted to locate the new position and outline of the target objects in subsequent images. After following root cross-sections through the image stack, the resulting information is used to reassemble the complete root system of a plant.

In the presence of multiple root systems, multiple trackers are deployed, but need to be able to keep their targets distinguished from each other. This is challenging since root cross-sections are likely to share similar, if not identical, greyscale intensity distributions and hence the appearance model used by the trackers is not enough to keep the objects separate. Shape constraints are therefore added when objects interact, and help lock the trackers to their correct targets.

The method proposed here was tested on root systems of winter wheat Cordiale (Triticum aestivum L.), using data showing both individual and multiple interacting root systems. Results show that the proposed technique can successfully recover and separate plant root systems from each other.

As more mature plant root systems are examined, larger columns are needed to provide enough space for the root system to explore the soil environment. When using larger samples, scan resolution will be compromised, resulting in more disjoint root segments. While at present an adaptive appearance model is used by the tracking framework, its motion model is still very simplistic, relying on the assumption that root cross-section will partially overlap in consecutive images. This assumption might not hold if larger samples are used. Hence a more sophisticated motion model will be required. Another compromise in using larger sample sizes is that more fine lateral roots will become unidentifiable due to the reduction in resolution. These issues will be the subject of future reports.