MowJoe: a method for automated high-throughput dissected leaf phenotyping
Accurate and automated phenotyping of leaf images is necessary for high-throughput studies of leaf form, such as genome-wide association analysis and other forms of quantitative trait locus mapping. Dissected leaves (also referred to as compound leaves), which are subdivided into individual units, are an attractive system for studying the diversification of form. However, only a few software tools exist for their automated analysis. High-throughput image processing algorithms are therefore needed that can partition these leaves into their phenotypically relevant units and calculate morphological features based on these units.
We have developed MowJoe, an image processing algorithm that partitions a dissected leaf into leaflets, petiolules, rachis and petiole. It employs image skeletonization to convert leaves into graphs and then applies algorithms that operate on graph structures. This partitioning of a leaf allows the derivation of morphological features such as leaf size or leaflet eccentricity. Furthermore, MowJoe automatically places landmarks on the terminal leaflet that can be used for further leaf shape analysis. It generates output files that can be imported directly into downstream shape analysis tools. We applied the algorithm to two accessions of Cardamine hirsuta and show that our features robustly discriminate between these accessions.
MowJoe is a tool for the semi-automated, quantitative, high-throughput shape analysis of dissected leaf images. It provides the statistical power needed to detect the genetic basis of quantitative morphological variation.
Plant leaves are critical for survival as they are the primary site of photosynthesis. Leaf shape and size show tremendous variation between species, which is assumed to be the result of adaptive evolutionary processes tinkering with leaf shape to allow best performance in particular ecological niches [12, 14, 20, 23, 36]. Therefore, plant leaves have attracted scientists from diverse disciplines to study ecology, evolution, development and patterning mechanisms [5, 17, 21, 22, 27, 34, 40, 41].
Qualitative descriptions of leaf shape, traditionally used for species classification, are insufficient to characterize the developmental and genetic factors underlying phenotypic variation. Rather, it is necessary to quantitatively describe the geometric features of leaf shape and perform shape analysis [6, 26]. Easily accessible measures include length, width, perimeter and area. A more refined morphometric analysis, however, is based on the extraction of multivariate shape features. Typically these methods analyse the relative position of landmarks—homologous points identified in each leaf sample—or sequential positions along the leaf outlines or combinations of the two. These approaches are collectively referred to as geometric morphometrics. Examples of methods based on outline analysis are Eigenshape analyses and elliptic Fourier analysis [19, 24, 31].
In order to perform quantitative shape analyses of dissected leaves in larger populations, we have developed the MowJoe analysis pipeline. As compared to entire leaves, the shape of dissected leaves is more challenging to image processing algorithms. Instead of reporting merely morphological features of the entire leaf, one needs to cut the dissected leaf into pre-defined, phenotypically meaningful parts beforehand. A recent approach  relies on the assumptions of circular leaflets and the symmetric positioning of leaflets along the rachis to build a leaf shape model. Leaflets are then segmented by Active Contours. Another approach searches for concavity points in the leaf and partitions the leaf based on these points  or deletes the rachis of the leaves by fitting a polynomial curve . The drawback of these methods is their sensitivity to violations of leaflet convexity. These may occur naturally or through imaging artifacts like fissures in the leaf produced by its fixation to a plain surface.
Recently, skeletonization of an image was combined with morphological operations in order to measure the length of the branches in rice panicles . Informally, the skeleton is a reduced, one-dimensional representation of a leaf through its “central” points (see Fig. 2c). We use skeletonization to derive marker points on the skeleton of a leaf, which mark, e.g., its leaflets. In contrast to , we exploit the fact that skeleton points located within a leaflet have a larger distance to the outline of the leaf than skeleton points lying in the rachis or the petiolule. This allows the fast and reliable determination of the points on the skeleton which separate leaflets from petiolules and rachis. Based on this algorithm, we have developed MowJoe, a software tool that segments dissected leaves into phenotypically meaningful units and calculates morphological features for the whole leaf as well as for the individual leaflets, the petiolule, the inter-rachis, the petiole and the terminal rachis. Additionally, it determines landmarks and outlines of the terminal leaflet that can directly be used as input files for downstream shape analysis software, such as MorphoJ , Eigenshape Analysis  and R shapes . We apply MowJoe to two different accessions of the model plant species Cardamine hirsuta  and demonstrate the potential of MowJoe to identify leaf shape variation in dissected leaves.
Image acquisition and binarization
We obtained digitized 2D color images by scanning leaves of the fifth node. To reduce processing time, the images were rescaled to 25% of their original size using bicubic interpolation (Fig. 2a).
Foreground pixels were extracted by using a method similar to . The image was converted from RGB to HSV space, and a 2-means clustering of the pixels in saturation-value (SV) space was performed. This initial clustering gave an estimate of the intensity centroids of foreground and background pixels in SV space, which served as the initialization of a two-component multivariate Gaussian mixture model on the SV values of the image pixels. The covariance matrices of that model were initialized as scaled identity matrices, with the scale chosen as the standard deviation of the whole data set. From the resulting segmentation, the largest connected foreground (green) component was kept (Fig. 2b). The Gaussian mixture method outperformed other common methods such as Otsu thresholding  applied to the grey scale or the green component of an image (Additional file 1: Fig. S1).
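The initial clustering step can be illustrated with a minimal sketch in Python. It is not the Matlab code used in MowJoe: the function name `two_means_sv`, the deterministic initialization and the fixed iteration count are assumptions made for illustration, and the subsequent Gaussian mixture refinement is omitted.

```python
import colorsys

def two_means_sv(pixels, iterations=20):
    """Cluster RGB pixels (0-255 tuples) into two groups in saturation-value space."""
    # Convert every pixel to its (saturation, value) pair.
    sv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1:] for r, g, b in pixels]
    # Assumed initialization: the points with the lowest and highest saturation.
    centroids = [min(sv), max(sv)]
    labels = [0] * len(sv)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min((0, 1), key=lambda k: (s - centroids[k][0]) ** 2 + (v - centroids[k][1]) ** 2)
            for s, v in sv
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        for k in (0, 1):
            members = [p for p, lab in zip(sv, labels) if lab == k]
            if members:
                centroids[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids
```

In this sketch the two centroids would play the role of the initial component means of the mixture model. Which cluster corresponds to the leaf depends on the image; for a leaf scanned on a pale background, the cluster with the higher saturation is the likely foreground.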
Graph representation of a leaf
The binarized leaf is converted into a skeleton representation. Skeletonization of a binary image is a standard procedure in image processing . Informally, it calculates a one-dimensional summary of a two-dimensional binary image (Fig. 2c). Technically, a distance transform is applied to the image first. This operation assigns to every leaf pixel x the Euclidean distance d(x) to the nearest background pixel. The distance value d(x) can be thought of as the radius of the largest circle centered at x that touches the border of the leaf. Those points whose circle touches the border at least twice are the skeleton pixels (Fig. 2c).
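The distance transform described above can be sketched as a brute-force computation on a small binary mask. This is an illustration only: the function name `distance_transform` is hypothetical, the mask is assumed to contain at least one background pixel, and production code would use a linear-time algorithm rather than this quadratic search.

```python
import math

def distance_transform(mask):
    """For every foreground pixel, return the Euclidean distance to the nearest background pixel."""
    rows, cols = len(mask), len(mask[0])
    # Collect the coordinates of all background pixels once.
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # d(x): radius of the largest circle centered at x inside the foreground.
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist
```

Pixels deep inside a leaflet receive large values of d(x), whereas pixels in a thin structure such as the rachis receive small values.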
Afterwards, the skeleton is pruned further by removing small branches whose length falls below a threshold of 25 pixels (corresponding to 4.5 mm; this value can be adjusted in the MowJoe GUI). The skeleton is then converted into a graph with one node for every skeleton pixel and one edge for every pair of neighboring pixels in the skeleton. This graph representation of the skeleton simplifies further processing.
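A minimal sketch of the graph construction is given below. It assumes that "neighboring" means 8-connectivity, which the text does not specify, and the function names `skeleton_to_graph` and `classify_nodes` are hypothetical.

```python
def skeleton_to_graph(skeleton_pixels):
    """Build an adjacency map with one node per skeleton pixel and edges between 8-neighbours."""
    pixels = set(skeleton_pixels)
    graph = {p: set() for p in pixels}
    for r, c in pixels:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neighbour = (r + dr, c + dc)
                # Skip the pixel itself; connect to every neighbouring skeleton pixel.
                if (dr, dc) != (0, 0) and neighbour in pixels:
                    graph[(r, c)].add(neighbour)
    return graph

def classify_nodes(graph):
    """Split nodes into end points (degree 1) and branching candidates (degree >= 3)."""
    ends = {n for n, nb in graph.items() if len(nb) == 1}
    branches = {n for n, nb in graph.items() if len(nb) >= 3}
    return ends, branches
```

Under 8-connectivity, pixels directly adjacent to a junction can also reach degree three or more, so a real implementation would probably need to merge such clusters into a single branching node.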
This yields a simple criterion for robustly identifying the correct cut point: it is the (first) node k that minimizes M(k). Several paths may end in the same leaflet (e.g., see the terminal leaflet in Fig. 2c), possibly leading to different cut nodes for the same leaflet. In practice, however, these cut points always agreed. In case of disagreement, we suggest using the cut point with the smallest value.
Dissection of leaf components
The cut nodes are used to identify individual leaflets in the binarized image. To define the boundary of a leaflet, we choose b1, the point on the image boundary at minimum distance to a given cut node c. Next, we define b2 as the boundary point at minimum distance to c subject to the constraint \(d(b1,b2)>d(c,b2)\), i.e., b2 lies on the “opposite side” of b1 (see Fig. 2d). The line between b1 and b2 separates one leaflet from the rest of the leaf. This finally results in a separation of the leaf into petiole, rachis, petiolule and leaflets (see Fig. 2e). To verify the accuracy of our method, we manually measured the rachis and the petiolule of 5 leaves (Additional file 1: Fig. S2) and compared the results to MowJoe’s results. The mean petiolule length was 26.3 pixels (4.6 mm) as measured manually. The mean deviation between MowJoe and manual measurements of petiolule length was 2.9 pixels (0.5 mm), corresponding to a mean relative deviation of 11%. For the rachis, the mean length was 368.4 pixels (64.8 mm), and the mean deviation was 2.6 pixels (0.46 mm), corresponding to a mean relative deviation of 0.7%. Overall, manual and MowJoe measurements were in good agreement and did not show any systematic differences.
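The choice of b1 and b2 can be sketched as follows, assuming the leaf boundary is available as a list of points. The function name `leaflet_cut_line` is hypothetical, and tie-breaking between equidistant boundary points is left to the order of the input list.

```python
import math

def leaflet_cut_line(cut_node, boundary):
    """Return (b1, b2): the two boundary points whose connecting line cuts off a leaflet."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # b1: boundary point closest to the cut node c.
    b1 = min(boundary, key=lambda b: dist(cut_node, b))
    # b2: closest boundary point satisfying d(b1, b2) > d(c, b2),
    # which places it on the opposite side of the cut node from b1.
    candidates = [b for b in boundary if dist(b1, b) > dist(cut_node, b)]
    b2 = min(candidates, key=lambda b: dist(cut_node, b))
    return b1, b2
```

For a cut node lying between two roughly parallel stretches of outline, the constraint excludes all points on the same side as b1, so b2 is taken from the other side. If no boundary point satisfies the constraint, this sketch raises a `ValueError`.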
Morphometric shape analysis
Morphometric shape analysis allows a quantitative description of shape in addition to one-dimensional size measures. This multivariate analysis makes use of biologically homologous points, so-called landmarks, that determine shape. To calculate homologous landmarks for each leaf in the MowJoe software, we first located the top point of every leaf. To do so, we calculated the line from the leaf bottom node to the cut node of the terminal leaflet. The intersection of this line with the outline of the terminal leaflet determined the top point.
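One possible way to find such an intersection on a discrete outline is sketched below. The approach is an assumption, not the MowJoe implementation: among outline points lying close to the line (within a hypothetical `tolerance` in pixels), it picks the one farthest beyond the cut node. The function name `top_point` is also hypothetical.

```python
import math

def top_point(bottom, cut_node, outline, tolerance=0.75):
    """Return the outline point near the bottom-to-cut-node line that lies farthest beyond the cut node."""
    dx, dy = cut_node[0] - bottom[0], cut_node[1] - bottom[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("bottom node and cut node must differ")
    ux, uy = dx / length, dy / length  # unit direction of the line
    best, best_along = None, length
    for p in outline:
        px, py = p[0] - bottom[0], p[1] - bottom[1]
        along = px * ux + py * uy         # position along the line
        across = abs(px * uy - py * ux)   # perpendicular distance to the line
        if across <= tolerance and along > best_along:
            best, best_along = p, along
    return best  # None if no outline point qualifies
```

The line generally crosses the terminal leaflet outline near the cut node and again near the tip; taking the farthest crossing selects the tip.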
Comparison of combined leaf features
We generated a unified leaf representation for each accession by an affine mapping of all the leaf’s marker points into one coordinate system. This transformation is defined by mapping the leaf base point (lower rachis end) to the origin of the 2D plane and the top point of the leaf to (0,1) on the y-axis. The overlay of all leaf images of one accession creates the so-called “metaleaf”, which provides an overview of the morphological variety of that accession (Fig. 5a). On the metaleaf, the difference between the accessions becomes obvious: the Ox accession has larger distances between branching nodes and cut nodes but smaller distances between cut nodes and leaflet centers.
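Two point correspondences do not determine a general affine map, so the sketch below assumes a similarity transform (translation, rotation and uniform scaling), which is fully fixed by the base and top points. The function name `normalize_points` is hypothetical.

```python
def normalize_points(points, base, top):
    """Map points so that `base` lands on (0, 0) and `top` lands on (0, 1)."""
    b = complex(*base)
    t = complex(*top)
    if t == b:
        raise ValueError("base and top point must differ")
    # Multiplying by 1j / (t - b) rotates and scales the base-to-top vector onto 1j, i.e. (0, 1).
    factor = 1j / (t - b)
    mapped = [(complex(*p) - b) * factor for p in points]
    return [(z.real, z.imag) for z in mapped]
```

Applying the same function to the marker points of every leaf places all leaves in a common coordinate system, so their overlay is independent of leaf size, position and orientation in the scan.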
In order to define a low-dimensional Euclidean space that satisfactorily captures the morphological variety of entire leaves, we merged whole leaf features, like leaf area and leaf perimeter, with features derived from individual leaflets. We only included features of the first three leaflets, as the leaves have different numbers of leaflets and we wanted to keep the feature sets comparable. We then performed principal components analysis for further dimensionality reduction. The leaves of the two C. hirsuta accessions are clearly separated in this morphological leaf space, indicating that our features extract relevant information (Fig. 5b).
Several tools for leaf size and shape measurements have been developed [2, 4, 7, 33, 42]. However, dissected leaves are more complex: they consist of several distinct morphological units that need to be identified accurately. We developed an image processing algorithm that is able to extract these units from the entire leaf.
It first identifies the leaf component, extracts its skeleton and identifies potential cut nodes using the distance transformation, and in the end selects the optimal cut nodes by a loss function. We applied this algorithm to leaves of the plant C. hirsuta, a model organism for dissected leaf development that moved into focus due to its leaf shape and its close relationship to A. thaliana .
As the processing of a single leaf image takes only a few seconds, our method is suitable for high-throughput applications in which a large variety of measurements has to be taken for thousands of leaves. Based on the separation into the phenotypically interesting units, we calculate several morphological measurements. These include shape parameters of the leaflet, such as leaflet area, eccentricity and perimeter. Additionally, our algorithm analyses the local position of a leaflet in the whole leaf, e.g., the distance of the leaflet from the rachis (petiolule length) or the distance of the leaflet from the terminal leaflet (inter-rachis, terminal rachis length). Together, these measurements give a comprehensive representation of the morphological features of a dissected leaf that is useful for a wide spectrum of applications, e.g., for mapping the genetic basis of variation in individual or combined shape features, investigating leaf shape plasticity in response to environmental variables, or clustering mutant phenotypes.
To test the power of our algorithm we applied it to the two C. hirsuta accessions Nz from New Zealand and Ox originating from Oxford. We show that entire leaf features as well as features for individual leaflets are able to discriminate between the two accessions. Our open-source software is a versatile tool that enables QTL analysis of leaf morphological variation in large mapping populations.
Plant growth conditions
The two C. hirsuta accessions Ox and Nz were grown under long-day conditions in the greenhouse. The leaf of the fifth leaf node was harvested at an identical developmental stage, at flowering when the inflorescence was 15 cm in height. It was digitized with the Epson V700 Photo scanner at 600 dpi.
Image processing and feature detection
Image skeletonization and Gaussian mixture clustering were performed using the corresponding functions of the Matlab Image Processing and Statistics toolboxes. Shape features for the whole leaf and the leaflets were calculated using standard Matlab methods. To calculate the petiolule length, two points at the start and the end of the petiolule were determined. These points were defined as the intersections of the skeleton with the lines between opposite border points b1 and b2. The petiolule length was then determined by counting the number of skeleton pixels (graph nodes) between these points. The rachis length was calculated by counting the number of skeleton pixels from the bottom point to the intersection between the skeleton and the line connecting the border points of the terminal leaflet, as explained above. The inter-rachis distance was calculated by searching the graph for the nearest branching node with the same orientation in the direction of the terminal leaflet. Shape analysis was carried out with MorphoJ . Comparison of different accessions was performed using discriminant factor analysis in MorphoJ. Manual measurements were made using the software ImageJ .
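Counting skeleton pixels between two graph nodes can be sketched as a breadth-first search on the skeleton graph. The graph is assumed to be an adjacency map from node to neighbouring nodes, the function name `path_length` is hypothetical, and counting both end points is an assumption about the convention.

```python
from collections import deque

def path_length(graph, start, goal):
    """Count the graph nodes (skeleton pixels) on a shortest path from start to goal, inclusive."""
    distance = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return distance[node]
        for neighbour in graph[node]:
            if neighbour not in distance:
                # First visit in breadth-first order gives the shortest node count.
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None  # goal is not connected to start
```

A length in pixels obtained this way can be converted to millimetres with the scale of the rescaled scan.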
Performance and scalability
A set of 60 images was processed, consisting of 28 images of the Nz accession and 31 images of the Ox accession. The images had a resolution of 600 dpi, resulting in an image size of \(4200\times 1200\) pixels. Processing of an average image took about 14 s in total on a MacBook Pro (1.4 GHz Intel Core i5, 4 GB RAM). The segmentation of the whole leaf component of a single image by Gaussian mixture clustering took about 1.5 s. The identification of crossing points and cut points and the segmentation of single leaflets took about 2 s. Calculation of the features and generation of the output plots took the remaining time (about 10 s).
Software and availability
All analysis steps were implemented in Matlab and are combined in the software MowJoe. This software tool provides a rudimentary graphical user interface in which a folder of leaf images can be processed. The software tool, the Matlab source code, and the raw and processed data were published according to  and can be found at https://github.com/Henrik86/Mow_Joe (https://doi.org/10.5281/zenodo.1181810).
HF, NK and AT developed the MowJoe software and analysed the data. JL and MC performed the experiments. AT and MT initiated the research and designed the experiments. HF, JL, AT, MT wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
- 1. Arora A, Gupta A, Bagmar N, Mishra S, Bhattacharya A. A plant identification system using shape and morphological features on segmented leaflets: Team IITK, CLEF 2012. In: CLEF 2012 evaluation labs and workshop, Online Working Notes, Rome; 2012.
- 11. Cerutti G, Tougne L, Mille J, Vacavant A, Coquin D. A model-based approach for compound leaves understanding and identification. In: IEEE international conference on image processing; 2013.
- 15. Dryden IL. shapes package. R Foundation for Statistical Computing, Vienna, Austria, 2015. Contributed package, Version 1.1-11.
- 20. Givnish TJ. Ecological aspects of plant morphology: leaf form in relation to environment. Theor Plant Morphol. 1978;27:83–142.
- 22. Hay AS, Pieper B, Cooke E, Mandáková T, Cartolano M, Tattersall AD, Ioio RD, McGowan SJ, Barkoulas M, Galinha C, Rast MI, Hofhuis H, Then C, Plieske J, Ganal M, Mott R, Martinez-Garcia JF, Carine MA, Scotland RW, Gan X, Filatov DA, Lysak MA, Tsiantis M. Cardamine hirsuta: a versatile genetic system for comparative studies. Plant J. 2014;78(1):1–15.
- 23. Klein LL, Caito M, Chapnick C, Kitchen C, O’Hanlon R. Digital morphometrics of two North American grapevines (Vitis: Vitaceae) quantifies leaf variation between species, within species, and among individuals. Front Plant Sci. 2017;8:373.
- 26. Krieger JD. Controlling for curvature in the quantification of leaf form. In: Elewa AMT, editor. Morphometrics for nonmorphometricians, volume 124 of lecture notes in Earth Sciences. Berlin: Springer; 2010. p. 27–71.
- 28. Kumar N, Belhumeur PN, Biswas A, Jacobs DW, Kress WJ, Lopez IC, Soares JVB. Leafsnap: a computer vision system for automatic plant species identification. In: Fitzgibbon A, Lazebnik S, Perona P, Sato Y, Schmid C, editors. Computer vision – ECCV 2012. Lecture notes in computer science, vol. 7573. Berlin: Springer; 2012. p. 502–16.
- 32. MacLeod N. Generalizing and extending the eigenshape method of shape space visualization and analysis. Paleobiology. 1999;25(1):107–38.
- 33. Maloof JN, Nozue K, Mumbach MR, Palmer CM. LeafJ: an ImageJ plugin for semi-automated leaf shape measurement. J Vis Exp. 2013;71:e50028.
- 35. Mzoughi O, Yahiaoui I, Boujemaa N, Zagrouba E. Multiple leaflets-based identification approach for compound leaf species. In: Vrochidis S, Karatzas KD, Karppinen A, Joly A, editors. Proceedings of the 1st international workshop on environmental multimedia retrieval co-located with ACM international conference on multimedia retrieval, EMR@ICMR 2014, Glasgow, volume 1222 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. p. 53–60.
- 37. Otsu N. A threshold selection method from gray-level histograms. Automatica. 1975;11(285–296):23–7.
- 39. Saeed K, Tabedzki M, Rybnik M, Adamski M. K3m: a universal algorithm for image skeletonization and a review of thinning techniques. Int J Appl Math Comput Sci. 2010;20(2):317–35.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.