1 Introduction

Increasing food security now and for the future relies heavily on identifying and understanding beneficial phenotypes in crop plants. A phenotype is a visible feature of an organism. Some phenotypic variations are desirable improvements in agricultural crops, such as increased disease resistance, better yield in poorer soils or under drought, and improved nutritional content. All these phenotypes vary tremendously among members of a species, and all are targeted for improvement by many investigators. Current agricultural methods will be insufficient to keep pace with the projected growth in population and the need for improved nutrition [1–3]. A step change in the yield and quality of food crops, and in the sustainability of their production, is urgently needed. Nearly all of the world’s food is grown in farm fields. World-wide vegetable production in greenhouses in 2015 totaled 414,127 ha (1.02 million acres), while in 2014 in the United States alone, the area harvested for just corn and soybeans totaled 67.26 million hectares (166.2 million acres) [4, 5]. Research has shown that greenhouse experiments are poor predictors of field performance for grain yield and drought tolerance [6, 7]. So agronomically important phenotypes must be studied in field experiments: for food, it is the field that counts.

The foundation of crop improvement is to detect and characterize potentially beneficial phenotypes. This is done in experimental fields around the world, each of which is planted with hundreds or thousands of genetically different varieties of a crop. In this situation, the desired plant may be only one out of tens or hundreds of thousands in a field. Today, experienced observers, working alone or in small teams, scrutinize thousands of plants in a single season, albeit with inter- and intra-observer variations [8, 9]. Many agronomically important phenotypes are signaled by changes in the plant’s morphology (the size, shape, color, and spatial position of the plant and its organs) over time [10]. Monitoring these phenotypes requires either human examination or imaging of plants in situ. Automating the capture and analysis of plant images would spare human effort for more complex tasks and improve the quantitation of the phenotypes. But as we will see, field plants do not pose nicely for the camera: they crowd together, irregularly occlude each other, grow unevenly, shade parts of each other and the soil, lie on the ground, and hide their phenotypically informative organs. Substitutes for human expertise, even for relatively simple tasks, will require algorithmic approaches that can cope with such issues.

Because phenotyping involves the analysis of multiple individual plants, the machine vision challenges posed by crop fields used in genetic and agronomic research are quite different from those presented by fields in production agriculture or yield trials. In the latter, a field will be planted with a single variety that will exhibit much less phenotypic variation than in the genetically diverse populations of research fields. The difference between the two is illustrated in Fig. 1.

Fig. 1

Two different field situations. a A research field. Each row is planted with a genetically and phenotypically different variety of maize, as evidenced by differences in height and color. A row of shorter maize is marked with a red triangle; a row of maize with yellow–orange lesions is marked with a white triangle; and a row of taller maize is marked with a cyan triangle. b A production agriculture field planted with a single variety of maize. In this field, average leaf angle could probably be estimated from measuring the angle between a vertical axis and planes defined by each leaf (especially those in the upper half of the field), similar to the work of reference [11]

In a uniformly planted field, many parameters can be measured “disembodied and in bulk”: the average values of the greens for chlorophyll content; or the average position of the blue/green sky/plant boundary for plant height; or the average deviation of a leaf from the vertical for leaf angle [11–16]. Traditional assessments of crop health by aerial and satellite vehicles rely on existing techniques, which are now being applied to ground-based images in production situations. Active research, detailed in an excellent recent review and elsewhere in this issue, focuses on phenotyping in greenhouses, where the problems of occlusion and image standardization are much less acute [17]. In contrast, field phenotyping, still in its infancy, is a new frontier for machine vision [6, 18, 19].
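As a concrete illustration, the sketch below (in Python with OpenCV and NumPy) computes two such bulk measurements from a hypothetical side-view image of a uniform stand: the mean green intensity over plant pixels, and a canopy height proxy from the sky/plant boundary. The file name, the excess-green threshold, and the pixel-to-centimeter scale are all assumptions, not calibrated values.

```python
import cv2
import numpy as np

img = cv2.imread("field_side_view.jpg")            # hypothetical side view (BGR)
b, g, r = [c.astype(np.float32) for c in cv2.split(img)]

# Excess-green index (ExG = 2G - R - B), a standard vegetation index.
exg = 2 * g - r - b
plant_mask = exg > 20                              # crude, uncalibrated threshold

# Bulk greenness: mean green intensity over plant pixels (chlorophyll proxy).
print("mean green:", g[plant_mask].mean())

# Height proxy: the topmost plant pixel per column approximates the
# blue/green sky/plant boundary; average over columns that contain plants.
top = np.argmax(plant_mask, axis=0)                # first plant row per column
valid = plant_mask.any(axis=0)
height_px = img.shape[0] - top[valid]              # pixels above image bottom

CM_PER_PIXEL = 0.5                                 # assumed camera calibration
print("mean canopy height (cm):", height_px.mean() * CM_PER_PIXEL)
```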

Here, we offer our views on three challenges for research in computer vision that are common to many phenotyping tasks in the field. Our perspective is that of agronomists, geneticists, and computational biologists who photograph maize plants and phenotypes in the field for characterization, and who extract phenotypic information from the images [20, 21]. We selected these challenges based on three sources of information. The first source is the methods described in the literature of plant genetics and breeding (for example, reference [22]). Second, we have had many conversations with our plant science colleagues, who work with many different species but especially maize, about their phenotyping tasks and which ones they would most like to have automated. We have also been fortunate to watch our colleagues at work in the field. Finally, our own work in maize has involved many of the phenotyping tasks mentioned in this paper, and thinking about how one might automate these tasks led us to consider the machine vision challenges that would need to be surmounted. Nonetheless, the selection and abstraction of the challenges is ours alone, and our interlocutors are blameless.

The challenges are:

  • Disambiguation: Individuating plants from the mass of green is essential for many phenotyping tasks: one must know which plant has what phenotype. We define disambiguation as the algorithmic segmentation of one plant from its neighbors.

  • Assignment: As the plants grow, the appearance of a field changes from orderly rows of physically well-separated little plants to a jungle of leaves, stems, and reproductive organs. Which body parts are from the same plant? We define assignment as the algorithmic assembly of visually separated organs into the correct plant.

  • Identification: Scientists use thousands of archival images in the literature and databases to identify the phenotypes they see in the field and to determine when a phenotype may be novel. We define identification as the detection and classification of phenotypes by comparing a plant’s features to those of related plants and to archival images.

We illustrate the challenges with images of maize fields photographed from the ground. The images show what can be readily captured today, using either human photographers or minimal automation. The images shown here are downsampled from the high-resolution ones posted online. The images in the online material were shot with either a Nikon D80 with an AF MicroNikkor 60 mm lens (denoted DSLR in the figure legends), an iPad2, or unknown cameras. These images, and those posted online, are not the traditional data sets that one might use in algorithm development and testing. Collecting large data sets that combine experimentation with imaging techniques and technologies, and that include the collection and annotation of ground-truth information, requires adequately supported collaborations between biological and computational scientists. Realistically, data sets of the scale and complexity that machine vision researchers need are simply beyond the capacity of biological scientists to “squeeze in” during the very busy field season without compromising funded experiments. Instead, our goal is to illustrate the challenges in field phenotyping and provide enough images to let those working in computer vision see if the problems are interesting enough to pursue in collaboration with biological scientists. We believe that the best work in field phenotyping will require sustained, long-term, and mutually beneficial collaborations, and we wish to encourage those. The good news is that many biologists are seeking help with image processing and phenotyping tasks.

Fig. 2

Disambiguation of a typical maize plant at flowering. In (a), the plant in its field context, with rows behind it and weeds around it. In (b), the plant has been partially isolated using a cloth background, an unusual and laborious photographic technique. In (c), the plant has been manually isolated in the image by selecting it and setting the remaining context pixels to off-white. The male reproductive organ, the tassel, is indicated by the red triangle, and the female reproductive organ, the ear, by the blue triangle. Photographed with a DSLR

We first describe maize to provide some context for the challenges. A typical plant is shown in Fig. 2. Z. mays is a major crop world-wide; many phenotypes are visible to the eye; and the plants are large enough that intra-plant spatial differences are easily detected [22]. Over a hundred years of intensive study of this important cereal crop, mostly in farm fields, has identified many genetically different varieties of maize. Their phenotypes vary widely in size; shape; color; number, placement, and types of organs; the rates at which the plant grows, develops, and dies; the yield of kernels and other useful parts; and their responses to different environments [23] (In the United States, maize is colloquially called “corn”, a term used elsewhere in the world for any cereal grain). Starting algorithm development with maize is particularly advantageous because the plants are larger, less densely planted, and more distinct from many weeds than rice or wheat.

2 Three challenges for computer vision

2.1 Disambiguation of plants

Disambiguation segments plants, or key plant organs, from the mass of green in the field. Two common field tasks are to count the number of crop plants in each row and to detect unusual stem shapes. Consider the images in Fig. 3.

Fig. 3

Rows of maize shot from two different vantage points. a Looking diagonally across several rows of maize from a fortuitously empty spot in the field. b Looking along a row. c The same image as in (a), now with three very closely spaced plants in the foreground row marked with a red triangle, and an example of a zigzagged stem marked with a cyan line. This particular field was planted by machine; irregular spacing of plants will routinely occur with either manual or machine planting. d The same image as in (b), but now with red triangles marking the first four stems. In all panels, the emerging ears are covered with white or striped shoot bags; the brown paper bags at the top cover tassels; and smaller weeds are visible. Photographed with a DSLR

The two different vantage points—diagonally across multiple rows and along a row—balance different types of plant occlusion against confusion by background plants. Shooting multiple rows reduces occlusion within a row, but does not eliminate it: the foreground row contains three very closely spaced plants (red triangle). Segmenting stems by selecting for contiguous, vertical dark green areas would need to adapt to zigzagging of the stem (cyan line) and the occasional very bent stem (see images 1/series/DSC_0449–451.NEF, posted online, for an example). Similarly, the rows in the image are parallel to each other, so that the angle of the line defined by the intersection of the rows’ stems and the soil is relatively constant, once the background weeds are eliminated. Determining in which row a plant lies might require some estimate of depth of field in different parts of the image, allowing for the varying sizes of the plants. In contrast, looking along a row increases occlusion, including from plant organs near the camera, but simplifies determining the row.
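To make the difficulty concrete, here is a minimal sketch of the naive approach, assuming a hypothetical input image and hand-tuned HSV thresholds: it keeps only contiguous, vertical dark green structures via a morphological opening with a tall, thin structuring element. A strictly vertical element is exactly what the zigzagged and bent stems described above would defeat, so a practical method would need something more flexible, such as curved path tracking or a learned stem detector.

```python
import cv2
import numpy as np

img = cv2.imread("row_diagonal.jpg")               # hypothetical image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Dark green: a green hue band with low value, since stems read darker
# than sunlit leaves. These thresholds are assumptions and need tuning.
dark_green = cv2.inRange(hsv, (35, 60, 30), (85, 255, 120))

# Keep only contiguous vertical structures: morphological opening with a
# tall, thin (3 px wide, 51 px high) structuring element.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 51))
stems = cv2.morphologyEx(dark_green, cv2.MORPH_OPEN, kernel)

# Count surviving components above a minimum area as candidate stems.
n, labels, stats, _ = cv2.connectedComponentsWithStats(stems)
candidates = [i for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] > 500]
print(len(candidates), "candidate stems")
```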

A more labor-intensive alternative is to shoot stills or video along a row from several different vantage points, and then reconstruct the row after registering the plants. Figure 4 shows one such series of images for the same row. In the images in Fig. 4, registration is simplified by the presence of shoot and tassel bags (small white and large brown, respectively). This would not be true much of the time. As the camera proceeds down the row, portions of the rows behind the row of interest appear.
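A plausible starting point for such registration is sparse feature matching, sketched below with OpenCV’s ORB features and a RANSAC homography; the file names are placeholders. Note that a homography assumes an approximately planar scene, which a 3D canopy violates, and that without the fortuitous shoot and tassel bags, repetitive leaf texture would likely yield many false matches.

```python
import cv2
import numpy as np

img1 = cv2.imread("row_view_01.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical
img2 = cv2.imread("row_view_02.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-check for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC homography; for a 3D canopy this planar model is only a rough
# approximation, which is part of why registration is hard here.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("inlier fraction:", inliers.mean())
```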

Fig. 4

A row shot from different vantage points along the row. Panel a, a view of the entire row for orientation. Panels b–d show a series of close-ups shot at different vantage points within the row, starting at the right end of the row. Photographed with a DSLR

For phenotypes that can be determined in a uniform stand of plants, disambiguation can be bypassed by looking at populations of organs. An example is shown in Fig. 1, panel (b). 3D reconstructions of the outer edges of soybean stands have detected changes in leaf angle without assigning leaves to plants [11]. Current approaches that detect flowering tassels rely on evaluation of the hyperspectral reflectance of the canopy [24, 25]. Nonetheless, there will be many situations where disambiguation is important; constructing crude 3D models of plants might help assign unoccluded organs. Isolated maize and rose plants have been reconstructed using photogrammetry and the Microsoft Kinect, respectively [26, 27]. The photogrammetry was obtained from consumer DSLR cameras, and the Kinect can be run on battery power (Guilherme DeSouza, personal communication).

Fig. 5

Panel a, a row containing four plants displaying a spotted leaf phenotype, marked by red triangles. Panel b illustrates one way humans assign organs to plants: the cyan arcs trace some paths between stem and leaves for a plant of interest. Shot with an iPad2

2.2 Assignment of organs to plants

It is usually not enough to know a field contains a phenotypically different plant: biologists need to know which particular plant has the phenotype of interest. If one considers a plant as a collection of organs, then the challenge of assignment is to form the correct set of organs from an individual plant, for any number of plants. Figure 5 shows a row containing plants with an unusual leaf phenotype. In the image, the plants with spotted leaves are marked by red triangles. How many plants have spotted leaves? Which plants are those? The answers depend on associating the leaves with other plant parts, most likely stems. Two tricks humans use might be helpful in algorithm development. The first is to look at the junctions the leaves make with a stem, starting from a leaf and following its path to the stem (or other organ). This is illustrated in panel (b) of Fig. 5 for the first plant in the row. The second is to wiggle a stem and watch for coupled motions of its organs. In the material posted online, we include an iPad2 video of several rows that illustrates several distinct motion components (IMG_4655.MOV). Exploiting video would require a good understanding of the relationships between different motions and the plant parts that display them [28].
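The motion cue could be explored computationally with dense optical flow: correlate each pixel’s motion over time with the motion of a seed patch on the wiggled stem, and keep the pixels that move in concert. The sketch below uses OpenCV’s Farnebäck flow on the posted video; the seed patch coordinates and the correlation threshold are hypothetical and would have to be chosen per scene.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("IMG_4655.MOV")             # the video posted online
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Hypothetical seed patch on the wiggled stem (rows 400:440, cols 300:320).
seed = (slice(400, 440), slice(300, 320))

seed_series, flows = [], []
for _ in range(60):                                # roughly two seconds
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flows.append(flow[..., 0])                     # horizontal component only
    seed_series.append(flow[seed + (0,)].mean())
    prev_gray = gray

# Correlate every pixel's motion time series with the stem patch's series;
# pixels that move in concert with the stem likely belong to the same plant.
stack = np.stack(flows)                            # shape (T, H, W)
s = np.array(seed_series)
s = (s - s.mean()) / (s.std() + 1e-6)
z = (stack - stack.mean(0)) / (stack.std(0) + 1e-6)
corr = (z * s[:, None, None]).mean(0)              # per-pixel correlation
coupled = corr > 0.6                               # assumed threshold
print("pixels coupled to the stem:", int(coupled.sum()))
```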

Fig. 6

Two weedy situations. In panel a, a dense growth of short weeds fills the space between two rows. In panel b, a sparse growth of weeds is mixed with two semidwarf maize plants (marked by red triangles) and normal plants. Photographed with a DSLR

2.3 Phenotype identification

Biologists depend heavily on archival images from the literature to learn to identify different species and phenotypes. Much scientific value lies in detecting novel phenotypes. Comparing archival images (or one’s memory of those) to the plants in front of one is the key visual step in recognizing phenotypes and determining their novelty.

Distinguishing weeds from crop plants is very important in production agriculture. Compared to the problem of identifying phenotypes from archival images, this simpler goal has already received considerable attention [12–16]. Figure 6 illustrates two situations in which such algorithms might be applied. Panel (a) shows several clear differences between weeds and crop, including height, position, color, and plant structure. Panel (b) seems more problematic for current algorithms: the weeds are more sparsely and irregularly spaced, and the maize includes plants of normal and much shorter heights.
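For scenes like panel (a), even a crude rule based on component size may go a long way; panel (b) shows where such a rule breaks. The sketch below segments green vegetation and labels connected components by height and area; the HSV thresholds and the decision cutoffs are assumptions for illustration only.

```python
import cv2
import numpy as np

img = cv2.imread("weedy_row.jpg")                  # hypothetical image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
green = cv2.inRange(hsv, (30, 40, 40), (90, 255, 255))
green = cv2.morphologyEx(green, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

# Connected components of vegetation, labeled by a crude size rule.
n, labels, stats, _ = cv2.connectedComponentsWithStats(green)
for i in range(1, n):
    h = stats[i, cv2.CC_STAT_HEIGHT]
    area = stats[i, cv2.CC_STAT_AREA]
    # Assumed rule: tall, large components are maize; short, small ones are
    # weeds. Semidwarf maize, as in Fig. 6b, would defeat this rule.
    print(i, "maize" if h > 300 and area > 5000 else "weed?")
```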

The more challenging version of this problem is to exploit the information in archival images to identify phenotypes. Figure 7 shows two very different leaf spot phenotypes. Similar phenotypes can be found in the online resources of MaizeGDB, and we have included a link to a zip file of mutant images collected by Gerald Neuffer in the Appendix’s Table 1.

Fig. 7

Two different leaf phenotypes. Top row: panel a, a clear spot-like phenotype; panel b, a disease lesion mimic phenotype. Bottom row: archival images of the same phenotypes from MaizeGDB [29]. Panel c, a csp1-NA1173 mutant showing the same clear spots; panel d, an lls1 mutant showing very similar lesions. The top row was shot with an iPad2; the camera(s) for the bottom row are unknown

Each biologist’s image was taken to illustrate a particular phenotype, without considering computational processing. Neither the images of Fig. 7 nor those in MaizeGDB or the literature are standardized in composition, photographic technique, or annotation. Some images isolate individual plants or organs; others include several plants in the same frame for comparison; still others show rows. Finding common and distinguishing features among large sets of images, with each phenotype represented by a relatively small number of images, will be quite challenging. Nonetheless, many fundamental elements are repeated among the images, increasing the sample size despite compositional diversity. Learning to recognize organs such as leaves, stems, tassels, and ears would open the door to identifying many phenotypes, and with refinement might be extended to smaller scale phenotypes.
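One plausible way to cope with small per-phenotype sample sizes is transfer learning: reuse features from a network pretrained on a large generic image collection and train only a small classifier head on labeled organ crops. The sketch below uses PyTorch and torchvision; the directory layout and class set are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Small labeled set of organ crops, e.g. organ_crops/{leaf,stem,tassel,ear}/*.jpg
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("organ_crops", tfm)    # hypothetical directory
loader = torch.utils.data.DataLoader(data, batch_size=16, shuffle=True)

# Reuse ImageNet features; train only the final layer, which suits the
# small per-phenotype sample sizes typical of archival collections.
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in net.parameters():
    p.requires_grad = False
net.fc = nn.Linear(net.fc.in_features, len(data.classes))

opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
```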

3 Constraints on image collection and processing

The images shown here and in the online material are sobering from the standpoint of computer vision. The challenges described above all need algorithms that are robust to the images one can actually take today and that yield biologically useful data. Because image collection is not the primary task of the biologist, it must be simple, easy, and fast, so the most common camera used in the field is a point-and-shoot (often a smartphone). Very few image sets are consistent in composition, internal standards, photographic parameters, or lighting, and rarely do images include a calibration standard. Sustained collaborations between biologists and computer vision scientists might change this situation.

The coming era of robotic collection of images from the air and the ground will surely increase the number of images, but these images may pose similar challenges to computer vision. Already, there is considerable experimentation with large, tractor-based platforms that carry a set of sensors; traditional remote sensing; and aerial vehicles [6, 15, 30, 31]. Robots offer a wider range of imaging frequencies and techniques, opening new algorithmic possibilities [15, 32].

3.1 Not every photographic issue can be ameliorated

Many things that would simplify a computational problem change or eliminate the phenotypes of biological interest. For example, increasing the space among plants to simplify disambiguation decreases plant height: crowded plants must grow taller to capture enough sunlight [33]. Good places for diagonal shots across rows, such as in Fig. 3, are rare in most fields: plants must be held out of the line of sight (see images 333–360 in the online material for some standard corn photography trickery) or simply mown down.

Other photographic difficulties are manageable at the cost of development effort, personnel, time, machinery, or all of these. For example, irregular shading could complicate segmentation and assignment. Figure 8 shows several examples of images that are easy to collect, but could be algorithmically challenging.
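One partial mitigation, sketched below for a hypothetical image, is to threshold in a color space that separates chromaticity from brightness, such as HSV, so that sun/shade boundaries on a leaf perturb the mask less than they would in RGB. This does not solve the problem of shadows cast on soil or of color mottling, as Fig. 8 shows.

```python
import cv2

img = cv2.imread("shaded_leaf.jpg")                # hypothetical image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Constrain hue and saturation but leave value (brightness) unconstrained,
# so a sun/shade boundary on the same leaf changes the mask less than an
# RGB threshold would. Cast shadows on soil remain a separate problem.
leaf_mask = cv2.inRange(hsv, (30, 40, 0), (90, 255, 255))
```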

Fig. 8

Examples of irregular shading. Panel a: the light green patches on the leaf vanish as the leaf is illuminated by the sun on the right. Color mottling due to uneven illumination is visible on the leaf below. The identifying tag has been whited out to preserve the investigator’s privacy. Panel b: the leaf is evenly illuminated, but the soil is not; so the white–brown leaf boundary one might use to mask the leaf tends to vanish in the sunny patch. Panel c: shadows cast on leaves, stems, soil, and bag by the plants could complicate segmentation. Photographed with an iPad2

3.2 The phenotype of interest strongly influences data collection procedures

In phenotype identification, the nature of the target phenotype will strongly influence image composition and the scale and rate of image collection. Subtle changes at small spatial scales suggest close-ups, which simplify masking out extraneous parts of the image. Whole-plant phenotypes, such as height, could be imaged in wide fields of view capturing multiple rows, but then the issues of disambiguation and assignment recur. Tracking changes in a phenotype over time means imaging the same plants’ organs repeatedly, and achieving photographic consistency is much more laborious.

A complementary approach to field images is to remove plant organs or products and image these in the laboratory, either by photography or scanning. Such destructive sampling speeds image collection and facilitates more consistent image composition and lighting, permitting simpler algorithmic approaches. This approach has been used to size and count kernels, measure ear dimensions, and identify leaf lesions [21, 34–36] (Nathan Miller, personal communication). Specialized laboratory equipment, such as microscopes and systems to image roots grown in transparent media, is used to generate image series for morphometric measurements and 3D reconstruction [20, 37].

For any phenotype, imaging demands good engineering of the data collection regime: whether the images are stills, time-lapse, or video, and whether they are collected manually or robotically. Sample sizes must be adequate to ensure reasonable levels of statistical confidence in the results, so image collection procedures need to be fast enough to be feasible. Since the rate of phenotype development can vary widely, pilot experiments may be needed to determine a reasonable sampling protocol.
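Standard power calculations can anchor such planning. For example, the following sketch (using statsmodels) asks how many plants per group are needed to detect a difference of one standard deviation in a quantitative phenotype; the effect size, significance level, and power are illustrative choices, not recommendations.

```python
from statsmodels.stats.power import TTestIndPower

# Plants per group needed to detect a one-standard-deviation difference
# (effect size d = 1.0) in a quantitative phenotype, two-sided t-test.
n = TTestIndPower().solve_power(effect_size=1.0, alpha=0.05, power=0.8)
print(round(n), "plants per group")                # about 17
```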

3.3 How much biological knowledge is really needed?

Lurking behind these challenges is the question of how much biological knowledge is needed to tune collection schemes and identify regions of interest and phenotypes. In some cases, the knowledge needed is fairly minimal. For example, approximate models of the plant’s anatomy, perhaps one for each organ or plant feature of interest, could be used to produce best fits in assignments. Robotically “feeling” imaged plants along their stems would help in tuning such models and fitting them to images. Another example is imaging along a row, where the ambiguities that must be resolved to produce good registration change from image to image. Knowing how consistent the biological structures are could help with selecting regions to align. In other cases, more knowledge of both the target phenotype and the appearance of normal plants is needed: detecting broken stems, insect bites, or cankers requires some sort of model of expected plant morphology. Shredded leaves, such as those produced by the Shr*-N2477, Shr*-N2483, and shr1-JH87 mutant alleles, exemplify how a phenotype that is difficult to image directly might yield to a clever proxy based on biological knowledge. In plants with these mutations, the leaf decomposes into long thin strips, joined at their ends, that occupy a large volume [38–40]. Measuring the reduced green area, or the smaller-amplitude, longer-period motions, in the volumes the leaves are expected to occupy might be good proxies for detecting the shredded phenotype.
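The green-area half of that proxy is simple to prototype. The sketch below computes the fraction of green pixels inside a region where a normal plant’s leaves would be expected; the region coordinates, the baseline green fraction for normal plants, and the 50% cutoff are all assumptions.

```python
import cv2

def green_fraction(img_bgr, region):
    """Fraction of green pixels inside an expected leaf-occupancy region."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (30, 40, 40), (90, 255, 255))
    y0, y1, x0, x1 = region
    return (mask[y0:y1, x0:x1] > 0).mean()

img = cv2.imread("candidate_plant.jpg")            # hypothetical image
frac = green_fraction(img, (100, 400, 200, 500))   # assumed leaf region, px
NORMAL_BASELINE = 0.55                             # assumed, from normal plants
if frac < 0.5 * NORMAL_BASELINE:
    print("possible shredded-leaf phenotype:", frac)
```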

Changes in multiple dimensions may signal a phenotype of interest or be the result of normal plant development. Which combinations of dimensions are most informative varies with the phenotype, and may not be fully generalizable. Since even genetically identical plants do not look or behave exactly the same, recognizing the significant variations requires that the biologist look at many plants and remember their appearance, in the context of the known biological relationships. Such knowledge of inconsequential variation could be useful in thresholding changes.
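In quantitative terms, measurements from genetically identical control plants can define the envelope of inconsequential variation, and a plant is flagged only when it falls well outside that envelope. A minimal sketch, with invented control values and an arbitrary three-standard-deviation cutoff:

```python
import numpy as np

# Trait measurements (e.g., green-area fraction) from genetically identical
# control plants define the envelope of inconsequential variation.
controls = np.array([0.52, 0.57, 0.55, 0.60, 0.54, 0.58, 0.53, 0.56])
mu, sigma = controls.mean(), controls.std(ddof=1)

def is_notable(value, k=3.0):
    """Flag a plant whose trait lies more than k control SDs from the mean."""
    return abs(value - mu) > k * sigma

print(is_notable(0.31))   # True: far outside normal variation
```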

3.4 Use by biologists

Diffusion of algorithms that solve these challenges into the biological community will hinge on how easily they can be incorporated into field workflows. High-throughput phenotyping depends just as much on organizing the entire workflow and maintaining the provenance of physical objects and data as it does on computer vision [6, 30, 41]. Currently, one difficulty is that workflow management for field experiments is in its infancy. Field phenotyping magnifies the organizational challenges compared to greenhouse-based systems, which usually include pre-packaged workflow and data management systems [42]. There have been several generations of both workflow and interoperability systems, but to the best of our knowledge their application so far has been limited to molecular data collected in the laboratory [43–48].

In the face of such moving targets, a brief description of the non-image information biologists collect may provide some perspective. Data collection, transfer, storage, pre-processing, phenotype extraction, and generation of quantitative data are essential steps in the phenotyping workflow. Data include locational information on the fields, rows, and plants (both GPS and relative positions); reference points for measurements; weather and other environmental data; field sensors; genotypic and physiological data from the laboratory; and detailed protocols for collecting each type of data. The ability to easily cross-reference data, images, and descriptions from other projects and servers around the world will be increasingly important. The present state of the art is mostly clicking, with model organism databases supplying some cross-referencing as their resources permit (Mary Schaeffer, personal communication). All of these require planning on the front end to determine the structure of the data collected and the desired connections to be made; to preserve provenance information throughout the workflow; to maximize the scalability of the databases and computation servers; to define the quantified phenotypes; and to ensure all participants are trained. Shared cyberinfrastructure, such as the iPlant project, will prove crucial in support and in training investigators [49].
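Whatever workflow system eventually wins out, each image needs a provenance record attached at capture time. A hypothetical minimal record might look like the following; a real schema would add environmental sensor streams, genotype links, and cross-references to other databases.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FieldImageRecord:
    """Hypothetical minimal provenance record for one field image."""
    image_path: str
    field_id: str
    row: int                      # row number within the field
    plant: int                    # plant position within the row
    gps: tuple                    # (latitude, longitude)
    captured_at: datetime
    camera: str
    protocol_id: str              # link to the written collection protocol

# Example record; all values are invented for illustration.
rec = FieldImageRecord("images/DSC_0449.NEF", "field-2015-07", 12, 3,
                       (38.9404, -92.3277), datetime(2015, 7, 21, 9, 30),
                       "Nikon D80", "stem-survey-v1")
```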

4 Online image sets

In cooperation with several maize geneticists, MaizeGDB, and iPlant, we have made several sets of images available. Table 1 of the Appendix lists the URLs, photographic subject, one or more computational challenges one might explore with these images, and the images’ contributors. Navigating to the root URL will show either a directory of image files (nearly all) or the zipped file (the Neuffer phenotype images at MaizeGDB).

While the images do not include benchmarks, they should provide a preliminary venue for experimentation when considered with this paper. We have included a variety of ground-based images, and MaizeGDB’s images are annotated with phenotypic descriptions that identify the target phenotype in the image. Browsing the image sets lets one rapidly explore potential problems and approaches. In many cases, we have included multiple images of the same subject in case the slight motions of the subject offer some algorithmic possibilities.

5 Prospects

We do not minimize the difficulties of these challenges. Solving them, directly or through better ideas, will require the collaboration of a wide variety of specialists and interdisciplinary workers. The rewards for even modest improvements in our ability to characterize phenotypes in the field, at higher speed and with better discrimination, are both very great and very timely. Crop improvement is necessary for increasing food security, though many socioeconomic factors must also change to meet the expanding needs of the world’s people [1]. High-throughput phenotyping in the field is pivotal to crop improvement. Come help.