1 Introduction

The word “biodiversity” is a synonym of “biological diversity”. The Convention on Biological Diversity (CBD) defines biodiversity as: “the variability among living organisms from all sources including, inter alia, terrestrial, marine and other aquatic ecosystems and the ecological complexes of which they are a part; this includes diversity within species, between species, and of ecosystems.”Footnote 1 Therefore, there are three levels of biodiversity: intra-specific (genetic), inter-specific, and ecosystemic. Even though a full understanding of all three levels is indispensable to guide biodiversity conservation efforts, this paper focuses on inter-specific biodiversity and some associated taxonomic challenges.

The CBD Strategic Plan 2011–2020 has explicitly stated twenty ambitious targets known as the Aichi TargetsFootnote 2. Aichi Target 19 specifically proposes that “knowledge, the science base and technologies relating to biodiversity, its values, functioning, status and trends, and the consequences of its loss, are improved, widely shared and transferred, and applied”; but in fact biodiversity informatics will be fundamental to the achievement of all of the Aichi Targets.

It is estimated that about 10 million species of macro-organisms inhabit the earth. This vast inter-specific biodiversity and the need to better understand it led to the development of classification schemes called biological taxonomies. Classification efforts date back at least to Aristotle, who devised a classification method at a time when only approximately 500 species of animals had been identified. In the XVIII century, Carl Linnaeus, the “father of modern taxonomy”, formalized a system of naming organisms called binomial nomenclature, which is used to this day.

Unfortunately, in addition to the sheer magnitude of the earth’s biodiversity, current identification and classification workflows are slow and error-prone. Furthermore, classification expertise is concentrated in a small and decreasing number of expert taxonomists. This has been identified as a serious problem and is known as the “global taxonomic impediment”Footnote 3. Automated identification of organisms has therefore become not just a centuries-old dream among systematists [1] but a necessity if we are to better understand, use, and save biodiversity.

Even though the number of plant species (about 400,000) is considerably smaller than the number of animal species, taxonomic work on them is still a monumental task. Moreover, plant species identification is particularly important for biodiversity conservation. It is critical for conducting studies of the biodiversity richness of a region, monitoring populations of endangered plants and animals, assessing the impact of climate change on forest coverage, promoting bioliteracy, implementing payment for environmental services, and controlling weeds, among many other major challenges.

The rest of this paper is organized as follows: Sect. 2 summarizes progress made to automate the identification of taxa in systematics. It starts with a description of the traditional dichotomous keys approach, then presents interactive keys and morphometric approaches, briefly describes DNA barcoding, and concludes with recent approaches based on machine learning and computer vision techniques. Section 3 summarizes the state of the art of leaf-based plant species identification using computer vision. Finally, Sect. 4 concludes with current challenges and opportunities.

2 Automated Taxon Identification in Systematics

Traditionally, systematists have not relied on quantitative data alone to identify taxa. They prefer the visual inspection of morphology, the (mostly) qualitative assessment of characters, and the comparison of these to reference specimens and/or images. While this process works, it is not quick, efficient, or reliable [1]. The following subsections describe attempts to define identification algorithms that can either be followed manually (e.g., dichotomous keys and morphometrics) or be translated into software that partially or fully automates taxon identification. In some cases, the resulting software guides a human user (e.g., interactive keys) who actually makes the decisions. In other cases, it fully automates taxon identification by extracting additional data from specimens (e.g., molecular and chemical data) or multimedia information such as digital images and sound.

2.1 Single-Access Keys

In biology, an identification key is a document or software that takes the form of a decision tree offering a fixed sequence of identification steps. If each step has only two alternatives, the key is said to be dichotomous; otherwise it is polytomous. These keys are possibly the oldest attempt at designing algorithms for organismal identification, long before computers were available. They aim to reduce the rate of errors, make the rules to be followed explicit and objective, and select optimal or semi-optimal sequences of questions.
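
As a hedged illustration of this idea (not any particular published key), a dichotomous key can be encoded as a small binary decision tree; the questions and taxon names below are placeholders.

```python
# Minimal sketch of a dichotomous key as a binary decision tree.
# Questions and taxa are hypothetical placeholders, not a real key.

class Node:
    def __init__(self, question=None, yes=None, no=None, taxon=None):
        self.question = question  # text of the identification step
        self.yes, self.no = yes, no
        self.taxon = taxon        # set only on leaf nodes

key = Node(
    question="Leaf margin serrated?",
    yes=Node(question="Leaf venation palmate?",
             yes=Node(taxon="Taxon A"), no=Node(taxon="Taxon B")),
    no=Node(taxon="Taxon C"),
)

def identify(node, answer_fn):
    """Walk the key, asking one yes/no question per step."""
    while node.taxon is None:
        node = node.yes if answer_fn(node.question) else node.no
    return node.taxon

# Example: answer every question with 'yes'.
print(identify(key, lambda q: True))   # -> "Taxon A"
```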

This approach has several drawbacks even when those algorithms have been programmed. Among them are the difficulty of accommodating newly described species and the assumption that the user has all the information needed to proceed from the top question (the single point of access) to the following levels. The latter means that when only partial information about the organism is available (e.g., only leaves or flowers of a plant), the user might not be able to get past the very first question.

2.2 Multiple-Access Keys

These are keys with multiple starting points that allow users to follow different paths, for instance because only partial morphological information is available. In their computerized version, they are also called interactive keys. They start with a full domain of candidates (e.g., all plants of a country) and gradually discard candidates as the user answers questions in an arbitrary order. The final result can be a unitary set of candidates (full identification achieved), an empty set (a new species or an incomplete key), or a set with cardinality greater than 1 (some questions remain to be answered).
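
A minimal sketch of the filtering logic behind an interactive key follows; the character matrix, character names, and taxa are hypothetical placeholders.

```python
# Minimal sketch of a multiple-access (interactive) key: a character matrix
# is filtered by answers given in any order. Characters and taxa are
# hypothetical placeholders.

characters = {
    "Taxon A": {"leaf_margin": "serrated", "flower_color": "white"},
    "Taxon B": {"leaf_margin": "entire",   "flower_color": "white"},
    "Taxon C": {"leaf_margin": "entire",   "flower_color": "yellow"},
}

def filter_candidates(candidates, character, state):
    """Discard every candidate whose recorded state differs from the answer."""
    return {t: c for t, c in candidates.items() if c.get(character) == state}

remaining = dict(characters)
remaining = filter_candidates(remaining, "flower_color", "white")   # partial info
remaining = filter_candidates(remaining, "leaf_margin", "entire")
print(list(remaining))   # -> ['Taxon B']: full identification achieved
```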

2.3 Morphometric Approaches

Morphometrics is the study of shape variation and its co-variation with other variables [2]. Three general approaches are usually distinguished: traditional morphometrics, landmark-based (geometric) morphometrics, and outline-based morphometrics. Traditional morphometrics is the application of multivariate statistical analysis to sets of quantitative variables such as length, width, and height. Landmark-based (geometric) morphometrics emphasizes methods that capture the geometry of the morphological structures of interest and preserve this information throughout the analyses. Outline-based morphometrics focuses on shape variation along the contour of an object. These three approaches are not necessarily mutually exclusive. An excellent survey on this subject is provided in [3].
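
As a minimal sketch of traditional morphometrics under these definitions, a (here synthetic) matrix of quantitative measurements can be standardized and analyzed with PCA, a typical multivariate technique; the data and component count are illustrative assumptions.

```python
# Sketch of traditional morphometrics: multivariate analysis of simple
# measurements (length, width, ...). The measurement matrix is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Rows = specimens, columns = quantitative variables (e.g., length, width, area).
measurements = rng.normal(size=(50, 4))

X = StandardScaler().fit_transform(measurements)   # put variables on one scale
scores = PCA(n_components=2).fit_transform(X)      # project onto main axes of variation
print(scores.shape)                                # (50, 2)
```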

2.4 DNA Barcoding

DNA barcoding is a taxonomic method that uses a short genetic marker in an organism’s DNA to identify it as belonging to a particular species [4]. The gene region used as the standard barcode for almost all animal groups is a 648 base-pair region in the mitochondrial cytochrome c oxidase 1 gene (“CO1”). For plants, two gene regions in the chloroplast genome, matK and rbcL, have been approved as the standard barcode regions. DNA barcoding has met with strong reactions from scientists, especially systematists, who either express enthusiastic support or vehement opposition [5, 6]. The current trend appears to be that DNA barcoding should be used alongside traditional taxonomic tools and other forms of molecular systematics so that problem cases can be identified and errors detected.
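
The matching idea can be illustrated with a deliberately naive sketch: real barcoding workflows rely on proper sequence alignment against curated reference libraries, whereas the fragment below only computes percent identity over aligned positions, and the sequences and species names are made up.

```python
# Naive sketch of barcode matching: compare a query sequence to a reference
# library by simple percent identity. Real workflows use alignment tools;
# sequences and species below are made up.

reference_barcodes = {
    "Species A": "ATGCGTACGTTAGC",
    "Species B": "ATGCGTTCGTTAGA",
}

def percent_identity(a, b):
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a[:n], b[:n])) / n

def best_match(query, library):
    return max(library.items(), key=lambda kv: percent_identity(query, kv[1]))

query = "ATGCGTACGTTAGA"
species, seq = best_match(query, reference_barcodes)
print(species, round(percent_identity(query, seq), 2))
```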

2.5 Crowd Sourcing (Collective Intelligence)

Crowd sourcing is neither a quantitative nor an automated approach to species identification. However, it is included in this survey because it uses computer technology to gather georeferenced multimedia information (e.g., images) and a community of citizen scientists and biologists who jointly tackle the challenge of identifying an organism based on an image, collective knowledge, and interactive keys or other computer-based tools. Besides, it is a low-cost, high-impact approach to empower and engage the general public in cybertaxonomy and biodiversity conservation. iNaturalistFootnote 4 and Pl@ntNETFootnote 5 [7] are two excellent examples of this approach. On the negative side, high levels of quality control are imperative because the community involved does not necessarily comprise domain experts.

2.6 Computer Vision and Machine Learning

In spite of enormous progress in the application of computer vision algorithms in other areas such as medical imaging, OCR, and biometrics [8], only recently have they been applied to identifying taxa. Images of plant leaves and insect wings have been particularly attractive because these structures are flat and their morphology is used in most identification keys. Thus, in the last decade, research in computer vision has produced algorithms that help botanists and non-experts classify plants based on images of their leaves [9–13]. However, only a few studies have resulted in efficient systems that are used by the general public, such as LeafSnap [14].

Computer vision and machine learning are two highly related artificial intelligence fields. In a supervised learning scenario, the general approach for organismal identification using computer vision comprises two general steps. First, digital images of identified species are fed to an algorithm that cleans them, segments them, and extracts relevant features. As a result, source images are typically transformed from the bitmap domain to a more tractable domain (e.g., histograms) and stored in a training dataset D. The second step consists of using the training dataset D to train an algorithm A. Unsupervised learning (e.g., cluster analysis) can also be used when a dataset of images is available but the associated species have not been identified.
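
As a hedged sketch of these two steps (not any specific published system), the fragment below builds a feature dataset D from labeled images using simple gray-level histograms as the “more tractable domain” and trains a classifier A on it; the function names, the histogram features, and the choice of an SVM are illustrative assumptions, and image loading is assumed to be done elsewhere.

```python
# Sketch of the two supervised steps: (1) turn labeled leaf images into a
# feature dataset D (here, simple gray-level histograms), (2) train a
# classifier A on D.
import numpy as np
from sklearn.svm import SVC

def extract_features(image):
    """Map a 2-D gray-scale image (uint8 array) to a normalized histogram."""
    hist, _ = np.histogram(image, bins=32, range=(0, 256), density=True)
    return hist

def build_dataset(images, labels):
    D = np.stack([extract_features(img) for img in images])
    return D, np.asarray(labels)

def train(D, y):
    A = SVC(kernel="rbf", probability=True)   # one of many possible choices
    return A.fit(D, y)
```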

Once algorithm A has been trained and tested, it is ready to identify species from images of organisms. In the typical scenario, algorithm A takes two inputs, namely, an image I of the unidentified organism and the dataset D. Algorithm A applies to image I the same filters used to create dataset D and outputs a ranking of k candidate species. The larger k is, the better the chance that the correct identification is included in the ranking; however, to be useful to most users, k must remain small. Details on the use of computer vision and machine learning to identify plants based on images of leaves are presented in the following section.
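
Continuing the previous sketch (and reusing its hypothetical extract_features helper), a top-k ranking can be obtained from the trained classifier’s class probabilities; this is one of several ways such a ranking is typically produced.

```python
# Sketch of identification: apply the same feature extraction to a new image
# and return a ranking of the k most likely species, using the classifier's
# class probabilities (assumes `extract_features` and `train` from the
# previous sketch).
import numpy as np

def identify_top_k(model, image, k=5):
    x = extract_features(image).reshape(1, -1)
    probs = model.predict_proba(x)[0]
    top = np.argsort(probs)[::-1][:k]          # indices of the k best candidates
    return [(model.classes_[i], probs[i]) for i in top]
```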

3 Automated Leaf-Based Plant Species Identification

Several surveys regarding leaf-based identification of plants have been published in the past. The survey in [15] covers most classification methods, such as k-Nearest Neighbors (kNN), Probabilistic Neural Networks (PNN), and Support Vector Machines (SVM), as well as their accuracy and precision. In [16], Metre and Ghorpade survey different texture-only techniques, provide a comparison schema for them, and pinpoint how important it is to create a centralized dataset of leaf images.

Most researchers agree on a general workflow for identifying species based on images of their leaves [9–13]. The first step is data acquisition. Acquiring leaf images is a time-consuming task; because of the lack of standards and centralized repositories, researchers have typically generated isolated datasets for their projects. Segmentation of the leaf is then executed to explicitly separate leaf from non-leaf pixels. Afterwards, different techniques are used to extract features based on venation [17], curvature [14], and morphometrics [11]. Finally, machine learning techniques are used to generate the trained algorithm [9–13].

3.1 Data Acquisition

Existing leaf recognition datasets use images of individual leaves on uniformly colored backgrounds for easier leaf segmentation. There are several datasets publicly available but, to our knowledge, there is not yet a centralized dataset which can grow as researchers and citizen scientists add more images and data. The following are examples of datasets from different projects:

  • The Flavia Dataset [12] encompasses 32 species and a total of 3,621 fresh leaf images on white backgrounds. Leaves were collected in Nanjing, China.

  • Kumar et al. [14] created a dataset for 184 tree species from Northeastern USA that includes 23,916 images of fresh leaves with uniform backgrounds. It is used by the LeafSnap mobile app.

  • Mata-Montero and Carranza-Rojas [18] from the Costa Rica Institute of Technology created a dataset that comprises 2,345 noisy and 1,468 clean leaf images from 67 Costa Rican tree species, all with uniform background.

  • ImageCLEF runs a plant identification competition that has created its own dataset [7]. It currently includes 1,000 plant species from Western Europe. It has more than 100,000 images of leaves, as well as flowers, fruits, stems, and whole-plant pictures. It comprises both images with white backgrounds and images taken directly in the field with complex backgrounds and noise [7].

3.2 Leaf Segmentation

Leaf segmentation can act on images with uniform backgrounds, such as a white piece of paper, or on images with complex backgrounds. The former is simpler, although artifacts such as shadows and light gradients still cause problems. Most researchers use uniform backgrounds to simplify this phase. In [14, 18], Expectation-Maximization (EM) is used to cluster pixels. This produces fairly good segmentations, but shadows tend to generate false positives. Similarly, in [19] the authors study how a semi-controlled light environment affects clustering algorithms; they perform color clustering and then apply GrabCut to find a globally optimal segmentation.
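
As a hedged sketch of the EM-based clustering idea (not the exact procedure of [14, 18]), pixel colors can be clustered into two Gaussian components with scikit-learn, keeping the darker cluster as the leaf under the assumption of a light, uniform background.

```python
# Sketch of EM-based leaf segmentation on a uniform background: cluster pixel
# colors into two Gaussian components and keep the darker one as the leaf.
# Heuristics here assume a light (e.g., white) background.
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_leaf(rgb):                        # rgb: H x W x 3 float array in [0, 1]
    h, w, _ = rgb.shape
    pixels = rgb.reshape(-1, 3)
    labels = GaussianMixture(n_components=2, random_state=0).fit_predict(pixels)
    labels = labels.reshape(h, w)
    # The background cluster is assumed to be the brighter one.
    means = [rgb[labels == c].mean() for c in (0, 1)]
    leaf_cluster = int(np.argmin(means))
    return labels == leaf_cluster             # boolean leaf mask
```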

Very few studies have tackled the problem of segmenting leaves with complex backgrounds [20, 21]. This capability is highly desirable for at least two types of leaves: leaves of tall trees, from which it is difficult to take a sample and then photograph it against a uniform background, and leaves of plants that have been mounted on herbarium sheets. In the former case, it would be ideal to zoom in with the camera and take a picture of the leaf on the tree. In the latter, the background may not be as complex as a natural setting, but the overlapping of leaves and other plant elements on the herbarium sheet makes the automated extraction of leaves and their subsequent segmentation very challenging.

We are not aware of any research that aims at generating leaf image datasets from herbarium sheets. The benefit of doing so would be twofold. First, herbaria all over the world have invested millions of dollars over long periods of time to collect plant samples. Rather than going back to the field to take pictures or collect more samples, it would be considerably less expensive to use leaves of plants that have already been identified and preserved in herbaria. Secondly, it would help demonstrate the value of herbarium collections.

3.3 Feature Extraction and Identification

Segmentation of the input image I produces a segmented image \(I'\) to which feature extraction is applied. This subsection briefly surveys approaches that use curvature, texture, venation, leaf morphometrics, or combinations of them.

Curvature. In [14], Kumar et al. create what they call a Histogram of Curvature over Scale (HCoS), which consists of measuring the leaf area and arc length of the intersection of the leaf with disks of radius r, where \(1 \le r \le 25\) pixels, centered at every contour pixel of the leaf in \(I'\). All measurements are then aggregated into a single histogram that describes the contour of the leaf. Using kNN and histogram intersection, a list of the k species whose leaves most closely match the leaf in I is presented to the user. Another method, applied to both simple and complex leaves, is described in [22]. It captures both global and local shape features and uses them separately during identification. This makes it possible to discriminate leaves with similar shapes but different margin patterns, and vice versa. Similarly to [14], several scales are explored by convolving the contour with a Gaussian filter with different values of \(\sigma\). This is particularly useful for serrated margins.
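
A rough, simplified sketch of the curvature-over-scale idea is shown below: it only uses the area component (not the arc length) and aggregates one histogram per radius rather than exactly reproducing the HCoS of [14]; the bin count and radii are illustrative.

```python
# Simplified sketch of a curvature-over-scale descriptor: for each contour
# pixel and each radius r, measure the fraction of the disk of radius r that
# falls inside the leaf, histogram those fractions per radius, and concatenate.
import numpy as np
from scipy import ndimage

def area_integral_histograms(mask, radii=range(1, 26), bins=21):
    contour = mask & ~ndimage.binary_erosion(mask)        # boundary pixels
    ys, xs = np.nonzero(contour)
    hists = []
    for r in radii:
        yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
        disk = (yy**2 + xx**2) <= r**2
        # Fraction of each disk covered by leaf pixels, computed by convolution.
        coverage = ndimage.convolve(mask.astype(float), disk.astype(float),
                                    mode="constant") / disk.sum()
        vals = coverage[ys, xs]                            # sample at contour pixels
        h, _ = np.histogram(vals, bins=bins, range=(0.0, 1.0), density=True)
        hists.append(h)
    return np.concatenate(hists)
```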

Texture. Local Binary Pattern (LBP) descriptors are used in [23] to identify medicinal and house plants from Indonesia. Different LBP descriptors were extracted with different numbers of sample points and radii and concatenated into histograms; a four-layer PNN classifier was then used. The achieved precision was 77 % for complex-background images and 86.67 % for uniform-background images. In [24], Speeded Up Robust Features (SURF) were used to develop an Android application for leaf recognition; the reported precision was 95.94 % on the Flavia dataset [12]. In [10], the authors identify plants based on only a portion of the leaf, allowing botanists to identify damaged plants. The reported precision is 98.7 % when using an Artificial Neural Network (ANN) for classification on their own small dataset.
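
A minimal sketch of this kind of LBP descriptor follows, using scikit-image; the sample-point and radius settings are illustrative and not taken from [23].

```python
# Sketch of an LBP texture descriptor: uniform LBP codes computed at two
# (points, radius) settings and concatenated into one normalized histogram.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(gray, settings=((8, 1), (16, 2))):
    feats = []
    for p, r in settings:
        codes = local_binary_pattern(gray, P=p, R=r, method="uniform")
        # Uniform LBP with P points yields codes in the range [0, P + 1].
        hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
        feats.append(hist)
    return np.concatenate(feats)
```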

Venation. Very few studies have used venation extraction as the basis for taxon identification. Venation extraction is not trivial, since veins are often merged with other leaf features. Some authors have simplified the task by using special equipment or treatments that render images with more clearly identifiable veins [25, 26]. However, this defeats the goal of letting users obtain an automated identification for specimens photographed with ordinary digital cameras.

In [25], vein pixels are extracted from laser-scanned images in 3D. The laser produces a 3D point cloud in which veins are 3D-convex. A curvature threshold is then used to obtain candidate vein pixels, and a least-squares linear fitting is finally applied to approximate the vein contour lines. In [17], researchers developed a tool to help botanists extract leaf veins with minimal human interaction. They used a patch-based approach in which a set of linear functions is learned from image patches containing veins using Independent Component Analysis (ICA); these learned functions are then used as a pattern map for vein detection.
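
Purely as an illustration of automated vein enhancement on ordinary images (this is neither the ICA-based method of [17] nor the 3D approach of [25]), a common off-the-shelf option is a ridge filter such as Frangi’s, available in scikit-image; the threshold below is an arbitrary assumption.

```python
# Illustrative only: highlight elongated, vein-like structures with a ridge
# (vesselness) filter and threshold the response into a rough binary vein map.
from skimage.filters import frangi

def enhance_veins(gray, threshold=0.05):
    ridges = frangi(gray)          # strong response on thin, elongated ridges
    return ridges > threshold      # rough binary vein map
```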

Leaf Morphometrics. Leaves display very rich morphology. Traditional leaf measurements include aspect ratio, leaf area, rectangularity, circularity, convexity, and solidity, among others [11]. Additionally, color moments computed on gray-scale intensities, such as mean, variance, kurtosis, and skewness, have also been used [11]. Traditional, landmark-based, and outline-based morphometrics have been used both separately and in combination.
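
A minimal sketch of a few such measurements, computed from a binary leaf mask with scikit-image, is shown below; the feature definitions follow common conventions and are not taken verbatim from [11].

```python
# Sketch of classic leaf morphometric features computed from a binary mask.
import numpy as np
from skimage.measure import label, regionprops

def leaf_morphometrics(mask):
    props = regionprops(label(mask.astype(int)))[0]      # assume a single leaf region
    area, perimeter = props.area, props.perimeter
    minr, minc, maxr, maxc = props.bbox
    return {
        "aspect_ratio": (maxc - minc) / (maxr - minr),
        "rectangularity": area / ((maxc - minc) * (maxr - minr)),
        "circularity": 4 * np.pi * area / perimeter ** 2,
        "solidity": props.solidity,                      # area / convex hull area
    }
```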

Multimodal Approaches. In [9], a multimodal system combining 38 morphological features with a Principal Component Analysis (PCA) approach for texture was used. The PCA training phase arranged all the dataset pictures into a matrix, from which a small number of characteristic features called eigenpictures was generated; each image was then represented as a linear combination of these eigenpictures. The reported precision on the Flavia dataset [12] was 91.9 % for the morphological features, 85.4 % for the PCA algorithm, and 89.2 % for both combined.
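
A minimal sketch of the eigenpicture idea with scikit-learn’s PCA follows; function and parameter names are illustrative and not taken from [9].

```python
# Sketch of the eigenpicture idea: flatten gray-scale leaf images into rows of
# a matrix and let PCA extract a small set of characteristic "eigenpictures";
# each image is then represented by its coordinates in that basis.
import numpy as np
from sklearn.decomposition import PCA

def eigenpicture_features(images, n_components=20):
    X = np.stack([img.ravel() for img in images])        # one flattened image per row
    pca = PCA(n_components=n_components).fit(X)
    eigenpictures = pca.components_                       # basis of characteristic images
    coords = pca.transform(X)                             # linear-combination coefficients
    return pca, eigenpictures, coords
```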

In [13], a combination of shape, texture, and color was used to recognize Indonesian medicinal plants. As a classifier, the authors used a PNN, with a reported precision of 72.16 % over 51 medicinal species and a total of 2,448 images. They created a mobile app for Android OS called Medleaf [13]. Their best precision was achieved by using Local Binary Pattern Variance (LBPV) features rather than morphological features.

In [18], texture extraction from the whole leaf using LBP was compared with the HCoS curvature method developed in [14]. The experiments showed that texture is more resilient to noise in leaf images. Better accuracy was achieved by assigning a small importance factor to curvature (10 %) and a larger one to texture (90 %). This result also matches the findings of [23] regarding the usefulness of LBP for identification based on images of damaged leaves.
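
The weighted-combination idea can be sketched as follows; note that this fragment uses a generic Euclidean distance between descriptors as a stand-in rather than the histogram-intersection comparison of the original work, and the dictionary keys are illustrative.

```python
# Sketch of weighted fusion: combine a texture distance and a curvature
# distance with fixed importance factors (here 0.9 and 0.1).
import numpy as np

def fused_distance(query, candidate, w_texture=0.9, w_curvature=0.1):
    d_tex = np.linalg.norm(query["texture"] - candidate["texture"])
    d_cur = np.linalg.norm(query["curvature"] - candidate["curvature"])
    return w_texture * d_tex + w_curvature * d_cur
```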

Deep Learning Approaches. Deep learning has achieved enormous success in computer vision research [27]. In [28], a Convolutional Neural Network (CNN) was applied to a dataset with 44 species. The CNN was not coded with layers for specific features (e.g., curvature or texture), but the authors could infer that one layer was related to shape/curvature and another to patterns similar to texture/venation. With this interpretation, the authors conclude that shape/curvature is not as discriminating as texture/venation, which is consistent with [18].
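
As an illustration of the kind of model involved, a minimal Keras CNN for leaf classification is sketched below; the layer sizes, input resolution, and 44-class output are illustrative assumptions and do not reproduce the architecture of [28].

```python
# Minimal CNN sketch for leaf classification (layer sizes are illustrative).
from tensorflow.keras import layers, models

def build_cnn(num_species, input_shape=(128, 128, 3)):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_species, activation="softmax"),
    ])

model = build_cnn(num_species=44)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```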

4 Challenges and Opportunities

Biodiversity conservation presents several monumental challenges. At the political and management level, it requires information and a deep understanding of living nature. However, about 80 % of the organisms on the planet do not even have a name. The scientific task of naming and classifying those organisms is gigantic, not only because of the large number of species to identify and describe, but also because it is tedious, slow, and error-prone. The global taxonomic impediment adds to the complexity of these challenges. Finally, access to this knowledge is limited by the scientific and non-digital nature of large amounts of literature.

Fortunately, computer vision and machine learning techniques that have been very effective in other realms are now being used to identify organisms, in particular plants, with high levels of accuracy (90 % or more). This could have an important impact on concrete conservation actions such as controlling trade in endangered species and conducting rapid biodiversity inventories. The following paragraphs summarize some opportunities we currently have to cope with the above-mentioned challenges.

Building a Global Dataset: Global biodiversity informatics initiatives such as GBIFFootnote 6, EOLFootnote 7, and BHLFootnote 8 have successfully built large global databases of biodiversity information that are freely available on the web. GBIF currently provides more than 600 million specimen-level records, EOL over a million species-level descriptions, and BHL more than 50 million pages of literature. An analogous dataset of digital images of plant elements (e.g., leaves) does not exist. However, there are several opportunities that should be seized. First, digital cameras are now very inexpensive and powerful. Secondly, even though data sharing protocols and standards need to be in place, organizations such as TDWGFootnote 9 are devoted to precisely this endeavor. Finally, crowd sourcing now offers excellent opportunities both to generate large repositories of information and to raise awareness among the general public through citizen science projects. iNaturalist and Pl@ntNET [7] have been very successful and deserve to be emulated. The PlantCLEF dataset already demonstrates that this can be done at the European level.

Work with Herbaria: Herbaria hold treasures of information that could be critical to scaling up the size and impact of a global dataset of digital images of plant elements. Herbaria maintain large collections of plants that have been carefully mounted on sheets, could be digitized, and whose elements (e.g., leaves) could be extracted to feed a global dataset. Because herbarium sheets contain juxtaposed leaves, flowers, and other plant elements, research on the detection and extraction of leaves needs to be further developed. In addition, more research is needed to deal with noisy images, complex backgrounds, damage detection and digital image repair, and leaf identification based on portions of the leaf (in case it is damaged); landmark-based morphometrics research should help with the latter. Finally, as an important side effect for their financial sustainability, herbaria around the world would have stronger arguments to demonstrate the value and impact of maintaining and investing in their collections. It is critical, however, that herbaria supplement their collections with digital images through crowd sourcing and changes in their traditional workflows.

Deep Learning: Deep learning, particularly using CNNs, is currently a very hot topic in computer vision. The exciting results obtained in competitions such as ImageNet [27] have generated high expectations. As more data and computational power have become available, this technique has become the most widely used, without substantial algorithmic changes since its inception. Instead of following a gradual path that first uses images of elements of an organism (e.g., leaves or flowers of a plant) and only later pictures of the whole organism, CNNs directly tackle the challenge of identifying organisms from pictures of the whole organism or its parts. However, this approach has at least two important limitations. First, it tends to work well only with very large sets of images [29]. Secondly, it lacks the explanatory power of other approaches such as landmark-based morphometrics. Nevertheless, as global datasets are developed, it is just a matter of time before the former limitation is overcome, and research is already under way to address the latter [28].