Cultivated potato (Solanum tuberosum L.) was domesticated from wild Solanum species native to the Peruvian Andes 8000–9000 years ago (Spooner et al. 2005). Through polyploidization (Watanabe and Peloquin 1989), changes in photoperiodism (Hosaka 2003, 2004), and other diversification mechanisms (Hardigan et al. 2017), potato was adopted globally and is now the third most important food crop for human consumption (Birch et al. 2012). Commercial potato cultivars exhibit great phenotypic diversity. For example, available germplasm varies in tuber shape, skin color, plant architecture, and phenology, among other traits. Despite this variability, most commercial cultivars are susceptible to late blight, a major global disease caused by Phytophthora infestans (Mont.) de Bary and responsible for the Irish Potato Famine in the mid-1800s (Majeed et al. 2022). Since then, plant breeders have looked for sources of resistance across potato’s crop wild relatives.

Potato landraces and wild species are widely distributed across Central and South America (Hardigan et al. 2015; Hijmans et al. 2002). These materials display remarkable phenotypic variation, encompassing diverse tuber colors, sizes and shapes, canopy architectures, and growth habits (Spooner et al. 2005). More importantly, landraces and wild species are excellent sources of resistance to nematodes, viruses, insects, fungi and late blight (Bachmann-Pfabe et al. 2019; Dandurand et al. 2019; del Rio and Bamberg 2020; Fürstenberg-Hägg et al. 2013; Ritter et al. 2008; Ruiz de Galarreta et al. 1998). The number of species recognized in the Solanum genus has changed over the years (Hawkes 1990; Spooner and Salas 2006); however, Spooner et al. (2014) recognize 107 distinct wild potato species. Solanum species exhibit varying ploidy levels, from diploid (2n = 2x = 24) to triploid (2n = 3x = 36), tetraploid (2n = 4x = 48), pentaploid (2n = 5x = 60), and hexaploid (2n = 6x = 72). The species are distributed across most of the American continent, with the greatest diversity found in central Mexico and the highlands of the Andes (Hijmans and Spooner 2001; Hijmans et al. 2002; Spooner et al. 2004, 2014). Wild potato species inhabit diverse habitats, often isolated by geographic or ecological niches rather than reproductive barriers (Bethke et al. 2017).

In recent years, many of the phylogenetic relationships among Solanum species have been revealed through the use of genomic information (Gagnon et al. 2022; Hardigan et al. 2015; Huang et al. 2019). Moreover, the integration of phenotypic and genotypic data has allowed researchers to identify marker-trait associations for tuber quality characteristics (Aversano et al. 2017; Sulli et al. 2017; Wolters et al. 2010), resistance to abiotic factors (Esposito et al. 2017; Vega and Bamberg 1995; Watanabe et al. 2011), and pest and pathogen resistance (Fulladolsa et al. 2015; Huang et al. 2023; Li et al. 2018c; Meade et al. 2020; Sudha et al. 2016; Tiwari et al. 2015; Yang et al. 2017). For instance, at least 14 loci have been identified for late blight resistance (Meade et al. 2020; Paluchowska et al. 2022; Tiwari et al. 2013), in addition to loci for other resistances and for tuber quality traits (Bhatia et al. 2023; Collins et al. 1999; Li et al. 2005). From a breeding perspective, potato wild relatives have become an important reservoir of novel allelic variation for trait introgression. In this context, cataloging plant genetic resources is fundamental for predicting, assessing, and promoting the use of uncultivated or underutilized germplasm, such as crop wild relatives, for plant improvement (Machida-Hirano 2015; Spooner and Bamberg 1994).

In recent decades, genebanks worldwide have accumulated extensive ex-situ collections of exotic germplasm, including crop wild relatives and wild species (Ellis et al. 2020). Advancements in genotyping technologies have revolutionized genetic characterization, enabling genome-wide analyses of numerous accessions and entire ex-situ collections within genebanks. High-throughput sequencing, in particular, has facilitated genome characterization (Tang et al. 2022) and the development of core collections, which aim to capture the majority of genetic diversity with a reduced number of accessions (Wambugu et al. 2018). However, the comprehensive characterization of phenotypic information and the exploration of environmental interactions, vital for harnessing the potential of exotic germplasm and identifying species and accessions suitable for climate resilience, still remains challenging. Moreover, the implementation of cost-effective genotyping strategies relies on the availability of high-quality genome assemblies and appropriate genotyping pipelines (Pavan et al. 2020), which is not always the case in minor crops or crop wild relatives.

Leaf morphometrics has been used to catalog genetic diversity in crop species such as grapevine (Demmings et al. 2019; Klein et al. 2017), tomato (Chitwood et al. 2013), cranberry (Diaz-Garcia et al. 2018), cotton (Andres et al. 2017) and apple (Migicovsky et al. 2017). Moreover, this approach has supported large-scale phenotyping efforts in orphan crops and crop wild relatives, including kura clover (Schlautman et al. 2020), Solanum pennellii (Li et al. 2018b), Viburnum (Spriggs et al. 2018), Oxalis (Morello et al. 2018), and Passiflora (Chitwood and Otoni 2017). Leaves serve as the primary photosynthetic organs in most plants, and in many cases leaf traits such as size and shape are highly informative and provide important insights into physiological and phenological states (Wang et al. 2021). In some studies, leaf size and shape have been associated with distinct fruit characteristics. For example, leaf morphometry in oriental persimmon correlates with fruit shape (Maeda et al. 2018). In other species, like grape and apple, leaf color has been associated with berry flesh color (Migicovsky et al. 2017). While quantitative methods exist for measuring leaf size and color, the assessment of leaf shape often relies on qualitative measurements. This poses a challenge, particularly in plant species with multifoliate leaves, such as Solanum spp., where the absence of homologous landmark points complicates leaf shape comparisons (Li et al. 2017).

In this study, more than 150 accessions corresponding to 12 species of wild and cultivated potato were analyzed using computer vision and advanced morphometric methods. The accessions studied here were collected across a wide geographical range of the American continent, thus offering an important depiction of the existing diversity within wild potato species. Many of the accessions surveyed in this study hold great relevance for breeding, as they have been examined for their potential as sources of resistance to late blight and other adaptive traits (del Rio and Bamberg 2020; Enciso-Maldonado et al. 2022).

Materials and methods

Plant material

Seeds of 161 wild potato accessions spanning 12 Solanum species were obtained from the United States Potato Genebank (Peninsular Agricultural Research Station, University of Wisconsin-Madison) (Table S1). Accessions of these species were originally collected between 1940 and 2000 at different locations in Mexico, Bolivia, Peru, Argentina, Colombia, and Guatemala, at altitudes ranging from 1800 to 4100 masl (Fig. 1). These materials are part of the breeding program at Toluca, Mexico, particularly because of their potential for late blight resistance, their phenological characteristics, and their canopy architectures, among other traits. In addition, many of these materials are being genotyped. A Dutch S. tuberosum cultivar (‘Alpha’; 1925) was also included in the study. Ploidy levels represented in the surveyed collection included diploids (n = 23), triploids (n = 1), tetraploids (n = 27), and hexaploids (n = 111; Fig. 1).

Fig. 1

The geographic distribution and ploidy of the 13 Solanum species investigated in the present study. The upper-right panel zooms in on the collecting sites for Mexican accessions. Points of different colors correspond to different species, as indicated in the lower-right table. EBN = Endosperm Balance Number. (Color figure online)

The seeds were germinated in the greenhouse in 200-hole seedling trays, using peat moss as the growing medium, for a period of six weeks at 25 °C. Once the plants developed roots and at least four leaves, they were transplanted to the field at ICAMEX, Estado de Mexico, Mexico (latitude 19.2435, longitude −99.59135, at 2606 masl). Plants were established in the field using a completely randomized design with two replicates; each experimental unit comprised two plants. For every plant in the experiment, two fully mature leaves from the middle of the canopy were manually sampled for imaging.

Image acquisition and morphometric analysis

Collected leaves were scanned using a flatbed scanner (Epson Perfection V39) at 300 dpi. Two sets of images were obtained. In the first set (Fig. 2a), the leaves were scanned with all their corresponding leaflets attached. In the second set (Fig. 2b), the leaflets were detached and individually placed in the scanner for imaging, while recording the position of each leaflet within the leaf, from the apex to the base and from left to right. Each image included a size reference and a QR code for image identification. Images were imported into R, thresholded, and converted into binary images using EBImage (Pau et al. 2010). Then, individual leaflets were selected and contours (outlines) were generated for downstream analysis with the R package Momocs (Bonhomme et al. 2014), as in Chitwood et al. (2014). For each individual leaflet, several basic descriptors were computed, including area, perimeter, length, width, aspect ratio, convexity, circularity, and color. Additionally, normalized elliptical Fourier descriptors based on 10 harmonics were calculated for each leaflet (Bonhomme et al. 2014). Because of the lack of common landmarks across accessions, the only variable computed for complete leaves (with attached leaflets, Fig. 2a) was total leaf area.
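
As a rough illustration of the basic descriptors listed above, the following sketch computes area, perimeter, length, width, aspect ratio, circularity, and convexity from a closed outline given as (x, y) vertices. It is written in plain Python rather than the R/EBImage/Momocs workflow used in the study. The function names and the specific formulas (axis-aligned bounding-box length and width, circularity as 4πA/P², convexity as hull perimeter over outline perimeter) are assumptions chosen for illustration; the definitions implemented in Momocs may differ.

```python
import math


def polygon_area(points):
    """Area of a closed outline via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed outline."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def convex_hull(points):
    """Convex hull (Andrew's monotone chain), counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def leaflet_descriptors(points):
    """Basic size and shape descriptors (illustrative definitions only)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    extent_x, extent_y = max(xs) - min(xs), max(ys) - min(ys)
    length, width = max(extent_x, extent_y), min(extent_x, extent_y)
    area = polygon_area(points)
    perimeter = polygon_perimeter(points)
    return {
        "area": area,
        "perimeter": perimeter,
        "length": length,
        "width": width,
        "aspect_ratio": length / width,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        "convexity": polygon_perimeter(convex_hull(points)) / perimeter,
    }


# Hypothetical outline: a 4 x 2 rectangle standing in for a leaflet contour.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
descriptors = leaflet_descriptors(rectangle)
```

For this rectangle the sketch gives an area of 8, a perimeter of 12, an aspect ratio of 2, and a convexity of 1. In practice the outline would need to be rotated so that its long axis is aligned with a coordinate axis before the bounding-box length and width are meaningful.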

Fig. 2

Examples of the two sets of images produced in this study. a Complete leaves with attached leaflets. b Individual leaflets detached and imaged individually

Data visualization and statistical analysis

Leaflet contours were plotted using Momocs (Bonhomme et al. 2014) and visually inspected for errors. Leaves that were incomplete because of mechanical or insect damage, or that were affected by image segmentation errors, were removed from the analysis. Correlations among leaf phenotypes were calculated as Pearson’s correlation coefficients. All statistical analyses were performed in R and visualized with the package ggplot2 (Wickham 2009). Analysis of variance and Kruskal–Wallis tests were conducted for normally and non-normally distributed traits, respectively.
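
For readers who want to see the correlation step spelled out, a minimal sketch of Pearson's correlation coefficient follows. It uses plain Python instead of the R functions used in the study; the function name and the toy length and width values are hypothetical and are not data from this work.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical leaflet lengths and widths (cm), for illustration only.
lengths = [1.0, 2.0, 3.0, 4.0]
widths = [0.5, 1.0, 1.5, 2.0]
r = pearson_r(lengths, widths)
```

Because the toy widths are an exact linear function of the lengths, `r` equals 1 up to floating-point error. The coefficient is signed, so a negative value indicates an inverse relationship between two traits.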


Results

Leaf size and shape exhibit discernible patterns

More than 7500 leaflets sampled from 1129 leaves and 162 Solanum accessions were analyzed in this study. Leaf shape variation was surveyed in 13 potato species, more than in any previous study. This survey revealed large variation in leaf size and architecture, which arose from major differences among the corresponding leaflets. For instance, leaflet size and shape varied as a function of position within the leaf (Fig. 3), as described further below.

Fig. 3

Digital representation of leaf variation in 13 Solanum species. Each leaflet shape was generated by averaging leaflet shapes at similar leaflet positions within the leaf. Leaflets are colored according to the average color estimated from scanned images. (Color figure online)

Leaf size was determined using two approaches. First, leaf size was calculated from the images containing complete leaves (with their leaflets attached), thereby representing the true leaf area. Second, an alternative estimate was derived by summing the areas of individually scanned detached leaflets (Fig. 4a). As expected, the two approaches were strongly and positively correlated (r2 = 0.95, p value = 2.2e−16). Nevertheless, leaf area calculated from intact leaves exceeded the cumulative area of all leaflets by 11.1% on average. This difference can be attributed to the petiole and to small interjected leaflets located between the sampled leaflets. Overall, discrepancies between leaf area (using complete leaves) and the summed leaflet area ranged from 0.9% (no interjected leaflets, reduced petiole) to 18%. When comparing leaf sizes (using the cumulative leaflet areas) among accessions, differences of more than an order of magnitude were observed. For example, ‘Alpha’ (S. tuberosum) exhibited the largest leaf size (221.08 cm²), followed by accessions 161,367 (99.31 cm², S. demissum) and 161,179 (98.16 cm², S. demissum). In contrast, accessions 653,799 (4.97 cm², S. michoacanum), 230,489 (6.53 cm², S. pinnatisectum), and 275,230 (7.97 cm², S. pinnatisectum) had the smallest leaves.
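
The comparison between the two area estimates can be sketched as a simple percentage. The snippet below is a Python illustration rather than the R code used in the study. It assumes that the discrepancy is expressed relative to the summed leaflet area, which the text does not state explicitly, and the areas used are hypothetical.

```python
def area_discrepancy_percent(intact_area, leaflet_areas):
    """Percent by which the intact-leaf area exceeds the summed leaflet area.

    Assumption: the denominator is the summed leaflet area.
    """
    summed = sum(leaflet_areas)
    return 100.0 * (intact_area - summed) / summed


# Hypothetical leaf: 50 cm2 when scanned intact, 45 cm2 as detached leaflets.
discrepancy = area_discrepancy_percent(50.0, [20.0, 10.0, 10.0, 5.0])
```

Here the intact leaf is about 11.1% larger than the sum of its leaflets; the remainder would correspond to tissue that is not scanned as a separate leaflet, such as the petiole.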

Fig. 4

Leaflet size and shape show great variability across 161 wild potato accessions and one cultivated potato. Variation in a leaf area and b the ratio of terminal leaflet to non-terminal leaflet. c Species variability in leaflet area, perimeter, length, width, aspect ratio, convexity and circularity

The distribution of leaf area among leaflets exhibited discernible patterns. Terminal leaflets were consistently the largest across all species; accession 631,212 (S. microdontum) had the largest terminal leaflet (96.11 cm²) and accession 653,799 (S. michoacanum) the smallest (1.15 cm²). Additionally, leaflet size decreased gradually with position along the petiole, from apex to base, which was expected as apical leaflets emerge and develop first.

By recording the specific position of each leaflet along the petiole, other descriptive variables were calculated. For example, the size of non-terminal (lateral) leaflets relative to the terminal leaflet was highly variable across accessions (Fig. 4b). Species like S. tuberosum and S. pinnatisectum had higher ratios (0.73–0.84), indicating that lateral leaflets were approximately three quarters the size of the terminal leaflet. On the other hand, species like S. microdontum and S. albicans had ratios closer to 0, indicating terminal leaflets much larger than lateral leaflets. Overall, these findings underscore the influence of leaflet position on size distribution, with apical positions predominantly bearing larger leaflets.
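
A short Python sketch of this ratio, read as lateral size relative to terminal size so that it matches the interpretation given in the text, is shown below. Averaging the lateral leaflets before dividing is an assumption, since the exact aggregation is not described, and the areas are hypothetical.

```python
def lateral_to_terminal_ratio(terminal_area, lateral_areas):
    """Mean lateral leaflet area divided by the terminal leaflet area.

    Assumption: lateral leaflets are summarized by their arithmetic mean.
    """
    mean_lateral = sum(lateral_areas) / len(lateral_areas)
    return mean_lateral / terminal_area


# Hypothetical leaf with a 10 cm2 terminal leaflet and four lateral leaflets.
ratio = lateral_to_terminal_ratio(10.0, [9.0, 8.0, 7.0, 6.0])
```

With these values the ratio is 0.75, meaning the lateral leaflets are on average three quarters the size of the terminal leaflet. A value near 0 would indicate a terminal leaflet that dominates the leaf.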

Leaflet properties depend on their position within the leaf

Leaflet shape also exhibited variation depending on leaflet position. Two distinct patterns were observed based on the aspect ratio of the leaflets (Fig. 4c). In certain species like S. acaule, S. albicans, S. berthaultii, and S. tuberosum, an asymptotic-like relationship was observed between aspect ratio and leaflet position. These species displayed larger aspect ratios (indicating more elongated leaflets) in their second and third leaflet pairs, while the aspect ratios were smaller for the first and fourth leaflet pairs, indicating more rounded leaflets. Conversely, species such as S. michoacanum and S. pinnatisectum consistently showed an increase in aspect ratio as the leaflet position progressed from basal to apical positions. Unlike the species with the asymptotic relationship, S. michoacanum and S. pinnatisectum exhibited considerably larger aspect ratios ranging between two and four. The circularity of the leaflets followed a similar pattern, as it was highly correlated with the aspect ratio (r2 = 0.91).

The correlation between leaflet traits is species-dependent

Size-related descriptors including area, perimeter, length, and width exhibited a strong positive correlation (r2 = 0.92–0.99). Among the three basic shape descriptors, circularity and aspect ratio showed a high positive correlation, while convexity exhibited no significant correlation with any of the traits. In general, no correlation was observed between leaflet size and shape (Fig. 5a). However, upon conducting a more in-depth exploration of trait correlations across various species, distinct patterns of species-specific significant correlations came to light (Fig. 5b). For instance, although leaflet circularity and leaflet area showed poor correlation when considering all species together, they exhibited a medium to high negative correlation (r2 = 0.80–0.18) specifically within the S. microdontum species. Similarly, while convexity showed little correlation with other traits overall, it displayed a medium negative correlation (r2 = − 0.32 to − 0.36) with traits such as width, aspect ratio, perimeter, and length specifically in the S. tuberosum species. Conversely, the opposite trend was also observed, with high correlations among all species except for specific ones. For example, width and length exhibited a medium correlation (r2 = 0.50, 0.60) in the S. pinnatisectum and S. tuberosum species, while for the rest of the species, the correlation between these traits was larger than 0.8 (average r2 = 0.93). These findings reveal the complexity and high variability of Solanum leaf size and shape across different species, with leaflet properties playing a major role in determining these characteristics.

Fig. 5

Correlation among shape and size traits. a Pearson’s correlation using non-averaged leaflet values. b Species-specific variation in trait correlations. Different marker symbols/colors correspond to different species; red lines highlight the extent of the variation in the correlation coefficients for each pair of traits. (Color figure online)

Elliptical Fourier descriptors

Compared with basic shape descriptors such as aspect ratio, circularity, and convexity, Fourier analysis describes both global and local features, which allows variance among shapes to be quantified (Chitwood et al. 2013, 2014). This approach decomposes the information embedded in the object outlines (leaflets) into a weighted sum of wave functions with varying frequencies. The Fourier coefficients then determine the contribution of each waveform to the shape under inspection. For instance, lower-order harmonics provide insights into the overall shape, while higher-order harmonics capture local variations in the outlines (Iwata et al. 1998). One notable advantage of EFD-based methods is their ability to separate and independently analyze symmetric and asymmetric variance in shape, both of which are common in biological systems.
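
To make the decomposition concrete, the sketch below computes unnormalized elliptical Fourier coefficients for a closed polygonal outline, following the standard closed-contour formulation commonly attributed to Kuhl and Giardina. It is a plain-Python illustration and not the Momocs routine used in the study. The function name is hypothetical, and the normalization for size, rotation, and starting point that the study applied is deliberately omitted.

```python
import math


def elliptic_fourier(points, n_harmonics=10):
    """Return a list of (a_n, b_n, c_n, d_n) tuples, one per harmonic.

    The outline is treated as a closed polygon parameterized by arc length:
    x(t) ~ sum a_n cos(2 pi n t / T) + b_n sin(2 pi n t / T), and likewise
    y(t) with c_n and d_n. No normalization is applied.
    """
    dx, dy, dt = [], [], []
    n_pts = len(points)
    for i in range(n_pts):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n_pts]
        step = math.hypot(x1 - x0, y1 - y0)
        if step == 0.0:
            continue  # skip repeated vertices
        dx.append(x1 - x0)
        dy.append(y1 - y0)
        dt.append(step)

    # Cumulative arc length at the end of each segment; t[0] is the start.
    t = [0.0]
    for step in dt:
        t.append(t[-1] + step)
    total = t[-1]

    coefficients = []
    for n in range(1, n_harmonics + 1):
        omega = 2.0 * n * math.pi / total
        scale = total / (2.0 * n * n * math.pi ** 2)
        a = b = c = d = 0.0
        for p in range(len(dt)):
            d_cos = math.cos(omega * t[p + 1]) - math.cos(omega * t[p])
            d_sin = math.sin(omega * t[p + 1]) - math.sin(omega * t[p])
            a += dx[p] / dt[p] * d_cos
            b += dx[p] / dt[p] * d_sin
            c += dy[p] / dt[p] * d_cos
            d += dy[p] / dt[p] * d_sin
        coefficients.append((scale * a, scale * b, scale * c, scale * d))
    return coefficients


# Hypothetical outline: a unit circle sampled at 360 points.
circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
          for k in range(360)]
harmonics = elliptic_fourier(circle, n_harmonics=3)
```

For the sampled circle, the first harmonic is close to (1, 0, 0, 1) and the higher harmonics are close to zero, which matches the idea that low-order harmonics carry the overall shape. An elongated or lobed leaflet would instead spread its information over more harmonics.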

Leaflet shape variation was analyzed using symmetrical and asymmetrical principal components, which allowed for a better understanding of the results from the elliptical Fourier descriptor analysis (Fig. 6). Given the unbalanced distribution of sample sizes for S. demissum, the complete dataset was analyzed, and additionally, the analysis was performed with a subset of S. demissum samples (10 randomly selected accessions). Overall, considering the limited variability observed in S. demissum accessions, the results remain consistent, whether including the entire dataset (Fig. S2) or only a subset of S. demissum samples (Fig. 6). The first three principal components (PC) collectively accounted for 96.2% of the symmetric variance in leaflet variation, while the first three asymmetric components explained only 89.9%. Particularly, PC1 accounted for 90% of the symmetric variance, but only 82.3% of the asymmetric variance.

Fig. 6

Elliptical Fourier Descriptors. a Variation of harmonic coefficients. b Harmonic contributions to shape resulting from EFD analysis. Principal components (PC) c PC1 and PC2, d, e PC1 and PC3. f PCs explaining variance in leaflet shape

Full leaf structure can be measured using persistent homology

Significant variation in both leaflet size and shape was observed among the accessions examined in this study. Leaflet position and the allocation of leaf area among different leaflet positions (Fig. 4) were identified as major sources of variation. However, complete leaves, as a whole, also exhibited significant variation, not only in size as discussed above, but in structure (i.e. number of leaflets) and shape. For instance, different species and accessions displayed varying numbers of leaflets. Some had only one pair of lateral leaflets (S. microdontum), while others had three (S. albicans, S. brevicaule, S. chacoense, S. demissum, S. guerreroense, S. juzepczukii, S. michoacanum, S. stoloniferum, S. tuberosum), or the maximum observed, four (S. acaule, S. berthaultii, S. pinnatisectum). Interjected leaflets at different leaflet positions were also observed in certain accessions. Additionally, petiole size varied across species, and in some cases (e.g., S. tuberosum), accounted for a significant portion of the total leaf area.

Comparative analysis of morphometric phenotypes in multifoliate leaves is challenging because of the absence of clearly defined homologous points (e.g., landmarks) across different leaf structures (Li et al. 2018a). Therefore, techniques like persistent homology have been used to quantitatively account for shape variation in complex traits, providing a promising avenue for analysis (Diaz-Garcia et al. 2018; Li et al. 2019; Schlautman et al. 2020). In this study, the set of images containing complete leaves (with attached leaflets, Fig. 2a) was subjected to persistent homology analysis. A shape barcode consisting of 800 values was generated for each leaf, and principal component analysis was performed on the entire dataset, considering the barcodes of all the leaves. The first three principal components derived from PH explained 44, 22, and 6% of the variance in leaf shape, respectively. Seventeen principal components from PH explained 90% of the shape variance, indicating the significance of local and accession-specific shape features. Two analyses were conducted with the PH data. First, a correlation analysis was performed between the principal components derived from PH (using complete leaves) and those obtained from the elliptical Fourier descriptor analysis (using leaflets). Interestingly, a strong correlation was observed between the first principal components of PH and EFD, likely suggesting that persistent homology applied to complete leaves is sensitive to shape variation in individual leaflets. Second, by plotting principal components 1 and 2 of persistent homology (Figs. 7a, b), the accessions were partially grouped according to their assigned species. For example, all three S. pinnatisectum accessions formed a distinct group along principal component 1, while the S. berthaultii accessions also formed a well-defined group. However, species with a larger number of sampled accessions, such as S. demissum and S. acaule, showed more scattered grouping, indicating pronounced intraspecies leaf shape variation.
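
The variance figures above can be read as a cumulative sum over principal components. The Python sketch below shows that bookkeeping using only the three proportions reported in the text (44, 22, and 6%); the proportions for the remaining components are not reported and are not guessed here. The function names are hypothetical.

```python
def cumulative_variance(explained):
    """Running total of the proportion of variance explained per component."""
    totals = []
    running = 0.0
    for fraction in explained:
        running += fraction
        totals.append(running)
    return totals


def components_needed(explained, threshold):
    """Smallest number of leading components whose cumulative proportion
    reaches the threshold, or None if the supplied values never reach it."""
    for k, total in enumerate(cumulative_variance(explained), start=1):
        if total >= threshold - 1e-12:
            return k
    return None


# Proportions reported for the first three PH principal components.
reported = [0.44, 0.22, 0.06]
first_three_total = cumulative_variance(reported)[-1]
```

The first three components together account for about 72% of the variance, so `components_needed(reported, 0.70)` returns 3, while `components_needed(reported, 0.90)` returns `None` because the 90% level is only reached once further components are included.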

Fig. 7

Variation in full leaf shape and color. Principal components (PC) obtained from the persistent homology analysis: a PC1 and PC2, and b PC3. c Density plots for color variation by species; red, green, blue plots correspond to variability in R, G, and B channels, respectively, of the leaf scans. (Color figure online)

Analysis of color

The species S. tuberosum displayed average RGB values closer to zero, indicating darker-toned leaves (Fig. 7c). On the other hand, S. pinnatisectum and S. michoacanum exhibited the least color variability across the three RGB channels, while S. microdontum, S. stoloniferum, and S. juzepczukii showed higher variability. According to the statistical analysis, three groups were formed. The first group includes the species S. acaule (68; 85; 47; RGB respectively) and S. michoacanum (62; 76; 51; RGB respectively). The second group consists of S. tuberosum (43; 59; 28; RGB respectively), and the third group includes the remaining species.
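
As a small illustration of how channel averages relate to leaf tone, the sketch below compares the three species for which mean RGB values are given in the text. It is written in Python rather than R, and the brightness measure (an unweighted mean of the three channels) is an assumption made for illustration; the study reports per-channel values and does not define a single brightness score.

```python
def mean_rgb(pixels):
    """Average (R, G, B) over a list of pixel triples."""
    n = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / n for channel in range(3))


def brightness(rgb):
    """Unweighted mean of the three channels (illustrative assumption)."""
    return sum(rgb) / 3.0


# Mean RGB values as reported in the text for three species.
species_rgb = {
    "S. acaule": (68, 85, 47),
    "S. michoacanum": (62, 76, 51),
    "S. tuberosum": (43, 59, 28),
}
darkest = min(species_rgb, key=lambda name: brightness(species_rgb[name]))
```

Under this measure `darkest` is S. tuberosum, consistent with the description of its leaves as darker-toned. `mean_rgb` shows how such per-leaf averages could be obtained from segmented pixels.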


Discussion

Leaves are vitally important to plants, orchestrating essential physiological processes such as photosynthesis, respiration, and transpiration. Both leaf size and shape affect photosynthetic efficiency, which is closely linked to agronomic productivity (e.g., yield) and tolerance to biotic and abiotic stressors (Nicotra et al. 2011; Wang et al. 2021; Zhang et al. 2021). Leaf size, for instance, directly affects physiological functions, including light capture, thermoregulation, water absorption, and transpiration, while also influencing interactions with surrounding organisms (Niinemets et al. 2006; Pickup et al. 2005; Sarlikioti et al. 2011; Westoby et al. 2002). This feature is crucial for survival in diverse habitats and for addressing specific challenges; smaller leaves may offer advantages in warm, dry habitats with high solar radiation, whereas larger leaves may excel in environments with lower irradiation, cooler temperatures, and higher humidity (Ackerly et al. 2002; Falster and Westoby 2003; Givnish and Vermeij 1976; Gates 1965; Niinemets et al. 2006).

The diversity in leaf size and shape, particularly in response to biotic stresses like herbivore insects, has played a significant role in the evolutionary trajectory of leaves. Deeply divided compound leaves can reduce insect feeding efficiency, leading herbivores to prefer plants with simple or less divided compound leaves (Bright and Rausher 2008; Brown and Lawton 1991; Ferris 2019). Simultaneously, leaf size and shape influence plant selection by insects, revealing sensitivity to leaf morphology in many cases.

Leaves exhibit many heritable traits, including size and shape (Karamat et al. 2021), which makes them particularly useful for taxonomic classification and germplasm curation. Characterizing leaf morphometry not only reveals the existing variability among species but also offers a comprehensive overview of the breeding potential, adaptability, and functionality that each species might exhibit within specific environments (Wang et al. 2021). Additionally, the accurate identification and characterization of plant species is crucial for preserving the genetic diversity of plants, especially those with economic, ecological, or cultural significance, and it contributes to the maintenance of genebanks that store seeds and vegetative material for long-term conservation (Singh et al. 2019).

Comprehensive phenotyping of wild species

Genebanks play an essential role in the conservation and characterization of plant biodiversity, particularly when dealing with wild species (Smale and Jamora 2020). The conservation of wild species in genebanks allows for their study in crop improvement, as these species may carry genes conferring resistance to pathogens, tolerance to extreme environmental conditions, and other desirable traits that can be crucial in creating more resilient and sustainable crops (Smale and Jamora 2020; Wang et al. 2017). While new genotyping platforms have the potential to reveal genetic diversity at the DNA level, this is not always sufficient to understand the phenotypic adaptations of these plants to different environments, and it does not provide direct information on how these genes manifest in visible traits. In this regard, phenotypic characterization is as important as genotypic characterization, as it allows for the description and establishment of relationships between cultivar groups and accessions, as well as the identification of promising materials for germplasm improvement and conservation (Muli et al. 2021; Pereira-Dias et al. 2020; Plazas et al. 2014).

Plant phenotyping is essential to understand how wild species adapt to their environment. While traditional phenotyping relies on manual observation and measurement, high-throughput phenotyping uses advanced technologies such as image analysis and morphometrics to efficiently measure multiple traits on a large scale, enabling comprehensive characterization of wild species, revealing complex traits and precise adaptations (Diaz-Garcia et al. 2016; Wang et al. 2017). This is especially valuable when searching for specific traits crucial for crop adaptation. Moreover, these high-throughput techniques are fundamental in creating collections that more accurately and efficiently represent genetic diversity (Schlautman et al. 2020).

Our study provides comprehensive information regarding leaf shape, size, and color properties for 12 wild potato species and a commercial cultivar. The wild species included in our survey were sampled from the native range of potato, and several of the accessions hold potential for breeding, particularly in terms of disease resistance (e.g., late blight). As discussed above, leaf morphology has been shown to affect plant adaptation (e.g., herbivore preferences, light interception). However, our study did not investigate the relationship between leaf characteristics and their role in plant survival and adaptation. Nevertheless, our screening offers detailed, quantitative information that can be integrated with further experimental research to understand such plant-pathogen-environment associations. Moreover, our survey adds a valuable layer of information to catalog potato wild germplasm and create data-informed core collections, and it informs breeders interested in prebreeding strategies to introgress novel allelic variation. Within the subset of samples/species studied, persistent homology and elliptical Fourier descriptor analysis are sensitive to heritable leaf shape properties. Of course, this does not ensure that these techniques will show equal sensitivity and clustering capabilities in the other 88% of wild potato species. However, considering fundamental work (e.g., Li et al. 2018a, b, c) on the use of PH, even across different plant genera and families, it is very likely that PH and EFD will outperform traditional descriptors during classification. Finally, we leveraged affordable, simple-to-use computer vision algorithms to generate large amounts of phenotypic data capable of capturing relevant features that differentiate individual species.