Introduction

The use of digital 3D models in morphological studies is increasing in many scientific disciplines, including palaeontology and evolutionary biology. The digitalization of an object not only facilitates detailed analysis of the size and shape of fragile specimens but also helps investigation of diverse evolutionary questions (e.g. Cardini et al. 2015; Cornette et al. 2013).

The use of close-range photogrammetry (PH) has grown in many fields because it is economical, portable, easy to apply and accurately reproduces the geometry and colour pattern of real and complex objects (Falkingham 2012). For this reason, it has become widely employed in a variety of disciplines such as biology (Evin et al. 2016), palaeontology (Bates et al. 2010), anthropology (Katz and Friess 2014) and medicine (Ege et al. 2004), among others.

In analyses of shape and size of objects (as in biological studies), the 3D models are often integrated with geometric morphometric methods (GMM). This approach has proved particularly useful in bats, where, for example, GMM has provided additional information on divergence of cryptic species (Sztencel-Jabłonka et al. 2009). Nevertheless, acquiring landmarks on bone sutures of bat skulls, particularly for Microchiroptera sensu Simmons and Geisler (1998), is quite difficult due to early suture ossification and their small size. This challenge often forces researchers to employ extremely precise equipment at considerable cost. However, no studies have addressed the utility of PH for this group and other similar sized mammals.

Katz and Friess (2014) and Evin et al. (2016) demonstrated the accuracy of close-range PH for large skulls (humans and wolves, respectively) relative to laser scan models. Fahlke and Autenrieth (2016) compared PH performance relative to µCT scan models for a vertebrate fossil skull (condyle-basal length = 37.5 cm) and similarly found high similarity. Very few studies have attempted to apply it to smaller specimens although Muñoz-Muñoz et al. (2016) and assessed the repeatability of PH for mice skulls (length = 45 mm) and suggested it might be appropriate for small mammals. Durão et al. (2018) suggested a protocol for 3D reconstruction of vole humerii by mean of PH. Nevertheless, no tests were conducted to assess its performance against more established 3D reconstruction techniques (e.g. µCT scan). High measurement error (random error, in particular) is well-known in small specimens and largely arises due to difficulties in landmark identification (Badawi-Fayad and Cabanis 2007; Cramon-Taubadel et al. 2007; Fourie et al. 2011; Marcy et al. 2018; Muñoz-Muñoz et al. 2016). The extent of biological variation is of paramount importance when considering the impact of technique-based error on the results (Marcy et al. 2018).

An additional incentive for analysing differences between techniques is that it may lead to an understanding of when it is feasible to combine data acquired using different techniques. The introduction of random and systematic errors intrinsic to each technique is known to create unreal patterns and/or obscure biological variation (Fruciano et al. 2017; Marcy et al. 2018; Robinson and Terhune 2017).

This study was motivated by the need to assess PH as a tool for reliable analysis of bat skull morphology and assess its performance relative to µCT scan and surface laser scan (LS). We used GMM to assess the relative accuracy of PH models for quantifying size and shape via anatomical landmarks. Phylogenetic comparative methods (Cornwell and Nakagawa 2017) were used to assess the strength of biological signal against the technique error and the phylogenetic uncertainty. Our aims were to quantify the extent of measurement error introduced by the PH/GMM approach and assess the reliability of combining data extracted from different reconstruction techniques (PH, µCT, LS).

Materials and Methods

Sample

GMM and phylogenetic comparative methods were used to examine the reliability of PH for the digital reconstruction of bat skulls and assess its performance in interspecific (19 species) evolutionary analyses.

Crania from nineteen different bat species from the Natural History Museum of Paris (MNHN) were reconstructed in 3D using three different techniques: PH, LS and µCT scanning. The specimens were selected to represent bat species of small and medium size, with an average skull length of 15.62 mm (Table S1).

Data Acquisition and Model Landmarking

The 3D models have been reconstructed with three different techniques (PH, LS, CT).

The PH 3D models were obtained employing a digital SLR Nikon D5300 camera attached to a Nikkor 60 mm macro lens. The general camera lighting settings and positioning, specimen arrangement and number of pictures per specimen were adapted from Falkingham (2012) and Mallison and Wings (2014). Average mesh size was ~ 3,000,000 faces.

For the LS models, we employed a Breuckmann Laser Scan, model SmartSCAN R5/C5 5.0 MegaPixel housed at the MNHN of Paris. We used the field of view S-030 which is optimal for very small objects (240 mm length) and can achieve a maximum resolution of 10 µm.

To obtain the CT models we used a phoenix v|tome|x s housed at the MNHN of Paris. Scans resolution ranged between 18 and 28 µm (average 23 µm) in voxel.

Detailed information on devices and workflow are available in the supplementary material.

The open source software Landmark Editor (Wiley et al. 2005) was used to place 24 unilateral landmarks on the dorsal, lateral and ventral side of the cranium (Fig. 1a and Table S2). See Supplementary material for details on landmarks acquisition and coordinates transformation procedure (Procrustes Shape Coordinates).

Fig. 1
figure 1

a Landmark configuration used in the study. Species: R. ferrumequinum. Anatomical definition in Table S1. b Visualization of mesh distances on dorsal and ventral views between (a) PH and µCT; (b) PH and LS; (c) µCT and LS. The colour represents the distances in mm. Species: R. ferrumequinum (skull length: 18.78 mm)

For the interspecific study (19 species), we assessed the landmarking error by recording coordinates three times on a subsample of nine species, selected to represent the morphological variation within our sample (Carolia perspicillata, Desmodus rotundus, Glossophaga soricina, Myotis emarginatus, Myotis capaccinii, Nyctalus noctula, Rhinolophus hipposideros, Rhinolophus ferrumequinum and Tadarida teniiotis) for each technique (LS, PH, µCT). Some species were morphologically very divergent, as assessed from principal component scores (see later) (e.g. D. rotundus and G. soricina) while others were very similar (e.g. R. hipposideros and R. ferrumequinum).

Measurement error evaluation

Mesh Distances

The average distances between the 19 paired models were calculated in R software (R Core Team 2018) using the meshDist function in the “Morpho” package (Schlager 2013). This distance is defined as an average of the shortest distances between every triangle of a mesh and the closest triangle of the other (Bærentzen and Henrik 2002). It returns the average distance and a coloured scale model that visually represents the differences between each pair of meshes.

Shape Visualization

The preliminary visual analysis of the shape differences between the specimens was achieved using a Principal Component Analysis (PCA) for the interspecific dataset. We used the variance–covariance matrix of the Procrustes coordinates to extract orthogonal vectors (PCs) that summarise variation within our sample. Shape changes in 3D skulls were visualised by warping the 3D coordinates along the PC axes. This was achieved applying a Thin-Plate-Spline (Bookstein 1989) algorithm on the mean shape of the morphospace. The 3D bat skull with lowest deviation from the mean shape was chosen for the visualisation. This model was warped along the positive and the negative sides of PC axes to display the shape variation in the sample (Drake and Klingenberg 2010).

Error in Geometric Morphometrics

Pearson and Mantel tests were employed to assess the similarity between the centroid size vectors produced by each technique, and their shape coordinates matrices, respectively (Cardini 2014). Procrustes and standard ANOVAs were used to quantify the variance explained by the different techniques for shape and size, respectively. Nested ANOVAs were used to analyse replicate measurements to assess the landmarking error in a subsample of the data (nine species, see above), with repeatability computed using the intraclass correlation coefficient, i.e., among individual-variance divided by within-individual variance components (see Fruciano 2016). The variability of Procrustes variance computed for each triplet of replicate was used as a further indicator of random measurement error within each technique (Marcy et al. 2018). The Procrustes variance, also known as morphological disparity, measures the magnitude of morphological variation for each triplet by technique (Zelditch et al. 2012). Kruskal–Wallis tests were used to compare median Procrustes variances between techniques (greater variation suggests lower precision in landmark identification). Pearson correlation tests between Procrustes variance and centroid size assessed whether errors in landmark identification were greater for smaller specimens.

Error in Evolutionary Analysis

Additional analyses were performed on the interspecific dataset to assess the use of PH-generated data in evolutionary studies.

Phylogenetic trees for the 19 selected species were inferred by Bayesian inference, as implemented in MrBayes version 3.2 (Huelsenbeck and Ronquist 2001). Input data consisted of an alignment of 20,364 base pairs of nuclear and mitochondrial DNA from Shi and Rabosky (2015). The alignment was divided into 29 partitions (for details see Shi and Rabosky 2015) to allow for evolutionary differences between partitions. The GTR + G model was applied to each partition. The MCMC chain was run for 5 million generations, with trees saved every 500 generations and the first 5 × 103 trees discarded as burn-in. The remaining posterior sample of 1000 trees and the 50% majority rule consensus tree was used for subsequent analyses.

The R packages “ape” (Paradis et al. 2004) and “geomorph” (Adams and Otárola-Castillo 2013) were used to test for the presence of evolutionary allometry (Cardini and Polly 2013) in the three datasets using the log10 transformed centroid size as the independent variable and Procrustes shape coordinates as the dependent variable. Phylogenetic generalised least squares (PGLS) analyses with 999 permutations was employed on the three datasets separately to test for the presence of evolutionary allometry after taking the phylogenetic variance–covariance matrix into account, with the phylogeny represented by the Bayesian consensus tree (Adams and Collyer 2015; Rohlf 2007).

The presence of phylogenetic signal (quantified by the K statistic, Blomberg et al. 2003) in the three datasets and the degree of congruence for size and shape (Adams 2014) were also analysed using the consensus tree. The K statistic reflects the degree of congruence between phenotypic data and the phylogeny (Blomberg et al. 2003). Statistical significance of K and its multivariate extension Kmultiv was assessed using randomization (Adams 2014).

To examine whether the same evolutionary conclusions were obtained using different techniques, we computed a series of ANOVAs with morphological data (log10 transformed centroid size and shape coordinates) as the dependent variable and ecological data (log10 transformed frequency peak and log10 transformed bite force) as the independent variable for all species in the study except Pipistrellus nathusii (no data on bite force were available for the species). Frequency peak data were extracted from the literature (Brinkløv et al. 2011; Kalko et al. 1998; Rodríguez-San Pedro and Allendes 2017; Russo and Jones 2002; Siemers et al. 2001; Siemers and Schnitzler 2004). We obtained unpublished (collected by AH) and published bite force data (Aguirre et al. 2002) for these analyses. The same analyses were repeated under a phylogenetic comparative approach using PGLS.

To assess whether the same results were obtained from mixed datasets acquired from the three different 3D reconstruction techniques, we constructed 1000 morphological datasets in which data for each of the nineteen species were randomly selected from one of the three techniques (PH, µCT, LS). Allometry, phylogenetic signal and correlation with bite force and frequency peak (assessed as previously described) were analysed for each dataset using standard and phylogenetic comparative approaches. The mean, standard deviation, minimum and maximum of the parameter distributions were used as statistical descriptors of the variables distributions and were compared to the original results obtained with singular-technique datasets (PH, µCT, LS).

Fruciano et al.’s (2017) approach was used to assess the error due to phylogenetic uncertainty in the evolutionary analyses. The 1000 posterior trees represented the phylogenetic uncertainty in these analyses. Three common evolutionary analyses were performed: assessment of phylogenetic signal, relation between peak frequency and morphological data, relation between bite force and morphological data. For each technique-tree combination we performed the three analyses for both size and shape obtaining a distribution of 1000 estimates for each analysis. ANOVAs were performed on each distribution to assess the variance explained by the both phylogenetic uncertainty and reconstruction technique.

Results

The nineteen models were reconstructed in 3D with the three different technique and the photogrammetric 3D model of Rhinolophus ferrumequinum (MNHN-ZM-MO-1977-58) can be downloaded as an example from Morphosource (model ID = M30222; https://www.morphosource.org).

Mesh Distances

Visual examination of the meshes revealed strong general similarity between the three data sets except in certain specific areas (Fig. S1). There were small distances between the surfaces of the models as shown for Rhinolophus ferrumequinum (Fig. 1b). The average distance between PH and LS models was 0.041 mm, in agreement with that found by Evin et al. (2016) for 5 wolf skulls (0.088 mm) (Table 1). The average distance between the PH and µCT models was 0.054 mm. Finally, the µCT and LS models were extremely similar with an average distance of 0.015 mm (Table 1).

Table 1 Average distances (mm) between the surface of the models

Shape Visualization

The morphospace of the 111 specimens (i.e., 57 models plus 54 replicates) displays the shape variability in the sample (Fig. 2). The first principal component (PC1) explains 40.26% of the total variance while PC2 explains 20.26%. PC1 shows shape variation mainly related to the length of the supra-occipital bone, while PC2 represents variation in palate length (warped skulls in Fig. 2). Samples clearly cluster according to the species/individuals and not to the technique employed. Replicates were also tightly clustered, except for M. capaccinii (which had some cartilage tissue still attached to the bone making landmark identification difficult). Replicate clusters indicated no evidence of explicit random or systematic (i.e., bias) errors: none of the technique showed greater variability relative to the others nor was there evidence of differences in mean positioning due to replicate/technique.

Fig. 2
figure 2

PCA of 57 models (19 specimens × 3 techniques) and 54 replicas (9 specimens × 2 replicas × 3 techniques). Each skull was reconstructed with three different techniques (filled circle LS, filled square PH and filled triangle µCT). For nine specimens we recorded the landmarks three times (1,2,3 inside the symbols): C.pers, Drotu, G.sori, M.emar, M.capa, N.noct, R.hipp, R.ferr and T.teni. The four skull images on the two axes represent the extreme shapes of the morphospace for PC1 and PC2 (species used as reference model for the warping: Plecotus austriacus). Carolia perspicillata (C.pers), Desmodus rotundus (D.rotu), Eptesicus serotinus (E.sero), Glossophaga soricina (G.sori), Hypsugo savii (H.savi), Miniopterus schreibersi (M.schr), Myotis daubentonii (M.daub), Myotis dasycname (M.dasy), Myotis emarginatus (M.emar), Myotis capaccinii (M.capa), Myotis nigricans (M.nigr), Noctilio albiventris (N.albi), Nyctalus noctula (N.noct), Pipistrellus nathusii (P.nath), Pipistrellus pipistrellus (P.pipi), Plecotus austriacus (P.aust), Rhinolophus hipposideros (R.hipp), Rhinolophus ferrumequinum (R.ferr) and Tadarida teniotis (T.teni)

Error in Geometric Morphometrics

Correlations between centroid size vectors obtained from the different models provided coefficients greater than 0.99 for all combinations (PH-LS: R = 0.997, p < 0.001; µCT-PH: R = 0.996, p < 0.001; LS-µCT: R = 0.998, p < 0.001). Similarly, high associations were obtained from Mantel matrix correlations on the Procrustes distances between individual specimens across the techniques (PH-LS: R = 0.988, p < 0.001; µCT-PH: R = 0.988, p < 0.001; LS-µCT: R = 0.992, p < 0.001). Furthermore, the ANOVA test on size showed that 99.67% (p = 0.001) of the variance can be explained by biological differences between specimens, with only 0.14% attributable to the technique (p = 0.001) (Table 2). In terms of shape, 94.52% (p < 0.001) of the shape variance was explained by the specimen differences while only the 0.26% was represented by the different techniques (p = 0.001).

Table 2 (A) ANOVA on size and (B) Procrustes ANOVA on shape for 57 models (19 specimens × 3 techniques) and 54 replicas (9 specimens × 2 replicas × 3 techniques)

The landmarking error represented a small portion of the variance in both size (between-replicate variance: 0.02%, p = 0.999) and shape (between-replicate variance: 2.03%, p = 0.001). The repeatability was 0.99 for size and 0.97 for shape (Table 3a-b).

Table 3 Landmarking error and repeatability for replicas only. (A) ANOVA on size and (B) Procrustes ANOVA on shape for 81 models (9 specimens × 3 replicas × 3 techniques)

The mean Procrustes variance was not statistically different between techniques (p = 0.979) suggesting difficulty in landmark identification is similar between the techniques (Fig. 3a). Correlations between Procrustes variances (for each technique) and centroid size showed no significant associations (PH: R = 0.16, p = 0.683; CT: R = 0.48, p = 0.187; LS: R = 0.052, p = 0.894).

Fig. 3
figure 3

Procrustes variation computed for replicates/technique for the interspecific dataset

Error in Evolutionary Analysis

Comparisons between the three different scanning techniques for all 19 species identified consistent (although non-significant) evolutionary allometry patterns (Table 4). These were validated by PGLS analyses (Table 4). When testing for phylogenetic signal across the three datasets using the consensus tree, we obtained Kmultiv values that were highly significant and close to one (Table 4). The signal was less strong for size but equally significant regardless of the technique (Table 4). The results for the association between morphological data and ecological data (peak frequency and bite force) are reported in Table 4 for each technique, with and without phylogenetic correction, and show a high degree of concordance between techniques.

Table 4 Results of Kmultiv phylogenetic signal and R2 for allometry and correlation with ecological variables for the interspecific dataset

Comparisons of parameter values obtained with the single-techniques (PH, µCT, LS) against our 1000 mixed datasets revealed similar means and standard deviations. Nevertheless, in most of the cases standard deviations were slightly greater for multi-technique datasets (Table S3).

When testing for variation due to the phylogenetic uncertainty and technique error the distributions of parameters estimates displayed similar shapes between techniques but in some cases the technique caused a shift in their location (i.e., allometry, phylogenetic signal for shape and correlation between shape and bite force) (Fig. S2). In particular, means of R2 distributions for allometry differed between each technique (PH = 0.098; µCT = 0.105; LS = 0.099) but standard deviations did not (PH = µCT = LS = 0.004). A similar pattern is observed for the Kmultiv of shape (mean: PH = 0.916, µCT = 0.936, LS = 0.969; standard deviation: PH = 0.024, µCT = 0.026 LS = 0.025) and R2 for correlations between shape and bite force (mean: PH = 0.100, µCT = 0.105, LS = 0.107; standard deviation PH = µCT = LS = 0.004). Nevertheless, the p-values for Kmultiv of shape were smaller than 0.001 for all combinations of trees/techniques. P-values for allometry and shape correlation with bite force equally resulted in coherent non-significant patterns (p > 0.15 in all cases). The ANOVA on the allometry estimates revealed that 36.35% (p < 0.001) of the variance in allometry was explained by the technique employed, while 62.54% (p < 0.001) by the phylogenetic uncertainty. The ANOVA on the phylogenetic signal for size demonstrated that the majority of the variance was due to the phylogenetic uncertainty in the dataset (Table 5). The phylogenetic signal variance for shape is still mainly represented by the phylogenetic uncertainty (55.75%, p < 0.001) but a significant portion of the variance is due to the different technique employed (43.75%, p < 0.001). When the correlation between morphological data and frequency peak is computed the variance due to the technique error is significant but small (size: 1.15%, p < 0.001; shape: 2.04%, p < 0.001). Similar results were obtained for the correlation between bite force and size (0.35%, p < 0.001). Nevertheless, 37.00% of the correlation between bite force and shape was explained by the technique (p < 0.001) and 61.65% was explained by phylogenetic uncertainty (p < 0.001) (Table 5).

Table 5 ANOVAs on parameter estimates for allometry (R2), phylogenetic signal (Kmultiv) for size (PS Size) and shape (PS shape), and correlation (R2) with ecological variables (BF bite force, FP frequency peak) computed by technique and using 1000 trees from the posterior distribution

Discussion

Performance of Photogrammetry Technique

Analyses of mesh distances, shape visualisation (i.e., PCA graphs) and geometric morphometric error in the interspecific dataset demonstrated that PH, µCT and LS provide comparable raw material (i.e., centroid size and Procrustes coordinates) for GMM analyses. This was supported by high correlation coefficients for centroid size and Procrustes coordinates between the techniques and low proportion of variance explained by the techniques for both size and shape. This was in accordance with previous studies of much larger skulls, for example humans (Katz and Friess 2014) and wolves (Evin et al. 2016).

High intraclass correlation coefficients indicated high repeatability and reflected low random measurement error, which suggested that landmarking error was not important for our interspecific dataset. These coefficients (0.97–0.99) were similar to values previously obtained for human skulls (0.99; Badawi-Fayad and Cabanis 2007), kangaroo-size skulls (0.95; Fruciano et al. 2017) and small rodent skulls (0.75; Marcy et al. 2018). No technique-related differences in landmarking difficulties were found, based on Procrustes variance, which contrasts Marcy et al. (2018) finding of systematically better µCT relative to laser scans. This difference might be due to their use of a fast data collection scheme (10 min/sample) without employing additional measures to ensure quality of the models. Alternatively it could be linked to intrinsic differences in the LS and PH devices that were employed.

Experience plays an important role in identification and placement of landmarks (Osis et al. 2015; Sholts et al. 2011) and different approaches can induce different levels of systematic error (Marcy et al. 2018). In the current study, we did not specifically test for operator bias as previous studies reported inter-operator error being similar across different techniques (Robinson and Terhune 2017).

We also showed that centroid size and Procrustes coordinates extracted from PH models are suitable for subsequent macro-evolutionary analyses such as size-shape correlations (allometry), calculation of phylogenetic signal and correlation between morphological (size and shape) and ecological (frequency peak and bite force) data: parameters estimates were similar among techniques even when accounted for phylogenetic relatedness. All methods lead to the same biological interpretation, further confirming that PH provides suitable raw data for evolutionary analysis.

PH has several advantages in addition to being affordable and easy to use. It is particularly suitable when access to more expensive equipment is limited, where specimens cannot easily be transported, and/or where data collection has to take place in a remote location. Nevertheless, a significant down-side was the lack of detail achieved for teeth reconstruction and difficulties in reproducing thin structures (such as the zygomatic arch). Future studies may explore the use of focused stacking techniques in order to achieve a greater level of detail (Brecko et al. 2014; Nguyen et al. 2014; Santella and Milner 2017).

Mixed Data from Different Reconstruction Techniques

Our examination of multi-technique datasets revealed increases in standard deviations for allometry, phylogenetic signal and correlation with ecological variables compared with single-technique datasets, although this had no impact on the biological interpretation of the results. This suggests that multi-technique datasets could be potentially used (with caution and following exploratory studies), at least for interspecific analysis as long as the use of different techniques was relatively balanced across different groups (such as species, populations or sex). Mixing data from different devices is not recommended when researchers suspect a relatively small portion of biological variance in the sample (e.g. in populational studies).

When the same analyses were performed using the set of posterior trees, the interaction between phylogenetic uncertainty and technique became significant. However, the amount of parameter variation was relatively small and mainly due to the phylogenetic variation rather than technique error. Also, the general biological conclusions are essentially the same for almost all analyses (i.e., degrees of allometry and phylogenetic signal for size and variance explained by ecological variables). For instance, under the different techniques, bite force predicts between 8.85 and 11.94% of the shape variance supporting the inference that bite force moderately influences the shape evolution in bats. Fruciano et al. (2017) has pointed out that the phylogenetic signal in shape (as reflected by K statistics) is strongly influenced by both phylogenetic uncertainty and technique. In our sample, Kmultiv varies from 0.85 to 1.05 between techniques which would lead to different evolutionary conclusions (Adams 2014; Blomberg et al. 2003), but the significance of K is unaffected. Revell et al. (2008) noted that K is indicative of statistical dependence between traits and phylogenetic relatedness but no inference on evolutionary rate and mode of evolution should be drawn from its value alone. Therefore, while we suggest that researchers should be cautious about inferring biological meaning from the magnitude of K for shape on mixed technique datasets, its significance can provide a reliable indicator of the presence of phylogenetic signal.

In conclusion, combining data acquired from model reconstructed with different techniques inevitably introduces an additional source of error. Its impact needs to be assessed according to whether it has an effect on the biological conclusions. Phylogenetic uncertainty can interact with other source of error (e.g. technique employed) suggesting preliminary test on phylogenetic comparative analysis are essential to identify possible not negligible source of error.