Introduction

Taxonomic sufficiency is the practice of defining appropriate levels of taxonomic resolution for biological assemblages investigated in biomonitoring and biodiversity studies (Ellis 1985; Ferraro and Cole 1992). Taxonomic sufficiency has been debated extensively for invertebrate biomonitoring studies in aquatic systems (Bowman and Bailey 1997; Bailey et al. 2001; Lenant and Resh 2001; Heino and Soininen 2007; Jones 2008; Jiang et al. 2013). More recently, this issue has also been explored for terrestrial systems (Timms et al. 2013) where the ability to identify specimens to the genus or species level is not routinely inhibited by the collection of larval specimens. The concept of taxonomic sufficiency is necessary to overcome key obstacles in the processing of field samples: (1) overwhelming amounts of biological material to process, generally with limited resources dedicated to proper processing and archiving and (2) a shortage of adequately trained taxonomists for many diverse groups, especially insects and other invertebrates (Cardoso et al. 2011). The level of skill, training and experience varies among taxonomists. Initiatives such as the Society for Freshwater Science taxonomic certification program (http://www.nabstcp.com) provide some accreditation and standardization at least among aquatic invertebrate taxonomists for particular taxonomic groups; however, variability in expertise will undoubtedly persist. Taxonomic sufficiency is believed to ameliorate these issues by identifying levels of taxonomic identification that efficiently and effectively use limited resources and are still appropriate for the individual study objectives (Bailey et al. 2001).

Defining an optimal taxonomic level for each major taxon likely to occur in a sample may result in mixed taxonomic levels being used to calculate sample metrics related to biodiversity or ecological conditions (Carter and Resh 2001; Jones 2008; Jiang et al. 2013). The lowest practical taxonomic level for each taxon may depend on its constituent diversity: the number of higher resolution taxa (genera or species) belonging to a family or genus (Holzenthal et al. 2010; Monk et al. 2012). Thus, a taxon with a greater number of constituent taxa may require a more detailed taxonomic description (Holzenthal et al. 2010). Using mixed taxonomic levels in analyses may not be ideal for biodiversity studies, but establishing consistent taxonomic effort for each class, order or family can provide standardization for biomonitoring programs (e.g., Environment Canada 2012a). How to proceed with a specimen that cannot be identified to the specified level, however, needs to be resolved.

Traditional evaluations of taxonomic sufficiency do not fully address the challenge of encountering specimens that are not suitable for traditional taxonomic analysis due to developmental stage because of size (age)-dependent taxonomy. A variable proportion of each biomonitoring or biodiversity sample is comprised of specimens too small or of insufficient developmental condition to achieve the ideal level of taxonomic identification. Yet, this property of biomonitoring data is often ignored, even when quality control issues are being investigated (e.g., Haase et al. 2006; Mueller et al. 2013).

Individual studies and programs need to resolve this complication within their sample processing methodology. A common practice is to identify these smallest specimens to the lowest ‘practical’ taxonomic level (Carter and Resh 2001). This solution can unintentionally lead to size-biased samples because larger specimens at more mature developmental stages are more easily identified to a lower taxonomic level. Groups of smaller specimens remain at coarse levels of taxonomic identification, creating an inverse relationship between size and level of taxonomic identification. While analytical approaches have been proposed to resolve apparent discrepancies in the taxonomic data (e.g., Cuffney et al. 2007; Mueller et al. 2013), there is still uncertainty over the degree of occurrence within samples and the consequent ecological implications. Moreover, these analytical approaches do not address the underlying cause of the problem. Understanding the properties of the bias in these samples may obviate the need to artificially resolve these discrepancies.

By assessing two relevant morphological traits (body size and body shape) of individual specimens collected in standardized biomonitoring samples, we can explore the variation and trends that occur in the smaller-sized fraction. We address two objectives for this investigation. First, we use measured body length and level of taxonomic resolution achieved for each specimen to demonstrate a systematic size bias in the description of benthic macroinvertebrates from four orders of aquatic insects (Ephemeroptera, Plecoptera, Trichoptera, and Odonata) often targeted by biomonitoring programs. Second, we apply geometric morphometric techniques to illustrate variation in the body shape of specimens identified to each taxonomic level (family or genus level), within the same four insect orders. We predict that specimens that can only be reliably identified to the family level will be smaller than specimens identified to the genus level and have greater variation in shape because these smaller-sized specimens will represent more than one taxonomic group (species and/or genera). Thus, we demonstrate how two morphological traits provide complementary evidence supporting a systematic bias in the context of routine biomonitoring sample analysis and may also provide alternative, independent data for biomonitoring and biodiversity metrics.

Materials and methods

Collection sites

Sites were located along two tributaries in the Miramichi River basin (New Brunswick, Canada) and represent variable flow and substrate conditions. All sites were categorized as reference or near reference (least impacted) in 2010 according to criteria established by the Canadian Aquatic Biomonitoring Network (CABIN; Environment Canada 2012b). Samples were obtained from the South Branch Renous River (SBREN; 46.79287°N, 66.48058°W) and two locations on the Dungarvon River. Dungarvon mid-stream (DUNMR; 46.070777°N, 66.5686°W) was taken approximately 23.4 km upstream of Dungarvon downstream (DUNDS; 46.81393°N, 65.91795°W).

Benthic sampling

Each benthic macroinvertebrate sample was collected using the standard CABIN protocol (Environment Canada 2012b). A three-minute traveling kick-net (mesh size 400 μm) procedure was used to collect each sample. Kick-net samples provide an integrated sample across the primary microhabitats present and are adequate for characterizing the benthic macroinvertebrate assemblage in each reach. Samples were collected on November 2, 2007 into 10 % buffered Formalin and transferred to 70 % ethanol after approximately 24–48 h.

Specimen processing

We extracted our target taxa (Ephemeroptera ‘E’, Plecoptera ‘P’, Trichoptera ‘T’ and Odonata ‘O’) from each sample. The entire sample was used to ensure adequate material and to prevent any unintentional size bias as a result of subsampling. Genus level is the CABIN program standard of taxonomic effort for all aquatic insect taxa (Environment Canada 2012a) collected at reference sites. Therefore, genus-level identification was attempted for each individual specimen (Merritt et al. 2008; Leica Wild M3C, Wetzlar Germany, 10X). If the individual could not be reliably identified to genus (e.g., lacking sufficient gill development, unable to distinguish labial characters or ambiguous setae in some Ephemeroptera and Plecoptera genera), the specimen was retained at the family level. All specimens were given equal treatment, and the author’s (J.M.O.) identifications were confirmed by a second certified taxonomist to ensure quality and prevent intentional bias. Following identification and quality control, each specimen was digitally photographed using a stereomicroscope (Leica Mz 16 A, Wetzlar, Germany; Q Imaging MicroPublisher 5.0 RTV, Surrey, British Columbia, Canada attached with a Leica 10446261 0.63 × extension tube, Wetzlar, Germany).

Size

Total body length, measured as the distance from the anterior margin of the head to the posterior tip of the last abdominal segment, was our indicator of body size. Calibrated digital photographs were used to measure size using AutoMontage Pro software (Syncroscopy, Synoptics Ltd., Cambridge, UK).
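The length measurement described above can be illustrated with a minimal sketch. It assumes that each specimen has two digitized points (anterior head margin and posterior tip of the last abdominal segment) and that a calibration factor in micrometres per pixel is known for the photograph. The function name, argument names, and example values are hypothetical and are not taken from the AutoMontage Pro software.

```python
import math

def body_length_mm(head_xy, abdomen_tip_xy, microns_per_pixel):
    """Straight-line distance between two digitized points, in millimetres.

    head_xy and abdomen_tip_xy are (x, y) pixel coordinates; the
    calibration factor converts pixels to micrometres (assumed known).
    """
    dx = abdomen_tip_xy[0] - head_xy[0]
    dy = abdomen_tip_xy[1] - head_xy[1]
    length_px = math.hypot(dx, dy)
    return length_px * microns_per_pixel / 1000.0

# Hypothetical specimen: points 1,200 px apart at 5 micrometres per pixel.
example_length = body_length_mm((100, 400), (1300, 400), 5.0)
```

A straight-line distance will underestimate the length of a curved specimen, so a segmented measurement along the body axis may be preferable in such cases.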

Shape

Geometric morphometrics was used to describe larval aquatic insect shape independent of size by evaluating the configuration of a consistent set of landmark positions among broad taxonomic groups (Zelditch et al. 2004; Claude 2008). Landmark locations (Fig. 1; Table 1) were selected separately for hemimetabolous (EPO) versus holometabolous (T) taxonomic orders due to their strongly divergent morphologies; however, within these categories, the same landmarks were applied to all specimens. Several type 2 (maxima, minima or endpoints of a structure) and type 3 (extremal points of morphological structures relative to other features) landmarks (Bookstein 1991; Zelditch et al. 2004) were selected to adequately describe general patterns in a taxonomically diverse aquatic insect assemblage. We identified 15 landmarks and 1 sliding landmark (used to ‘unbend’ distorted specimens) for a total of 16 landmarks for EPO (Fig. 1a) and 11 landmarks plus a series of pseudo-landmarks for the Trichoptera (Fig. 1b). Pseudo-landmarks were used to correct for the abdomen curvature of some Trichoptera specimens (e.g., Hydropsychidae; Fig. 1b). Each body region (head, thorax, and abdomen) was digitized separately in the R programming environment (Urbanek 2011; R Development Core Team 2012; R Studio 2012) and reassembled prior to analysis to reduce the influence of non-shape variation due to photography (Adams and Rohlf 2000). A Procrustes analysis (superimposition) of the digitized landmark coordinates was performed in tpsSplin (life.bio.sunysb.edu/morph/index.html) to eliminate the effects of non-shape variation due to rotation of the specimen, translation or position of the specimen and the size or scaling of the specimen in the photograph (Zelditch et al. 2004). These standardized coordinate values were used to calculate a weight matrix composed of partial warp scores (non-uniform, non-affine shape components) and uniform, affine shape components for each pair of landmark coordinates using tpsRelw (life.bio.sunysb.edu/morph/index.html). The resulting weight matrix provides the shape variables appropriate for statistical analysis (Zelditch et al. 2004).
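The idea behind Procrustes superimposition can be illustrated with a minimal sketch for two-dimensional landmarks. This is an ordinary (pairwise) alignment of one configuration to a reference, not the generalized procedure implemented in the tps software, and it does not handle sliding landmarks, pseudo-landmarks, or reflection. All function names and coordinates are hypothetical.

```python
import math

def center_and_scale(landmarks):
    """Move the centroid to the origin and rescale to unit centroid size."""
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    # Represent each landmark as a complex number x + iy.
    centered = [complex(x - cx, y - cy) for x, y in landmarks]
    centroid_size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / centroid_size for z in centered]

def align_to_reference(reference, target):
    """Remove translation, scale, and rotation of target relative to reference."""
    ref = center_and_scale(reference)
    tgt = center_and_scale(target)
    # Least-squares rotation: unit complex number in the direction of
    # sum(ref_i * conj(tgt_i)).
    s = sum(r * t.conjugate() for r, t in zip(ref, tgt))
    rotation = s / abs(s) if abs(s) > 0 else 1
    aligned = [rotation * t for t in tgt]
    return ref, aligned

def procrustes_distance(reference, target):
    """Root summed squared distance between aligned landmark pairs."""
    ref, aligned = align_to_reference(reference, target)
    return math.sqrt(sum(abs(r - a) ** 2 for r, a in zip(ref, aligned)))

# Hypothetical landmark sets: 'moved' is 'triangle' rotated, enlarged, and shifted.
triangle = [(0, 0), (4, 0), (0, 3)]
moved = [(10, 5), (10, 13), (4, 5)]
other_shape = [(0, 0), (4, 0), (0, 6)]
```

In this sketch, `procrustes_distance(triangle, moved)` is effectively zero because the two configurations differ only in position, size, and orientation, whereas `procrustes_distance(triangle, other_shape)` is positive because the shapes differ.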

Fig. 1

a Dorsal view of a stonefly larva (Perlidae) showing position of 16 landmarks used to define the shape of Ephemeroptera, Plecoptera and Odonata specimens. b Lateral view of a caddisfly larva (Hydropsychidae) showing position of 11 landmarks and relative position of pseudo-landmarks used to define the shape of Trichoptera specimens. Position details included in Table 1

Table 1 Morphological landmarks used to characterize larval aquatic insect shape for four orders (Ephemeroptera, Plecoptera, Odonata, and Trichoptera)

Data analysis

Data analysis was performed using the R programming environment (R Development Core Team 2012; R Studio 2012). A two-tailed t test on logarithmically transformed body-length data was used for size comparisons of genus versus family-level specimens. Eight abundant families, each composed of several (2–4) genera, were tested. A MANOVA (Pillai’s trace) was performed on the weight matrix for shape comparison between the specimens identified at the genus level or family level for the same eight abundant families.
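The size comparison can be sketched as follows. The sketch assumes a base-10 logarithm and the unequal-variance (Welch) form of the t test, which is the default of `t.test` in R; neither choice is stated in the text above. It returns only the test statistic and the approximate degrees of freedom; a p value would be obtained from the t distribution (for example with `scipy.stats.ttest_ind` and `equal_var=False`). The function name and the body lengths are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t_on_log10(lengths_a, lengths_b):
    """Welch t statistic and degrees of freedom for log10-transformed lengths."""
    log_a = [math.log10(v) for v in lengths_a]
    log_b = [math.log10(v) for v in lengths_b]
    na, nb = len(log_a), len(log_b)
    # Squared standard error of each group mean (sample variance / n).
    va = variance(log_a) / na
    vb = variance(log_b) / nb
    t = (mean(log_a) - mean(log_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical body lengths (mm) for one family.
family_only_mm = [1.0, 1.2, 1.5, 2.0]
genus_level_mm = [4.0, 5.0, 6.5, 8.0]
t_stat, deg_free = welch_t_on_log10(family_only_mm, genus_level_mm)
```

A negative `t_stat` in this arrangement indicates that the family-only specimens are smaller on average than the genus-level specimens.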

Results

For all analyses, identification was dependent on the properties of each individual larval insect specimen; therefore, specimens within the same family were either identified to the genus level or retained at the family level (i.e., the level of taxonomic resolution was not predetermined based on specimen size). Sample processing resulted in 4,714 individual specimens among 64 EPTO taxa (families and genera; Online Resource 1). We identified 2,305 specimens to the genus level with the remaining 51 % (2,409 specimens) identified to only the family level (Online Resource 1). In only a few cases, limited to several North American mono-generic insect families, could all the individuals of a family be identified to the genus level (e.g., Isonychiidae, Isonychia; Helicopsychidae, Helicopsyche; and Rhyacophilidae, Rhyacophila); for all other families, a combination of genus and family-level taxonomic resolution resulted.

Systematic size bias in specimens identified to family versus genus level

The specimens obtained in our biomonitoring samples exhibited a high degree of variability in body size at both genus and family levels (Online Resource 1). Almost a quarter (23 %) of taxa observed possessed a range of specimen size values exceeding one or more orders of magnitude (Online Resource 1). Our objective was to examine if this variation was skewed between family and genus-level specimens.

A general trend was observed for many families possessing multiple genera: the smallest specimens tended to be classified at the family level (Fig. 2). We tested this pattern using two-tailed t tests to compare the mean size of specimens identified only to the family level to specimens identified at the genus level (pooled across two to four genera for each family). Sufficient data were available to test for potential size bias in eight families representing each of the four orders surveyed. Body size data were logarithmically transformed prior to analysis. The largest specimens were identified to a finer taxonomic resolution (genus level) for six of the eight families compared (Fig. 3). The size discrepancy between family and genus-level median size (logarithmically transformed mean) was most pronounced in the Trichopteran family, Leptoceridae (Fig. 3), which also possessed the smallest individuals observed in our study (Online Resource 1). Two families, Brachycentridae (Trichoptera) and Chloroperlidae (Plecoptera), possessed genus and family-level specimens of similar size (Fig. 3).

Fig. 2

Boxplot depicting body size variation among 52 aquatic insect EPTO family units arranged according to order and family. Family total is included to illustrate the variation for all specimens belonging to a family. Boxes display the median with upper and lower quartiles of the size distribution for each taxon and are arranged on a log 10 scale. The body sizes of specimens among our three samples spanned two orders of magnitude, and this range can separate the smallest and largest specimens within a single family

Fig. 3

Size comparison between specimens identified to the family or genus level for eight abundant aquatic insect EPTO families each possessing several genera. Smaller specimens were retained at the family level for the majority of taxa (6/8). Results for the corresponding two-tailed t tests are reported with an asterisk (*) indicating p value <0.01

Variation in body shape of specimens identified to family compared with genus level

We used a geometric morphometric approach to examine variation in body shape among family and genus-level specimens. Shape variables for each taxonomic unit (family or genus) were compared within families using MANOVA. Body shapes of genus-level specimens were unique and differed significantly from both the family-only group and the overall shape for the family (Table 2). Variation in Trichoptera families (Brachycentridae, Hydropsychidae, and Leptoceridae) occurred primarily in the length and angle of the head with additional variation in the length of thoracic segments (Fig. 4). The head and thorax region were variable among genera in the other families examined, especially the Ephemeropteran families Ephemerellidae and Heptageniidae (Fig. 4). Abdomen shape, however, particularly the landmark indicating the widest point of the abdomen, also showed higher levels of variation between genera belonging to the same family (e.g., Gomphidae and Perlidae; Fig. 4). A qualitative comparison of the variability in the location of each landmark based on the variance ellipses suggests that specimens identified only to the family level possessed higher within-group variation than either the individual genera or all specimens combined (Fig. 4).

Table 2 Geometric morphometric shape compared between EPTO specimens identified to genus or family level from 3 biomonitoring samples collected in the Miramichi River Basin on 2 November 2007

Fig. 4

Diagrams of morphometric landmark outlines to illustrate shape and variability of eight abundant aquatic insect EPTO families each composed of several genera. Each diagram shows half of the body outlined by the location of the landmarks. Numbers indicate the location of landmark positions illustrated in Fig. 1. Panels are arranged in three columns: all specimens, family-only specimens, and genus-level specimens. Eight rows correspond to the eight families in the analysis. The ellipses illustrate variation in landmark position for all individuals in each taxon as noted in each panel. Body shape outlines differ between genera, between genera and family-only specimens, and from the outline for all specimens combined. Variation in landmark position is greater for family-only specimens than for all specimens combined and could represent a combination of multiple species or genera in this small size class

Discussion

Taxonomic sufficiency is just one aspect of biomonitoring and biodiversity studies that has been evaluated, compared and debated in the literature. Largely absent from this conversation has been any discussion of how to treat the substantial number of specimens too small to properly identify that are often collected in bulk samples for biodiversity assessment and biomonitoring programs. Our study examined two morphological traits, body size and body shape, of a representative group of organisms in our aquatic biomonitoring samples in order to verify or refute systematic bias due to size-dependent taxonomy. The level of taxonomic resolution achieved given a specimen’s size was used to describe the degree of size bias for the biomonitoring sample and for individual taxonomic groups. The variability in shape information provided a means of evaluating the potential taxonomic composition of the smaller specimens that could not be reliably identified.

According to the standard biomonitoring methodology employed, our biological samples were collected in autumn, which maximizes the likelihood of capturing mature specimens of North Temperate insect taxa (Environment Canada 2012b). Despite this, less than half of the specimens possessed distinguishable characters permitting genus-level identification, creating mixtures of genus and family-level identifications for most of the aquatic insect families in our biomonitoring samples. While the phenology of certain taxa may change over the year, the issue of size-dependent taxonomy is independent of season as differential growth, overlapping cohorts, differences in voltinism, and emergence patterns will contribute smaller forms during every season (Huryn and Wallace 2000). Thus, optimizing the timing of sampling clearly cannot eliminate the collection of immature specimens, which frustrate genus-level identification. Evaluation of the seasonality component for biomonitoring protocol development is confounded by the size-bias problem, since exploring seasonal patterns in taxa (either at the population or the community scale) is itself severely constrained by size-dependent taxonomy.

Measurement of individual specimens reveals a wide range of body sizes at the assemblage scale, but also a high degree of variability in size within individual taxonomic groups. High natural variability within a population may be due to both biotic and abiotic factors. The net effect of physical characteristics, such as flow regime, temperature, and type of substrate for each stream site used in our study, may promote or retard growth and development of individuals or populations (Ward 1992). These physical constraints coupled with biological properties of the organisms, including growth rate and dispersal (Huryn and Wallace 2000), will generate mixtures of sizes, even of the same taxon, collected for bulk biomonitoring samples using various devices and methodologies. High variability in traits like size may be ecologically informative, but poses a significant obstacle to taxonomists.

We did observe a size bias for six out of the eight taxa with sufficient numbers to examine in detail for this study. Smaller specimens were retained at the family level, while larger, presumably more developed specimens could be identified at the genus level. Size bias may be more problematic for particular taxa. Some genera may have distinctive features that allow accurate diagnosis even when larvae are smaller and less mature. While size of a specimen may aid in identification, the quality of the specimen (e.g., damaged or missing features) may ultimately determine whether a specimen can be identified to the desired level (Carter and Resh 2001). Our results do suggest that a size bias is present, although this bias may not be observed to the same degree in all taxa. Therefore, the implications and risks associated with this bias should be investigated further.

By applying the geometric morphometric approach, we observed significant differences in shape among genera within seven of the eight families in our detailed analysis. Variation was detected in each body region (head, thorax and abdomen) with deflections in particular landmarks or entire regions being useful for recognizing some taxa. Dominant shape characteristics of each family were visible when the shape was averaged among all specimens in the family, including those identified to genus. Each genus shares some of the same features with the family average, but genera can still be distinguished by the unique locations of specific landmarks.

As predicted, we detected significant variation in body shape for specimens retained at the family level. Higher variability in shape information for this smaller-sized fraction indicates the presence of multiple species or genera within this portion of the biomonitoring sample, which may or may not correspond to the taxonomic records for the specimens of larger size. Thus, geometric morphometric techniques were useful as a coarse indication of the taxonomic variability within the smaller-sized fraction and falsify the assumption that the smallest individuals in the sample are all taxonomically equivalent.

Applications of geometric morphometrics as a proxy for genus determination for smaller specimens may be possible as recognition of these shapes could be useful in taxonomic identification. In fact, automated photographic identification methods are already being developed (e.g., Lytle et al. 2010). Non-destructive methods like geometric morphometrics and automated image-based programs may support the quickly advancing field of DNA barcoding used for specimen identification (Hajibabaei et al. 2011). As these technologies develop and become more practical and less expensive, the combination of approaches may ameliorate some of the uncertainty in metrics calculated and conclusions drawn from size-biased biomonitoring samples.

Grouping the smallest specimens in a sample at the family or even order level can have unpredictable consequences for the calculation of common metrics and parameters used in biodiversity or biomonitoring assessments. We describe how the characteristics of the size-biased portion of a biomonitoring sample may alter several common community metrics, including richness, evenness, α diversity, β diversity, trait states, and functional diversity.

Richness, evenness, α-diversity

Richness, evenness and α diversity are fundamental metrics calculated by most biomonitoring programs and are likely to be influenced in similar ways by the omission of information on the small size fraction of the biomonitoring sample. Richness is a direct measure of the number of different taxa present in a sample (Magurran 2004; Magurran and McGill 2011) or the numbers of higher taxonomic levels present within an order or family (e.g., EPT richness). Evenness is the parameter that evaluates how the specimens are distributed among the taxa present in the sample (Magurran 2004; Magurran and McGill 2011). Diversity is an amalgamated assessment of both richness and evenness measures and can be calculated according to several different formulae (Magurran 2004; Magurran and McGill 2011). Each of these metrics may over- or under-estimate the properties of the sample depending on the characteristics of the size-biased fraction. The size-biased fraction may contain only smaller individuals of the same taxonomic groups (genera) that have already been reported in the sample and in a similar proportion, resulting in no net change in the basic diversity measure. However, the smaller-sized fraction may contain taxa that have not already been identified in the sample and, given that invertebrate populations can vary widely both in numbers and distribution (Gaston and Lawton 1988), sufficiently high numbers of additional specimens may skew the abundance of both reported and unreported taxa. In this case, richness, evenness and subsequently α-diversity would be inaccurate underestimates of the biomonitoring sample diversity measures.
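The two scenarios described above can be illustrated with a minimal sketch. It assumes the Shannon index as the diversity measure and Pielou's index as the evenness measure, which are only two of the several formulae available. All taxon names and counts are hypothetical and are not data from this study.

```python
import math

def richness(counts):
    """Number of taxa with at least one specimen."""
    return sum(1 for n in counts.values() if n > 0)

def shannon(counts):
    """Shannon diversity H' using natural logarithms."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)

def pielou(counts):
    """Pielou evenness J' = H' / ln(richness); zero when only one taxon."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

# Genus-level records only (60 small specimens left at family level are ignored).
reported = {"Genus A": 30, "Genus B": 10}
# Scenario 1: the small specimens belong to the same genera in the same proportion.
same_genera = {"Genus A": 75, "Genus B": 25}
# Scenario 2: the small specimens include a genus not otherwise recorded.
hidden_genus = {"Genus A": 50, "Genus B": 20, "Genus C": 30}
```

Under scenario 1 all three metrics are unchanged relative to the reported genus-level records, whereas under scenario 2 the reported records underestimate richness and Shannon diversity.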

β-Diversity

A primary goal of biomonitoring programs is to determine the quality or status of a site by comparing taxonomic composition between sites using some measure of β-diversity: the turnover or change in the community composition between sites (Jurasinski et al. 2009). Several definitions and calculations apply for β-diversity determination (e.g., Anderson et al. 2006; Jost et al. 2010; Tuomisto 2010a, b); however, each of these approaches could be influenced by the description of the composition of the size-biased portion of standard biomonitoring samples. Additional taxon records may result in higher or lower similarity among sites depending on the biogeographical setting and taxonomic breadth of the sampling. The omission of this information could increase or decrease site similarity depending on taxonomic identity and proportion of each taxon in the size-biased fraction. Inaccurate calculation of β-diversity parameters may obscure early signals of degradation or indicate a change that has not occurred potentially leading to misappropriated time and resources for conservation or restoration practices.
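The sensitivity of between-site comparisons to the unresolved fraction can be illustrated with a minimal sketch. It assumes the Bray-Curtis dissimilarity as the compositional measure, which is one common choice among the many β-diversity approaches cited above. All taxon names and counts are hypothetical.

```python
def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity between two abundance dictionaries."""
    taxa = set(site_a) | set(site_b)
    shared = sum(min(site_a.get(t, 0), site_b.get(t, 0)) for t in taxa)
    total = sum(site_a.values()) + sum(site_b.values())
    return 1.0 - 2.0 * shared / total

# Genus-level records only: the two sites appear identical.
site1_reported = {"Genus A": 20, "Genus B": 10}
site2_reported = {"Genus A": 20, "Genus B": 10}

# Hypothetical composition once the small specimens are resolved.
site1_resolved = {"Genus A": 50, "Genus B": 30}
site2_resolved = {"Genus A": 25, "Genus B": 10, "Genus C": 45}

reported_dissimilarity = bray_curtis(site1_reported, site2_reported)
resolved_dissimilarity = bray_curtis(site1_resolved, site2_resolved)
```

In this constructed case the reported records give a dissimilarity of zero, while the resolved records show that the sites differ substantially; the opposite outcome is equally possible with other compositions.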

Functional diversity and trait states

Descriptions of functional diversity incorporate elements of both taxonomic and trait composition (Petchey et al. 2009) and may be sensitive to changes in either component. Traits are measurable, heritable characteristics of individuals that are linked to organism fitness (McGill et al. 2006). Organisms recorded in biomonitoring samples can be categorized into various trait states (e.g., Poff et al. 2006; Horrigan and Baird 2008; Culp et al. 2011). Trait states represent broad categories and provide a coarse description of community structure, but one that is more easily related to ecological function (Culp et al. 2011). Trait metrics, such as trait richness and trait diversity will generally have a lower magnitude than the same metrics calculated for the community based on taxonomic data, but provide a useful means of community trait comparison (e.g., Finn and Poff 2005; Bonada et al. 2007; Dolédec et al. 2011). Trait metrics may differ in sensitivity to the trait state records or trait state abundance derived from the biomonitoring sample. Often, genera within a family share characteristics that are phylogenetically linked (Poff et al. 2006); therefore, simply adding additional genus records may not increase the trait states present in the sample. Genera may not always share the same trait characteristics, however, so failure to observe these hidden trait states may lead to errors in the descriptions of trait patterns at the community level and lead to an under-estimate for functional diversity since each additional taxonomic or trait unit could potentially contribute to a higher value of functional diversity.
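The contrast between shared and divergent trait states can be illustrated with a minimal sketch that counts the distinct trait states represented by a list of genera. The trait table, genus names, and the choice of feeding group as the trait are hypothetical.

```python
def trait_richness(genera, trait_of):
    """Number of distinct trait states among the listed genera."""
    return len({trait_of[g] for g in genera})

# Hypothetical feeding-group assignments for three genera in one family.
feeding_group = {
    "Genus A": "collector-gatherer",
    "Genus B": "collector-gatherer",
    "Genus C": "predator",
}

reported = ["Genus A"]
with_shared_trait = ["Genus A", "Genus B"]     # new genus, same trait state
with_hidden_trait = ["Genus A", "Genus C"]     # new genus, new trait state

richness_reported = trait_richness(reported, feeding_group)
richness_shared = trait_richness(with_shared_trait, feeding_group)
richness_hidden = trait_richness(with_hidden_trait, feeding_group)
```

Adding a genus that shares the family's trait state leaves trait richness unchanged, whereas a genus with a different state raises it, which is the case in which the unresolved fraction would lead to an under-estimate.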

Thus, the individual effects of reporting a size-biased sample can alter the overall assessment of diversity, resulting in either a low risk of over-estimates or a high risk of under-estimates. The basic assumption regarding the smallest specimens in a sample is that they are a direct, proportional match for the rest of the specimens. Our results demonstrate that this assumption can easily be falsified, and there is little ecological justification for it based on studies of stream-insect larval community dynamics (e.g., Elliott 1967). Reasonable and reliable natural resource management decisions depend on accurate diversity data.

Conclusions

Valuable information may be gained from an examination of the smallest sized fraction of a sample depending on selectivity of the collection method and the study or monitoring objectives. Accurate calculation of biomonitoring parameters may depend on the composition of the size-biased fraction and can have significant implications for comparing biodiversity, designing effective biomonitoring programs and managing ecosystems. The development of standards or thresholds of size for the attainment of higher or desired level of taxonomic resolution for specimens within different taxonomic groups could ensure consistent taxonomic effort and enable accurate data comparisons by taxonomists and biomonitoring programs. This type of information is available in particular taxonomic treatments (e.g., Trichoptera, Wiggins 1996), but is largely absent for many taxonomic groups. Once developed, these general or taxon-specific standards could be evaluated by expert taxonomists for different taxonomic groups and tested by practitioners and other professionals.

Trait data, such as body size, may be a useful complement or alternative to traditional taxonomic data as information on many traits can be easily aggregated from direct observations or measurements of individuals irrespective of size, or can be retrieved from trait databases (e.g., Tachet et al. 1991; Vieira et al. 2006). Meta-analyses, detailed studies or simulations, like those used for taxonomic data (e.g., Cuffney et al. 2007), could be used to optimize biomonitoring data-processing procedures for taxonomic and trait data (e.g., Mueller et al. 2013).

Studies of taxonomic sufficiency provide scientific evidence and practical commentary on a challenging topic that has important implications for ecological assessment, natural resource use, and conservation. A similar assessment of the costs, benefits, and risks regarding the treatment and analysis of the smaller-sized fraction of our samples is required. Alternative assessment tools including trait measurement approaches (Culp et al. 2011; Mueller et al. 2013) and new technology such as DNA-based taxonomy (Baird and Hajibabaei 2012) may improve our handling and analysis of small specimens, and thus, our taxonomic standards and data requirements may need to shift accordingly as these tools become more available and affordable. Our study supports the goal of finding the most efficient means to acquire data of the highest quality for biodiversity and biomonitoring research.