Introduction

Animals’ mouthpart morphologies relate to their diet: variation in mouthpart shape and size across similarly feeding species determines which species exploits which part of a diet range if potential food is heterogeneous and mouthpart morphology associates with differences in exploitation efficiency. Morphological variance within foraging guilds may result in specialization and niche segregation (Freed et al. 1987; Conant 1988; Pratt 2005). Similarly, within-population variation results in differences in food exploitation and may also induce character displacement, and ultimately, speciation (Grant and Grant 2003, 2014). Evidence for these effects was found in a wide range of taxa (fishes: Sampaio et al. 2013; amphibians: Amanat Behbahani et al. 2014; lizards: Brecko et al. 2008; birds: Herrel et al. 2005). In insects, the mouthparts’ structural adaptation to diet is striking, e.g., when comparing blood- vs. nectar-feeding flies (Karolyi et al. 2014), or fruit-piercing moths vs. nectar-feeding Lepidoptera (Srivastava and Bogawat 1969; Ramkumar et al. 2010).

In insect pollinators, mouthpart length in conjunction with corolla length variation may impact the choice between flowers and influence feeding efficiency, resource partitioning, and pollination. Hence, to understand interactions between plants and their pollinators, investigating morphological variation in nectarivorous insect mouthparts is essential (Inouye 1980; Harder 1985; Johnson 1986; Krenn et al. 2005; Borrell and Krenn 2006; Pauw et al. 2009; Stang et al. 2009; Haverkamp et al. 2016; Szigeti et al. 2020). Kearns and Inouye (1993) suggested that the most interesting measurable trait among morphological characters in flower-visiting insects is mouthpart length. On the one hand, nectarivores with short mouthparts are excluded from deep flowers due to size incompatibility. On the other hand, species with long mouthparts may be excluded from shallow flowers, due to high nectar viscosity, since viscous liquids require more strength to imbibe through longer tubes (Johnson 1986; Kim et al. 2011; Haverkamp et al. 2016). Nevertheless, a generally accepted methodology for measuring mouthpart length in pollinators is not available, especially for live specimens, and recommendations for reliable measurements are still scarce (e.g., Harder 1982 for bees; similarly, widely accepted methods for investigating terrestrial arthropod morphology were missing (Moretti et al. 2017)). Nonetheless, several papers include detailed protocols that may be used as a sound basis of a general methodology (see, e.g., Krenn et al. 2001; Bauder et al. 2014; Cariveau et al. 2016; Düster et al. 2018). According to Kearns and Inouye (1993), proboscis length measurement seems to be relatively easy in insect pollinators. In fact, a variety of procedures are available. These require different amounts of research investment and likely yield different results.

Documenting and measuring anatomical traits, such as size and shape, have been important parts of natural history since its earliest students (see, e.g., Aristotle 2004; Richardson et al. 1831). Behavioural biologists, ecologists and taxonomists use a broad spectrum of morphological methods and investigate organisms from different perspectives; thus, morphometrics is not a coherent discipline (Daly 1985; Wipfler et al. 2016). The number of studies quantitatively investigating anatomical shapes is still increasing and morphometric methods will probably remain important techniques in the near future (Adams and Otárola-Castillo 2013; Wipfler et al. 2016). Recently, a vast range of studies applied morphometric methods, e.g., for classifying taxa (e.g., Peruzzi and Passalacqua 2008; Görföl et al. 2014; Csősz and Fisher 2015); revealing morphological changes under different circumstances (e.g., Langerhans et al. 2004; Kaliontzopoulou et al. 2010); looking for relationships between body size and reproductive success (e.g., Berger et al. 2008; Réale et al. 2009; De León et al. 2012); investigating tiny surface structures for understanding functional mechanisms (Wipfler et al. 2016; Xue et al. 2016); and developing new measurements or analysis techniques (Adams and Otárola-Castillo 2013; Bánszegi et al. 2014; Csősz and Fisher 2015; Stec et al. 2016). The range of devices and methods has been rapidly increasing (Muñoz-Muñoz and Perpiñán 2010).

Data quality is a central concern for researchers (Garamszegi et al. 2009), and is determined by the characteristics of the measurements applied, including the practice and experience of the person performing the measurements. Although comprehensive norms and rules for precise measurement in morphometrics exist (Walther and Moore 2005; Muñoz-Muñoz and Perpiñán 2010; Van Hook et al. 2012; Stec et al. 2016; Moretti et al. 2017), Ioannidis (2005) suggests that a large part of the studies lack high measurement accuracy. The quality of morphological data depends on preparation and measurement techniques (Arnqvist and Mårtensson 1998), and the following points are worth considering before taking measurements. First, different treatments and techniques during specimen preparation likely yield different results, e.g., dried specimens lose their water content, thus their flexibility, and may become contracted to some degree compared to fresh individuals (Kearns and Inouye 1993; von Schiller and Solimini 2005; Muñoz-Muñoz and Perpiñán 2010; Knapp 2012; Van Hook et al. 2012; Moretti et al. 2017). Second, the morphological landmarks should be unambiguously homologous for all measured individuals and/or species. Landmarks should be easily detectable and measurable, and similar across studies, to acquire repeatable measurements (Daly 1985; Kouchi et al. 1999; Zelditch et al. 2004; Van Hook et al. 2012). However, accurate landmark definition is challenging in many cases, e.g., when the measured structures are flexible (Yezerinac et al. 1992; Muñoz-Muñoz and Perpiñán 2010; Moretti et al. 2017). Third, the quality of the results depends on measurement resolution, accuracy and precision (i.e., device quality and adequacy) (Ulijaszek and Kerr 1999; Zelditch et al. 2004; Walther and Moore 2005; Harris and Smith 2009; Wolak et al. 2012). The potential error of the device and the skills of the measuring person limit measurement repeatability (Kearns and Inouye 1993; Zelditch et al. 2004; Blackwell et al. 2006; Van Hook et al. 2012; García-Barros 2015). Furthermore, as measurement error increases, the chance of failing to detect biologically relevant differences among the investigated groups also increases (Yezerinac et al. 1992). Fourth, the power of analyses depends on sample size (Batterham and Atkinson 2005; Van Hook et al. 2012; Cardini et al. 2015; Stec et al. 2016). Researchers’ choice of sample size depends on the aim of the study, the population variability in the target variables, the effect size of interest and the confidence level needed (Van Hook et al. 2012; Cardini et al. 2015; Moretti et al. 2017). Sample size may be constrained by limited sampling opportunities or the number of available specimens, as well as by ethical issues. If researchers sample only a small part of a population, the potential error of measurement will increase considerably, even in case of random sampling, and in field ecology, true random sampling is nearly impossible. Fifth, if scientists are working with living organisms, they should take into account ethical considerations (Farnsworth and Rosovsky 1993; Costello et al. 2016; Fischer and Larson 2019). In small natural populations, removing specimens for measurements may alter population structure, thus collecting sufficient data to estimate population distribution using dead specimens may severely harm the population or is simply not feasible (Invertebrate Link (JCCBI) 2002). These considerations not only constrain sample size but may also rule out some desired measurements and make the development of new measurement protocols necessary (Moretti et al. 2017).

Our aim was to review the available proboscis length measurement methodologies for butterflies and moths (Lepidoptera). Glossatan Lepidoptera have long proboscides, specialised mouthparts evolved as an adaptation to imbibe floral nectar as a primary food resource at the adult stage in most species (Krenn 2000, 2010; Erhardt and Mevi-Schütz 2009; Bauder et al. 2011). Nectar consumption affects lifespan and fecundity (O’Brien et al. 2004; Cahenzli and Erhardt 2013), and butterflies may choose the most rewarding among the available nectar plant species. This may ultimately result in resource partitioning and evolution (Erhardt and Mevi-Schütz 2009; Thomas and Schultz 2016). Some species consume other resources, such as pollen, fruit and plant sap, mud and excrement, whereas several species do not feed as an imago (Erhardt and Mevi-Schütz 2009). The lepidopteran proboscis is an ideal study organ to address plant-pollinator morphological compatibilities, since its length may be highly variable within a single population (Szigeti et al. 2020) and is an important predictor of resource use (Krenn 2000; Bauder et al. 2011). Here we present a methodological review on proboscis length measurements, which we hope will facilitate further mouthpart studies. Our focus is on how the authors performed measurements of lepidopteran proboscides, how accurate the measurements were, and how these were constrained by sampling effort. We also highlight challenges in measuring proboscis length, and we provide recommendations for future sampling, taking into account the five important points for appropriate measurements listed above.

Methods

Data sources

To review studies measuring proboscis length in Lepidoptera, we searched for research papers using three groups of search terms: (1) “funct*”, “length”, “morpho*”, “size”; (2) “galea”, “mouthpart”, “mouth-part”, “proboscis”, “tongue”; and (3) “butterfly”, “lepidoptera”, “moth”. We used “and” operators between groups and “or” operators between keywords within groups; “*” denotes a wildcard. We used the databases ISI Web of Science and Scopus, accessed on 2020-06-04. We found 420 papers and selected the 114 that presented their own measurements of the total length of the lepidopteran proboscis. We found 6 further papers by browsing the Internet and 15 from other articles' reference lists. We included only research articles; we did not use books, book sections, or theses. Altogether, we used 135 research articles: 126 were in English, 5 in German, 2 in French, 1 in Portuguese and 1 in Spanish (see references of the reviewed studies: Reference list S1).
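The search logic described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_query` is hypothetical, and the exact query syntax (field tags, quoting, wildcard handling) accepted by Web of Science or Scopus is not given in the text and may differ from the plain string produced here.

```python
# Illustrative sketch: OR within keyword groups, AND between groups.
# The real database query syntax is an assumption and may differ.
GROUPS = [
    ["funct*", "length", "morpho*", "size"],
    ["galea", "mouthpart", "mouth-part", "proboscis", "tongue"],
    ["butterfly", "lepidoptera", "moth"],
]


def build_query(groups):
    """Join keywords with OR inside each group and AND between groups."""
    parts = ["(" + " OR ".join(group) + ")" for group in groups]
    return " AND ".join(parts)


query = build_query(GROUPS)
print(query)
```

Keeping the keyword groups in a single data structure makes it easy to report the exact search terms alongside the results.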

Extracted variables

We categorised the reviewed studies according to (I) the aim of the proboscis length measurement, (II) the method of specimen preparation, and (III) the method of proboscis length measurement (see raw data: Table S1).

If the title and the abstract were available in English, we counted the number of the important keywords (“galea”, “mouthpart”, “mouth-part”, “proboscis”, “tongue”) in both, then we calculated the proportion of important keywords by dividing the number of keywords by the total number of words in the title and the abstract. We used this variable (hereafter proportion keywords) as an estimate for the importance of proboscis length measurement in the given studies.
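A minimal sketch of this calculation is given below. It rests on assumptions that the text does not specify: words are split on non-letter characters (hyphenated words are kept whole), matching is case-insensitive and exact, so plural forms such as "mouthparts" or "proboscides" are not counted. The function name `keyword_proportion` and the example strings are hypothetical.

```python
import re

# Keywords listed in the text; exact, case-insensitive matching is an assumption.
KEYWORDS = ("galea", "mouthpart", "mouth-part", "proboscis", "tongue")


def keyword_proportion(title, abstract):
    """Share of words in title + abstract that are important keywords."""
    text = (title + " " + abstract).lower()
    # A word is a run of letters, optionally joined by hyphens (e.g. "mouth-part").
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in KEYWORDS)
    return hits / len(words)


# Hypothetical title and abstract: 2 keyword hits among 9 words.
example = keyword_proportion("Proboscis length in a moth", "The proboscis was measured")
print(round(example, 3))
```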

We extracted the following information from the articles for proboscis preparation methodology: (1) if live or dead specimens were measured; (2) treatment of live specimens (i.e., immobilization); the methods applied on dead specimens: (3) preservation; (4) preparation on dead specimens before fixation, mostly flexibilisation; and (5) fixation.

We extracted the following details for proboscis length measurement methodology: (1) the state of proboscis when measured (coiled vs. uncoiled); (2) landmarks used for measurements; (3) magnifying devices (e.g., stereo-microscope); (4) measurement devices (e.g., ruler, digital photograph) and (5) their resolution; (6) the techniques for reading measurements (e.g., naked eye, software); (7) and if the repeatability and/or accuracy of measurements were calculated. We also recorded if the authors had referred to other studies for the methods applied.

We extracted further numerical data: (1) the number of investigated species; (2) the number of all measured individuals; and (3) the year of publication. Furthermore, we assessed the descriptive statistics on proboscis length given in the articles (e.g., mean, standard deviation, range; in some of the articles different statistics were provided for different species and we included all types of these statistics, see Table S1).

In a few publications, the authors used multiple methods for measuring proboscis length; we present them all.

Data analysis

We present descriptive statistics of the extracted variables by providing median, minimum and maximum values, showing box-plots with individual data points and bar-plots. We analysed the following relationships between the variables characterising the measurements:

To investigate how the importance of proboscis length and the scrupulousness in presenting methodology are related, we correlated proportion keywords in the title and abstract to (1) the number of missing data (hereafter NA) in the description of the methodology in preparing specimens, (2) the number of NA-s in measurement descriptions, (3) resolution estimates for the devices, and (4) the number of measured individuals. We calculated Kendall’s rank correlation coefficients.
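For readers unfamiliar with the statistic, the sketch below computes Kendall's rank correlation from scratch. The text does not state which variant was used; the tie-corrected tau-b is assumed here because counts of missing items contain many ties. The function name and the example values are hypothetical and are not data from the review.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied in both variables are ignored
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + ties_x_only) * (pairs + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical data: keyword proportion vs. number of NA-s per study.
proportion = [0.01, 0.02, 0.03, 0.04, 0.05]
n_missing = [5, 4, 4, 2, 1]
tau = kendall_tau_b(proportion, n_missing)
print(round(tau, 3))
```

The pairwise loop is quadratic in the number of studies, which is unproblematic for samples of the size reviewed here.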

We tested if shorter proboscides were more likely to be measured in dead rather than live specimens, because we hypothesised that smaller species are more difficult to measure alive, since fragility increases with decreasing size. We built a mixed effect model, where the response variable was proboscis length, the explanatory variable was measurement condition (dead or alive), and the random factor was the study (Zuur et al. 2009).
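One way to write such a model, assuming a random intercept for study (the random-effects structure is not described in more detail, and the symbols below are illustrative rather than taken from the original analysis), is:

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij}, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the proboscis length of record $i$ in study $j$, $x_{ij}$ is 1 for measurements on dead specimens and 0 for live ones, $\beta_1$ is the difference of interest, and $u_j$ captures between-study variation.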

We analysed all data in the R 3.4.4 statistical environment (R Core Team 2018). We used the “lmerTest” 3.1-0 package (Kuznetsova et al. 2017) for the mixed effect model.

Results

We reviewed 135 studies on proboscis length measurements in Lepidoptera, published from 1924 to 2020 (see Reference list S1, Fig. S1). Proboscis length was provided only as supplementary descriptive data in 6 cases. The remaining studies aimed to investigate body size relationships in 12, mouthpart morphology and functionality in 33, foraging behaviour strategies in 39, proboscis length and flower depth relationships in 41, pollination effectiveness in 57 and pollinator communities in 19 cases. Many studies (59) had several aims (see raw data: Table S1).

Authors investigated 1–117 (median: 5; Fig. 1) lepidopteran species per study. Proboscis length was measured on a median of 4 individuals per species (range: 1–537; Fig. 2). Altogether, data were published on 13,816 specimens of 977 species. Per-species proboscis length means ranged from 0.35 to 280.0 mm (median: 16 mm), standard deviations from 0.01 to 32.0 mm (median: 1.5 mm), and the coefficient of variation (CV%) from 0.08 to 122.6% (median: 6.1%). The number of measured species and the number of measured individuals differed among the aims of the studies (Figs. 1 and 2).
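The per-species statistics summarised here (mean, SD, CV%, range) can be computed as in the following sketch. The function name `describe` and the example values are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed because the text does not specify the formula; at least two measurements are required.

```python
from statistics import mean, stdev


def describe(lengths_mm):
    """Descriptive statistics for proboscis lengths (mm) of one species."""
    m = mean(lengths_mm)
    sd = stdev(lengths_mm)  # sample SD, n - 1 denominator; needs n >= 2
    return {
        "n": len(lengths_mm),
        "mean": m,
        "sd": sd,
        "cv_percent": 100 * sd / m,
        "min": min(lengths_mm),
        "max": max(lengths_mm),
    }


# Hypothetical measurements of four individuals (mm).
summary = describe([10.0, 11.0, 12.0, 13.0])
print(summary)
```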

Fig. 1

Number of the measured species according to the aim of the study. Box-plots show medians, lower and upper quartiles, notches show 95% confidence intervals for medians, whiskers include the range of distribution without outliers. Grey × symbols represent publications, and are jittered on the horizontal axis for better visibility. Vertical axes are log10-scaled

Fig. 2

Number of the measured individuals according to the aim of the study. Box-plots show medians, lower and upper quartiles, notches show 95% confidence intervals for medians, whiskers include the range of distribution without outliers. Grey × symbols represent species from publications, and are jittered on the horizontal axis for better visibility. Vertical axes are log10-scaled

Various methods were used for preparation and for measurements. Many papers failed to provide a thorough description of the procedures applied, and the reasons why the given methods had been used were often unexplained. For example, 61 (43.3%) studies provided no information on proboscis preparation, and 67 (47.5%) provided none on proboscis measurements.

Proboscis preparation

Proboscis lengths were measured in live specimens in 18 (12.8%) studies. Although these specimens probably survived being measured, this was not stated. Sixty-two (44.0%) studies reported using dead specimens, including voucher specimens, and animals captured in their natural habitats or reared and then killed for the measurements. Sixty-one studies (43.3%) did not provide information on whether the specimens survived the measurements or not.

Live individuals were immobilised for measurements by one of the following methods: cooling, anaesthetising with CO2 or ethyl acetate, stabilizing with styrofoam, fixing on a glass slide, fixing on a plastic board with clips, or covering with a meshed bag. In some cases, researchers did not use any interventions, or they did not state if live specimens were sedated. Dead specimens were either measured immediately after being killed or stored dried, frozen, or in ethanol (70% or 95%; see Table S1). Preparation of the dead specimens before fixation was mostly flexibilisation, e.g., soaking in 20–50% lactic acid, 5–10% KOH, diluted household cleaner, or distilled water, or keeping in a relaxing chamber (for further details, see Table S1). In some cases, the solutions were heated; in others, specimens were just soaked for a couple of days. In 30 (58.4%) publications the authors did not state using any kind of preparation on dead specimens. The prepared specimens were mounted on microscope slides, stubs, sample holders, or spreading boards and embedded by different methods (polyvinyl-lactophenol, DPX mountant, Entellan, Canada balsam, Euparal, graphite adhesive tape, transparent tape, etc.; see Table S1). We also found one study where samples for measurements were frozen with liquid nitrogen.

Proboscis measurements

Proboscides were uncoiled in 51 (35.2%) of the measurements. In 3 cases proboscides were not uncoiled, and in the remaining cases this information was not provided. Magnifying devices were stereo microscopes, light microscopes, scanning electron microscopes or 3D X-ray technology, with or without digital photography. Digital cameras by themselves were also used. Measurement devices were analogue and digital callipers, rulers, millimetre scales, ocular micrometers, drawings (drawing tubes and digitalising tablets) and photographs (Fig. 3). The techniques for reading measurements were the naked eye, digital interfaces, or software (Fig. 3; see details in Table S1). The image analysis software packages applied were Amira; AxioVision; Image Tool for Windows; ImageJ; Imaris; Microsoft PowerPoint; Olympus Soft Imaging Solution and Sigma Scan Scientific Measurement System. Only 6 publications referred to other publications for the applied measurement techniques.

Fig. 3

Proportion of devices or techniques used for measuring proboscis length. Columns from left to right are hierarchically organised: e.g., the bars of measurement devices in the range along the y-axis for the magnifying device stereomicroscope represent measurement devices for stereomicroscopy, etc.

Device resolutions ranged between 0.0001 and 1 mm, and most devices measured to the nearest millimetre. The finest resolutions were obtained from photographs measured with software. The finest resolution was 0.5 mm for rulers and 0.01 mm for callipers (Fig. 4). We did not find information on measurement accuracy and precision.

Fig. 4

Device resolutions used for proboscis measurements. Grey × symbols represent the articles

Relationships between the variables characterising the measurements

The larger the proportion keywords in the title and abstract, the lower the number of NA-s in the preparation description (Kendall's tau = −0.25, P < 0.001, n = 129; Fig. 5) and in the measurement description (tau = −0.23, P < 0.001, n = 129; Fig. 5). Proportion keywords was not related to device resolution (tau = −0.18, P = 0.100, n = 51; Fig. 5) or to the number of measured individuals per species (tau = 0.08, P = 0.301, n = 86).

Fig. 5

Relationship between proportion keywords in the title and abstract and (A) the number of NA-s in preparation description; (B) the number of NA-s in measurement description; and (C) the resolution of the provided results of the proboscis length measurements. Grey × symbols represent the articles

We did not find differences in proboscis length between measurements performed on dead and live specimens (P = 0.716, n_dead = 643, n_alive = 362); i.e., species with shorter proboscides were not more likely to be measured as dead than as live specimens.

Discussion

The number of published papers including lepidopteran proboscis length measurements more than doubled in the last decade, compared to the preceding three decades (Fig. S1), showing an increased interest. We found various preparation and measurement techniques for quantifying proboscis length. Research aims differed among the reviewed studies, hence the diversity in methodology; e.g., studying the sensilla on the proboscis by scanning electron microscopy requires different techniques than studying feeding behaviour in the field.

About half of the reviewed studies did not provide information on measurement methodology. This impedes reproducibility and raises doubts about whether these studies were carefully designed with regard to proboscis length measurements and whether they took into account the vast range of potential bias (see, e.g., potential problems in measuring body sizes other than proboscis in insects: von Schiller and Solimini 2005; Knapp 2012; Van Hook et al. 2012; García-Barros 2015). Authors provided more methodological information on proboscis length measurement if proboscis length was important from their perspective (as estimated by proportion keywords in the title and abstract, Fig. 5).

The reviewed studies applied different types of preparation techniques. Different techniques may shrink insect body parts to varying degrees (Kearns and Inouye 1993; von Schiller and Solimini 2005; Knapp 2012; Van Hook et al. 2012; Moretti et al. 2017). In contrast, Fox et al. (2015) suggested that the differences in preparation may not influence proboscis length, since it is mainly built of hard and resistant chitin. Although the arguments of Fox et al. (2015) are reasonable, we did not find studies with suitable data to test this hypothesis. Students measuring live specimens also face further challenges (Blackwell et al. 2006; Van Hook et al. 2012): handling live, fragile specimens and avoiding injuries is difficult. Anaesthesia, on the other hand, even for relatively short periods, may permanently alter insect behaviour (Kearns and Inouye 1993; Chuda-Mickiewicz et al. 2012). In a few studies even small species were successfully immobilised and carefully managed by cooling (Kunte 2007; Tiple et al. 2009; Bauder et al. 2013).

Well-defined landmarks are essential for accurate body size measurements (Kouchi et al. 1999; Van Hook et al. 2012), and defining them seems to be relatively easy in the case of the lepidopteran proboscis, compared to, e.g., the expandable tongue of bees (Morse 1977; Harder 1982; Kearns and Inouye 1993). Only 16 (11.3%) of the reviewed studies specified the landmarks used to measure proboscis length. Length was defined as the distance from the anterior edge of the eye to the proboscis tip in most cases (e.g., Corbet 2000; Kunte 2007; Chupp et al. 2015). These landmarks are reasonable, because the proboscis base is not always visible from a lateral view, since it can be covered by the hairy labial palpus. Furthermore, when measuring the proboscis from digital photographs, coiled and uncoiled states of the same proboscis are expected to yield different values due to pixel organization; thus, for relative estimates, only one of these states should be used throughout a study.

A measurement is always a comparison between the measured object and a standard scaled device, and is performed by a person. Photographs, drawing tubes, analogue or digital callipers, rulers or millimetre scales were applied to measure proboscis length. Contrary to Van Hook et al. (2012), who suggested that measuring butterflies' forewings with different devices yields similar results, we suggest that the different methods and devices are likely to differ in resolution, accuracy and precision. Non-standardized devices may differ in bias, e.g., plastic rulers could be biased compared to each other, thus incurring random error (Kemper and Schwerdtfeger 2009; Van Hook et al. 2012). Although measuring from photographs provided the highest resolution, high resolution by itself does not improve accuracy or precision. If the scaling device was a general-purpose ruler or another non-standardized scale, measurement accuracy can be doubtful, although the values remain usable for relative estimates if only a single device was used. A further problem could be optical distortion, especially with low-quality optics (Larson and Chandler 2010). Measurement duration may also differ across methods, which matters if faster measurements increase bias (Daly 1985; Kemper and Schwerdtfeger 2009; Van Hook et al. 2012). We found that the reviewed studies often gave the resolution of the measurements, while precision, accuracy, and repeatabilities were rarely reported. Furthermore, 24.1% of the authors used callipers, rulers and millimetre scales, although these devices can measure only straight objects. Unless mounted on a slide, the proboscis is not straight even if uncoiled, since it has a tendency to remain curved, resulting in an underestimate (see, e.g., Photo 3 of Ryckewaert et al. (2011)).

We found a large variance in sample sizes among and within studies. However, we did not find a relationship between proportion keywords and the number of measured individuals per species. Many authors measured a relatively small number of individuals, similarly to cases measuring other morphological traits in various taxa (Cardini et al. 2015). However, we found a few good examples, where sample sizes were carefully chosen (e.g., Krenn 1998; Kawahara et al. 2012; Haverkamp et al. 2016). The large variance in proboscis length within species (e.g., found by Szigeti et al. 2020) and the variance due to preparation and measurement techniques make choosing an appropriate sample size crucial if the aim of the study is to characterise population distribution. Sample size may be deliberately kept low to avoid the negative impact of removing many individuals from natural populations (Farnsworth and Rosovsky 1993; Costello et al. 2016; Fischer and Larson 2019). Researchers also have to trade off effort among the different variables sampled, which is a further constraint on achieving large sample sizes (Szigeti et al. 2016).

Although some studies reported means and standard deviations (see Table S1), descriptive data on proboscis length were not provided in many cases, similarly to the findings of Stang et al. (2009) and Amorim et al. (2014) for measurements other than proboscis. In several cases the type of statistic was not stated, i.e., whether a value was the mean, the median or a single measurement (Zenker et al. 2011; Meerabai 2012). In a few cases, authors gave different types of descriptive statistics within a single table (Atachi et al. 1989; Singer and Cocucci 1997). In contrast, some publications provided detailed descriptive statistics: besides the mean and SD, some gave the range and the number of measured individuals (Grant and Grant 1983; Kramer et al. 2015). Entire datasets were published only in a few cases (Kislev et al. 1972; Johnson and Raguso 2015).

Here, we reviewed how lepidopteran proboscis length had been measured. We did not find detailed protocols for proboscis length measurement, but there are some publications with well described measurement methodology (see, e.g., Krenn et al. 2001; Bauder et al. 2014). There are a few guidelines to measure bee tongues (Harder 1982; Kearns and Inouye 1993), and these may also help students of Lepidoptera. Hereafter, we provide recommendations and a guideline (Table 1) based upon this review and our own field experience.

Table 1 Guidelines for measurements of lepidopteran proboscis lengths

Recommendations

Primarily, we highlight the importance of providing detailed descriptions of the methods applied. We recommend providing the following information on measurement techniques: if measurements were performed on dead or live specimens; how they were handled, e.g., mounted for measurements; if alive, sedated or not; if dead, how the specimens were stored and proboscides relaxed; if measurements were taken on coiled or uncoiled proboscides; landmarks for measurements; the device used for magnifying the proboscis; measurement technique; how values were read, the software applied for measurements, including version number; and any other equipment used during the measurement procedure. Provide the following descriptive statistics for the measured values: the number of the measured species and individuals, including the number of males and females if determined; mean, SD, minimum and maximum. Access to entire datasets via public repositories is a good practice, since it makes research transparent and more credible (Reichman et al. 2011), and provides data for meta-analyses (Mortelliti et al. 2010; Szigeti et al. 2016; Amato and Petit 2017) or for trait-based studies (Moretti et al. 2017; Wong et al. 2019).

If survival is important for the study (e.g., investigating behaviour and/or endangered species), measurement could be achieved either by sedation (e.g., Moré et al. 2012; Bauder et al. 2013), with the risk of altering behaviour, or by mounting specimens on styrofoam or on plastic plates while measuring (e.g., Martins and Johnson 2007; Lehnert et al. 2014), although this may be difficult in small, fragile species. When working with freshly collected dead specimens, measurements should be carried out as soon as possible to avoid potential shrinkage due to desiccation. Note that using the same preparation methods within a study still allows taking relative measurements, and thus within-study comparisons (Kearns and Inouye 1993; Van Hook et al. 2012). Safeguarding these specimens in collections is beneficial, since they can be used for further studies (Nilsson and Rabakonandrianina 1988).

We suggest avoiding millimetre-paper or general-purpose rulers for any scientific measurement (Kemper and Schwerdtfeger 2009; Muñoz-Muñoz and Perpiñán 2010). We recommend avoiding straight scales, such as callipers or rulers, for measuring the proboscis with the naked eye. Rather, take macro photographs including a high-resolution printed scale on each photograph, then measure proboscis length with dedicated software. Accurate scales can be drawn with graphical software. Photography can be used both in the lab and outdoors (e.g., Bauder et al. 2013). High-resolution photographs have the advantage of zooming into the picture and adjusting contrast or colour to improve landmark identification. Photographs can be archived and later revisited (Kemper and Schwerdtfeger 2009). Pay attention to: (1) using a macro-lens with the smallest possible geometric distortion, (2) placing the proboscis and the scale at the same distance from the lens and parallel with the lens's plane (bubble levels insertable to camera hot shoes can be handy for levelling), (3) using well calibrated scales and (4) standardizing the degree of proboscis extension as much as possible.
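To illustrate why tracing a curved proboscis on a calibrated photograph differs from a straight-line reading, the sketch below converts a traced path from pixels to millimetres using a scale bar in the same image. It is a simplified example under stated assumptions: the proboscis is digitised as an ordered series of points from the base landmark to the tip, lens distortion is ignored, and the function names and pixel coordinates are hypothetical.

```python
from math import hypot


def path_length(points):
    """Summed length of the straight segments joining ordered (x, y) points."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def length_mm(trace_px, scale_px, scale_mm):
    """Convert a traced path from pixels to millimetres.

    trace_px: ordered (x, y) pixel points along the proboscis.
    scale_px: two (x, y) pixel points spanning a known distance on the scale.
    scale_mm: that known distance in millimetres.
    """
    px_per_mm = path_length(scale_px) / scale_mm
    return path_length(trace_px) / px_per_mm


# Hypothetical photograph: 500 px on the printed scale correspond to 10 mm.
scale = [(0, 0), (500, 0)]
trace = [(0, 0), (300, 400), (600, 0)]  # curved proboscis, base to tip
curved = length_mm(trace, scale, 10.0)
straight = length_mm([trace[0], trace[-1]], scale, 10.0)  # base-to-tip chord
print(curved, straight)
```

In this made-up example the traced length is 20 mm, whereas the straight base-to-tip distance is only 12 mm, which is the kind of underestimate a calliper or ruler would give on a curved proboscis.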

Different preparation and measurement techniques may potentially yield different results. Resolution (i.e., the smallest readable unit), precision (i.e., the random error), accuracy (i.e., the systematic error) of the measuring device and the influence of the measuring person should be taken into account when planning the study, and these data should be provided. Bias can accumulate during preparation and measurement. This may cause an error larger than the investigated differences, i.e., the biological variation (Yezerinac et al. 1992; Arnqvist and Mårtensson 1998; Van Hook et al. 2012), thus biasing the conclusion of a study. The size of measurement error is inversely related to the quality of the data, and measurement standardization is the most effective way to minimize these errors (Ulijaszek and Kerr 1999; Van Hook et al. 2012). We encourage researchers to develop standard preparation and measurement protocols. Repeatability tests are useful, especially for newly developed techniques as well as for checking the reliability within and among the persons conducting measurements. We also emphasise the importance of measurement calibration and the observers' training to further enhance data reliability (Gordon and Bradtmiller 1992; Kouchi et al. 1999; Blackwell et al. 2006; Van Hook et al. 2012). We agree with Blackwell et al. (2006) that measurements should be replicated at least two or three times and the mean of the replicates used to decrease random error, when necessary (Arnqvist and Mårtensson 1998). Multiple shots of each specimen may also be useful to check measurement repeatabilities (Daly 1985; Kemper and Schwerdtfeger 2009; Muñoz-Muñoz and Perpiñán 2010).
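A repeatability test of this kind can be sketched as follows. The text does not prescribe a particular statistic; the example assumes one common choice, the intraclass correlation coefficient from a one-way ANOVA with the same number of repeated measurements per specimen. The function name and the measurements are hypothetical.

```python
from statistics import mean


def repeatability(replicates):
    """Intraclass correlation from a one-way ANOVA (balanced design).

    replicates: one list of k repeated measurements per specimen, equal k.
    Returns among-specimen variance / (among-specimen + within-specimen variance).
    """
    a = len(replicates)
    k = len(replicates[0])
    if a < 2 or k < 2 or any(len(r) != k for r in replicates):
        raise ValueError("need >= 2 specimens, each with the same k >= 2 replicates")
    grand = mean(v for r in replicates for v in r)
    ss_among = k * sum((mean(r) - grand) ** 2 for r in replicates)
    ss_within = sum((v - mean(r)) ** 2 for r in replicates for v in r)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (a * (k - 1))
    var_among = (ms_among - ms_within) / k
    return var_among / (var_among + ms_within)


# Hypothetical data: three specimens, each measured twice (mm).
measurements = [[10.0, 10.2], [12.0, 12.2], [14.0, 14.2]]
r = repeatability(measurements)
print(round(r, 3))
```

Values close to 1 indicate that most of the observed variation lies among specimens rather than between repeated measurements of the same specimen.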

We recommend choosing an appropriate sample size. Van Hook et al. (2012) suggested that a sample of 30 specimens per population is enough for measuring wing length in butterflies. Similar sample sizes were recommended for accurate estimates of mean body sizes in other taxa (Cardini and Elton 2007; Griffiths et al. 2016; Stec et al. 2016; Wong et al. 2019). Note that although no rule of thumb exists for the minimum sample size from which the shape of a distribution can be estimated, 30 seems to be a safe minimum for this purpose. We are aware that in many studies, especially field studies, this sample size simply cannot be achieved. In such cases, results should be interpreted with caution. We found considerable within-species variability in proboscis length in some of the studies, and other authors suggest that intraspecific variation in arthropod traits may have a significant impact on the studied systems (Griffiths et al. 2016; Moretti et al. 2017; Wong et al. 2019; Szigeti et al. 2020). These findings imply that very low sample sizes are likely to bias distribution estimates severely, although the required sample size may differ considerably among species, study aims, and required confidence levels, and may also differ among analyses (Batterham and Atkinson 2005). Since accurate results require an estimate of the appropriate sample size, we suggest conducting preliminary studies on the target species or using data from related taxa, when feasible. Optimizing sample size is not an easy task, and sample size often depends on the time spent in the field or the number of traps available, hence these factors should be considered when planning the sampling (Cardini et al. 2015). Finally, we suggest taking ethical and nature conservation issues into account when deciding on measurement methods or sample sizes (Costello et al. 2016; Fischer and Larson 2019).
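
As a rough illustration of how pilot data can inform sample size, the Python sketch below uses the textbook normal-approximation formula n = (z s / E)^2 for estimating a mean to within a margin E. This is only one of many possible approaches and is not the method of the studies cited above: it addresses the precision of the mean, not the shape of the distribution, and it tends to underestimate n when the pilot sample is small. The function name and the pilot values are hypothetical.

```python
import math
from statistics import NormalDist, stdev

# Hypothetical function name; normal approximation assumed.

def required_sample_size(pilot_lengths_mm, margin_mm, confidence=0.95):
    """Specimens needed so that the confidence interval of the mean
    has a half-width of at most margin_mm, given pilot measurements."""
    if margin_mm <= 0:
        raise ValueError("margin_mm must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    s = stdev(pilot_lengths_mm)                       # sample standard deviation of the pilot data
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # two-sided critical value
    return math.ceil((z * s / margin_mm) ** 2)

# Example with made-up pilot measurements (mm); their standard deviation is 1.0.
pilot = [9.0, 10.0, 11.0]
print(required_sample_size(pilot, margin_mm=0.5))   # wider margin, fewer specimens
print(required_sample_size(pilot, margin_mm=0.25))  # narrower margin, more specimens
```

Halving the acceptable margin roughly quadruples the required sample size, which is one reason why the target precision should be decided before field sampling begins.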

Conclusion

The array of methods and devices available in insect morphometrics has been increasing. New technologies, such as automated measurements from photographs with dedicated software (Bánszegi et al. 2014), 3D photography (Olsen and Westneat 2015), and microCT (Metscher 2009), will probably influence the development of morphological measurements. Several studies have already used and thoroughly presented new techniques for measuring proboscis length (Grant et al. 2012; Bauder et al. 2013; Lehnert et al. 2016). In contrast, many publications did not disclose the necessary details of their measurement procedures, regardless of whether modern techniques were used. Deficiencies in the methods and the results were also found in other types of ecological publications (Mortelliti et al. 2010; Szigeti et al. 2016). Insufficient description of methodology is an important problem, since it casts doubt on the given study and makes it impossible to reproduce (Moretti et al. 2017). Furthermore, such publications are mostly unsuitable for inclusion in meta-analyses (Mortelliti et al. 2010; Szigeti et al. 2016; Amato and Petit 2017). Hence, we emphasise that well-planned methodology and detailed descriptions of the applied methods are essential for accurate conclusions. We think that further methodological development in measuring proboscis length is important and that general protocols could enhance data quality, thus improving cross-study comparisons. Thoroughly planned studies that compare sampling methodologies and assess their appropriateness and accuracy under different circumstances are still needed.