Introduction

Surveying and monitoring are necessary components of conservation programs, both to identify those species (of the ones effectively sampled) that do and do not require conservation action, and to assess the efficacy of conservation actions (Dennis 1993; Pollard and Yates 1993). Butterfly abundance varies greatly among broods because of climate and other factors (Dennis 1993; Pollard and Yates 1993). As a result, long-term monitoring is necessary to assess a butterfly species’ status and trend (Thomas et al. 2002). In England, weekly counts have long been conducted on fixed routes, with results among trained observers validated to be functionally similar (Pollard and Yates 1993). Many European countries have similar programs (van Swaay and van Strien 2005). These are rigorous monitoring programs, but such data are not available for Minnesota, USA, due to limitations of resources and personnel. To obtain long-term data, survey results from different teams using different methods must be evaluated. An underlying premise of most butterfly status/trend assessments is that data from different or informal (variable) methodologies can be pooled in some manner (Saarinen et al. 2003; Shuey 2005; van Swaay and van Strien 2005; van Swaay et al. 2006; Kuussaari et al. 2007).

At the more informal end of the spectrum are collector/observer records. Such presence/absence records are a function of effort just as abundance indices are (Dennis et al. 1999; Dennis and Thomas 2000), but it is difficult to assess what locations were visited and when, if only positive data are available. Studies may correct for this bias (Dennis et al. 1999), not correct for this after determining it to be a minor error in the analyzed dataset (Saarinen et al. 2003), not correct or test for this bias (Parmesan 1996; Komonen 2007), or conclude that the data can’t be interpreted (Swengel and Swengel 2001a). The datasets in this meta-analysis are more rigorous than collector/observer records, in that date, effort (hour or km of surveying, or both), number of surveyor(s), and surveyed species (found or not) are known for the available datasets, and often weather conditions, time of day, and route location. Nonetheless, unquantified sources of variation (e.g., differences in exact survey route, although location at the scale of site is controlled) are potential sources of error in this study, requiring more care in interpreting the results.

During 1993–1996, two teams—Schlicht (one surveyor) and Swengels (two surveyors)—happened to survey the same Minnesota prairies at much the same time in the same years, but without any coordination of survey sites, transect routes, survey methods, survey dates, and results between the teams. In this paper, butterfly abundance indices were tested for correlation between the two teams at the scale of the site and the subsite. Since strong covariance occurred, thus establishing a validation of data pooling, a calibration of the indices between the two teams was then conducted. Abundance indices of prairie-specialist butterfly species in western Minnesota prairies were then calculated for long-term monitoring during 1979–2005 (“monitoring” defined here as a time series of population indices used to calculate trend), using not only Schlicht and Swengel surveys but also publicly available data from five other teams. State-listed butterfly and day-flying moths occurring in western Minnesota prairie (Table 1) are a conservation concern primarily because of vast prairie destruction (99.6% in Minnesota) due to conversion to intensive agriculture and urbanization (Samson and Knopf 1994), as well as isolation, degradation, and unfavorable land use/management of remaining conserved and unconserved tracts (McCabe 1981; Dana 1991; Schlicht 2001). As a comparison, long-term monitoring indices were also calculated for the five most abundant butterflies in these surveys. The goals of the individual teams’ studies were to assess the status of these prairie-specialist species and to document their habitat and management preferences. The goals of this meta-analysis were to assess the comparability of data among teams, so as to determine whether and how the teams’ data could be aggregated to analyze status and trend of key species in major reserves.

Table 1 Butterfly and day-flying moth species inhabiting prairie in western Minnesota with a legal conservation status in Minnesota (MDNR 2007b)

Methods

Surveys

Two teams conducted transect butterfly surveys in summer in Minnesota: Schlicht during 1993–1997 and 2000 (Schlicht 1997a, b; Schlicht and Saunders 1993, 1995; Schlicht 2001, 2003) and Swengels during 1988–1997 (Swengel 1996, 1998; Swengel and Swengel 1999a). Numbers of all butterfly species and diurnal Schinia moths were recorded (see Tables 1 and 2 for scientific names). Special effort was made to identify and survey habitat of target species: for Schlicht, primarily Dakota skipper but also regal fritillary, Poweshiek skipperling, and Arogos and Ottoe skippers (“midsummer specialists”); for Swengels, those same species plus “late-summer specialists” (Leonard’s and common branded skippers). “Specialist” is defined as restricted, or nearly so, to native prairie vegetation, being sensitive to vegetative degradation (Swengel 1998; Schlicht et al. 2007). The target species were selected because these specialists were particularly restricted in their requirements and/or range, and were of particular conservation concern, e.g., having a legal status in the study region (Table 1, Coffin and Pfannmuller 1988).

Table 2 Spearman rank correlations of Schlicht and Swengel population indices (individuals/h) of 18 most recorded species (* = prairie specialist) in 27 visits to the same 14 sites surveyed in the same years (1993–1996) in midsummer (30 June to 18 July)

Methods were similar between the two teams (e.g., both teams used binoculars for identification), with the following differences. Schlicht included within survey time the collection of voucher specimens, net-and-release identification, and recording of wing wear of target species, while Swengels did not use nets, vouchered with photography but deducted that from survey time, and recorded nectar visits and behavior of targets. Schlicht used a transect 10 m wide; Swengels an unlimited width. Where routes were marked on a topographic map in the Schlicht survey reports, distance surveyed was measured from the map for analysis here. Swengels estimated distance surveyed at the time of each survey, based on site maps and landmarks of known distance apart. Schlicht surveys were conducted by 1–3 surveyors; when there was one, occasionally the surveyor was not Schlicht. Swengels conducted all surveys together on parallel separate transects about 10 m apart. Schlicht used a stricter protocol for time of day and weather than Swengels. Schlicht used or adapted loop routes mapped by Minnesota Department of Natural Resources (MDNR) personnel, and for 1995–1996, the straight line transects previously established at Prairie Coteau by Selby (Selby and Glenn-Lewin 1989, 1990) and elsewhere, typically straight line transects with landmarks marking the start, end, and turns. Swengels followed routes (rarely straight lines) they established in previous years, but sometimes added or omitted units; each unit contained vegetation relatively similar in type, quality (based on amount of brush and diversity and abundance of native and non-native flora), and management. Schlicht resurveyed some sites to provide broader coverage of the target species’ varying flight periods (time when a butterfly species is in the adult life stage); the late June–July sampling period was 14–25 days long per year 1993–1997. Swengels sampled in shorter periods (2–4 days each year in June 1988–1989, 1–3 days twice in June 1990, 3–6 days each year in July 1990–1997, 4–5 days each year in August 1991–1993) and resurveyed within period only due to weather. While a number of factors varied between the Schlicht and Swengel teams, the primary differences in survey methods were in transect width (10 m vs. unlimited), number of surveyors (typically one vs. always two), length of survey period (14–25 days vs. 1–6 days), and route location.

Overview of analyses

Butterfly population indices were calculated as individuals per survey time (h) or distance (km) to create an observation rate (relative abundance). The Spearman rank correlation was used for all correlations, the Wilcoxon signed ranks test for all tests for differences between paired samples, the Mann-Whitney U test for all tests between two categories, and the binomial probability test for all tests for a preponderance of negative or positive correlation coefficients in a set of correlations (random distribution = 50% positive, 50% negative). All tests were two-tailed, with statistical significance set at P < 0.05. Since significant results typically occurred at a frequency well above that expected due to spurious Type I statistical error, the critical P value was not lowered further, as more Type II errors (biologically meaningful patterns lacking statistical significance) would be created than Type I errors eliminated. However, the samples of sites and N years in this study were relatively small and an individual result can nonetheless be a Type I or Type II error. Due to the numerous statistical tests in this paper, greater confidence should be placed in patterns that recurred frequently with significance or consistent non-significance, as well as contrasts between different ecological groups of species. All statistics were calculated using ABstat 7.20 (1994 Anderson-Bell, Parker, Colorado, USA).
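As an illustration of the index and correlation steps described above, the following minimal Python sketch computes observation rates (individuals per unit effort) and a Spearman rank correlation between two teams' paired indices. It is not the ABstat procedure used in this study; the function names (`observation_rate`, `average_ranks`, `spearman`) and the example counts are hypothetical, and ties are handled by assigning mean ranks, which is an assumption about the tie treatment.

```python
from statistics import mean


def observation_rate(count, effort):
    """Relative abundance: individuals per unit of effort (h or km)."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return count / effort


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical counts and hours for one species at five paired site visits.
team_a = [observation_rate(c, h) for c, h in [(3, 1.5), (0, 1.0), (12, 2.0), (5, 1.0), (8, 2.0)]]
team_b = [observation_rate(c, h) for c, h in [(9, 1.0), (1, 1.0), (20, 1.0), (6, 0.5), (7, 1.0)]]
r = spearman(team_a, team_b)
```

Because only ranks enter the calculation, the coefficient is unaffected by a constant multiplicative difference between teams, which is why covariance can be assessed before any calibration is applied.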

Validation

Correlation of indices between teams was analyzed separately at the site and subsite scales. A site is a named prairie, as in preserve guides (Wendt 1984; TNC 1988, 1994), except the original and new acquisitions at Hole-in-the-Mountain Prairie were distinguished as two separate (but contiguous) sites when possible. The entire survey at a site on a single date by a team was used to calculate the index, regardless of whether and how much the routes overlapped between teams. A subsite consists of the minimum number of transect segments surveyed by Schlicht on a single date that can be cross-referenced to the minimum set of units surveyed on a single date by Swengels; e.g., one Schlicht segment might correspond to two Swengel units, or vice versa. All Schlicht surveys in these correlations were conducted by Schlicht alone.

In 27 instances, both teams surveyed the same site in the same year; N = 14 sites during 1993–1996, with 1–3 years per site, in a total of 39.2 h (Schlicht) and 28.4 h (Swengels) of surveying between 30 June and 18 July. Within each pair of site surveys by the two teams, survey date averaged 3.7 days apart (range 0–12 days). All species with a minimum of 30 individuals observed by both teams combined were analyzed. Indices both per hour and per km were available in 12 pairs of site surveys at 7 sites (in a total of 23.1 h and 41.7 km for Schlicht, 13.7 h and 26.4 km for Swengels); all species with a minimum of 15 individuals observed by both teams combined were analyzed. In 25 instances, both teams surveyed the same subsite in the same year; N = 15 subsites of 5 sites during 1995–1996, with 1–2 years per subsite, in a total of 12.7 h and 25.1 km (Schlicht) and 8.1 h and 15.9 km (Swengels) of surveying during 1–13 July. Within each pair of subsite surveys, survey date averaged 2.6 days apart (range 0–4 days). All species with a minimum of 15 individuals observed by both teams combined were analyzed.

Several variables were tested for their relationship to the strength of these correlations between the two teams’ indices. The correlation coefficients (r) were tested for significant correlation with (1) total number of individuals recorded by both teams combined, (2) percent zero indices (out of all indices in sample, pooled for both teams), and (3) percent “mismatched zero” indices (i.e., one index in the pair was zero but the other positive). To test for whether results varied by type of index (per hour or per km), correlations were calculated for both types of indices.
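The two zero-index variables can be sketched as follows. This is a minimal Python illustration, not the computation used in the study: the function name `zero_index_summary` and the example values are hypothetical, and the denominator for percent mismatched zeros (the number of pairs) is an assumption, since the text specifies the denominator only for percent zero indices (all indices, pooled for both teams).

```python
def zero_index_summary(team_a, team_b):
    """Summarize zero indices in paired samples from two teams.

    percent_zero: zero indices out of all indices, pooled for both teams.
    percent_mismatched_zero: pairs where exactly one index is zero,
    out of all pairs (assumed denominator).
    """
    if len(team_a) != len(team_b) or not team_a:
        raise ValueError("need two non-empty samples of equal length")
    pairs = list(zip(team_a, team_b))
    n_zero = sum(1 for a, b in pairs for v in (a, b) if v == 0)
    n_mismatched = sum(1 for a, b in pairs if (a == 0) != (b == 0))
    return {
        "percent_zero": 100.0 * n_zero / (2 * len(pairs)),
        "percent_mismatched_zero": 100.0 * n_mismatched / len(pairs),
    }


# Hypothetical indices (individuals/h) for one species at four paired visits.
summary = zero_index_summary([0.0, 0.0, 2.0, 3.0], [0.0, 1.0, 0.0, 4.0])
```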

Calibration

To make indices between the two teams comparable, we conducted a calibration between Schlicht and Swengel indices. The mean index (separately per hour and per km) per team was calculated for each species, separately at the site and subsite scales, using the data in the validation correlations. All the mean indices per species were then averaged to create a grand mean index for all species per team. The grand means were used to calculate a calibration constant between the two teams, separately at the site and subsite scales and separately per hour and per km. This is not the calibration of transect surveys (relative abundance) to estimates of absolute abundance per Thomas (1983) and MacKenzie et al. (2005), as no pairings of transects to estimates were available matched by species, site, and year. Rather, this is a calibration of transect surveys to make indices comparable between the Schlicht team (one surveyor; fixed-width transect) and the Swengel team (two surveyors; unlimited-width transect).
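The grand-mean calibration described above can be sketched in a few lines of Python. The function names (`grand_mean`, `calibration_constant`) and the species labels and values are hypothetical and chosen only to make the arithmetic easy to follow; the ratio is taken as Swengel over Schlicht because Swengel indices are later divided by the constant.

```python
from statistics import mean


def grand_mean(indices_by_species):
    """Mean of the per-species mean indices for one team."""
    return mean(mean(values) for values in indices_by_species.values())


def calibration_constant(swengel, schlicht):
    """Ratio of the Swengel grand mean to the Schlicht grand mean."""
    return grand_mean(swengel) / grand_mean(schlicht)


# Hypothetical indices (individuals/h) from paired surveys, keyed by species.
schlicht = {"species A": [2.0, 4.0], "species B": [1.0, 1.0]}
swengel = {"species A": [6.0, 8.0], "species B": [3.0, 3.0]}

# Per-species means are 3 and 1 (Schlicht) and 7 and 3 (Swengels),
# so the grand means are 2 and 5.
constant = calibration_constant(swengel, schlicht)
```

Averaging per-species means first gives each species equal weight in the grand mean, regardless of how many indices it contributes.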

A species-specific calibration was not used because sampling error would be greater, as well as variation among species due to true differences in observed numbers depending on exactly where each team surveyed (different areas in a site and differences in weather among dates might result in different observed butterfly densities) and phenological variation between the teams’ surveys. The larger the sample (the more survey effort, and species and individuals observed), the more these confounding factors should average out.

To test for butterfly detection factors affecting between-team calibration for different species, a calibration-ratio was calculated by dividing the Swengel mean index by the Schlicht mean index for each species in the site-scale correlations. These ratios were tested for significant relationships to species’ characteristics that might allow relatively more of a species to be recorded on an unlimited width transect by two surveyors than a fixed width by one, thus making the calibration ratio higher: e.g., larger butterflies can be recorded from further away, more localized species are more likely to be found in the wider strip. If one team’s survey dates skewed earlier than the other’s in these pairs of surveys, then the calibration-ratio would skew to the team surveying nearer to a species’ peak timing. To check for effects of statistical power, the same variables (total individuals, percent zero indices, percent mismatched zero indices) in “Validation” (above) were tested for whether more individuals and fewer zero indices related to lower calibration-ratios.

Each species’ size was calculated as the average of the minimum and maximum wingspan in Marrone (2002), Royer (2003), and Schlicht et al. (2007). Species were classified by “detectability”: 1 for medium/large in size (>3.5 cm mean wingspan) and easy to identify (N = 10 species) vs. 2 for small (<3.2 cm mean wingspan) and/or hard to identify (N = 8 species). The “encounterability” code used 1 for widespread species inhabiting many vegetation types and 2 for localized species restricted to a few vegetation types, based on Marrone (2002), Royer (2003), and Schlicht et al. (2007). Each species’ codes for detectability and encounterability were summed as a “combination code”: e.g., 2 is both more detectable and encounterable, 4 is both less detectable and encounterable. Survey date was tested for a significant difference between teams, and each species’ phenology was classified by correlating each species’ indices to date, separately by team. When both teams had a negative correlation coefficient vs. date, the species was classified as later in the adult brood; when both were positive, earlier in the adult brood; when sign differed between teams (always negative for Schlicht and positive for Swengel), the species was intermediate or indeterminate. Species were also classified as “multivoltine” (known to have multiple broadly overlapping generations per year) or not, per Schlicht et al. (2007).
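The coding rules above can be written out as a short Python sketch. The function names are hypothetical, the detectability and encounterability codes are taken as given inputs rather than derived from wingspan, and treating a coefficient of exactly zero as "intermediate or indeterminate" is an assumption not stated in the text.

```python
def combination_code(detectability, encounterability):
    """Sum of the two codes: 2 = most detectable and encounterable, 4 = least."""
    for code in (detectability, encounterability):
        if code not in (1, 2):
            raise ValueError("codes must be 1 or 2")
    return detectability + encounterability


def phenology_class(r_schlicht, r_swengel):
    """Classify a species by the sign of each team's index-vs-date correlation."""
    if r_schlicht < 0 and r_swengel < 0:
        return "later"  # indices decline with date for both teams
    if r_schlicht > 0 and r_swengel > 0:
        return "earlier"  # indices rise with date for both teams
    return "intermediate or indeterminate"  # signs differ (or a coefficient is zero)


# Hypothetical species: small and hard to identify (2), localized (2).
code = combination_code(2, 2)
phase = phenology_class(-0.30, 0.20)
```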

Tests were performed on all species as well as subgroups, e.g., more vs. less detectable or encounterable species, including/excluding specialist species (as calibration ratios might be systematically lower for these due to both teams targeting them) or multivoltine species (which, with their overlapping generations, might have less phenological variation than species with distinct generations). Likewise, Dennis et al. (2006) analyzed the relationship of higher (or sooner) recorded butterfly numbers to variables similar to the ones here: wing length, brightness of coloration (cf. detectability code here), and length of flight period (cf. code for multivoltine species here), as well as other variables not feasible and/or not applicable to this study.

Monitoring

For each midsummer specialist, one index per year was identified to represent its abundance at a site, for sites with indices in main flight period for >5 different years, at sites surveyed in midsummer over a span of >9 years. These additional datasets were available: mark-release-recapture (MRR) surveys of Dakota and Ottoe skippers at Hole-in-the-Mountain in 1979–1981 by Dana (1991); transect surveys of the midsummer specialists (1988) and those and many other butterfly species (1989–1990) at Prairie Coteau by Selby (Selby and Glenn-Lewin 1989, 1990); surveys of all butterflies at various sites in 2003–2005 (Selby 2006) and regal fritillary, Poweshiek skipperling, and Dakota skipper by Skadsen at Glacial Lakes State Park in 2001 as reported in Selby (2006), with implication of similar routes by Skadsen and Selby (mean Selby distance per transect location was used for Skadsen’s survey effort here); regal fritillary surveys at many sites in 1998–1999 by Mason (2001), and Dakota skipper data (Britten 2001; Britten and Glasford 2002).

Peak Schlicht indices per site per year for each species were designated as the standard to which other teams’ indices needed to be made comparable. Thomas (1983) found that a single transect survey through core habitat of a butterfly during main flight period was adequate to generate effective, comparable population indices among sites. Likewise, time series of single peak transect survey indices covaried significantly between two teams, each surveying the federally endangered Karner blue (Lycaeides melissa samuelis) on different schedules in adjacent counties in Wisconsin (AB Swengel and SR Swengel 2005a).

These adjustments were made to other teams’ indices to make them more comparable in survey effort to Schlicht’s indices. Swengel indices were divided by the constant calculated in “Calibration.” Whichever team had the highest index per site per year on a single date was used. Since only the Schlicht and Swengel teams surveyed the same sites in the same year, a calibration constant could only be calculated between these teams. Dana in litt. (5 March 1993) provided estimated ranges of individuals/h on the peak day per year for Dakota and Ottoe skippers excluding time spent on MRR. Since multiplying the individuals captured/h on the peak date each year 1979–1981 (Dana 1991: Figs. 13–14) by 2.5 reached the low end of his estimated range, that constant was used to calibrate all daily MRR capture rates (Dana 1991: Figs. 13–14) to Schlicht indices. Since Dana (1991) and Selby (Selby and Glenn-Lewin 1989, 1990) surveyed most days during their targets’ flight periods, their peak date per year is likely a better approximation of peak than in the Schlicht and Swengel datasets. For their surveys in each species’ main flight period (the 6–9 days provided in Dana 1991: Figs. 13–14; 8–19 day spans for 1989–1990 in Selby and Glenn-Lewin (1989, 1990), and 5 days in 1988 as surveys began partway through main flight period), the median (not peak) index was used. Since survey data on Arogos skipper and Poweshiek skipperling were not provided in Dana (1991), the minimum estimate from Dana in litt. (5 March 1993) was used for these species only for one year, designated as 1980, and was divided in half to make these estimates conservative, even though Dakota skipper median indices averaged 88% of peak indices and 67% for Ottoe skipper for 1979–1981 (Dana 1991). In 1989, Selby surveyed one unit several times per day (Selby and Glenn-Lewin 1989). As in Schlicht (2001, 2003), the survey started at 1200 h Central Standard Time was used here.
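A minimal Python sketch of two of these adjustments follows: selecting the annual peak index after dividing Swengel indices by the calibration constant, and converting daily MRR capture rates to a median index with the 2.5 multiplier. The function names and input values are hypothetical; the constants 2.42 and 2.5 are the values reported in this paper.

```python
from statistics import median

SWENGEL_CONSTANT_PER_HOUR = 2.42  # calibration constant reported for indices per hour
MRR_MULTIPLIER = 2.5              # multiplier applied to daily MRR capture rates


def annual_peak_index(schlicht_indices, swengel_indices,
                      constant=SWENGEL_CONSTANT_PER_HOUR):
    """Highest single-date index for a site and year across both teams,
    after dividing each Swengel index by the calibration constant."""
    candidates = list(schlicht_indices) + [v / constant for v in swengel_indices]
    if not candidates:
        raise ValueError("no indices available for this site and year")
    return max(candidates)


def mrr_median_index(daily_capture_rates, multiplier=MRR_MULTIPLIER):
    """Median of calibrated daily MRR capture rates in the main flight period."""
    return median(rate * multiplier for rate in daily_capture_rates)


# Hypothetical single-date indices (individuals/h) for one species, site, and year.
peak = annual_peak_index([4.0, 6.0], [12.1, 24.2])
mrr = mrr_median_index([2.0, 4.0, 6.0])
```

Passing `constant=2.0` or `constant=1.0` to `annual_peak_index` would re-identify the peak under a different calibration or with no adjustment for effort.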

For Glacial Lakes 2003–2005 (Selby 2006), a correction for number of surveyors (one or two) was not made, since the methods did not indicate whether a correction was needed. Furthermore, at this site, many surveys occurred during late June to mid-July in 2003–2004 (seven each year), but peak count was used instead of median. For Prairie Coteau in 2000 (Schlicht 2001), a correction for N = two surveyors was also not performed. If number of surveyors and use of peak rather than median counts from multiple survey dates in these recent years cause a bias, it would be against a negative trend, as also done by Franzén and Johannesson (2007) and Groenendijk and van der Meulen (2004) (comparison of pre-1990 to 1990–2000 data). A calibration constant >2 for Swengel indices would be expected a priori to bias against declines, since Swengel surveys are weighted toward the earlier part of the time series. The use of ranking (non-parametric) statistics here also accounts for the indices being relative abundance estimates, not precise population counts.

As a comparison (outgroup) to specialist butterflies, the same methods for identifying annual abundance indices were used for the five most frequently recorded non-specialist (“common”) species in the combined database during the late June to mid-July period: Aphrodite fritillary, meadow fritillary, common wood-nymph, monarch, long dash. If it was unclear whether a sampling period was adequately in a species’ main flight period or not (i.e., low recorded numbers could be due to timing or low fluctuation in abundance), for specialists the decision was biased against a negative trend but for common species, against a positive trend (Appendix 1). Swengel surveys in mid-August were used for a peak index in lieu of the same team’s early July period in the same year only for univoltine species peaking between those two survey periods (regal fritillary, Aphrodite fritillary, common wood-nymph); surveys in later years by other teams included dates in the mid/late July peak. Indices per hour were used unless indices per km increased number of years in the species’ time series.

Trend (correlation of indices with year) was calculated for each species individually by site. The correlation coefficients (r) (both as absolute value and with sign) were correlated with start year of the trend test, end year, N years in the time series, and span of years (duration between start year and end year). These year variables were tested for a significant difference between specialist and common species. Sets of correlation coefficients have frequently been analyzed for patterns of spatial variation (e.g., SR Swengel and AB Swengel 2005b), often with unequal samples generating these coefficients (e.g., Hanski and Woiwod 1993; Williams and Liebhold 2000; Koenig 2006). The application here is to analyze coefficients for patterns of temporal variation. Since all analyzed indices represent relative abundance, no direct comparisons of abundance were made among species. Instead, only relative change within a site (as represented by correlation coefficients) was compared among species.

Two a posteriori sensitivity tests were performed to determine how much the adjustments of indices to make them comparable among teams influenced trend results. First, trends were re-calculated two ways: excluding Dana (1991) and Dana in litt. (5 March 1993), as this study was most different in methods (MRR) from the other studies (transect surveys) in this meta-analysis, and including only Dana (1991: Figs. 13–14) and not Dana in litt. (5 March 1993) by using the median published MRR capture rates (for Dakota and Ottoe Skippers only) as indices. Second, trends were re-calculated using other calibration constants: 2.42 for all Swengel indices (which affected only one site, the indices per km at Prairie Coteau) because the Schlicht and Swengel teams averaged a similar walking speed (Table 5), and 2.0 for all Swengel indices because this would be a logical value to account for the difference in number of observers in the absence of the validation analysis. The trends were also re-run with no adjustments for effort, by using no calibration constant for Swengel indices, the unadjusted indices reported by Dana in litt. (5 March 1993) for all four species for each of the three years in his study (as the mean of the range provided), and the peak date for all indices (not median date for Selby and Glenn-Lewin 1989, 1990). In each re-run of the trends, Schlicht and Swengel indices were re-compared to each other to identify which was the peak index for a species at each site each year.

Results

Validation

For the 18 species analyzed in the site-scale correlations, abundance indices significantly covaried between the two teams for 11 (61%) species, including 2/3 specialists; no species significantly correlated negatively; and 17/18 species had positive correlations (Table 2). For the ten species analyzed in the subsite-scale correlations, two (20%) significantly covaried in abundance indices (both per hour and per km) between the two teams, no species significantly correlated negatively, and 9/10 species had positive correlations (Table 3). Indices could be calculated both per hour and per km for 12 pairs of site surveys (Table 4). For both types of indices, 10/12 species had positive correlations and 6–7 had significant correlations, all positive. The correlation coefficients per hour and per km (Tables 3 and 4) significantly covaried, as did the two kinds of indices to each other (P < 0.001 for all these tests). The preponderance of positive correlations was significant in all sets of correlations in Tables 2 (P < 0.001), 3 (P < 0.02), and 4 (P < 0.05). In all survey datasets, survey time and survey distance covaried strongly (Table 5).

Table 3 Spearman rank correlations of Schlicht and Swengel population indices (individuals/h) of ten most recorded species (* = prairie specialist) in 25 visits to the same subsites (N = 15 different subsites) in the same sites (N = 5 different sites) surveyed in the same years (1995–96) in midsummer (1–13 July)
Table 4 Spearman rank correlations of Schlicht and Swengel population indices (per hour and per km) of 12 most recorded species (* = prairie specialist) in 12 pairs of visits to the same sites (N = 7 different sites) surveyed in the same years (1995–1996) in midsummer (1–13 July)
Table 5 Spearman rank correlation of distance (km) versus time (h) spent surveying

Correlation coefficients between teams correlated positively with total individuals observed and negatively with percent zero indices (non-significantly at the site scale and significantly at the subsite scale) and correlated negatively with percent mismatched zero indices (significantly at the site scale and non-significantly at the subsite scale) (Table 6). Total individuals and percent zero indices significantly and negatively correlated (P < 0.01, tested separately for values in Tables 2, 3 and 4), but neither related significantly to percent mismatched zero indices (P > 0.10). Species included in “Trend” (below) that had such small samples in the dataset for this validation analysis as not to covary significantly (regal fritillary) or be unanalyzable (Ottoe and Arogos skippers) here had larger samples available in the surveys analyzed for trend.

Table 6 Spearman rank correlations (r) of coefficients in Tables 2–4 with N individuals recorded by both teams combined, percent zero indices, and percent mismatched zero indices

Calibration

The calibration constant relating Swengel indices (two surveyors; unlimited-width transect) to Schlicht indices (one surveyor; fixed-width transect) varied, but Swengel indices were usually >2 times the Schlicht indices (Table 7). For the largest scale and largest sample (from Table 2), Swengel indices averaged 2.42 times the Schlicht indices (the calibration constant used in “Monitoring” below). In the smaller samples available for indices per km, the calibration constant at the site scale (2.89) was used in “Monitoring” below. It was inconsistent whether the specialists (targets) or the widespread species had the larger calibration constant between the two teams. Specialists always had a lower constant than comparable samples of all species.

Table 7 Grand mean of mean indices per species and calibration constant between Schlicht and Swengel grand means, and mean correlation coefficients (r) from Tables 2–4, for all, three specialist (regal fritillary, Poweshiek skipperling, Dakota skipper), and three widespread (clouded sulphur, common wood-nymph, monarch) species

Individual calibration-ratios at the species scale had no significant relationships to size, detectability, N individuals observed, percent zero indices, or percent mismatched zero indices. These ratios did not differ significantly by encounterability when including all species, but did (P < 0.05) when excluding the three specialists, skewing toward lower ratios with greater encounterability. The correlation of the calibration-ratio to the combined code (detectability and encounterability) was far from significant for all species but was significant when excluding the three specialists (r = +0.57, P < 0.05, N = 15 species), with ratios decreasing with greater combined detectability plus encounterability. Although surveys occurred on dates close to each other, Schlicht surveys usually followed Swengel surveys and this difference in date between the two teams (median 10 July for Schlicht, 7 July for Swengels) was significant (P < 0.05). Eight species were classified as later in the brood, six as intermediate, and four as earlier. But the calibration-ratio did not significantly relate to phenological category, either in a correlation using the three categories, or a binary test of later vs. intermediate/earlier combined. See “Trend” below for effects of alternate calibrations.

Monitoring

At the monitoring sites (Table 8), for specialists, 25/30 trend tests were negative regardless of significance (Table 9). This skewing to negative coefficients was highly significant (P < 0.0005). One species (Arogos skipper) also had a significant skewing to negative trends (P < 0.05). In addition, Ottoe skipper had 3/3 negative trends, a sample too small for a significant probability but 2/3 sites had significant declines. Regal fritillary had the lowest proportion of negative trends (4/7). By contrast, common species had 16 negative trends regardless of significance, 16 positive, and 3 exactly 0—an essentially random distribution (Table 9). Five individual trend tests of common species were significant (2 negative, 3 positive). No individual common species had a significant skewing to negative or positive trends. For all species combined, one site had a significant skewing of individual species trends: Prairie Coteau (9/10 negative, P < 0.05). Sample size was inadequate to test this by specialist vs. common species.
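The sign-skew probabilities reported above can be checked with an exact binomial calculation. The Python sketch below assumes a two-tailed test formed by doubling the upper-tail probability under a 50:50 expectation and excluding coefficients of exactly zero; the exact procedure implemented in ABstat is not documented here, and the function name `two_tailed_sign_test` is hypothetical.

```python
from math import comb


def two_tailed_sign_test(n_positive, n_negative):
    """Exact binomial probability of a sign split at least this uneven,
    assuming positive and negative coefficients are equally likely.
    Two-tailed by doubling the upper tail (capped at 1)."""
    n = n_positive + n_negative
    if n == 0:
        raise ValueError("need at least one non-zero coefficient")
    k = max(n_positive, n_negative)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Sign counts reported in this paper.
p_specialists = two_tailed_sign_test(5, 25)     # specialists: 25/30 negative
p_common = two_tailed_sign_test(16, 16)         # common species: 16 vs. 16, zeros excluded
p_prairie_coteau = two_tailed_sign_test(1, 9)   # Prairie Coteau: 9/10 negative
```

Under these assumptions the results agree with the reported thresholds: the specialist skew falls below 0.0005, the Prairie Coteau skew falls below 0.05, and the common-species split is indistinguishable from random.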

Table 8 Minnesota prairies used in long-term trend analysis; all were preserves managed primarily with fire during this study. Sources: TNC (1988, 1994, 2008), Wendt (1984), Schlicht (2003), Selby (2006), MDNR (2008)
Table 9 Summary of Spearman rank correlations of trend (year versus abundance index) for specialist and common species, with binomial probability for non-random skewing of signs of correlation coefficients

Specialists had significantly more negative trend coefficients than common species (mean r with sign = −0.21 for specialists vs. +0.035 for common species, P < 0.01), but degree of significance regardless of sign (r as absolute value) did not significantly vary between specialist and common species. Start year significantly differed between specialist and common species (median 1988 for specialists and 1990 for common species) (P < 0.005), but end year did not (P > 0.10). Furthermore, start year and end year did not significantly correlate with r (either as absolute value or with sign, for the entire sample and separately for specialist and common species). The coefficient r (with sign or as absolute value) did not significantly differ by whether the end year was 2005 or an earlier year (P > 0.10), for all species and for specialists (not testable for common species as N = 5 for end year <2005). Specialists did not have significantly more years in a trend test (mean 8.0 years) vs. common species (mean 7.3 years) (median = 7 years for both groups) (P > 0.10). Number of years did not significantly correlate with r (with sign), while r (as absolute value) did so negatively (as expected, because the critical value of r at P < 0.05 declines with increasing N) for all species (r = −0.25, P < 0.05, N = 65) and for common species (r = −0.37, P < 0.05, N = 35) but non-significantly for specialists (r = −0.13, P > 0.10, N = 30). Span of years (duration between start year and end year) in a trend test did significantly correlate negatively with r (with sign) for all species (r = −0.29, P < 0.05, N = 65) but not for species subgroups (specialist or common species) and not with r as absolute value. Span of years did not significantly differ between the specialist and common species.
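Each trend test summarized in Table 9 is a Spearman rank correlation of year versus abundance index. As a minimal sketch of that calculation (the helper names `average_ranks` and `spearman_r` and the example data are hypothetical and are not values from this study), the coefficient can be computed as the Pearson correlation of the ranks, with tied values given their average rank:

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman coefficient: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5


# Hypothetical survey years and abundance indices (illustration only).
years = [1988, 1990, 1993, 1997, 2000, 2003, 2005]
index = [12.0, 9.5, 9.5, 4.0, 2.0, 0.0, 0.0]
trend_r = spearman_r(years, index)
print(f"r = {trend_r:+.2f}")
```

Because only ranks enter the calculation, the sign of r indicates the direction of the trend without assuming that indices change linearly over time.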

For some specialists at some sites, thresholds were apparent between the last year when a higher index (or any individual) was recorded and all subsequent years, when consistently lower indices (or zeroes) were recorded (Table 10). While adjacent sites had similarly timed thresholds within species (Ottoe skipper at Hole-in-the-Mountain old and new; Poweshiek at Bicentennial and Blazing Star), these thresholds were not synchronized within species among sites in different counties (e.g., Arogos skipper at Glacial Lakes vs. Hole-in-the-Mountain new; Ottoe skipper at Hole-in-the-Mountain vs. Prairie Coteau; Poweshiek at Bicentennial and Blazing Star vs. Glacial Lakes). No positive thresholds were identified for specialists, while both positive and negative thresholds occurred for common species, although fewer distinct threshold patterns were apparent for the latter.

Table 10 Threshold (boldfaced and italicized) at which a persistent change in index occurred, if the change persisted for at least three and all remaining indices: for declines, the last time a higher rate was seen (rates prior are not presented and could have been lower) and the first time all subsequent rates were less than a certain value; for increases, all values in the earlier period were at the lower rate, and all values in the later period were at the higher rate
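The threshold definition in the Table 10 caption can be expressed as a simple rule. The sketch below is one possible reading of that caption, not the procedure used to build the table: the function names, the `level` parameter (the "certain value" separating higher from lower rates), and the example series are all hypothetical.

```python
def decline_threshold(indices, level, min_after=3):
    """Position of the last index at or above `level`, or None.

    A decline threshold requires that at least `min_after` indices follow,
    all of which are (by construction) below `level`.
    """
    higher = [i for i, value in enumerate(indices) if value >= level]
    if not higher:
        return None
    last = higher[-1]
    if len(indices) - last - 1 < min_after:
        return None
    return last


def increase_threshold(indices, level, min_after=3):
    """Position of the first index at or above `level`, or None.

    An increase threshold requires every earlier index to be below `level`
    and every index from that position onward to be at or above it.
    """
    higher = [i for i, value in enumerate(indices) if value >= level]
    if not higher:
        return None
    first = higher[0]
    later = indices[first:]
    if first == 0 or len(later) < min_after or min(later) < level:
        return None
    return first


# Hypothetical index series in chronological order (illustration only).
declining = [5.0, 8.0, 6.5, 1.0, 0.0, 0.5, 0.0]
increasing = [0.0, 0.5, 0.0, 3.0, 4.5, 3.5]
fluctuating = [5.0, 1.0, 6.0, 0.0, 4.0]

print(decline_threshold(declining, level=2.0))
print(increase_threshold(increasing, level=2.0))
print(decline_threshold(fluctuating, level=2.0))
```

A series that keeps returning to the higher rate, like the third example, yields no threshold under this rule, which matches the caption's requirement that the change persist through all remaining indices.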

The other state-listed butterflies (Table 1) were not analyzed in this study. For Uhler’s arctic and common branded skipper, a few surveys at several sites were in proper timing and range (Coffin and Pfannmuller 1988) but none were recorded. Garita skipperling and Uncas skipper were not known to occur in counties covered by these datasets (Coffin and Pfannmuller 1988). Leonard’s skipper was recorded in August during 1989–1993 at a total of five sites, but sampling occurred in its flight period in too few years for trend analysis. Data on phlox and leadplant moths are in Swengel and Swengel (1999b).

The a posteriori sensitivity tests indicated minor effects on the overall outcome. Changing indices from Dana (1991) to only the median MRR capture rates resulted in a change from 4/4 negative correlations (one significant) for the four affected specialist skippers at Hole-in-the-Mountain (old) to 3/4 negative (none significant), for an overall change to the entire meta-analysis from 25/30 negative trends for specialists to 24/30 (still significantly skewed negative, P < 0.0005). Excluding Dana (1991) altogether resulted in 2/4 negative trends (none significant) and 23/30 negative trends overall (still significantly skewed negative, P < 0.005). Changing the calibration constant to 2.42 for all Swengel indices (affecting eight species at Prairie Coteau) resulted in no change in sign or significance of trend tests. The two largest changes in adjusting indices resulted in no change in overall outcome between specialist and common species, but with an increasingly negative shift in the coefficients. Changing the calibration constant to 2.0 resulted in no change in sign or significance of trends for specialists but the mean r changed from −0.21 to −0.24 (a significant decrease, P < 0.005), while common species shifted from an even distribution to 19 negative and 16 positive trends (still a non-significant distribution, P > 0.10) and from a mean r of +0.035 to +0.006 (a significant decrease, P < 0.005). Changing to no adjustments resulted for specialists in one additional negative trend and one additional trend with r = 0.0 (reducing positive trends by two) and a mean r of −0.32 (a significant decrease, P < 0.001), while common species shifted to 19 negative, 2 with r = 0.0, and 14 positive trends (still non-significant, P > 0.10), with a mean r of −0.066 (a significant decrease, P < 0.001).

Discussion

Validation

In spite of differences between the two teams’ survey protocols, an overwhelming non-random pattern of covariance occurred in abundance indices between the Schlicht and Swengel teams when indices were matched by year and location (Tables 2, 3 and 4). The site-scale comparison could have greater statistical power (i.e., more frequent significant covariance) due to the greater survey effort used to obtain the indices. The strength of covariance statistically related either to number of individuals (positively) and percent zero indices (negatively), or to percent mismatched zero indices (negatively), suggesting that weaker correlations occurred for species outside their main flight period and/or localized in distribution within a site. The very strong correlation between survey time and distance within each team (Table 5) explains why indices per hour and per km produce consistent results (Tables 3 and 4). When Thomas (1983) validated a single transect survey through core habitat of the butterfly species during the main flight period to MRR results, he concluded that an even more approximate survey method would rank the abundance of different populations adequately. Based on that finding, it is not surprising that the nonparametric ranked statistics used here (which are more conservative than parametric tests) nonetheless detected robust patterns.

In this analysis, transect surveys by different teams at the same sites but not the same routes within sites produced similar rankings of species abundance among sites. This suggests the validity of combining survey datasets from different teams for monitoring butterfly abundance in western Minnesota prairies. In spite of the sampling error inherent in individual 250 m transects analyzed by Pellet (2008), robustness of indices greatly increased in that study when more subsites were lumped together. Likewise, consistent (co-varying) patterns emerged in this study, which mostly had much longer than 250 m transect routes used for indices, especially at the site scale. The premise of this meta-analysis is that transect survey indices can be used for robust ranking not only of sites within year (per Thomas 1983) but within site among years.

This study also indicates that, in the absence of data from fixed transects sampled frequently per year for many years, an adequate dataset can result if survey effort (preferably both distance and time), location (site or subsite), date, number of surveyors, and number of individuals of as many species as possible are recorded. A considerable confounding factor is the dramatic variation in butterfly phenology by up to three or more weeks among years in the midwestern USA (Swengel and Swengel 1999c): e.g., peak Dakota skipper numbers occurred on 9–14 July in 1979 (Dana 1991) but 20–24 June in 1988 (Appendix 1, Selby and Glenn-Lewin 1989). Great care is required to obtain surveys during the main flight period (e.g., most teams used a multiple-week survey period and re-surveyed sites) and to eliminate surveys outside the main flight period (e.g., certain years of Swengel surveys not included for specific species per Appendix 1).

While a comprehensive monitoring program is preferable, as more species are more systematically covered, this appears impractical. The study region is about 96 km east-west by 326 km north-south (31,000 km²). The city of record for each team in this meta-analysis was outside this area, from 115 to 513 km (median 256) straight-line distance away from the nearest site in Table 8. The counties containing the study sites (Table 8) have <10 people per square km (1990 and 2000 censuses), compared to >200 in Great Britain during the same time period (WAEG 2002). The methods and species in this study are adequate to provide information on the effectiveness of prairie conservation and management. This approach might also be appropriate in other regions with both high habitat loss (cf. about 99% in tallgrass prairie per Samson and Knopf 1994) and low human population density and therefore few qualified surveyors available.

Calibration

As a result of this validation, calibration constants between Schlicht (one surveyor; fixed-width transect) and Swengel (two surveyors; unlimited-width transect) indices were calculated (Table 7), both per hour (2.42) and per km (2.89). At the species level, the calibration ratio of Swengel to Schlicht indices appeared to increase (as expected) in relation to reduced encounterability, and a combination of reduced detectability and encounterability (both patterns significant when excluding the three specialists). Otherwise, patterns explaining species-specific variability in calibration ratios were difficult to identify. Dennis et al. (2006) obtained relatively more significant patterns related to higher (or sooner) butterfly numbers detected than in this study, in much larger datasets that involved more varied vegetation (i.e., canopy heights) but with some variation in which variables mattered and how among those datasets. Significant variables relevant to this study included wing length (although contradictorily), brightness, and long flight periods. While it was not possible to calculate a calibration constant to any other teams, it is suggestive that a systematic skewing did not occur with datasets from later years than sampled by Schlicht and Swengels, because end year of a trend test did not significantly relate to trend result (see “Monitoring”). See “Trend” below for discussion of alternate calibrations.
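To illustrate how such constants might be applied, the sketch below rescales a Swengel index to the Schlicht scale. It assumes, based on the ratio being defined as Swengel to Schlicht, that the adjustment is a simple division by the constant; the function name `to_schlicht_scale` and the example index are hypothetical, and species-specific adjustments are not represented.

```python
# Calibration constants (ratio of Swengel to Schlicht indices) from Table 7.
CALIBRATION = {"hour": 2.42, "km": 2.89}


def to_schlicht_scale(swengel_index, per="hour"):
    """Rescale a Swengel index to the Schlicht scale.

    Assumption: the adjustment divides by the overall constant for the
    chosen effort unit ("hour" or "km").
    """
    return swengel_index / CALIBRATION[per]


# Hypothetical Swengel index of 24.2 individuals per hour (illustration only).
adjusted = to_schlicht_scale(24.2, per="hour")
print(f"{adjusted:.1f} individuals per hour on the Schlicht scale")
```

Dividing every index in a series by the same constant leaves its ranks unchanged, so an adjustment of this kind can alter a rank-based trend test only when a time series combines indices from more than one team.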

Monitoring

Specialists had significantly different outcomes from common species, with specialists strongly skewed to negative trends while common species had an equal number of positive and negative trends (Table 9). All significant trends (3/3) for specialists were negative, while only 2/5 were negative for common species. While trends for individual species at individual sites have meaning, these would be more prone to sampling error (cf. Harker and Shreeve 2008) and confounding factors (e.g., comparability among teams, variation in weather and phenology on survey date) than the relative comparisons of outcome between specialist and common species, since the same methods were applied to all species and systematic differences in statistical power (N years, span of years) did not occur between specialist and common species. Since end year did not significantly differ between specialist and common species, and end year did not significantly relate to trend results, annual climatic variation as well as calibration differences between Selby in 2005 vs. earlier teams do not explain the negative trends of specialists nor the difference in trends between specialist and common species. Thus, outcomes of the trend tests were not attributable to differential statistical power between specialist and common species, or a climatic pattern in a certain year, or systematic miscalibration between earlier and later teams, but rather whether the species was a specialist or not.

The general pattern of specialist decline reported here is confirmed in presence/absence analysis by MDNR (2007a), cross-referenced to Coffin and Pfannmuller (1988), which reported that extensive surveys indicated rapid disappearance of Dakota skipper from remnant habitat, very few records for Poweshiek skipperling during 2003–2006 (while it was formerly the most frequently encountered prairie obligate skipper), and no observations of Ottoe skipper in the state since 1995. Direct confirmation of the trend results for common species was not available, but true patterns in some species are likely. In Europe, widespread species show less decline than localized butterflies, or even stable and increasing patterns (Pollard and Eversham 1995; Kuussaari et al. 2007), with approximately as many increases as decreases among widespread species since 1980 (van Swaay et al. 2006). This resembles the equal number of local increases and declines in this study.

On the one hand, the sensitivity tests indicated that the adjustments of indices to make them more comparable to Schlicht indices performed as expected by successfully biasing against negative trends, since lower calibration constants and no adjustments produced coefficients significantly more negative than the a priori method used. On the other hand, all sets of trend tests (regardless of whether and how adjustments were done) produced the same overall outcome: specialists significantly skewed to negative trends, which had significantly lower coefficients than the approximately even distribution of positive and negative coefficients for common species. Different methods of including or excluding the most different study (Dana 1991 and Dana in litt. 5 March 1993) had a minor effect on the overall outcome, as this affected only four of ten species at one of seven sites. These results suggest the benefit of obtaining larger datasets, even at the expense of rigor and with the comparability issues inherent to meta-analyses.

Implications for conservation

Warren (1993) demonstrated that losses of rare/localized butterflies had been just as great on protected as on unprotected land. Our study confirms a similar situation of prairie butterfly decline in prairie reserves in the highly altered and fragmented landscape of western Minnesota. In contrast to the “semi-natural” communities described by Warren (1993), these sites in Minnesota (Table 8) contain high-quality (“virgin” – i.e., never tilled) native vegetation explicitly managed throughout this study for natural ecosystem value primarily with fire, usually on a rotation of about 3–6 years (see citations in Table 8). They are relatively large and longer preserved, and include those sites considered most valuable for prairie butterfly conservation (Dana 1997). As in Warren’s (1993) and van Swaay et al.’s (2006) studies, multiple and differing causes of decline (not just burning) can be involved in the Minnesota prairie landscape, both among reserves and between reserves and the non-conserved landscape.

A direct comparison of burning to alternative managements was not possible in this study due to the lack of time series of indices at sites not managed with fire. Burning usually began in a reserve before much butterfly surveying had occurred, most reserves were managed primarily with fire, and most surveys focused on sites after they were conserved (Wendt 1984; TNC 1988, 1994; Swengel 1996, 2001; Swengel and Swengel 1997; Schlicht 2001, 2003; Nekola 2002). However, the pronounced declines of specialist but not common butterflies in this study are consistent with research showing that fire management is most unfavorable for specialist butterflies, compared to other butterfly species in prairies, and that other unintensive management types tend to be more favorable for specialist butterflies (Swengel 1998, 2001; Schlicht 2001, 2003; Swengel and Swengel 2007).

Are better outcomes possible for prairie-specialist butterflies? In the USA as well as other countries, favorable results (stable or increasing trends) have occurred for localized butterflies of open vegetation in highly fragmented landscapes when management was designed specifically in consideration of individual butterfly species’ biology and requirements, not generally for a vegetative or ecosystem type (Thomas 1984; New 1993; Pollard and Yates 1993; New et al. 1995; Oates 1995; Robertson et al. 1995; Thomas 1995; Pullin 1996; Mattoni et al. 2001; Pryke and Samways 2001; Bourn and Thomas 2004; Sands and New 2002; Swengel and Swengel 2005a, 2007). Because many midwestern prairies were managed for many years with light grazing or haying until being made into preserves, and unintensive regimes of grazing, haying, mowing, localized brush-cutting, and idling tend to support higher populations of prairie-specialist butterflies than burning, these methods should be more extensively studied and employed in prairie conservation (McCabe 1981; Swengel 1996, 1998, 2001; Swengel and Swengel 1999a; Schlicht 2001, 2003; Powell et al. 2007; Schlicht et al. 2007). As a result, ongoing research in western Minnesota on grazing management as it affects prairie-specialist species (Selby 2006) is very valuable.

As found elsewhere (Maes et al. 2006; Dennis et al. 2007), matrix (non-habitat) for one butterfly species is habitat for others, but the boundaries between matrix and habitat may not be discrete. That is, a species may use some resources in the adjoining matrix off the prairie preserve (Dana 1991). Conversely, not all of a prairie reserve, not even all the high-quality prairie vegetation in a reserve, may be habitat for a prairie-specialist butterfly species (McCabe 1981; Dana 1991; Schlicht et al. 2007). Furthermore, presence of both upland and lowland grassland in a site associates with significantly higher abundance for Poweshiek skipperling, Dakota skipper, and regal fritillary even though these species significantly peak in abundance in dry prairie grassland (Swengel 1997; Swengel and Swengel 1999a). While “core areas” (areas of highest abundance) in dry prairie are useful focuses for favorable management (e.g., Swengel and Swengel 2007), it would be more beneficial to manage favorably all components required by the butterfly (e.g., lowlands as well as uplands). As reported elsewhere for other butterflies (Maes et al. 2006; Dennis et al. 2007), for most favorable outcomes, the specific resources and management tolerances required by prairie specialists need to be identified (e.g., McCabe 1981; Swengel 1997; Schlicht and Orwig 1998; Swengel and Swengel 1999a; Schlicht 2001) and incorporated into individual site management plans.

The results here also indicate a contrasting outcome for butterflies between an ecosystem (or vegetative) approach to reserve selection vs. management after preservation. Other studies have validated that using plant species richness or vegetative diversity for reserve selection is effective at capturing populations of associated specialist butterflies and other insects (Panzer and Schwartz 1998; Haddad et al. 2001; Kerr et al. 2001; Shuey 2005). Prairie species of conservation concern are also effectively captured by strategic use of an umbrella species, like the greater prairie-chicken (Tympanuchus cupido pinnatus), which may outperform a strategy of locating the largest prairie patches (Poiani et al. 2001). However, butterfly conservation outcomes in midwestern USA prairies and savannahs managed for ecosystem value primarily through restoring ecological processes (especially fire) have often been poor for specialized butterflies, but significantly better when alternative managements have been employed, especially by applying findings from prior research into management responses of individual species (McCabe 1981; Schlicht and Orwig 1998; Schlicht 2001; Swengel 2001; Swengel and Swengel 2001a, 2007; Powell et al. 2007; Schlicht et al. 2007). For the Karner blue, much better long-term trends occurred on reserves using management modified to help this federally endangered butterfly (e.g., permanent non-fire refugia) than in the landscape at large (including public lands) (Swengel and Swengel 2005a, 2007) or in other states where ecosystem management with fire was the principal management (Grigore and Windus 1994; Lane and Dana 1994). Other invertebrates such as grassland snails (Nekola 2002) and other grassland insects of conservation concern besides butterflies (Swengel 2001) have also fared significantly better with alternatives to fire.

Thomas et al. (2004) found that butterflies declined sooner and more steeply than birds and plants. Thus, declines of butterflies (the best studied of those organisms with characteristics that predispose greater vulnerability, such as low dispersal tendency and short, often synchronized generations; cf. review in Bobo et al. 2006) are a warning of possible declines of other less studied groups, while altering management to be more favorable for butterflies may improve the outcome for other groups, and therefore for the ecosystem.

Even with the vast destruction of prairie (Samson and Knopf 1994), relatively many populations of prairie-specialist butterflies have been documented in western Minnesota prairies in extensive surveys (Coffin and Pfannmuller 1988; the datasets used here). Unfortunately, preserving vegetation and inventorying reserves are not sufficient to safeguard against systematic declines in prairie-specialist butterflies. As Shuey (2005) noted, butterfly species will likely continue to decline unless reserves are designed and managed specifically with consideration for these species.