Levels of contaminants in fish and their health status in pollution gradients are a rich source of information for applied environmental science (WHO 1993; van der Oost et al. 2003; Law et al. 2010). Whereas investigations of sediments and other abiotic compartments may provide helpful information on ecosystem exposure to hazardous substances (Håkanson and Jansson 1983; Förstner 1989; Jonsson 1992; Tarazona et al. 2014; Apler 2021), investigations of fish have the potential to address the actual environmental effects (Södergren 1989; Munkittrick 1992; Sandström et al. 2005). Thus, measurements of contaminant levels in fish and their health status can be used as a complementary tool to assess risks from contaminated sediments on the environment and people's health.

Sediments polluted by metals, organochlorines, and other organic compounds, a legacy from previous unawareness of environmentally hazardous substances, frequently occur outside paper and pulp mills (Pearson and Rosenberg 1976; Håkanson et al. 1988; Jonsson et al. 1993; Kähkönen et al. 1998; Kienle et al. 2013; Hoffman et al. 2017; Apler et al. 2020). The ultimate risk-reducing measure would be removing or capturing the contaminated layers (Peng et al. 2018; Lehoux et al. 2020). However, this is not feasible on a large scale, neither from a technical nor an economic perspective. Direct measures also demand lots of energy, material resources, and areas on land for disposal (Suer et al. 2004). Hence, active measures may be sub-optimal from a sustainability perspective, contradicting other environmental goals, e.g., reducing global warming. The removal of contaminated sediments itself can also be problematic for aquatic life (Baumann and Harshbarger 1998) unless robust protective measures are undertaken. Therefore, society needs tools to distinguish between places where there is a need for risk-reducing efforts to protect aquatic life from areas that, although being contaminated, show small or no signs of being affected and where natural recovery processes occur (Magar and Wenning 2006; Förstner and Apitz 2007; Fuchsman et al. 2014; Fetters et al. 2020).

The environmental effects of emissions from the cellulose industry have, since the 1960s, been an area for environmental research forming the basis for successive mitigation actions to protect aquatic life (Norrström and Karlsson 2015; Ussery et al. 2021). Numerous field studies targeting effects on fish have been conducted in Scandinavia (Sandström et al. 1988; Landner et al. 1994; Förlin et al. 1995; Karels and Oikari 2000), North- and South America (Adams et al. 1992; Munkittrick et al. 1994; McMaster et al. 2006; Chiang et al. 2010; Mower et al. 2011; Barra et al. 2021) and Oceania (Harris et al. 1992; van den Heuvel et al. 2010). For natural reasons, most studies have focused on the effects of present emissions, whereas relatively few studies, with some exceptions (Meriläinen et al. 2001; Hynynen et al. 2004; Arciszewski et al. 2021), have investigated the status of ecological indicators after mill closure. Generally, evident signs of ecosystem recovery have been observed after mill closure or at mills still operating, having improved their environmental performance. However, residual disturbances in terms of eutrophication and toxic effects on fish may persist in some cases (Sandström et al. 2016).

This study conducted field surveys outside ten Swedish paper and pulp mills with well-documented occurrences of polluted fibrous sediments. Some of the mills have been closed for decades, whereas others are still operating, with high demands on environmental performance. The aims were to (1) investigate to what degree the historical emissions from pulping affect today's environmental conditions in the receiving areas concerning pollutant content and health status of the non-migratory, prevalent, and ecologically important fish species Perca fluviatilis, (2) incorporate new data into existing data records to conclude the environmental development and possible success of undertaken remedial actions as well as what stressors that remain (3) to identify weaknesses and evaluate the applied sampling strategy, e.g., are the sample sizes and the number of chemical analyses acceptable from a statistical standpoint? In a broader context, the goal is to develop a cost-effective monitoring protocol using fish as a sentinel to guide risk assessments of contaminated sediments in coastal and limnic ecosystems.

Background

Chemical Contaminants

From around 1940 until the end of the 1960s, mercury (Hg) preparations were added as a pesticide for mucus control and conservation of wet pulp in most Swedish cellulose industries (Jerkeman and Norrström, 2017). Moreover, at some places along Sweden's coast and inland, chlor-alkali plants have been located next to the cellulose industries to produce lye and chlorine gas for boiling and bleaching chemical pulp. In that process, significant Hg emissions to air and water were also generated through residual emissions of graphite sludge. The total discharges of Hg from the Swedish cellulose- and chlor-alkali industries have been around 500 tons (Jerkeman and Norrström, 2017). The usage of Hg was banned in Sweden in 1968. As a result, emissions from the pulp- and paper industry were sharply reduced around 1970.

Traditionally, the potentially hazardous metals such as arsenic, lead, cadmium, copper, chromium, nickel, and zinc are measured in Swedish environmental monitoring. The primary source of these metals within the cellulose industry was, and still is, the raw wood material since the trees take up metals from the soil and are exposed to atmospheric deposition. Metals generally occur in low levels in wastewater from the process, usually not being reduced in the effluent treatment. Therefore, handling large amounts of wood produces significant emissions (Sandström et al. 2016). Moreover, in the past, the production of sulfuric acid by roasting sulfur silica for sulfite pulp production generated pyrite-ash contaminated by metals that may have been released into the biosphere (Baragaño et al. 2020; Apler 2021).

Pollution by persistent organics has also occurred in areas where pulp mill effluents have been discharged. Chlorinated dioxins and furans (PCDD/Fs) were previously inadvertently formed in manufacturing bleached chemical pulp when elemental chlorine was used in the bleaching process (Swanson et al. 1988). Increased closure and the transition to bleaching with chlorine dioxide in the late 1980s and early 1990s significantly reduced emissions of chlorinated substances, and the dioxin formation ceased (Berry et al. 1991; Strömberg et al. 1996). In addition, which also applies to other globally dispersed pollutants, precipitation of long-range air-borne emissions over the forests where the raw material grows can accumulate dioxins in the bark of the trees (Salamova and Hites 2010). Therefore, a possible source of dioxins is run-off from wood storage and debarking. Atmospheric deposition over the catchment area of the mill's raw water supply (Josefsson et al. 2011) and over the effluent-receiving water bodies is another potential source.

Neither of the chlorinated compounds PCBs, DDT, and HCB have been used as a specific auxiliary or additive chemical in the pulping process. However, PCBs have generally had extensive use in society, such as additives in hydraulic and transformer oils and sealants in building structures (Breivik et al. 2002). DDT has been used for insect control at the timber stores in the forest and may have accompanied the raw material into the industrial areas. The use of DDT in Sweden was banned in 1969. HCB was inadvertently formed in the chlor-alkali process. Atmospheric deposition of these compounds over the forests where the raw wood material has grown may also, like PCDD/Fs, have contributed to a presence in the bark (Salamova and Hites 2010).

Fish in Environmental Monitoring

The non-migratory, spring-spawning fish species European perch (Perca fluviatilis) is frequent in Scandinavia's brackish and freshwater ecosystems. Being relatively sedentary (Hansson et al. 2019), perch has for decades been used for environmental monitoring purposes (Sandström et al. 2005). The fall is the period for active gametogenesis of perch and the standard sampling period in the Swedish national monitoring of contaminant levels and health status (SMNH 2012). For contaminant monitoring, perch with a length of 15–20 cm is usually used, whereas the effect monitoring programs often use fish within the length interval of 20–25 cm (SEPA 1997) but also smaller individuals divided into length classes (Sandström and Neuman 2003).

Reflections on long-term records of surveys in ecosystems may provide valuable information for environmental management (Arciszewski et al. 2021; Ussery et al. 2021). An evaluation of about fifty fish health surveys in the Swedish cellulose industry receiving waters conducted between 1985 and 2015 (Sandström et al. 2016) found that there has been a good recovery of fish health in most receiving waters compared to conditions during the 1980s and 1990s. In general, biochemical health measures had responded well to process improvements in the mills. In contrast, morphometric measures seem to have had a longer response time, and residual effects on mainly reproductive measures were demonstrated in some receiving waters in recent years. This, combined with experience from the Canadian EEM program (Lowell et al. 2005) and the practical benefits of focusing on morphological measures, guided the final choice of biomarkers.

Materials and Methods

Study Areas

The survey included areas around ten Swedish mills adjacent to coastal areas, large lakes, and a small forest lake (Fig. 1; Table 1). The production has ceased at Hallstanäs in 1976, Kramfors in 1977, and Norrsundet in 2008, while the others are still active. However, the production processes may have changed considerably since the emissions of pollutants took place, in most cases going back to the 1970s and earlier. Some mills, especially Norrsundet, have a data record of regularly recurring surveys since the 1980s, while others have only been surveyed sporadically. The principle for selecting sites within each area, following previous Swedish sampling strategies (SEPA 1997), is shown in Fig. 2. A sampling point named "near" (near receiving site) was placed near the mill's discharge point and identified sediment contaminants. Within 5–10 km from the mill but still within reach of the wastewater plume, another sampling point, called "remote" (remote receiving site), was placed. Upstream, or at a sufficiently large distance from the mill to be considered unaffected by the industrial emission, a sampling point designated "ref." (reference site) was placed. In Waldetoft et al. (2020), detailed maps from each study site are available. Results from seven other Swedish lake- and coastal areas sampled simultaneously (Fig. 1) were also included for comparison.

Fig. 1
figure 1

Mill study areas and comparison areas

Table 1 Production and characteristics of the receiving water areas at the study sites
Fig. 2
figure 2

Examples illustrating the concept of near and remote receiving areas and reference areas at Norrsundet (upper) and Grycksbo (lower)

Field Sampling and Dissection

The fieldwork was carried out from 2017 to 2019. Sampling of fish, length 15–20 cm, took place from late August to early October. Fish were captured using gill nets (18–25 mm mesh size) set overnight. After the sampling, the fish was frozen on-site and transported to IVL Swedish Environmental Research Institute's laboratory in Stockholm. In total, 878 individuals were sampled and examined from the ten study areas (Table 2).

Table 2 Descriptive statistics (mean ± standard error) of sampled fish. Red = near receiving site, yellow = remote receiving site, green = reference site. The number of sexually mature individuals in parenthesis

For each individual, the condition factor (CF: 100 × weight (g)/length3 (cm)), gonadosomatic index (GSI: 100 × gonad weight (g)/somatic weight (g)) and liversomatic index (LSI: 100 × liver weight (g)/somatic weight (g)) were calculated. In addition, an assessment of growth (cm/year) was made based on back-calculated length (Sandström et al. 1988). Also, the proportion of sexually mature individuals (SM) was calculated. Individuals with GSI ≥ 1 were considered sexually mature. After dissection, three pooled muscle and liver tissue samples were prepared from each fishing site. Each pooled sample contained an equal amount of tissue (about 100 g) from four to ten, most often eight, fish individuals.

Chemical Analyses

Standard analytical methods were applied; for Hg, US EPA 1631, other trace metals, SS-EN 13,805:2014, PCDD/F, US EPA 1613, PCB, DDT, and HCB, SS-EN 16,167:2018, EN ISO 6468:1996. Organochlorines and Hg were analyzed in muscle tissue, whereas the other trace metals were analyzed in liver tissue. Detection levels are shown in Table 3.

Table 3 Applied methodology

Evaluation

A linear mixed effect model was used to assess differences in mercury and cadmium levels between near receiving, remote receiving, and reference sites for all mills at once and between different environmental types (i.e., coastal, estuary, and lake (Table 1)). The contaminant was the response variable. The type of site, environmental type, and the interaction between these were the explanatory variables. The random effects were to which mill and to which site an observation belongs. Tukey's pairwise comparison was used as a post hoc test.

One-way ANOVA was used for evaluating PCB levels within sites. A sample size of three samples per site is small, but on the other hand, variances are reduced when pooling samples. A simulation showed that ANOVA on pooled data, compared to individual data, leads to inflated p-values, but differences of ecological relevance can be detected. Thus, it was decided to conduct and report the results of the tests.

Using linear regression, statistical evaluation of the CF, LSI, GSI, and growth at age was performed for each mill. Each morphometric index was the response variable. The explanatory variables were site (near receiving/remote receiving/reference) and gender (male/female). Gender was included as a control variable to account for differences between the sexes. For the analysis of GSI, only sexually mature individuals were included. The regression for growth at age was performed on log-transformed values with the logarithmized age of the fish as a control variable. Dunnett's test was used as a post hoc test.

The test for differences in sexual maturity (SM) was performed using a logistic regression model with gender status (sexually mature/not sexually mature) as the response variable and site (near receiving/remote receiving/reference) and length of the fish (cm) as explanatory variables. All analyses were performed in R (R Core Team 2020), and an example code is available in Waldetoft et al. (2020).

In addition to evaluating statistical significance, the concept of critical effect size (CES) was used (Munkittrick et al. 2009). The idea was developed in the Canadian EEM program for CF, LSI, and GSI (Lowell et al. 2003). CES acts as an identifier of a minimum difference between receiving water and reference sites regarded as potentially unacceptable. CES for CF, LSI, and GSI is  ± 10%, ± 25%, and ± 25%, respectively. For example, if LSI in a receiving water site is more than 25% larger than in the reference site, CES is said to be exceeded. In the Canadian EEM program, if CES is exceeded at a mill for two consecutive surveys, it acts as an identifier that either more focused monitoring to assess the magnitude of the effect or an investigation of the cause of the effect is needed.

A meta-analysis was performed to assess whether overall response patterns were present for any morphological index. The model used was a meta-analytic mixed effects model (e.g., Raudenbush 2009). A two-stage approach was used (Burke et al. 2017), in which coefficients and variance–covariance matrices from regressions on each mill were used in the meta-analytic mixed effects model. The meta-analytic model was fitted using the metafor-package in R (Viechtbauer 2010). The fixed effect was the model coefficients, and each coefficient was allowed to vary randomly for each mill by specifying the random effects as random intercept and slope.

The time trends of Hg and PCDD/F levels in fish were assessed statistically using linear regression with the natural logarithm of Hg content as the response variable and sampling year as the explanatory variable.

Areal and Temporal Comparisons, Normalization

Parallel with the fish surveys in the cellulose industry receiving waters 2017–2019, IVL Swedish Environmental Research Institute has conducted fish surveys using the same methodology in other inland and coastal waters in Sweden to address contaminant levels. Data from these surveys have been used for comparisons (denoted in figures as "comp. areas"). In addition, data from previously published articles, reports from regional environmental monitoring, and environmental assessments performed according to the permitting processes for individual mills (Södergren 1989; Lundgren et al. 1991; Heinemo 2001; Olsson et al. 2005; Karlsson and Malmaeus; 2012; Sandström et al. 2016; SEPA 2022) were used to evaluate time trends in pollutant levels.

Due to bioaccumulation, Hg levels in fish are correlated to fish age and size (Olsson 1976). Therefore, to improve the comparability of Hg content among fish of different sizes and sites in space and time, observed Hg levels were standardized to correspond to a 300-g perch based on an empirically supported transfer function (Åkerblom et al. 2014).

Most organochlorines are lipophilic and therefore correlated with the fish's lipid content. To enable comparisons between different species and different parts of the fish, levels of lipophilic substances in the European Union are normalized to 5% lipid content (European Commission, 2014). Perch is a lean fish where lipid content typically varies between 0.5 and 1%. In this study, a typical value for the perch lipid content was set to 0.8%, and measured levels were normalized to 5% lipid content, i.e., multiplied by a factor of 6.25. Toxic equivalence (TEQ) for PCDD/F and dioxin-like PCBs (dl-PCB) was calculated using congener-specific toxic equivalence factors (TEFs) reported by the World Health Organization (van den Berg et al. 2006). When the PCDD/F levels were below detection limits for all congeners, the average medium bound for all samples with no detectable PCDD/Fs was used to calculate TEQ. An overall summary of the applied methodology is given in Table 3.

Results and Discussion

Levels of Contaminants

Metals

Concerning Hg levels, differences between sites are found at specific mills (Fig. 3). In two areas, Hallstanäs and Lake Grycken, Hg levels exceeded the limit value for fish marketing within the EU. There are well-documented fiber banks with high Hg content in both areas. It is also noteworthy that in the Grycksbo system, the levels in the upstream reference lake are approximately at the marketing limit. However, there is no statistical evidence for overall higher Hg in perch from near and remote receiving sites in relation to reference sites for each environmental type (Fig. 4a).

Fig. 3
figure 3

Levels of Hg in pooled samples of 5–10 perch individuals (15–20 cm length) caught between 2017 and 2019. Error bars show 1.96*SE (SE = standard error). Stars indicate significant differences toward reference (5% significance level). Red line at 0.5 mg/kg ww = marketing limit value within the European Union (EC 1881/2006)

Fig. 4
figure 4

Average Hg (upper)) and Cd (lower) levels in perch from near receiving, remote receiving, and reference sites categorized in the environmental types Coastal, Estuary, and Lake. The same letter indicates a non-significant difference (5% significance level). Values were log-transformed before analysis

The general trend in fish from Swedish waters is that Hg levels have decreased over the 50 years of environmental monitoring (Johnels et al. 1967; Lindeström, 2001; Åkerblom et al. 2014). Figure 5 shows time trends of Hg levels in fish from study sites where such data were available. At Östrand, a significant decreasing time trend was found (p < 0.05). At Grycksbo, the time trend was non-significant (p > 0.05). Hallstanäs and Iggesund were not examined statistically due to small sample sizes.

Fig. 5
figure 5

Time trends of Hg levels in perch in the receiving waters of four mills per 5-year period from 1965 to 2019. Historical data from regional environmental monitoring programs. Error bars indicate standard deviation (SD)

Decreasing Hg levels in fish over time have also been reported in studies of the Great Lakes of North America (Blukacz-Richards et al. 2017) and other inland and coastal waters of the USA, Canada, and Australia (Munthe et al. 2007). However, recent studies have also indicated increasing levels (Miller et al. 2013; Gandhi et al., 2014). The reason is unclear, but confounding factors may be increased long-range atmospheric Hg depositions and changes in land use around the sampled water bodies. In summary, a trend toward declining Hg levels is clear but not unambiguous. Environmental factors affecting methylation, bio-dilution, uptake, sediment burial, and how the Hg was released (inorganic form or as directly bioavailable phenylmercury) can explain some of the differences. Plausible explanatory factors for the observations (Fig. 3) are, e.g., the lakes in the Grycksbo system, are low-productivity humic lakes, which is a general risk factor for elevated Hg levels (Lindqvist et al. 1991; Sonesten 2003). At Hallstanäs, Hg was discharged in a directly bioavailable form (Heinemo 2001). In the Aspa receiving area (Lake Vättern), high levels of zinc counteract the uptake of Hg (Lindeström et al. 2002).

Cadmium (Cd) is an example of a substance whose bioavailability, to a large extent, is controlled by water chemistry, e.g., salinity and the presence of chloride ions (WHO 1992). The Cd levels in the investigated freshwater lakes around Grycksbo, Gruvön, and Aspa and the low-saline estuary areas in the vicinity of Obbola, Väja, Hallstanäs, and Kramfors were significantly higher than those in the more brackish coastal waters outside Östrand, Iggesund, and Norrsundet (Fig. 4b). In contrast to the differences between the environmental types, there were no significant differences between the different site types (near, remote, and reference). Cd is thus an example of a substance where comparisons between areas must be made with caution, where it is of utmost importance to have reference areas of the same environmental type for comparison, and where levels in fish in absolute numbers cannot be used for risk assessment of contaminated sediments.

In contrast, other metals, e.g., zinc (Zn), show less variation both within and between the study areas (Fig. 6). Zn is an essential substance for all organisms, which means that fish have a more prominent ability to regulate the metal themselves. Thus, Zn is an example of a substance where measured levels in fish cannot be used for risk assessments of contaminated sediments.

Fig. 6
figure 6

Zn levels in perch from the study sites. Error bars show standard errors (SE). Stars indicate a significant difference toward reference (5% significance level)

Raw data for measured levels of the other trace metals (As, Cr, Cu, Ni, Pb) are presented in Appendix A and evaluated in Waldetoft et al. (2020). Experimental studies have indicated that metals bound to fiber sediments generally have low bioavailability (Apler et al. 2018; Frogner-Kockum et al. 2020). Environmental factors, e.g., water chemical properties and mineralizations in the catchment area bedrock, can play a more significant role than the exposures to locally polluted sediments for the metal content in fish (Björnberg et al. 1988; Förstner and Wittmann, 1981).

Organochlorines

In most cases, the PCDD/F levels were close to the detection limit of the analysis method (0.05–0.1 pg TEQ/g w.w., Fig. 7). This suggests, keeping in mind that a result below a relatively low detection limit also carries information, that it would be even more informative in the future, as reported in Dahmer et al. (2015), to measure PCDD/F levels in the liver that is a more fat-rich tissue. However, levels above detection limits were noted in a couple of cases, e.g., Lake Grycken. Underlying environmental factors like low bioproduction (Sandström et al. 2015) and, thereby, weak biodilution, limited sediment growth, and slow water turnover (Table 1) probably contributed. The theoretical exchange time of water in Lake Grycken is about 2 months compared to typically a few days in the coastal areas of the Baltic Sea (Bryhn et al. 2017). In a water area with slow water exchange, the effect of molecular diffusion from the sediments is more significant compared to an area with rapid water exchange when everything else is constant (Håkanson 1999).

Fig. 7
figure 7

Lipid-normalized PCDD/F levels in perch from the study sites. Red line at 3.5 pg/g w.w. corresponds to marketing limit value within the European Union (EC 1881/2006)

In the receiving area of the Norrsundet mill, there was a declining gradient with slightly elevated levels at the effluent discharge receiving sites. However, compared with historical data, the dioxin levels in perch have decreased (p < 0.05 for slope coefficient) (Fig. 8). The development of concentrations in fish follows relatively well the reduction of emissions of chlorinated substance (AOX), which can be linked to process changes and environmental protection measures taken. At the Norrsundet mill, the transition to ECF (Elemental Chlorine Free) bleaching occurred in 1994. In 2008, production at the Norrsundet mill ceased. It is yet possible to detect dioxin levels that are slightly higher than background premises, indicating that a specific, albeit small, bio-uptake occurs from the sediments either through molecular diffusion of dissolved dioxins into the organism or ingested via the food. Another possible contributing source of PCDD/Fs in the area was a sawmill that, until 1978, used chlorophenols to impregnate wood. The picture of declining PCDD/F levels in fish outside the Norrsundet mill is consistent with observations outside North American pulp mills (Pryke et al. 1995; Hagen et al. 1997; Pryke and Barden 2006; Dahmer et al. 2015). It is gratifying to note the apparent decline in dioxin levels. The last remnants of a vital cellulose industry environmental issue discussed for almost 50 years seem to be ending.

Fig. 8
figure 8

Left, time trends of AOX emissions from Norrsundet mill, data from Norrström and Karlsson (2015) (1980–1990), and environmental performance reporting from the mill (1994–2008). Right, PCDD/F levels in perch from the effluent receiving area. Data from Södergren 1989; Olsson et al. 2005; Karlsson and Malmaeus 2012 and actual survey

However, elevated levels of PCDD/Fs in pelagic fatty fish of the Baltic Sea, like herring (Clupea harengus) and salmon (Salmo salar), still is a severe environmental problem. Levels often exceed the EU marketing limits, and specific population groups (women of childbearing age and children) are advised by food safety authorities to limit their fish consumption. The results presented in this study show that temporal and spatial environmental monitoring can contribute to decision-making. Most pulp mills' receiving water areas in the coastal zone show low levels in non-migratory perch exposed to the legacy of the previous PCDD/F contamination. This indicates that measures against contaminated sediments outside pulp mills would be of minor importance for the remediation of the Baltic Sea's current dioxin problem. Instead, elevated levels in pelagic fish are primarily driven by atmospheric precipitation (Armitage et al. 2009; Assefa et al. 2019; SEPA 2021), similar to what has also been found in the North American Great Lakes (Pearson et al. 1998; Dahmer et al. 2015).

Significantly higher PCB levels in the near receiving areas were found at several mills (Obbola, Väja, Hallstanäs, Kramfors, Östrand, and Grycksbo, Fig. 9), probably reflecting the use of PCB oils in the infrastructure of the mills. A similar pattern as for Hg and PCDD/Fs, with the highest levels in the Grycksbo receiving water, was also noted for PCBs. The high levels in this area are likely due to the receiving water's characteristics, e.g., slow water turnover and low biodilution, rather than an exceptionally high load of PCBs. Except for Obbola and Östrand, most mills are situated near small municipalities (population < 5000). PCBs were widely used within society before it was banned in the 1970s. Not surprisingly, we found the highest PCB levels in the waters outside Stockholm, the largest city in Sweden (population approx. 1 million).

Fig. 9
figure 9

Lipid-normalized ∑PCB6 levels in perch from the study sites. Error bars show standard errors (SE). Stars indicate a significant difference toward reference (5% significance level). Red line at 75 ng/g w.w. corresponds to marketing limit value within the European Union (EC 1881/2006)

Dahlberg et al. (2020; 2021) have conducted detailed investigations of PCBs in one of our study areas, the receiving waters of the Väja mill. The fiber-rich sediments in this area contain moderately elevated PCB levels (~ 25 ng ∑PCB7/g d.w.). Measurements in benthic biota were interpreted as signs of bioaccumulation and biomagnification. Dahlberg et al. (2021) conclude that quantifying dispersal routes is essential for a proper risk assessment and risk management of contaminated sediments. The results presented in this study can be looked upon as an integrated quantification of the dispersal, showing (Fig. 9) that the spread and uptake of PCBs in the food web in the Väja area lead to slightly elevated levels of PCBs at the higher trophic level that predatory perch represents. Whether this is a risk that justifies measures or is a sign of an acceptable environmental situation for an area affected by industrial emissions for over a 100 years is not a scientific but a political question.

Lipid content normalized levels of all measured organochlorines are summarized in Appendix B. HCB and DDT generally showed weak or non-existing signs of bio-uptake in fish. However, in the receiving water of the Östrand mill, an elevated content of HCB (3 µg/kg w.w.) was measured, which can be linked to previous operations at a chlor-alkali plant. Compared to other studies from HCB-contaminated areas outside chlor-alkali plants, the levels were not remarkably high (Hinck et al. 2009; Huertas et al. 2016).

The DDT levels were elevated in the receiving area of the Grycksbo mill compared to other sites. However, the content was highest in the upstream reference lake Tansen (8 µg/kg w.w.). After investigation, it turned out that adjacent to this lake, a wool factory operated in the 1950s and 1960s, and blankets were impregnated with DDT. Thus, this is an excellent example of when fish can function as a sentinel to detect unknown hazards in the aquatic environment.

As discussed by Bignert et al. (2014), the relationship between chemical analytical error and other sources of variation, as well as the cost for collection, preparation of samples, and chemical analysis, will determine the number of individuals in each pool and the number of pools that should be analyzed to achieve high cost efficiency and good statistical power. When using pooled samples instead of analyzing individual fish, information at the individual level is, by definition, lost. This is not optimal since the underlying distribution and possible outliers are masked. Statistical comparisons of data from pooled samples also rely on randomly assigning individuals to a sample, a requirement that must not be overlooked. Despite these drawbacks, in our opinion, statistical comparison with acceptable precision that fulfill the criteria of an operative environmental monitoring program can be made.

Health Status

Observations

The meta-analysis of the morphometric measures CF, LSI, and growth at age gave no indications of an overall response pattern (p > 0.05) (Fig. 10). Intervals covering zero indicate no overall response pattern, whereas intervals to the left of zero indicate a reduction in that index. Vice versa, intervals to the right indicate an increase. All intervals cover zero in this case. GSI and SM were excluded from the meta-analysis since too few sexually mature individuals were obtained at most sites. The only two mills regarded as having a sufficient sample size for evaluation of GSI and SM were Väja (all sites) and Grycksbo (Lake Grycken and Lake Tansen, not Lake Varpan). Future studies with larger sample sizes for GSI and SM will shed more light on the overall response pattern for these indexes. Summary data of morphological indexes are presented in Appendix C.

Fig. 10
figure 10

Results from a meta-analysis of CF, LSI, and growth at age in perch collected between 2017 and 2019, comparing near receiving sites with reference sites. Points mark estimated average responses, and intervals mark 95% confidence intervals for estimated averages. Intervals covering zero indicate no significant overall difference between perch from near receiving sites compared to perch from reference sites. GSI and SM were not considered due to low sample sizes for these indexes

Evaluation

Outside Norrsundet and Iggesund, there are data records from earlier health surveys (Sandström et al. 2016). In the 1980s, several biomarkers (LSI, GSI, CF, specific blood parameters, skeletal deformations) indicated apparent health effects (Andersson et al. 1988; Sandström et al. 1988; Förlin et al. 1995). During the 1990s, the effect pattern was still clear but less pronounced (Sandström et al. 1997; Sandström and Neuman; 2003). From 2000 and onwards, decreasing but, in some cases, still, significant deviations in variables addressing reproduction and condition have been observed outside Norrsundet but not Iggesund (Sandström et al. 2016). During this time, several important protective measures were undertaken at the mills to reduce effluent toxicity (Norrström and Karlsson 2015). Process optimizations inside the mills, including improved stock washing, oxygen delignification, and handling of spill and condensate, have likely been the most important measures (Sandström et al. 2016). Clearly, the fish health has responded to the mitigative actions, but some deviations may persist. This is consistent with observations from Canada (Arciszewski et al. 2021; Ussery et al. 2021).

CF and LSI showed an effluent-associated increase in the Canadian EEM program (Lowell et al. 2003) but not in this study. In Canada, weight at age was generally increased, but growth at age (which is highly correlated with weight at age) showed no overall response pattern. The opposite, with a smaller LSI in the receiving areas, has also been observed and interpreted as a result of food limitation or residual habitat damage (Arciszewski et al. 2015). When comparing the statistically significant endpoints at each mill with the respective CES, significances exceeding CES were present only for LSI (Table 4). However, they were not in a direction that corresponds with the average response pattern for metabolic disruption in European perch noted by Sandström et al. (2005) nor white sucker (Catostomus commersonii) frequently used as sentinel species in Canadian surveys (McMaster et al. 2006; Ussery et al. 2021). For CF, no exceedances of CES were present. For growth at age, statistically significant differences (CES not derived) were found at Norrsundet, Gruvön, and Aspa, but only for the near receiving vs. reference, not the remote vs. reference. Outside the Norrsundet and Gruvön mills, perch had increased growth, while the opposite was found outside the Aspa mill (Table 4).

Table 4 Significant differences in biomarkers for health status. In bold indicates exceedance of CES. Plus sign indicates a higher value in near/remote receiving areas than reference sites; minus sign indicates lower in near/remote than in reference

In summary, CES was not exceeded at most mills for CF and LSI. The meta-analysis showed no indication of overall response patterns. The results point toward that Swedish paper and pulp mills generally do not negatively affect these endpoints, which is an improvement compared to historical conditions (Sandström et al. 2016).

As mentioned, for most mills, the sample sizes for the assessment of GSI and SM were unsatisfactory since few individuals had reached sexual maturity. To solve the problem of too few sexually mature individuals between 15 and 20 cm, it is suggested that fish of 20–25 cm are caught and included in the analysis as well. In this larger length span, a higher proportion of the individuals will be sexually mature, leading to increased sample sizes for comparisons of GSI. This addition to the methodology was successfully tested in the fall of 2020 outside a metal ore smelter in northern Sweden (Waldetoft et al. 2021).

Establishing good reference areas is essential to assess health conditions and reproductive capacity. In many cases, this is not trivial and needs to be carefully considered in the planning phase for future investigations. The assessments' reliability also increases if more than one reference area is established. Two main types of study designs for fish-health surveys occur, control-impact and gradient designs (Munkittrick et al. 2009). Choosing a reference area is a challenge for control-impact studies. Ideally, a reference site would be located upstream, in a similar habitat, free of confounding influences, with a natural barrier limiting movement between sites. Unfortunately, this situation is seldom fully achieved in coastal areas or large lakes. Therefore, it may be appropriate to initially work with two reference areas and study their inter-variability.

Statistical Considerations

To make a monitoring program cost-effective, it is necessary to limit the scope without compromising critical aspects of ecological field studies, e.g., the potential impacts of confounding factors, the ecological relevance of endpoints used, the influences of natural variability, concerns over statistical design issues, and possible genetic influences on species characteristics (Munkittrick 2009). One crucial parameter is the number of fish that need to be collected to distinguish any differences in health between areas with reasonable statistical certainty. For example, Munkittrick (1992) found insignificant improvement in white sucker variance and mean estimates of length and weight with a sample size above 16 individuals per site. Based on the data collected for this survey, a power analysis was conducted to find a sample size that yields sufficient power for the statistical tests. Calculations were focused on GSI since this variable is evaluated only for sexually mature individuals and thus acts as a bottleneck regarding sample sizes. Calculations were made for two cases based on GSI for sexually mature females in the Väja mill reference site and the Grycksbo mill reference site, the sites with the largest number of mature females, where 36 and 26 sexually mature females were caught, respectively (Table 2). From these sites, variance estimates were calculated. The required statistical power was set to 80%, and the numerical difference in mean GSI between a receiving water site and a reference site was set to 25% of GSI in the reference. The model was a one-way ANOVA with three groups: receiving water near, receiving water remote, and reference site. The remote receiving site mean GSI was set to the average of the reference and near receiving area. Results gave that between 16 and 31 sexually mature females are required. However, it should be mentioned that the standard approach of using a 5% significance level (risk of type I error) and a 20% chance of making a type II error (80% power) might not always be optimal in the case of impact assessment or environmental monitoring. It could be the case that making a Type II error is more costly than making a Type I error (Peterman and M'Gonigle 1992). In such cases, it could be an option to use statistical power higher than 80%.

Concerning the significance level used for assessing the health status of fish, the Canadian EEM approach and the approach presented in this study use a linear model (e.g., linear regression, ANOVA, or ANCOVA) to investigate site-specific differences. This ensures that the specified significance level is maintained throughout the analysis. An alternative, used in several Swedish surveys (Andersson et al. 1988; Sandström and Neuman 2003), is to use separate t-tests between sites, genders, and length classes. However, this leads to many t-tests being performed, each with α = 0.05. The consequence is that the overall α is larger than 0.05, meaning an increased risk of falsely rejecting the null hypothesis in favor of the alternative hypothesis and, thus, the risk of drawing a false conclusion. Therefore, using a linear model followed by a suitable post hoc test is preferable from a statistical standpoint.

Conclusions and Prospects

The overall picture is that the levels of examined pollutants, with some exceptions, were not noticeably higher in the receiving waters investigated, neither in relation to nearby reference areas nor comparison areas from Swedish inland and coastal waters. Based on comparisons with historical data, the trend regarding levels of contaminants in fish in the cellulose industry's receiving waters is generally decreasing. Regarding fish health, with reservation for the reproductive variables where the sample sizes in most cases were too small, the overall picture is that fish health is not impaired in the receiving waters compared to the reference areas.

After the modifications discussed,

The method tested in the project should have the potential to become a relevant and cost-effective part of industries' ongoing environmental monitoring to have a follow-up and control over historical emissions to sediments over time. Studies of the kind carried out may also improve ecological understanding and guide decision-makers on possible remedial measures connected with contaminated sediments. In cases where it is judged that natural recovery is appropriate, i.e., where no physical measures are performed to remove or limit the impact from sediment contaminants, fish surveys should be an effective way of monitoring whether the recovery follows the expected course.

The case study areas have been receiving waters outside cellulose industries. The methodology has also been successfully tested outside a metal smelter (Waldetoft et al. 2021). It can likely be applied generally in aquatic ecosystems where a historical load of metals and persistent organic compounds have contaminated the sediments. The part of the monitoring program that pertains to fish health is not limited to areas with sediment pollution but could also be used to assess ongoing emissions outside industries, municipal treatment plants, or other point sources. The presented survey strategy is similar to the Canadian EEM program, successfully applied nationally to evaluate the cellulose and mining industries. We consider that this should be the way forward also in the Swedish environmental monitoring programs outside industries and look forward to making more international comparisons between results in the future, expanding the overall knowledge about the impact of contaminants on aquatic life.