Adaptation and Validation of a Sentinel-Based Chlorophyll-a Retrieval Software for the Central European Freshwater Lake, Balaton

The importance of lakes and reservoirs leads to the high need for monitoring lake water quality both at local and global scales. Remote sensing is a rapidly evolving, versatile technology that can be successfully applied in several economic and scientific fields. Numerous studies demonstrate the applicability of satellites in algae detection and monitoring. Algae play an essential role in aquatic ecosystems, although their overgrowth poses a serious risk. Overgrowth of algae, also known as algal bloom, has serious ecological, social, economic and health effects. The research area of our study was Lake Balaton, the largest lake in Central Europe. The aim was to find the most appropriate algorithm(s) for the inland lake to define the chlorophyll-a amount. In addition, two new algorithms were developed based on the reflectance values of the satellite image. The results show that the two highest correlations were performed by the newly validated, blue/green ratio-based algorithm, Chl-aB/G (r = 0.93) and the chl_re_oc2 algorithm (r = 0.86) of the Acolite software. Although the Acolite software was not developed for lakes but for marine waters, it is also applicable to inland waters.


Introduction
Stored in lakes, wetlands and rivers, surface freshwater is the most significant resource sustaining the environment and the lives of organisms. Lakes and reservoirs account for 87% of inland waters and cover about 1.9 million km² of the global land surface area (Hong et al. 2016). In water bodies, algae play a fundamental role as the primary producers in the food chain and, being the primary carbon-fixing organisms, as essential components of biogeochemical cycles. The saturation of light and oxygen in aquatic communities is highly dependent on the amount of algae (Round 1984). Aquatic environments are rarely characterised by a single algal class, as a community usually consists of several algal classes (Sathyendranath et al. 2014). Phytoplankton groups change in time and space within a water body since they are highly sensitive to environmental changes. The seasonality and phenology of blooms shift as a consequence of changing temperature, light conditions, nutrient availability and turbulence level (Palmer et al. 2015a; Aiken et al. 2008). Overgrowth of algae, also known as algal bloom, may induce serious ecological, social, economic and health effects, which underlines its importance. Bloom effects may be direct or indirect, including the effects of toxins, lack of oxygen and light, effects on aquatic flora and fauna, and the reduction of submerged plants when plankton biomass becomes high. The lack of oxygen and light in the water may even damage the whole aquatic ecosystem (Havens 2008; Bao et al. 2015). The significant socioeconomic and ecological costs of blooming events affect human health, ecology, economics, tourism, drinking water supply, fisheries, agriculture and food chain resilience (Carmichael and Boyer 2016).
Based on the above, regular monitoring of algal abundance is extremely important. Chlorophyll-a is the main pigment in phytoplankton and indicates the trophic status of water (Ansper and Alikas 2019). Traditionally, chlorophyll-a amount has been determined with ship-based approaches relying on field sampling and laboratory analysis. However, these methods are labour intensive and costly, and do not provide synoptic views of bloom conditions (Caballero et al. 2020). Over the past decades, satellite remote sensing has substantially advanced knowledge of terrestrial phenology (e.g. MacBean et al. 2015; Asner et al. 2000), and has increasingly been applied to marine (e.g. Arun et al. 2015) and inland water (e.g. Blix et al. 2018) settings using retrievals of chlorophyll-a, a common proxy for phytoplankton biomass. The dynamics, productivity and optical complexity inherent to lakes and other inland waters pose many challenges to the accurate retrieval of biogeochemical parameters using satellite remote sensing (Palmer 2015b). The Sentinel-2 platform offers greatly improved spatial resolution over other satellite platforms designed for water-based chlorophyll-a retrievals; furthermore, it includes a "red-edge" band at 704 nm that is not present on the Landsat 8 Operational Land Imager (Bramich et al. 2021). To make use of this, it is necessary to analyse the advantages and potential limitations of existing chlorophyll-a algorithms. Efforts have been made to find appropriate sensors, but none were specifically designed for inland waters. However, since the launch of Sentinel-2 in 2015, MSI (Sentinel Multispectral Imager) has appeared to be a suitable sensor for chlorophyll-a estimation due to its overall radiometric quality and temporal resolution (Li et al. 2021).

Study Site
The study site was Lake Balaton, the largest freshwater lake in Central Europe, lying in the western part of Hungary. The lake is characterised by a large surface area (596 km²) combined with a shallow mean depth (3.25 m). The bed of Lake Balaton is 2-3 m deep at the north coast and gradually drops towards the south (Sathyendranath et al. 2014; Farkas et al. 2020; Pelevin et al. 2017). It is a polymictic inland lake, so it does not undergo permanent or seasonal stratification (Istvánovics et al. 2007). Half of the annual inflow enters in the south-west from the River Zala, which drains a catchment area of intensively cultivated land. This area has historically been one of the main sources of nutrients in the lake. These factors, combined with the continental climate, make the lake vulnerable to eutrophication (Sváb et al. 2005; Farkas et al. 2020).
A workflow to illustrate the process is shown in Fig. 1.

Data Acquisition
During summer 2020 (6, 8, 24, 25 and 27 June; 17, 19 and 26 July; 19 August), 33 water samples (250 ml each) were collected and refrigerated (4 °C, 1-5 days) until laboratory analysis to avoid the loss of chlorophyll-a. The chlorophyll-a content was measured with a Jenway 6400 spectrometer. The chlorophyll-a concentration was determined by the Felföldy method (Felföldy 1963). The accuracy of the laboratory measurement is ± 1-1.5 μg/l. During the process, the samples were pressed through a suitable filter, the pigments were dissolved in boiling methyl alcohol, and their amount was measured with a spectrophotometer. The GPS coordinates of the sampling points were entered into the QGIS Zoom to Coordinate plugin, so that the satellite-derived chlorophyll-a value could be read clearly at each sampling point. The sampling points can be seen in Fig. 2. Sentinel-2 satellite images were obtained to determine chlorophyll-a amount. The image sources were the USGS EarthExplorer (USGS 2021) and Copernicus Open Access Hub (Copernicus 2021) databases, where the target area and time interval were selected. The data were Level-1C products, that is, top-of-atmosphere (TOA) reflectance products that include radiometric and geometric corrections along with orthorectification to generate highly accurate geolocated outcomes. Although it is possible to set the maximum cloud coverage in both databases, the appropriate products were selected based on individual judgement. The reason is that an image with high overall cloud cover may still show the target area clearly, whereas even low cloud cover may obscure the target area. The obtained products were processed with the Acolite software. Acolite is a processor for Landsat (5/7/8) and Sentinel-2 (A/B) imagery developed by the Royal Belgian Institute of Natural Sciences (RBINS). It is coded in Python and was developed for water investigation based on satellite images.
The software is available on the website of the Operational Directorate Natural Environment scientific web sites and applications (acolite 2021). Among other functions, it determines chlorophyll-a amount with different algorithms and, by default, performs atmospheric correction using the "dark spectrum fitting" approach (Vanhellemont and Ruddick 2018; Vanhellemont 2019). Based on the selected algorithm and the given coordinates, the software produces a map from the input file in which each pixel value shows the amount of chlorophyll-a. In the case of Sentinel-2 data, four algorithms are available for chl-a retrieval in Acolite: chl_oc2, chl_re_gons, chl_re_moses3b and chl_re_mishra. The chlorophyll-a concentration is given in µg/l. Details of the applied algorithms can be seen in Table 1.
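As an illustration of the kind of index behind these products, Table 1 describes chl_re_mishra as a Normalised Difference Chlorophyll Index built from the red (band 4) and red edge (band 5) reflectance. The following is a minimal sketch of that normalised difference only, not of Acolite's implementation: the function name and the reflectance values are hypothetical, and the conversion from index to concentration (µg/l) is deliberately left out because its coefficients are not given here.

```python
def ndci(red, red_edge):
    """Normalised difference of red-edge and red reflectance.

    Returns NaN when both reflectances are zero (index undefined).
    """
    denominator = red_edge + red
    if denominator == 0:
        return float("nan")
    return (red_edge - red) / denominator


# Hypothetical (red, red edge) reflectance pairs for three pixels.
pixels = [(0.020, 0.020), (0.018, 0.024), (0.015, 0.030)]

for red, red_edge in pixels:
    # A larger red-edge signal relative to red gives a larger index.
    print(red, red_edge, round(ndci(red, red_edge), 3))
```

The index is bounded between -1 and 1, which is one reason normalised differences are often preferred over plain ratios when reflectance values are small.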
The results were visualised in QGIS Desktop 3.14.15 (QGIS 2020). To visualise the data, four classes were created with different chlorophyll-a ranges (0-8; 8-25; 25-75; > 75 µg/l). The four classes were defined by the OECD (Organisation for Economic Co-operation and Development) (Ryding et al. 1994). A script was written by the Water Management Research Group of the Budapest University of Technology and Economics in RStudio 4.0.0 (R Core Team 2020) to extract the reflectance values in all Sentinel-2 bands at a given point of interest. Based on these reflectance data, new algorithms were developed to obtain chlorophyll-a data, using the blue (band 2)/green (band 3) ratio (chl_extBG) and the NIR (band 8)/red (band 4) ratio (chl_extNIRR). These ratios have proved useful for estimating chlorophyll-a amount in previous studies (Zeng et al. 2016; Gordon and Morel 2012; Ha et al. 2017; Han and Rundquist 1997). The algorithms based on the blue/green and NIR/red reflectance ratios are the following:

$$\text{Chl-a}_{B/G} = \frac{\text{blue}}{\text{green}} \qquad (1)$$

$$\text{Chl-a}_{NIR/RED} = \frac{\text{NIR}}{\text{red}} \qquad (2)$$

To investigate seasonal trends and differences between parts of Lake Balaton, zonal statistics were applied in QGIS. The Zonal statistics plugin is a core plugin in QGIS that allows the user to calculate statistics on the pixels of a raster band that fall within polygons (zones) of a vector layer. The input file is the satellite image, and the output is an attribute table. A shapefile (a vector layer) was created with the four basins of Lake Balaton, delineated on the basis of geological features (Borics et al. 2010), and each basin was divided into a north and a south quarter. The zonal statistics were applied to obtain the mean, minimum and maximum values of each part. The shapefile and the zones can be seen in Fig. 3.
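The two steps described above, a pixel-wise band ratio followed by per-zone mean, minimum and maximum, can be sketched in a few lines. This is only an illustration under stated assumptions, not the R script or the QGIS plugin used in the study: the function names, the tiny 2 x 4 grids, the reflectance values and the zone layout are all hypothetical, and rasters are represented as plain nested lists.

```python
def band_ratio(numerator, denominator):
    """Pixel-wise ratio of two equally shaped grids; None where undefined."""
    return [
        [n / d if d else None for n, d in zip(n_row, d_row)]
        for n_row, d_row in zip(numerator, denominator)
    ]


def zonal_stats(values, zones):
    """Mean, minimum and maximum of `values` for every zone label."""
    grouped = {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            # Skip pixels without a valid value or outside every zone.
            if value is not None and zone is not None:
                grouped.setdefault(zone, []).append(value)
    return {
        zone: {"mean": sum(v) / len(v), "min": min(v), "max": max(v)}
        for zone, v in grouped.items()
    }


# Hypothetical reflectance grids (blue = band 2, green = band 3).
blue = [[0.04, 0.05, 0.06, 0.06], [0.04, 0.05, 0.06, 0.00]]
green = [[0.08, 0.10, 0.08, 0.06], [0.10, 0.10, 0.08, 0.00]]

# Hypothetical zone labels per pixel; None marks land or masked pixels.
zones = [
    ["Keszthely N", "Keszthely N", "Siófok N", "Siófok N"],
    ["Keszthely S", "Keszthely S", "Siófok S", None],
]

ratio = band_ratio(blue, green)  # Eq. (1) applied to every pixel
stats = zonal_stats(ratio, zones)
for zone, summary in stats.items():
    print(zone, summary)
```

Replacing `blue` and `green` with NIR (band 8) and red (band 4) grids gives the ratio of Eq. (2) with the same code.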
To investigate the correlation between in situ and satellite data, Pearson's correlation was applied. The output of the Acolite program is a raster in which each pixel represents the chlorophyll-a value (in μg/l), so the two datasets are comparable. The normality of the data was accepted by the Shapiro-Wilk normality test (p > 0.05). To compare the chlorophyll-a values of the four basins, north and south, two-way ANOVA was performed. To satisfy the normality assumption of the residuals, a 1/√x transformation was applied, where x is the chlorophyll-a value derived from a satellite image processed by Acolite. After the transformation, the absolute values of the skewness (0.14) and kurtosis (0.29) were both below 1, which confirms normality. The homogeneity of variances was accepted by Levene's test (p > 0.05). Pairwise comparison was performed using Tukey's post hoc tests. Statistical analysis was performed with the IBM SPSS v25 (Armonk 2017) statistical software.
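For readers who want to reproduce the basic quantities outside SPSS, the following minimal sketch computes Pearson's r, the 1/√x transformation, and moment-based skewness and excess kurtosis. The function names and the sample values are hypothetical. The skewness and kurtosis here use plain population moments, whereas statistical packages often apply small-sample corrections, so the numbers may differ slightly from package output.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inverse_sqrt(values):
    """Apply the 1/sqrt(x) transformation to strictly positive values."""
    return [1.0 / math.sqrt(v) for v in values]


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (no small-sample correction)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (no small-sample correction)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical paired chlorophyll-a values in µg/l (laboratory vs. satellite).
in_situ = [5.0, 9.0, 14.0, 22.0, 30.0]
satellite = [6.0, 8.5, 15.0, 20.0, 33.0]

r = pearson_r(in_situ, satellite)
transformed = inverse_sqrt(satellite)
print(round(r, 3), round(skewness(transformed), 3), round(excess_kurtosis(transformed), 3))
```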

Results
The Pearson's correlation was calculated to test the correlation between laboratory and satellite data. The results can be seen in Table 2.
The highest correlation between laboratory and satellite-derived data was obtained by the reflectance-based blue/green ratio algorithm, Chl-a B/G (r = 0.93). The lowest correlation was produced by the chl_re_mishra algorithm (r = 0.75); however, all correlations were significant at the p < 0.001 level. Among the Acolite algorithms, chl_re_oc2 resulted in the highest correlation coefficient (r = 0.86; p < 0.001). Combining the Acolite calculations with the QGIS visualisation produced chlorophyll-a maps; an example is shown in Fig. 4, which illustrates the results of the four algorithms on the same day.
In Fig. 5, the linear regression of in situ and Sentinel-2-derived chlorophyll-a amount can be seen in case of the four algorithms.
If the slope is positive in the linear regression figure, then there is a positive linear relationship between the data. In the present study, a positive linear relationship can be seen between the in situ and the Sentinel-2 datasets except for the chl_re_mishra algorithm, proving the applicability of the algorithms. According to previous studies, a west-east trophic gradient can be observed in Lake Balaton (Herodek et al. 1988; Istvánovics et al. 2007; Palmer et al. 2015a). The purpose of the zonal statistics was to examine whether satellite image data also support this finding. Going back to 2015, mean chlorophyll-a values were derived with the QGIS zonal statistics plugin (QGIS 2021) for the eight parts of the lake. The two-way ANOVA revealed significant chlorophyll-a differences among the basins (F(3;280) = 122.09; p < 0.001) and quarters (F(1;280) = 16.22; p < 0.001). The interaction was not significant (F(3;280) = 0.26; p = 0.85). In Fig. 6, we summarise the results of the post hoc tests (Tukey's, p < 0.05). It can be seen that a strong trophic gradient is characteristic in the east-west direction. The concentration of chlorophyll-a is highest in the westernmost Keszthely basin and gradually decreases towards the easternmost Siófok basin. It can also be seen that we observed higher chlorophyll-a concentrations in the northern area compared to the south.
To investigate the temporal changes of algae, the result of the zonal statistics was examined in seasonal distribution. The average chlorophyll-a value was calculated for each season, considering the values of the entire database. The result can be seen in Fig. 7.
The average chlorophyll-a value was 17 μg/l in spring, 16 μg/l in summer, 20 μg/l in autumn and 10 μg/l in winter. The highest chl-a amount was observed in autumn. This is because one of the characteristic algal species of Lake Balaton, Cylindrospermopsis raciborskii, reaches its maximum amount in early autumn. Chlorophyll-a content was always higher in the southern areas than in the northern areas, but not to the same extent. The northern and southern regions differed most in spring, when the southern chlorophyll-a value was 19% higher than the northern one. The smallest difference was observed in winter and autumn, when the southern value was 1% higher than the northern one.
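The seasonal averaging and the north-south percentage comparison reported above can be sketched as follows. This is an illustrative assumption-laden example rather than the procedure used in the study: the season boundaries (meteorological seasons, with December to February as winter), the function names and the sample records are all hypothetical, since the text does not state how dates were assigned to seasons.

```python
from collections import defaultdict
from datetime import date

# Assumed meteorological seasons; the study does not specify its definition.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(records):
    """Average chlorophyll-a per season from (date, value) pairs."""
    grouped = defaultdict(list)
    for day, chl_a in records:
        grouped[SEASON_BY_MONTH[day.month]].append(chl_a)
    return {season: sum(v) / len(v) for season, v in grouped.items()}


def percent_difference(south, north):
    """How much higher (in %) the southern value is than the northern one."""
    return (south - north) / north * 100.0


# Hypothetical zonal mean values in µg/l for a handful of image dates.
records = [
    (date(2020, 1, 10), 10.0),
    (date(2020, 12, 5), 12.0),
    (date(2020, 4, 20), 17.0),
    (date(2020, 7, 1), 16.0),
    (date(2020, 9, 15), 20.0),
]

print(seasonal_means(records))
print(round(percent_difference(11.9, 10.0), 1))
```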
The trophic gradient and the seasonal trends can be observed through change detection in Fig. 8, which presents the chlorophyll-a amount in different months of 2020.
In Fig. 8, the east-west trophic gradient can be observed. The east part of the lake changes from oligotrophic to mesotrophic then eutrophic. The western part of the lake remains oligotrophic in the whole year. The most significant change is produced by Kis Balaton, turning from oligotrophic to hypertrophic.
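The trophic states named here can be tied to the four chlorophyll-a ranges given in the Data Acquisition section (0-8; 8-25; 25-75; > 75 µg/l). The sketch below assumes that these ranges correspond, in order, to the oligotrophic, mesotrophic, eutrophic and hypertrophic classes, and that a value lying exactly on a boundary belongs to the lower class (consistent with "> 75" for the highest class). Both the function name and the boundary handling are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper bounds (µg/l) of the first three classes, taken from the ranges in the text.
UPPER_BOUNDS = [8.0, 25.0, 75.0]

# Assumed one-to-one mapping of the four ranges to trophic class names.
CLASS_NAMES = ["oligotrophic", "mesotrophic", "eutrophic", "hypertrophic"]


def trophic_class(chl_a):
    """Return the trophic class for a chlorophyll-a concentration in µg/l."""
    if chl_a < 0:
        raise ValueError("chlorophyll-a concentration cannot be negative")
    # bisect_left keeps boundary values (8, 25, 75) in the lower class.
    return CLASS_NAMES[bisect_left(UPPER_BOUNDS, chl_a)]


# Hypothetical pixel values in µg/l.
for value in [6.0, 24.1, 29.0, 80.0]:
    print(value, trophic_class(value))
```

Applied to every pixel of a chlorophyll-a raster, such a function yields the four-class map used for visualisation.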

Table 1
Acolite algorithms for chl-a retrieval in the case of Sentinel-2

Name | Algorithm
chl_oc2 | Chlorophyll-a concentration (µg/l) using the blue/green ratio algorithm; blue/green ratio using blue (band 2) and green (band 3)
chl_re_gons | Chlorophyll-a concentration (µg/l) using the red edge algorithm by Gons et al. (2002) with published coefficients and a mass specific chlorophyll-a absorption of 0.015. By default 780 nm (band 6) is used as a reference, but the chl_re_gons740 product uses 740 nm (band 5) on MSI
chl_re_moses3b | Chlorophyll-a concentration (µg/l) using the three band red edge algorithm by Moses et al. (2012). By default 780 nm (band 6) is used as a reference, but the chl_re_moses3b740 product uses 740 nm (band 5) on MSI
chl_re_mishra | Chlorophyll-a concentration (µg/l) using the Normalised Difference Chlorophyll Index algorithm using red (band 4) and red edge (band 5) by Mishra and Mishra (2012)

Fig. 3 The eight Lake Balaton zones for algae spatial distribution investigation

Discussion
This study demonstrates the operational capabilities of the Sentinel-2 satellite for mapping the freshwater chlorophyll-a amount and its spatial distribution. The implemented Sentinel-2 image processing tools allowed the monitoring of the extent and severity of algal bloom events in highly turbid freshwater lakes. The autumn maximum of chlorophyll-a was similar to the results of previous studies (Padisák 1994). In that research, Cylindrospermopsis raciborskii biomass (mg/l) was determined in the Tihany region of Lake Balaton from July to October in 1982 and 1992. The 1994 study determined the amount of phytoplankton, while our study calculates the chlorophyll-a amount. The strong trophic gradient mentioned in previous studies was also detected in the present research (Vörös and Göde 1993; Felföldi et al. 2011). In an earlier study (Felföldi et al. 2011), samples were collected from the Siófok and Keszthely basins in 2006. The concentration of chlorophyll-a was measured spectrophotometrically. Chlorophyll-a content was higher in the western basin, proving the existence of the trophic gradient. The average chlorophyll-a amount was 24.1 μg/l in the Keszthely basin and 6.5 μg/l in Siófok. In comparison, in the present study the average chlorophyll-a value was 29 μg/l in the Keszthely basin and 6 μg/l in Siófok. The investigation period of our study was longer (approximately 5 years) than that of the 2006 study (6 months). Acolite-processed Sentinel-2 data have also been applied in other recent studies to determine algae concentrations in lakes (Bramich et al. 2021). The script for extracting the reflectance values at one exact point is a novel development for Lake Balaton and for algae mapping in freshwater lakes. QGIS zonal statistics have not been used to investigate algae distribution in Lake Balaton before. Zeng et al. (2016) applied a satellite-based (MODIS, VIIRS) blue/green ratio algorithm to estimate chlorophyll-a amount at the ocean surface.
The study came to the conclusion that satellite-derived data overestimates if the [...] Han and Rundquist (1997) retrieved chlorophyll-a using field spectrometry, with the result r² = 0.82.

Fig. 6 Mean chl-a amount in Lake Balaton depending on basins (Keszthely, Szigliget, Szemes, and Siófok) and quarters (north and south). Different letters are for significantly different basins (Tukey's, p < 0.05) while * and + denote the difference between the north and south part of the lake at p < 0.05 and p < 0.10, respectively, according to chl_oc2 algorithm

Fig. 7 The average amount of chl-a in Lake Balaton in seasonal distribution, 2015.08.01-2020.12.31, based on chl_oc2 algorithm

Conclusion
Regular algal monitoring is widely supported by satellites since algal blooms have serious ecological, social, economic and health effects. Previous studies demonstrated the applicability of satellites in algae detection. The aim of the current study was to validate a satellite-based chlorophyll-a retrieval program for Lake Balaton. Another goal was to select the most appropriate algorithm(s) from the four algorithms available in the Acolite software. In addition, we provided two self-validated algorithms. The results show that the two highest correlations were achieved by the newly validated, blue/green ratio-based algorithm, Chl-a B/G (r = 0.93), and the chl_re_oc2 algorithm (r = 0.86) of the Acolite software. According to the Pearson's correlation between the algorithm-based estimations and the laboratory measurements, satellite data are applicable for retrieving algal amount. According to the zonal statistics, a significant west-east trophic gradient can be observed in Lake Balaton.
Regular monitoring of algae abundance is important for preserving the appropriate condition of the lake, and satellites play an important role in this process. Satellite investigations make it possible to inspect large water surfaces without disturbing the water body, a disturbance that can distort the results. Given the high correlation between laboratory and satellite-based observations, the implementation of Sentinel-based freshwater investigation is highly recommended. Validation of a satellite with appropriate revisit time and spatial resolution for lakes provides opportunities for further investigation of the 1.9 million km² of lake and reservoir surface (Hong et al. 2016), which are prominent drinking water supplies and important economic and ecological factors. Further research is planned to study other lakes, such as Lake Fertö at the Hungarian-Austrian border.
Funding Open access funding provided by Hungarian University of Agriculture and Life Sciences. The study was supported by the Doctoral School of Horticulture Sciences at the Hungarian University of Agriculture and Life Sciences.

Data availability
The data that support the findings of this study are available from the corresponding author, upon reasonable request.
Code availability Not applicable.

Conflict of interest No conflict of interest has been declared by the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 8
Change detection of chlorophyll-a amount in 2020 at Lake Balaton with chl_oc2 algorithm