
Evaluation of algal species distributions and prediction of cyanophyte cell counts using statistical techniques

  • Research Article
  • Published in Environmental Science and Pollution Research

Abstract

Safe drinking water sources are crucial for human health. Consequently, water quality management, including continuous monitoring of water quality and algae at the source, is critical to ensuring that safe water is available to local residents. This study aimed to construct statistical prediction models that account for the probability distributions relevant to cyanophyte cell counts and to compare their predictive performance. Water quality parameters were investigated at Juam Lake and Tamjin Lake, representative water sources of the Yeongsan and Seomjin rivers, South Korea. We used water quality monitoring network, algae alert system, and hydraulic and hydrological data measured every 7 days from January 2017 to December 2022, obtained from the Water Environment Information System of the National Institute of Environmental Research. Using the data for 2017–2021 as a training set and the data for 2022 as a test set, we compared the performance of seven models in predicting cyanophyte cell counts. Environmental factors associated with algae in the water sources were examined based on the monitoring data, and a prediction model appropriate for the cyanophyte distribution, which also included the risk of toxicity, was generated. The extreme gradient boosting with the random forest model had the best predictive performance for cyanophyte cell counts. The study results are expected to facilitate water quality management in various water systems, including water sources.
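The chronological split described in the abstract (2017–2021 for training, 2022 for testing, with samples every 7 days) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `weekly_dates` and `split_by_year`, the start date of 2 January 2017, and the placeholder count of 0 are all assumptions made for illustration, and no real monitoring data are used.

```python
from datetime import date, timedelta


def weekly_dates(start, end, step_days=7):
    """Return sampling dates from start to end (inclusive) at a fixed interval."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


def split_by_year(records, last_train_year):
    """Split (date, value) records chronologically.

    Records dated in or before last_train_year go to the training set;
    later records go to the test set.
    """
    train = [r for r in records if r[0].year <= last_train_year]
    test = [r for r in records if r[0].year > last_train_year]
    return train, test


# Hypothetical placeholder records: the second field stands in for a
# cyanophyte cell count and is a dummy 0, not a measured value.
records = [(d, 0) for d in weekly_dates(date(2017, 1, 2), date(2022, 12, 31))]
train, test = split_by_year(records, last_train_year=2021)
```

Splitting by calendar year rather than at random keeps every test observation later in time than every training observation, so the comparison reflects how a model would perform on a future season.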


Data availability

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgements

The authors thank the water quality team and algae team of the Yeongsan River Environment Research Center at the National Institute of Environmental Research for their assistance with sampling and analysis and Editage (http://www.editage.co.kr) for their English language editing service.

Funding

This work was supported by the National Institute of Environmental Research (NIER), funded by the Ministry of Environment (MOE) of the Republic of Korea (grant number NIER-2022-01-01-044). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

Conceptualization, Seong-Yun Hwang and Kang-Young Jung; data curation, Seong-Yun Hwang and Kang-Young Jung; formal analysis, Seong-Yun Hwang and Kyung-Lak Lee; funding acquisition, Jong-Hwan Park and Dong-Seok Shin; investigation, Hyeon-Su Chung, Mi-Sun Son, and Don-Woo Ha; methodology, Seong-Yun Hwang; project administration, Seong-Yun Hwang, Won-Seok Lee, and Dong-Seok Shin; resources, Byung-Woong Choi; software, Kyung-Lak Lee, Seong-Yun Hwang, and Kang-Young Jung; supervision, Jong-Hwan Park, Won-Seok Lee, and Dong-Seok Shin; validation, Seong-Yun Hwang, Byung-Woong Choi, and Kang-Young Jung; visualization, Kyung-Lak Lee, Seong-Yun Hwang, and Kang-Young Jung; original draft writing, Seong-Yun Hwang; review and editing, Seong-Yun Hwang and Kang-Young Jung. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kang-Young Jung.

Ethics declarations

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent to publish

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Responsible Editor: Lotfi Aleya

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Seong-Yun Hwang is the first author of this study.

Highlights

- The statistical distributions of variables used to predict water quality or algae-related variables warrant consideration

- Redundancy analysis can investigate seasonal or environmental distributions of algae

- Water quality management is enhanced by predicting cyanophyte cell counts

Appendix

See Table 13

Table 13 Abbreviations

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hwang, SY., Choi, BW., Park, JH. et al. Evaluation of algal species distributions and prediction of cyanophyte cell counts using statistical techniques. Environ Sci Pollut Res 30, 117143–117164 (2023). https://doi.org/10.1007/s11356-023-30077-8

  • DOI: https://doi.org/10.1007/s11356-023-30077-8
