Abstract
Water is essential to life on Earth. With industrial and technological progress, it has become one of the most important resources to preserve. Studies indicate that the world is heading toward a water-availability crisis by 2025 because of scarce water sources, rising pollution rates, and growing water use. Water refining, however, is very expensive. Accurate computational methods are therefore needed to estimate water quality and to determine whether the water can be used for purposes other than drinking before resorting to refining. This paper presents IM12CP-WQI, a model that predicts the water quality index (WQI) from twelve concentrations. It is based on intelligent data mining and combines two algorithms, DWM-Bat and DMARS. DWM-Bat finds the number of DMARS models and the weight of each concentration used in this study. DMARS builds a mathematical model that combines these concentrations to predict water quality. The MARS algorithm was also extended by replacing its kernel with four functions: linear, RBF, sigmoid, and polynomial. The proposed model consists of four stages. The first stage collects the data and preprocesses it so that all values fall in the same range, [0, 1], and computes the correlations among the concentrations to identify direct or inverse relationships among them and with the WQI. The second stage builds an optimization algorithm, DWM-Bat, to find the optimal weight for each of the twelve concentrations and the optimal number of models, M, for DMARS.
The third stage builds a mathematical model that combines these concentrations, based on DMARS and using the results of DWM-Bat from the previous stage. The last stage evaluates the results using three measures (R2, NSE, and D), after which the WQI value is assigned to one of four cases: if the WQI is less than 25, the water can be used for drinking; if it is between 26 and 50, it can be used in fish lakes; if it is between 51 and 75, it can be used in agriculture; and if it is higher than 75, the water needs refining. The results of IM12CP-WQI were also compared with those of MARS extended with the different kernel functions. Applying the proposed model, DWM-Bat found that the optimal value of M for the winter and summer data sets is 9. The best weight for each concentration was as follows: PH = 0.247, NTU = 0.420, TDS = 0.004, Ca = 0.028, Mg = 0.042, Cl = 0.008, Na = 0.011, K = 0.175, SO4 = 0.008, NO3 = 0.042, CaCO3 (TA) = 0.011, and CaCO3 (TH) = 0.004. The study also demonstrated correlations between the WQI and the following concentrations, listed from strongest to weakest: K = 0.985, TH = 0.86, NO3 = 0.761, TDS = 0.55, Na = 0.415, PH = 0.371, TA = 0.37, Cl = 0.362, and Ca = 0.317. The results showed that IM12CP-WQI is a good predictor compared with MARS-linear, MARS-Sig, MARS-RBF, and MARS-Poly. Thus, the proposed model is considered one of the most promising techniques for water quality measurement despite the variety of concentrations that cause water pollution.
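The preprocessing, evaluation, and interpretation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (min_max_scale, nash_sutcliffe, classify_wqi) are hypothetical, the NSE formula is the standard textbook definition, and the abstract leaves the boundaries between cases open (for example, a WQI of exactly 25), so the sketch assumes contiguous thresholds at 25, 50, and 75. Only the weights and the four use categories are taken from the abstract.

```python
import numpy as np

# Weights reported in the abstract for the twelve concentrations.
WEIGHTS = {
    "PH": 0.247, "NTU": 0.420, "TDS": 0.004, "Ca": 0.028,
    "Mg": 0.042, "Cl": 0.008, "Na": 0.011, "K": 0.175,
    "SO4": 0.008, "NO3": 0.042, "TA": 0.011, "TH": 0.004,
}


def min_max_scale(values):
    """Rescale a 1-D sequence of concentrations to the range [0, 1]."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return np.zeros_like(x)
    return (x - x.min()) / span


def nash_sutcliffe(observed, predicted):
    """Standard Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    denom = np.sum((obs - obs.mean()) ** 2)
    if denom == 0:
        raise ValueError("NSE is undefined for constant observations")
    return float(1.0 - np.sum((obs - pred) ** 2) / denom)


def classify_wqi(wqi):
    """Map a WQI value to one of the four cases named in the abstract.

    Assumption: thresholds are treated as contiguous (<= 25, <= 50, <= 75),
    because the abstract does not say how values such as 25 or 25.5 are handled.
    """
    if wqi <= 25:
        return "drinking"
    if wqi <= 50:
        return "fish lakes"
    if wqi <= 75:
        return "agriculture"
    return "needs refining"


if __name__ == "__main__":
    print(min_max_scale([2.0, 4.0, 6.0]))
    print(nash_sutcliffe([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(classify_wqi(40))
    print(round(sum(WEIGHTS.values()), 3))
```

An NSE of 1 indicates a perfect match between observed and predicted values, and an NSE of 0 means the model does no better than the mean of the observations. The final line checks that the twelve reported weights sum to 1.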
Data Availability
The datasets generated and/or analysed during the current study are available at https://link.springer.com/article/10.1007%2Fs13201-019-1080-z
Abbreviations
- BA: Bat algorithm
- BBA: Binary bat algorithm
- BFs: Basis functions
- BOD5: Five-day biochemical oxygen demand
- BTCR: Boosted tree classifiers and regression
- Ca: Calcium
- CART: Classification and regression tree
- CHAID: Chi-squared automatic interaction detection
- COD: Chemical oxygen demand
- D: Relative efficiency criteria
- DLBA: Differential operator and Lévy flights bat algorithm
- DO: Dissolved oxygen
- DOY: Day of the year
- E: Coefficient of efficiency
- E-CHAID: Exhaustive Chi-squared automatic interaction detection
- f_i: Frequency
- FLBA: Fuzzy logic bat algorithm
- GCV: Generalized cross-validation
- IDM: Intelligent data mining
- IM12CP-WQI: Intelligent miner based on twelve concentrations to predict water quality index
- MARS: Multivariate adaptive regression splines
- Mg: Magnesium
- Na: Sodium
- NO2: Nitrite
- NO3: Nitrate nitrogen
- NSE: Nash–Sutcliffe efficiency
- NTU: Turbidity
- PH: Potential hydrogen
- PO4: Orthophosphate
- PSO: Particle swarm optimization
- Q: Discharge
- R: Correlation coefficient
- R2: Coefficient of determination
- RF: Random forest
- RFC: Randomizable filtered classification
- RFRCX: Random forest regression and classification
- r_i: Pulse rate
- SO4: Sulfate
- Ta: Air temperature
- TA: Total alkalinity
- TDS: Total dissolved solid
- TH: Total hardness
- v_i: Velocity
- WQI: Water quality index
- x_i: Position
Author information
Contributions
All authors contributed to the study conception and design. Data collection and analysis were performed by Samaher Al-Janabi and Zahraa Al-Barmani. The first draft of the manuscript was written by Samaher Al-Janabi, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This paper does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Al-Janabi, S., Al-Barmani, Z. Intelligent multi-level analytics of soft computing approach to predict water quality index (IM12CP-WQI). Soft Comput 27, 7831–7861 (2023). https://doi.org/10.1007/s00500-023-07953-z
DOI: https://doi.org/10.1007/s00500-023-07953-z