Abstract
Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset of surface water to predict dissolved oxygen levels. Initial features were selected using a nonlinear approach. Nonlinearity was tested using BDS statistics, which revealed nonlinear structure in the data. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function, and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized by cross-validation. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high-dimensional feature space using the kernel function. The performance of all the kernel-based modeling methods used here was comparable in terms of both predictive and generalization abilities. The values of the performance criteria indicated that the constructed models fit the nonlinear data adequately and have good predictive capability.
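As an illustration of the general technique named in the abstract, the following is a minimal sketch of Gaussian-kernel ridge regression with its two parameters chosen by k-fold cross-validation. It is not the authors' implementation: the data are synthetic stand-ins for the hydro-chemical variables, and the function names, the parameter grid, and the 5-fold split are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix: K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y for alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

def cv_rmse(X, y, sigma, lam, k=5, seed=0):
    """Mean root-mean-square error over k cross-validation folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        alpha = krr_fit(X[train], y[train], sigma, lam)
        pred = krr_predict(X[train], alpha, X[test], sigma)
        errs.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return float(np.mean(errs))

# Synthetic nonlinear data (assumption: stands in for the real predictors
# and the dissolved oxygen response, which are not reproduced here).
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 200)

# Hypothetical grid over kernel width (sigma) and ridge penalty (lam).
grid = [(s, l) for s in (0.3, 1.0, 3.0) for l in (1e-3, 1e-1, 10.0)]
scores = {p: cv_rmse(X, y, *p) for p in grid}
best_sigma, best_lam = min(scores, key=scores.get)
best_rmse = scores[(best_sigma, best_lam)]
print(best_sigma, best_lam, best_rmse)
```

The same cross-validation loop could in principle wrap any of the other kernel models mentioned above; only the fit and predict steps would change.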
Acknowledgements
The authors thank the Director, CSIR-Indian Institute of Toxicology Research, Lucknow, for his keen interest in this work.
Cite this article
Singh, K.P., Gupta, S. & Rai, P. Predicting dissolved oxygen concentration using kernel regression modeling approaches with nonlinear hydro-chemical data. Environ Monit Assess 186, 2749–2765 (2014). https://doi.org/10.1007/s10661-013-3576-6
DOI: https://doi.org/10.1007/s10661-013-3576-6