Application of classification methods to analyze chemicals in drinking water quality
Various statistical methods, including discriminant analysis, logistic regression and cluster analysis, have been applied to drinking water datasets to construct models that identify important input variables. Decision trees are more flexible than other statistical classification methods because they provide a complete path to a specific decision and make the critical variables simple and easy to understand. This article describes the application of classification decision trees to the analysis of variables affecting drinking water quality, discusses these variables under several tree-growing methods, and compares the methods to identify the best approach for further analysis of the study area. In this study, samples of filtered water were taken from 100 pumps located in different union councils of Lahore city. The classification trees were constructed from the input quality variables, and the results are reported as confusion matrices. Four techniques were used: Chi-square Automatic Interaction Detector (CHAID), Exhaustive Chi-square Automatic Interaction Detector (ECHAID), Classification and Regression Tree (CRT) and Quick Unbiased Efficient Statistical Tree (QUEST). Three experiments were conducted to evaluate the performance of the models by the number of misclassified units. The first used the complete dataset, the second was based on cross-validation, and the third was based on random subsampling.
Keywords: Classification trees, CHAID, ECHAID, CRT, QUEST, Cross-validation
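The three evaluation experiments named in the abstract (complete dataset, cross-validation, random subsampling) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the feature names `turbidity` and `ph` are hypothetical, the number of folds and the 30% test fraction are arbitrary choices, and a one-split decision stump stands in for the CHAID, ECHAID, CRT and QUEST trees actually used in the study.

```python
import random

def fit_stump(rows, labels):
    """Fit a one-split tree (a stand-in for a full classification tree)."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            # Majority class on each side; sorted() keeps ties deterministic.
            l_cls = max(sorted(set(left)), key=left.count)
            r_cls = max(sorted(set(right)), key=right.count) if right else l_cls
            errors = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_cls, r_cls)
    _, j, t, l_cls, r_cls = best
    return lambda r: l_cls if r[j] <= t else r_cls

def confusion_matrix(actual, predicted, classes):
    """Nested dict m[actual][predicted] holding counts."""
    m = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m

def misclassified(m):
    """Number of off-diagonal units in a confusion matrix."""
    return sum(n for a, row in m.items() for p, n in row.items() if a != p)

def evaluate(train_idx, test_idx, X, y, classes):
    """Train on one index set, report the confusion matrix on another."""
    model = fit_stump([X[i] for i in train_idx], [y[i] for i in train_idx])
    preds = [model(X[i]) for i in test_idx]
    return confusion_matrix([y[i] for i in test_idx], preds, classes)

def cross_validation(X, y, classes, k=10, seed=0):
    """k-fold cross-validation: total misclassified over all held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    total = 0
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        train = [i for i in idx if i not in held]
        total += misclassified(evaluate(train, fold, X, y, classes))
    return total

def random_subsampling(X, y, classes, rounds=5, test_frac=0.3, seed=0):
    """Repeated random train/test splits: misclassified count per round."""
    rng = random.Random(seed)
    n_test = max(1, int(len(X) * test_frac))
    counts = []
    for _ in range(rounds):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        m = evaluate(idx[n_test:], idx[:n_test], X, y, classes)
        counts.append(misclassified(m))
    return counts

# Synthetic stand-in for 100 pump samples: (turbidity, ph), both hypothetical.
rng = random.Random(42)
X = [(rng.uniform(0, 10), rng.uniform(6, 9)) for _ in range(100)]
y = ["unsafe" if turbidity > 5 else "safe" for turbidity, _ in X]
CLASSES = ["safe", "unsafe"]

all_idx = list(range(len(X)))
full_matrix = evaluate(all_idx, all_idx, X, y, CLASSES)  # experiment 1
resub = misclassified(full_matrix)
cv_errors = cross_validation(X, y, CLASSES)              # experiment 2
sub_errors = random_subsampling(X, y, CLASSES)           # experiment 3

print("complete dataset:", resub)
print("cross-validation:", cv_errors)
print("random subsampling:", sub_errors)
```

Evaluating on the complete dataset reuses the training units for testing, so it tends to understate the error; the two resampling schemes test on held-out units, which is why comparing all three gives a more honest picture of each tree method.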
The authors are deeply thankful to the editor and the reviewers for their valuable suggestions to improve the quality of this manuscript. This work was supported by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah. The author Muhammad Aslam therefore gratefully acknowledges the DSR's technical support.