Abstract
This paper introduces GTest, a new software tool for testing the normality of a user-specified empirical distribution. GTest has two distinctive features. First, the user can choose among four versions of the normality test, each suited to a particular kind of dataset or goal. Second, the output follows a graphical inferential paradigm and is intended to be self-explanatory. Inference-by-eye is an emerging inferential approach, and it is likely to gain ground as the need grows to extend statistical methods to users without formal statistical training. For instance, the latest European regulation on environmental issues introduced strict protocols for data handling (data quality assurance, outlier detection, etc.) and information exchange (areal statistics, trend detection, etc.) between regional and central environmental agencies. As a result, laboratory and field technicians will increasingly be asked to use complex software applications to run specific statistical analyses on data from monitoring, surveying or laboratory activities. Unfortunately, inferential statistics, which influence decisions on the correct management of environmental resources, are often implemented so that outcomes are reported numerically, with brief comments in strict statistical jargon (degrees of freedom, level of significance, accepted/rejected H0, etc.). Such outcomes are often hard to interpret for people with limited statistical knowledge. In this framework, visual inference can help fill the gap by providing outcomes in self-explanatory graphical form, accompanied by a brief comment in plain language.
The difficulties experienced by colleagues, and their request for an effective tool to address them, motivated us to adopt the inference-by-eye paradigm and to implement an easy-to-use, quick and reliable statistical tool. GTest displays its outcomes as a modified version of the Q-Q plot. The application was developed in Visual Basic for Applications (VBA) within MS Excel 2010, which proved to have the robustness and reliability required. GTest provides true graphical normality tests that are as reliable as quantitative statistical approaches but much easier to understand. The Q-Q plot is augmented with an acceptance region drawn around the theoretical distribution and defined according to the significance level alpha and the sample size. The decision rule is as follows: if the empirical scatterplot falls entirely within the acceptance region, the empirical distribution can be considered to fit the theoretical one at the given alpha level. A comprehensive case study with simulated and real-world data was carried out to check the robustness and reliability of the software.
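The decision rule described above can be illustrated with a minimal Python sketch. It is not the GTest implementation (which is written in VBA and offers four test versions not detailed here). As an assumption for illustration only, the acceptance region is built as a Kolmogorov-Smirnov-type band around the theoretical line, using Hazen plotting positions and a critical value estimated by Monte Carlo simulation for the given alpha and sample size. All function names (`hazen_positions`, `max_deviation`, `critical_value`, `acceptance_band`, `looks_normal`) are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # standard normal: the theoretical distribution


def hazen_positions(n):
    """Hazen plotting positions p_i = (i - 0.5) / n for i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def standardized_order_stats(sample):
    """Sorted sample after centering on its mean and scaling by its sd."""
    m, s = mean(sample), stdev(sample)
    return sorted((x - m) / s for x in sample)


def max_deviation(sample):
    """Largest gap between Phi(z_(i)) and the plotting position p_i."""
    z = standardized_order_stats(sample)
    p = hazen_positions(len(z))
    return max(abs(STD_NORMAL.cdf(zi) - pi) for zi, pi in zip(z, p))


def critical_value(n, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of max_deviation
    for normal samples of size n (assumed calibration, not GTest's)."""
    rng = random.Random(seed)
    sims = sorted(
        max_deviation([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_sim)
    )
    k = min(n_sim - 1, int((1 - alpha) * n_sim))
    return sims[k]


def acceptance_band(n, c):
    """For each plotting position return (theoretical quantile, lower, upper).
    A point z_(i) lies inside the band exactly when |Phi(z_(i)) - p_i| <= c."""
    band = []
    for p in hazen_positions(n):
        lower = STD_NORMAL.inv_cdf(p - c) if p - c > 0 else float("-inf")
        upper = STD_NORMAL.inv_cdf(p + c) if p + c < 1 else float("inf")
        band.append((STD_NORMAL.inv_cdf(p), lower, upper))
    return band


def looks_normal(sample, alpha=0.05, n_sim=2000, seed=0):
    """Graphical decision rule: True if every point of the Q-Q scatterplot
    falls inside the acceptance band for this alpha and sample size."""
    n = len(sample)
    c = critical_value(n, alpha, n_sim, seed)
    z = standardized_order_stats(sample)
    return all(lo <= zi <= hi for zi, (_, lo, hi) in zip(z, acceptance_band(n, c)))
```

Calling `looks_normal(data, alpha=0.05)` on a list of observations returns `True` when the whole scatterplot stays inside the band. Plotting the triples returned by `acceptance_band` against the standardized order statistics gives a picture in the spirit of the modified Q-Q plot. Because the band is derived from a single maximum-deviation statistic, the visual rule and the numerical test give the same decision by construction.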
Cite this article
Barca, E., Bruno, E., Bruno, D.E. et al. GTest: a software tool for graphical assessment of empirical distributions’ Gaussianity. Environ Monit Assess 188, 138 (2016). https://doi.org/10.1007/s10661-016-5138-1
DOI: https://doi.org/10.1007/s10661-016-5138-1