Nonparametric Statistics in Human–Computer Interaction

Modern Statistical Methods for HCI

Part of the book series: Human–Computer Interaction Series (HCIS)

Abstract

Data not suitable for classic parametric statistical analyses arise frequently in human–computer interaction studies. Various nonparametric statistical procedures are appropriate and advantageous when used properly. This chapter organizes and illustrates multiple nonparametric procedures, contrasting them with their parametric counterparts. Guidance is given for when to use nonparametric analyses and how to interpret and report their results.

Notes

  1.

    The Mann-Whitney U test has multiple and sometimes confusing names. It is also known as the Wilcoxon-Mann-Whitney test, the Mann-Whitney-Wilcoxon test, and the Wilcoxon rank-sum test. None of these should be confused with the Wilcoxon signed-rank test, which is for one-factor two-level within-subjects designs.

  2.

    Holm’s sequential Bonferroni procedure for three pairwise comparisons uses a significance threshold of \(\upalpha =0.05/3\) for the lowest p-value, \(\upalpha =0.05/2\) for the second lowest p-value, and \(\upalpha =0.05/1\) for the highest p-value. The p-values are tested in that ascending order; as soon as one fails to reach statistical significance, the procedure halts and all remaining comparisons are regarded as statistically nonsignificant.

  3.

    Rather than using traditional repeated measures ANOVAs, ARTool uses mixed-effects analyses of variance, explained below in the section on Generalized Linear Mixed Models.

  4.

    General Linear Models are often called “linear models” and may be abbreviated “LM.” These should not be confused with Generalized Linear Models, which may be abbreviated “GLM.” However, some texts use “GLM” for linear models and “GZLM” for generalized models. Readers should take care when encountering this family of abbreviations.

  5.

    While not covered in this chapter, LMs and GLMs also offer the ability to use continuous independent variables, not just categorical independent variables (see Chap. 11).

  6.

    Multinomial logistic regression—when used with dichotomous responses such as Yes/No, True/False, Success/Fail, Agree/Disagree, or 1/0—is called “binomial regression.” The GLM for binomial regression uses a “binomial” distribution and “logit” link function. It can be conducted using the glm function in much the same way as Poisson regression explained below, except with the parameter family=binomial.

  7.

    Given data with a large number of zeroes, it is prudent to consider an extension to Poisson regression called “zero-inflated” Poisson regression. This model incorporates binomial regression to predict the probability of a zero alongside Poisson regression to model counts. See the zeroinfl function in the pscl package.

  8.

    Although the canonical link function for the Gamma distribution is actually the “inverse” function, the “log” function is often used because the inverse function can be difficult to estimate due to discontinuity at zero. The two functions provide similar results.

  9.

    This model uses an intercept-only random effect. There are other types of random effects such as slopes-and-intercept random effects that are described in Chap. 11.

  10.

    The ANOVA type indicates how the sums-of-squares are computed. In general, Type III ANOVAs are preferred because they can support conclusions about main effects in the presence of significant interactions. For Type I and Type II ANOVAs, significant main effects cannot safely be interpreted in the presence of significant interactions.
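
The stepwise thresholds described in Note 2 (Holm’s sequential Bonferroni procedure) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the chapter: the function name `holm_bonferroni` and the example p-values are hypothetical, and treating a p-value exactly equal to its threshold as significant (`<=`) is an assumption.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value (in input order): True if significant.

    Hypothetical helper illustrating Holm's sequential Bonferroni procedure.
    """
    m = len(p_values)
    # Visit the comparisons from the lowest p-value to the highest.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for step, idx in enumerate(order):
        # Thresholds are alpha/m, alpha/(m-1), ..., alpha/1.
        threshold = alpha / (m - step)
        if p_values[idx] <= threshold:  # assumption: equality counts as significant
            significant[idx] = True
        else:
            # Halt: this and all remaining comparisons stay nonsignificant.
            break
    return significant


# Hypothetical p-values for three pairwise comparisons.
print(holm_bonferroni([0.030, 0.004, 0.020]))  # [True, True, True]
print(holm_bonferroni([0.010, 0.030, 0.040]))  # [True, False, False]
```

In the second call, the comparison with p = 0.040 is regarded as nonsignificant even though it is below 0.05, because the procedure had already halted at the second step (0.030 exceeds 0.05/2).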

Author information

Corresponding author

Correspondence to Jacob O. Wobbrock.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Wobbrock, J.O., Kay, M. (2016). Nonparametric Statistics in Human–Computer Interaction. In: Robertson, J., Kaptein, M. (eds) Modern Statistical Methods for HCI. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-319-26633-6_7

  • DOI: https://doi.org/10.1007/978-3-319-26633-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26631-2

  • Online ISBN: 978-3-319-26633-6

  • eBook Packages: Computer Science (R0)