Abstract
Consider how we evaluate how normal an object is. On the dual-nature hypothesis, a normality evaluation depends on both the object’s goodness (how good do you think it is?) and its frequency (how frequent do you think it is?). On the single-nature hypothesis, the evaluation depends solely on either frequency or goodness. To assess these hypotheses, I ran four experiments. Study 1 shows that normality evaluations vary with both the goodness and the frequency assessment of the object. Study 2 shows that manipulating the goodness and the frequency dimensions changes the normality evaluation. Yet neither experiment rules out that some people evaluate normality solely on frequency while the rest evaluate it solely on goodness. Hence two more experiments. Study 3 reveals that when scenarios are contrasted—presented one after another—only frequency matters. But, as study 4 shows, when scenarios are evaluated alone, both frequency and goodness influence normality evaluations within a single person, although the more sensitive a person is to one dimension, the less sensitive she is to the other. The dual-nature hypothesis thus seems true of uncontrasted applications of the concept of normality, whereas the single-nature hypothesis seems true of contrasted applications.
Notes
I use the term object to refer to the object of a normality judgment. Thus, the term can refer to material objects (“this seems like a perfectly normal orange to me”), behaviors (“what are you doing? This is not normal at all!”), events (“earthquakes are pretty normal in this part of the country”), and so on.
Later, Bear and Knobe indeed studied the applications of the concept of normality (Bear and Knobe 2017). However, the first two experiments in this paper (§2, §3) were run before theirs, and the subsequent two experiments (§4, §5) take a different route than Bear and Knobe’s study. I discuss their results briefly in §2.
Given the recommendations by Green (1991) and to account for some participants missing a question, I intended to have 200 participants. However, only 189 mTurk workers filled out the questionnaire before the time allotted for data collection expired. Of those, 10 participants missed one question and thus were excluded from the analysis.
The experiment was run before same-sex marriage was legalized in the U.S.
This interpretation isn’t strictly correct because the interaction is significant; still, it provides a good grasp of what the bs mean. The interaction term will be discussed shortly.
In linear regression, a β coefficient denotes how many standard deviations a dependent variable will change if the predictor changes by one standard deviation. βs thus behave like bs but are interpreted in terms of standard deviations rather than the units of the IV and DV.
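To illustrate, here is a minimal sketch of the relation between an unstandardized b and a standardized β (in Python with simulated data, not the study’s own R scripts): z-scoring both variables before fitting yields the same coefficient as rescaling b by the two standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)  # true unstandardized slope: 2.0

# Unstandardized slope b from a simple least-squares fit.
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Standardized beta: the same slope after rescaling, beta = b * sd(x) / sd(y).
beta = b * np.std(x, ddof=1) / np.std(y, ddof=1)

# Equivalently, fit the regression on z-scored variables.
zx = (x - x.mean()) / np.std(x, ddof=1)
zy = (y - y.mean()) / np.std(y, ddof=1)
beta_direct = np.cov(zx, zy, ddof=1)[0, 1] / np.var(zx, ddof=1)

assert abs(beta - beta_direct) < 1e-10
```

So β answers “how many standard deviations of y per standard deviation of x,” while b stays in the raw units of the two variables.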
Let me thank one of the reviewers for drawing my attention to this fact.
For the ordinal regression results, see the appendix, §1. Since in all studies I use a Likert scale for the dependent variable, ordinal regression might seem like a better choice than linear regression. However, the interpretation of ordinal models isn’t as straightforward as that of linear models, only loose analogs of R2 and η2 coefficients are available, and in practice psychologists use linear regression for modelling similar problems. Therefore, in this investigation, I limit myself to linear models (standard and mixed-effects). The appendix, as well as the data, protocols, and scripts in R, can be downloaded from the project’s repository on the Open Science Framework: https://osf.io/6kr2m/.
That’s consistent with the results of the fourth and fifth experiments (§4, §5), which indicate that people’s average normality evaluations differ between scenarios.
Presumably, typicality—again, in this research denoting being a good example of a member of a target category—is different from normality. Declaring something a bad example doesn’t carry the normative implications of declaring it abnormal (“What you have done isn’t normal” sounds like an accusation; “what you have done here isn’t what people typically do in similar circumstances” may even sound like praise of your creativity). It’s also felicitous to utter “this one isn’t a good example of skin moles, yet it’s still normal,” and it sounds fine to call Kaczyński a good example of a modern Central European authoritarian figure, but less so to call him a normal authoritarian. For similar reasons, goodness is presumably very different from closeness to the ideal: traits close to the ideal of the category ‘annoying personality features’ won’t score high on goodness.
I would like to thank one of the reviewers for this observation.
Again, let me explain the choice of the sample size. There were 18 conditions; for each condition, I planned to have at least 14 observations, so the comparisons between the conditions would be meaningful. Moreover, running a mediation analysis (below) required a large sample (Fritz and MacKinnon 2007). I thus intended there to be 300 participants; 284 answered both questions about the scenario.
In all following studies, the interaction term was not significant, and so I report models evaluated for equations without interaction terms. I include the models with interaction terms in the appendix, §2.
That is, the green (red) line shows how normal depends on frequency for trait = 1 (trait = 0).
In Preacher and Hayes’s method the correlations are established using linear regression with bootstrapping. On fig. 2, bt → n is the regression coefficient in the model predicting normal from trait alone, bt → g is the coefficient in the model predicting good from trait alone, and bg → n is the coefficient in the model predicting normal from good alone. All of them are significant (p < 0.001). b’t → n is the coefficient next to trait in the model predicting normal from good and trait, and this coefficient isn’t significant (p = 0.515): once good is included, the relationship between trait and normal disappears.
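The bootstrapped-regression logic can be sketched as follows (a plain-Python illustration with simulated data; the variable names trait, good, and normal follow the text, but the data and effect sizes are invented, and this is not the study’s own analysis code). The indirect effect is the product of the trait → good slope and the good → normal slope controlling for trait, with a percentile confidence interval from resampling:

```python
import numpy as np

def ols_coefs(X, y):
    """Least-squares coefficients of y ~ X, with an intercept prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def indirect_effect(trait, good, normal):
    # a-path: trait -> good; b-path: good -> normal, controlling for trait.
    a = ols_coefs(trait[:, None], good)[1]
    b = ols_coefs(np.column_stack([trait, good]), normal)[2]
    return a * b

def bootstrap_ci(trait, good, normal, n_boot=1000, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(trait)
    boots = [indirect_effect(trait[idx], good[idx], normal[idx])
             for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(boots, [2.5, 97.5])

# Simulated data in which trait affects normality only through goodness.
rng = np.random.default_rng(42)
n = 300
trait = rng.integers(0, 2, n).astype(float)
good = 2.0 * trait + rng.normal(size=n)
normal = 1.5 * good + rng.normal(size=n)

point = indirect_effect(trait, good, normal)
lo, hi = bootstrap_ci(trait, good, normal)
```

In this simulation the interval excludes zero, which is the mediation-analysis signature that the trait → normal relationship runs through goodness.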
These correspond to Hitchcock and Knobe’s notions of statistical and prescriptive norms, respectively.
Given recommendations in Westfall and Kenny (2014) and expecting at least a medium-sized effect, I set the sample size at 100 participants; 97 mTurk workers filled out the questionnaire before the task expired.
Therefore, goodness refers to the participants’ own evaluation, on a seven-point scale, of how good the described behavior or situation is.
The model with random slopes for participants and random intercepts for scenarios was the simplest model whose fit wasn’t worse than any other model’s—i.e., no other model was deemed better with the χ2 test. Additionally, the chosen model has the smallest BIC of all models considered. For the details of model selection, see the appendix, §3.1.
There’s no straightforward way to obtain η2s for a mixed-effects model, but there are two indirect ways. First, you can use the corresponding ANOVA model with random-effects variables as blocks. And indeed, the values in the table come from such a model with participants and scenarios as blocking variables. (Moreover, the ANOVA model corroborates the mixed-effects results: goodness and frequency, but not their interaction, are significant at the p < 0.001 level; I don’t report these results in detail, as doing so wouldn’t add to the analysis.) Another way is to compare the R2s of mixed-effects models with either predictor removed. So, marginal R2 = 34.3% for the full model with scenario and participant as random effects; for goodness as the sole predictor, marginal R2 = 0.7%, and for frequency as the sole predictor, marginal R2 = 33.7%. These values match almost perfectly the η2s obtained with ANOVA.
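The fixed-effects analog of this comparison can be sketched as follows (plain OLS on simulated data; the R2s here are ordinary rather than marginal, and the effect sizes are invented for illustration, with frequency dominating goodness as in the study):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

# Simulated ratings: frequency explains most of the variance in normality,
# goodness contributes little, mimicking the reported pattern.
rng = np.random.default_rng(7)
n = 400
goodness = rng.normal(size=n)
frequency = rng.normal(size=n)
normality = 0.15 * goodness + 1.0 * frequency + rng.normal(size=n)

r2_full = r_squared(np.column_stack([goodness, frequency]), normality)
r2_goodness = r_squared(goodness[:, None], normality)
r2_frequency = r_squared(frequency[:, None], normality)
```

With (near-)orthogonal predictors, the gap between the full model’s R2 and a single-predictor model’s R2 approximates the variance uniquely attributable to the omitted predictor, which is the logic behind comparing the marginal R2s above.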
For one participant, the first response would be one to the good-rare version of the second scenario; for another, it would be one to the good-frequent version of the fifth scenario, and so on.
Models with scenario-specific random slopes for either goodness or frequency had no better fit than the simpler model from table 5, and that’s why I chose the latter. See the appendix, §3, for the details of model selection.
Again, a corresponding ANOVA model with scenario as a blocking variable delivered η2s. (It also agreed with the mixed-effects model on the predictors’ significance: p < 0.001 for both goodness and frequency, with a nonsignificant interaction.) Similar (although not identical) results follow from analyzing the proportions of the total variance explained. For the full mixed-effects model, marginal R2 = 26%; for a model with goodness as the sole predictor, marginal R2 = 10%, and for a model with frequency as the sole predictor, marginal R2 = 21%. Notice that the additional amount of variance explained by a variable added to the model depends on the order of adding variables, which is the reason why I prefer relying on the less ambiguous η2s from ANOVA.
For instance, point (0.1, 1) means that, according to the equation for that individual, a one-point increase in goodness evaluation increases the expected normality evaluation by 0.1, and switching from a rare to a frequent behavior increases the evaluation by 1 point.
The normative dimension was additionally highlighted, as immediately after the first normality evaluation, the participants answered how good or bad they found the behavior/situation depicted.
The mixed-effects model can’t be used to evaluate that interaction, but an ANOVA can. A three-way ANOVA modelling the influence of goodness, frequency, and the scenario (a categorical variable with five levels) on the normality evaluation yields results consistent with the mixed-effects models. All three main effects, but no interactions, were significant. For details, see the appendix, §4.
The median times between filling out the vignettes were: 6 days between the first and second one, 4 days between the second and third, and 3 days between the third and fourth.
Justifying the sample size is tricky in this case because the statistics used below are descriptive, not inferential. I therefore simply collected responses from as many students as possible.
Again, η2s come from ANOVA with participants as a blocking variable, and these values are closely matched by analyzing marginal R2: 12% for a mixed-effects model with goodness as the sole predictor and 26% for a model with frequency as the sole predictor.
That is, I estimated the model in table 6 (right) using only the answers to the four versions of the squirrel scenario, even though the participants evaluated four other scenarios too.
63% of uncontrasted goodness slopes are larger than the largest contrasted goodness slope.
The versions appeared in the same order for all participants. Since the first two scenarios differ with respect to goodness but not frequency, that might have drawn the participants’ attention to that dimension. That is, the distribution in these answers (fig. 5, red) might be due to the salience of the normative dimension rather than to considering the cases in isolation. Although only a follow-up study could settle this possibility, I don’t find it worryingly likely, as the median time between filling out the first two vignettes was 6 days, more than enough for the participants to forget the details of the scenario.
Allow me to mention my reasons—I don’t expect you to share them. First, the data collected doesn’t yet allow for inferring underlying mechanisms (I find the story presented plausible but still merely hypothetical). Second, if the correct meaning of a term has normative implications about how one ought to use the term, establishing the correct meaning of normality would require analyzing the ramifications of using the concept (Burgess and Plunkett 2013). Given my casual observation of how this concept functions in moral reasoning (see §6.3), my tentative verdict is that the term should be banished altogether, except for its technical uses (as in, e.g., statistics).
But notice that “Hibbles eat one kind of berries” is a generic statement, whose meaning is much more complex than that of statistical statements (Leslie 2012). If generics themselves encode normative information, children in the experiment aren’t making is-to-ought inferences.
And indeed, the authors devised the fake-galaxy experiment to test for the existence bias, although in the experiment they manipulated the frequency of the galaxy.
If this speculation is indeed correct, the two hypotheses should be put in terms of being rather than frequency.
However, Tworek and Cimpian’s model (estimated on adults) explains R2 = 11% of the variance in their participants’ is-ought inferences, which suggests there’s more to these inferences than people’s focus on objects’ intrinsic features.
References
Bailenson, J., M. Shum, S. Atran, D. Medin, and J. Coley. 2002. A bird’s eye view: Biological categorization and reasoning within and across cultures. Cognition 84 (1): 1–53.
Barsalou, L. 1985. Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories. Journal of Experimental Psychology: Learning, Memory, and Cognition 11: 629–654.
Bear, A., and J. Knobe. 2017. Normality: Part descriptive, part prescriptive. Cognition 167: 25–37.
Burgess, A., and D. Plunkett. 2013. Conceptual ethics I. Philosophy Compass 8 (12): 1091–1101.
Burnett, R., D. Medin, N. Ross, and S. Blok. 2005. Ideal is typical. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale 59 (1): 3–10.
Egré, P., and F. Cova. 2015. Moral asymmetries and the semantics of many. Semantics and Pragmatics 8 (13): 1–45.
Eidelman, S., and C. Crandall. 2014. The intuitive traditionalist: How biases for existence and longevity promote the status quo. Advances in Experimental Social Psychology 50: 53–104.
Eidelman, S., C. Crandall, and J. Pattershall. 2009. The existence bias. Journal of Personality and Social Psychology 97 (5): 765–775.
Eidelman, S., J. Pattershall, and C. Crandall. 2010. Longer is better. Journal of Experimental Social Psychology 46 (6): 993–998.
Fritz, M., and D. MacKinnon. 2007. Required sample size to detect the mediated effect. Psychological Science 18 (3): 233–239.
Goldstein, N., R. Cialdini, and V. Griskevicius. 2008. A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. Journal of Consumer Research 35: 472–482.
Green, S. 1991. How many subjects does it take to do a regression analysis? Multivariate Behavioral Research 26 (3): 499–510.
Grice, P. 1961. The causal theory of perception. Proceedings of the Aristotelian Society, Supplementary Volume 35: 121–168.
Hitchcock, C., and J. Knobe. 2009. Cause and norm. Journal of Philosophy 106: 587–612.
Hlavka, H. 2014. Normalizing sexual violence: Young women account for harassment and abuse. Gender and Society.
Kalish, C. 2015. Normative concepts. In The conceptual mind: New directions in the study of concepts, ed. E. Margolis and S. Laurence, 519–539. Cambridge, MA: MIT Press.
Kushner, H. 2017. On the other hand: Left hand, right brain, mental disorder, and history. Baltimore: Johns Hopkins University Press.
Leslie, S.-J. 2012. Generics. In The Routledge companion to philosophy of language, ed. G. Russell and D. Fara, 355–367. New York: Routledge.
Madon, S. 1997. What do people believe about gay males? A study of stereotype content and strength. Sex Roles 37 (9): 663–685.
Men’s Health. 2014a. Am I normal? p. 21.
Men’s Health. 2014b. Am I normal? p. 30.
Men’s Health. 2014c. Am I normal? p. 26.
Nolan, J., P. Schultz, R. Cialdini, N. Goldstein, and V. Griskevicius. 2008. Normative social influence is underdetected. Personality and Social Psychology Bulletin 34: 913–923.
Preacher, K., and A. Hayes. 2004. SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers 36: 717–731.
Roberts, S., S. Gelman, and A. Ho. 2016. So it is, so it shall be: Group regularities license children’s prescriptive judgments. Cognitive Science 41: 576–600.
Roberts, S., A. Ho, and S. Gelman. 2017. Group presence, category labels, and generic statements influence children to treat descriptive group regularities as prescriptive. Journal of Experimental Child Psychology 158: 19–31.
Tworek, C., and A. Cimpian. 2016. Why do people tend to infer “ought” from “is”? The role of biases in explanation. Psychological Science 27 (8): 1109–1122.
Voorspoels, W., W. Vanpaemel, and G. Storms. 2011. A formal ideal-based account of typicality. Psychonomic Bulletin & Review 18 (5): 1006–1014.
Westfall, J., and D. Kenny. 2014. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General 143 (5): 2020–2045.
Acknowledgements
I would especially like to thank Joshua Knobe and the two reviewers—Steven Verheyen and the other, anonymous reviewer—for their invaluable advice and rich feedback. (Only a person who has worked with Josh knows how encouraging and helpful he is; I wouldn’t have completed this study if it wasn’t for him. And Steven Verheyen’s suggestions, especially the ones about the methodology and related psychological literature, were amazing; I wish everyone a reviewer like him.) For their help, I also want to thank John Doris, Dominik Dziedzic, Phoebe Friesen, Thomas Icard, Katarzyna Szubert, Adam Shmidt, Pascale Willemsen, and Jennifer Whyte. This research was supported by the National Science Centre (Poland), grant no. DEC-2013/09/N/HS4/03693.
Cite this article
Wysocki, T. Normality: a Two-Faced Concept. Rev.Phil.Psych. 11, 689–716 (2020). https://doi.org/10.1007/s13164-020-00463-z