1 Introduction: coefficient eta and eta squared

Coefficient eta (η; Pearson 1903, 1905), sometimes—specifically, in the early days—called the correlation ratio (e.g., Pearson 1911; Ayres 1920; Fisher 1925; Kelley 1935), is one of the oldest directional measures of association. Originally, Pearson proposed it as a measure of the relationship between a categorical and a continuous variable, although it can be used also with ordinal and interval-scaled variables. Fisher, however, was not convinced of the usefulness of correlation ratio and in his influential Statistical Methods for Research Workers he notea: “As a descriptive statistic the utility of the correlation ratio is extremely limited” (Fisher 1925, p. 219). However, η turned out to be the dominant measure to quantify a curvilinear association (e.g., Howell 2012; Sechrest and Yeaton 2011; see also Ayres 1920 and Peters and Van Voorhis 1940) and, as eta squared (η2) and partial eta squared (\(\eta_{p}^{2}\); Cohen 1965, 1973), it turned out to be the dominant indicator of the explaining power between two variables in settings related to the analysis of variance (ANOVA) and covariance (ANCOVA) or, more generally, in general linear modeling (GLM; e.g., Cohen 1965, 1973, 1988).

The rise of η2 was influenced by the rise in the discussions about effect sizes related to the proportion of variance in the 1960s (e.g., Cohen 1965, 1969; Friedman 1968; Kerlinger 1964). Even though epsilon squared (ε2; Kelley 1935), omega squared (ω2; Hays, 1963) or, recently, adjusted (partial) eta squared (\({\text{adj}}\eta_{p}^{2}\), later \(\eta_{adj}^{2}\); Mordkoff 2019)Footnote 1 are suggested to be used instead of η2 for the inferential statistic related to population (see the history, differences, and literature in Glass and Hakstian 1969; Okada 2013; 2017; Richardson 1996), η2 is the most often used measure of these three (see Fig. 1). From the year 1990 onwards, the trend of fixed keyword “eta squared” has rocketed against the (trend of) proportion of all publications (as indicated by a common keyword “and” in Fig. 1).

Fig. 1
figure 1

Historical occurrence of the traditional measures of explaining power in Google Scholar

Although ω2 and, specifically, ε2 are shown to give less biased estimates than η2 (see Okada 2013, 2017; Mordkoff 2019), some positive characteristics of η are that, first, η2 can be interpreted in the same manner as a squared partial correlation coefficient (\(r_{p}^{2}\)) from multiple linear regression: as the proportion of remaining variance in the dependent variable (Mordkoff 2019). Second, while all unbiased estimates ω2, ε2, and \(\eta_{adj}^{2}\) are prone to give negative (out-of-range) values when η2 is near 0, specifically, with the small sample sizes (Okada 2013, 2017), the estimates by η2 always stay within the limits although with a considerable amount of bias. The positive bias in η and η2 is caused by the fact that even if the true means in all categories of the categorical variable are equal leading to η = η2 = 0, variance across the sample means is rarely zero. Random differences around zero will cause the value of η2 to be greater than zero (Mordkoff 2019). In what follows, the opposite challenge of η and η2 is also discussed: an obvious and grave underestimation of association and explaining power.

Of many indices of association between two variables, η is a truly directional measure in the same manner as Somers delta (Somers 1962). As a directional measure, η produces two estimates of the association: one of the two variables is dependent and the other, then, is independent. In some cases, either direction may be valuable to study—both directions may be calculated to conclude which direction is more dominant (e.g., whether the attitude better explains the achievement or it is the other way around).Footnote 2 However, often only one direction interests the researcher; in GLM settings, a metric variable (X) is usually taken as the dependent variable and the grouping variable (g) is taken as independent. Then, this direction is often notated as \(\eta \left( {X\left| g \right.} \right)\) which direction is named as “X dependent” in most software packages; the thinking and naming are opposite in measurement modelling settings. The peculiarity in naming and an alternative notation are discussed later.

Traditionally, η varies from 0 to 1 where \(\eta \left( {X\left| g \right.} \right) = 0\) indicates the special case of no dispersion among the means of the metric variable X with ordinal, interval, or continuous scale in different categories in the categorical or ordinal variable g. The value \(\eta \left( {X\left| g \right.} \right) = 1\) indicates that each category in X is related to only one category in g, which presumes that there are no crossing pairs in the dataset. Technically, then, η and η2 can reach the value 1. However, this necessitates that the “metric” variable has only as many categories as the independent variable (see Appendix). The interpretation is parallel with \(\eta \left( {g\left| X \right.} \right) = 0\) and \(\eta \left( {g\left| X \right.} \right) = 1\).

This article discusses a specific challenge related to coefficient η and its close relative, product‒moment correlation coefficient (PMC; Bravais 1844; Galton 1889; Pearson 1896 onwards) often seen in the form of item–total correlation (Rit = \(\rho_{gX}\)) in the measurement modelling settings: artificial systematic attenuation or deflation in the estimates of correlation. In the empirical section, it is seen that the deflation in η may be as high as 0.50–0.60 units of correlation and in η2 as high as 0.70–0.80 units of explaining power. A deflation of this size is worth discussing and taking seriously.

Attenuation is a statistical concept that, in general, refers to underestimating the correlation between two different measures because of measurement error (Lavrakas 2008). Pearson himself was the first to offer a solution to the problem (1903) and many solutions have been offered after that (see the discussion related to the concept of restriction of range as the reason for the attenuation in Mendoza and Mumford 1987; Sacket et al. 2007; Schmidt and Hunter 2015). This matter is discussed later in Section 4.3. If the attenuation is understood broadly, the estimates by the traditional estimators of correlation are also be radically deflated caused by artificial systematic errors during the estimation (see the discussion of the terms in, e.g., Chan 2008; Gadermann et al. 2012; Lavrakas 2008; Metsämuuronen 2022a). Underestimation of a magnitude of 0.50–0.70 units of correlation (see Section 4.2) is no more attenuation as we understand it in measurement modelling—or attenuation is only a minor part of this radical deflation. However, most probably, the grave deflation also includes attenuation caused by the difference in scales. Separating these two is difficult. Hence, the term “mechanical attenuation” is used to cover both random error and deflation. Even though the focus in this article is on deflation and its correction, both the terms mechanical attenuation and deflation are in use.

2 Research questions and the course of study

Condensing the previous discussion, the main research questions are:

  1. 1.

    What is the nature of relation between η and PMC? This matter is discussed, first, based on the literature related to point-biserial correlation and, second, based on empirical dataset related to point–polyserial correlation.

  2. 2.

    What is the nature, algebraic reasons, and magnitude of attenuation and deflation in eta and eta squared? Because of the algebraic identity of η and PMC, η must be artificially attenuated because PMC is known to be radically deflated. The algebraic reasons are discussed in the theoretical section and Appendix, and the empirical section illustrates the magnitude.

  3. 3.

    How to correct the obvious deflation in PMC, η, and η2? A simple procedure is suggested to assess the magnitude of the deflation and to correct it.

  4. 4.

    How to obtain also the negative values of η by using the traditional procedures? The negative values of η are easy to obtain with binary g, because PMC and η are identical in dichotomous cases. For the polytomous ordinal case, a new procedure is suggested.

The course of the article can be as follows: Although η and PMC use, apparently, different information in their calculation, first, the identity of their formulae is discussed in the case of the dichotomous independent variable. Second, it is shown empirically in the polytomous ordinal case how close PMC follows a certain direction of η, namely, η directed, so that “X dependent” although, practically always, |PMC| <|\(\eta \left( {X\left| g \right.} \right)\)|. This leads to a somewhat disturbing note that Pearson correlation which is usually thought to be a symmetric measure is, factually, a directional measure when the scales of two variables are not identical. Third, because of the identity of η and PMC, and because PMC is known to be deflated in an obvious manner when the scales of two variables are not equal, the magnitude of underestimation by η and η2 is discussed and shown by examples. Fourth, a relevant option of correcting the attenuation in η and, consequently, in η2 are discussed. Fifth, a form of η that also allows negative values is discussed. The empirical section gives numerical examples of the underestimation in η and η2, how to obtain negative values for η, and how the negative values of η are seen in non-linear association with convex and concave patterns.

3 Forms of η and different traditions of naming the directions

Above, the notation of η relevant in GLM settings was discussed. Here, an alternative thinking relevant to measurement modelling settings is discussed. In measurement modelling settings, the relevant direction is the one where the variable with a wider scale (often, a score variable X) explains the response pattern or order of the test-takers in the variable with a narrower scale (usually, a grouping variable such as sex or a test item g). This direction is relevant also in nonparametric methodology when analyzing the variables using Mann–Whitney U test (Mann and Whitney 1947) or Jonckheere–Terpstra JT test (Jonckheere 1954; Terpstra 1952) The thinking, terminology, and the notations of the directions vary between GLM settings and measurement modelling settings. This is briefly discussed below.

3.1 Different traditions in naming the directions

Assume a binary variable g and a metric variable X. Variable g could be, as an example, sex (0 = female, 1 = male or vice versa) or response in a binary test item (0 = incorrect answer, 1 = correct answer), and the metric variable may be a test score X. In the literature related to directional measures (e.g., IBM 2017; Metsämuuronen 2017; Newson 2002, 2006; Siegel and Castellan 1988) and, consequently, in the outputs of many generally known software packages such as IBM SPSS, SAS, as well as R libraries, the traditional direction of the analysis is the one where the categorical variable g is taken as an independent factor. This direction is called “X dependent” and it is notated as (X|g). This makes sense when η2 is used in GLM settings where the metric variable (X) cannot explain the nominal variable, such as sex (g), and, hence, g must be an “independent” factor and, consequently, X must be a “dependent” factor.

The opposite thinking and naming of the same direction are relevant in settings related to measurement modelling and nonparametric testing. In what follows, this logic is used in the notation. In measurement modelling settings, the same factual direction as above means that the latent trait manifested as the score (X) explains the response pattern in the item (g), and the opposite direction does not make sense (e.g., Byrne 2016; Kim 1971; Metsämuuronen 2017; see the discussion of the directions with Table 3 and examples in Metsämuuronen 2020). Also, when analyzing the association of g and X with such procedures as Mann–Whitney U or Jonckheere–Terpstra JT tests, with manual calculation (see, e.g., Metsämuuronen 2017; Siegel and Castellan 1988), the subpopulations in g are first ordered by X after which the order is analyzed, that is, the order in g depends on X. In these settings, it is natural to think the relation of g and X from the condition’s viewpoint as “the pattern in g depends on X”, that is, “g dependent” or “g given X”, usually notated as (g|X). Because of willingness to keep the discussion open for both interpretations, both viewpoints are kept equally relevant views in the text to interpret the outcome. From this on, the logic familiar from conditions is used in the notation: in what follows, \(\eta \left( {g\left| X \right.} \right)\) = \(\eta_{g\left| X \right.}\) refers to “ η directed so that “g given X’” and \(\eta_{g\left| X \right.}^{2}\) refers to “η2 directed so that ‘g given X’” which would be labelled as “X dependent” in generally known software packages. The alternative notation \(\eta_{g\left| X \right.}\) is preferred primarily for the cases that superscripts are used also as in \(\eta_{g\left| X \right.}^{2}\), \(\eta_{g\left| X \right.}^{Obs}\), or \(\eta_{g\left| X \right.}^{Max}\).

3.2 Calculation of coefficient η: a simple case with dichotomous categories

To lead to the general formula, let us first consider the simple case of a dichotomous g and a metric X. Assume a metric (ordinal, interval, or continuousFootnote 3) variable X with observed values xj and a binary variable g with observed values yi = 0, 1. The traditional direction of η in settings related to GLM (“X dependent”) can be expressed as follows:

$$\eta \left( {g\left| X \right.} \right) = \sqrt {\eta_{g\left| X \right.}^{2} } = \sqrt {\frac{{n_{0} \left( {\overline{X}_{X0} - GM_{X} } \right)^{2} + n_{1} \left( {\overline{X}_{X1} - GM_{X} } \right)^{2} }}{{\sum\limits_{j = 1}^{N} {\left( {x_{j} - GM_{X} } \right)^{2} } }}} ,$$
(1)

where \(n_{0}\) and \(n_{1}\) refer to the number of cases in subpopulations 0 and 1 in g, \(\overline{X}_{X0}\) and \(\overline{X}_{X1}\) refer to the means of the variable X in these subpopulations, and \(GM_{X}\) is the grand mean of the variable X. If g is a test item with incorrect (0) and correct (1) responses, \(\eta \left( {g\left| X \right.} \right)\) indicates, in general, to what extent the higher score is related to the correct answer in the item. Notably, the interpretation for the same direction by Goodman–Kruskal gamma (G = \(G\left( {g\left| X \right.} \right)\); Goodman and Kruskal 1954; Metsämuuronen 2021a, b)Footnote 4 and Somers’ delta (D = \(D\left( {g\left| X \right.} \right)\); Somers 1962; on the selected direction, see Metsämuuronen 2020) is known to correspond the proper direction in the measurement modelling settings. Then, consequently, it is known that \(\eta \left( {g\left| X \right.} \right)\) and related \(\eta_{g\left| X \right.}^{2}\) indicate to what extent the score variable X explains the behavior (or the pattern) in variable g (and not the opposite way). From the GML viewpoint, \(\eta \left( {g\left| X \right.} \right)\) quantifies the proportional differences of the means of X between observations in different categories in g (e.g., males and females or between those who gave the correct and incorrect answer). From both viewpoints, the opposite direction is not meaningful. However, in some cases, both directions may be equally relevant.

3.3 Calculation of coefficient eta: a general case

Assume a nominal or ordinal variable g with observed values xi with R categories, and a metric (ordinal, interval, or continuous) variable X with observed values yi with C categories, and, often, R <  < C. The traditional direction of η in settings related to GLM (“X dependent”) notated here as \(\eta \left( {g\left| X \right.} \right)\) can be expressed as square root of the sum of squares related to the difference between the means of X in the subpopulations in g (SSbetween or SStreatment) divided by the sum of squares within the groups related to X (SSwithin or SStotal)

$$\eta \left( {g\left| X \right.} \right) = \sqrt {\eta_{g\left| X \right.}^{2} } = \sqrt {\frac{{SS_{between} \left( {g\left| X \right.} \right)}}{{SS_{total} \left( {g\left| X \right.} \right)}}} = \sqrt {\frac{{\sum\limits_{g = 1}^{R} {n_{g} \left( {\overline{X}_{Xg} - GM_{X} } \right)^{2} } }}{{\sum\limits_{i = 1}^{R} {\sum\limits_{j = 1}^{C} {\left( {x_{ij} - GM_{X} } \right)^{2} } } }}}$$
(2)

(e.g., Kerlinger 1964)Footnote 5 where \(\overline{X}_{Xg} = \sum\nolimits_{j = 1}^{{n_{g} }} {\frac{{y_{j} }}{{n_{g} }}}\) refers to the means of X in different categories in g, and

\(GM_{X} = {{\sum\nolimits_{g = 1}^{R} {n_{g} \overline{X}_{Xg} } } \mathord{\left/ {\vphantom {{\sum\nolimits_{g = 1}^{R} {n_{g} \overline{X}_{Xg} } } {\sum\nolimits_{g = 1}^{R} {n_{g} } }}} \right. \kern-\nulldelimiterspace} {\sum\nolimits_{g = 1}^{R} {n_{g} } }}\) is the grand mean of X. Correspondingly, η directed so that “g is dependent” as in GLM settings or “X given g” as in settings related to conditions can be expressed as

$$\eta \left( {X\left| g \right.} \right) = \sqrt {\eta_{X\left| g \right.}^{2} } = \sqrt {\frac{{SS_{between} \left( {X\left| g \right.} \right)}}{{SS_{total} \left( {X\left| g \right.} \right)}}} = \sqrt {\frac{{\sum\limits_{X = 1}^{C} {n_{X} \left( {\overline{X}_{gX} - GM_{g} } \right)^{2} } }}{{\sum\limits_{i = 1}^{R} {\sum\limits_{j = 1}^{C} {\left( {y_{ij} - GM_{g} } \right)^{2} } } }}} ,$$
(3)

where \(\overline{X}_{gX} = \sum\nolimits_{i = 1}^{{n_{X} }} {\frac{{y_{i} }}{{n_{X} }}}\) refers to the means of g in each category X and \(GM_{g} = {{\sum\nolimits_{X = 1}^{C} {n_{X} \overline{X}_{gX} } } \mathord{\left/ {\vphantom {{\sum\nolimits_{X = 1}^{C} {n_{X} \overline{X}_{gX} } } {\sum\nolimits_{X = 1}^{C} {n_{X} } }}} \right. \kern-\nulldelimiterspace} {\sum\nolimits_{X = 1}^{C} {n_{X} } }}\) is the grand mean of g.

4 Relation of η and PMC and some consequences

4.1 Relation of η and PMC

In the case of the correlation between a dichotomous variable and metric variable, Eq. (1) can be expressed in a form

$$\eta \left( {g\left| X \right.} \right) = (\overline{X}_{X1} - \overline{X}_{X0} )\frac{{\sigma_{g} }}{{\sigma_{X} }},$$
(4)

where \(\sigma_{g}^{{}}\) and \(\sigma_{X}^{{}}\) are the standard deviation of g and X (see the algebraic proof in Wherry and Taylor 1946; Eikeland 1971). Notably, this form is identical with the simplified form for PMC found in textbooks for binary settings (e.g., Lane et al. 2016; Lord and Novick 1968; Metsämuuronen 2017), usually referred to as point-biserial correlation (\(\rho_{PB}\); Kuder 1937; Swineford 1936)Footnote 6 or as item–total correlation (\(\rho_{gX}\)) for binary items in measurement modelling settings. Hence, with binary items or for example with dummy variables in GLM settings (see Cohen 1969), η directed so that “g given X” (in the measurement modelling settings) or “X dependent” (in GLM settings) equals PMC

$$\eta \left( {g\left| X \right.} \right) = \rho_{PB} = \rho_{gX} = {\text{PMC}}{.}$$
(5)

Notably, the opposite direction of η, \(\eta \left( {X\left| g \right.} \right)\) do not lead to the form of PMC.

Although the proof in the binary case is straightforward and simple, it is not trivial to derive —if not even possible—in the polytomous ordinal case, because, in general, PMC differs from \(\eta \left( {g\left| X \right.} \right)\) (see, however, formulae in Wherry and Taylor 1946). However, it is easy to show by empirical datasets that the closeness between PMC and \(\eta \left( {g\left| X \right.} \right)\) in Eq. (4) is a general one: with polytomous (ordinal) variables, the magnitude of the estimates by PMC is closer to \(\eta \left( {g\left| X \right.} \right)\) than to \(\eta \left( {X\left| g \right.} \right)\), however, such that always

$$\left| {{\text{PMC}}} \right| \le \left| {\eta \left( {g\left| X \right.} \right)} \right|.$$
(6)

This is caused by a small difference between the formulae of PMC and η (see Howell 2012). While the absolute value of PMC can be simplified as

$$\left| {{\text{PMC}}} \right| = \sqrt {\frac{{SS_{total} - SS_{residual} }}{{SS_{total} }}} = \sqrt {\frac{{\sum {\left( {x_{ij} - GM_{X} } \right)}^{2} - \sum {\left( {x_{ij} - \hat{X}_{ij} } \right)^{2} } }}{{\sum {\left( {x_{ij} - GM_{X} } \right)}^{2} }}} ,$$
(7)

the absolute value of η is

$$\left| {\eta \left( {g\left| X \right.} \right)} \right| = \sqrt {\eta_{g\left| X \right.}^{2} } = \sqrt {\frac{{SS_{total} - SS_{error} }}{{SS_{total} }}} = \sqrt {\frac{{\sum {\left( {x_{ij} - GM_{X} } \right)}^{2} - \sum {\left( {x_{ij} - \overline{X}_{Xg} } \right)^{2} } }}{{\sum {\left( {x_{ij} - GM_{X} } \right)}^{2} }}} .$$
(8)

If the association between two variables is perfectly linear, which in general is a rare special case but which is always the case with the binary and dichotomous g, the predicted value by the regression model (\(\hat{X}_{ij}\)) equals the means of X in the subpopulations of g (\(\overline{X}_{Xg}\)). In that specific case, \(\left| {\eta \left( {g\left| X \right.} \right)} \right| = \left| {{\text{PMC}}} \right|\), because \(\sum {\left( {x_{ij} - \hat{X}_{ij} } \right)^{2} }\) = \(\sum {\left( {x_{ij} - \overline{X}_{Xg} } \right)^{2} }\). In any other condition, that is, practically always with polytomous g, \(\sum {\left( {x_{ij} - \hat{X}_{ij} } \right)^{2} }\) < \(\sum {\left( {x_{ij} - \overline{X}_{Xg} } \right)^{2} }\), because the residuals related to the data in hand (\(\sum {\left( {x_{ij} - \hat{X}_{ij} } \right)^{2} }\)) are constructed to be minimum and, hence, \(\sum {\left( {x_{ij} - \overline{X}_{Xg} } \right)^{2} }\) would always be greater than the minimum, causing

$$\left| {\eta \left( {g\left| X \right.} \right)} \right| > \left| {{\text{PMC}}} \right|.$$
(9)

The factual difference between \(\eta \left( {g\left| X \right.} \right)\) and PMC may be subtle depending on the degree of non-linear nature of the association. In the empirical dataset related to scores (X) and ordinal items (g) in measurement modelling settings with finite or small sample sized (n ≤ 200) with 14,880 estimates of correlation between items and score variables,Footnote 7 the difference between \(\eta \left( {g\left| X \right.} \right)\) and PMC appeared to be 0.007 units of correlation, while the difference between \(\eta \left( {X\left| g \right.} \right)\) and PMC appeared to be 0.159 units of correlation on average (Tables 1 and Fig. 2). The former difference seems to be relatively small. Hence, in what follows, the value of PMC is taken as a benchmark for the values of \(\eta \left( {g\left| X \right.} \right)\) also in polytomous (ordinal) cases.

Table 1 Selected characteristics of 14,880 test items
Fig. 2
figure 2

Connection of PMC and eta in real-life datasets (n = 14,880 estimates)

  1. 1)

    \(\eta_{g\left| X \right.}\) = eta directed, so that “g given X” (as conditions) or “X dependent” (as in GLM)

  2. 2)

    \(\eta_{X\left| g \right.}\) = eta directed, so that “X given g” (as conditions) or “g dependent” (as in GLM)

4.2 Related result 1: Pearson correlation is a directional measure

Traditionally, PMC is taken as a symmetric measure, because it produces only one estimate for the association (e.g., Walk and Rupp 2010). A consequence related to Eqs. (4) and (5) with a binary and metric variable and empirical findings with polytomous case, the classic Pearson correlation is not a symmetric measure of association as has been traditionally thought. Instead, PMC has a hidden directional nature in the same direction as G (= \(G\left( {g\left| X \right.} \right)\); Metsämuuronen 2021b), such that the variable with a wider scale (X) explains the response pattern or order in the variable with a narrower scale (g) and not the opposite way nor symmetrically (or as “X dependent” as in GLM settings). Then, because \(\eta \left( {g\left| X \right.} \right) \ne \eta \left( {X\left| g \right.} \right)\)

$${\text{PMC}} \cong \eta \left( {g\left| X \right.} \right) \ne \eta \left( {X\left| g \right.} \right).$$
(10)

This connection of PMC and η is not elaborated on here except to the extent needed for proposing a new type of attenuation correction to η and η2 and a short-cut to reach the negative values of η in a polytomous g with an ordinal scale.

4.3 Related result 2: η and η 2 underestimate the association and explaining power in an obvious manner

The identity and relation of PMC and η make it clear why the estimates by η and η2 must underestimate the true association and the true explaining power in an obvious manner. The attenuation in PMC and, consequently, in η and η2 is artificial and systematic in nature and may be partly related to the phenomenon called restriction of range or range restriction (RR; see literature in Meade 2010; Sackett and Yang 2000; Sackett et al. 2007; Schmidt et al. 2008; Schmidt and Hunter 2015; Walk and Rupp 2010). Pearson himself was the first to offer a solution to the problem (1903) and many solutions have been offered to correct the attenuation in X variable (see the typology in Sacket et al. 2007; see also Mendoza and Mumford 1987). However, even if there is no manifestation of RR in X, PMC itself is very vulnerable to several sources of mechanical error in the estimates of correlation (MEC; see simulations in Metsämuuronen 2021a, 2022a). Notably, the reason for the deflation is not in the imperfect scales; if two variables both are measured by 5-point Likert scale reflecting continuous latent variables, but all the cases are in the diagonal of the crosstable, PMC = 1 without error. The question is, why the correlation cannot be 1 if one scale is 5-point scale and the other 4- or 6-point scale? This is no more a matter of attenuation but caused by technical reasons.

In general, it is known that the number of categories, among others, influences the magnitude of the estimates by PMC (see simulations in Martin 1973, 1978; Olsson 1980). In practical terms, assume two identical continuous variables with obvious perfect correlation. Let one be dichotomized (g) and the other polytomized (X). Under this condition, PMC and, consequently, \(\eta \left( {g\left| X \right.} \right)\) cannot reach the (obvious) perfect (latent) correlation (see simulations in Metsämuuronern, 2021a, 2022a; see the algebraic reasons in Metsämuuronen 2016; see also Appendix). As an example, let us take a vector of 1000 normally distributed cases, dublicate it, truncate the one version into a form of ordinal variable (g) with three categories (0, 1, 2; df(g) = 2) and the other into a form of 21 ordinal categories (X). Let the proportion of the three categories be so that the ‘difficulty level’ is p = 0.15, that is, most of the cases fall into the categories 0 or 1. In the case, the highest value of the observed PMC is around 0.745 and of the observed \(\eta \left( {g\left| X \right.} \right)\) = 0.749 even if the latent variables, obviously, correlated perfectly (\(\rho = 1\)). Notably, such estimators of correlation as polychoric correlation (RPC) and G detect the perfect association and D almost perfectly (\(D\left( {g\left| X \right.} \right) = 0.985\); see Fig. 3).

Fig. 3
figure 3

Mechanical attenuation in eta and PMC; df(g) = 2, df(x) = 20, p = 0.15, latent normality

By modifying the example related to the obvious perfect latent correlation and binary g, Metsämuuronen (2021a, 2022a) showed that PMC and, consequently, η are sensitive to, at least, five sources of artificial systematic attenuation causing deflation in the estimates. First, if there is a discrepancy in the scales, in general, η tends to underestimate the true association between g and X. Second, η is sensitive to the number of categories in g; the true association tends to be underestimated more the less categories there are in g. Third, η is sensitive to the distribution of the latent variable; the true association tends to be underestimated more when the distribution of the latent variable is normal or skewed than when it is even. Fourth, η is sensitive to the division of subpopulations in g (or item difficulty in the measurement modelling settings); true association is underestimated more, the more extreme is the division (or the difficulty level) in g. Fifth, η is sensitive to the number of categories in X; the true association tends to be underestimated more the less categories there are in X.

Notably, \(\eta \left( {g\left| X \right.} \right)\) can reach the perfect correlation only in one specific theoretical case discussed above: that the variances of X in the subpopulations of g are equally zero.In the binary case, this can be inferred strictly from an alternative form of \(\eta \left( {g\left| X \right.} \right)\) (see Appendix)

$$\eta \left( {g\left| X \right.} \right) = \frac{{\left( {\overline{X}_{X1} - GM_{X} } \right)}}{{\sqrt {\left( {1 - p} \right)\sigma_{X1}^{2} + \left( {1 - p} \right)\left( {\frac{1}{p} - 1} \right)\sigma_{X0}^{2} + \left( {\overline{X}_{X1} - GM_{X} } \right)^{2} } }},$$
(11)

where \(\sigma_{X1}^{2}\) and \(\sigma_{X0}^{2}\) are the variances of X in the subpopulations 1 and 0. From Eq. (11), it is seen that \(\eta \left( {g\left| X \right.} \right)\) and, consequently, \(\eta_{g\left| X \right.}^{2}\) can reach the perfect 1 only in the theoretical case of \(\sigma_{X1}^{2}\) = \(\sigma_{X0}^{2}\) = 0.Footnote 8 This condition implies that each category in X is connected to only one category in g without crossing pairs between the variables, that is, there are equal number of categories in g and X. This is, however, usually not true when η and η2 are used in normal settings related to η2. In the binary case, the highest magnitude in Eq. (20) is achieved when p = 0.5

$$\eta_{g\left| X \right.}^{Max} = \frac{{\left( {\overline{X}_{X1} - GM_{X} } \right)}}{{\sqrt {0.5 \times \left( {\sigma_{X1}^{2} + \sigma_{X0}^{2} } \right) + \left( {\overline{X}_{X1} - GM_{X} } \right)^{2} } }}$$
(12)

and, assuming symmetric distribution of X,

$$\eta_{g\left| X \right.}^{Max} = \frac{{\left( {\overline{X}_{X1} - GM_{X} } \right)}}{{\sqrt {\sigma_{X1}^{2} + \left( {\overline{X}_{X1} - GM_{X} } \right)^{2} } }} = \frac{{\left( {\overline{X}_{X0} - GM_{X} } \right)}}{{\sqrt {\sigma_{X0}^{2} + \left( {\overline{X}_{X0} - GM_{X} } \right)^{2} } }}$$
(13)

(see Appendix).

The numeric examples below show that the underestimation in η and η2 may be notable. For instance, in deterministic patterns where G and D can detect (correctly) the extreme association (G = D = 1), the maximum magnitudes of \(\eta \left( {g\left| X \right.} \right) = 0.701 - 0.769\) leading to \(\eta_{g\left| X \right.}^{2} = 0.491 - 0.591\) (see Tables 2 and 3 below in Section “Numeric examples…”) indicate directly how much η and η2 underestimate the association and the explaining power. In terms of explaining power, around 50% of the information is lost, that is, η2 cannot even reach more than 50–60% of the remaining variance in the other variable. With η2 and \(\eta_{p}^{2}\) as well as with \(\rho_{p}^{2}\), this leads us to conclude that the explaining power traditionally defined as “the proportion of remaining variance in the dependent variable” means, factually, “the proportion of remaining variance of which the coefficient can reach in the dependent variable”. This relates with the note by Hays (1963, p. 505; see also Richardson 1996) that the proportion of the total variation in the dependent variable is what can be predicted or explained based on its regression on the independent variable within the sample being studied. This phenomenon is discussed later with numerical examples.

The severity of mechanical attenuation or deflation in η and η2 comes from the fact that, in the binary case, Eq. (11) implies that the loss of information in \(\eta \left( {g\left| X \right.} \right)\) and \(\eta_{g\left| X \right.}^{2}\) approximates 100% when the division of subpopulations (or, difficulty level) is extreme and, hence, the variance in g approximates zero irrespective of the fact that the true correlation between the variables may be perfect. This underestimation of association can be benchmarked when the behavior of directional coefficients such as D and G is known: these are capable of detecting the deterministic patterns correctly (see Newson 2002; Metsämuuronen 2020, 2021a, b; see Fig. 3 and Tables 2 and 3 and the related discussion).Footnote 9

4.4 Related result 3: need for attenuation correction to PMC and eta

To conclude the discussion of the underestimation in the estimates by η and η2, a possible new kind of attenuation correction in PMC, η, and η2 is discussed. The traditional coefficients suggested to correct the inaccuracy of η2, that is, ε2, ω2, and \(\eta_{adj}^{2}\) were developed to correct the positive bias in η2, specifically, with values near zero (see Okada 2013, 2017; Mordkoff 2019). However, because PMC and η, in general, tend to be severely affected by mechanical error causing bias toward zero, it may be worth developing potential correction factors that may correct also the negative bias in η and η2.

Many corrections of attenuation for PMC are available (see a typology in Sacket and Yang 2000), although these are developed for different purposes than what is discussed here. Possible corrections for attenuation have been studied from Pearson (1903) and Spearman (1904) onward. The well-known corrections based on works of Pearson (1903; see also the notes by Aitken 1934 and Lawley 1943) and Thorndike (1949) are based on correcting the error when a restriction has occurred in one variable (usually in X). The idea is to enhance the concurrent validity of the test score of this restricted sample by altering it either by understanding or modelling the behavior of unrestricted population variance (see the mechanics in, e.g., Sacket and Yang 2000; Schmidt et al. 2008). This approach seems not the best option in correcting the deflation in the settings related to η. Another type of simple option for the correction of the mechanical attenuation is discussed below.

An essential characteristic of PMC and η is that, given the observed dataset, their maximum values are fixed. For PMC, because of the basic formula of PMC (\(\rho_{gX} = {{\sigma_{gX} } \mathord{\left/ {\vphantom {{\sigma_{gX} } {\sigma_{g} \sigma_{X} }}} \right. \kern-\nulldelimiterspace} {\sigma_{g} \sigma_{X} }}\)) and given the observed values of \(\sigma_{g}^{2}\) and \(\sigma_{X}^{2}\), the only element affecting the magnitude of correlation is the covariance between g and X (\(\sigma_{gX}\)). The maximum value of \(\sigma_{gX}\) is obtained when g and X are in the same order. Hence, the maximal possible correlation (\(\rho_{gX}^{Max}\)) in the given set of variables is

$$\rho_{gX}^{Max} = \frac{{\sigma_{gX}^{Max} }}{{\sigma_{g} \sigma_{X} }}$$
(14)

(see Metsämuuronen 2022c). For \(\eta \left( {g\left| X \right.} \right)\), because of Eq. (8), the only element affecting the magnitude of the estimate is the sum of squared deviances of the score in the subsamples in g, SSError = \(\sum {\left( {x_{ij} - \overline{X}_{Xg} } \right)^{2} }\); the smaller is this element, the higher gets the value of \(\eta \left( {g\left| X \right.} \right)\). Given the number or proportions of cases in the subpopulations in g, the minimum value of \(\sum {\left( {x_{ij} - \overline{X}_{Xg} } \right)^{2} }\) is obtained when X is in order and all cases in a subpopulation in g are as close as possible to each other. One of these options is the condition that both g and X are in the same order irrespective of the nominal or ordinal nature of g. Hence, the same logic of finding the maximum correlation can be used with PMC and \(\eta \left( {g\left| X \right.} \right)\) irrespective of the nature of scale in \(\eta \left( {g\left| X \right.} \right)\).

A simple attenuation correction related to \(\rho_{gX}\)(\(\rho_{AC}\), later RAC) is the proportion of the observed correlation (\(\rho_{gX}^{Obs}\)) with \(\rho_{gX}^{Max}\) given the observed g and X

$$\rho_{AC} = \frac{{\rho_{gX}^{Obs} }}{{\rho_{gX}^{Max} }} = \frac{{\rho_{gX} }}{{\rho_{gX}^{Max} }}$$
(15)

(see Metsämuuronen 2022c). Similarly, the attenuation-corrected η (\(\eta_{AC}\), later EAC) is the proportion of observed eta (\(\eta_{g|X}^{Obs}\)) and the maximal value (\(\eta_{g|X}^{Max}\)), which, in the binary case, is the maximum value of PMC

$$\eta_{AC} = \frac{{\eta_{g|X}^{Obs} }}{{\eta_{g|X}^{Max} }}\left( { = \frac{{\eta_{g|X} }}{{\rho_{gX}^{Max} }}} \right).$$
(16)

Consequently, attenuation-corrected η2 (\(\eta_{AC}^{2}\), later E2AC) is

$$\eta_{AC}^{2} = \left( {\frac{{\eta_{g|X}^{Obs} }}{{\eta_{g|X}^{Max} }}} \right)^{2} \left( { = \left( {\frac{{\eta_{g|X} }}{{\rho_{gX}^{Max} }}} \right)^{2} } \right).$$
(17)

In polytomous ordinal g, the latter part of formulae (16) and (17) leads to a mild overestimation in practical settings, because the absolute magnitude of the estimates by PMC is somewhat lower than the ones by \(\eta_{g|X}\) (see Eq. 9). However, in empirical datasets, the difference between \(\eta_{g|X}\) and PMC may be nominal (see above, 0.007 units of correlation), while the attenuation may raise as high as 0.13 to 0.99 units of correlation depending on the characteristics of g (see Metsämuuronen, 2021a, 2022a). Obviously, using \(\rho_{gX}^{Max}\) as a benchmark would not make sense with genuinely nominal-scaled polytomous g. Numeric examples of RAC, EAC, and E2AC are given later.

Equations (15) and (16) seem to solve all five challenges of PMC and η discussed above: (1) The general characteristic of PMC and η of being artificially attenuated is solved; RAC and EAC can reach the extremes of correlation also when the number of categories of the variables is not equal. (2) The effect of low variance in g caused by extreme division of cases of subpopulations (or item difficulty) in g is solved; RAC and EAC can reach the perfect 1 irrespective of the difficulty and variance in g and, by solving this challenge, the radical deflation related to items with extreme difficult level is solved. (3) The effect of the number of categories in g is solved; RAC and EAC can reach value 1 irrespective of the number of categories in g. (4) The effect of small number of categories in X is solved, RAC and EAC can reach value 1 irrespective of the number of categories in X. (5) The latent variable is not a challenge; RAC and EAC can reach value 1 irrespective of the form of the variable latent to X and g.

This article does not study further the characteristics of RAC, EAC, and E2AC; simulations of their behavior would be beneficial (see comparison of RAC, and EAC in Metsämuuronen 2022a). However, these estimators reach the value 1 when the maximum possible value of \(\rho_{gX}\) and \(\eta_{g|X}\) is achieved, that is, when the item and score are in the same order. Value 0 is obtained when the observed correlation is 0. RAC and EAC also can reach negative values; because the maximum possible value is always positive, the value of RAC is negative when the observed \(\rho_{gX}\) is negative (see the next section). Hence, unlike PMC and η, RAC and EAC reach the limits of correlation (\(- 1 \le R_{AC} ,E_{AC} \le + 1\)) also when the number of categories in two variables is not equal.

4.5 Related result 4: η can reach negative values

Traditionally, η does not reach negative values. This is caused by the traditional formulae based on squares or variances which are always positive. However, in the form in Eq. (4), η can reach negative values; if the mean of X in subpopulation 1 in g is lower than in subpopulation 0, (\(\overline{X}_{X1} < \overline{X}_{X0}\)), the factual correlation is negative, although the traditional way of calculating η will lead to signal (falsely) a positive association. Then, the value of η we usually see is, in fact, the absolute value of the estimate of the association between two variables. Hence, coefficient η calculated by using the traditional formulae gives us just the magnitude of the association and not necessarily the true association. As a parallelism, the traditional way of thinking and calculating η seems to follow the same logic as with the coefficient phi (φ) originally suggested as “mean square contingency” by Pearson (1904, p. 6) for a 2 × 2 cross-table: \(\varphi^{2} = \frac{{\chi^{2} }}{N} \Rightarrow \varphi = \sqrt {\frac{{\chi^{2} }}{N}}\). The original definition leads us to the positive values, even though the factual association could be negative. The negative values are found by going back to the basic elements in the calculation: \(\varphi = \frac{AD - BC}{{\sqrt {\left( {A + B} \right)\left( {C + D} \right)\left( {A + C} \right)\left( {B + D} \right)} }}\) (Yule 1912; see Glen 2016; see also Gibbons 1993).

Hence, to obtain the true value of eta, Eq. (4) can be used in the binary or dichotomous case. For the polytomous ordinal case, because of the close relation between PMC and \(\eta \left( {g\left| X \right.} \right)\), we could use the sign of PMC to indicate both the direction and magnitude of η

$$\eta \left( {g\left| X \right.} \right) = sign(PMC) \times \left| {\eta \left( {g\left| X \right.} \right)} \right|.$$
(18)

Numeric examples will clarify these results.

5 Numeric examples of artificial mechanical attenuation and negative values in eta and eta squared

5.1 Examples of attenuation in η and η2 and related attenuation corrections

As a simple numeric example of RAC, EAC, and E2AC, assume a test with a score X and five items (g1–g5) as part of a longer test (or five conditions of different proportions of males and females in an independent variable) with descendent levels of proportion of 1 s (pi) in gi as in Table 2. Two of the items (g1 and g5) discriminate the lower performing test takers from the higher performing ones (or males from females) in a deterministic manner. This is indicated as the perfect correlation by G and D in Table 2, and it would be obtained also using polychoric correlation coefficient albeit asymptotically.

Table 2 Hypothetic binary dataset with the descendent proportion of 1 s

Notably, an apparent challenge with the extreme items g1 and g5 is that the extreme division of the responses in g causes that only the extreme values in X are reached and, hence, this may cause the reduced eta and eta squared. However, if letting all 1 s in g5, as an example, be related to every value in X (10 different categories) but keeping the proportion of 0 s and 1 s the same as it was (8.3% of the cases being 1s) we would have a dataset with 146 test takers with 134 0s and we would obtain almost zero correlation between variables g5 and X; this is easy to confirm by forming such a dataset. That is, the reduction in correlation has not to do with non-extreme X scores and the problem remains—if not exacerbates—with a larger sample size. Because of the deflation, neither \(\rho_{gX}^{Obs}\) and \(\eta_{gX}^{Obs}\) cannot reach the perfect correlation, and the maximal possible correlations \(\rho_{gX}^{Max}\) and \(\eta_{gX}^{Max}\) differ item-wise. In items g2 − g4, the patterns include stochastic error to different extent.

The right-hand side of Table 2 shows the pattern in single items leading to maximal correlations; with each g, both g and X are in the same order leading to maximal covariance.

Three points are highlighted based on Table 2. First, with deterministic patters, RAC, EAC, and E2AC can reach the extreme value (RAC = EAC = E2AC = 1) in the same manner as G and D do. Second, in comparison with RAC and EAC, the deflation in PMC and η varies 0.03–0.56 units of correlation. The latter indicates a notable deflation in correlation, and it has a strict effect on the deflation in η2: 0.03–0.80 units of explaining power. In the case of g1, η2 was able to reach only 20% of the total variance in the score because of deflation. Third, the loss of information in η2 is not symmetric; based on Table 2, we see somewhat more loss of information when the proportion of 1s is extremely high than when it is extremely low. These need in-depth studies when exploring the boundaries of RAC, EAC, and E2AC.

5.2 Simple example of detecting negative values of eta

As a simple numeric example of the difference between the traditional formulae (Eqs. 2 and 3) and the better formula of η (Eqs. 4 and 18), let us consider a set of variables with the deterministic association as in Table 3. Of the two binary variables A1 and A2, one correlates positively with X, while the other correlates negatively. Similarly, of the two polytomous variables B1 and B2, one correlates positively with X, while the other correlates negatively. Notably, because of deterministic patterns, the correlations in variables A1 and B1 are the maximum possible values that \(\rho_{gX}\) and \(\eta \left( {g\left| X \right.} \right)\) can reach given the observed g and X because gi and X are in the same order. The scale in B1 and B2 may be ordinal or nominal.

Table 3 Negative and positive eta in a hypothetic dataset with deterministic patterns

In the binary case with positive association (A1), \(\eta \left( {g\left| X \right.} \right)\) acquired both in the traditional (Eq. 2) and with the enhanced formula (Eq. 4) equals PMC. The difference is with negative association (A2) where PMC and Eq. (14) reach the real correlation (− 0.837), while the traditional formula gives us an absolute value (+ 0.837). The outcome is obvious because of Eq. (5): in the binary case, \({\text{PMC}} = \rho_{PB} = \rho_{gX} \equiv \eta \left( {g\left| X \right.} \right)\). The same is seen also with polytomous items, however, so that Eq. (4) cannot be used because of a lack of proper formula; PMC =  − 0.851 while \(\left| {\eta \left( {g\left| X \right.} \right)} \right| = + 0.878\) and \(\left| {{\text{PMC}}} \right| < \left| {\eta \left( {g\left| X \right.} \right)} \right|\), as known from Eq. (6). In this case, however, we could use Eq. (14): the sign of PMC indicates the direction of η: \(\eta \left( {g\left| X \right.} \right) = sign(PMC) \times \left| {\eta \left( {g\left| X \right.} \right)} \right|\), that is, \(\eta \left( {g\left| X \right.} \right) = - 0.878\).

Notably, in Table 3, in comparison with estimators that can detect the deterministic patterns and strictly indicate the proportion of the logically ordered cases in g after they are ordered by X (here, D and G), \(\eta \left( {g\left| X \right.} \right)\) underestimates the association and the explaining power in an obvious manner (\(\eta_{g\left| X \right.}^{2} = 0.70 - 0.84\)). From this viewpoint, the attenuation-corrected explaining powers by \(\rho_{AC}^{2}\) = \(\eta_{AC}^{2}\) = 1 as well as G2 = D2 = 1 intuitively feel to be an interesting subject to study more. Notably also, the estimates by \(\eta \left( {X\left| g \right.} \right)\) do not refer to the characteristics of g but to those of X. Because of no changes in X and because, in all items, the two tied scorers with X = 15 are both from the subpopulation 0 in all items and, hence, no crossing observations between the categories, the values are intact even if the patterns in g differ item-wise.

Until we have more accurate forms of η for a polytomous ordinal case, because of the algebraic connection of PMC and \(\eta \left( {g\left| X \right.} \right)\) (see Eqs. 4 and 5), a possible short-cut for the sign of η is to use the sign of PMC obtained for the same variables (Eq. 18). Hence, η would be estimated in the traditional way but, on the side, PMC also is calculated, and the sign of PMC is given to η to point the direction of the association. This may serve as an intermediate solution to reach the real η in the case that g has an ordinal scale; with truly nominal polytomous g, PMC does not have a meaningful interpretation. Some limits of this option are studied in what follows.

5.3 Examples with convex and concave patterns

Being used as a measure for non-linear association, the behavior of η is also illustrated with convex and concave patterns based on 5-point ordinal variables. The convex pattern could lead us to obtain η with a positive sign, while the concave pattern would lead to η with a negative sign. This is not, however, true if we use the sign of PMC as an indicator of the sign of η as seen in Table 4 and a set of illustrations in Fig. 4.

Table 4 Negative and positive values of eta in hypothetic convex and concave patterns
Fig. 4
figure 4

Convex and concave patterns

Table 4 includes three sets of polytomous ordinal variables (g1 to g5) with a common X. Variables g1 and g2 represent symmetric convex and concave situations where PMC = G = D = 0. Variables g3 and g4 represent non-symmetric convex and concave situations where \(\left| {{\text{PMC}}} \right|\) is identical although opposite in direction. Variable g5 represents convex patterns where PMC has a negative sign.

The first thing to note from Table 4 is that high values of η do not indicate that η detects the curvilinear pattern better than other coefficients but, instead, the fact that η is sensitive to the small number of crossing pairs between variables. The same magnitudes of η would be obtained if the pattern would have been strictly linear, but the pattern of paired cases would be identical. Second, to simplify the assessment of convexity of the pattern, curvilinearity is indicated by a function with second power—notably, a function with third power would fit better to the patterns (compare graphs for g5 in Fig. 4). From the function of the second power, convexity is easy to note as the positive sign of the second derivate of function (\(f^{\prime\prime}\left( X \right) > 0\)), while the concave pattern gives a negative sign (\(f^{\prime\prime}\left( X \right) < 0\)). Notably, the complexity comes from the fact that the pattern in variable g5, as an example, is both convex and concave; convex when X = 1–6 and concave when X = 7–10 (see the right-hand side graph related to g5 in Fig. 4). From this viewpoint, the second derivate of g5 is worth noting: while \(f^{\prime\prime}\left( X \right) = 2 \times 0.1231 = 0.2462 > 0\) indicates convex patterns, PMC indicates negative correlation that seems to point to a negative \(\eta \left( {g\left| X \right.} \right)\). This indicates that the possible negative or positive sign of η is not an indicator of convexity or concavity in the pattern.

In the graphs in Fig. 4, the explaining powers (\(\rho_{gX}^{2}\)) are calculated for PMC and curvilinear association based on residuals related to the non-linear prediction. Hence, η2 is not seen in the graphs. Notably, in the case of symmetric convex and concave patterns where PMC equals 0 (items g1 and g2), the short-cut method cannot be used to indicate whether η should have a positive or negative sign. In the case of non-symmetric patterns (g3 and g4), the sign of PMC may indicate the sign of η properly (based on the close connection of the forms in Eqs. 4 and 5).

6 Conclusions and limitations

6.1 General discussion

The main result concerning coefficients η and η2 is that their values are artificially and systematically attenuated in an obvious manner. This is a known from the identity of \(\eta \left( {g\left| X \right.} \right)\) and PMC, of which the latter tends to include notable mechanical error in estimates of correlation leading to mechanical attenuation or deflation when the number of categories in the variables is not equal. The magnitude of deflation in \(\eta \left( {g\left| X \right.} \right)\) and PMC may be remarkable, specifically when the division of cases in subpopulations on g or difficulty level in g is extreme.

Because of the obvious deflation, it is worth noting the basic deficiency in \(\eta_{g\left| X \right.}^{2}\) as an indicator of explaining power in practical settings related to GLM. In terms of explaining power, in some cases, as much as 50%—or even more—of the information may get lost, that is, η2 cannot even reach more than 50–20% of the remaining variance in the other variable. With η2 and \(\eta_{p}^{2}\) as well as with \(\rho_{p}^{2}\), this leads us to conclude that the attribute for the explaining power, “the proportion of remaining variance in the dependent variable” should be rephrased as “the proportion of remaining variance of which the coefficient can reach in the dependent variable”.

The identity of PMC and η evokes the need of deflation correction for PMC and η. The issue is not necessarily relevant when the true association is near zero; in these settings, the bias-corrected estimators ε2, ω2, and \(\eta_{adj}^{2}\) are developed to correct the positive bias in η2. However, this article discussed the opposite challenge in η and η2: the radical negative bias. To combine these obviously contradicting views seems interesting and worth studying more. Logically, because η is connected to PMC, it always underestimates the correlation—even near zero—and this is a strict characteristic of PMC and η and this has nothing to do with the proposed attenuation-corrected η. A possible direction to go to seek the answer to this practical challenge may be related to the traditional practicality in estimation of not to consider the negative values as real values; near zero, the factual values of eta may be negative ones, and this squared make them always positive which, apparently, is seen as an apparent or real overestimation in explaining power. Simulations with the near-zero correlations from a viewpoint of negative values may enrich our knowledge of the phenomenon.

A simple option to correct attenuation-corrected coefficients for dichotomous and ordinal-scaled g by proportioning the observed estimates by PMC and \(\eta \left( {g\left| X \right.} \right)\) with the maximum possible estimates given the dataset. This kind of correction, leading to attenuation-corrected \(\rho_{gX}\) (\(R_{AC} = {{\rho_{gX}^{Obs} } \mathord{\left/ {\vphantom {{\rho_{gX}^{Obs} } {\rho_{gX}^{Max} }}} \right. \kern-\nulldelimiterspace} {\rho_{gX}^{Max} }}\)) and attenuation-corrected \(\eta \left( {g\left| X \right.} \right)\) (EAC = \({{\eta_{g|X}^{Obs} } \mathord{\left/ {\vphantom {{\eta_{g|X}^{Obs} } {\eta_{g|X}^{Max} }}} \right. \kern-\nulldelimiterspace} {\eta_{g|X}^{Max} }}\)) of which the latter leads to attenuation-corrected \(\eta_{g\left| X \right.}^{2}\) (E2AC = \(\left( {{{\eta_{g|X}^{Obs} } \mathord{\left/ {\vphantom {{\eta_{g|X}^{Obs} } {\eta_{g|X}^{Max} }}} \right. \kern-\nulldelimiterspace} {\eta_{g|X}^{Max} }}} \right)^{2}\)), seems to solve most of the challenges related to attenuation and deflation in PMC, \(\eta \left( {g\left| X \right.} \right)\), and \(\eta_{g\left| X \right.}^{2}\). RAC and EAC can reach the extremes of correlation ± 1 and E2AC can reach values 0–1 even when the categories in the items differ from each other in an obvious manner and irrespective of the division of cases in subpopulations of g or difficulty level in g. In some cases, the attenuation may be corrected by 0.20–0.50 units of correlation or even more. The article did not discuss the characteristics of RAC, EAC, and E2AC in-depth and simulations of their behavior is needed (see comparisons in Metsämuuronen 2022a). Notably, the attenuation correction to η and η2 presented in the article also fits with nominal-scaled g.

6.2 Some practical possibilities to use RAC and EAC

The characteristics of RAC and EAC were not studied in-depth, although some limits were discussed. More studies in this respect would be beneficial. However, if RAC and EAC and the related maximal Rit and eta are found stable and useful tools in evaluating and correcting attenuation in correlations, the maximal possible Rit and eta given the observed dataset—if not RAC or EAC—may be worth considering reporting routinely as a related statistic with the observed correlation. In case the widths of the scales differ from each other in an obvious manner, this may help assess the magnitude of possible deflation in the observed estimates. Maybe, RAC could be considered when choosing the correction formulae for the r2 effect sizes (see, e.g., Skidmore and Thompson 2011; Vacha-Haase and Thompson 2004).

RAC and EAC have two specific applications in measurement modelling settings where the correlation between an item with a narrower scale and a score with a wider scale is of interest and where Rit and η always underestimate the true correlation. First, Rit has been classically used as an estimator of item discrimination power (see Swineford 1936; Kuder 1937; Moses 2017) and η also could be used; η would react more efficiently to the non-linearity in the association. Because Rit and η underestimate the true correlation between an item and a score in an obvious manner, RAC and EAC could be used instead to reflect closer the true association. More studies in this regard could be beneficial.

Second, on one hand, Rit is embedded in the traditional estimators of reliability of the score such as coefficient alpha (Cronbach 1951) based in Rit, Armor’s theta (Armor 1973) based on principal component loadings, McDonald’s omega total (McDonald 1970), and rho or maximal reliability (e.g., Li 1997; Li et al. 1996; Raykov 2004), both based on factor loadings. Notably, the principal component or factor loadings (λi) are, essentially, correlations between items and the score variable, that is, their essence is Rit, see, e.g., Yang 2010). On the other hand, empirical results show that, using the traditional estimators of reliability, the reliability may be underestimated as much as 0.40–0.60 units of reliability (see Gadermann et al. 2012; Metsämuuronen 2022b, c; Metsämuuronen and Ukkola 2019) because of attenuation in PMC. Hence, Metsämuuronen (2022b, c) discusses attenuation-corrected estimators of or reliability (ACER) by changing Rit or principal component- or factor loading in the estimators by attenuation-corrected estimate of Rit (or attenuation-corrected λi). Then, the attenuation-corrected coefficient alpha based on RAC would be

$$\rho_{{\alpha \_RAC{\uptheta }}} = \alpha_{RAC} = \frac{k}{k - 1}\left( {1 - \frac{{\sum\limits_{i = 1}^{k} {\sigma_{gi}^{2} } }}{{\left( {\sum\limits_{i = 1}^{k} {\sigma_{gi} \times RAC_{{i{\uptheta }}} } } \right)^{2} }}} \right)$$
(19)

and based on EAC

$$\rho_{{\alpha \_EAC{\uptheta }}} = \alpha_{EAC} = \frac{k}{k - 1}\left( {1 - \frac{{\sum\limits_{i = 1}^{k} {\sigma_{gi}^{2} } }}{{\left( {\sum\limits_{i = 1}^{k} {\sigma_{gi} \times EAC_{{i{\uptheta }}} } } \right)^{2} }}} \right),$$
(20)

where k is the number of items in the test, \(\sigma_{gi}^{2}\) refers to item variances, and \(RAC_{{i{\uptheta }}}\) and \(EAC_{{i{\uptheta }}}\) are attenuation-corrected correlations between the item i and the score variable θ. Similarly, attenuation-corrected theta based on RAC would be

$$\rho_{{TH\_RAC{\uptheta }}} = \frac{k}{k - 1}\left( {1 - \frac{1}{{\sum\limits_{i = 1}^{k} {RAC_{{i{\uptheta }}}^{2} } }}} \right)$$
(21)

and based on EAC

$$\rho_{{TH\_EAC{\uptheta }}} = \frac{k}{k - 1}\left( {1 - \frac{1}{{\sum\limits_{i = 1}^{k} {EAC_{{i{\uptheta }}}^{2} } }}} \right),$$
(22)

(22) where \(RAC_{{i{\uptheta }}}^{2}\) and \(EAC_{{i{\uptheta }}}^{2}\) are attenuation corrected principal component loadings. Correspondingly, attenuation-corrected omega total based on RAC would be

$$\rho_{{\omega \_RAC{\uptheta }}} = \frac{{\left( {\sum\limits_{i = 1}^{k} {RAC_{{i{\uptheta }}} } } \right)^{2} }}{{\left( {\sum\limits_{i = 1}^{k} {RAC_{{i{\uptheta }}} } } \right)^{2} + \sum\limits_{g = 1}^{k} {\left( {1 - RAC_{{i{\uptheta }}}^{2} } \right)} }},$$
(23)

and based on EAC

$$\rho_{{\omega \_EAC{\uptheta }}} = \frac{{\left( {\sum\limits_{i = 1}^{k} {EAC_{{i{\uptheta }}} } } \right)^{2} }}{{\left( {\sum\limits_{i = 1}^{k} {EAC_{{i{\uptheta }}} } } \right)^{2} + \sum\limits_{g = 1}^{k} {\left( {1 - EAC_{{i{\uptheta }}}^{2} } \right)} }},$$
(24)

and attenuation-corrected maximal reliability (\(\rho_{MAX}\)) based on RAC would be

$$\rho_{{MAX\_RAC{\uptheta }}} = \frac{1}{{1 + \frac{1}{{\sum\limits_{i = 1}^{k} {\left( {{{RAC_{{i{\uptheta }}}^{2} } \mathord{\left/ {\vphantom {{RAC_{{i{\uptheta }}}^{2} } {\left( {1 - RAC_{{i{\uptheta }}}^{2} } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {1 - RAC_{{i{\uptheta }}}^{2} } \right)}}} \right)} }}}},$$
(25)

and based on EAC

$$\rho_{{MAX\_EAC{\uptheta }}} = \frac{1}{{1 + \frac{1}{{\sum\limits_{i = 1}^{k} {\left( {{{EAC_{{i{\uptheta }}}^{2} } \mathord{\left/ {\vphantom {{EAC_{{i{\uptheta }}}^{2} } {\left( {1 - EAC_{{i{\uptheta }}}^{2} } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {1 - EAC_{{i{\uptheta }}}^{2} } \right)}}} \right)} }}}},$$
(26)

where \(RAC_{{i{\uptheta }}}^{2}\) and \(EAC_{{i{\uptheta }}}^{2}\) are attenuation-corrected factor loadings. The characteristics of the estimators 19 to 26 are not studied here (see closer Metsämuuronen 2022b, c). However, it is known that, except for binary items where RAC = EAC, the estimates based on EAC would give somewhat higher estimates than those by RAC. Also, it is known that theta maximizes alpha (e.g., Greene and Carmines 1980) and rho maximizes omega (e.g., Cheng et al. 2012) and that rho tends to overestimate reliability with small sample sizes (Aquirre-Urreta et al. 2019). Hence, using RAC with the alpha formula (Eq. 19) would give us a more conservative estimate of the reliability than the other estimators. Correspondingly, using EAC with the form of maximal reliability (Eq. 26) may lead to overestimate reliability with small sample sizes. In-depth studies of the estimators would enrich the discussion. It is good to note also the critical note by Chalmers (2017) that if the estimator of correlation does not refer to the observed variables as RPC does not, the estimator of reliability does not refer to the observed score. From this viewpoint, estimators based on RAC and EAC refer to the observed score the same manner as G and D do. Hence, their use could be justified in estimators or reliability (see Metsämuuronen 2022c).

6.3 Some possibilities to use the negative values of eta

One of the results was that coefficient η can indicate a negative association between two variables, although the traditional ways of calculating the estimates cannot detect this. This has its own value in extending our knowledge of the traditional coefficients of association. We may relevantly ask: Would it be valuable to start using a similar kind of “back to the basic elements” type of version of coefficient η as we have for the coefficient φ to indicate both the tendency and the magnitude of the association between variables and not just the magnitude? For a binary case, we can use Eq. (4), and parallel version(s) for the polytomous cases could be derived. While waiting for the derivation, Eq. (18) could be used as a shortcut to the direction of the association. Alternatively, a relevant question for the binary case is whether it would be easier, at first hand, to use PMC or, more precisely, the point-biserial correlation coefficient (Eq. 4) instead of η to obtain strictly both the magnitude as well as direction of the association?

The negative values in η may expand the usefulness of this coefficient as a descriptive statistic. First, generally, η gives a more credible estimate of association in comparison with PMC in the case of curvilinear association with ordinal g. Second, the enhanced η could be used to indicate strictly whether the curvilinear trend between the variables is, overall, ascending or descending. However, although the enhanced η tells the overall trend with higher accuracy than PMC under the condition of non-linearity in the association, it does not tell whether the pattern of association is convex or concave. Also, η seems not to reach the same accuracy of the fit as do the methods based on residuals related to the predicted values. Third, η has been used in measurement modelling settings rarely. Maybe the possibility of detecting curvilinearity in the item responses could be utilized more in the screening phase of items?

6.4 Limitations of the study

An obvious limitation of the study is the lack of further simulation of the proposed corrections of attenuation, RAC, EAC, and E2AC (see, however, Metsämuuronen 2022a). Another is that an enhanced form of η is available only for a binary case, and not for the polytomous ordinal case. A possible short-cut method for the sign of η for polytomous cases was discussed: to use the sign of PMC for the same variables as a sign for η. Until better formulae are developed, this may serve as an intermediate solution to reach the real η. However, if this method is applied, one needs to be aware of three limitations in the interpretations of the results.

First, the short-cut method makes sense only when the categorical variable is ordinal (or better); PMC does not get any relevant interpretation with truly nominal-scaled variables with polytomous categories. Second, the sign of η does not tell whether the pattern between the variables is convex or concave, but it indicates the overall (linear) trend in the pattern. In these settings, the estimate of the true association between the variables is, most probably, reached closer by η than by PMC, because the latter detects the linear association, while η is not restricted to linear correlation. Third, if PMC equals zero, irrespective of the pattern between the variables, the sign of η will always be positive, because the sign of PMC cannot be used as an indicator of the sign of η. From this point of view, testing of the null hypothesis related to \(\rho_{gX}\) = 0 is a relevant side procedure in the short-cut method; only when the estimate by PMC truly differs zero, the sign of PMC should be used in the process.