Artificial systematic attenuation in eta squared and some related consequences: attenuation-corrected eta and eta squared, negative values of eta, and their relation to Pearson correlation

In general linear modeling (GLM), eta squared (η²) is the dominant statistic for the explaining power of an independent variable. This article discusses a less-studied deficiency in η²: its values are seriously deflated, because the estimates by coefficient eta (η) are seriously deflated. Numerical examples show that the deflation in η may be as high as 0.50–0.60 units of correlation and in η² as high as 0.70–0.80 units of explaining power. A simple mechanism to evaluate and correct the artificial attenuation is proposed. Because the formulae of η and point-biserial correlation are equal, η can also get negative values. While the traditional formulae give us only the magnitude of nonlinear association, a re-considered formula for η gives estimates with both magnitude and direction in binary cases, and a short-cut option is offered for the polytomous ones. Although the negative values of η are not relevant when η² is of interest, this may be valuable additional information when η is used with non-nominal variables.


Introduction: coefficient eta and eta squared
Coefficient eta (η), sometimes, specifically in the early days, called the correlation ratio (e.g., Pearson 1911; Ayres 1920; Fisher 1925; Kelley 1935), is one of the oldest directional measures of association. Originally, Pearson proposed it as a measure of the relationship between a categorical and a continuous variable, although it can also be used with ordinal and interval-scaled variables. Fisher, however, was not convinced of the usefulness of the correlation ratio, and in his influential Statistical Methods for Research Workers he noted: "As a descriptive statistic the utility of the correlation ratio is extremely limited" (Fisher 1925, p. 219). However, η turned out to be the dominant measure to quantify a curvilinear association (e.g., Howell 2012; Sechrest and Yeaton 2011; see also Ayres 1920 and Peters and Van Voorhis 1940) and, as eta squared (η²) and partial eta squared ($\eta^2_p$; Cohen 1965, 1973), it turned out to be the dominant indicator of the explaining power between two variables in settings related to the analysis of variance (ANOVA) and covariance (ANCOVA) or, more generally, in general linear modeling (GLM; e.g., Cohen 1965, 1973, 1988). The rise of η² was influenced by the rise in the discussions about effect sizes related to the proportion of variance in the 1960s (e.g., Cohen 1965, 1969; Friedman 1968; Kerlinger 1964). Even though epsilon squared (ε²; Kelley 1935), omega squared (ω²; Hays 1963) or, recently, adjusted (partial) eta squared ($adj\,\eta^2_p$, later $\eta^2_{adj}$; Mordkoff 2019)^1 are suggested to be used instead of η² for the inferential statistic related to population (see the history, differences, and literature in Glass and Hakstian 1969; Okada 2013; Richardson 1996), η² is the most often used measure of these three (see Fig. 1). From the year 1990 onwards, the trend of the fixed keyword "eta squared" has rocketed against the trend of the proportion of all publications (as indicated by a common keyword "and" in Fig. 1).

Fig. 1 Number of publications in Google Scholar with a fixed keyword: "eta squared" (n = 73,800), "correlation ratio" (n = 16,500), "omega squared" (n = 6,000), "epsilon squared" (n = 417); control: "and" if n = 70,000

^1 In the case where the association of just two variables is of interest, the attribute "partial" is not relevant. Then, Mordkoff's coefficient could be called adjusted eta squared ($\eta^2_{adj}$). This notation is used in what follows.
Although ω² and, specifically, ε² are shown to give less biased estimates than η² (see Okada 2013, 2017; Mordkoff 2019), η has some positive characteristics. First, η² can be interpreted in the same manner as a squared partial correlation coefficient ($r^2_p$) from multiple linear regression: as the proportion of remaining variance in the dependent variable (Mordkoff 2019). Second, while all unbiased estimates ω², ε², and $\eta^2_{adj}$ are prone to give negative (out-of-range) values when η² is near 0, specifically with small sample sizes (Okada 2013, 2017), the estimates by η² always stay within the limits, although with a considerable amount of bias. The positive bias in η and η² is caused by the fact that even if the true means in all categories of the categorical variable are equal, leading to η = η² = 0, the variance across the sample means is rarely zero. Random differences around zero will cause the value of η² to be greater than zero (Mordkoff 2019). In what follows, the opposite challenge of η and η² is also discussed: an obvious and grave underestimation of association and explaining power.
Of many indices of association between two variables, η is a truly directional measure in the same manner as Somers' delta (Somers 1962). As a directional measure, η produces two estimates of the association: one of the two variables is dependent and the other, then, is independent. In some cases, either direction may be valuable to study: both directions may be calculated to conclude which direction is more dominant (e.g., whether the attitude better explains the achievement or it is the other way around).^2 However, often only one direction interests the researcher; in GLM settings, a metric variable (X) is usually taken as the dependent variable and the grouping variable (g) is taken as independent. This direction is often notated as η(X|g) and is named "X dependent" in most software packages; the thinking and naming are opposite in measurement modelling settings. The peculiarity in naming and an alternative notation are discussed later.

^2 An anonymous reviewer correctly pointed out that the arbitrary direction may become an issue in the GLM settings from the statistical inference viewpoint: "under the general linear modeling framework, it wouldn't make sense for the categorical variable g to be the outcome of which X explains, as GLM assumes conditionally normally distributed outcomes", that is, in all conditional divisions of variables X and Y (Y|X), we should obtain a normal distribution to meet the assumptions of GLM. However, it seems that the technicalities related to calculating eta and eta squared are not based on conditionally normally distributed outcomes. That is, we may calculate eta and eta squared without assuming any distribution in the variables or outcomes and still interpret the result meaningfully; this is a relevant case when two ordinal or interval-scaled variables are of interest. However, the statistical inference requires conditional normality. Conditional normality and related diagnostic analyses of the assumptions of the modelling have traditionally been taken as crucial for statistical inference and not that much for the calculation of the statistics. However, obviously, statistical inference is an important matter to discuss, because usually we are not interested in the dataset as such but in generalizing the result to the population. Because this article does not discuss the statistical inference of the coefficients, the further discussion of the issue related to the challenges in interpretation is left for the writings to come.
Traditionally, η varies from 0 to 1, where η(X|g) = 0 indicates the special case of no dispersion among the means of the metric variable X (with an ordinal, interval, or continuous scale) in different categories of the categorical or ordinal variable g. The value η(X|g) = 1 indicates that each category in X is related to only one category in g, which presumes that there are no crossing pairs in the dataset. Technically, then, η and η² can reach the value 1. However, this necessitates that the "metric" variable has only as many categories as the independent variable (see Appendix). The interpretation is parallel with η(g|X) = 0 and η(g|X) = 1.
This article discusses a specific challenge related to coefficient η and its close relative, the product-moment correlation coefficient (PMC; Bravais 1844; Galton 1889; Pearson 1896 onwards), often seen in the form of item-total correlation (Rit = $\rho_{gX}$) in the measurement modelling settings: artificial systematic attenuation or deflation in the estimates of correlation. In the empirical section, it is seen that the deflation in η may be as high as 0.50–0.60 units of correlation and in η² as high as 0.70–0.80 units of explaining power. A deflation of this size is worth discussing and taking seriously.
Attenuation is a statistical concept that, in general, refers to underestimating the correlation between two different measures because of measurement error (Lavrakas 2008). Pearson himself was the first to offer a solution to the problem (1903), and many solutions have been offered after that (see the discussion related to the concept of restriction of range as the reason for the attenuation in Mendoza and Mumford 1987; Sackett et al. 2007; Schmidt and Hunter 2015). This matter is discussed later in Section 4.3. If the attenuation is understood broadly, the estimates by the traditional estimators of correlation are also radically deflated by artificial systematic errors during the estimation (see the discussion of the terms in, e.g., Chan 2008; Gadermann et al. 2012; Lavrakas 2008; Metsämuuronen 2022a). Underestimation of a magnitude of 0.50–0.70 units of correlation (see Section 4.2) is no longer attenuation as we understand it in measurement modelling, or attenuation is only a minor part of this radical deflation. However, most probably, the grave deflation also includes attenuation caused by the difference in scales. Separating these two is difficult. Hence, the term "mechanical attenuation" is used to cover both random error and deflation. Even though the focus in this article is on deflation and its correction, both the terms mechanical attenuation and deflation are in use.

Research questions and the course of study
Condensing the previous discussion, the main research questions are:
1. What is the nature of the relation between η and PMC? This matter is discussed, first, based on the literature related to point-biserial correlation and, second, based on an empirical dataset related to point-polyserial correlation.
2. What are the nature, algebraic reasons, and magnitude of attenuation and deflation in eta and eta squared? Because of the algebraic identity of η and PMC, η must be artificially attenuated because PMC is known to be radically deflated. The algebraic reasons are discussed in the theoretical section and Appendix, and the empirical section illustrates the magnitude.
3. How to correct the obvious deflation in PMC, η, and η²? A simple procedure is suggested to assess the magnitude of the deflation and to correct it.
4. How to obtain also the negative values of η by using the traditional procedures? The negative values of η are easy to obtain with binary g, because PMC and η are identical in dichotomous cases. For the polytomous ordinal case, a new procedure is suggested.
The course of the article is as follows. First, although η and PMC apparently use different information in their calculation, the identity of their formulae is discussed in the case of the dichotomous independent variable. Second, it is shown empirically in the polytomous ordinal case how closely PMC follows a certain direction of η, namely, η directed so that "X dependent", although, practically always, |PMC| < |η(X|g)|. This leads to a somewhat disturbing note that Pearson correlation, which is usually thought to be a symmetric measure, is, factually, a directional measure when the scales of two variables are not identical. Third, because of the identity of η and PMC, and because PMC is known to be deflated in an obvious manner when the scales of two variables are not equal, the magnitude of underestimation by η and η² is discussed and shown by examples. Fourth, a relevant option of correcting the attenuation in η and, consequently, in η² is discussed. Fifth, a form of η that also allows negative values is discussed. The empirical section gives numerical examples of the underestimation in η and η², how to obtain negative values for η, and how the negative values of η are seen in non-linear association with convex and concave patterns.

Forms of η and different traditions of naming the directions
Above, the notation of η relevant in GLM settings was discussed. Here, an alternative thinking relevant to measurement modelling settings is discussed. In measurement modelling settings, the relevant direction is the one where the variable with a wider scale (often, a score variable X) explains the response pattern or order of the test-takers in the variable with a narrower scale (usually, a grouping variable such as sex or a test item g). This direction is relevant also in nonparametric methodology when analyzing the variables using the Mann-Whitney U test (Mann and Whitney 1947) or the Jonckheere-Terpstra JT test (Jonckheere 1954; Terpstra 1952). The thinking, terminology, and the notations of the directions vary between GLM settings and measurement modelling settings. This is briefly discussed below.

Different traditions in naming the directions
Assume a binary variable g and a metric variable X. Variable g could be, as an example, sex (0 = female, 1 = male or vice versa) or response in a binary test item (0 = incorrect answer, 1 = correct answer), and the metric variable may be a test score X. In the literature related to directional measures (e.g., IBM 2017; Metsämuuronen 2017; Newson 2002, 2006; Siegel and Castellan 1988) and, consequently, in the outputs of many generally known software packages such as IBM SPSS, SAS, as well as R libraries, the traditional direction of the analysis is the one where the categorical variable g is taken as an independent factor. This direction is called "X dependent" and it is notated as η(X|g). This makes sense when η² is used in GLM settings where the metric variable (X) cannot explain the nominal variable, such as sex (g), and, hence, g must be an "independent" factor and, consequently, X must be a "dependent" factor.
The opposite thinking and naming of the same direction are relevant in settings related to measurement modelling and nonparametric testing. In what follows, this logic is used in the notation. In measurement modelling settings, the same factual direction as above means that the latent trait manifested as the score (X) explains the response pattern in the item (g), and the opposite direction does not make sense (e.g., Byrne 2016; Kim 1971; Metsämuuronen 2017; see the discussion of the directions with Table 3 and examples in Metsämuuronen 2020). Also, when analyzing the association of g and X with such procedures as the Mann-Whitney U or Jonckheere-Terpstra JT tests with manual calculation (see, e.g., Metsämuuronen 2017; Siegel and Castellan 1988), the subpopulations in g are first ordered by X, after which the order is analyzed; that is, the order in g depends on X. In these settings, it is natural to think of the relation of g and X from the condition's viewpoint as "the pattern in g depends on X", that is, "g dependent" or "g given X", usually notated as (g|X). To keep the discussion open for both interpretations, both viewpoints are kept as equally relevant views in the text to interpret the outcome. From this on, the logic familiar from conditions is used in the notation: in what follows, η(g|X) = $\eta_{g|X}$ refers to "η directed so that 'g given X'" and $\eta^2_{g|X}$ refers to "η² directed so that 'g given X'", which would be labelled as "X dependent" in generally known software packages. The alternative notation $\eta_{g|X}$ is preferred primarily for the cases where superscripts are also used, as in $\eta^2_{g|X}$, $\eta^{Obs}_{g|X}$, or $\eta^{Max}_{g|X}$.

Calculation of coefficient η: a simple case with dichotomous categories
To lead to the general formula, let us first consider the simple case of a dichotomous g and a metric X. Assume a metric (ordinal, interval, or continuous^3) variable X with observed values $x_j$ and a binary variable g with observed values $y_i$ = 0, 1. The traditional direction of η in settings related to GLM ("X dependent") can be expressed as follows:

$$\eta(g|X) = \sqrt{\frac{n_0\left(\bar{X}_{X0} - GM_X\right)^2 + n_1\left(\bar{X}_{X1} - GM_X\right)^2}{\sum_{j}\left(x_j - GM_X\right)^2}} \tag{1}$$

where $n_0$ and $n_1$ refer to the number of cases in subpopulations 0 and 1 in g, $\bar{X}_{X0}$ and $\bar{X}_{X1}$ refer to the means of the variable X in these subpopulations, and $GM_X$ is the grand mean of the variable X. If g is a test item with incorrect (0) and correct (1) responses, η(g|X) indicates, in general, to what extent the higher score is related to the correct answer in the item. Notably, the interpretation for the same direction by Goodman-Kruskal gamma (G = G(g|X); Goodman and Kruskal 1954; Metsämuuronen 2021a, b)^4 and Somers' delta (D = D(g|X); Somers 1962; on the selected direction, see Metsämuuronen 2020) is known to correspond to the proper direction in the measurement modelling settings. Then, consequently, it is known that η(g|X) and the related $\eta^2_{g|X}$ indicate to what extent the score variable X explains the behavior (or the pattern) in variable g (and not the opposite way). From the GLM viewpoint, η(g|X) quantifies the proportional differences of the means of X between observations in different categories in g (e.g., males and females, or those who gave the correct and incorrect answer). From both viewpoints, the opposite direction is not meaningful. However, in some cases, both directions may be equally relevant.

^3 The "continuous" nature of the metric variable X may be worth discussing, because it may become an issue in the direction selected for the analysis. In many cases, such as in econometrical, statistical, or engineering datasets, X may be a truly continuous variable. Then, the direction η(X|g) does not make sense. However, in the contemporary settings of psychometrical testing, the score variable is very rarely totally continuous, but it is, factually, also a categorical one. To obtain a truly continuous scale would need an infinite number of items (or a continuous scale in items, which is very rare in the testing settings) and an infinite sample size to form an infinite number of categories in the score. However, a raw score always forms a categorical (ordinal) score and, in the simplest case of a logistic transformation of the score by using item response theory (IRT) modelling, the number of categories in the score variable equals that of the raw score even though the "names" of the categories differ from the raw score numbers. Also, using factor analysis or related procedures, the number of categories is strictly dependent on the summed number of categories in the items; a test with ten Likert-scaled items with 5 categories means a maximum of 50 categories in the scale regardless of the sample size. Then, factually, practically always, the score variable is a "categorical" variable with an unweighted or weighted ordinal scale in the sense that the names of the categories may look like they come from a continuous scale, but the scale itself includes only a limited number of categories that are in a weighted order. Hence, both directions in estimating eta and eta squared may make sense, because, in many cases, both directions are based on, factually, "categorical" variables. Notably, in contrast to the polychoric correlation and the truly non-parametric coefficients gamma and delta, which, genuinely, transform the weighted score into an ordinal scale, when calculating the values of eta and eta squared, no information of the scales is lost regardless of the direction taken.

^4 Traditionally, G is taken as a symmetric coefficient, because it produces only one estimate in the same manner as PMC (e.g., IBM 2017; Sheskin 2011; Sirkin 2006; Wholey et al. 2015). However, Metsämuuronen (2021b) showed that, under certain conditions, G = D(g|X) and not D(symmetric) or D(X|g). Hence, G has a hidden directional nature: when the scales of two variables are not identical, G is unambiguously directional, so that the variable with a wider scale (X) explains the order in the variable with a narrower scale (g) in the same manner as in η(g|X). Then, a proper way to notate G may be G(g|X).

Calculation of coefficient eta: a general case
Assume a nominal or ordinal variable g with observed values $y_i$ in R categories, and a metric (ordinal, interval, or continuous) variable X with observed values $x_j$ in C categories, where, often, R << C. The traditional direction of η in settings related to GLM ("X dependent"), notated here as η(g|X), can be expressed as the square root of the sum of squares related to the difference between the means of X in the subpopulations in g ($SS_{between}$ or $SS_{treatment}$) divided by the total sum of squares related to X ($SS_{total}$) (e.g., Kerlinger 1964):^5

$$\eta(g|X) = \sqrt{\frac{SS_{between}}{SS_{total}}} = \sqrt{\frac{\sum_{g=1}^{R} n_g\left(\bar{X}_{Xg} - GM_X\right)^2}{\sum_{j}\left(x_j - GM_X\right)^2}} \tag{2}$$

where $n_g$ is the number of cases in category g, $\bar{X}_{Xg} = \sum_{j=1}^{n_g} x_j / n_g$ refers to the means of X in different categories in g, and $GM_X = \sum_{g=1}^{R} n_g \bar{X}_{Xg} / \sum_{g=1}^{R} n_g$ is the grand mean of X. Correspondingly, η directed so that "g is dependent" as in GLM settings or "X given g" as in settings related to conditions can be expressed as

$$\eta(X|g) = \sqrt{\frac{\sum_{X=1}^{C} n_X\left(\bar{X}_{gX} - GM_g\right)^2}{\sum_{i}\left(y_i - GM_g\right)^2}} \tag{3}$$

where $\bar{X}_{gX} = \sum_{i=1}^{n_X} y_i / n_X$ refers to the means of g in each category X and $GM_g$ is the grand mean of g.

^5 Kennedy (1970, pp. 886-887) calls this the "classical formula" "proposed" by Kerlinger (1964). It seems a somewhat late proposal considering that Ayres (1920) already refers to the ratio of variances. On the other hand, Kennedy also refers to Peters and Van Voorhis (1940) while discussing epsilon squared, even though it was proposed 5 years earlier by Kelley (1935). Maybe Kennedy just wanted to make the point by referring to the generally known textbooks while challenging Cohen's (1965) and Friedman's (1968) formulae and thinking of eta squared.
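The two directions of η described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `eta` and the toy vectors `g` and `X` are hypothetical and do not come from the article, and the function implements the textbook ratio of the between-groups sum of squares to the total sum of squares.

```python
from collections import defaultdict
from math import sqrt


def eta(dependent, grouping):
    """Correlation ratio sqrt(SS_between / SS_total) of `dependent`
    across the categories of `grouping` (non-negative by construction)."""
    n = len(dependent)
    grand_mean = sum(dependent) / n
    groups = defaultdict(list)
    for value, category in zip(dependent, grouping):
        groups[category].append(value)
    # Weighted squared deviations of the category means from the grand mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    ss_total = sum((value - grand_mean) ** 2 for value in dependent)
    return sqrt(ss_between / ss_total)


# Hypothetical toy data: an ordinal item g (0, 1, 2) and a score X.
g = [0, 0, 0, 1, 1, 1, 2, 2, 2]
X = [1, 2, 3, 2, 3, 4, 4, 5, 6]

# "g given X" in the article's notation ("X dependent" in GLM software):
# means of X within the categories of g.
eta_g_given_X = eta(X, g)
# "X given g" ("g dependent" in GLM software): means of g within categories of X.
eta_X_given_g = eta(g, X)

print(round(eta_g_given_X, 4), round(eta_X_given_g, 4))
```

With these toy values the two directions give different estimates, which is the sense in which η is called a directional measure.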

Relation of η and PMC
In the case of the correlation between a dichotomous variable and a metric variable, Eq. (1) can be expressed in the form

$$\eta(g|X) = \frac{\bar{X}_{X1} - \bar{X}_{X0}}{\sigma_X}\,\sigma_g \tag{4}$$

where $\sigma_g$ and $\sigma_X$ are the standard deviations of g and X (see the algebraic proof in Wherry and Taylor 1946; Eikeland 1971). Notably, this form is identical with the simplified form for PMC found in textbooks for binary settings (e.g., Lane et al. 2016; Lord and Novick 1968; Metsämuuronen 2017), usually referred to as point-biserial correlation ($\rho_{PB}$; Kuder 1937; Swineford 1936)^6 or as item-total correlation ($\rho_{gX}$) for binary items in measurement modelling settings. Hence, with binary items or, for example, with dummy variables in GLM settings (see Cohen 1969), η directed so that "g given X" (in the measurement modelling settings) or "X dependent" (in GLM settings) equals PMC:

$$\eta(g|X) = \rho_{PB} = \rho_{gX} = PMC \tag{5}$$

Notably, the opposite direction of η, η(X|g), does not lead to the form of PMC. Although the proof in the binary case is straightforward and simple, it is not trivial, if possible at all, to derive in the polytomous ordinal case, because, in general, PMC differs from η(g|X) (see, however, formulae in Wherry and Taylor 1946). However, it is easy to show by empirical datasets that the closeness between PMC and η(g|X) in Eq. (4) is a general one: with polytomous (ordinal) variables, the magnitude of the estimates by PMC is closer to η(g|X) than to η(X|g), however, such that always

$$|\eta(g|X)| \ge |PMC| \tag{6}$$

This is caused by a small difference between the formulae of PMC and η (see Howell 2012). While the absolute value of PMC can be simplified as

$$|PMC| = \sqrt{1 - \frac{\sum\left(X_{ij} - \hat{X}_{ij}\right)^2}{\sum\left(X_{ij} - GM_X\right)^2}} \tag{7}$$

the absolute value of η is

$$|\eta(g|X)| = \sqrt{1 - \frac{\sum\left(X_{ij} - \bar{X}_{Xg}\right)^2}{\sum\left(X_{ij} - GM_X\right)^2}} \tag{8}$$

If the association between two variables is perfectly linear, which in general is a rare special case but which is always the case with a binary g, the predicted value by the regression model ($\hat{X}_{ij}$) equals the means of X in the subpopulations of g ($\bar{X}_{Xg}$). In that specific case, |η(g|X)| = |PMC|. In any other condition, that is, practically always, the deviations from the subpopulation means, $\sum(X_{ij} - \bar{X}_{Xg})^2$, are constructed to be minimum and, hence, the deviations from the regression line, $\sum(X_{ij} - \hat{X}_{ij})^2$, would always be greater than the minimum, causing

$$|\eta(g|X)| > |PMC| \tag{9}$$

The factual difference between η(g|X) and PMC may be subtle depending on the degree of non-linear nature of the association. In the empirical dataset related to scores (X) and ordinal items (g) in measurement modelling settings with finite or small sample sizes (n ≤ 200) with 14,880 estimates of correlation between items and score variables,^7 the difference between η(g|X) and PMC appeared to be 0.007 units of correlation, while the difference between η(X|g) and PMC appeared to be 0.159 units of correlation on average (Table 1 and Fig. 2). The former difference seems to be relatively small. Hence, in what follows, the value of PMC is taken as a benchmark for the values of η(g|X) also in polytomous (ordinal) cases.

Notes: 1) η(g|X) = eta directed so that "g given X" (as conditions) or "X dependent" (as in GLM)
2) η(X|g) = eta directed so that "X given g" (as conditions) or "g dependent" (as in GLM)
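The relation between η and PMC discussed in this section can be checked numerically. The following is a minimal sketch, assuming hypothetical helper names (`pearson`, `eta`) and hypothetical toy data; it does not reproduce the article's empirical dataset. It checks that η directed so that "g given X" coincides with |PMC| for a binary g, and that PMC lies closer to that direction than to the opposite one in a small polytomous example.

```python
from collections import defaultdict
from math import sqrt


def pearson(a, b):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)


def eta(dependent, grouping):
    """Correlation ratio sqrt(SS_between / SS_total)."""
    grand_mean = sum(dependent) / len(dependent)
    groups = defaultdict(list)
    for value, category in zip(dependent, grouping):
        groups[category].append(value)
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    ss_total = sum((value - grand_mean) ** 2 for value in dependent)
    return sqrt(ss_between / ss_total)


# Binary g (hypothetical data): eta "g given X" and |PMC| should coincide.
g_bin = [0, 0, 0, 0, 1, 1, 1, 1]
X_bin = [1, 2, 2, 3, 2, 3, 4, 4]
binary_gap = abs(eta(X_bin, g_bin) - abs(pearson(g_bin, X_bin)))

# Polytomous g (hypothetical data): PMC should not exceed eta "g given X".
g = [0, 0, 0, 1, 1, 1, 2, 2, 2]
X = [1, 2, 3, 2, 3, 4, 4, 5, 6]
pmc = pearson(g, X)
eta_g_given_X = eta(X, g)   # means of X within categories of g
eta_X_given_g = eta(g, X)   # means of g within categories of X

print(binary_gap, round(pmc, 4), round(eta_g_given_X, 4), round(eta_X_given_g, 4))
```

The toy vectors are small enough to verify by hand; they are meant only to make the inequality concrete, not to indicate typical magnitudes.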

Related result 1: Pearson correlation is a directional measure
Traditionally, PMC is taken as a symmetric measure, because it produces only one estimate for the association (e.g., Walk and Rupp 2010). As a consequence of Eqs. (4) and (5) with a binary and a metric variable, and of the empirical findings in the polytomous case, the classic Pearson correlation is not a symmetric measure of association as has been traditionally thought. Instead, PMC has a hidden directional nature in the same direction as G (= G(g|X); Metsämuuronen 2021b), such that the variable with a wider scale (X) explains the response pattern or order in the variable with a narrower scale (g) and not the opposite way nor symmetrically (or as "X dependent" as in GLM settings). Then, because η(g|X) ≠ η(X|g),

$$PMC \cong \eta(g|X) \ne \eta(X|g) \tag{10}$$

This connection of PMC and η is not elaborated on here except to the extent needed for proposing a new type of attenuation correction to η and η² and a short-cut to reach the negative values of η in a polytomous g with an ordinal scale.

Fig. 2 [axis labels: correlation; df(g) = R − 1. Legend: PMC = Pearson product-moment correlation; η(g|X) = "g given X" or "X dependent" (GLM); η(X|g) = "X given g" or "g dependent" (GLM)]

Related result 2: η and η² underestimate the association and explaining power in an obvious manner
The identity and relation of PMC and η make it clear why the estimates by η and η² must underestimate the true association and the true explaining power in an obvious manner. The attenuation in PMC and, consequently, in η and η² is artificial and systematic in nature and may be partly related to the phenomenon called restriction of range or range restriction (RR; see literature in Meade 2010; Sackett and Yang 2000; Sackett et al. 2007; Schmidt et al. 2008; Schmidt and Hunter 2015; Walk and Rupp 2010). Pearson himself was the first to offer a solution to the problem (1903), and many solutions have been offered to correct the attenuation in the X variable (see the typology in Sackett et al. 2007; see also Mendoza and Mumford 1987). However, even if there is no manifestation of RR in X, PMC itself is very vulnerable to several sources of mechanical error in the estimates of correlation (MEC; see simulations in Metsämuuronen 2021a, 2022a). Notably, the reason for the deflation is not in the imperfect scales; if two variables are both measured by a 5-point Likert scale reflecting continuous latent variables, but all the cases are in the diagonal of the crosstable, PMC = 1 without error. The question is, why can the correlation not be 1 if one scale is a 5-point scale and the other a 4- or 6-point scale? This is no longer a matter of attenuation but is caused by technical reasons. In general, it is known that the number of categories, among others, influences the magnitude of the estimates by PMC (see simulations in Martin 1973, 1978; Olsson 1980). In practical terms, assume two identical continuous variables with obvious perfect correlation. Let one be dichotomized (g) and the other polytomized (X). Under this condition, PMC and, consequently, η(g|X) cannot reach the (obvious) perfect (latent) correlation (see simulations in Metsämuuronen 2021a, 2022a; see the algebraic reasons in Metsämuuronen 2016; see also Appendix).
As an example, let us take a vector of 1000 normally distributed cases, duplicate it, and truncate one version into the form of an ordinal variable (g) with three categories (0, 1, 2; df(g) = 2) and the other into the form of 21 ordinal categories (X). Let the proportion of the three categories be such that the 'difficulty level' is p = 0.15, that is, most of the cases fall into the categories 0 or 1. In this case, the highest value of the observed PMC is around 0.745 and of the observed η(g|X) = 0.749 even if the latent variables, obviously, correlated perfectly (ρ = 1). Notably, such estimators of correlation as polychoric correlation (RPC) and G detect the perfect association, and D detects it almost perfectly (D(g|X) = 0.985; see Fig. 3).
By modifying the example related to the obvious perfect latent correlation and binary g, Metsämuuronen (2021a, 2022a) showed that PMC and, consequently, η are sensitive to, at least, five sources of artificial systematic attenuation causing deflation in the estimates. First, if there is a discrepancy in the scales, in general, η tends to underestimate the true association between g and X. Second, η is sensitive to the number of categories in g; the true association tends to be underestimated more the fewer categories there are in g. Third, η is sensitive to the distribution of the latent variable; the true association tends to be underestimated more when the distribution of the latent variable is normal or skewed than when it is even. Fourth, η is sensitive to the division of subpopulations in g (or item difficulty in the measurement modelling settings); the true association is underestimated more, the more extreme the division (or the difficulty level) in g is. Fifth, η is sensitive to the number of categories in X; the true association tends to be underestimated more the fewer categories there are in X.
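The simulated example with a duplicated normal vector can be sketched as follows. Several details are assumptions made only for this illustration, because the text does not fix them: the seed, the reading of "difficulty level p = 0.15" as mean(g)/2 = 0.15 (here 75%, 20%, and 5% of the cases in categories 0, 1, and 2), and the equal-width cut into 21 categories for X. The resulting values will therefore differ from the 0.745 and 0.749 reported above.

```python
import random
from collections import defaultdict
from math import sqrt


def pearson(a, b):
    """Pearson product-moment correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)


def eta(dependent, grouping):
    """Correlation ratio sqrt(SS_between / SS_total)."""
    grand_mean = sum(dependent) / len(dependent)
    groups = defaultdict(list)
    for value, category in zip(dependent, grouping):
        groups[category].append(value)
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    ss_total = sum((value - grand_mean) ** 2 for value in dependent)
    return sqrt(ss_between / ss_total)


random.seed(1)  # arbitrary seed (assumption)
# One latent vector, sorted so that rank-based cuts are easy to apply.
z = sorted(random.gauss(0.0, 1.0) for _ in range(1000))
n = len(z)

# g: three ordinal categories with assumed proportions 75% / 20% / 5%.
cut_1, cut_2 = n * 75 // 100, n * 95 // 100
g = [0 if i < cut_1 else 1 if i < cut_2 else 2 for i in range(n)]

# X: the same latent vector cut into 21 equal-width categories (assumption).
low, high = z[0], z[-1]
X = [min(20, int((v - low) / (high - low) * 21)) for v in z]

difficulty = sum(g) / (2 * n)   # mean of g divided by its maximum
pmc = pearson(g, X)
eta_g_given_X = eta(X, g)       # "g given X" / "X dependent"

print(difficulty, round(pmc, 3), round(eta_g_given_X, 3))
```

Because g and X are two categorizations of the very same latent vector, any shortfall of the printed estimates from 1 comes from the categorization alone, which is the point of the example.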
Notably, η(g|X) can reach the perfect correlation only in one specific theoretical case discussed above: that the variances of X in the subpopulations of g are both zero. In the binary case, this can be inferred strictly from an alternative form of η(g|X), Eq. (11) (see Appendix), where $\sigma^2_{X1}$ and $\sigma^2_{X0}$ are the variances of X in the subpopulations 1 and 0. From Eq. (11), it is seen that η(g|X) and, consequently, $\eta^2_{g|X}$ can reach the perfect 1 only in the theoretical case of $\sigma^2_{X1} = \sigma^2_{X0} = 0$.^8 This condition implies that each category in X is connected to only one category in g without crossing pairs between the variables, that is, there are equal number of categories in g and X. This is, however, usually not true when η and η² are used in normal settings related to η². In the binary case, the highest magnitude in Eq. (20) is achieved when p = 0.5 and, assuming symmetric distribution of X, (see Appendix).
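The role of the subpopulation variances can be illustrated with the standard decomposition of the total variance of X into a within-groups and a between-groups part. This is a sketch under that assumption; it is not necessarily the same expression as the alternative form given in the article's Appendix, and the function name and the input values are hypothetical.

```python
from math import sqrt


def binary_correlation_from_parts(p, mean0, mean1, var0, var1):
    """Point-biserial correlation built from the proportion p of category 1,
    the subpopulation means, and the subpopulation (population) variances."""
    q = 1.0 - p
    between = p * q * (mean1 - mean0) ** 2   # variance between the two groups
    within = p * var1 + q * var0             # pooled variance within groups
    return sqrt(p * q) * (mean1 - mean0) / sqrt(within + between)


# Zero variance within both subpopulations: the correlation reaches 1.
perfect = binary_correlation_from_parts(0.5, 0.0, 1.0, 0.0, 0.0)

# Same means with unit variances: the estimate falls well below 1 ...
balanced = binary_correlation_from_parts(0.5, 0.0, 1.0, 1.0, 1.0)

# ... and shrinks further when the division in g becomes extreme.
extreme = binary_correlation_from_parts(0.05, 0.0, 1.0, 1.0, 1.0)

print(perfect, round(balanced, 4), round(extreme, 4))
```

In this form the value 1 is reached only when both within-group variances vanish, and the estimate moves toward zero as p approaches 0 or 1 while the means and variances stay fixed, in line with the argument of the text.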
The numeric examples below show that the underestimation in η and η² may be notable. For instance, in deterministic patterns where G and D can detect (correctly) the extreme association (G = D = 1), the maximum magnitudes of η(g|X) = 0.701–0.769, leading to $\eta^2_{g|X}$ = 0.491–0.591 (see Tables 2 and 3 below in Section "Numeric examples…"), indicate directly how much η and η² underestimate the association and the explaining power. In terms of explaining power, around 50% of the information is lost, that is, η² cannot even reach more than 50–60% of the remaining variance in the other variable. With η² and $\eta^2_p$ as well as with $r^2_p$, this leads us to conclude that the explaining power traditionally defined as "the proportion of remaining variance in the dependent variable" means, factually, "the proportion of remaining variance which the coefficient can reach in the dependent variable". This relates to the note by Hays (1963, p. 505; see also Richardson 1996) that the proportion of the total variation in the dependent variable is what can be predicted or explained based on its regression on the independent variable within the sample being studied. This phenomenon is discussed later with numerical examples.
The severity of mechanical attenuation or deflation in η and η² comes from the fact that, in the binary case, Eq. (11) implies that the loss of information in η(g|X) and $\eta^2_{g|X}$ approximates 100% when the division of subpopulations (or difficulty level) is extreme and, hence, the variance in g approximates zero irrespective of the fact that the true correlation between the variables may be perfect. This underestimation of association can be benchmarked when the behavior of directional coefficients such as D and G is known: these are capable of detecting the deterministic patterns correctly (see Newson 2002; Metsämuuronen 2020, 2021a; see Fig. 3 and Tables 2 and 3 and the related discussion).^9

Related result 3: need for attenuation correction to PMC and eta
To conclude the discussion of the underestimation in the estimates by η and η², a possible new kind of attenuation correction in PMC, η, and η² is discussed. The traditional coefficients suggested to correct the inaccuracy of η², that is, ε², ω², and η²_adj, were developed to correct the positive bias in η², specifically with values near zero (see Okada 2013, 2017; Mordkoff 2019). However, because PMC and η, in general, tend to be severely affected by mechanical error causing bias toward zero, it may be worth developing correction factors that also correct the negative bias in η and η².
Many corrections of attenuation for PMC are available (see a typology in Sackett and Yang 2000), although these were developed for different purposes than what is discussed here. Possible corrections for attenuation have been studied from Pearson (1903) and Spearman (1904) onward. The well-known corrections based on the works of Pearson (1903; see also the notes by Aitken 1934 and Lawley 1943) and Thorndike (1949) correct the error that arises when a restriction has occurred in one variable (usually in X). The idea is to enhance the concurrent validity of the test score of this restricted sample by altering it either by understanding or by modelling the behavior of the unrestricted population variance (see the mechanics in, e.g., Sackett and Yang 2000; Schmidt et al. 2008). This approach does not seem to be the best option for correcting the deflation in the settings related to η. Another, simple type of option for correcting the mechanical attenuation is discussed below.
An essential characteristic of PMC and η is that, given the observed dataset, their maximum values are fixed. For PMC, because of the basic formula of PMC (ρ_gX = σ_gX / (σ_g σ_X)) and given the observed values of σ²_g and σ²_X, the only element affecting the magnitude of the correlation is the covariance between g and X (σ_gX). The maximum value of σ_gX is obtained when g and X are in the same order. Hence, the maximal possible correlation (ρ^Max_gX) in the given set of variables is

$$\rho_{gX}^{Max} = \frac{\sigma_{gX}^{Max}}{\sigma_{g}\,\sigma_{X}}$$

(see Metsämuuronen 2022c). For η(g|X), because of Eq. (8), the only element affecting the magnitude of the estimate is the sum of squared deviations of the score in the subsamples in g, SS_Error = ∑(x_ij − X̄_Xg)²; the smaller this element is, the higher the value of η(g|X). Given the number or proportions of cases in the subpopulations in g, the minimum value of ∑(x_ij − X̄_Xg)² is obtained when X is in order and all cases in a subpopulation in g are as close as possible to each other. One of these options is the condition that both g and X are in the same order, irrespective of the nominal or ordinal nature of g. Hence, the same logic of finding the maximum correlation can be used with PMC and η(g|X), irrespective of the nature of the scale in η(g|X).

⁹ The obvious contradiction between the earlier findings of overestimation by eta squared and the radical underestimation discussed in this article may disturb the reader. The discrepancy could be explained by two distinct quantities being estimated: eta squared may reflect the association between the observed measured variables, while the new estimator may reflect the association between latent continuous versions of these variables. To confirm this, more theoretical and empirical work is needed. Another, simpler explanation is that PMC was originally created for the association of two continuous variables, whereas the way we use it today, with variables with different scales and specifically with binary and metric variables, is not the intended environment for PMC; hence, there is mechanical error in the estimates. The correction suggested in this article is one option to rectify the deflation.
A simple attenuation correction related to ρ_gX (ρ_AC, later R_AC) is the proportion of the observed correlation (ρ^Obs_gX) to ρ^Max_gX given the observed g and X (see Metsämuuronen 2022c):

$$R_{AC} = \frac{\rho_{gX}^{Obs}}{\rho_{gX}^{Max}} \tag{15}$$

Similarly, the attenuation-corrected η (η_AC, later E_AC) is the proportion of the observed eta (η^Obs(g|X)) to the maximal value (η^Max(g|X)), which, in the binary case, is the maximum value of PMC:

$$E_{AC} = \frac{\eta_{g|X}^{Obs}}{\eta_{g|X}^{Max}} \approx \frac{\eta_{g|X}^{Obs}}{\rho_{gX}^{Max}} \tag{16}$$

Consequently, attenuation-corrected η² (η²_AC, later E2_AC) is

$$E2_{AC} = \left(\frac{\eta_{g|X}^{Obs}}{\eta_{g|X}^{Max}}\right)^{2} \approx \left(\frac{\eta_{g|X}^{Obs}}{\rho_{gX}^{Max}}\right)^{2} \tag{17}$$

In polytomous ordinal g, the latter part of formulae (16) and (17) leads to a mild overestimation in practical settings, because the absolute magnitude of the estimates by PMC is somewhat lower than that of the estimates by η(g|X) (see Eq. 9). However, in empirical datasets, the difference between η(g|X) and PMC may be nominal (see above, 0.007 units of correlation), while the attenuation may rise as high as 0.13 to 0.99 units of correlation depending on the characteristics of g (see Metsämuuronen 2021a, 2022a). Obviously, using ρ^Max_gX as a benchmark would not make sense with genuinely nominal-scaled polytomous g. Numeric examples of R_AC, E_AC, and E2_AC are given later.

Equations (15) and (16) seem to solve all five challenges of PMC and η discussed above:

(1) The general characteristic of PMC and η of being artificially attenuated is solved; R_AC and E_AC can reach the extremes of correlation also when the number of categories of the variables is not equal.
(2) The effect of low variance in g caused by an extreme division of cases of subpopulations (or item difficulty) in g is solved; R_AC and E_AC can reach the perfect 1 irrespective of the difficulty and variance in g and, by solving this challenge, the radical deflation related to items with an extreme difficulty level is solved.
(3) The effect of the number of categories in g is solved; R_AC and E_AC can reach the value 1 irrespective of the number of categories in g.
(4) The effect of a small number of categories in X is solved; R_AC and E_AC can reach the value 1 irrespective of the number of categories in X.
(5) The latent variable is not a challenge; R_AC and E_AC can reach the value 1 irrespective of the form of the variable latent to X and g.

This article does not study further the characteristics of R_AC, E_AC, and E2_AC; simulations of their behavior would be beneficial (see the comparison of R_AC and E_AC in Metsämuuronen 2022a). However, these estimators reach the value 1 when the maximum possible value of ρ_gX and η(g|X) is achieved, that is, when the item and the score are in the same order. The value 0 is obtained when the observed correlation is 0. R_AC and E_AC can also reach negative values; because the maximum possible value is always positive, the value of R_AC is negative when the observed ρ_gX is negative (see the next section). Hence, unlike PMC and η, R_AC and E_AC reach the limits of correlation (−1 ≤ R_AC, E_AC ≤ +1) also when the number of categories in the two variables is not equal.
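The logic of proportioning the observed estimate to the maximum possible estimate can be sketched in a few lines. This is only an illustration under stated assumptions: the maximum is obtained by putting g and X in the same order, as described above; the data are hypothetical; and the function names (`pearson`, `eta`, `r_ac`, `e_ac`) are not from the original article. Note that `eta` returns only the traditional magnitude, so `e_ac` as written is unsigned.

```python
from math import sqrt

def pearson(a, b):
    """Product-moment correlation (PMC)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def eta(g, x):
    """Traditional (unsigned) eta of x given the groups in g."""
    n = len(x)
    gm = sum(x) / n
    groups = {}
    for gi, xi in zip(g, x):
        groups.setdefault(gi, []).append(xi)
    ss_between = sum(len(v) * (sum(v) / len(v) - gm) ** 2 for v in groups.values())
    ss_total = sum((xi - gm) ** 2 for xi in x)
    return sqrt(ss_between / ss_total)

def r_ac(g, x):
    """Observed PMC divided by the PMC obtained when g and x share the same order."""
    return pearson(g, x) / pearson(sorted(g), sorted(x))

def e_ac(g, x):
    """Observed eta divided by the eta obtained when g and x share the same order."""
    return eta(g, x) / eta(sorted(g), sorted(x))

X = list(range(1, 11))                 # hypothetical score
g_det = [0] * 8 + [1] * 2              # deterministic pattern
g_noisy = [0] * 7 + [1, 0, 1]          # one pair of cases out of order
g_rev = [1] * 2 + [0] * 8              # deterministic but negative

for name, g in (("det", g_det), ("noisy", g_noisy), ("rev", g_rev)):
    print(name, round(pearson(g, X), 3), round(r_ac(g, X), 3), round(e_ac(g, X) ** 2, 3))
```

In this sketch the deterministic item yields an observed PMC of about 0.70 but R_AC = E_AC = 1, and the reversed item yields R_AC = −1, in line with the limits stated above.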

Related result 4: η can reach negative values
Traditionally, η does not reach negative values. This is caused by the traditional formulae, which are based on squares or variances that are always positive. However, in the form in Eq. (4), η can reach negative values; if the mean of X in subpopulation 1 in g is lower than in subpopulation 0 (X̄_X1 < X̄_X0), the factual correlation is negative, although the traditional way of calculating η will (falsely) signal a positive association. The value of η we usually see is then, in fact, the absolute value of the estimate of the association between the two variables. Hence, coefficient η calculated by using the traditional formulae gives us just the magnitude of the association and not necessarily the true association. As a parallel, the traditional way of thinking about and calculating η seems to follow the same logic as with the coefficient phi (φ), originally suggested as "mean square contingency" by Pearson (1904, p. 6

Numeric examples will clarify these results.
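The contrast between the two ways of calculating η in the binary case can be sketched as follows. The sketch assumes that the signed form referred to as Eq. (4) is the familiar point-biserial form, (X̄_X1 − X̄_X0) × σ_g / σ_X; the data and the function names are hypothetical.

```python
from math import sqrt

def eta_traditional(g, x):
    """Traditional eta: square root of SS_between / SS_total (never negative)."""
    n = len(x)
    gm = sum(x) / n
    groups = {}
    for gi, xi in zip(g, x):
        groups.setdefault(gi, []).append(xi)
    ss_between = sum(len(v) * (sum(v) / len(v) - gm) ** 2 for v in groups.values())
    ss_total = sum((xi - gm) ** 2 for xi in x)
    return sqrt(ss_between / ss_total)

def eta_signed_binary(g, x):
    """Signed eta for binary g, assuming the point-biserial form of Eq. (4)."""
    n = len(x)
    x1 = [xi for gi, xi in zip(g, x) if gi == 1]
    x0 = [xi for gi, xi in zip(g, x) if gi == 0]
    p = len(x1) / n
    gm = sum(x) / n
    sd_x = sqrt(sum((xi - gm) ** 2 for xi in x) / n)
    sd_g = sqrt(p * (1 - p))
    return (sum(x1) / len(x1) - sum(x0) / len(x0)) * sd_g / sd_x

X = list(range(1, 11))                   # hypothetical score
g_neg = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # the 1s belong to the lowest scorers

print(round(eta_traditional(g_neg, X), 3))    # magnitude only
print(round(eta_signed_binary(g_neg, X), 3))  # magnitude and direction
```

Both functions return the same magnitude, but only the second one carries the negative sign of the association.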

Examples of attenuation in η and η 2 and related attenuation corrections
As a simple numeric example of R_AC, E_AC, and E2_AC, assume a test with a score X and five items (g1–g5) as part of a longer test (or five conditions of different proportions of males and females in an independent variable) with descending proportions of 1s (p_i) in g_i, as in Table 2. Two of the items (g1 and g5) discriminate the lower-performing test takers from the higher-performing ones (or males from females) in a deterministic manner. This is indicated as the perfect correlation by G and D in Table 2, and it would also be obtained by using the polychoric correlation coefficient, albeit asymptotically.
Notably, an apparent challenge with the extreme items g1 and g5 is that the extreme division of the responses in g means that only the extreme values in X are reached and, hence, this may cause the reduced eta and eta squared. However, if we let the 1s in g5, as an example, be related to every value in X (10 different categories) but keep the proportion of 0s and 1s the same as it was (8.3% of the cases being 1s), we would have a dataset of 146 test takers with 134 0s, and we would obtain an almost zero correlation between the variables g5 and X; this is easy to confirm by forming such a dataset. That is, the reduction in correlation has nothing to do with nonextreme X scores, and the problem remains, if it is not exacerbated, with a larger sample size. Because of the deflation, neither ρ^Obs_gX nor η^Obs(g|X) can reach the perfect correlation, and the maximal possible correlations ρ^Max_gX and η^Max(g|X) differ item-wise. In items g2–g4, the patterns include stochastic error to different extents.
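Such a dataset can be formed, for instance, as follows. The sketch is an assumption-laden illustration and not the dataset of Table 2: only the sample size (146), the number of 0s (134), and the ten categories of X are taken from the text, while the per-category counts are hypothetical and chosen to be symmetric.

```python
from math import sqrt

def pearson(a, b):
    """Product-moment correlation (PMC)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# Hypothetical counts per score category X = 1..10 (134 zeros, 12 ones).
counts_zero = [13, 13, 13, 14, 14, 14, 14, 13, 13, 13]
counts_one = [1, 1, 1, 1, 2, 2, 1, 1, 1, 1]   # the 1s touch every category of X

X_spread, g_spread = [], []
for score, (n0, n1) in enumerate(zip(counts_zero, counts_one), start=1):
    X_spread += [score] * (n0 + n1)
    g_spread += [0] * n0 + [1] * n1

# Same scores, but now only the 12 highest scorers get a 1 (deterministic pattern).
X_det = sorted(X_spread)
g_det = [0] * 134 + [1] * 12

r_spread = pearson(g_spread, X_spread)
r_det = pearson(g_det, X_det)
print(len(X_spread), sum(g_spread), round(r_spread, 3), round(r_det, 3))
```

With the 1s spread over all categories the correlation is zero, and even the deterministic arrangement with the same proportion of 1s stays well below 1 (about 0.48 in this sketch).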
The right-hand side of Table 2 shows the pattern in single items leading to maximal correlations; with each g, both g and X are in the same order leading to maximal covariance.
Three points are highlighted based on Table 2. First, with deterministic patterns, R_AC, E_AC, and E2_AC can reach the extreme value (R_AC = E_AC = E2_AC = 1) in the same manner as G and D do. Second, in comparison with R_AC and E_AC, the deflation in PMC and η varies between 0.03 and 0.56 units of correlation. The latter indicates a notable deflation in correlation, and it has a direct effect on the deflation in η²: 0.03–0.80 units of explaining power. In the case of g1, η² was able to reach only 20% of the total variance in the score because of deflation. Third, the loss of information in η² is not symmetric; based on Table 2, we see somewhat more loss of information when the proportion of 1s is extremely high than when it is extremely low. These points need in-depth study when exploring the boundaries of R_AC, E_AC, and E2_AC.

Table 2
Hypothetical binary dataset with a descending proportion of 1s

Simple example of detecting negative values of eta
As a simple numeric example of the difference between the traditional formulae (Eqs. 2 and 3) and the better formula of η (Eqs. 4 and 18), let us consider a set of variables with a deterministic association, as in Table 3. Of the two binary variables A1 and A2, one correlates positively with X, while the other correlates negatively. Similarly, of the two polytomous variables B1 and B2, one correlates positively with X, while the other correlates negatively. Notably, because of the deterministic patterns, the correlations in variables A1 and B1 are the maximum possible values that ρ_gX and η(g|X) can reach given the observed g and X, because g_i and X are in the same order. The scale in B1 and B2 may be ordinal or nominal.
In the binary case with a positive association (A1), η(g|X) obtained with both the traditional formula (Eq. 2) and the enhanced formula (Eq. 4) equals PMC. The difference is with the negative association (A2), where PMC and Eq. (4) reach the real correlation (−0.837), while the traditional formula gives us an absolute value (+0.837). The outcome is obvious because of Eq. (5): in the binary case, PMC = PB = ρ_gX ≡ η(g|X). The same is also seen with polytomous items, although Eq. (4) cannot be used there because of the lack of a proper formula; PMC = −0.851 while |η(g|X)| = +0.878 and |PMC| < |η(g|X)|, as known from Eq. (6). In this case, however, we could use Eq. (18): the sign of PMC indicates the direction of η, η(g|X) = sign(PMC) × |η(g|X)|, that is, η(g|X) = −0.878.
Notably, in Table 3, in comparison with estimators that can detect the deterministic patterns and strictly indicate the proportion of the logically ordered cases in g after they are ordered by X (here, D and G), η(g|X) underestimates the association and the explaining power in an obvious manner (η²(g|X) = 0.70–0.84). From this viewpoint, the attenuation-corrected explaining powers ρ²_AC = η²_AC = 1, as well as G² = D² = 1, intuitively seem an interesting subject for further study. Notably also, the estimates by η(X|g) do not refer to the characteristics of g but to those of X. Because there are no changes in X and because, in all items, the two tied scorers with X = 15 are both from subpopulation 0 and, hence, there are no crossing observations between the categories, the values are intact even if the patterns in g differ item-wise.
Until we have more accurate forms of η for the polytomous ordinal case, because of the algebraic connection between PMC and η(g|X) (see Eqs. 4 and 5), a possible short-cut for the sign of η is to use the sign of PMC obtained for the same variables (Eq. 18). Hence, η would be estimated in the traditional way but, on the side, PMC is also calculated, and the sign of PMC is given to η to indicate the direction of the association. This may serve as an intermediate solution to reach the real η in the case that g has an ordinal scale; with a truly nominal polytomous g, PMC does not have a meaningful interpretation. Some limits of this option are studied in what follows.
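The short-cut can be sketched as a small function that attaches the sign of PMC to the traditional magnitude of η. The data are hypothetical and the function names are illustrative assumptions; the treatment of PMC = 0 (the sign stays positive) is a choice made in this sketch.

```python
from math import sqrt

def pearson(a, b):
    """Product-moment correlation (PMC)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def eta(g, x):
    """Traditional (unsigned) eta of x given the groups in g."""
    n = len(x)
    gm = sum(x) / n
    groups = {}
    for gi, xi in zip(g, x):
        groups.setdefault(gi, []).append(xi)
    ss_between = sum(len(v) * (sum(v) / len(v) - gm) ** 2 for v in groups.values())
    return sqrt(ss_between / sum((xi - gm) ** 2 for xi in x))

def eta_signed(g, x):
    """sign(PMC) x |eta|; when PMC is exactly zero the sign stays positive."""
    magnitude = eta(g, x)
    return -magnitude if pearson(g, x) < 0 else magnitude

X = list(range(1, 11))                  # hypothetical score
g = [2, 2, 2, 2, 1, 1, 1, 0, 0, 0]      # ordinal g, descending with X

print(round(pearson(g, X), 3), round(eta(g, X), 3), round(eta_signed(g, X), 3))
```

In this example PMC is about −0.943 and the traditional η about 0.944, so the short-cut reports η ≈ −0.944, with |PMC| slightly below |η| as expected.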

Examples with convex and concave patterns
Being used as a measure for non-linear association, the behavior of η is also illustrated with convex and concave patterns based on 5-point ordinal variables. The convex pattern could lead us to obtain η with a positive sign, while the concave pattern would lead to η with a negative sign. This is not, however, true if we use the sign of PMC as an indicator of the sign of η, as seen in Table 4 and a set of illustrations in Fig. 4. Table 4 includes three sets of polytomous ordinal variables (g1 to g5) with a common X. Variables g1 and g2 represent symmetric convex and concave situations where PMC = G = D = 0. Variables g3 and g4 represent non-symmetric convex and concave situations where |PMC| is identical although opposite in direction. Variable g5 represents convex patterns where PMC has a negative sign.

The first thing to note from Table 4 is that high values of η do not indicate that η detects the curvilinear pattern better than other coefficients but, instead, that η is sensitive to the small number of crossing pairs between the variables. The same magnitudes of η would be obtained if the pattern had been strictly linear but the pattern of paired cases had been identical. Second, to simplify the assessment of the convexity of the pattern, curvilinearity is indicated by a second-degree function; notably, a third-degree function would fit the patterns better (compare the graphs for g5 in Fig. 4). From the second-degree function, convexity is easy to note as the positive sign of the second derivative of the function (f''(X) > 0), while the concave pattern gives a negative sign (f''(X) < 0). Notably, the complexity comes from the fact that the pattern in variable g5, as an example, is both convex and concave: convex when X = 1–6 and concave when X = 7–10 (see the right-hand side graph related to g5 in Fig. 4).
From this viewpoint, the second derivative for g5 is worth noting: while f''(X) = 2 × 0.1231 = 0.2462 > 0 indicates a convex pattern, PMC indicates a negative correlation that seems to point to a negative η(g|X). This indicates that the possible negative or positive sign of η is not an indicator of convexity or concavity in the pattern. In the graphs in Fig. 4, the explaining powers (ρ²_gX) are calculated for PMC and for the curvilinear association based on residuals related to the non-linear prediction. Hence, η² is not seen in the graphs. Notably, in the case of symmetric convex and concave patterns where PMC equals 0 (items g1 and g2), the short-cut method cannot be used to indicate whether η should have a positive or a negative sign. In the case of non-symmetric patterns (g3 and g4), the sign of PMC may indicate the sign of η properly (based on the close connection of the forms in Eqs. 4 and 5).
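That the sign of PMC and the sign of the second derivative can disagree is easy to reproduce. The following sketch uses a hypothetical asymmetric convex pattern, not the g5 of Table 4, and obtains the leading coefficient of the least-squares second-degree fit by first removing the linear part of X² (a Frisch-Waugh type step); the function names are illustrative assumptions.

```python
from math import sqrt

def pearson(a, b):
    """Product-moment correlation (PMC)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def quadratic_coefficient(x, y):
    """Leading coefficient c of the least-squares fit y ~ a + b*x + c*x**2."""
    n = len(x)
    mx = sum(x) / n
    x2 = [v * v for v in x]
    mx2 = sum(x2) / n
    slope = (sum((v - mx) * (w - mx2) for v, w in zip(x, x2))
             / sum((v - mx) ** 2 for v in x))
    # q is the part of x**2 that is not linearly explained by x; it sums to zero.
    q = [w - mx2 - slope * (v - mx) for v, w in zip(x, x2)]
    return sum(qi * yi for qi, yi in zip(q, y)) / sum(qi * qi for qi in q)

X = list(range(1, 11))
g = [5, 4, 3, 2, 1, 1, 1, 2, 2, 3]   # hypothetical convex, non-symmetric pattern

pmc = pearson(g, X)
second_derivative = 2 * quadratic_coefficient(X, g)
print(round(pmc, 3), round(second_derivative, 3))
```

Here the second derivative is positive (a convex pattern) although PMC is negative, so a sign borrowed from PMC describes the overall trend and not the shape of the curve.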

General discussion
The main result concerning coefficients η and η² is that their values are artificially and systematically attenuated in an obvious manner. This is known from the identity of η(g|X) and PMC, of which the latter tends to include notable mechanical error in the estimates of correlation, leading to mechanical attenuation or deflation when the number of categories in the variables is not equal. The magnitude of deflation in η(g|X) and PMC may be remarkable, specifically when the division of cases in subpopulations in g, or the difficulty level in g, is extreme.
Because of the obvious deflation, it is worth noting the basic deficiency in η²(g|X) as an indicator of explaining power in practical settings related to GLM. In terms of explaining power, in some cases, as much as 50%, or even more, of the information may be lost, that is, η² cannot reach more than 20–50% of the remaining variance in the other variable. With η² and η²_p, as well as with the related partial coefficients, this leads us to conclude that the attribute for the explaining power, "the proportion of remaining variance in the dependent variable", should be rephrased as "the proportion of remaining variance that the coefficient can reach in the dependent variable".
The identity of PMC and η evokes the need for a deflation correction for PMC and η. The issue is not necessarily relevant when the true association is near zero; for these settings, the bias-corrected estimators ε², ω², and η²_adj were developed to correct the positive bias in η². However, this article discussed the opposite challenge in η and η²: the radical negative bias. Combining these obviously contradictory views seems interesting and worth further study. Logically, because η is connected to PMC, it always underestimates the correlation, even near zero; this is a strict characteristic of PMC and η and has nothing to do with the proposed attenuation-corrected η. A possible direction in which to seek the answer to this practical challenge may be related to the traditional practice in estimation of not considering the negative values as real values; near zero, the factual values of eta may be negative, and squaring them makes them always positive, which, apparently, is seen as an apparent or real overestimation in explaining power. Simulations with near-zero correlations from the viewpoint of negative values may enrich our knowledge of the phenomenon.
A simple option for correcting the attenuation in coefficients for dichotomous and ordinal-scaled g is to proportion the observed estimates by PMC and η(g|X) to the maximum possible estimates given the dataset. This kind of correction, leading to attenuation-corrected ρ_gX (R_AC = ρ^Obs_gX / ρ^Max_gX) and attenuation-corrected η(g|X) (E_AC = η^Obs(g|X) / η^Max(g|X)), of which the latter leads to attenuation-corrected η²(g|X) (E2_AC = (η^Obs(g|X) / η^Max(g|X))²), seems to solve most of the challenges related to attenuation and deflation in PMC, η(g|X), and η²(g|X). R_AC and E_AC can reach the extremes of correlation ±1, and E2_AC can reach values 0–1, even when the categories in the items differ from each other in an obvious manner and irrespective of the division of cases in subpopulations of g or the difficulty level in g. In some cases, the attenuation may be corrected by 0.20–0.50 units of correlation or even more. The article did not discuss the characteristics of R_AC, E_AC, and E2_AC in depth, and simulations of their behavior are needed (see comparisons in Metsämuuronen 2022a). Notably, the attenuation correction to η and η² presented in the article also fits with nominal-scaled g.

Some practical possibilities to use RAC and EAC
The characteristics of R_AC and E_AC were not studied in depth, although some limits were discussed. More studies in this respect would be beneficial. However, if R_AC and E_AC and the related maximal Rit and eta are found to be stable and useful tools in evaluating and correcting attenuation in correlations, the maximal possible Rit and eta given the observed dataset, if not R_AC or E_AC themselves, may be worth reporting routinely as a related statistic with the observed correlation. In case the widths of the scales differ from each other in an obvious manner, this may help assess the magnitude of possible deflation in the observed estimates. R_AC could perhaps be considered when choosing the correction formulae for the r² effect sizes (see, e.g., Skidmore and Thompson 2011; Vacha-Haase and Thompson 2004).
R_AC and E_AC have two specific applications in measurement modelling settings where the correlation between an item with a narrower scale and a score with a wider scale is of interest and where Rit and η always underestimate the true correlation. First, Rit has been classically used as an estimator of item discrimination power (see Swineford 1936; Kuder 1937; Moses 2017), and η could also be used; η would react more efficiently to non-linearity in the association. Because Rit and η underestimate the true correlation between an item and a score in an obvious manner, R_AC and E_AC could be used instead to reflect the true association more closely. More studies in this regard could be beneficial.
Second, on the one hand, Rit is embedded in the traditional estimators of the reliability of the score, such as coefficient alpha (Cronbach 1951), based on Rit; Armor's theta (Armor 1973), based on principal component loadings; and McDonald's omega total (McDonald 1970) and rho or maximal reliability (e.g., Li 1997; Li et al. 1996; Raykov 2004), both based on factor loadings. Notably, the principal component or factor loadings (λ_i) are, essentially, correlations between the items and the score variable, that is, their essence is Rit (see, e.g., Yang 2010). On the other hand, empirical results show that, using the traditional estimators of reliability, the reliability may be underestimated by as much as 0.40–0.60 units of reliability (see Gadermann et al. 2012; Metsämuuronen 2022b, c; Metsämuuronen and Ukkola 2019) because of attenuation in PMC. Hence, Metsämuuronen (2022b, c) discusses attenuation-corrected estimators of reliability (ACER), obtained by replacing Rit or the principal component or factor loading in the estimators with an attenuation-corrected estimate of Rit (or an attenuation-corrected λ_i). Then, the attenuation-corrected coefficient alpha can be based on R_AC (Eq. 19) or on E_AC (Eq. 20), where k is the number of items in the test, σ²_gi refers to the item variances, and RAC_iθ and EAC_iθ are attenuation-corrected correlations between item i and the score variable θ. Similarly, attenuation-corrected theta can be based on R_AC (Eq. 21) or on E_AC (Eq. 22), where RAC²_iθ and EAC²_iθ are attenuation-corrected principal component loadings. Correspondingly, attenuation-corrected omega total and maximal reliability can be based on R_AC or on E_AC (Eqs. 23–26), where RAC²_iθ and EAC²_iθ are attenuation-corrected factor loadings. The characteristics of the estimators 19 to 26 are not studied here (see further Metsämuuronen 2022b, c). However, it is known that, except for binary items where R_AC = E_AC, the estimates based on E_AC would be somewhat higher than those based on R_AC.
Also, it is known that theta maximizes alpha (e.g., Greene and Carmines 1980) and rho maximizes omega (e.g., Cheng et al. 2012), and that rho tends to overestimate reliability with small sample sizes (Aguirre-Urreta et al. 2019). Hence, using R_AC with the alpha formula (Eq. 19) would give us a more conservative estimate of the reliability than the other estimators. Correspondingly, using E_AC with the form of maximal reliability (Eq. 26) may lead to an overestimation of reliability with small sample sizes. In-depth studies of the estimators would enrich the discussion. It is also good to note the critical remark by Chalmers (2017) that if the estimator of correlation does not refer to the observed variables, as RPC does not, the estimator of reliability does not refer to the observed score. From this viewpoint, estimators based on R_AC and E_AC refer to the observed score in the same manner as G and D do. Hence, their use could be justified in estimators of reliability (see Metsämuuronen 2022c).
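How an attenuation-corrected item–score correlation could enter coefficient alpha may be sketched as follows. The sketch assumes the familiar way of writing alpha through item–score correlations, α = k/(k − 1) × (1 − Σσ²_gi / (Σσ_gi ρ_iθ)²), which reproduces the ordinary alpha when the observed Rit is used; whether this matches the exact form of Eq. (19) is an assumption. The three-item dataset and all names are hypothetical.

```python
from math import sqrt

def pvar(v):
    """Population variance."""
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

def pearson(a, b):
    """Product-moment correlation (PMC)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# Hypothetical test: three binary items answered by six test takers.
items = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1],
]
k = len(items)
score = [sum(person) for person in zip(*items)]
sum_item_var = sum(pvar(it) for it in items)

# Ordinary coefficient alpha from item and score variances.
alpha_std = k / (k - 1) * (1 - sum_item_var / pvar(score))

def alpha_from_correlations(corrs):
    """Alpha written through item-score correlations (assumed form, see lead-in)."""
    weighted = sum(sqrt(pvar(it)) * r for it, r in zip(items, corrs))
    return k / (k - 1) * (1 - sum_item_var / weighted ** 2)

rit = [pearson(it, score) for it in items]
rac = [pearson(it, score) / pearson(sorted(it), sorted(score)) for it in items]

alpha_rit = alpha_from_correlations(rit)   # equals alpha_std
alpha_rac = alpha_from_correlations(rac)   # attenuation-corrected variant
print(round(alpha_std, 3), round(alpha_rit, 3), round(alpha_rac, 3))
```

With the observed Rit the function returns the ordinary alpha (about 0.61 here), and replacing Rit with R_AC raises the estimate, which illustrates the direction of the correction without saying anything about its statistical properties.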

Some possibilities to use the negative values of eta
One of the results was that coefficient η can indicate a negative association between two variables, although the traditional ways of calculating the estimates cannot detect this. This has its own value in extending our knowledge of the traditional coefficients of association. We may relevantly ask: Would it be valuable to start using a similar kind of "back to the basic elements" type of version of coefficient η as we have for the coefficient φ to indicate both the tendency and the magnitude of the association between variables and not just the magnitude? For a binary case, we can use Eq. (4), and parallel version(s) for the polytomous cases could be derived. While waiting for the derivation, Eq. (18) could be used as a shortcut to the direction of the association. Alternatively, a relevant question for the binary case is whether it would be easier, at first hand, to use PMC or, more precisely, the point-biserial correlation coefficient (Eq. 4) instead of η to obtain strictly both the magnitude as well as direction of the association?
The negative values in η may expand the usefulness of this coefficient as a descriptive statistic. First, generally, η gives a more credible estimate of association in comparison with PMC in the case of curvilinear association with ordinal g. Second, the enhanced η could be used to indicate strictly whether the curvilinear trend between the variables is, overall, ascending or descending. However, although the enhanced η tells the overall trend with higher accuracy than PMC under the condition of non-linearity in the association, it does not tell whether the pattern of association is convex or concave. Also, η seems not to reach the same accuracy of the fit as do the methods based on residuals related to the predicted values. Third, η has been used in measurement modelling settings rarely. Maybe the possibility of detecting curvilinearity in the item responses could be utilized more in the screening phase of items?

Limitations of the study
An obvious limitation of the study is the lack of further simulation of the proposed corrections of attenuation, R AC , E AC , and E2 AC (see, however, Metsämuuronen 2022a). Another is that an enhanced form of η is available only for a binary case, and not for the polytomous ordinal case. A possible short-cut method for the sign of η for polytomous cases was discussed: to use the sign of PMC for the same variables as a sign for η. Until better formulae are developed, this may serve as an intermediate solution to reach the real η. However, if this method is applied, one needs to be aware of three limitations in the interpretations of the results.
First, the short-cut method makes sense only when the categorical variable is ordinal (or better); PMC does not get any relevant interpretation with truly nominal-scaled variables with polytomous categories. Second, the sign of η does not tell whether the pattern between the variables is convex or concave, but it indicates the overall (linear) trend in the pattern. In these settings, the estimate of the true association between the variables is, most probably, reached closer by η than by PMC, because the latter detects the linear association, while η is not restricted to linear correlation. Third, if PMC equals zero, irrespective of the pattern between the variables, the sign of η will always be positive, because the sign of PMC cannot be used as an indicator of the sign of η.
From this point of view, testing the null hypothesis ρ_gX = 0 is a relevant side procedure in the short-cut method; only when the estimate by PMC truly differs from zero should the sign of PMC be used in the process.

Appendix
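Algebraic reasons why η underestimates association in the binary case

Assume a metric variable X with observed values x_j and a binary variable g with observed values y_i = 0, 1. In the binary case, eta can be expressed in the form

$$\eta_{g|X} = \frac{\bar{X}_{X1} - \bar{X}_{X0}}{\sigma_{X}} \times \sigma_{g} \tag{1}$$

where X̄_X0 and X̄_X1 refer to the means of the variable X in the subpopulations y = 0 and y = 1, and σ_g and σ_X are the standard deviations of g and X. By denoting the grand mean in X by GM_X, the variance of X can be manipulated as follows:

$$\sigma_{X}^{2} = \frac{1}{N}\left[n_{0}\sigma_{X0}^{2} + n_{1}\sigma_{X1}^{2} + n_{0}\left(\bar{X}_{X0} - GM_{X}\right)^{2} + n_{1}\left(\bar{X}_{X1} - GM_{X}\right)^{2}\right] = (1-p)\,\sigma_{X0}^{2} + p\,\sigma_{X1}^{2} + (1-p)\left(\bar{X}_{X0} - GM_{X}\right)^{2} + p\left(\bar{X}_{X1} - GM_{X}\right)^{2} \tag{2}$$

where N = n_0 + n_1 and p = n_1/N. The term (X̄_X1 − X̄_X0) × σ_g can be manipulated as follows:

$$\left(\bar{X}_{X1} - \bar{X}_{X0}\right) \times \sigma_{g} = \left(\bar{X}_{X1} - \bar{X}_{X0}\right)\sqrt{p(1-p)} \tag{3}$$

Then, η(g|X) can be expressed in the form

$$\eta_{g|X} = \frac{\left(\bar{X}_{X1} - \bar{X}_{X0}\right)\sqrt{p(1-p)}}{\sqrt{(1-p)\,\sigma_{X0}^{2} + p\,\sigma_{X1}^{2} + (1-p)\left(\bar{X}_{X0} - GM_{X}\right)^{2} + p\left(\bar{X}_{X1} - GM_{X}\right)^{2}}} \tag{4}$$

Because there are only two means, they are related to each other through GM_X: GM_X = (1 − p)X̄_X0 + pX̄_X1, so that X̄_X0 − GM_X = −p(X̄_X1 − X̄_X0) and X̄_X1 − GM_X = (1 − p)(X̄_X1 − X̄_X0). The two last terms in the denominator then sum to p(1 − p)(X̄_X1 − X̄_X0)², and

$$\eta_{g|X} = \frac{\left(\bar{X}_{X1} - \bar{X}_{X0}\right)\sqrt{p(1-p)}}{\sqrt{(1-p)\,\sigma_{X0}^{2} + p\,\sigma_{X1}^{2} + p(1-p)\left(\bar{X}_{X1} - \bar{X}_{X0}\right)^{2}}}$$

Again, the maximum value is strictly dependent on the variance of X in the subpopulations of g (0 and 1), and η^Max(g|X) = 1 can be reached only when σ²_Xi = σ²_Xj = 0, that is, when all cases in the sub-group i in g share the identical category in X. This condition necessitates a common number of categories in both variables, although the categories do not need to be in a linear order (cf. PMC).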
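The decomposition of the variance of X and the resulting binary form of η can be checked numerically. The following is a small sketch with hypothetical data; the variable names are illustrative, and population (not sample) variances are assumed throughout.

```python
from math import sqrt

def pvar(v):
    """Population variance."""
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

def pearson(a, b):
    """Product-moment correlation (PMC)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

X = [2, 3, 3, 5, 6, 7, 8, 9]   # hypothetical score
g = [0, 0, 1, 0, 0, 1, 1, 1]   # hypothetical binary item

x0 = [x for x, y in zip(X, g) if y == 0]
x1 = [x for x, y in zip(X, g) if y == 1]
p = len(x1) / len(X)
diff = sum(x1) / len(x1) - sum(x0) / len(x0)

# Within-group part plus between-group part of the variance of X.
within = (1 - p) * pvar(x0) + p * pvar(x1)
between = p * (1 - p) * diff ** 2
var_decomposed = within + between

# Binary eta written with the decomposed variance in the denominator.
eta_binary = diff * sqrt(p * (1 - p)) / sqrt(var_decomposed)

print(round(pvar(X), 4), round(var_decomposed, 4))
print(round(pearson(g, X), 4), round(eta_binary, 4))
```

The two variances agree, and the binary form equals PMC; because the within-group part is positive for these data, the estimate necessarily stays below 1.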