## Abstract

This study proposes an easy-to-implement approximation of primary and secondary effects in the study of inequality of educational opportunity, based on discretizing the measure of academic performance. Unlike the widely used Erikson–Jonsson model, our method does not depend on the assumption that academic performance is normally distributed or on a restricted functional form for predicting educational choice. In addition, the proposed discretization method can reveal how the effect of academic performance on the likelihood of educational transition varies across the spectrum of performance. Using Monte Carlo simulation and survey data collected in China, we show that our method recovers the results obtained with the Erikson–Jonsson approach. With another simulation, we illustrate that the Erikson–Jonsson model might produce misleading results if academic performance is not normally distributed. The discretization approach, in contrast, does not suffer from this problem.
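
A minimal sketch of how such a discretized decomposition could be computed is given below. It is one plausible reading of the approach, not the article's own code: performance is cut into groups, and a counterfactual transition probability is formed as the sum over groups of one class's group shares multiplied by another class's group-specific transition rates. The function names, the pooled-quartile cut points, and the simulated data with their parameters are all assumptions made for illustration.

```python
import math
import random
from bisect import bisect_right

def quantile_cuts(values, n_groups=4):
    """Interior cut points splitting the pooled values into n_groups bins of roughly equal size."""
    ordered = sorted(values)
    return [ordered[len(ordered) * g // n_groups] for g in range(1, n_groups)]

def discretized_probability(records, perf_class, choice_class, cuts):
    """Sum over groups g of P(g | perf_class) * P(transition | g, choice_class)."""
    n_groups = len(cuts) + 1
    share = [0] * n_groups   # cases per performance group within perf_class
    total = [0] * n_groups   # cases per performance group within choice_class
    made = [0] * n_groups    # transitions per performance group within choice_class
    for cls, perf, transition in records:
        g = bisect_right(cuts, perf)
        if cls == perf_class:
            share[g] += 1
        if cls == choice_class:
            total[g] += 1
            made[g] += transition
    n = sum(share)
    # Assumption: groups with no choice_class cases are skipped; a real
    # analysis would merge such sparse groups instead.
    return sum(share[g] / n * made[g] / total[g] for g in range(n_groups) if total[g])

# Hypothetical simulated data: (class, performance, transition indicator).
rng = random.Random(0)
records = []
for cls, mean, intercept in (("upper", 0.5, 0.5), ("lower", -0.5, -0.5)):
    for _ in range(20000):
        perf = rng.gauss(mean, 1.0)
        p = 1.0 / (1.0 + math.exp(-(intercept + perf)))
        records.append((cls, perf, int(rng.random() < p)))

cuts = quantile_cuts([perf for _, perf, _ in records], n_groups=4)
p_uu = discretized_probability(records, "upper", "upper", cuts)  # factual, upper class
p_ll = discretized_probability(records, "lower", "lower", cuts)  # factual, lower class
p_ul = discretized_probability(records, "upper", "lower", cuts)  # upper performance, lower choice
print(f"factual upper {p_uu:.3f}, counterfactual {p_ul:.3f}, factual lower {p_ll:.3f}")
```

In this sketch, the gap between `p_ul` and `p_ll` reflects differences in performance (holding choice behavior at the lower class's), while the gap between `p_uu` and `p_ul` reflects differences in choice at given performance. No distributional assumption about performance enters the calculation.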

## Notes

- 1.
- 2.
We use the term “approximation” because the method proposed in this article is to approximate a parametric model that is based on a continuous measure with a nonparametric model using a discretized measure. However, in certain situations where a parametric model is not applicable, the discretizing approximation may serve as one substitute for the EJM.

- 3.
It is worth mentioning that the computational process becomes more complicated when performance is not normally distributed. Specifically, the EJM requires computing the integral in Eq. (1). This is simplified in the case of a normal performance distribution, because a logistic function can be approximated by a normal cumulative distribution function. That is, \(\frac{\exp(a + bx)}{1 + \exp(a + bx)} \approx \Phi(ka + kbx),\) where k is a tuning constant (Kartsonaki et al. 2013, pp. 36–37). Then, Eq. (1) is approximated by \(\int \sigma^{-1} \phi\left(\frac{x - \mu}{\sigma}\right) \Phi(ka + kbx)\,\mathrm{d}x,\) an expression with a closed form. This computational convenience, however, no longer exists if the normality assumption does not hold.

- 4.
Here, effect heterogeneity refers to the case where different values of x correspond to different values of the coefficient b. Although the probability of a positive response of the dependent variable, e.g., the probability of attending college, has a logistic-shape relationship with x, the coefficient b is nevertheless *fixed* in the logistic regression model.

- 5.
This concern could be tackled by a random effect logistic model, which, however, would further complicate the computation of the integral in Eq. (1).

- 6.
- 7.
Other group numbers are also tested in supplementary analyses, and discretization with more than four groups does not materially improve our estimation.

- 8.
For instance, scholars may discretize academic performance according to theoretically or practically motivated cut points. As long as the shape of the discretized distribution does not substantially deviate from that of the continuous distribution, different discretizing strategies should produce similar substantive results, since they approximate the same continuous distribution.

- 9.
Denoting the minimum and range for the upper class by \(m_u\) and \(r_u\) and those for the lower class by \(m_l\) and \(r_l\), the rescaled distribution of performance for the upper class follows \(\mathrm{N}\left(\frac{0.6 - m_u}{r_u}, \frac{0.1}{r_u}\right).\) The corresponding rescaled distribution for the lower class is \(\mathrm{N}\left(\frac{0.1 - m_u}{r_u}, \frac{0.5}{r_u}\right)\).

- 10.
We also tested some other grouping numbers (2, 3, 5, and 6) and the results suggest that three groups and above generate stable results that are close to the one obtained using the Erikson–Jonsson approach.

- 11.
The CHIP 2007 consists of three samples, respectively corresponding to permanent urban residents, permanent rural residents, and migrants. The first two samples are used to examine the urban–rural disparity of education.

- 12.
The process of college recruitment is organized provincially in China and has varied greatly across periods. Besides, the examination papers differ between the liberal arts track and the science track. In a supplementary analysis, we therefore controlled for these factors by regressing the college entrance examination score on the year in which the respondent took the exam, the province where the respondent took the exam, and the specific track. Then, we estimated primary and secondary effects on the regression residuals. The consistency between the Erikson–Jonsson model and our discretizing approach is confirmed. To save space, we present here the result based on the original test score.

- 13.
In a supplementary analysis, we tested this possibility using simulation. Analytical results suggest that when higher education is generally desirable and there exists a minimum entrance score line, a dichotomized measure of academic performance can always predict the odds of attending college. In this situation, the logistic model cannot be estimated.

- 14.
However, note that in this case, it is not necessary to decompose the primary and secondary effects because the primary effect would dominate the probability of educational transition.
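
As a numerical illustration of Note 3, the sketch below compares a direct numerical integration of the logistic curve against a normal performance density with the closed form obtained after the probit-type approximation, using the standard identity \(\int \sigma^{-1} \phi\left(\frac{x - \mu}{\sigma}\right) \Phi(ka + kbx)\,\mathrm{d}x = \Phi\left(\frac{ka + kb\mu}{\sqrt{1 + k^{2} b^{2} \sigma^{2}}}\right)\). The note does not state a value for k, so the commonly used \(k = 1/1.702\) is assumed here. The parameter values and function names are likewise hypothetical.

```python
import math

K = 1 / 1.702  # assumed tuning constant; the note does not state a value

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def integral_numeric(a, b, mu, sigma, steps=20000):
    """Midpoint-rule integral of the normal density times the logistic curve."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += normal_pdf((x - mu) / sigma) / sigma * logistic(a + b * x)
    return total * h

def integral_closed_form(a, b, mu, sigma, k=K):
    """Closed form after replacing the logistic curve with Phi(k*a + k*b*x)."""
    return normal_cdf(k * (a + b * mu) / math.sqrt(1.0 + (k * b * sigma) ** 2))

# Hypothetical parameter values, chosen only for illustration.
a, b, mu, sigma = -0.5, 1.2, 0.3, 1.0
numeric = integral_numeric(a, b, mu, sigma)
closed = integral_closed_form(a, b, mu, sigma)
print(f"numeric {numeric:.4f}, closed form {closed:.4f}")
```

With this choice of k the two curves differ pointwise by less than about 0.01, so the two integrals should agree to roughly two decimal places. When performance is not normal, the closed form is unavailable and only the numerical route remains.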

## References

Altman, D.: Categorizing continuous variables. In: Armitage, P., Colton, T. (eds.) Encyclopedia of Biostatistics, pp. 1–4. Wiley, New York (2005)

Altman, D., Lausen, B., Sauerbrei, W., Schumacher, M.: Dangers of using “optimal” cutpoints in the evaluation of prognostic factors. J. Natl. Cancer Inst. **86**, 829–835 (1994)

Boudon, R.: Education, Opportunity, and Social Inequality. Wiley, New York (1974)

Connor, R.: Grouping for testing trends in categorical data. J. Am. Stat. Assoc. **67**, 601–605 (1972)

Cox, D.R.: Note on grouping. J. Am. Stat. Assoc. **52**, 543–547 (1957)

Erikson, R., Jonsson, J.O.: The Swedish context: educational reform and long-term change in educational inequality. In: Erikson, R., Jonsson, J.O. (eds.) Can Education Be Equalized? The Swedish Case in Comparative Perspective, pp. 65–94. Westview, Boulder (1996)

Erikson, R., et al.: On class differentials in educational attainment. Proc. Natl. Acad. Sci. **102**, 9730–9733 (2005)

Gelman, A., Park, D.: Splitting a predictor at the upper quarter or third and the lower quarter or third. Am. Stat. **63**, 1–8 (2009)

Jackson, M., et al.: Primary and secondary effects in class differentials in educational attainment: the transition to A-level courses in England and Wales. Acta Sociol. **50**, 211–229 (2007)

Jackson, M.: Introduction: how is inequality of educational opportunity generated? the case for primary and secondary effects. In: Jackson, M. (ed.) Determined to Succeed? Performance Versus Choice in Educational Attainment, pp. 1–33. Stanford University Press, Stanford (2013a)

Jackson, M.: Determined to Succeed? Performance Versus Choice in Educational Attainment. Stanford University Press, Stanford (2013b)

Kartsonaki, C., Jackson, M., Cox, D.R.: Primary and secondary effects: some methodological issues. In: Jackson, M. (ed.) Determined to Succeed? Performance Versus Choice in Educational Attainment, pp. 34–55. Stanford University Press, Stanford (2013)

Kong, S.: Rural-urban migration in China: survey design and implementation. In: Meng, X., Manning, C., Li, S., Effendi, T. (eds.) The Great Migration: Rural-Urban Migration in China and Indonesia, pp. 288–304. Edward Elgar, Cheltenham (2010)

Lausen, B., Schumacher, W.: Evaluating the effect of optimized cutoff values in the assessment of prognostic factors. Comput. Stat. Data Anal. **21**, 307–326 (1996)

Maxwell, S., Delaney, H.: Bivariate median splits and spurious statistical significance. Psychol. Bull. **113**, 181–190 (1993)

Morgan, T.M., Elashoff, R.M.: Effect of categorizing a continuous covariate on the comparison of survival time. J. Am. Stat. Assoc. **81**, 917–921 (1986)

Naggara, O.N., et al.: Analysis by categorizing or dichotomizing continuous variables is inadvisable: an example from the natural history of unruptured aneurysms. Am. J. Neuroradiol. **32**, 437–440 (2011)

O’Hagan, A., Leonard, T.: Bayes estimation subject to uncertainty about parameter constraints. Biometrika **63**, 201–202 (1976)

Tam, T., Jiang, J.: The divergent urban-rural trends in college attendance: state policy bias and structural exclusion in China. Sociol. Educ. **88**, 160–180 (2015)

Taylor, J., Yu, M.: Bias and efficiency loss due to categorizing an explanatory variable. J. Multivar. Anal. **83**, 248–263 (2002)

Turner, E.L., Dobson, J., Pocock, S.: Categorization of continuous risk factors in epidemiological publications: a survey of current practice. Epidemiol. Perspect. Innov. **7**, 9 (2010)

Wu, X., Treiman, D.: Inequality and equality under Chinese socialism: the Hukou system and intergenerational occupational mobility. Am. J. Sociol. **113**, 415–445 (2007)

Zhao, L., Kolonel, L.: Efficiency loss from categorizing quantitative exposures into qualitative exposures in case-control studies. Am. J. Epidemiol. **136**, 464–474 (1992)

## About this article

### Cite this article

Hu, A. Using a discretized measure of academic performance to approximate primary and secondary effects in inequality of educational opportunity. *Qual Quant* **51**, 1627–1643 (2017). https://doi.org/10.1007/s11135-016-0356-8

### Keywords

- Primary effect
- Secondary effect
- Nonparametric
- The Erikson–Jonsson model
- Discretization