
Using a discretized measure of academic performance to approximate primary and secondary effects in inequality of educational opportunity


This study proposes an easy-to-implement approximation of primary and secondary effects in the study of inequality of educational opportunity, obtained by discretizing the measure of academic performance. Unlike the widely used Erikson–Jonsson model, our method does not depend on the assumption that academic performance is normally distributed, nor on a restricted model form for predicting educational choice. In addition, the discretization method can reveal how the effect of academic performance on the likelihood of educational transition varies across the spectrum of performance. Using Monte Carlo simulation and survey data collected in China, we show that our method recovers the results of the Erikson–Jonsson approach. With another simulation, we illustrate that the Erikson–Jonsson model might produce misleading results if academic performance is not normally distributed. The discretization approach, in contrast, does not suffer from this problem.
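As a rough illustration of the idea, the sketch below assumes the counterfactual logic commonly used with the Erikson–Jonsson model: combine the performance distribution of one class with the transition propensities of another, then compare the resulting log odds. With a discretized measure, the integral becomes a sum over performance groups. The toy records, the cut-points, and every function name are hypothetical and are not taken from the article, and the split into primary and secondary effects shown here is only one of the possible orderings.

```python
import math
from bisect import bisect_right
from collections import Counter

# Hypothetical toy records: (class, performance score, made the transition?).
records = (
    [("U", 0.3, 1), ("U", 0.3, 0)]
    + [("U", 0.5, 1)] * 3 + [("U", 0.5, 0)]
    + [("U", 0.9, 1)] * 4
    + [("L", 0.3, 1)] + [("L", 0.3, 0)] * 3
    + [("L", 0.5, 1)] * 2 + [("L", 0.5, 0)] * 2
    + [("L", 0.9, 1)] * 2
)
cuts = [0.4, 0.7]  # hypothetical cut-points giving three performance groups


def discretize(score, cuts):
    """Index of the performance group that `score` falls into."""
    return bisect_right(cuts, score)


def group_shares(records, cls, cuts):
    """Share of class `cls` in each performance group."""
    counts = Counter(discretize(s, cuts) for c, s, _ in records if c == cls)
    total = sum(counts.values())
    return [counts[k] / total for k in range(len(cuts) + 1)]


def transition_rates(records, cls, cuts):
    """Observed transition rate of class `cls` within each performance group."""
    seen, made = Counter(), Counter()
    for c, s, t in records:
        if c == cls:
            k = discretize(s, cuts)
            seen[k] += 1
            made[k] += t
    if any(seen[k] == 0 for k in range(len(cuts) + 1)):
        raise ValueError("a performance group is empty for this class")
    return [made[k] / seen[k] for k in range(len(cuts) + 1)]


def synthesized_rate(records, perf_cls, choice_cls, cuts):
    """Transition rate with perf_cls's performance and choice_cls's choices."""
    shares = group_shares(records, perf_cls, cuts)
    rates = transition_rates(records, choice_cls, cuts)
    return sum(p * q for p, q in zip(shares, rates))


def logit(p):
    return math.log(p / (1 - p))


p_uu = synthesized_rate(records, "U", "U", cuts)  # observed upper-class rate
p_ll = synthesized_rate(records, "L", "L", cuts)  # observed lower-class rate
p_ul = synthesized_rate(records, "U", "L", cuts)  # upper performance, lower choices

total = logit(p_uu) - logit(p_ll)      # overall class gap in log odds
secondary = logit(p_uu) - logit(p_ul)  # gap due to choice, performance held fixed
primary = logit(p_ul) - logit(p_ll)    # gap due to performance, choice held fixed

print(f"group rates U: {transition_rates(records, 'U', cuts)}")
print(f"group rates L: {transition_rates(records, 'L', cuts)}")
print(f"total={total:.3f} primary={primary:.3f} secondary={secondary:.3f}")
```

The group-specific transition rates printed first are also what would be inspected to see whether the effect of performance differs across the performance spectrum.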



  1. The decomposition method was first published by Erikson et al. (2005). It is an extension of an approach discussed by Erikson and Jonsson (1996). In this study, we follow Jackson (2013a) and refer to this decomposition approach as “the Erikson–Jonsson model.”

  2. We use the term “approximation” because the method proposed in this article approximates a parametric model based on a continuous measure with a nonparametric model based on a discretized measure. However, in situations where a parametric model is not applicable, the discretizing approximation may serve as a substitute for the Erikson–Jonsson model (EJM).

  3. It is worth mentioning that the computational process becomes more complicated when performance cannot be assumed to be normally distributed. Specifically, the EJM requires computing the integral in Eq. (1). This is simplified in the case of a normal performance distribution, because a logistic function can be approximated by a normal cumulative distribution function. That is, \(\frac{\exp(a + bx)}{1 + \exp(a + bx)} \approx \Phi(ka + kbx)\), where k is a tuning constant (Kartsonaki et al. 2013, pp. 36–37). Then, Eq. (1) is approximated by \(\int \sigma^{-1} \phi\left(\frac{x - \mu}{\sigma}\right) \Phi(ka + kbx)\,\mathrm{d}x\), an expression with a closed form. This computational convenience, however, no longer exists if the normality assumption does not hold.

  4. Here, effect heterogeneity refers to the case where different values of x correspond to different values of the coefficient b. Although the probability of a positive response of the dependent variable, e.g., the probability of attending college, has a logistic-shaped relationship with x, the coefficient b is nevertheless fixed in the logistic regression model.

  5. This concern could be tackled by a random-effects logistic model, which, however, would further complicate the computation of the integral in Eq. (1).

  6. Another condition for categorizing continuous variables is that the cut-points should be chosen on theoretical or clinical grounds rather than being purely data-driven. This point has been highlighted in Altman et al. (1994) and Lausen and Schumacher (1996).

  7. We also tested other numbers of groups in supplementary analyses; discretization with more than four groups does not materially improve our estimation.

  8. For instance, scholars may discretize academic performance according to theoretical or practical cut-points. As long as the shape of the discretized distribution does not substantially deviate from that of the continuous distribution, different discretizing strategies should produce similar substantive results, since they approximate the same continuous distribution.

  9. Denoting the minimum and range for the upper class by \(m_u\) and \(r_u\) and those for the lower class by \(m_l\) and \(r_l\), the rescaled distribution of performance for the upper class follows \(\mathrm{N}\left(\frac{0.6 - m_u}{r_u}, \frac{0.1}{r_u}\right)\). The corresponding rescaled distribution for the lower class is \(\mathrm{N}\left(\frac{0.1 - m_l}{r_l}, \frac{0.5}{r_l}\right)\).

  10. We also tested other numbers of groups (2, 3, 5, and 6). The results with three or more groups are stable and close to those obtained using the Erikson–Jonsson approach.

  11. CHIP 2007 consists of three samples, covering permanent urban residents, permanent rural residents, and migrants. The first two samples are used to examine the urban–rural disparity in education.

  12. College recruitment in China is organized at the provincial level and has varied greatly over time. In addition, the examination papers differ between the liberal arts track and the science track. For this reason, in a supplementary analysis we controlled for these factors by regressing the college entrance examination score on the year in which the respondent took the exam, the province where the respondent took the exam, and the track. We then estimated primary and secondary effects using the regression residuals. The results confirm the consistency between the Erikson–Jonsson model and our discretizing approach. To save space, we present here the results based on the original test score.

  13. In a supplementary analysis, we tested this possibility using simulation. The results suggest that when higher education is generally desirable and there is a minimum entrance score line, a dichotomized measure of academic performance perfectly predicts college attendance. In this situation, the logistic model cannot be estimated.

  14. Note, however, that in this case there is no need to decompose the primary and secondary effects, because the primary effect would dominate the probability of educational transition.
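Note 3 states that, under normality, replacing the logistic function with a normal cumulative distribution function gives the integral a closed form. The sketch below checks this numerically. It assumes that Eq. (1) integrates the logistic transition probability over the normal performance density, uses the standard identity \(\mathrm{E}[\Phi(\alpha + \beta X)] = \Phi\left(\frac{\alpha + \beta\mu}{\sqrt{1 + \beta^2\sigma^2}}\right)\) for normal X, and takes k = 1/1.702 as the tuning constant. The value of k and the parameter values a, b, mu, and sigma are assumptions chosen for illustration.

```python
import math

K = 1 / 1.702  # assumed tuning constant; the note does not give its value


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def logistic(t):
    return 1 / (1 + math.exp(-t))


def trapezoid(f, lo, hi, n=20000):
    """Plain trapezoidal rule on n equal sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def rate_numeric(a, b, mu, sigma):
    """Integral of the normal density times the logistic transition probability."""
    def integrand(x):
        return norm_pdf((x - mu) / sigma) / sigma * logistic(a + b * x)
    return trapezoid(integrand, mu - 10 * sigma, mu + 10 * sigma)


def rate_closed_form(a, b, mu, sigma, k=K):
    """Closed form obtained after replacing the logistic by Phi(k * (a + b x))."""
    return norm_cdf(k * (a + b * mu) / math.sqrt(1 + (k * b * sigma) ** 2))


a, b, mu, sigma = -1.0, 2.0, 0.6, 0.1  # illustrative values only
numeric = rate_numeric(a, b, mu, sigma)
closed = rate_closed_form(a, b, mu, sigma)
print(f"numeric={numeric:.4f} closed_form={closed:.4f}")
```

When the performance density is not normal, only the numerical route remains available, which is the complication the note refers to.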
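Note 12 describes removing year, province, and track differences from the examination score before the decomposition. A minimal sketch of such a residualization follows. The rows are invented, and the repeated removal of group means is a swapped-in technique (backfitting) that converges to the residuals of a least-squares regression on additive dummies for the three factors. The article does not say how its regression was computed.

```python
from collections import defaultdict

# Hypothetical rows: (exam year, province, track, entrance examination score).
rows = [
    (2003, "A", "science", 560), (2003, "A", "arts", 530),
    (2003, "B", "science", 610), (2004, "A", "science", 580),
    (2004, "B", "arts", 575), (2004, "B", "science", 620),
    (2003, "B", "arts", 590), (2004, "A", "arts", 540),
    (2004, "B", "science", 600),
]


def demean_by(values, labels):
    """Subtract from each value the mean of its label group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, lab in zip(values, labels):
        sums[lab] += v
        counts[lab] += 1
    return [v - sums[lab] / counts[lab] for v, lab in zip(values, labels)]


def residualize(scores, factors, n_sweeps=200):
    """Sweep out each factor's group means in turn until the result settles."""
    resid = list(scores)
    for _ in range(n_sweeps):
        for labels in factors:
            resid = demean_by(resid, labels)
    return resid


def largest_group_mean(values, factors):
    """Largest absolute group mean across all factors (zero for exact residuals)."""
    worst = 0.0
    for labels in factors:
        for lab in set(labels):
            grp = [v for v, l in zip(values, labels) if l == lab]
            worst = max(worst, abs(sum(grp) / len(grp)))
    return worst


factors = [
    [r[0] for r in rows],  # exam year
    [r[1] for r in rows],  # province
    [r[2] for r in rows],  # track
]
scores = [float(r[3]) for r in rows]
residuals = residualize(scores, factors)
max_abs_group_mean = largest_group_mean(residuals, factors)
print([round(e, 2) for e in residuals])
```

The residuals would then take the place of the raw score when the performance groups are formed.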
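The situation in Note 13 can be read as perfect separation: if everyone above the score line attends and no one below it does, the logistic likelihood keeps improving as the coefficients grow, so no finite maximum likelihood estimate exists. The sketch below illustrates this with made-up data of 50 respondents on each side of a hypothetical score line. It is an interpretation of the note, not a reproduction of the article's simulation.

```python
import math


def log_likelihood(intercept, slope, data):
    """Bernoulli log-likelihood of a logistic model with one binary predictor."""
    total = 0.0
    for above_line, attended in data:
        eta = intercept + slope * above_line
        # log(1 + exp(eta)), computed in a numerically stable way
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += attended * eta - log1pexp
    return total


# Hypothetical data: (score above the line?, attended college?).
data = [(1, 1)] * 50 + [(0, 0)] * 50

scales = (1, 2, 5, 10, 20)
# Intercept -s and slope 2s put the linear predictor at -s below and +s above the line.
lls = [log_likelihood(-s, 2 * s, data) for s in scales]

for s, ll in zip(scales, lls):
    print(f"scale={s:>2} log-likelihood={ll:.6f}")
```

The log-likelihood rises toward zero at every step without reaching it, so an optimizer would push the coefficients toward infinity instead of converging.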


  1. Altman, D.: Categorizing continuous variables. In: Armitage, P., Colton, T. (eds.) Encyclopedia of Biostatistics, pp. 1–4. Wiley, New York (2005)

  2. Altman, D., Lausen, B., Sauerbrei, W., Schumacher, M.: Dangers of using “optimal” cutpoints in the evaluation of prognostic factors. J. Natl. Cancer Inst. 86, 829–835 (1994)

  3. Boudon, R.: Education, Opportunity, and Social Inequality. Wiley, New York (1974)

  4. Connor, R.: Grouping for testing trends in categorical data. J. Am. Stat. Assoc. 67, 601–605 (1972)

  5. Cox, D.R.: Note on grouping. J. Am. Stat. Assoc. 52, 543–547 (1957)

  6. Erikson, R., Jonsson, J.O.: The Swedish context: educational reform and long-term change in educational inequality. In: Erikson, R., Jonsson, J.O. (eds.) Can Education Be Equalized? The Swedish Case in Comparative Perspective, pp. 65–94. Westview, Boulder (1996)

  7. Erikson, R., et al.: On class differentials in educational attainment. Proc. Natl. Acad. Sci. 102, 9730–9733 (2005)

  8. Gelman, A., Park, D.: Splitting a predictor at the upper quarter or third and the lower quarter or third. Am. Stat. 63, 1–8 (2009)

  9. Jackson, M., et al.: Primary and secondary effects in class differentials in educational attainment: the transition to A-level courses in England and Wales. Acta Sociol. 50, 211–229 (2007)

  10. Jackson, M.: Introduction: how is inequality of educational opportunity generated? The case for primary and secondary effects. In: Jackson, M. (ed.) Determined to Succeed? Performance Versus Choice in Educational Attainment, pp. 1–33. Stanford University Press, Stanford (2013a)

  11. Jackson, M.: Determined to Succeed? Performance Versus Choice in Educational Attainment. Stanford University Press, Stanford (2013b)

  12. Kartsonaki, C., Jackson, M., Cox, D.R.: Primary and secondary effects: some methodological issues. In: Jackson, M. (ed.) Determined to Succeed? Performance Versus Choice in Educational Attainment, pp. 34–55. Stanford University Press, Stanford (2013)

  13. Kong, S.: Rural-urban migration in China: survey design and implementation. In: Meng, X., Manning, C., Li, S., Effendi, T. (eds.) The Great Migration: Rural-Urban Migration in China and Indonesia, pp. 288–304. Edward Elgar, Cheltenham (2010)

  14. Lausen, B., Schumacher, W.: Evaluating the effect of optimized cutoff values in the assessment of prognostic factors. Comput. Stat. Data Anal. 21, 307–326 (1996)

  15. Maxwell, S., Delaney, H.: Bivariate median splits and spurious statistical significance. Psychol. Bull. 113, 181–190 (1993)

  16. Morgan, T.M., Elashoff, R.M.: Effect of categorizing a continuous covariate on the comparison of survival time. J. Am. Stat. Assoc. 81, 917–921 (1986)

  17. Naggara, O.N., et al.: Analysis by categorizing or dichotomizing continuous variables is inadvisable: an example from the natural history of unruptured aneurysms. Am. J. Neuroradiol. 32, 437–440 (2011)

  18. O’Hagan, A., Leonard, T.: Bayes estimation subject to uncertainty about parameter constraints. Biometrika 63, 201–202 (1976)

  19. Tam, T., Jiang, J.: The divergent urban-rural trends in college attendance: state policy bias and structural exclusion in China. Sociol. Educ. 88, 160–180 (2015)

  20. Taylor, J., Yu, M.: Bias and efficiency loss due to categorizing an explanatory variable. J. Multivar. Anal. 83, 248–263 (2002)

  21. Turner, E.L., Dobson, J., Pocock, S.: Categorization of continuous risk factors in epidemiological publications: a survey of current practice. Epidemiol. Perspect. Innov. 7, 9 (2010)

  22. Wu, X., Treiman, D.: Inequality and equality under Chinese socialism: the Hukou system and intergenerational occupational mobility. Am. J. Sociol. 113, 415–445 (2007)

  23. Zhao, L., Kolonel, L.: Efficiency loss from categorizing quantitative exposures into qualitative exposures in case-control studies. Am. J. Epidemiol. 136, 464–474 (1992)


Author information



Corresponding author

Correspondence to Anning Hu.


Cite this article

Hu, A. Using a discretized measure of academic performance to approximate primary and secondary effects in inequality of educational opportunity. Qual Quant 51, 1627–1643 (2017).



  • Primary effect
  • Secondary effect
  • Nonparametric
  • The Erikson–Jonsson model
  • Discretization