The main objective of this study was to work toward the development of a set of measures of student learning outcomes (SLOs) in higher education. Specifically, we used data from the Exame Nacional de Desempenho dos Estudantes (ENADE), a college-exit examination developed and used in Brazil. The fact that Brazil administered the ENADE to both freshmen and seniors provided a unique opportunity to obtain a first approximation of the general and subject area knowledge gained in different programs. The results suggested that, on average, students in the three categories of programs were gaining valuable general and subject area knowledge. The gains in the subject area were larger than those in the general knowledge component of the test. This study contributes to the field by providing empirical and visually compelling evidence of SLO gains in higher education.
We use Barnett’s (1992) definition of quality of higher education (see below for a more detailed description).
For a comprehensive review of countries around the world that have engaged in developing a system to evaluate SLOs in higher education, see Nusche (2008).
The main goal of AHELO is to create a multi-dimensional, interdisciplinary, cross-cultural, and comprehensive system to evaluate whether students are indeed learning valuable knowledge and skills. For a more detailed description of the project and preliminary results of the pilot program, see: http://www.oecd.org/education/skills-beyond-school/testingstudentanduniversityperformancegloballyoecdsahelo.htm.
It is important to clarify that since 1996 the legislation established a learning outcomes test called “Exame Nacional de Cursos” that preceded ENADE.
The ENADE measures general knowledge which is common to all the programs participating in the study, but its content is unrelated to the student’s program of study. The examination content is related to cultural and social aspects of contemporary society. The subject area includes assessments of basic areas in the undergraduate programs (http://portal.mec.gov.br/index.php?Itemid=313; Pedrosa et al. 2013). We describe the ENADE in greater detail below.
For a description of the Collegiate Learning Assessment (CLA), see: Klein et al. (2007).
An additional problem is that, to the best of our knowledge, there has been no independent evaluation of the psychometric properties of the CLA. Possin (2013), a philosophy professor, conducted a descriptive evaluation of the instrument and concluded that the way the graders score the examination renders the instrument invalid.
The Colombian Institute for the Promotion of Higher Education (ICFES, its Spanish acronym) commissioned the Australian Council for Educational Research (ACER) to adapt its Graduate Skills Assessment (GSA) to the Colombian university context.
It is important to mention that there has not been an independent psychometric study of the properties of the ENADE in all the different fields. As a result, we argue that this examination has unknown psychometric properties. To the best of our knowledge, there are only a couple of studies that have explored this issue in a rigorous way, but only in the case of the Psychology examination (Primi et al. 2010, 2011).
ENADE collects information, for each part of the examination, on the type of participation of the student. It reports: absentees, students who left the majority of the questions blank, students who protested, students who participated effectively, whether the test was not considered, whether the test was not valid, and whether there was an administrative error.
Freshmen are students who enrolled in college in the year of the ENADE and who have successfully completed at least 25 percent of the course requirements for the specific academic year.
Seniors are students taking the last required courses to attain their desired degree. The examination takes place in November, and the expected graduation date is December of the year that the examination is given.
All the data were downloaded from http://portal.inep.gov.br/basica-levantamentos-acessar.
Our rationale for removing programs within these three categories was that either they were vocational and technical programs, or that they did not have enough scores.
There is evidence that in certain years groups of students decided to boycott the examination by not participating in it or not completing it (Pedrosa et al. 2013).
This variable was created using the income variable that was part of the student questionnaire (i.e., question 5 in 2009 and 2010, and question 7 in 2008). This variable defines income in terms of the minimum wages earned by all adults living in a household. Individuals from families with a total income of 0 to 3 minimum wages per month (inclusive) were defined as low income, and those with a family income above 10 minimum wages as high income.
The data used to estimate this and the other figures in the paper are summarized in Table 3 in the Appendix.
The much higher variability in the subject area part of the examination in 2010 might be related to the substantially higher variation in the estimates for the students enrolled in the different Biological Science programs. This in turn might be related to the way the different tests were created and therefore might not be related to the specific fields or programs. This is a key issue that needs to be taken into account when using these types of methods to compare gains in SLOs in different programs. We thank our reviewer for suggesting this alternative explanation.
For the computation of the combined effect size, we used the raw scores of the student in the multiple-measures part of the general examination (NT_OBJ_FG) and the raw score on the essay component of the examination (NT_OBJ_CE).
Arum, R., & Roksa, J. (2011). Academically adrift: Limited learning on college campuses. Chicago, IL: University of Chicago Press.
Barnett, R. (1992). Improving higher education. Total quality care. The Society for Research in Higher Education and Open University Press, Buckingham.
Barrera-Osorio, F., & Bayona-Rodríguez, H. (2014). The causal effect of university quality on labor market outcomes: Empirical evidence from Colombia. Presented at the V Seminario Internacional ICFES sobre Investigación en la Calidad de la Educación, Bogotá, Colombia.
Borenstein, M., Hedges, L., & Rothstein, H. (2007). Meta-analysis: Fixed effect vs. random effects. http://www.meta-analysis.com/downloads/Meta-analysis%20fixed%20effect%20vs%20random%20effects.pdf. Accessed 25 January 2014.
Clark, B. R. (1983). The higher education system: Academic organization in cross-national perspective. Berkeley, CA: University of California Press.
Coates, H. (2009). What’s the difference? A model for measuring the value added by higher education in Australia. Higher Education Management and Policy, 21(1), 1–13.
Coates, H. (2014). Higher education learning outcomes assessment: International perspectives. Frankfurt: Peter Lang.
Domingue, B. W., Morales, J. A., Shavelson, R., Wiley, E., Molina, A., & Mariño, J. P. (2014). Challenges to the study of school effects in higher education. Institute of Behavioral Science at the University of Colorado Boulder, Instituto Colombiano para la Evaluación de la Educación in Bogotá, Colombia, SK Partners, LLC, and Stanford University.
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, 42(8), 1–28.
INEP. (2009). SINAES: da concepção à regulamentação [SINAES: From conceptual development to legislation], Instituto Nacional de Estudos e Pesquisas Educacionais “Anísio Teixeira” (INEP), Ministério da Educação, Brasil. www.publicacoes.inep.gov.br/detalhes.asp?pub=4389#.
Kelley, K. (2007). Confidence intervals for standardized effect sizes: Theory, application, and implementation. Journal of Statistical Software, 20(8), 1–24.
Klein, S., Benjamin, R., Shavelson, R., & Bolus, R. (2007). The Collegiate Learning Assessment facts and fantasies. Evaluation Review, 31(5), 415–439.
Melguizo, T. (2011). A review of the theories developed to describe the process of college persistence and attainment. In J. C. Smart & M. B. Paulsen (Eds.), Higher Education: Handbook of Theory and Research (pp. 395–424). The Netherlands: Springer.
Melguizo, T., Zamarro, G., Velazco, T., & Sanchez, F. (2015). How can we accurately measure whether students are gaining valuable learning as well as other relevant outcomes in Higher Education? Rossier School of Education, University of Southern California.
National Academy of Academic Leadership. (2014). Assessment and evaluation in higher education: Some concepts and principles. http://www.thenationalacademy.org/sitemap.html.
Nusche, D. (2008). Assessment of learning outcomes in higher education. A comparative review of selected practices, OECD education working papers no 15, OECD, Paris.
OECD. (2008). Higher education to 2030, Vol. 1, Demography. Centre for educational research and innovation. Organization for Economic Cooperation and Development.
OECD. (2013). Assessment of higher education learning outcomes (AHELO): Feasibility study report. Volume 1: Design and implementation. Retrieved on 8 April 2014 from http://www.oecd.org/education/skills-beyond-school/testingstudentanduniversityperformancegloballyoecdsahelo.htm.
Pedrosa, R. L., Amaral, E., & Knobel, M. (2013). Assessing higher education learning outcomes in Brazil. Higher Education Management and Policy. doi:10.1787/hemp-24-5k3w5pdwk6br.
Possin, K. (2013). A serious flaw in the Collegiate Learning Assessment [CLA] test. Informal Logic, 33(3), 390–405.
Primi, R., Carvalho, L. F., Miguel, F. K., & Silva, M. C. R. (2010). Análise do funcionamento diferencial dos itens do Exame Nacional do Estudante (ENADE) de Psicologia de 2006. [Analysis of the differential item functioning of the 2006 Psychology ENADE exam]. Psico-USF 15(3), 379–393. http://www.scielo.br/scielo.php?pid=S1413-82712010000300011&script=sci_arttext.
Primi, R., Hutz, C. S., & Silva, M. C. R. (2011). A prova do ENADE de Psicologia 2006: concepção, construção e análise psicométrica da prova. [The 2006 Psychology ENADE exam: conception, construction, and psychometric evaluation of the exam]. Avaliação Psicológica 10(3), 271–294. http://www.labape.com.br/labape/artigos/A%20PROVA%20DO%20ENADE%20DE%20PSICOLOGIA%202006.pdf.
Rossefsky-Saavedra, A. R., & Saavedra, J. E. (2011). Do colleges cultivate critical thinking, problem solving, writing and interpersonal skills? Economics of Education Review, 30(6), 1516–1526.
Saavedra, J. E. (2009). The learning and early labor market returns to college quality: A regression discontinuity analysis. Cambridge, MA: Harvard University.
Silva Filho, R. L., et al. (2007). A evasão no ensino superior brasileiro. [Evasion in Brazilian higher education]. Cadernos de Pesquisa, 37(132), 641–659.
Steedle, J. T. (2012). Selecting value-added models for postsecondary institutional assessment. Assessment & Evaluation in Higher Education, 37(6), 637–652.
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science: A Review Journal of the Institute of Mathematical Statistics, 25(1), 1.
Verhine, R., & Dantas, L. M. V. (2005). Assessment of higher education in Brazil: From the Provão to ENADE. Document prepared for the World Bank, responsible party: Alberto Rodriguez.
Verhine, R., Dantas, L. M. V., & Soares, J. F. (2006). Do Provão ao ENADE: uma análise comparativa dos exames nacionais utilizados no Ensino Superior Brasileiro [From “Provão” to ENADE: A comparative analysis of national exams used in Brazilian higher education]. Ensaio: Aval. Pol. Públ. Educ., 14(52), 291–310. See earlier English version.
INEP. (nd). Resultado do indicador de diferença entre os desempenhos observado e esperado—IDD [Results from the indicator of the difference between the observed and expected outcomes], Instituto Nacional de Estudos e Pesquisas Educacionais “Anísio Teixeira” (INEP), Ministério da Educação, Brasil.
Zemsky, R., Wegner, G. R., & Massy, W. P. (2005). Remaking the American University: Market-smart and Mission-centered. Piscataway, NJ: Rutgers University Press.
We would like to thank Roberto Verhine of Universidade Federal da Bahia, and Marcelo Knobel and Renato Pedrosa of the University of Campinas, for helpful comments and suggestions on earlier versions of this paper.
Appendix: Combining effect sizes to produce program-level/engineering effect sizes
The method for combining effect sizes from different fields (e.g., mathematics and computer science) into a measure for a general area (e.g., STEM) comes from procedures developed in the field of meta-analysis. Two approaches are generally used to combine effect sizes: fixed effects and random effects models (Borenstein et al. 2007). The fixed effects model assumes that the effect size of each clinical trial, in our case each field, is an estimate of a single true effect size. The estimates differ only because of sampling error, and thus an average of the measures is a good estimate of the combined effect size. These models traditionally use a weighted average because clinical trials that involve more people should weigh more than trials that involve fewer people. The weight used is the inverse of the variance of the effect size, which is calculated as
where ne and nc are the number of people in the experimental and control groups and the subscript i refers to each field’s measures. The combined effect size is then calculated as
The variance of the combined effect size is calculated as
and from this, one can calculate a confidence interval for the resulting effect size.
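For reference, the standard fixed effects expressions that match the description above can be sketched as follows. This assumes that each field's effect size is a standardized mean difference; the symbols d_i, v_i, w_i, and \bar{d} are our own notation and are not necessarily the authors', and the 1.96 multiplier assumes an approximate 95 percent interval under normality.

```latex
% Assumed notation: d_i is the effect size of field i, v_i its variance,
% w_i its weight, and \bar{d} the combined effect size.
\[
  v_i = \frac{n_{e,i} + n_{c,i}}{n_{e,i}\, n_{c,i}}
        + \frac{d_i^{2}}{2\,(n_{e,i} + n_{c,i})},
  \qquad
  w_i = \frac{1}{v_i}
\]
\[
  \bar{d} = \frac{\sum_i w_i\, d_i}{\sum_i w_i},
  \qquad
  \operatorname{Var}(\bar{d}) = \frac{1}{\sum_i w_i}
\]
% Approximate 95% confidence interval for the combined effect size.
\[
  \bar{d} \pm 1.96\, \sqrt{\operatorname{Var}(\bar{d})}
\]
```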
The random effects model does not assume that all trials, or in our case all fields, have the same “true” effect size. Instead, it assumes that the true effect size for each field is normally distributed around a mean true measure, with a variance, called the between-treatment variance, denoted by τ² (tau squared). The statistical computations estimate not only the mean true value but also the between-treatment variance. We refer the reader to Borenstein et al. (2007) for a more complete explanation and the specific formulas. In this study, we chose the random effects model because it is clear from Fig. 1a–c that each field has a different “true” effect size, and the differences are sometimes statistically significant. Specifically, we use the DerSimonian–Laird estimator of the between-study variance as implemented in the package meta of the software R (Table 3).
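The two pooling approaches described in this appendix can be illustrated with a minimal Python sketch. It is not the R package meta used in the study. The function names are hypothetical, the variance formula assumes a standardized mean difference, and the between-field variance uses the usual method-of-moments form of the DerSimonian–Laird estimator, truncated at zero.

```python
import math


def effect_variance(d, n_e, n_c):
    """Approximate sampling variance of a standardized mean difference d
    (assumed formula), with n_e and n_c the two group sizes."""
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))


def pool_fixed(effects, variances):
    """Fixed effects pooling: inverse-variance weighted mean and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total
    return pooled, 1.0 / total


def pool_random_dl(effects, variances):
    """Random effects pooling with the DerSimonian-Laird estimate of tau^2.
    Returns (pooled effect, variance of pooled effect, tau^2)."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed effects mean.
    q = sum(w * (d - fixed) ** 2 for w, d in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    # Method-of-moments estimate, truncated at zero; zero if not estimable.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    # Re-weight each field by the inverse of (within + between) variance.
    star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * d for w, d in zip(star, effects)) / sum(star)
    return pooled, 1.0 / sum(star), tau2


def confidence_interval(pooled, variance, z=1.96):
    """Normal-approximation confidence interval (z=1.96 gives about 95%)."""
    half_width = z * math.sqrt(variance)
    return pooled - half_width, pooled + half_width
```

When the field-level effect sizes are homogeneous, the estimated τ² is zero and the random effects result coincides with the fixed effects result. As the fields diverge, τ² grows and the weights become more nearly equal across fields.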
Cite this article
Melguizo, T., Wainer, J. Toward a set of measures of student learning outcomes in higher education: evidence from Brazil. High Educ 72, 381–401 (2016). https://doi.org/10.1007/s10734-015-9963-x
- Student learning outcomes
- Quality of higher education