Returns to effort: experimental evidence from an online language platform


While distance learning has become widespread, causal estimates of the returns to effort in technology-assisted learning environments are scarce due to high attrition rates and the endogeneity of effort. In this paper, I manipulate effort by randomly assigning students different numbers of lessons in a popular online language learning platform. Using administrative data from the platform and an instrumental variables strategy, I find that completing 9 Duolingo lessons, which corresponds to approximately 60 minutes of studying, leads to a 0.057–0.095 standard deviation increase in test scores. Comparisons to the literature and back-of-the-envelope calculations suggest that distance learning can be as effective as in-person learning for college students in an introductory language course.



  1. For example, searches for learning-related apps grew 80% and searches for apps related to learning a language grew 85% from 2016 to 2017, according to Google data.

  2. For example, Jordan (2015) reports that the average completion rate for Massive Open Online Courses (MOOCs) is approximately 15%, based on 217 courses from 15 different platforms.

  3. Most researchers assume linear returns when modeling effort choices, but whether returns to effort are non-linear has not been explored empirically. Although the experimental design of this paper is suitable for such an exploration, I am unable to draw conclusions about the non-linearity of returns to effort given the small sample size in each assignment group.

  4. For example, Renz et al. (2016) find that reminder emails about unseen lecture videos increase lecture views.

  5. This finding also relates to Clark et al. (2017). The authors show that assigning task-based goals has large impacts on the course performance of students in in-person classrooms.

  6. Although the skill levels are named beginner, intermediate, and advanced, Duolingo courses generally teach at most to the B1 level of the Common European Framework of Reference for Languages (CEFR), which is the threshold level for independent users. For more details on CEFR standards, see

  7. See Appendix A.1 in ESM for the structure of Duolingo and Appendix A.2 in ESM for a list of Duolingo lessons.

  8. The recruitments took place in December 2016. The universities which I recruit from are California Polytechnic State University-San Luis Obispo, California State Polytechnic University-Pomona, California State University-Fresno, California State University-Fullerton, California State University-Los Angeles, California State University-Northridge, California State University-Sacramento, University of California-Davis, University of California-Irvine, University of California-Santa Barbara, University of California-San Diego, San Diego State University, and San Francisco State University. I choose a large number of universities since I need a large subject pool to use in both this paper and a companion paper. I determined this list using the College Navigator website based on the following criteria: being public, offering at least a Bachelor’s degree, having at least 20,000 undergraduates and admitting 30% to 70% of its applicants, and being in California.

  9. The answers to the eligibility questions are self-reported. Twelve subjects reported that they were not students at one of the universities listed in Footnote 8, 5 subjects reported that they were not highly interested in learning Spanish, 20 subjects reported that they were not able to commit up to 4 hours, and 32 subjects reported that they knew Spanish at the intermediate level or above.

  10. Once participants signed up for Duolingo, they were encouraged to take a very short Spanish test within Duolingo. Duolingo uses this test to suggest a starting level for its users and marks all the lessons up to that level as complete. I ask participants not to complete any lessons until they receive their lesson assignments and assign their lessons based on the results of this placement test. Hence, the assigned lessons differ within groups depending on the initial levels of the participants. Once the deadline for joining Duolingo passed, randomization to different groups was done using the list randomizer on

  11. Vesselinov and Grego (2012) measure how time spent on Duolingo correlates with improvement in test scores using a random representative sample of Duolingo users studying Spanish. According to their data, mean study time over four weeks is approximately 5 hours. I determined the number of lessons assigned such that the time spent by the 64-lessons group is similar to this mean.

  12. I choose not to intervene with the number of logins per week since such an intervention can create differential attrition across groups.

  13. The copy/paste function is disabled for this text and the participants need to type this sentence exactly as shown to be able to proceed to the next parts of the survey.

  14. See Appendix A.4 and A.5 in ESM for screenshots of example e-mails.

  15. See Appendix A.6 in ESM for the exact wording of the instructions regarding the payment procedure. Following the instructions, I ask subjects to answer some comprehension questions about the payment procedure to check their understanding of the procedure. Subjects need to answer them correctly to continue. Appendix A.6 in ESM displays the screenshots of these questions.

  16. For example, Owen (1995) defines student effort as days of school attended per year and hours spent in the classroom or doing homework.

  17. This measure includes new as well as repeated lessons. Sixty-nine percent of the subjects did not complete any repeated lessons and an additional 20 percent of them completed between one and five repeated lessons. Using the number of new lessons completed instead of the number of all lessons completed does not affect the results.

  18. I also have data on self-reported minutes of study on Duolingo for the last week of the experiment. Multiplying these self-reports by four and using them as the time spent measure instead of the administrative one leads to similar estimates.

  19. Appendix Figure B.1 in ESM displays a histogram of minutes spent per lesson based on timestamp data. Minutes spent per lesson are 20 or less in 82.4% of the lessons and 100 or more in 14.2% of the lessons.

  20. The tests are accessible through these links: (Test A) and (Test B).

  21. I thank Jaime Arellano-Bover, Nano Barahona, Eduardo Laguna-Muggenburg, Jose Maria Barrero, Alejandro Martinez-Marquina, Oriol Pons-Benaiges, and Diego Torres-Patino for their help.

  22. The overall attrition rate in this paper is similar to the 15% average attrition rate reported for field experiments published in top economics journals (Ghanem et al. 2019).

  23. Since I know the exact number of lessons subjects complete, subjects who do not complete any lessons or who stop completing lessons (differentially across assignment groups) do not pose a problem for the estimations, as long as these subjects take the final Spanish tests.

  24. Restricting attention to the subjects who completed both of the final tests reveals similar results. The median participants in the 32-lessons, 48-lessons, 64-lessons, 80-lessons, and 96-lessons groups completed 32, 48, 64, 78, and 95 lessons, respectively.

  25. Repeating the analysis of this table using the number of new Duolingo lessons completed and self-reported minutes spent results in similar estimates. See Appendix Table B.2 in ESM.

  26. The left panel of Appendix Figure B.2b in ESM displays the distribution of final scores, and the left panel of Appendix Figure B.2c in ESM displays the distribution of improvement in scores, across treatment groups for the Duolingo-based test.

  27. Based on a pilot for which I have administrative data on the actual time spent on Duolingo lessons (instead of timestamps), a Duolingo lesson takes 6.7 minutes, on average. Hence, 9 lessons would take approximately 60 minutes.

  28. Appendix Table B.3 in ESM Columns (1)–(2) report the intention-to-treat estimates for Duolingo-based test. Each assigned Duolingo lesson increases the internal test score by 0.008 sd (statistically significant at the 5% level).

  29. The instrumental variables estimates reported in Table 3 Panel A are larger than the corresponding OLS estimates (see Appendix Table B.4 Panel A in ESM). This result suggests that unobserved ability and Duolingo effort are negatively correlated.

  30. The right panel of Appendix Figure B.2b in ESM displays the distribution of final scores, and the right panel of Appendix Figure B.2c in ESM displays the distribution of improvement in scores, across treatment groups for the external test.

  31. Appendix Table B.3 in ESM Columns (3)–(4) report the intention-to-treat estimates for WebCAPE. Each assigned Duolingo lesson increases the external test score by 0.005 sd (statistically significant at the 10% level).

  32. The instrumental variables estimates reported in Table 3 Panel B are larger than the corresponding OLS estimates (see Appendix Table B.4 Panel B in ESM).

  33. A student with a score of 346 points could be placed in Semester 3 and a student with a score of 428 points could be placed in Semester 4 of a 4-semester Spanish course.

  34. Notice that this discussion in the text is based on points and the estimates in the table are based on standard deviations.

  35. Duolingo reports that an average lesson takes 5 to 10 minutes. Based on a pilot for which I have administrative data on the actual time spent on Duolingo lessons (instead of timestamps), a lesson takes 6.7 minutes, on average. Using this estimate, I find that 344 Duolingo lessons will take approximately 38 hours (\(\frac{344*6.7}{60}\)).
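The lessons-to-hours conversion used in this footnote (and in Footnote 27) can be sketched as a short script. This is a minimal illustration only, using the 6.7-minutes-per-lesson estimate from the pilot data; the helper name is hypothetical, not part of the study's materials.

```python
# Back-of-the-envelope conversion from Duolingo lessons to hours of study,
# based on the pilot estimate of 6.7 minutes per lesson (average).

MINUTES_PER_LESSON = 6.7  # average lesson length from the pilot data


def lessons_to_hours(n_lessons: int) -> float:
    """Convert a number of Duolingo lessons into hours of study."""
    return n_lessons * MINUTES_PER_LESSON / 60


# 344 lessons -> approximately 38 hours, as stated in this footnote
print(round(lessons_to_hours(344)))  # 38
# 9 lessons -> approximately 1 hour (cf. Footnote 27)
print(round(lessons_to_hours(9)))    # 1
```

The same conversion underlies the comparison to a semester of in-person instruction (15 weeks at 3 hours per week, i.e., 45 hours) discussed in the text.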

  36. A semester corresponds to 15 weeks and 3 hours of instruction per week per class for most of the schools in my sample.

  37. Using a continuous GPA measure also results in similar findings. A one-point increase in GPA interacted with the number of lessons completed on Duolingo increases the internal (external) test scores by 0.0367 sd (0.0274 sd), and a one-point increase in GPA interacted with minutes spent on Duolingo increases the internal (external) test scores by 0.008 sd (0.007 sd).

  38. Additionally, we might worry that the instrument affects the usage frequency of the platform and that usage frequency has a direct effect on test performance. To explore this concern, I check how the number of days active on the platform differs across lesson groups. Although there are some statistical differences, the differences are not monotonic and vary somewhat from week to week.

  39. A trimming exercise in the spirit of Lee (2009) bounds yields similar findings with slightly tighter bounds. In particular, the exercise is informative for the internal test scores (the lower bound is 0.008 sd (statistically significant at the 5% level) and the upper bound is 0.011 sd (statistically significant at the 1% level)), but not for the external test scores (the lower bound is 0.005 sd (not statistically significant) and the upper bound is 0.008 sd (statistically significant at the 1% level)).


  • Angrist, J., & Lavy, V. (2009). The effects of high stakes high school achievement awards: Evidence from a randomized trial. American Economic Review, 99(4), 301–331.

  • Athey, S., & Imbens, G. (2017). The econometrics of randomized experiments. Handbook of Economic Field Experiments, 1, 73–140.

  • Banerjee, A., Cole, S., Duflo, E., & Linden, L. (2007). Remedying education: Evidence from two randomized experiments in India. The Quarterly Journal of Economics, 122(3), 1235–1264.

  • Barrow, L., Markman, L., & Rouse, C. E. (2009). Technology’s edge: The educational benefits of computer-aided instruction. American Economic Journal: Economic Policy, 1(1), 52–74.

  • Bawa, P. (2016). Retention in online courses: Exploring issues and solutions—a literature review. SAGE Open.

  • Bettinger, E. P., Fox, L., Loeb, S., & Taylor, E. S. (2017). Virtual classrooms: How online college courses affect student success. American Economic Review, 107(9), 2855–2875.

  • Betts, J.R. (1996). The role of homework in improving school quality. Discussion Paper 96–16. Department of Economics, UCSD.

  • Bonesrønning, H., & Opstad, L. (2012). How much is students’ college performance affected by quantity of study? International Review of Economics Education, 11, 46–63.

  • Bonesrønning, H., & Opstad, L. (2015). Can student effort be manipulated? Does it matter? Applied Economics.

  • Chen, J., & Lin, T.-F. (2008). Class attendance and exam performance: A randomized experiment. The Journal of Economic Education, 39(3), 213–227.

  • Chevalier, A., Dolton, P., & Lührmann, M. (2018). Making it count: Incentives, student effort and performance. Journal of the Royal Statistical Society: Series A, 181(2), 323–349.

  • Clark, D., Gill, D., Prowse, V. & Rush, M. (2017). Using goals to motivate college students: Theory and evidence from field experiments. NBER Working Paper 23638.

  • Davidson, R., & MacKinnon, J. G. (2010). Wild bootstrap tests for IV regression. Journal of Business & Economic Statistics, 28(1), 128–144.

  • De Fraja, G., Oliveira, T., & Zanchi, L. (2010). Must try harder: Evaluating the role of effort in educational attainment. The Review of Economics and Statistics, 92(3), 577–597.

  • Deming, D. J., Goldin, C., Katz, L. F., & Yuchtman, N. (2015). Can online learning bend the higher education cost curve? American Economic Review: Papers & Proceedings, 105(5), 496–501.

  • Dobkin, C., Gil, R., & Marion, J. (2010). Skipping class in college and exam performance: Evidence from a regression discontinuity classroom experiment. Economics of Education Review, 29, 566–575.

  • Eren, O., & Henderson, D. J. (2008). The impact of homework on student achievement. Econometrics Journal, 11, 326–348.

  • Eren, O., & Henderson, D. J. (2011). Are we wasting our children’s time by giving them more homework? Economics of Education Review, 30, 950–961.

  • Escueta, M., Nickow, A. J., Oreopoulos, P., & Quan, V. (forthcoming). Upgrading education with technology: Insights from experimental research. Journal of Economic Literature.

  • Fryer, R. G. (2011). Financial incentives and student achievement: Evidence from randomized trials. The Quarterly Journal of Economics.

  • Furnham, A. (1986). Response bias, social desirability and dissimulation. Personality and Individual Differences, 7(3), 385–400.

  • Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing type S (Sign) and type M (Magnitude) errors. Perspectives on Psychological Science, 9(6), 641–651.

  • Ghanem, D., Hirshleifer, S. R. & Ortiz-Becerra, K. (2019). Testing attrition bias in field experiments. Working Paper.

  • Glewwe, P., & Muralidharan, K. (2016). Improving education outcomes in developing countries: Evidence, knowledge gaps, and policy implications. In E. A. Hanushek, S. Machin, & L. Woessmann (Eds.), Handbook of the economics of education (Vol. 5). London: Elsevier.

  • Gollwitzer, P. M. (1999). Implementation intentions: Strong effects of simple plans. American Psychologist, 54, 493–503.

  • Goodman, J., Melkers, J., & Pallais, A. (2019). Can online delivery increase access to education? Journal of Labor Economics, 37(1), 1–34.

  • Grodner, A., & Rupp, N. G. (2013). The role of homework in student learning outcomes: Evidence from a field experiment. The Journal of Economic Education, 44(2), 93–109.

  • Hirshleifer, S. R. (2016). Incentives for effort or outputs? A field experiment to improve student performance. Working Paper.

  • Horowitz, J. L., & Manski, C. F. (2000). Nonparametric analysis of randomized experiments with missing covariate and outcome data. Journal of the American Statistical Association, 95(449), 77–84.

  • Jordan, K. (2015). MOOC completion rates: The data.

  • Kamenica, E. (2012). Behavioral economics and psychology of incentives. The Annual Review of Economics.

  • Kizilcec, R. F., & Brooks, C. (2017). Diverse big data and randomized field experiments in massive open online courses. In A. Wise, C. Lang, G. Siemens, & D. Gasevic (Eds.), Handbook on learning analytics & educational data mining (pp. 211–222).

  • Kizilcec, R. F., Brooks, C., & Halawa, S. (2015). Attrition and achievement gaps in online learning. In Proceedings of the second ACM conference on learning@scale (pp. 57–66).

  • Krohn, G. A., & O’Connor, C. M. (2005). Student effort and performance over the semester. The Journal of Economic Education, 36(1), 3–28.

  • Kuehn, Z. & Landeras, P. (2012). Study time and scholarly achievement in PISA. MPRA Paper No. 49033.

  • Lai, F., Luo, R., Zhang, L., Huang, X., & Rozelle, S. (2015). Does computer-assisted learning improve learning outcomes? Evidence from a randomized experiment in migrant schools in Beijing. Economics of Education Review, 47, 34–48.

  • Lee, D. S. (2009). Training, wages, and sample selection: Estimating sharp bounds on treatment effects. Review of Economic Studies, 76(3), 1071–1102.

  • Linden, L. (2008). Complement or substitute? The effect of technology on student achievement in India. Working Paper.

  • Metcalfe, R., Burgess, S., & Proud, S. (2019). Students’ effort and educational achievement: Using the timing of the World Cup to vary the value of leisure. Journal of Public Economics.

  • Muralidharan, K., Singh, A., & Ganimian, A. J. (2019). Disrupting education? Experimental evidence on technology-aided instruction in India. American Economic Review, 109(4), 1426–1460.

  • Nederhof, A. J. (1985). Methods of coping with social desirability bias: A review. European Journal of Social Psychology, 15(3), 263–280.

  • Oreopoulos, P., Patterson, R. W., Petronijevic, U., & Pope, N. G. (2019). When studying and nudging don’t go as planned: Unsuccessful attempts to help traditional and online college students. NBER Working Paper 25036.

  • Owen, J. D. (1995). Why our kids don't study: An economist's perspective. Baltimore: The Johns Hopkins University Press.

  • Paunesku, D., Walton, G. M., Romero, C., Smith, E. N., Yeager, D. S., & Dweck, C. S. (2015). Mind-Set Interventions are a scalable treatment for academic underachievement. Psychological Science, 26, 784–793.

  • Rammstedt, B., & John, O. P. (2007). Measuring personality in one minute or less: A 10-item short version of the big five inventory in English and German. Journal of Research in Personality, 41(1), 203–212.

  • Renz, J., Hoffman, D., Staubitz, T., & Meinel, C. (2016). Using A/B testing in MOOC environments. In Proceedings of the sixth international conference on learning analytics & knowledge.

  • Rotter, J. B. (1966). Generalized expectancies for internal versus external control of reinforcement. Psychological Monographs: General and Applied, 80, 1–28.

  • Seaman, J. E., Allen, I. E., & Seaman, J. (2018). Grade increase: Tracking distance education in the United States. Technical Report, Babson Survey Research Group.

  • Stinebrickner, R., & Stinebrickner, T. R. (2004). Time-use and college outcomes. Journal of Econometrics, 121, 243–269.

  • Stinebrickner, R., & Stinebrickner, T. R. (2008). The causal effect of studying on academic performance. The B.E. Journal of Economic Analysis and Policy.

  • Tangney, J. P., Baumeister, R. F., & Boone, A. L. (2004). High self-control predicts good adjustment, less pathology, better grades, and interpersonal success. Journal of Personality, 72(2), 271–324.

  • Vesselinov, R., & Grego, J. (2012). Duolingo effectiveness study. Working Paper.


I thank Sandro Ambuehl, Douglas Bernheim, Eric Bettinger, Raj Chetty, Pascaline Dupas, Sarah Eichmeyer, Matthew Gentzkow, Caroline Hoxby, Susanna Loeb, Daniel Martin, Muriel Niederle, Odyssia Ng, Imran Rasul, Georg Weizsäcker, David Yang, and participants at Western Economic Association International (WEAI) 2019 and Economic Science Association (ESA) 2019 for helpful comments and discussions. This research was conducted with the support of Russell Sage Foundation (Grant No. 98-16-14). This research is approved by Stanford Panel on Non-Medical Human Subjects (IRB protocol # 36512). This experiment is registered at the AEA RCT registry (ID 0001775) as Stage 1 of a two-stage experiment. The results from Stage 2 are available in a companion paper.

Author information

Corresponding author

Correspondence to Fulya Ersoy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3581 KB)

About this article

Cite this article

Ersoy, F. Returns to effort: experimental evidence from an online language platform. Exp Econ 24, 1047–1073 (2021).



Keywords

  • Returns to effort
  • Distance learning
  • Manipulation of effort
  • Field experiment

JEL Classification

  • I23
  • I26
  • C93