Peer and Self Assessment in Massive Online Classes

Chapter in Design Thinking Research (Springer, 2015)

Abstract

Peer and self assessment offer an opportunity to scale both assessment and learning to global classrooms. This paper reports our experiences with two iterations of the first large online class to use peer and self assessment. In this class, peer grades correlated highly with staff-assigned grades. The second iteration had 42.9 % of students’ grades within 5 % of the staff grade, and 65.5 % within 10 %. On average, students assessed their work 7 % higher than staff did. Students also rated peers’ work from their own country 3.6 % higher than work from elsewhere. We performed three experiments to improve grading accuracy. We found that giving students feedback about their grading bias increased subsequent accuracy. We introduce short, customizable feedback snippets that cover common issues with assignments, providing students more qualitative peer feedback. Finally, we introduce a data-driven approach that highlights high-variance rubric items for improvement. We find that rubrics that use a parallel sentence structure, unambiguous wording, and well-specified dimensions have lower variance. After revising rubrics, median grading error decreased from 12.4 % to 9.9 %.
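As a concrete illustration of how the agreement and bias figures above can be computed, the following is a minimal sketch, not the course's actual grading pipeline: peer grades are aggregated per submission with the median, compared against the staff grade at 5- and 10-point thresholds, and self-assessments are compared against staff grades. The record format, variable names, and 0–100 grade scale are assumptions for illustration.

```python
from statistics import median

# Illustrative records (assumed format): for each submission, the staff grade,
# the individual peer grades, and the author's self-assessment, on a 0-100 scale.
submissions = [
    {"staff": 82.0, "peers": [80.0, 85.0, 78.0, 90.0], "self": 88.0},
    {"staff": 70.0, "peers": [72.0, 65.0, 71.0],       "self": 75.0},
    {"staff": 95.0, "peers": [93.0, 96.0, 97.0, 94.0], "self": 96.0},
]

def fraction_within(errors, threshold):
    """Fraction of absolute grade differences at or below `threshold` points."""
    return sum(e <= threshold for e in errors) / len(errors)

# Aggregate each submission's peer grades with the median, then compare to staff.
median_errors = [abs(median(s["peers"]) - s["staff"]) for s in submissions]
print("median peer grade within 5 points of staff:", fraction_within(median_errors, 5))
print("median peer grade within 10 points of staff:", fraction_within(median_errors, 10))

# Self-assessment bias: how much higher students grade their own work than staff does.
self_bias = [s["self"] - s["staff"] for s in submissions]
print("mean self-assessment bias:", sum(self_bias) / len(self_bias))
```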

Notes

  1. https://www.coursera.org/

  2. All assessment materials are also available in full at http://hci.st/assess

  3. https://cs147.stanford.edu/

  4. Staff comprised graduate students from Stanford. In the second iteration, Community TAs, chosen from among top-performing students in the previous iteration, assisted the Stanford staff.

Acknowledgements

We thank Coursera for implementing this peer assessment system and enabling us to use the data. We thank Sébastien Robaszkiewicz, Joy Kim and our Community TAs for helping revise assignments, assess student submissions, and provide forum support; Nisha Masharani for helping collect data and for designing fortune-cookie feedback; and Sébastien Robaszkiewicz and Julie Fortuna for rating fortune cookies. We thank Greg Little for discussions about online peer assessment and Michael Bernstein for comments on drafts. We thank Jane Manning and colleagues at Stanford for supporting this class’s development. We thank our editor Marti Hearst and anonymous reviewers for their valuable feedback and suggestions. Human subjects research was reviewed by the Stanford Institutional Review Board through protocol 25001. We thank all of the students in the HCI online class for their foolhardiness and enthusiasm in participating in this experimental class.

Author information

Correspondence to Chinmay Kulkarni.

Appendices

Appendix 1

1.1 Agreement Between Peer Grades and Staff Grades Without Aggregation

Comparing individual peer grades (rather than their medians) with staff grades demonstrates the value of aggregation (Fig. 21). Only 26.3 % of individual peer grades were within 5 % of staff grades, and 46.7 % within 10 %. (Recall that the corresponding figures for median peer grades were 42.0 % and 65.5 %, respectively.)

Fig. 21: Agreement of unaggregated peer grades and staff grades. Agreement is much lower than between median peer grades and staff grades.
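As a sketch of the comparison behind Fig. 21, the snippet below computes agreement both ways: every individual peer grade against the staff grade (unaggregated), and one median peer grade per submission against the staff grade. The data and 0–100 grade scale are assumed for illustration; this is not the study's analysis code.

```python
from statistics import median

# Assumed format: (staff grade, [individual peer grades]) on a 0-100 scale.
submissions = [
    (82.0, [80.0, 85.0, 60.0, 90.0]),
    (70.0, [72.0, 55.0, 71.0]),
    (95.0, [93.0, 96.0, 97.0, 80.0]),
]

def agreement(errors, threshold):
    """Fraction of absolute differences at or below `threshold` grade points."""
    return sum(e <= threshold for e in errors) / len(errors)

# Unaggregated: compare every individual peer grade to the staff grade.
raw_errors = [abs(p - staff) for staff, peers in submissions for p in peers]

# Aggregated: compare one median peer grade per submission to the staff grade.
median_errors = [abs(median(peers) - staff) for staff, peers in submissions]

for label, errors in [("unaggregated", raw_errors), ("median-aggregated", median_errors)]:
    print(label, "within 5:", agreement(errors, 5), "within 10:", agreement(errors, 10))
```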

1.2 Grading Differences

1.2.1 Where Peers Graded Higher

Figure 22a shows an application a student described as “an interactive website which helps people tracking their eating behavior and overall-feeling, to find and be able to avoid certain foods which causes discomfort or health related problems.” Peers rated the prototype highly for being “interactive”. Staff rated it low because, “while fully functional, the design does not seem appropriate to the goal. The diary aspect seems to be the main aspect of the app, yet it’s hidden behind a search bar.”

Fig. 22: Student submissions with large differences between staff and peer grades. (a) Submission where peers graded higher than staff; (b) submission where staff graded higher than peers.

1.2.2 Where Peers Graded Lower

Figure 22b shows an application a student described as an “exciting platform, bored children can engage (physically) with other children in their neighborhood.” Staff praised it as “fully interactive, page flow is complete”, while some peers rated it “unpolished” and asked the student to “Try to make UI less coloured.”

Appendix 2: Sample Rubric

Table 3 shows a rubric for the “Ready for testing” assignment. All other rubrics are available as online supplementary materials.

Table 3: Rubric for the “Ready for testing” assignment

Copyright information

© 2015 Springer International Publishing Switzerland

Cite this chapter

Kulkarni, C. et al. (2015). Peer and Self Assessment in Massive Online Classes. In: Plattner, H., Meinel, C., Leifer, L. (eds) Design Thinking Research. Understanding Innovation. Springer, Cham. https://doi.org/10.1007/978-3-319-06823-7_9
