Peer and Self Assessment in Massive Online Classes

Chapter in Design Thinking Research (Springer, 2015)

Abstract

Peer and self assessment offer an opportunity to scale both assessment and learning to global classrooms. This paper reports our experiences with two iterations of the first large online class to use peer and self assessment. In this class, peer grades correlated highly with staff-assigned grades. The second iteration had 42.9 % of students’ grades within 5 % of the staff grade, and 65.5 % within 10 %. On average, students assessed their work 7 % higher than staff did. Students also rated peers’ work from their own country 3.6 % higher than work from elsewhere. We performed three experiments to improve grading accuracy. We found that giving students feedback about their grading bias increased subsequent accuracy. We introduce short, customizable feedback snippets that cover common issues with assignments, providing students more qualitative peer feedback. Finally, we introduce a data-driven approach that highlights high-variance rubric items for improvement. We find that rubrics that use a parallel sentence structure, unambiguous wording, and well-specified dimensions have lower variance. After revising rubrics, median grading error decreased from 12.4 % to 9.9 %.
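As a concrete illustration of how the agreement and bias figures above can be computed, the following is a minimal sketch, not the course's actual grading pipeline: peer grades are aggregated per submission with the median, compared against the staff grade at 5- and 10-point thresholds, and self-assessments are compared against staff grades. The record format, variable names, and 0–100 grade scale are assumptions for illustration.

```python
from statistics import median

# Illustrative records (assumed format): for each submission, the staff grade,
# the individual peer grades, and the author's self-assessment, on a 0-100 scale.
submissions = [
    {"staff": 82.0, "peers": [80.0, 85.0, 78.0, 90.0], "self": 88.0},
    {"staff": 70.0, "peers": [72.0, 65.0, 71.0],       "self": 75.0},
    {"staff": 95.0, "peers": [93.0, 96.0, 97.0, 94.0], "self": 96.0},
]

def fraction_within(errors, threshold):
    """Fraction of absolute grade differences at or below `threshold` points."""
    return sum(e <= threshold for e in errors) / len(errors)

# Aggregate each submission's peer grades with the median, then compare to staff.
median_errors = [abs(median(s["peers"]) - s["staff"]) for s in submissions]
print("median peer grade within 5 points of staff:", fraction_within(median_errors, 5))
print("median peer grade within 10 points of staff:", fraction_within(median_errors, 10))

# Self-assessment bias: how much higher students grade their own work than staff does.
self_bias = [s["self"] - s["staff"] for s in submissions]
print("mean self-assessment bias:", sum(self_bias) / len(self_bias))
```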

Notes

  1. https://www.coursera.org/

  2. All assessment materials are also available in full at http://hci.st/assess

  3. https://cs147.stanford.edu/

  4. Staff comprised graduate students from Stanford. In the second iteration, Community TAs, chosen from among top-performing students in the previous iteration, assisted the Stanford staff.

Acknowledgements

We thank Coursera for implementing this peer assessment system and enabling us to use the data. We thank Sébastien Robaszkiewicz, Joy Kim and our Community TAs for helping revise assignments, assess student submissions, and provide forum support; Nisha Masharani for helping collect data and for designing fortune-cookie feedback; and Sébastien Robaszkiewicz and Julie Fortuna for rating fortune cookies. We thank Greg Little for discussions about online peer assessment and Michael Bernstein for comments on drafts. We thank Jane Manning and colleagues at Stanford for supporting this class’s development. We thank our editor Marti Hearst and anonymous reviewers for their valuable feedback and suggestions. Human subjects research was reviewed by the Stanford Institutional Review Board through protocol 25001. We thank all of the students in the HCI online class for their foolhardiness and enthusiasm in participating in this experimental class.

Author information

Correspondence to Chinmay Kulkarni.

Appendices

Appendix 1

1.1 Agreement Between Peer Grades and Staff Grades Without Aggregation

Comparing individual peer grades (rather than their medians) with staff grades demonstrates the value of aggregation (Fig. 21). Only 26.3 % of individual peer grades were within 5 % of staff grades, and 46.7 % within 10 %. (Recall that the corresponding figures for median peer grades were 42.0 % and 65.5 %, respectively.)

Fig. 21: Agreement of unaggregated peer grades and staff grades. Agreement is much lower than between median peer grades and staff grades.
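As a sketch of the comparison behind Fig. 21, the snippet below computes agreement both ways: every individual peer grade against the staff grade (unaggregated), and one median peer grade per submission against the staff grade. The data and 0–100 grade scale are assumed for illustration; this is not the study's analysis code.

```python
from statistics import median

# Assumed format: (staff grade, [individual peer grades]) on a 0-100 scale.
submissions = [
    (82.0, [80.0, 85.0, 60.0, 90.0]),
    (70.0, [72.0, 55.0, 71.0]),
    (95.0, [93.0, 96.0, 97.0, 80.0]),
]

def agreement(errors, threshold):
    """Fraction of absolute differences at or below `threshold` grade points."""
    return sum(e <= threshold for e in errors) / len(errors)

# Unaggregated: compare every individual peer grade to the staff grade.
raw_errors = [abs(p - staff) for staff, peers in submissions for p in peers]

# Aggregated: compare one median peer grade per submission to the staff grade.
median_errors = [abs(median(peers) - staff) for staff, peers in submissions]

for label, errors in [("unaggregated", raw_errors), ("median-aggregated", median_errors)]:
    print(label, "within 5:", agreement(errors, 5), "within 10:", agreement(errors, 10))
```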

1.2 Grading Differences

1.2.1 Where Peers Graded Higher

Figure 22a shows an application a student described as “an interactive website which helps people tracking their eating behavior and overall-feeling, to find and be able to avoid certain foods which causes discomfort or health related problems.” Peers rated the prototype highly for being “interactive”. Staff rated it low because, “while fully functional, the design does not seem appropriate to the goal. The diary aspect seems to be the main aspect of the app, yet it’s hidden behind a search bar.”

Fig. 22: Student submissions with large differences between staff and peer grades. (a) Submission where peers graded higher than staff; (b) submission where staff graded higher than peers.

1.2.2 Where Peers Graded Lower

Figure 22b shows an application a student described as an “exciting platform, bored children can engage (physically) with other children in their neighborhood.” Staff praised it as “fully interactive, page flow is complete”, while some peers rated it “unpolished” and asked the student to “Try to make UI less coloured.”

Appendix 2: Sample Rubric

Table 3 shows a rubric for the “Ready for testing” assignment. All other rubrics are available as online supplementary materials.

Table 3: Rubric for the “Ready for testing” assignment

Copyright information

© 2015 Springer International Publishing Switzerland

Cite this chapter

Kulkarni, C. et al. (2015). Peer and Self Assessment in Massive Online Classes. In: Plattner, H., Meinel, C., Leifer, L. (eds) Design Thinking Research. Understanding Innovation. Springer, Cham. https://doi.org/10.1007/978-3-319-06823-7_9
