Applying Latent Dirichlet Allocation to Automatic Essay Grading

Kakkonen, Tuomo; Myller, Niko; Sutinen, Erkki

doi:10.1007/11816508_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Included in the following conference series:

International Conference on Natural Language Processing (in Finland)

Abstract

We report experiments on automatic essay grading using Latent Dirichlet Allocation (LDA). LDA is a “bag-of-words” language modeling and dimensionality reduction method that has been reported to outperform two related methods, Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA), in the Information Retrieval (IR) domain. We introduce LDA in detail and compare its strengths and weaknesses with those of LSA and PLSA. We also compare the performance of LDA empirically with that of LSA and PLSA. The experiments were run on three essay sets comprising a total of 283 essays from different domains. Contrary to the findings in IR, LDA achieved slightly worse results than LSA and PLSA in the experiments. We give reasons why LSA and PLSA outperformed LDA and indicate directions for further research.
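
To make the “bag-of-words” notion in the abstract concrete, the following is a minimal Python sketch. It is not the authors' implementation and it does not perform LDA, LSA, or PLSA; it only shows the input representation that such methods share, in which word order is discarded. The function names (`bag_of_words`, `cosine_similarity`) and the toy texts are hypothetical, chosen for illustration, and cosine similarity is an assumed comparison measure rather than one confirmed by the abstract.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Map a text to word counts, discarding word order."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for words missing from b.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy texts, not taken from the paper's essay sets.
essay_a = "the model grades the essay"
essay_b = "the essay grades the model"  # same words, different order
essay_c = "topics are distributions over words"

bow_a, bow_b, bow_c = (bag_of_words(t) for t in (essay_a, essay_b, essay_c))

print(bow_a == bow_b)                    # same counts despite reordering
print(cosine_similarity(bow_a, bow_b))   # reordered text vs. original
print(cosine_similarity(bow_a, bow_c))   # texts with no shared words
```

Because the two reordered texts map to the same counts, any method built on this representation cannot tell them apart. Raw counts also give texts with no shared words a similarity of zero, which is the kind of limitation that dimensionality reduction methods are meant to address.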

Author information

Authors and Affiliations

University of Joensuu, P.O. Box 111, FI-80101, Joensuu, Finland
Tuomo Kakkonen, Niko Myller & Erkki Sutinen

Editor information

Editors and Affiliations

Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, FIN-20520, Turku, Finland
Tapio Salakoski
Turku Centre for Computer Science (TUCS) and Department of IT, University of Turku, Lemminkäisenkatu 14 A, 20520, Turku, Finland
Filip Ginter & Sampo Pyysalo
Department of Information Technology, University of Turku, Lemminkäisenkatu 14–18 A, FIN-20520, Turku, Finland
Tapio Pahikkala

About this paper

Cite this paper

Kakkonen, T., Myller, N., Sutinen, E. (2006). Applying Latent Dirichlet Allocation to Automatic Essay Grading. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_13

DOI: https://doi.org/10.1007/11816508_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)