To What Extent Can Text Classification Help with Making Inferences About Students’ Understanding

Beaumont, A. J.; Al-Shaghdari, T.

doi:10.1007/978-3-030-37599-7_31

Conference paper
First Online: 03 January 2020

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11943)

Abstract

In this paper we apply supervised machine learning algorithms to automatically classify the text of students’ reflective learning journals from an introductory Java programming module with the aim of identifying students who need help with their understanding of the topic they are reflecting on. Such a system could alert teaching staff to students who may need an intervention to support their learning.

Several classifier algorithms were validated on the training data set to find the best model in two situations: with equal cost for a positive or negative classification, and with cost-sensitive classification. Parameter tuning was used to identify the individual parameter values that maximise the performance of each algorithm. Precision, recall and F1-score, as well as confusion matrices, were used to understand the behaviour of each classifier and to choose the one with the best performance.
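To make the evaluation vocabulary concrete, the following is a minimal sketch of how precision, recall, F1-score and a cost-weighted comparison can be derived from a binary confusion matrix. It is an illustration only, not the authors' implementation: the names `Confusion`, `total_cost` and `cheapest`, the toy labels, the treatment of "needs help" as the positive class, and the cost weights are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; 'positive' is assumed to mean 'needs help'."""
    tp: int  # needs help and was flagged
    fp: int  # does not need help but was flagged
    fn: int  # needs help but was missed
    tn: int  # does not need help and was not flagged


def confusion(y_true, y_pred):
    """Tally the four cells from parallel lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def precision(c):
    """Fraction of flagged students who really needed help."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    """Fraction of students needing help who were flagged."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c):
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def total_cost(c, fn_cost=1.0, fp_cost=1.0):
    """Weighted error count; the weights are hypothetical, not from the paper."""
    return fn_cost * c.fn + fp_cost * c.fp


def cheapest(y_true, predictions, fn_cost=1.0, fp_cost=1.0):
    """Return the name of the model with the lowest weighted error cost."""
    return min(
        predictions,
        key=lambda name: total_cost(
            confusion(y_true, predictions[name]), fn_cost, fp_cost
        ),
    )


# Toy data: three of eight students need help (label 1).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = {
    "cautious": [1, 0, 0, 0, 0, 0, 0, 0],  # flags few students, misses two
    "eager": [1, 1, 1, 1, 1, 1, 0, 0],     # misses nobody, raises false alarms
}
```

With equal costs, `cheapest(y_true, predictions)` prefers the cautious model; if a missed student is assumed to cost five times a false alarm (`fn_cost=5.0`), the eager model wins instead. This is the kind of shift a cost-sensitive setting is meant to capture.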

The classifiers that obtained the best validation results were then evaluated on a separate testing data set that was not used for training.

We believe that although the results could be improved with further work, our initial results show that machine learning could be applied to students’ reflective writing to assist staff in identifying those students who are struggling to understand the topic.

Notes

1. The labels correspond to the final exam results.

Author information

Authors and Affiliations

Aston University, Birmingham, UK
A. J. Beaumont & T. Al-Shaghdari

Corresponding author

Correspondence to A. J. Beaumont.

Editor information

Editors and Affiliations

University of Cambridge, Cambridge, UK
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
Harvard University, Cambridge, MA, USA
Renato Umeton
Università di Catania, Catania, Catania, Italy
Giovanni Giuffrida
Almawave, Rome, Roma, Italy
Vincenzo Sciacca

About this paper

Cite this paper

Beaumont, A.J., Al-Shaghdari, T. (2019). To What Extent Can Text Classification Help with Making Inferences About Students’ Understanding. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_31

DOI: https://doi.org/10.1007/978-3-030-37599-7_31
Published: 03 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer Science, Computer Science (R0)
