Abstract
In this paper we apply supervised machine learning algorithms to automatically classify the text of students’ reflective learning journals from an introductory Java programming module with the aim of identifying students who need help with their understanding of the topic they are reflecting on. Such a system could alert teaching staff to students who may need an intervention to support their learning.
Several different classifier algorithms have been validated on the training data set to find the best model in two situations; with equal cost for a positive or negative classification and with cost sensitive classification. Methods were used to identify those individual parameters which maximise the performance of each algorithm. Precision, recall and F1-score, as well as confusion matrices were used to understand the behaviour of each classifier and choose the one with the best performance.
The classifiers that obtained the best results from the validation were then evaluated on a testing data set containing different data to that used for training.
We believe that although the results could be improved with further work, our initial results show that machine learning could be applied to students’ reflective writing to assist staff in identifying those students who are struggling to understand the topic.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
The labels correspond to the final exam results.
References
Aphinyanaphongs, Y., Tsamardinos, I., Statnikov, A., Hardin, D., Aliferis, C.F.: Text categorization models for high-quality article retrieval in internal medicine. J. Am. Med. Inform. Assoc. 12(2), 207–216 (2005). https://doi.org/10.1197/jamia.M1641
Argamon, S., Koppel, M., Pennebaker, J., Schler, J.: Automatically profiling the author of an anonymous text. Commun. ACM 52, 119–123 (2009)
Carreras, X., Marquez, L.: Boosting trees for anti-spam email filtering (2001). https://arxiv.org/abs/cs/0109015. Accessed 12 Jun 2018
Chawla, N.V.: C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure. In: Proceedings of the ICML, vol. 3, p. 66 (2003)
Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y.: Online passive-aggressive algorithms. J. Mach. Learn. Res. 7, 551–585 (2006)
Elankavi, R., Kalaiprasath, R., Udayakumar, D.R.: A fast clustering algorithm for high-dimensional data. Int. J. Civ. Eng. Technol. (IJCIET) 8(5), 1220–1227 (2017)
Endelman, J.B.: Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4(3), 250–255 (2011)
Friedman, C.: A broad-coverage natural language processing system. In: Proceedings of the AMIA Symposium, pp. 270–274 (2000)
Géron, A.: Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc. (2017)
Gokgoz, E., Subasi, A.: Comparison of decision tree algorithms for EMG signal classification using DWT. Biomed. Signal Process. Control 18, 138–144 (2015)
Gruber, M.: Improving Efficiency by Shrinkage: The James–Stein and RidgeRegression Estimators. Routledge (2017)
Ham, J., Chen, Y., Crawford, M.M., Ghosh, J.: Investigation of the random forest framework for classification of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 43(3), 492–501 (2005)
Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)
Plotly Technologies Inc.: Collaborative data science (2015). https://plot.ly
Joachims, T.: Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publishers, Norwell (2002)
Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Documentation 28(1), 11–21 (1972). https://doi.org/10.1108/eb026526
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: AAAI, vol. 333, pp. 2267–2273 (2015)
Larkey, L.: Automatic essay grading using text categorization techniques. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 90–95. ACM, August 1998
Lewis, D., Gale, W.: A sequential algorithm for training text classifiers. In: ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12. Springer, New York (1994). https://doi.org/10.1007/978-1-4471-2099-5_1
McNamara, D., Crossley, S., Roscoe, R., Allen, L., Dai, J.: A hierarchical classification approach to automated essay scoring. Assessing Writ. 23, 35–59 (2015)
McTear, M., Callejas, Z., Griol, D.: The Conversational Interface: Talking to Smart Devices, 1st edn. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32967-3
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. CoRR abs/1310.4546 (2013). http://arxiv.org/abs/1310.4546
Moon, J.: Reflection in Learning and Professional Development. Routledge, London (1999)
Murphy, K.P., et al.: Naive Bayes Classifiers, p. 18. University of British Columbia (2006)
O’Rourke, R.: The learning journal: from chaos to coherence. Assessment Eval. High. Educ. 23(4), 403–413 (1998)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Rudner, L., Liang, T.: Automated essay scoring using Bayes’ theorem. J. Technol. Learn. Assessment 1(2) (2002)
Silge, J., Robinson, D.: Text Mining with R: A Tidy Approach. O’Reilly Media, Inc. (2017)
Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 23(4), 501–509 (2004)
Sukkarieh, J.Z., Pulman, S.G., Raikes, N.: Auto-marking: using computational linguistics to score short, free text responses. In: 29th Annual Conference of the International Association for Educational Assessment (IAEA), Manchester, UK (2003)
TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/, software available from tensorflow.org
Trstenjak, B., Mikac, S., Donko, D.: KNN with TF-IDF based framework for text categorization. Procedia Eng. 69, 1356–1364 (2014)
Uysal, A.K., Gunal, S.: The impact of preprocessing on text classification. Inf. Process. Manage. 50(1), 104–112 (2014)
Valenti, S., Neri, F., Cucchiarelli, A.: An overview of current research on automated essay grading. J. Inf. Technol. Educ. Res. 2, 319–330 (2003)
Wu, Q., Ye, Y., Zhang, H., Ng, M.K., Ho, S.S.: ForesTexter: an efficient random forest algorithm for imbalanced text categorization. Knowl. Based Syst. 67, 105–116 (2014)
Yu, B.: An evaluation of text classification methods for literary study. Literary Linguist. Comput. 23(3), 327–343 (2008)
Zhou, B., Yao, Y., Luo, J.: Cost-sensitive three-way email spam filtering. J. Intell. Inf. Syst. 42(1), 19–45 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Beaumont, A.J., Al-Shaghdari, T. (2019). To What Extent Can Text Classification Help with Making Inferences About Students’ Understanding. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science(), vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_31
Download citation
DOI: https://doi.org/10.1007/978-3-030-37599-7_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer ScienceComputer Science (R0)