To What Extent Can Text Classification Help with Making Inferences About Students’ Understanding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11943)

Abstract

In this paper we apply supervised machine learning algorithms to automatically classify the text of students’ reflective learning journals from an introductory Java programming module. The aim is to identify students who need help with their understanding of the topic they are reflecting on. Such a system could alert teaching staff to students who may need an intervention to support their learning.

Several different classifier algorithms were validated on the training data set to find the best model in two situations: with equal cost for a positive or negative classification, and with cost-sensitive classification. Methods were used to identify the parameter values that maximise the performance of each algorithm. Precision, recall and F1-score, as well as confusion matrices, were used to understand the behaviour of each classifier and to choose the one with the best performance.
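The evaluation measures named above can be illustrated with a minimal sketch. The labels and predictions below are invented toy values, not data from the paper, and the function names and the 5:1 cost ratio are assumptions made purely for illustration. Here a label of 1 stands for a student who needs help.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def total_cost(fp, fn, fp_cost=1.0, fn_cost=1.0):
    """Sum misclassification costs; unequal costs model the cost-sensitive case."""
    return fp * fp_cost + fn * fn_cost


# Toy data (hypothetical): 3 students need help (1), 5 do not (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred_a = [1, 0, 0, 0, 0, 0, 0, 0]  # cautious classifier: misses two students
y_pred_b = [1, 1, 1, 1, 1, 1, 0, 0]  # liberal classifier: three false alarms

for name, y_pred in (("A", y_pred_a), ("B", y_pred_b)):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    equal = total_cost(fp, fn)
    # Assumed ratio: missing a struggling student costs 5x a false alarm.
    weighted = total_cost(fp, fn, fp_cost=1.0, fn_cost=5.0)
    print(f"{name}: tp={tp} fp={fp} fn={fn} tn={tn} "
          f"P={p:.2f} R={r:.2f} F1={f1:.2f} "
          f"equal_cost={equal} weighted_cost={weighted}")
```

With equal costs the cautious classifier A makes fewer errors, whereas once a missed student is weighted more heavily than a false alarm, the liberal classifier B becomes preferable. This is one way the choice of best model can differ between the two situations.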

The classifiers that obtained the best results in validation were then evaluated on a testing data set containing data not used for training.

We believe that although the results could be improved with further work, our initial results show that machine learning could be applied to students’ reflective writing to assist staff in identifying those students who are struggling to understand the topic.


Notes

  1. The labels correspond to the final exam results.


Author information

Corresponding author

Correspondence to A. J. Beaumont.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Beaumont, A.J., Al-Shaghdari, T. (2019). To What Extent Can Text Classification Help with Making Inferences About Students’ Understanding. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_31

  • DOI: https://doi.org/10.1007/978-3-030-37599-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37598-0

  • Online ISBN: 978-3-030-37599-7

  • eBook Packages: Computer Science (R0)
