Abstract
This study introduces and investigates three text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Allocation, and clustering of word vectors, for automating code extraction from a relatively small discussion board dataset. We compare the output of each algorithm with a previous dataset that was manually coded by two human raters. The results show that, even with a relatively small dataset, automated approaches can be an asset to course instructors by extracting some of the discussion codes, which can then be used in Epistemic Network Analysis.
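As a rough illustration of the first of these approaches, the following is a minimal sketch of Latent Semantic Analysis on a tiny corpus, using a raw term-document count matrix and NumPy's singular value decomposition. The four posts, the stopword list, and the function name `lsa_top_terms` are assumptions made up for this example; they are not the study's data or pipeline.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical toy posts (assumption); not the study's discussion board data.
posts = [
    "feedback from peers improved my draft",
    "peer feedback helped me revise the draft",
    "the quiz deadline was too early",
    "please extend the quiz deadline",
]

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "my", "me", "was", "too", "from", "please"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lsa_top_terms(docs, n_topics=2, n_terms=2):
    """Return the highest-loading terms for the first n_topics latent dimensions."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix of raw counts: rows are terms, columns are posts.
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokens):
        for w, count in Counter(doc).items():
            X[index[w], j] = count

    # Columns of U are latent dimensions, ordered by decreasing singular value.
    U, _, _ = np.linalg.svd(X, full_matrices=False)

    topics = []
    for k in range(n_topics):
        # The sign of a singular vector is arbitrary, so rank by absolute loading.
        order = np.argsort(-np.abs(U[:, k]))[:n_terms]
        topics.append([vocab[i] for i in order])
    return topics


topics = lsa_top_terms(posts, n_topics=2, n_terms=2)
for k, terms in enumerate(topics):
    print(f"latent dimension {k}: {terms}")
```

The top terms of each latent dimension are only candidate codes; in a setting like the one described here, a human rater would still need to judge whether they correspond to meaningful discussion codes.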
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Saravani, S.M., Ghaffari, S., Luther, Y., Folkestad, J., Moraes, M. (2023). Automated Code Extraction from Discussion Board Text Dataset. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_16
DOI: https://doi.org/10.1007/978-3-031-31726-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31725-5
Online ISBN: 978-3-031-31726-2
eBook Packages: Computer Science, Computer Science (R0)