Automated Code Extraction from Discussion Board Text Dataset

Saravani, Sina Mahdipour; Ghaffari, Sadaf; Luther, Yanye; Folkestad, James; Moraes, Marcia

doi:10.1007/978-3-031-31726-2_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1785)

Included in the following conference series:

International Conference on Quantitative Ethnography

Abstract

This study introduces and investigates the capabilities of three different text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors, for automating code extraction from a relatively small discussion board dataset. We compare the outputs of each algorithm with a previous dataset that was manually coded by two human raters. The results show that even with a relatively small dataset, automated approaches can be an asset to course instructors by extracting some of the discussion codes, which can be used in Epistemic Network Analysis.
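
As a rough illustration of the third approach named above, clustering word vectors, the sketch below groups words with k-means so that each cluster can serve as a candidate code for a human rater to name. It is not the authors' implementation, which this preview does not show. The two-dimensional vectors, the vocabulary, and the function names are hypothetical, and the farthest-first seeding is an assumption made only to keep the toy run deterministic.

```python
import math

# Hypothetical toy 2-D "word vectors" (made-up values for illustration only).
# Real embeddings such as word2vec or GloVe have far more dimensions.
TOY_VECTORS = {
    "quiz": (0.9, 0.1),
    "exam": (1.0, 0.2),
    "test": (0.8, 0.0),
    "recall": (0.1, 0.9),
    "memory": (0.2, 1.0),
    "retrieval": (0.0, 0.8),
}


def farthest_first_seeds(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all seeds chosen so far (an assumed, deterministic choice)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd-style k-means over a word -> vector mapping."""
    words = list(vectors)
    centroids = farthest_first_seeds([vectors[w] for w in words], k)
    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach each word to its nearest centroid.
        for w in words:
            distances = [math.dist(vectors[w], c) for c in centroids]
            assignment[w] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


def clusters_as_codes(assignment):
    """Group words by cluster id; each group is one candidate code."""
    groups = {}
    for word, cluster in assignment.items():
        groups.setdefault(cluster, []).append(word)
    return [sorted(group) for group in groups.values()]


candidate_codes = clusters_as_codes(kmeans(TOY_VECTORS, k=2))
print(candidate_codes)
```

In this toy run the assessment-related words and the memory-related words fall into separate clusters. With real discussion board text, the vectors would come from a pretrained embedding model, and the number of clusters would need to be chosen and checked against the human-coded data.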

Author information

Authors and Affiliations

Colorado State University, Fort Collins, CO, 80523, USA
Sina Mahdipour Saravani, Sadaf Ghaffari, Yanye Luther, James Folkestad & Marcia Moraes

Corresponding author

Correspondence to Marcia Moraes.

Editor information

Editors and Affiliations

University of Oslo, Oslo, Norway
Crina Damşa
Drexel University School of Education, Philadelphia, PA, USA
Amanda Barany

About this paper

Cite this paper

Saravani, S.M., Ghaffari, S., Luther, Y., Folkestad, J., Moraes, M. (2023). Automated Code Extraction from Discussion Board Text Dataset. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_16

DOI: https://doi.org/10.1007/978-3-031-31726-2_16
Published: 29 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31725-5
Online ISBN: 978-3-031-31726-2
eBook Packages: Computer Science, Computer Science (R0)
