Automated Code Extraction from Discussion Board Text Dataset

  • Conference paper
Advances in Quantitative Ethnography (ICQE 2022)

Abstract

This study investigates three text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors, for automating code extraction from a relatively small discussion board dataset. We compare the outputs of each algorithm with a previous dataset that was manually coded by two human raters. The results show that, even with a relatively small dataset, automated approaches can be an asset to course instructors by extracting some of the discussion codes, which can then be used in Epistemic Network Analysis.
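To make the third approach concrete, the following is a minimal sketch of clustering word vectors so that each cluster can be read as a candidate discussion code. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which clustering algorithm or embeddings were used, so plain k-means is assumed here, and the two-dimensional vectors, the word list, and the function name `cluster_words` are all hypothetical.

```python
import math

# Hypothetical 2-D "word vectors", invented for illustration only.
# Real embeddings have hundreds of dimensions.
WORD_VECTORS = {
    "quiz": (0.9, 0.1),
    "forum": (0.1, 0.9),
    "exam": (1.0, 0.2),
    "reply": (0.0, 1.0),
    "grade": (0.8, 0.0),
    "post": (0.2, 0.8),
}


def cluster_words(word_vectors, k, iterations=10):
    """Group words with plain k-means; the first k words seed the centroids."""
    words = list(word_vectors)
    centroids = [word_vectors[w] for w in words[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each word joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for word in words:
            distances = [math.dist(word_vectors[word], c) for c in centroids]
            clusters[distances.index(min(distances))].append(word)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if not members:
                new_centroids.append(old)  # keep an empty cluster's centroid
                continue
            dims = zip(*(word_vectors[w] for w in members))
            new_centroids.append(tuple(sum(d) / len(members) for d in dims))
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return clusters


clusters = cluster_words(WORD_VECTORS, k=2)
for i, members in enumerate(clusters):
    print(f"candidate code {i}: {sorted(members)}")
```

Seeding the centroids with the first `k` words keeps the sketch deterministic. In practice a human rater would still need to inspect each cluster and decide whether it corresponds to a meaningful code.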

Author information

Corresponding author

Correspondence to Marcia Moraes.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Saravani, S.M., Ghaffari, S., Luther, Y., Folkestad, J., Moraes, M. (2023). Automated Code Extraction from Discussion Board Text Dataset. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_16

  • DOI: https://doi.org/10.1007/978-3-031-31726-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31725-5

  • Online ISBN: 978-3-031-31726-2

  • eBook Packages: Computer Science, Computer Science (R0)
