nCoder+: A Semantic Tool for Improving Recall of nCoder Coding

  • Zhiqiang Cai
  • Amanda Siebert-Evenstone
  • Brendan Eagan
  • David Williamson Shaffer
  • Xiangen Hu
  • Arthur C. Graesser
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1112)


Coding is a process of assigning meaning to a given piece of evidence. Evidence may be found in a variety of data types, including documents, research interviews, posts from social media, conversations from learning platforms, or any source of data that may provide insights into the questions under qualitative study. In this study, we focus on text data and consider coding as a process of identifying words or phrases and categorizing them into codes to facilitate data analysis. There are a number of different approaches to generating qualitative codes, such as grounded coding, a priori coding, or using both in an iterative process. Whatever the approach, qualitative and quantitative analysts face the same coding problem: when the dataset is large, manual coding becomes impractical. nCoder is a tool that helps researchers discover and code key concepts in text data with minimal human judgment. Once reliability and validity are established, nCoder automatically applies the coding scheme to the dataset. However, for concepts that occur infrequently, the classifier may still produce too many false negatives even when its reliability is acceptable. This paper explores this problem within the current nCoder and proposes adding a semantic component to nCoder. A tool called “nCoder+” is presented with real data to demonstrate the usefulness of the semantic component. Possible ways of integrating this component and other natural language processing techniques into nCoder are discussed.
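The false-negative problem described above can be illustrated with a minimal sketch. It assumes a code is defined by a list of regular expressions and that a human rater supplies reference codes. This is not nCoder's actual implementation: the code book, the function names (auto_code, recall) and the four utterances are all invented for illustration.

```python
import re

# Hypothetical code book: code name -> regular expressions that signal the code.
CODEBOOK = {
    "DATA": [r"\bdata\b", r"\bgraphs?\b"],
}


def auto_code(text, patterns):
    """Return 1 if any pattern matches the text (case-insensitive), else 0."""
    return int(any(re.search(p, text, re.IGNORECASE) for p in patterns))


def recall(human, machine):
    """Share of human-coded positives that the automatic coder also found."""
    tp = sum(1 for h, m in zip(human, machine) if h == 1 and m == 1)
    fn = sum(1 for h, m in zip(human, machine) if h == 1 and m == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Invented toy utterances with invented human codes for the "DATA" code.
utterances = [
    "Let's look at the data again.",
    "The graph shows a clear trend.",
    "What do the numbers in the table tell us?",  # about data, but no keyword
    "I will be late to the meeting.",
]
human_codes = [1, 1, 1, 0]

machine_codes = [auto_code(u, CODEBOOK["DATA"]) for u in utterances]
print("machine codes:", machine_codes)
print("recall:", recall(human_codes, machine_codes))
```

In this toy example the third utterance is a false negative: the human rater reads it as being about data, but none of the listed expressions match. A semantic component of the kind the paper proposes aims to surface such cases, for example by suggesting related words to add to the code's expression list.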


Coding; Grounded coding; A priori coding; Automatic coding; Grounded theory; Qualitative analysis; Quantitative analysis; Latent Semantic Analysis; Topic modeling; Machine learning



The research was supported by the National Science Foundation (SBR 9720314, REC 0106965, REC 0126265, ITR 0325428, REESE 0633918, ALT-0834847, DRK-12-0918409, 1108845; DRL-1661036, 1713110; ACI-1443068), the Institute of Education Sciences (R305H050169, R305B070349, R305A080589, R305A080594, R305G020018, R305C120001), the Army Research Lab (W911INF-12-2-0030), the Office of Naval Research (N00014-00-1-0600, N00014-12-C-0643; N00014-16-C-3027), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The University of Memphis, Memphis, USA
  2. University of Wisconsin-Madison, Madison, USA
  3. China Central University, Wuhan, China
