Mining Domain Knowledge for Coherence Assessment of Students Proposal Drafts

  • Samuel González López
  • Aurelio López-López
Part of the Studies in Computational Intelligence book series (SCI, volume 524)


Academic programs often require students to write a thesis or research proposal. Reviewing such texts is a heavy load for instructors, especially at the initial stages. One feature that instructors evaluate is coherence, i.e., the interrelationship of the various elements of the text. We present a coherence analyzer that employs latent semantic analysis (LSA) to mine existing corpora and then assess new drafts. We designed the analyzer as part of an Intelligent Tutoring System, considering seven common sections. After mining domain knowledge, we ran experiments on graduate and undergraduate corpora to define a grading scale. A further experiment involving human reviewers was set up to validate the process. The technique made it possible to evaluate the coherence of the different sections with acceptable results, suggesting that the level reached so far is adequate to support online review. An innovative exploration across sections was also performed, uncovering an interrelationship between sections that is consistent with the descriptions given by methodology authors.
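The general idea of scoring coherence with LSA can be illustrated with a minimal sketch. It is not the authors' analyzer: the toy corpus, the function names (`fit_lsa`, `embed`, `coherence`), the whitespace tokenizer, and the choice of averaging cosine similarity between adjacent sentences are all assumptions made for illustration. The sketch builds a term-document count matrix from a reference corpus, reduces it with a truncated singular value decomposition, and compares consecutive sentences of a new draft in the reduced space.

```python
import numpy as np

def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on spaces, trim punctuation.
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]

def fit_lsa(corpus, k=2):
    # Build a term-document count matrix from the reference corpus.
    vocab = sorted({w for doc in corpus for w in tokenize(doc)})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(corpus)))
    for j, doc in enumerate(corpus):
        for w in tokenize(doc):
            counts[index[w], j] += 1.0
    # Truncated SVD: keep the k largest singular values.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Each row is a term vector in the k-dimensional latent space.
    return index, u[:, :k] * s[:k]

def embed(sentence, index, term_vectors):
    # Represent a sentence as the sum of its known term vectors.
    vec = np.zeros(term_vectors.shape[1])
    for w in tokenize(sentence):
        if w in index:
            vec += term_vectors[index[w]]
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def coherence(sentences, index, term_vectors):
    # Mean cosine similarity between each pair of adjacent sentences.
    vecs = [embed(s, index, term_vectors) for s in sentences]
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims) if sims else 0.0

# Hypothetical toy corpus standing in for the mined domain corpus.
corpus = [
    "students write a research proposal",
    "the proposal states the research problem",
    "reviewers grade the coherence of the proposal",
    "the tutor system reviews student drafts online",
]
index, term_vectors = fit_lsa(corpus, k=2)
draft = ["The proposal states the problem.", "Reviewers grade the proposal."]
score = coherence(draft, index, term_vectors)
```

Under these assumptions, `score` lies between -1 and 1, with higher values indicating that adjacent sentences are closer in the latent space. Turning such a raw score into a grading scale would still require calibration against reference corpora, as the abstract describes.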


Keywords: Coherence, Writing support, Latent semantic analysis, Intelligent tutoring system



  • Data mining
  • Intelligent tutoring system
  • Latent semantic analysis
  • Latent semantic indexing
  • Non-negative matrix factorization
  • Probabilistic latent semantic analysis
  • Student progress module
  • Singular value decomposition



We thank the reviewers: Rene Castro M., Claudia I. Esquivel L., J. Miguel García G., Ramón Cárdenas G., Israel Chávez G., Orlando Madrid M., and Raúl Beltran Q. This research was supported by CONACYT, México, through the scholarship 1124002 for the first author. The second author was partially supported by SNI, México.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Samuel González López (1)
  • Aurelio López-López (1, corresponding author)
  1. Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
