Educational Data Mining pp 345-377
Mining Texts, Learner Productions and Strategies with ReaderBench
Abstract
The chapter introduces ReaderBench, a multilingual and flexible environment that integrates text mining technologies for assessing a wide range of learners’ productions and for supporting teachers in several ways. ReaderBench offers three main functionalities in terms of text analysis: cohesion-based assessment, reading strategies identification and textual complexity evaluation. All of these have been subject to empirical validations. ReaderBench may be used throughout an entire educational scenario: from the initial complexity assessment of the reading materials and the assignment of texts to learners, through the detection of reading strategies reflected in learners’ self-explanations, to comprehension evaluation that fosters the learner’s self-regulation process.
Keywords
Cohesion-based discourse analysis, Topics extraction, Reading strategies, Textual complexity
Abbreviations
- AA
Adjacent agreement
- CAF
Complexity, accuracy and fluency
- CSCL
Computer supported collaborative learning
- DRP
Degree of reading power
- EA
Exact agreement
- FFL
French as foreign language
- ICC
Intra-class correlations
- LDA
Latent Dirichlet allocation
- LMS
Learning management system
- LSA
Latent semantic analysis
- NLP
Natural language processing
- POS
Part of speech
- SVM
Support vector machine
- TASA
Touchstone Applied Science Associates, Inc
- Tf-Idf
Term frequency – inverse document frequency
- WOLF
WordNet Libre du Français
Notes
Acknowledgments
This research was supported by an Agence Nationale de la Recherche (ANR-10-BLAN-1907) grant, by the 264207 ERRIC–Empowering Romanian Research on Intelligent Information Technologies/FP7-REGPOT-2010-1 and the POSDRU/107/1.5/S/76909 Harnessing human capital in research through doctoral scholarships (ValueDoc) projects. We also wish to thank Sonia Mandin, who kindly provided experimental data used for the validation of sentence importance. Some parts of this paper stem from [55].
References
- 1. Agrawal, R., Batra, M.: A detailed study on text mining techniques. Int. J. Soft Comput. Eng. 2(6), 118–121 (2013)
- 2. Trausan-Matu, S., Dascalu, M., Dessus, P.: Textual complexity and discourse structure in computer-supported collaborative learning. In: Cerri, S.A., Clancey, W.J., Papadourakis, G., Panourgia, K. (eds.) ITS 2012. LNCS, vol. 7315, pp. 352–357. Springer, Heidelberg (2012)
- 3. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
- 4. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)
- 5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3(4–5), 993–1022 (2003)
- 6. Koedinger, K.R., Baker, R.S., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J.: A data repository for the EDM community: the PSLC datashop. In: Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S. (eds.) Handbook of Educational Data Mining, pp. 43–55. CRC Press, Boca Raton (2010). (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
- 7. Zou, M., Xu, Y., Nesbit, J.C., Winne, P.H.: Sequential pattern analysis of learning logs: methodology and applications. In: Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S. (eds.) Handbook of Educational Data Mining, pp. 107–121. CRC Press, Boca Raton (2010). (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
- 8. Sheard, J.: Basics of statistical analysis of interactions data from web-based learning environments. In: Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S. (eds.) Handbook of Educational Data Mining, pp. 27–42. CRC Press, Boca Raton (2010). (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
- 9. Tapiero, I.: Situation Models and Levels of Coherence. Lawrence Erlbaum Associates Inc, Mahwah (2007)
- 10. Schnotz, W.: Comparative instructional text organization. In: Mandl, H., Stein, N.L., Trabasso, T. (eds.) Learning and Comprehension of Text, pp. 53–81. Lawrence Erlbaum Associates Inc, Hillsdale (1984)
- 11. McNamara, D., Kintsch, E., Songer, N.B., Kintsch, W.: Are good texts always better? Interactions of text coherence, background knowledge, and levels of understanding in learning from text. Cogn. Instr. 14(1), 1–43 (1996)
- 12. Oakhill, J., Garnham, A.: On theories of belief bias in syllogistic reasoning. Cognition 46(1), 87–92 (1993)
- 13. O’Reilly, T., McNamara, D.S.: Reversing the reverse cohesion effect: good texts can be better for strategic, high-knowledge readers. Discourse Process. 43(2), 121–152 (2007)
- 14. Cain, K., Oakhill, J.: Reading comprehension development from 8 to 14 years: the contribution of component skills and processes. In: Wagner, R.K., Schatschneider, C., Phythian-Sence, C. (eds.) Beyond Decoding: the Behavioral and Biological Foundations of Reading Comprehension, pp. 143–175. Guilford Press, New York (2009)
- 15. Kintsch, W.: Comprehension: a Paradigm for Cognition. Cambridge University Press, Cambridge (1998)
- 16. McNamara, D.S., O’Reilly, T.: Theories of comprehension skill: knowledge and strategies versus capacity and suppression. In: Colombus, A.M. (ed.) Progress in Experimental Psychology Research, pp. 113–136. Nova Science Publishers, Hauppauge (2009)
- 17. Winne, P.H., Baker, R.S.: The potentials of educational data mining for researching metacognition, motivation and self-regulated learning. J. Educ. Data Mining 5(1), 1–8 (2013)
- 18. Eason, S.H., Goldberg, L., Cutting, L.: Reader-text interactions: how differential text and question types influence cognitive skills needed for reading comprehension. J. Educ. Psychol. 104(3), 515–528 (2012)
- 19. McNamara, D.S., Graesser, A.C., Louwerse, M.M.: Sources of text difficulty: across the ages and genres. In: Sabatini, J.P., Albro, E. (eds.) Assessing Reading in the 21st Century: Aligning and Applying Advances in the Reading and Measurement Sciences. Rowman & Littlefield Publishing, Lanham (in press)
- 20. Nelson, J., Perfetti, C., Liben, D., Liben, M.: Measures of text difficulty. Technical Report, Gates Foundation (2011)
- 21. McNamara, D.S., Louwerse, M.M., McCarthy, P.M., Graesser, A.C.: Coh-metrix: capturing linguistic features of cohesion. Discourse Process. 47(4), 292–330 (2010)
- 22. Millis, K., Magliano, J.: Assessing comprehension processes during reading. In: Sabatini, J.P., O’Reilly, T., Albro, E.R. (eds.) Reaching an Understanding, pp. 35–54. Rowman & Littlefield, Lanham (2012)
- 23. McNamara, D.S., Magliano, J.P.: Self-explanation and metacognition. In: Hacker, D.J., Dunlosky, J., Graesser, A.C. (eds.) Handbook of Metacognition in Education, pp. 60–81. Erlbaum, Mahwah (2009)
- 24. Millis, K., Magliano, J.: Assessing comprehension processes during reading. In: Sabatini, J.P., O’Reilly, T., Albro, E.R. (eds.) Reaching an Understanding, pp. 35–54. Rowman & Littlefield Publishing, Lanham (2012)
- 25. McNamara, D.S.: SERT: self-explanation reading training. Discourse Process. 38, 1–30 (2004)
- 26. Nardy, A., Bianco, M., Toffa, F., Rémond, M., Dessus, P.: Contrôle et Régulation de la Compréhension: L’acquisition de Stratégies de 8 à 11 ans. In: David, J., Royer, C. (eds.) L’apprentissage de la Lecture: Convergences, Innovations, Perspectives. Peter Lang, Bern (2003) (in press)
- 27. Hayes, A.F.: Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. The Guilford Press, New York (2013)
- 28. Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of lexical semantic relatedness. Comput. Linguist. 32(1), 13–47 (2006)
- 29. Alias-i: LingPipe, http://alias-i.com/lingpipe
- 30. McCandless, M., Hatcher, E., Gospodnetic, O.: Lucene in Action (2nd ed.): Covers Apache Lucene 3.0. Manning Publications, Greenwich (2010)
- 31. Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 173–180. Association for Computational Linguistics, Stroudsburg (2003)
- 32. Toutanova, K., Manning, C.D.: Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2000)
- 33. Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: 41st Annual Meeting of the Association for Computational Linguistics, pp. 423–430. Association for Computational Linguistics, Stroudsburg (2003)
- 34. Green, S., de Marneffe, M., Bauer, J., Manning, C.D.: Multiword expression identification with tree substitution grammars: a parsing tour de force with French. In: Conference on Empirical Methods in Natural Language Processing EMNLP 2011, pp. 725–735. Association for Computational Linguistics, Stroudsburg (2011)
- 35. Snowball, http://snowball.tartarus.org/
- 36. Centre National de Ressources Textuelles et Lexicales. le Lexique Morphalou, http://www.cnrtl.fr/lexiques/morphalou/LMF-Morphalou.php
- 37. Finkel, J.R., Grenager, T., Manning, C.D.: Incorporating non-local information into information extraction systems by gibbs sampling. In: 43rd Annual Meeting on Association for Computational Linguistics, pp. 363–370. Association for Computational Linguistics, Stroudsburg (2005)
- 38. Lee, H., Chang, A., Peirsman, Y., Chambers, N., Surdeanu, M., Jurafsky, D.: Deterministic coreference resolution based on entity-centric, precision-ranked rules. Comput. Linguist. 39(4), 1–32 (2013)
- 39. Raghunathan, K., Lee, H., Rangarajan, S., Chambers, N., Surdeanu, M., Jurafsky, D., Manning, C.D.: A multi-pass sieve for coreference resolution. In: Conference on Empirical Methods in Natural Language Processing, pp. 492–501. Association for Computational Linguistics, Stroudsburg (2010)
- 40. Miller, G.A.: WordNet: a lexical database for english. Commun. ACM 38(11), 39–41 (1995)
- 41. Sagot, B., Darja, F.: Building a free french WordNet from multilingual resources. In: 6th International Conference on Language Resources and Evaluation, Ontolex 2008 Workshop, pp. 14–19. ELRA, Marrakech (2008)
- 42. Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: 32nd Annual Meeting on Association for Computational Linguistics, pp. 133–138. Association for Computational Linguistics, Stroudsburg (1994)
- 43. Leacock, C., Chodorow, M.: Combining local context and WordNet similarity for wordsense identification. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database, pp. 265–283. MIT Press, Cambridge (1998)
- 44. Denhière, G., Lemaire, B., Bellissens, C., Jhean-Larose, S.: A semantic space for modeling children’s semantic memory. In: Landauer, T.K., McNamara, D.S., Dennis, S., Kintsch, W. (eds.) Handbook of Latent Semantic Analysis, pp. 143–165. Psychology Press, New York (2007)
- 45. Dascalu, M., Trausan-Matu, S., Dessus, P.: Utterances assessment in chat conversations. Res. Comput. Sci. 46, 323–334 (2010)
- 46. Lemaire, B.: Limites de la Lemmatisation pour L’extraction de Significations. In: 9es Journées Internationales d’Analyse Statistique des Données Textuelles, pp. 725–732. Presses Universitaires de Lyon, Lyon (2008)
- 47. Wiemer-Hastings, P., Zipitria, I.: Rules for syntax, vectors for semantics. In: 23rd Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates Inc, Mahwah (2001)
- 48. Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. VLDB Endowment 5(8), 716–727 (2012)
- 49. Mallet: A machine learning for language toolkit, http://mallet.cs.umass.edu/
- 50. Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: GraphLab: a new parallel framework for machine learning. In: Grünwald, P., Spirtes, P. (eds.) 26th Conference on Uncertainty in Artificial Intelligence, pp. 340–349. AUAI Press, Catalina Island (2010)
- 51. Dascalu, M., Trausan-Matu, S., Dessus, P.: Cohesion-based analysis of CSCL conversations: holistic and individual perspectives. In: 10th International Conference on Computer-Supported Collaborative Learning, vol. 1, pp. 145–152. University of Wisconsin-Madison, Madison (2013)
- 52. Trausan-Matu, S., Stahl, G., Sarmiento, J.: Supporting polyphonic collaborative learning. E-service J. 6(1), 58–74 (2007). (Indiana University Press)
- 53. Rebedea, T., Dascalu, M., Trausan-Matu, S., Chiru, C.G.: Automatic feedback and support for students and tutors using CSCL chat conversations. In: 1st International K-Teams Workshop on Semantic and Collaborative Technologies for the Web, pp. 20–33. Politehnica Press, Bucharest (2011)
- 54. Trausan-Matu, S., Rebedea, T.: A polyphonic model and system for inter-animation analysis in chat conversations with multiple participants. In: Gelbukh, A. (ed.) 11th International Conference Computational Linguistics and Intelligent Text Processing. LNCS, vol. 6008, pp. 354–363. Springer, Heidelberg (2010)
- 55. Dascalu, M., Dessus, P., Trausan-Matu, S., Bianco, M., Nardy, A.: ReaderBench: an environment for analyzing text complexity and reading strategies. In: Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (eds.) 16th International Conference on Artificial Intelligence in Education. LNCS, vol. 7926, pp. 379–388. Springer, Heidelberg (2013)
- 56. Topic Sentences and Signposting. Harvard University, Writing Center, http://www.fas.harvard.edu/~wricntr/documents/TopicSentences.html
- 57. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
- 58. Galley, M., McKeown, K.: Improving word sense disambiguation in lexical chaining. In: 18th International Joint Conference on Artificial Intelligence, pp. 1486–1488. Morgan Kaufmann Publishers, San Francisco (2003)
- 59. Vidal, N.: Miguel de la Faim. Amitié-G.T. Rageot, Paris (1984)
- 60. Bastian, M., Heymann, S., Jacomy, M.: Gephi: An open source software for exploring and manipulating networks. In: 3rd International Conference on Weblogs and Social Media, pp. 361–362. AAAI Press, Menlo Park (2009)
- 61. Brandes, U.: A faster algorithm for betweenness centrality. J. Math. Sociol. 25(2), 163–177 (2001)
- 62. Williams, M.: Wittgenstein, Mind and Meaning: Towards a Social Conception of Mind. Routledge, New York (2002)
- 63. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Conference on Empirical Methods in Natural Language Processing, pp. 404–411. Association for Computational Linguistics, Stroudsburg (2004)
- 64. McNamara, D.S., O’Reilly, T.P., Rowe, M., Boonthum, C., Levinstein, I.B.: iSTART: a web-based tutor that teaches self-explanation and metacognitive reading strategies. In: McNamara, D.S. (ed.) Reading Comprehension Strategies: Theories, Interventions, and Technologies, pp. 397–420. Lawrence Erlbaum Associates Inc, Mahwah (2007)
- 65. Dahl, R.: Matilda. Gallimard, Paris (2007)
- 66. Dascalu, M., Trausan-Matu, S., Dessus, P.: Towards an integrated approach for evaluating textual complexity for learning purposes. In: Popescu, E., Li, Q., Klamma, R., Leung, H., Specht, M. (eds.) 11th International Conference in Advances in Web-Based Learning. LNCS, vol. 7558, pp. 268–278. Springer, Heidelberg (2012)
- 67. Cortes, C., Vapnik, V.N.: Support-Vector Networks. Mach. Learn. 20(3), 273–297 (1995)
- 68. François, T., Miltsakaki, E.: Do NLP and machine learning improve traditional readability formulas? In: 1st Workshop on Predicting and Improving Text Readability for Target Reader Populations, pp. 49–57. Association for Computational Linguistics, Stroudsburg (2012)
- 69. Petersen, S.E., Ostendorf, M.: A machine learning approach to reading level assessment. Comput. Speech Lang. 23, 89–106 (2009)
- 70. van Dijk, T.A., Kintsch, W.: Strategies of Discourse Comprehension. Academic Press, New York (1983)
- 71. Feng, L., Jansche, M., Huenerfauth, M., Elhadad, N.: A comparison of features for automatic readability assessment. In: 23rd International Conference on Computational Linguistics, pp. 276–284. Association for Computational Linguistics, Stroudsburg (2010)
- 72. Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 Shared task. In: 15th Conference on Computational Natural Language Learning: Shared Task, pp. 28–34. Association for Computational Linguistics, Stroudsburg (2011)
- 73. Pfeffer, P.: Les Pharmacies des Éléphants. Vie et Mort d’un Géant: L’éléphant d’Afrique, Flammarion, Paris (1989)
- 74. Mandin, S.: Modèles Cognitifs Computationnels de L’activité de Résumer: Expérimentation d’un Eiah auprès D’élèves de Lycée. Laboratoire des Sciences de l’Éducation. PhD thesis. Université Grenoble (2009)
- 75. Donaway, R.L., Drummey, K.W., Mather, L.A.: A comparison of rankings produced by summarization evaluation measures. In: Workshop on Automatic Summarization, vol. 4, pp. 69–78. Association for Computational Linguistics, Stroudsburg (2000)
- 76. Graesser, A.C., Singer, M., Trabasso, T.: Constructing inferences during narrative text comprehension. Psychol. Rev. 101(3), 371–395 (1994)
- 77. Geisser, S.: Predictive Inference: An Introduction. Chapman and Hall, New York (1993)
- 78. Schulze, M.: Measuring textual complexity in student writing. In: American Association of Applied Linguistics. AAAL 2010, Atlanta (2010)
- 79. McNamara, D.S., Boonthum, C., Levinstein, I.B.: Evaluating self-explanations in iSTART: comparing word-based and LSA algorithms. In: Landauer, T.K., McNamara, D.S., Dennis, S., Kintsch, W. (eds.) Handbook of Latent Semantic Analysis, pp. 227–241. Psychology Press, New York (2007)
- 80. Graesser, A.C., McNamara, D.S., VanLehn, K.: Scaffolding deep comprehension strategies through point & query, AutoTutor, and iStart. Educ. Psychol. 40(4), 225–234 (2005)
- 81. Nardy, A., Bianco, M., Toffa, F., Rémond, M., Dessus, P.: Contrôle et Régulation de la Compréhension: L’acquisition de Stratégies de 8 à 11 ans. In: David, J., Royer, C. (eds.) L’apprentissage de la Lecture: Convergences, Innovations, Perspectives. Peter Lang, Bern (in press) (2003)
- 82. O’Reilly, T.P., Sinclair, G.P., McNamara, D.S.: iSTART: a web-based reading strategy intervention that improves students’ science comprehension. In: Kinshuk, K., Sampson, D.G., Isaías, P. (eds.) IADIS International Conference Cognition and Exploratory Learning in Digital Age: CELDA 2004, pp. 173–180. IADIS Press, Lisbon (2004)
- 83. Graesser, A.C., McNamara, D.S., Louwerse, M.M., Cai, Z.: Coh-metrix: analysis of text on cohesion and language. Behav. Res. Meth. Instrum. Comput. 36(2), 193–202 (2004)
- 84. François, T.: Les Apports du Traitement Automatique du Langage à la Lisibilité du Français Langue Étrangère. Centre de Traitement Automatique du Langage, PhD thesis. Université Catholique de Louvain, Faculté de Philosophie, Arts et Lettres, Louvain-la-Neuve (2012)
- 85. TreeTagger—A Language Independent Part of Speech Tagger, http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
- 86. Kukemelk, H., Mikk, J.: The prognosticating effectivity of learning a text in physics. Quant. Linguist. 14, 82–103 (1993)
- 87. Bouhineau, D., Luengo, V., Mandran, N., Toussaint, B.M., Ortega, M., Wajeman, C.: Open platform to model and capture experimental data in technology enhanced learning systems. In: Workshop on Data Analysis and Interpretation for Learning Environments, Vienna University of Economics and Business, Vienna (2013)