Mining learner–system interaction data: implications for modeling learner behaviors and improving overlay models
A growing body of empirical evidence suggests that the adaptive capabilities of computer-based learning environments can be improved through the use of educational data mining techniques. Log-file trace data provide a wealth of information about learner behaviors that can be captured, monitored, and mined for the purposes of discovering new knowledge and detecting patterns of interest. This study aims to leverage these analytical techniques to mine learner behaviors in relation to both diagnostic reasoning processes and outcomes in BioWorld, a computer-based learning environment that supports learners as they practice solving medical problems and receive formative feedback. In doing so, hidden Markov models are used to model behavioral indicators of proficiency during problem solving, while an ensemble of text classification algorithms is applied to the written case summaries that learners produce as an outcome of solving a case in BioWorld. The application of these algorithms characterizes learner behaviors at different phases of problem solving, which provides corroborating evidence for where revisions can be made and informs design guidelines for the system. We conclude by discussing the instructional design and pedagogical implications for the novice–expert overlay system in BioWorld, and how the findings inform the delivery of feedback to learners by highlighting similarities and differences between the novice and expert trajectory toward solving problems.
Keywords: Intelligent tutoring systems; Medical education; Educational data mining; Learning analytics; Hidden Markov models; Overlay models
The utility of computer-supported education for learning cannot be overstated; such learning experiences enrich the learner in varied ways. Progress in instructional technology research and development has brought various affordances and achievements in computer-based learning environments (CBLEs). CBLEs present important learning opportunities (VanLehn 2011), highlighted by the body of work replete with examples of CBLEs that have been developed, deployed, and assessed in an effort to aid learners in their scholastic activities. CBLEs have been used for a wide variety of purposes and in a gamut of contexts, including science (e.g., Lajoie 2009), math (e.g., Matsuda et al. 2013), computer science (e.g., Mitrovic 2003), and history (e.g., Poitras et al. 2012). The literature is rife with studies documenting the positive outcomes of the use of CBLEs (e.g., Beal et al. 2007; Biswas et al. 2010; Dodds and Fletcher 2004; Graesser et al. 2005; Matsuda et al. 2013; VanLehn et al. 2005). Research has suggested that there are beneficial effects of incorporating adaptation in such learning systems (Anderson and Gluck 2001; Durlach and Ray 2011).
Computer-based learning environments like intelligent tutoring systems (ITSs) are designed to provide adaptive instruction through the careful tracking of learner performance that determines the level of assistance needed at specific points in time. Some examples of adaptive systems include AutoTutor (Graesser et al. 2004), Andes Physics Tutor (VanLehn et al. 2005), and Cognitive Tutor in Algebra (Koedinger and Corbett 2006). ITSs provide learners with a one-on-one tutoring experience, which, as Bloom (1984) found, can result in learning outcomes two standard deviations above regular classroom instruction. Over the years, research on the design, implementation, and evaluation of such learning systems has been accruing, yet challenges remain. One perennial challenge for learning technology researchers is how to adapt learning systems and improve response mechanisms. Modeling learner behaviors in such systems is crucial to better comprehend the learning trajectory, which is needed to improve the learning system. Durlach and Ray (2011) note that generally the input (data) used to determine the level of adaptation is based on the “student ability to answer questions or solve problems” (p. 21). Thus, to enhance the adaptation and response mechanisms in learning systems, and ultimately learning, user–system interaction data need to be considered in light of their importance to improving instructional systems.
The unbridled growth in digital details that are created by people and systems is a widely told story. Similarly, a wealth of educational data is being amassed by researchers designing computer-based learning environments. An important aspect of learning in an ITS is the monitoring of learners’ progress by capturing the traces they leave as they interact with the learning system. Technological advances have afforded the ability to capture students’ learning on a more granular level than ever before. One of the most common means of capturing learner–system interactions in computer-based learning environments is via log files (Romero and Ventura 2010). Log files can be especially helpful in ascertaining learning without being intrusive, and they provide a flexible means of capturing data at different levels of granularity. Such learning data could be leveraged to create more effective educational experiences. Recent advances in educational data mining and learning analytics have opened up new avenues for conducting educational research. Specifically, these techniques have created new opportunities to investigate learners’ learning trajectories, and thus increase the ability to facilitate learning. Concomitantly, there has been an explosion of interest in mining the vast and rich repository of data in CBLEs, as shown by the use of educational data mining (EDM), a fast-growing field based on the optimal use of educational data, to facilitate and augment educational research. Data mining has been applied with much success to educational data (Baker and Yacef 2009; Romero and Ventura 2010). EDM research has focused on mining and modeling educational data toward comprehending various dimensions of learning such as off-task behavior (Baker 2007), learner emotions (Craig et al. 2008), gaming the system (Baker et al. 2013), detecting programming strategies (Blikstein 2011), and learner engagement (Cocea and Weibelzahl 2009).
Durlach and Ray (2011) highlight that “one way to tune parameters of adaptation is through analysis of past student performance data using data mining techniques” (p. 24). Thus, EDM techniques and tools can be useful analysis tools that can lead to improvements in the types of student modeling afforded by ITSs.
Enhancing learner experience is a constant goal for instructional technology researchers, and the ways to achieve this goal are plentiful. One of the predominant features of intelligent tutoring systems is the provision of feedback to learners. Studies have suggested the benefits of feedback for learning outcomes (Hattie and Timperley 2007; van der Kleij et al. 2012); further, guidance through appropriate and timely feedback is also posited to be important for learning (Anderson et al. 1995; Merrill 2002). Similarly, individualized feedback is considered an important component of ITSs (Park and Lee 2004). Applying data mining techniques to usage data can help improve the learning system, specifically the design of feedback mechanisms. In this paper, we present and discuss our efforts to leverage data mining techniques in the context of BioWorld.
BioWorld (Lajoie 2009; Lajoie et al. 2015) is an ITS developed to scaffold novice physicians in developing clinical reasoning skills as they diagnose virtual patient cases. Appropriate scaffolding and adaptation are based on an accurate learner model, which is an essential step toward identifying learner behavior, ascertaining the use of the learning material, and improving the learning system. Analysis of the sequence of actions employed by learners in problem solving can provide insights into the use of the learning material, and facilitate a deeper understanding of the learning process. For this study, we employ a data mining approach that automatically builds a learner model from observed behavior patterns that illustrate specific learning trajectories. Hidden Markov models (HMMs) provide a simple and effective way of modeling sequence data (Rabiner 1989; Rabiner and Juang 1986), and have received growing recognition for their applicability to varied problems. In the present study, we apply HMMs to the sequence data extracted from the log files generated by BioWorld, to explore the insights that can be harvested by such a modeling technique.
Scaffolding is provided in the form of expert feedback about the clinical reasoning processes taken to solve a specific case. BioWorld dynamically assesses novices’ reasoning trajectory against expert paths. The novice–expert overlay model is used to provide requisite feedback to learners (Naismith and Lajoie 2010; Doleck et al. 2014a). An important facet of clinical reasoning is the ability to write case summaries. However, the current overlay model does not include these summaries. Motivated by the desire to improve the novice–expert overlay model in BioWorld, we explore the feasibility and efficacy of text mining in ascertaining the differences between case summaries written by experts and novices in BioWorld.
The primary goals and contributions of this paper are to illustrate the possibilities and efficacy of two popular mining techniques, namely, HMMs and text mining, on learner–system usage data from an ITS called BioWorld. This paper is organized as follows: (1) the first section presents a brief overview of the learning environment used in the study, namely, BioWorld; (2) the second section describes the methodology; (3) the third section presents the procedure, analysis, and results of HMM analysis across the three clinical cases; (4) the fourth section outlines the experimental setup and findings of using text mining for the novice–expert classification task; and (5) the final section discusses the findings, highlights limitations, and offers future directions of the present study.
Learning environment: BioWorld
Cynthia is a 37-year-old banker. For the past few months she has been feeling unwell. She dates this back to when she started taking her medication for high blood pressure. She has had frequent headaches and periods of time during which she feels extremely anxious with palpitations, profuse sweating, and flushing. The episodes are not triggered by any specific cause but she finds they have become more frequent in the past little while. When asked by her doctor, Cynthia admits that she has lost 10 pounds in the last 4 months and that she has been dizzy from time to time.
Participation in the study was solicited through advertisements and newsletters at a North American research university. Thirty volunteer undergraduate students participated in the study, and were compensated $20 at the completion of a 2-h study session. The convenience sample comprised 19 women (63 %) and 11 men (37 %), with an average age of 23 (SD = 2.60). All 30 participants were registered in the same classes, where 28 were medical students and 2 were dental students. The data for this study were collected as part of a larger project that investigated the antecedent factors that led to attention allocation toward feedback in the BioWorld environment (Naismith 2013). All participants consented to the use of the data collected for research purposes.
Participants completed both a demographic questionnaire and the achievement goal questionnaire. This was followed by a training session, where participants completed a training case, which allowed them to learn how to navigate and use the BioWorld system. Participants were also provided instructions on how to think aloud while solving cases in BioWorld. Following the training case, the actual experimental study began, where participants solved each of the three cases in BioWorld on an individual basis for 2 h. The three endocrinology cases were: Amy, Cynthia, and Susan Taylor. The correct diagnosis for each was diabetes mellitus (type 1), pheochromocytoma, and hyperthyroidism, respectively. The order of the cases was counterbalanced to mitigate practice effects. Upon completion of each case, participants completed a retrospective outcome achievement emotions questionnaire. The learners’ processes and think alouds were recorded. As a measure of case difficulties, we used the results of an earlier study as a baseline; Gauthier et al. (2008) ascertained the difficulty levels of the various patient cases based on accuracy alone; the anticipated success rates (represented as percent accuracies) for the three cases, ordered from easiest to the most difficult, were Amy (94 %), Susan Taylor (78 %), and Cynthia (33 %).
Examining learner behavior
Although the topic of clinical reasoning has previously been examined extensively, the bulk of the studies focus on diagnosis correctness. However, understanding how learners arrived at the diagnosis can provide a more detailed examination of the diagnostic reasoning processes that lead to patient case solutions than diagnosis accuracy alone. The learning trajectory, i.e., the actions taken by learners in solving a problem, holds considerable information that can be useful for modeling and improving learning systems. Biswas et al. (2010) recommend shifting focus from the frequency and relevance of learner activities to the internal states and related learning strategies, which can illuminate additional information for examining learning in learning systems. Recent research in computer-supported education has leveraged data mining techniques for examining varied problems (Baker and Yacef 2009; Romero and Ventura 2010). One method that has proven useful in investigating a range of problems is the HMM (Rabiner 1989). HMMs have found widespread use in modeling sequential data in a range of contexts, yet the application of HMMs to usage data from computer-based learning environments is relatively new. Recent examples of the application of HMMs in educational contexts include studies by (a) Beal et al. (2007), who used an HMM to ascertain the level of engagement in a tutoring system for high school mathematics in order to predict subsequent actions in the tutoring environment; and (b) Jeong et al. (2008), who employed an HMM to predict transitions between learner activities in a teachable agent environment and demonstrated the efficacy of HMMs in ascertaining learners’ patterns of activities. Leveraging HMM analysis may facilitate considerable progress in learner modeling by providing an alternate understanding of learner activities that will lead to better adaptive environments.
As mentioned earlier, the actions that learners take to solve a problem hold considerable information. As learners use BioWorld, the system captures user actions in log files. An example of actions taken in solving a real case, where pheochromocytoma is the main hypothesis, is illustrated in Fig. 2. The set of actions captured by the BioWorld system includes: AE = ‘add evidence’; AH = ‘add hypothesis’; CHC = ‘change hypothesis conviction’; LE = ‘link evidence’; SEH = ‘select hypothesis’; AT = ‘add test’; SH = ‘submit hypothesis’; P = ‘prioritize’; EM = ‘expert match’; FP = ‘final priority’; LL = ‘load library’; C = ‘categorize’; RP = ‘reprioritize’; SU = ‘submit summary’; UL = ‘unlink evidence’; DH = ‘delete hypothesis’; SLC = ‘select library category’; ASH = ‘abort submit hypothesis’; SL = ‘search library’; U = ‘unprioritize’; RC = ‘recategorize’.
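The action codes listed above lend themselves to a simple lookup table. The following is a minimal sketch, assuming the codes arrive as a list of strings; the helper names `describe` and `encode` and the example trace are hypothetical and are not part of BioWorld or its log format.

```python
# Action codes exactly as listed in the text; helper names are illustrative.
ACTION_CODES = {
    "AE": "add evidence",
    "AH": "add hypothesis",
    "CHC": "change hypothesis conviction",
    "LE": "link evidence",
    "SEH": "select hypothesis",
    "AT": "add test",
    "SH": "submit hypothesis",
    "P": "prioritize",
    "EM": "expert match",
    "FP": "final priority",
    "LL": "load library",
    "C": "categorize",
    "RP": "reprioritize",
    "SU": "submit summary",
    "UL": "unlink evidence",
    "DH": "delete hypothesis",
    "SLC": "select library category",
    "ASH": "abort submit hypothesis",
    "SL": "search library",
    "U": "unprioritize",
    "RC": "recategorize",
}


def describe(sequence):
    """Expand a list of action codes into their readable labels."""
    return [ACTION_CODES[code] for code in sequence]


def encode(sequence):
    """Map action codes to integer symbols, as sequence models usually expect."""
    index = {code: i for i, code in enumerate(ACTION_CODES)}
    return [index[code] for code in sequence]


# Hypothetical trace, not taken from the study data.
example = ["AE", "AH", "LE", "AT", "SH", "SU"]
print(describe(example))
print(encode(example))
```

Integer encoding of this kind is a common preprocessing step before fitting a discrete sequence model; whether the tools used in the study require it is not stated in the text.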
Hidden Markov models
One major affordance of intelligent tutoring systems is that learner interactions are captured, and can be analyzed to provide adaptive instruction. Such fine-grained user–system data can be beneficial in learner modeling and analytics. When solving a case in BioWorld, learners’ actions and interactions with several tools, such as the library and chart, are logged by the system during problem solving. The learner actions are time stamped and categorized according to both superordinate (e.g., Action: add lab test) and subordinate categories (e.g., Evidence: Thyroid Stimulating Hormone (TSH) Result: 0.2 mU/L).
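One way to picture such a log record is as a small data structure holding the time stamp and the two category levels. The sketch below is an assumption about shape only: the class `LogEntry`, its field names, the dates, and the "weight loss" entry are invented for illustration and do not reflect BioWorld's actual log schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    timestamp: datetime  # when the action was logged
    action: str          # superordinate category, e.g. "add lab test"
    detail: str          # subordinate category, e.g. the evidence item


def action_sequence(entries):
    """Return the superordinate actions ordered by time stamp."""
    return [e.action for e in sorted(entries, key=lambda e: e.timestamp)]


# Illustrative entries; the first reuses the example given in the text.
entries = [
    LogEntry(datetime(2024, 1, 1, 10, 5), "add lab test",
             "Thyroid Stimulating Hormone (TSH) Result: 0.2 mU/L"),
    LogEntry(datetime(2024, 1, 1, 10, 1), "add evidence", "weight loss"),
]
print(action_sequence(entries))
```

Sorting by time stamp before extracting the actions keeps the resulting sequence faithful to the order in which the learner worked, even if records are stored out of order.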
From the log files, a line of diagnostic reasoning (i.e., sequence data) can be extracted. We extracted the sequence data for each of the endocrinology cases (Amy, Cynthia, and Susan Taylor) from the BioWorld logs. To this end, a parser (Doleck et al. 2014b) was used to extract the sequence data for the three cases from the log files. The HMM that captures learners’ aggregated behavioral patterns was then generated using this sequence data. In our work, we elected to use the HMM generation tool (Biswas et al. 2010), which is used for modeling generative sequences, to calculate and generate the HMMs. The tool generates visualizations that illustrate the states, the action emission probabilities for each state, and the transitions between states. In this paper, we present the visualizations illustrating the state connections and the action emission probabilities for the three cases.
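To make the notions of states, emission probabilities, and transitions concrete, the following sketch scores an action sequence under a small discrete HMM using the standard forward algorithm. It is not the HMM generation tool cited above and it does not fit a model; the two state names and every probability are toy values invented for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM
    (forward algorithm)."""
    states = list(start)
    # Initialise with the first observation.
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    # Propagate through the remaining observations.
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states)
            * emit[s].get(symbol, 0.0)
            for s in states
        }
    return sum(alpha.values())


# Toy parameters, invented for illustration only.
start = {"formulation": 1.0, "evaluation": 0.0}
trans = {
    "formulation": {"formulation": 0.5, "evaluation": 0.5},
    "evaluation": {"formulation": 0.0, "evaluation": 1.0},
}
emit = {
    "formulation": {"AE": 0.5, "AH": 0.5},
    "evaluation": {"SH": 0.5, "SU": 0.5},
}
likelihood = forward_likelihood(["AE", "AH", "SH", "SU"], start, trans, emit)
print(likelihood)  # 0.015625
```

In this toy model a sequence that begins with a submission has zero likelihood, because the model must start in the formulation state, which never emits `SH`. Fitted models would rarely contain such hard zeros.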
Comparison of the HMM derived for the three cases: state transition with the highest probability
Amy: Diagnosis formulation to evaluation
Cynthia: Evidence-driven diagnosis formulation to evaluation
Susan Taylor: Evidence-driven diagnosis formulation to evaluation
Discussion: modeling behaviors via HMM
Biswas et al. (2014) note that theory-driven metrics and context-driven hypotheses have in the past been the vehicles for assessing learning behaviors in learning systems; furthermore, they highlight the shift toward adoption of data mining techniques for examining learner behaviors. Much work has been done to collect and model user–system usage data, so that the information they contain can be most effectively used. Various mining methods can be applied to educational data to improve learning experiences and outcomes by increasing comprehension of learner behaviors. A deep and appropriate examination of learner actions in learning systems can lead to insights on behavioral patterns in learning and problem solving. The HMM analysis on the clinical reasoning data from BioWorld represents an effort toward revealing rich information about learner behaviors that can be valuable from both an instructional design and technology perspective. The findings from the HMM analysis indicate that behavioral patterns are indeed identifiable: the HMM captured a pattern in the lines of diagnostic reasoning in the context of BioWorld. In particular, for the Amy case, learners engaged in a pattern of formulating a hypothesis, and then evaluating their final diagnosis. For the Cynthia and Susan Taylor cases, the evidence that is linked to a particular hypothesis is also predictive of a shift from diagnosis formulation to evaluation. As highlighted earlier, the three cases have varying difficulty levels; the Cynthia and Susan Taylor cases are the more difficult cases. Linking evidence is more prominent in these two cases, suggesting a proclivity for evidence linking in the more complex cases. The HMM results provide valuable insights into behavioral differences across the three cases under consideration. This provides some evidence of the case-specific (van der Vleuten and Swanson 1990; Fitzgerald et al. 1994; Doleck et al. 2015) nature of clinical reasoning. The assessment of these recurring behaviors stands to elucidate the factors that are important to task performance.
Learner modeling is an important research topic in computer-supported education; the improvements and increased availability of mining methods have opened the area of learner modeling to vast possibilities, enabling increased knowledge discovery from educational data. Overall, HMMs can be used effectively in meeting the growing interest in the analysis of educational usage data to detect behavior patterns in learning environments. Future directions involve looking at other state transitions for each of the cases to understand other differences in behavior patterns.
Like HMMs, text mining has been widely applied in various disciplines to extract knowledge from different forms of text data. Improving learning systems like ITSs continues to be of interest to learning technologists, and mining the data logged by such systems can play a key role in system improvements. In the following sections, we illustrate our approach to improving the novice–expert overlay model in BioWorld by employing text mining to ascertain the differences between case summaries written by experts and novices in BioWorld.
Improving the novice–expert overlay model
Text classification for novice–expert overlay model
Text mining has become an important means of knowledge-based discovery from varied data sources and types, and has become a central method with potential applications in fields ranging from commerce to education. Text classification is employed in many domains for varied purposes (Sebastiani 2002), such as news filtering and organization, document organization and retrieval, opinion mining, email classification, and spam filtering (Aggarwal and Zhai 2012). Text classification algorithms are also commonly used in intelligent tutoring systems for assessment purposes (McNamara 2007). When investigating the potential for augmenting the novice–expert overlay model in BioWorld, one natural idea was to consider the use of various text-mining classifiers for this task. A survey of the literature yields a wide range of classifiers for solving text classification problems; for our study we limited our experiments to a set of commonly used text-mining algorithms (Aggarwal and Zhai 2012; Kibriya et al. 2004; Platt 1998). The goal of text classification is the classification of text into a number of predefined categories; this is achieved by first transforming the text (which is typically a string of characters) into a representation suitable for the learning algorithm and the classification task (Joachims 1998). Essentially, a general text classification problem involves assigning a new unseen text to one of the given classes. For example, a binary classification task for email messages would involve labeling a new unlabeled email as either spam or non-spam. The general approach toward solving such problems is to train classifiers on labeled texts. Text-mining approaches can be incorporated into learning systems like BioWorld in order to improve them. We investigate the efficacy of text mining in the case summary classification problem. Specifically, we test an ensemble of machine learning algorithms for the novice–expert problem.
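The spam example above can be sketched with a small multinomial naive Bayes classifier trained on labeled texts. This is a bare-bones illustration of the general approach, not the classifiers used in the study; the function names and the four training messages are made up.

```python
import math
from collections import Counter


def train(labeled_texts):
    """Fit word counts per class from (text, label) pairs."""
    word_counts, class_counts = {}, Counter()
    for text, label in labeled_texts:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training messages for the spam example in the text.
training = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("notes from the project meeting", "non-spam"),
]
model = train(training)
print(classify("claim your free prize", model))
print(classify("agenda for the meeting", model))
```

The same two steps, fitting on labeled texts and then scoring an unseen text against each class, carry over to the novice–expert task by swapping the labels and the training texts.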
Data set: case summaries
In BioWorld, after learners receive individualized feedback on their solution based on an aggregated expert solution, the learners’ final task in the diagnostic process involves writing a final case summary of the patient’s case. The case summaries written by novices and experts serve as the data for our experiments. The case summaries were extracted from the log files generated by BioWorld. The final data for the novice–expert classification problem included a total of 74 case summaries, with 60 summaries written by novices and 14 summaries written by experts. All 74 case summaries were used in our experiments.
Patient has elevated T3, T4; low TSH, and elevated thyroid stimulating antiglobulin. This is very suggestive of hyperthyroidism due to an autoimmune process. Listed symptoms (anxiety, weight loss, elevated HR, BP, tremor, sweating) all support this diagnosis.
37-year-old female, presenting after starting high blood pressure pill with episodes of palpitations, flushing, and sweating. On exam, hypertensive, relative tachycardia. Labs revealed: normal TSH, T4, T3, glucose. Elevated free urinary cathecolamines but normal total. CT abdo normal.
Ensemble of text-mining algorithms
Our initial experiment with LibSVM demonstrated the feasibility of achieving a highly accurate prediction rate in the novice–expert case summary classification task. In data mining, evaluating classification performance is the key way to identify the best learning algorithm among a set of candidates. We wanted to extend this work by comparing an ensemble of algorithms. By empirically comparing a number of classifiers, we can ascertain the classifier that yields the best performance.
The preprocessing filter StringToWordVector provided by WEKA (Hall et al. 2009) was used to extract feature vectors from the summary texts. The default values were used except for the parameter that converted texts into lowercase. A stemmer is commonly used to reduce the feature size in text classification problems where the feature size tends to be in the order of tens of thousands. Because our dataset is fairly small in terms of both the size of each summary and the total number of summaries written by experts and novices, the stemmer was not utilized. Stopwords were not removed; they were kept as part of the feature vector. Using the aforementioned preprocessing technique, word tokens were extracted as features. The resulting ARFF file was then utilized for the experiments.
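The preprocessing choices described above (lowercasing, no stemming, stopwords kept, word tokens as features) can be approximated in a few lines. This is a rough analogue written for illustration, not WEKA's StringToWordVector; the tokenization rule and the use of presence indicators rather than counts are assumptions.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters; no stemming,
    no stopword removal."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_vocabulary(texts):
    """Sorted list of every distinct token in the corpus."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def to_vector(text, vocabulary):
    """Binary presence vector: 1 if the word occurs in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


# Two short fragments adapted from the sample summaries shown earlier.
texts = [
    "Patient has elevated T3, T4; low TSH.",
    "Labs revealed: normal TSH, T4, T3, glucose.",
]
vocab = build_vocabulary(texts)
vectors = [to_vector(t, vocab) for t in texts]
print(vocab)
print(vectors)
```

Because no stemmer or stopword list is applied, function words such as "has" become features in their own right, which matches the decision described in the text to keep the full token set for a small corpus.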
Results with all the features
[Table: Results with all the features; row label: Naïve Bayes multinomial; column label: Accuracy (correctly classified)]
Results with feature selection
[Table: Results with feature selection; row label: Naïve Bayes multinomial; column label: Accuracy (correctly classified)]
Results with cost-sensitive classifiers
[Table: Results with cost-sensitive classifiers; row label: Naïve Bayes multinomial; column label: Accuracy (correctly classified)]
Discussion: improving novice–expert overlay model
The initial results obtained with LibSVM (a prediction accuracy of 93.2432 %) showed promise. We then tested an ensemble of classifiers. Using the full feature set, the various classifiers achieved high prediction rates, with SMO performing best (92.1053 %). We then experimented with feature selection techniques to ascertain whether the performance of the various classifiers could be improved. Interestingly, feature selection did not improve performance, except for J48 (a minor improvement from 78.9474 to 80.2632 %). For the rest of the classifiers, the results obtained using feature selection were worse than those obtained with the full feature set. We noted that the dataset used in our experiments was unbalanced. Thus, to mitigate that limitation, we decided to use cost-sensitive classifiers. Using cost-sensitive classifiers, there was no deterioration in performance. Overall, the experimental results revealed the feasibility and efficacy of using text mining to achieve highly accurate predictions for the novice–expert case summary classification task. In doing so, the linguistic features that differentiate case summaries written by novices and experts may inform revisions to the novice–expert overlay model, allowing the system to use the unstructured data obtained from case summaries to better tailor the content shown in the feedback panel. As such, the novice–expert overlay model may be further improved by examining the descriptors that characterize these learner behaviors and how they differ across lines of reasoning that are found to be correct as opposed to those that are incorrect. On the basis of our theoretical framework, learners are expected to benefit from instruction on how to adaptively monitor and control their own reasoning processes to avoid common pitfalls in their interpretation of the evidentiary data collected from the patient.
The revisions made to the novice–expert overlay system allow the system to individualize instruction to the specific needs of different learners.
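To put the class imbalance in context, the sketch below computes the accuracy of always predicting the larger class from the reported class sizes (60 novice and 14 expert summaries), and shows how a cost matrix can shift a decision toward the minority class. The cost values and the probabilities passed to `min_cost_label` are hypothetical; the cost settings used in the study are not stated.

```python
def majority_baseline(class_sizes):
    """Accuracy of always predicting the largest class."""
    return max(class_sizes.values()) / sum(class_sizes.values())


def min_cost_label(probabilities, cost):
    """Pick the label with the lowest expected misclassification cost.

    cost[true][predicted] is the penalty for predicting `predicted`
    when the real class is `true`.
    """
    def expected(predicted):
        return sum(p * cost[true][predicted]
                   for true, p in probabilities.items())
    return min(probabilities, key=expected)


# Class sizes reported in the data set description.
sizes = {"novice": 60, "expert": 14}
baseline = 100 * majority_baseline(sizes)
print(round(baseline, 2))  # 81.08

# Hypothetical costs: missing an expert summary is five times as costly.
cost = {
    "novice": {"novice": 0, "expert": 1},
    "expert": {"novice": 5, "expert": 0},
}
decision = min_cost_label({"novice": 0.7, "expert": 0.3}, cost)
print(decision)  # expert
```

A trivial classifier that labels every summary as novice would already be right about 81 % of the time on this data set, so reported accuracies are most informative when read against that figure.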
The present work examined the feasibility and efficacy of applying HMMs and text mining to usage data from a computer-based learning environment called BioWorld. On the one hand, the formulation and revision of diagnoses on the basis of evidence collected from the virtual patient are the most probable underlying sequence of behaviors that characterize problem solving in BioWorld. We are currently investigating this finding by exploring sequential patterns of behaviors through the use of a subgroup discovery method (Poitras et al. 2015). On the other hand, we have shown that the linguistic features that characterize case summaries written by learners are indicative of proficiency differences in synthesizing the evidence and diagnostic processes after solving problems in BioWorld.
Two techniques were used to determine both process and product differences in how medical students solve medical cases using BioWorld. In addition to diagnostic accuracy, we were interested in the processes learners use to diagnose a patient case. What steps do they take to reason about the disease? HMMs were used to identify state probabilities and transitions within and between the three cases, and to determine the types of learner patterns that existed for solving cases of varying difficulty level. Another important advance in computational analyses is the use of text-mining techniques to look at patterns in data, which we used to look at expert/novice differences in how physicians wrote patient case summaries at the completion of a case. These summaries are used in the real world as a hand-off method to inform the next physician about the patient case.
The goal of the first part of our paper was to use HMMs to examine learner behaviors across three different virtual patient cases. The findings demonstrate that the HMM was effective at capturing patterns in the lines of diagnostic reasoning that mediate performance in solving problems in the context of BioWorld. More specifically, the results showed that learners’ behaviors could be modeled and analyzed effectively with HMMs. Similar to previous examples of successful application of HMMs in solving various problems, the results from our explorations of learner behaviors demonstrated the utility of HMMs in the discovery, detection, explication, and comparison of learner behaviors. However, there are some limitations to this approach. Firstly, there is some subjectivity in model explication; the derived states of the model are assigned labels/meaning by the researchers using qualitative analysis. One way to mitigate this limitation is to have domain experts involved in this stage of the analysis. Secondly, large data sets are usually required to produce generalizable models; our data set is fairly small and thus, the next logical step is to collect more observations and retest the model. Our research demonstrates that using HMMs in this particular context can illustrate how novices’ problem-solving actions differ from those of experts. This contribution is part of a larger ongoing series of studies to better understand learners’ behaviors in BioWorld; a forthcoming contribution proposes the use of process mining for modeling learner behaviors (Doleck et al., in prep).
We have shown empirically that the ensemble of text-mining algorithms is especially useful in generating high predictive accuracy for the classification task. As such, the proposed case summary classifier allows the system to differentiate between case summaries written by novices and experts, which suggests that these case summaries are not only recognizably different to a machine, but also that their constituent content is different in some important manner. The novice–expert case summary classification task represents an important step in broadening the nature of the information that is processed by the current version of the novice–expert overlay model embedded in BioWorld. This model is currently limited to the structured data that are collected as learners interact with tools embedded in the interface; however, the unstructured texts that constitute the case summaries may provide a wealth of information that is not captured by such tools, for instance, the patient management plan or a justification for a differential diagnosis. One of the most important limitations of the proposed revisions to the algorithms that underlie the novice–expert overlay system is their scalability. It is yet to be determined whether the model performs as well for novel examples of case summaries, or if the parameters are applicable for case summaries written in relation to other cases that exhibit variations in symptoms and vital signs indicative of the same disease. Although the extensiveness of the training dataset is an important factor in addressing this challenge, we maintain that the current method benefits the learner model embedded in the system by targeting complementary channels of data about the learner.
The current version of the overlay model is severely limited in its inferential capabilities because it relies solely on evidence items, including highlighted symptoms and vital signs as well as lab test procedures that warrant the main hypothesis for the case under investigation. To build more scalable models, we suggest that common standards for logging case summary data be established across the field so that sufficiently large labeled datasets can be obtained. We call on researchers to establish such standards for intelligent tutoring systems in the medical domain to enable further progress in this area.
The findings of this study suggest that sequential and linguistic features extracted from the log-file database of intelligent tutoring systems are a meaningful source of information for differentiating the mechanisms that underlie superior performance in the medical domain. One promising line of research involves fine-grained examinations of case summaries in order to better understand how novices write case summaries differently from experts. A sentence- or proposition-level analysis may allow the novice–expert model to make detailed recommendations regarding the type of feedback that would most benefit a particular learner. Conversely, the features of the case that are most commonly reported by novices and experts alike should be considered self-evident and not worthy of extensive consideration beyond the feedback that is currently delivered by the system, which has been shown to be effective in promoting learning. In future studies, we will focus our efforts on applying the text classifiers that were found to be successful in the current study to a training dataset that has been labeled at a fine-grained level, with the aim of evaluating the impact on classification accuracy.
- Anderson, J., & Gluck, K. (2001). What role do cognitive architectures play in intelligent tutoring systems? In D. Klahr & S. Carver (Eds.), Cognition & Instruction: 25 years of progress (pp. 227–262). Mahwah, NJ: Lawrence Erlbaum.
- Baker, R. S. J. D. (2007). Modeling and understanding students’ off-task behavior in intelligent tutoring systems. Proceedings of ACM CHI 2007: computer-human interaction, 1059–1068.
- Baker, R. S. J. D., Corbett, A. T., Roll, I., Koedinger, K. R., Aleven, V., Cocea, M., et al. (2013). Modeling and studying gaming the system with educational data mining. In R. Azevedo & V. Aleven (Eds.), International handbook of metacognition and learning technologies (pp. 97–116). New York: Springer.
- Baker, R. S. J. D., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17.
- Beal, C. R., Mitra, S., & Cohen, P. (2007a). Modeling learning patterns of students with a tutoring system using hidden Markov models. In R. Lucking, K. R. Koedinger, & J. Greer (Eds.), Artificial intelligence in education (pp. 238–245). Amsterdam: IOS Press.
- Beal, C. R., Walles, R., Arroyo, I., & Woolf, B. P. (2007b). On-line tutoring for math achievement testing: A controlled evaluation. Journal of Interactive Online Learning, 6(1), 43–55.
- Biswas, G., Kinnebrew, J. S., & Segedy, J. R. (2014). Using a Cognitive/Metacognitive Task Model to analyze Students Learning Behaviors. In Proceedings of the 16th International conference on human-computer interaction. Crete, Greece.
- Chapelle, O., & Vapnik, V. (2000). Model selection for support vector machines. Advances in neural information processing systems (Vol. 12). Cambridge, MA: MIT Press.
- Collins, A. (2006). Cognitive apprenticeship. In K. Sawyer (Ed.), Cambridge handbook of the learning sciences (pp. 47–60). NY: Cambridge University Press.
- Dodds, P., & Fletcher, J. D. (2004). Opportunities for new “smart” learning environments enabled by next-generation web capabilities. Journal of Educational Multimedia and Hypermedia, 13(4), 391–404.
- Doleck, T., Jarrell, A., Chaouachi, M., Poitras, E., & Lajoie, S. (in prep). A tale of three cases: Examining accuracy, efficiency, and process differences in diagnosing virtual patient cases.
- Doleck, T., Basnet, R. B., Poitras, E., & Lajoie, S. (2014). BioWorldParser: A suite of parsers for leveraging educational data mining techniques. In Proceedings of 2nd IEEE International Conference on MOOCs, Innovation & Technology in Education (MITE), (pp. 32–35), India: IEEE. doi: 10.1109/MITE.2014.7020236
- Doleck, T., Basnet, R. B., Poitras, E., & Lajoie, S. (2014). Augmenting the novice-expert overlay model in an intelligent tutoring system: Using confidence-weighted linear classifiers. In Proceedings of IEEE International Conference on Computational Intelligence & Computing Research (IEEE ICCIC), (pp. 87–90), India: IEEE.
- Doleck, T., Jarrell, A., Poitras, E., & Lajoie, S. (2015). Towards investigating performance differences in clinical reasoning in a technology rich learning environment. In Conati, C., Heffernan, N., Mitrovic, A., Verdejo, M. F. (Eds.), Artificial intelligence in education (pp. 567–570). Lugano: Springer International Publishing. doi: 10.1007/978-3-319-19773-9_63
- Durlach, P. J. & Ray, J. M. (2011). Designing adaptive instructional environments: Insights from empirical evidence. Army Research Institute Technical Report 1297. U.S. Army Research Institute, Arlington, VA.
- Fitzgerald, J., Wolf, F., Davis, W., Barclay, M., Bozynski, M., Chamberlain, K., et al. (1994). A preliminary study of the impact of case specificity on computer-based assessment of medical student clinical performance. Evaluation and the Health Professions, 17(3), 307–321. doi:10.1177/016327879401700304.
- Frank, E., & Bouckaert, R. R. (2006). Naive Bayes for text classification with unbalanced classes. In Proceedings of 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 503-510. Berlin: Springer.
- Gauthier, G., Lajoie, P. S., Naismith, L., & Wiseman, J. (2008). Using expert decision maps to promote reflection and self-assessment in medical case-based instruction. In Proceedings of Workshop on the Assessment and Feedback in Ill-Defined Domains at ITS, Montréal, Canada.
- Hall, M. A. (1999). Correlation-based feature subset selection for machine learning. (doctoral dissertation). Department of Computer Science, University of Waikato, New Zealand.
- Jeong, H., Gupta, A., Roscoe, R., Wagster, J., Biswas, G., & Schwartz, D. (2008). Using hidden Markov models to characterize student behaviors in learning-by-teaching environments. In Proceedings of Intelligent Tutoring Systems: Vol. 5091. Lecture Notes in Computer Science (pp. 614–625). Montreal: Springer.
- Joachims, T. (1998). Text Categorization with support vector machines: Learning with many relevant features. In Proceedings of ECML-98, 10th European Conference on Machine Learning.
- Kibriya, A. M., Frank, E., Pfahringer, B., & Holmes, G. (2004). Multinomial naïve bayes for text categorization revisited. In G. I. Webb & X. Yu (Eds.), Advances in artificial intelligence (pp. 488–499). Berlin, Heidelberg: Springer.
- Koedinger, K., & Corbett, A. (2006). Cognitive tutors: Technology bringing learning science to the classroom. In K. Sawyer (Ed.), The cambridge handbook of the learning sciences (pp. 61–78). Cambridge: Cambridge University Press.
- Lajoie, S. P. (2009). Developing professional expertise with a cognitive apprenticeship model: Examples from avionics and medicine. In K. A. Ericsson (Ed.), Development of professional expertise: Toward measurement of expert performance and design of optimal learning environments (pp. 61–83). Cambridge: Cambridge University Press.
- Lajoie, S. P., Naismith, L., Hong, Y. J., Poitras, E., Cruz-Panesso, I., Ranellucci, J., et al. (2013). Technology rich tools to support self-regulated learning and performance in medicine. In R. Azevedo & V. Aleven (Eds.), International handbook of metacognition and learning technologies (pp. 229–242). New York: Springer.
- Lajoie, S. P., Poitras, E. G., Doleck, T., & Jarrell, A. (2015). Modeling metacognitive activities in medical problem-solving with bioworld. In A. Peña-Ayala (Ed.), Metacognition: Fundaments, applications, and trends (pp. 323–343). New York: Springer Series: Intelligent Systems Reference Library.
- McNamara, D. S. (2007). IIS: A marriage of computational linguistics, psychology, and educational technologies. In D. Wilson & G. Sutcliffe (Eds.), Proceedings of the twentieth international florida artificial intelligence research society conference (pp. 15–20). Menlo Park, California: The AAAI Press.
- Mitrovic, A. (2003). An intelligent SQL tutor on the web. International Journal of Artificial Intelligence in Education, 13(2), 173–197.
- Naismith, L. (2013). Examining motivational and emotional influences on medical students’ attention to feedback in a technology-rich environment for learning clinical reasoning (Unpublished doctoral dissertation). Montreal: McGill University.
- Naismith, L., & Lajoie, S. P. (2010). Using expert models to provide feedback on clinical reasoning skills. In proceedings of the International Conference on Intelligent Tutoring Systems (pp. 242–244).
- Park, O. C., & Lee, J. (2004). Adaptive instructional systems. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (2nd ed., pp. 651–684). Mahwah, NJ: Lawrence Erlbaum.
- Platt, J. C. (1998). A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14.
- Poitras, E. G., Lajoie, S. P., Doleck, T., & Jarrell, A. (in press). Subgroup discovery with user interaction data: An empirically guided approach to improving intelligent tutoring systems. Educational Technology & Society.
- Romero, C., & Ventura, S. (2010). Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), 601–618.
- Shute, V. J., & Zapata-Rivera, D. (2008). Adaptive technologies. In J. M. Spector, D. Merrill, J. van Merriënboer, & M. Driscoll (Eds.), Handbook of research on educational communications and technology (3rd ed., pp. 277–294). New York, NY: Lawrence Erlbaum Associates, Taylor & Francis Group.
- Shute, V. J., & Zapata-Rivera, D. (2012). Adaptive educational systems. In P. Durlach & A. Lesgold (Eds.), Adaptive technologies for training and education. New York, NY: Cambridge University Press.
- Vanlehn, K., Lynch, C., Schulze, K., Shapiro, J. A., Shelby, R., Taylor, L., et al. (2005). The Andes physics tutoring system: Lessons learned. International Journal of Artificial Intelligence in Education, 15, 147–204.
- Zapata-Rivera, D., & Greer, J. (2000). Inspecting and visualizing distributed Bayesian student models. In Proceedings of Intelligent Tutoring Systems (pp. 544–553).