In recent years, artificial intelligence (AI) has been widely applied in many areas, including education (Luckin et al., 2016), with commensurate increases in AI in education (AIED) research and applications. AIED adaptive learning and evaluation applications are now being used to improve educational effectiveness and efficiency (Chassignol et al., 2018; Kurshan, 2016), evaluate teaching effects, adjust teaching and problem-solving strategies in real time (Shute & Psotka, 1996), and provide a better understanding of student knowledge acquisition (VanLehn et al., 2007; Beal et al., 2010).

Intelligent tutoring system (ITS) research is multidisciplinary, involving AI, pedagogy, psychology, and other related disciplines (e.g., Craig et al., 2004; Graesser et al., 2012; Hu & Cooper, 2014; Luckin et al., 2016). Sleeman and Brown’s (1982) early book Intelligent Tutoring Systems brought together different research fields, with contributors coming from AI, cognitive science, and education (Luckin et al., 2016). Luckin et al. (2016) defined ITSs as computerized learning environments that incorporate computational models from the cognitive sciences, learning sciences, computational linguistics, AI, mathematics, and other fields. Graesser, Conley, et al. (2011) and Graesser, Mcnamara, et al. (2011) commented that ITSs often incorporate pedagogical, psychological, and other cognitive learning theories into computational models. Conati (2002) noted that ITS research focused on advances in AI, cognitive science, and education to improve computer-supported education, and Ahuja and Sille (2014) described ITS research as an intersection of computer science, cognitive psychology, and educational research.

As early as the 1970s, AI systems were being used to provide individual, adaptive instruction (Conati et al., 2002a, 2002b; Luckin et al., 2016). The first ITS, SCHOLAR, was developed by Carbonell in 1970 (Dargue & Biddle, 2014) and was followed a few years later by several influential ITSs, such as BIP, designed at Stanford University in 1977 (Wescourt et al., 1977), WUMPUS, developed at MIT in 1977 (Xu et al., 2009), SOPHIE (Sleeman & Brown, 1982), DEBUGGY (Sleeman & Brown, 1982), and AutoTutor (Graesser et al., 2005). The teaching effectiveness of these ITSs has been found to nearly parallel that of human teachers (VanLehn, 2011) and to enhance the performance of both teachers and students (Spector et al., 2014). ITSs are therefore regarded as a driving force for the future of education (Luckin et al., 2016).

Conceptually, there is no clear boundary between ITSs and related concepts such as computer-assisted learning (CAL), computer-assisted training (CAT), and computer-assisted instruction (CAI) (Sleeman & Brown, 1982; VanLehn, 2006; Anderson et al., 1990). However, Conati (2009) claimed that the key difference between ITSs and CAI was that the solutions provided by ITSs were generated in real-time from student input, rather than having to be predefined, and Wenger (1987) and VanLehn (2011) commented that while CAI, CAL, and CAT were based on responses, ITSs were based on steps. In this study, CAI-related concepts have been excluded.

ITS research has drawn most heavily on computer science and computer technologies, and its core has been the development of student models (Desmarais & Baker, 2012) that digitize student abilities and give students access to personalized instruction (Conati & Kardan, 2013) matched to their aptitude (Graesser et al., 2012; Hu & Cooper, 2014; Vandewaetere et al., 2011). The benefits of these ITS student models and other components are that they promote personalized learning, provide real-time learning analysis, use self-adaptive content, and designate targeted practice (Liu, 2003). Corbett and Anderson (1994) first proposed a knowledge tracing (KT) model based on a hidden Markov model, which identifies changing cognitive states during knowledge acquisition by analyzing student data and predicts future performance.
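The idea behind knowledge tracing can be illustrated with a minimal sketch of the four-parameter Bayesian formulation that is commonly used for this kind of hidden Markov student model. The parameter names and numeric values below are illustrative assumptions and are not taken from the studies cited above; the sketch only shows how a mastery estimate could be updated after each observed response and then used to predict the next one.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """Illustrative parameters of a two-state (known / not known) student model."""
    p_init: float = 0.2    # assumed prior probability that the skill is already known
    p_learn: float = 0.15  # assumed probability of learning the skill at each opportunity
    p_guess: float = 0.2   # assumed probability of a correct answer without the skill
    p_slip: float = 0.1    # assumed probability of an incorrect answer despite the skill


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next response is correct, given the mastery estimate."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update_mastery(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayes update on one observed response, followed by the learning transition."""
    if correct:
        evidence_known = p_known * (1 - params.p_slip)
        evidence_unknown = (1 - p_known) * params.p_guess
    else:
        evidence_known = p_known * params.p_slip
        evidence_unknown = (1 - p_known) * (1 - params.p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # The hidden state may move from "not known" to "known" after the opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses, params=None):
    """Return the mastery estimate after each response in the sequence."""
    params = params or BKTParams()
    p_known = params.p_init
    history = []
    for correct in responses:
        p_known = update_mastery(p_known, correct, params)
        history.append(p_known)
    return history


history = trace([True, True, False, True])
```

In this sketch a correct response raises the mastery estimate and an incorrect one lowers it, which is the sense in which the model tracks changing cognitive states over a sequence of practice opportunities.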

Another crucial research field has been ITS applications (D’Mello et al., 2007) such as AutoTutor, including comparative studies that assessed effectiveness by comparing ITS learning effects with those of other forms of instruction (VanLehn et al., 2007). Some studies found AutoTutor to be as effective as a human tutor in computer-mediated conversations (Graesser et al., 2003; Person et al., 2001), and others identified the specific factors related to the gains in deep-level comprehension (Graesser et al., 2009; Baker et al., 2010). These early ITS studies attracted greater research interest and spurred the development of other ITS applications, such as Coh-Metrix and Cognitive Tutor (Graesser et al., 2014; Graesser, Conley, et al., 2011; Graesser, Mcnamara, et al., 2011; McNamara et al., 2010, 2014; Pane et al., 2014).

Scholars in psychology provided theoretical tools that were of great significance to this kind of research (Arroyo et al., 2009; D’Mello et al., 2007). For example, Kort et al. (2002) proposed a comprehensive four-quadrant model that explicitly linked learning and affective states; Anderson proposed the adaptive control of thought (ACT) cognitive theory (Anderson, 1980, 1983), which became the theoretical basis for the popular Cognitive Tutor ITS; and others have used ITSs to identify the influences that different affective and cognitive states have on learning effects (Craig et al., 2004; Koedinger & Corbett, 2006; Woolf, 2009; Arroyo et al., 2009; Lester et al., 2013; D’Mello & Graesser, 2011).

Pedagogical researchers have focused on designing teaching strategies to achieve better teaching effects (Kolodner, 2002; Luckin et al., 2016; VanLehn, 2006). For example, Graesser et al. (2005) reviewed the pedagogical strategies that embedded constructivist approaches in ITS instruction, finding that the learning effect was negatively correlated with boredom and positively correlated with flow, and also explored the relationships between the emotional state and the learning process. Other pedagogical aspects have been examined to improve the effectiveness and efficiency of education, such as game-based learning strategies (Lester et al., 2013; Tsay et al., 2018; Santhanam et al., 2016), collaborative learning environments (Chi et al., 2001), and intelligent narrative technologies (McCoy et al., 2011; Yu & Riedl, 2012).

Given all this interest over several decades, there have been a number of previous ITS research reviews, such as systematic reviews focused on ITS composition, current research foci, and current trends (Almasri et al., 2019; Akbulut & Cardak, 2012; Baker et al., 2008; Schmidhuber, 2015), and reviews focused on technology/evaluation methods in different ITSs (Desmarais & Baker, 2012; Elham et al., 2018; Conati, 2002a, 2002b). In a recent review, Elham et al. (2018) concluded that the most frequent AI techniques applied to ITSs were action condition rule-based, Bayesian networks, and data mining. Other reviews have given suggestions on specific teaching mechanisms, such as strategic decision making based on student emotions (Sharma et al., 2014; Conati, 2002). The most promising ITS research trends have been identified as portable devices (Elham et al., 2018; Ahuja & Sille, 2014) and collaborative learning (Isotami & Misogici, 2008), with others redefining the purpose of using ITSs as learning devices that are unable to be effective without human guidance rather than systems to improve effectiveness and achievement rates (VanLehn, 2011; Woolf, Lane, et al., 2013; Woolf, Chad Lane, et al., 2013). Another common ITS research review type has been meta-analyses based on quantitative systematic reviews, most of which have compared the learning effects of different ITS systems, the teaching methods, or the learning conditions (VanLehn, 2011; Graham et al., 2015; Steenbergen-Hu & Cooper, 2014; Ma et al., 2014; Kulik, 2015).

The review conducted in this paper employed a scientometric method, which uses bibliometrics such as citation analyses to evaluate the scientific research activities that guide science policies (Egghe, 2005). Nalimov and Mulchenko (1971) coined the term “scientometrics” in the 1960s to describe the growth, structure, interrelationships, and productivity of scientific research (Hood & Wilson, 2001), after which scientometrics was primarily employed to analyze research literature based on attributes of the research itself, such as the number of publications, keywords, or other dynamic indicators such as citation information. Compared with meta-analyses or systematic reviews, which usually require detailed coding or the weighting of research content based on human judgment, the scientometric method automatically calculates and demonstrates the information based on the publication attributes, that is, it is an analysis method based on data and algorithms.

Scientometrics can provide a quantitative and systematic review of all aspects of ITS research, such as the publication and disciplinary states, the particular research issues, the intellectual structures, and the emerging trends. While previous research has discussed the multidisciplinary composition of ITS research, it has not provided a discipline construction path (Craig et al., 2004; Koedinger et al., 2012; Woolf, 2009). Therefore, this review reveals the path of the multidisciplinary integration of ITSs and the proportions of the composition. Understanding the proportions involved in the composition of ITS research and the path of its formation is essential for characterizing ITSs as a subdiscipline and for identifying the discipline formation stage. Only by analyzing the characteristics and composition of the subdiscipline, the evolution of the knowledge sources, and the intellectual base can researchers be better guided concerning the research methods or skills they need to master. Further, clarifying the current stage of the subdiscipline can help researchers identify more promising research directions and adopt more suitable strategies.

Therefore, based on the following research questions (RQs), this research aimed to reveal the history, current status, and trends in ITS research from a scientometric and multidisciplinary perspective:

  1. RQ1: What is the current status of ITS research? What are the main contributing countries/regions and the major journals, authors, and institutions?

  2. RQ2: What are the ITS knowledge sources and how have they developed?

  3. RQ3: What have been the most popular ITS research foci?

  4. RQ4: What are the chronological ITS research stages and the intellectual bases in each of these stages?

  5. RQ5: What are the emerging ITS research trends?

Data and methods

Data collection

The relevant research article data for this study were extracted from the Science Citation Index Expanded (SCIE) and the Social Science Citation Index (SSCI) in the Web of Science (WoS) Core Collection database from 1963 to 2020. Due to deficiencies in the originality, completeness, and impact of review articles, conference papers, proceedings, and other document types, only research articles were included in this study. Review articles were excluded because they were not original research and their inclusion would possibly lead to over-citation and overrepresentation, especially for highly cited papers (Miranda & Garcia-Carpintero, 2018), which could have affected the veracity of the results.

The 1173 bibliographic records (research articles) and 12,992 associated references were collected on May 21, 2020, using the advanced search service offered by the WoS Core Collection. A query formula that included field tags, Boolean operators, parentheses, and query sets was created to retrieve the desired literature, as follows: TS = (“intelligent tutoring systems” OR “Intelligent Computer-Aided Instruction” OR “Intelligent Computer-Assisted Instruction” OR “Artificial Intelligence in Education” OR “adaptive educational system” OR “adaptive learning systems” OR “constraint-based tutors” OR “Cognitive Tutor” OR “AutoTutor” OR “SQL-tutor” OR “assistments” OR “elm-art” OR “iweaver” OR “DeepTutor” OR “Coh-Metrix” OR “Electronix Tutor” OR “Student modeling” OR “Knowledge Tracing”). This retrieval formula included concepts, theories and methods, and applications to ensure precision in the ITS research retrieval results.
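For readers who want to reproduce or adapt the retrieval formula, the following sketch assembles the same topic query from a list of phrases. The helper name `build_topic_query` is hypothetical, and the sketch only builds the query string; it does not call any WoS service.

```python
def build_topic_query(terms):
    """Join search phrases into a topic query of the form TS = ("a" OR "b")."""
    cleaned = [term.strip() for term in terms if term.strip()]
    if not cleaned:
        raise ValueError("at least one search term is required")
    quoted = ['"{}"'.format(term) for term in cleaned]
    return "TS = ({})".format(" OR ".join(quoted))


# Search phrases copied from the retrieval formula reported in the text.
ITS_TERMS = [
    "intelligent tutoring systems",
    "Intelligent Computer-Aided Instruction",
    "Intelligent Computer-Assisted Instruction",
    "Artificial Intelligence in Education",
    "adaptive educational system",
    "adaptive learning systems",
    "constraint-based tutors",
    "Cognitive Tutor",
    "AutoTutor",
    "SQL-tutor",
    "assistments",
    "elm-art",
    "iweaver",
    "DeepTutor",
    "Coh-Metrix",
    "Electronix Tutor",
    "Student modeling",
    "Knowledge Tracing",
]

query = build_topic_query(ITS_TERMS)
```

Keeping the phrases in a list makes it easy to see which concepts, methods, and applications the query covers and to rerun the search with a modified term set.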

Data analysis

After the data collection, several descriptive statistical analyses were performed to identify the WoS categories; the publication numbers by country/region, journal, author, institution, and research area; and the citation distribution in the ITS research field. To explore the information behind the publications, a series of co-occurrence analyses (co-word and co-citation) were also conducted.

Co-occurrence analyses or co-word analyses were conducted on the keywords. The keyword network in scientific literature can be used to explore the correlations between different research studies to reveal the potential research issues and the intellectual structure in a certain research field (Chen et al., 2019). This method has been widely used in bibliometric studies (Wu et al., 2016) and has proven effective (Khasseh et al., 2017).

Then, a co-citation analysis of the articles was performed. Two articles that are cited in the same article are believed to have a certain relevance, that is, a co-citation relationship, as these co-cited articles are the upstream knowledge sources. Co-citation analyses are not only capable of presenting the intellectual evolutionary process in the ITS field, but can also reveal the milestone literature that promotes innovation (Hou et al., 2018).
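The co-citation relationship described above can be made concrete with a small counting sketch. The record identifiers and reference lists below are toy values invented for illustration; tools such as CiteSpace perform this kind of counting internally, and the sketch is not a description of their implementation.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together by one article."""
    counts = Counter()
    for references in reference_lists:
        # Deduplicate and sort so each pair has one canonical (a, b) ordering.
        unique_refs = sorted(set(references))
        for pair in combinations(unique_refs, 2):
            counts[pair] += 1
    return counts


# Toy data (hypothetical): each citing article maps to the references it cites.
citing_articles = {
    "paper_1": ["ref_A", "ref_B", "ref_C"],
    "paper_2": ["ref_A", "ref_B"],
    "paper_3": ["ref_B", "ref_C"],
}

counts = cocitation_counts(citing_articles.values())
strongest_pairs = counts.most_common(2)
```

Pairs with high counts are the references that a research community repeatedly cites together, which is why they can be read as shared upstream knowledge sources.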

Visualization and tools

The scientometric research tools, CiteSpace and VOSviewer, were used to visualize the co-word and co-citation networks to elucidate the deeper structures in these networks.

Developed by Chen (2006), CiteSpace has been widely used for bibliometric analyses and visualizations. In addition to co-occurrence analysis and data visualization, CiteSpace also generates cluster analyses and cluster-labeling using algorithms such as the Log-Likelihood Ratio (LLR) to extract the themes in each cluster and also uses Burst detection to identify emergent research‐front concepts. VOSviewer, which was developed by Van Eck and Waltman (2009), was also employed to construct and view the bibliometric maps.

This paper harnessed the strengths of each of the above-mentioned tools to enhance the presentation of the study results. Table 1 shows the corresponding relationships between the research questions, data analysis methods, tools, and research content.

Table 1 Relationship between research questions, methodology, and analysis tools

Results and discussion

Publication analysis

RQ 1: Three-stage publication growth

Figure 1 shows the number of publications in each year. A total of 1173 articles (publications) were collected. The earliest paper appeared in 1963 (Cavanagh, 1963), but besides this paper, few relevant papers were published in the 1960s and 1970s. However, the general ITS technological development during the early years was driven by AI technology.

Fig. 1 ITSs topic papers collected from SCIE and SSCI, 1963–2020. Note. The top edge of the yellow rectangle represents the mean number of publications at each stage

More ITS-related publications appeared in the 1980s, and since then the number of publications has grown with fluctuations. The overall distribution of ITS-related publications showed three growth stages. During the first stage (1985–1998), the number of ITS research studies rose from 7 in 1989 to 32 in 1998, with a mean of 14. In the second stage (1999–2006), after a brief decline in 1999, the number of publications increased to a peak in 2006, with a mean of 32.6. This peak was followed by a sharp decline in 2007 and a rebound in 2008. In the third stage (2007–2019), the mean was 52.4, and the number of publications reached an all-time peak of 77 in 2013. The stage means of 14, 32.6, and 52.4 make the three-stage, fluctuating growth pattern clear.

There were several noteworthy AI technology events that increased public interest in AI technology and led to technological breakthroughs. As the public attention began to attract research interest, there was a significant increase in the number of publications.

Lighthill’s report (1973) revealed that there was a gap between AI technological expectations and reality, which resulted in a reduction in funding in some countries and the first ebb in AI research since the Dartmouth Artificial Intelligence Conference in 1956. In 1986, the development of the back propagation (BP) neural network algorithm (Rumelhart et al., 1986) incited new interest in ITS research. In 1997, Deep Blue defeated the world chess champion, which was a milestone in AI development and awakened public and research interest in related ITS applications, with the number of publications climbing to 37 in 1998. However, a year later, only 11 papers were published, as chess robots were not seen as directly relevant to ITSs. In 2006, Hinton’s deep belief network (Hinton et al., 2006), which was a breakthrough in neural network algorithms, spurred renewed interest, with the number of publications increasing to 59. In 2013, deep learning algorithms finally achieved success in speech and visual recognition, ushering in the era of perceptual intelligence (Li, 2018), and the number of publications reached an all-time peak.

Each of these landmark events triggered ITS publications, which gradually led to the fluctuating growth in the number of publications over the years. The relationship between AI technology and ITSs reflected an obvious spillover effect, as cutting-edge AI theory and technological research started to lead to ITS education application developments. As AI research increased, interest in ITSs increased, with each research field having an influence on the other, as reflected in the fluctuating number of publications. From the scientometric view, the stepped and fluctuating growth in ITS research (Fig. 1) did not conform to Price’s exponential growth curve for scientific literature (Tague et al., 1981), which suggested that ITS research was still in its initial discipline formation stage.

Since the neural network algorithmic breakthroughs in 2006, deep learning algorithms such as CNN (Convolutional Neural Networks) and GAN (Generative Adversarial Networks) have gradually become the most exciting AI areas. In recent years, CNN, which specializes in recognizing images, facial expressions, voice, and even text, has begun to be more widely applied to education through ITS research that optimizes model designs and parameter values (Liu et al., 2018; Hu et al., 2020). GAN, which has also received significant attention, has seldom been used in education (Chen et al., 2020), but is suitable for automatically generating graphics, audio, and images and could be a useful future ITS application.

RQ 1: Publication numbers in SCIE or SSCI

ITS research combines achievements from both the social and natural sciences. Figure 2 compares the number of publications in the SCIE and SSCI databases from 1963 to 2020, which respectively cover natural and social science publications, with 330 publications included in both indices. This section focuses on the relationships between the social and natural science ITS publications.

Fig. 2 SCIE and SSCI ITS-related publication numbers (1963–2020)

The SSCI publications in this field first appeared in the 1960s, which was far earlier than in the SCIE (1986). This revealed that the possibility of ITSs was first examined in the social sciences. Before 2007, the number of SCIE publications exceeded the number of SSCI publications, but after 2007, the two domains had commensurate growth until 2010, when the number of SSCI publications surpassed the number of SCIE publications.

The first surge in ITS-related SCIE publications in 1992 was possibly because of engineering research. The second SCIE surge was in 1997, which was the same year that the Deep Blue chess AI defeated the human chess master for the first time, which initiated a global upsurge in AI research. From 1992 to 2006, the number of natural science publications was significantly higher than in the social sciences, but after 2007, the number of social science publications exceeded those in the natural sciences.

RQ 1: Exponential increase in citation trends

The annual number of ITS citations had an exponential growth trend (see Fig. 3), increasing from 4 in 1986 to 2714 in 2019, with this growth trend being precisely fitted using the software Origin 2019 Pro with R² = 0.99. The rapid growth of the fitting curve started from around 1990, which marked the beginning of a significant increase in ITS citations.
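As a rough illustration of how an exponential trend and its R² can be computed, the sketch below fits y ≈ a·exp(b·t) by ordinary least squares on the logarithm of the counts. This log-linear approach is a simplification chosen for the example and is not necessarily the fitting procedure used in Origin; the input series is synthetic and is not the citation data behind Fig. 3.

```python
import math


def fit_exponential(years, counts):
    """Fit counts ~ a * exp(b * (year - years[0])) via least squares on log(counts)."""
    t = [year - years[0] for year in years]
    log_counts = [math.log(count) for count in counts]  # requires positive counts
    n = len(t)
    mean_t = sum(t) / n
    mean_log = sum(log_counts) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    s_tl = sum((ti - mean_t) * (li - mean_log) for ti, li in zip(t, log_counts))
    b = s_tl / s_tt
    a = math.exp(mean_log - b * mean_t)
    return a, b


def r_squared(observed, predicted):
    """Coefficient of determination on the original (non-log) scale."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Synthetic series generated from a known exponential (not the paper's data).
years = list(range(2000, 2010))
counts = [5.0 * math.exp(0.2 * (year - 2000)) for year in years]

a, b = fit_exponential(years, counts)
fitted = [a * math.exp(b * (year - years[0])) for year in years]
r2 = r_squared(counts, fitted)
```

Because the synthetic series is exactly exponential, the fit recovers the generating parameters and R² is essentially 1; real citation counts would give a lower value, such as the 0.99 reported above.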

Fig. 3 Growth of citations

RQ 1: Top productive countries/regions, authors, journals and institutions

Table 2 lists the top five dominant ITS research countries/regions and authors. The ITS research geographic distribution can help researchers identify the countries/regions on the cutting edge of ITSs. The United States ranked first with 480 publications, or about 40.9% of the total, with the most productive writer being Graesser, a professor at the University of Memphis specializing in psychology, discourse, language, AI, and education. The top 4 productive authors were all from the USA.

Table 2 Top productive countries/regions and authors

Table 3 shows the top five journals that published ITS-related research and the institutions with the highest number of ITS publications. Computers & Education, which is an interdisciplinary journal for different types of technically-based education studies, published the highest number of ITS papers. Most of the top five journals had research studies focused on computer science, education, engineering, psychology, cybernetics, and other disciplines, which reflected the multidisciplinary features of the ITS domain. The University of Memphis published the most ITS studies.

Table 3 Top productive journals and institutions

Multidisciplinary characteristics based on WoS category

The journal categories for the ITS publications were generated from the WoS classification tags “Research Areas” and “WoS Categories,” with the former giving the general journal classification, and the latter giving more detailed information that could be detected by the CiteSpace category function.

RQ 2: Distribution of research areas and the Timezone view of categories

In the top ten research areas based on the ITSs WoS classifications by “Research Areas” tags, four areas significantly prevailed: “COMPUTER SCIENCE” (688 publications); “EDUCATION & EDUCATIONAL RESEARCH” (453 publications); “PSYCHOLOGY” (183 publications); and “ENGINEERING” (159 publications). The number of publications in the computer science domain was much higher than in the other research areas, and these four research areas appeared to be the fundamental ITS research branches.

To dynamically illustrate the multidisciplinary ITS research integration paths, CiteSpace software was applied to develop a time zone view (see Fig. 4) of the WoS Categories in this field. In Fig. 4, the size of each circle is proportional to the number of publications in the category, and the position of each node center indicates the year in which the initial publication in that category appeared.

Fig. 4 Category time zone of ITSs research

As the journal that included the earliest publication in 1963 (Cavanagh, 1963) had incomplete information, it was filtered out from the category time zone (Fig. 4). In 1964, the Education & Educational Research and Education, Special research fields first appeared, along with a publication by Cavanagh (1964). In 1986, Psychology and Computer Science appeared, followed by Computer Science and Information Systems (1988), Computer Science and Interdisciplinary Applications (1989), Computer Science and Computer Science Theory & Methods (1989), Computer Science Artificial Intelligence (1990), Computer Science and Cybernetics (1994), and Computer Science and Software Engineering (1994), with these categories forming the largest community. Since then, research related to computer science has been the most significant knowledge source for ITS research. In 1991, engineering research appeared, including Engineering and Engineering, Electrical & Electronic, and in 2008, ITS-related linguistics research emerged.

As shown in Fig. 4, the ideas for ITSs originated from the education discipline. In the 1960s, pioneering education scholars introduced a rudimentary ITS application. Due to the scarcity in basic research achievements and computer hardware, ITSs research came to a standstill for decades. However, from the middle of the 1980s, a group of scholars introduced computer technology theories and methods into ITS research, and in 1986, the introduction of psychological theories and methods began to contribute significantly to the study of evaluation. In 1991, ITSs appeared in the “Engineering” and “Electrical & Electronic” fields, which were mainly focused on software engineering or automated/information systems. For example, an article published in Information and Software Technology entitled “SQL Tutor, a co-operative ITS with repository support” introduced the system architecture and concluded that repository technology was a technical solution that supported multi-user co-operation and collaboration (Wang, 1997).

The most recent subject, CHEMISTRY, TELECOMMUNICATIONS, appeared in 2019, preceded by Psychology Mathematical (2016), Psychology Multidisciplinary (2013), and Linguistics/Language (2008). The Category Timezone view revealed the path of multidisciplinary integration of ITS.

Computer science provided the main functional background for ITSs. While the theoretical basis for ITSs was education and psychology, the integration needed to be realized through technology. Therefore, the related computer science studies focused on specialized computer science skills and the incorporation of educational or psychological frameworks. For example, Chang et al. (2020) proposed methods to maintain the benefits of semantic web-based approaches when representing ITS pedagogical rules, and Liu et al. (2018) incorporated Cognitive Diagnosis models, an evaluation approach based on cognitive psychology, statistics, and computer science, into an ITS to model and analyze student answer data; this introduced a correlation between test questions and knowledge structures to diagnose student cognitive states and quantitatively investigate student differences and cognitive levels.

Algorithms need to be guided by educational and psychological theories, which then need to be properly embedded in the designed teaching practice systems and programs. It is also of great significance to have experts in the fields of education and psychology evaluate methods using various ITS applications, propose suggestions for system improvement during the implementation, and develop theories applicable to the educational context.

Some scholars have commented that to adapt to different teaching situations, pedagogical and domain models based on embedded pedagogical or psychological theories were essential for ITSs (Luckin, 2016; Sharma et al., 2014). For example, Scaffolding theory and Socratic Questioning have been used to design a pedagogical ITS model (Azevedo & Hadwin, 2005; Cotton, 2001; VanLehn, 2011). From an application perspective, ITSs can provide adaptive and personalized teaching strategies for different situations. Graesser et al. (2011) concluded that the ITS environment was well suited to the acquisition of pedagogical strategies and deeper knowledge in the cognitive spectrum, and Anderson et al. (1995) examined the development of individual instruction, arguing that the best tutorial strategy was to provide immediate feedback in short and directed error messages.

ITSs have also been found to be advantageous in tracking psychological attributes, such as emotional/cognitive recognition, and in developing diagnostic information models (Chang et al., 2020; Conati & Maclaren, 2009; Craig et al., 2004) and computer perception devices that automatically monitor student emotions while they are learning (Aleven et al., 2006; Graesser & D’Mello, 2012).

In terms of theoretical constructs in psychology, Matz (1981) developed one of the first detailed psychological models to explore why students had certain misconceptions while using ITSs, which provided a cornerstone for building flexible diagnostic systems, and Du Boulay et al. (2007) proposed meta-cognitive scaffolding to increase learner motivation and engagement. As early as 1993, the ACT-R theory, which integrated cognitive science and learning theory, was proposed; it subsequently became a theoretical framework to guide system design (Anderson et al., 1995) for ITSs such as the Cognitive Tutor developed by Carnegie.

Research issues

RQ 3: Co-occurrence networks of keywords

Because keywords condense the research content of an article, popular ITS research foci can be identified using keyword co-occurrence analysis (Xie, 2015): when two keywords appear in the keyword list for the same article, a co-occurrence relationship is established. Excluding the phrase “Intelligent tutoring systems”, the top 15 keywords in descending order of co-occurrence frequency were as follows: interactive learning environments (69); student modeling (51); teaching/learning strategies (35); machine learning (24); human–computer interface (23); evaluation methodologies (19); e-learning (19); architecture for educational technology systems (18); Bayesian networks (17); adaptive learning (16); learning (16); artificial intelligence (15); Coh-metrix (15); multimedia/hypermedia systems (14); improving classroom teaching (14); and computational linguistics (14). Each keyword here is derived from research articles in the ITSs field collected for this study.

The keyword co-occurrence networks (Fig. 5) were constructed using VOSviewer. Each keyword is called a node and the co-occurrence relationship between each keyword pair is called an edge, with the size of the nodes and fonts in Fig. 5 showing the keyword co-occurrence frequencies, and the distance between each pair of nodes showing the co-occurrence relationships and similarities (Wei, 2011). The more frequently each pair of nodes concurrently appears in the keyword list of articles, the stronger the edge between them, which in turn determines the node layout and clustering in the networks. The node layout is based on stress-minimization and multidimensional scaling (MDS) in VOSviewer (Leydesdorff & Rafols, 2012), that is, the network nodes are clustered by the built-in MDS algorithm based on the edge strengths and the distance from the neighboring nodes. Consequently, the nodes in the same cluster have stronger correlations.
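The node and edge quantities described above can be sketched in a few lines. The function name `build_keyword_network` and the three keyword lists are hypothetical examples; the sketch only computes node frequencies and edge weights and does not reproduce the layout or clustering that VOSviewer applies afterwards.

```python
from collections import Counter
from itertools import combinations


def build_keyword_network(keyword_lists):
    """Return (node_freq, edge_weight) for a keyword co-occurrence network.

    node_freq[k]        = number of articles whose keyword list contains k
    edge_weight[(a, b)] = number of articles whose keyword list contains both
    """
    node_freq = Counter()
    edge_weight = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace so spelling variants merge into one node.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        node_freq.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_freq, edge_weight


# Toy keyword lists (hypothetical articles, not records from the study).
articles = [
    ["Student modeling", "Bayesian networks", "machine learning"],
    ["student modeling", "machine learning"],
    ["interactive learning environments", "student modeling"],
]

node_freq, edge_weight = build_keyword_network(articles)
ranked_keywords = node_freq.most_common()
```

In a network drawn from these counts, the node frequencies would drive node and font sizes, and the edge weights would pull frequently co-occurring keywords closer together.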

Fig. 5 Keyword co-occurrence networks

The nodes in Fig. 5 were divided into four clusters and separately colored (1) red, (2) green, (3) blue, and (4) yellow, with each cluster representing a different popular research theme.

The red cluster was the largest with 32 items and served as the bridge to the other three clusters. The dominant topic in the red cluster was emotion recognition, as its keywords included “learning,” “affective computing,” “affect, emotions,” and “cognition.” The green cluster (26 items) contained nodes such as “student modeling,” “machine learning,” “Bayesian networks,” “educational data mining,” and “fuzzy logics,” which implied that the core cluster topic was computer technologies. In the blue cluster (21 items), the most frequent co-occurrence keyword was “interactive learning environments,” with the other keywords in this cluster being focused on teaching/learning strategies, educational technology architectures, human–computer interfaces, and pedagogical issues as well as “post-secondary education” and “secondary education,” indicating that the research on these themes was tightly associated with the ITS learning environments and learning strategies. The yellow cluster (14 items) contained nodes such as “Coh-Metrix,” “readability,” “coherence,” and “cohesion.” Coh-Metrix is an online text analysis tool designed and developed by the University of Memphis that measures readability, vocabulary, syntax, concreteness, cohesion, and the story hood of a text.

The red keyword co-occurrence network cluster indicated that the ITS cognitive, emotional, and educational issues were closely related, that is, cognitive psychology and ITS education appeared to be highly integrated. However, the analysis did not identify any keyword denoting an ITS-specific research theory, from which it was inferred that there were few guiding theories. Therefore, it appears that researchers urgently need to develop targeted pedagogical and psychological theories for the ITS context.

Most of the keywords in the green cluster, such as “machine learning,” “Bayesian networks,” and “fuzzy logic,” were general rather than specific technical terms, which indicated that because of the diverse application types and technologies in ITSs, the research was not focused on any single technology. The keywords therefore reflected only the most common technology types. Moreover, the latest AI technologies, which have advanced by leaps and bounds beyond artificial neural network algorithms since 2006, were not reflected, possibly because of the publication lag. This indicates that greater research attention is needed on emerging algorithms and technologies such as generative adversarial networks (GANs).

Intellectual structure evolution

The intellectual structure of a given research field can be illuminated using a co-citation reference analysis (Chen et al., 2010), which highlights the key fields in the intellectual base and predicts the intellectual structure trends (Chen et al., 2008; Madani & Weber, 2016). When two publications are cited by another publication, these two publications are seen to be interrelated (Griffith et al., 1974; Small & Griffith, 1974). The collection of references cited by a research community, therefore, comprises the intellectual base of the research frontier (Braam et al., 1991). Co-citation analyses identify the consistency of the concepts and references and their associations with the research field (Anwar et al., 2019).

In this section, the intellectual structure evolution is demonstrated through a visualization of the intellectual base and the research fronts (Chen & Guan, 2011). The co-citation analysis was based on the 12,992 references in the 1173 sample publications.

RQ 4: Chronological research stages

The co-cited references and their relationships are the nodes and links in the co-citation network, which are clustered based on the co-citation associations (Madani & Weber, 2016). In CiteSpace, the co-citation network clusters are labeled by the Log-Likelihood Ratio (LLR) algorithm (Chen et al., 2010), which reveals the main research specialties of the nodes in each cluster (Chen, 2017; Hu, 2017). Each cluster has a specific ID, such as #0 or #1, as shown in Fig. 6. The ID numbers are assigned in descending order of cluster size (Li & Chen, 2016), and the color bar below Fig. 6 changes from gray to red to represent the transition from 1963 to 2020.

Fig. 6 Co-cited reference clusters with cluster ID

The silhouette score in Table 4 is an indicator of cluster homogeneity; when the silhouette score is greater than 0.7, the clustering results have high reliability (Li & Chen, 2016). The Mean Year is the average publication year of the literature in the cluster, which is used to detect the evolution of the ITS intellectual base. The top terms are detected by the LLR algorithm from the keywords in the co-cited references. The top terms in Table 4 are the first two statistically significant terms in each cluster.
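To make the silhouette criterion concrete, the following is a minimal sketch of the standard silhouette definition, s = (b - a) / max(a, b), where a is a point's mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. The sketch assumes Euclidean distances between hypothetical two-dimensional points; how CiteSpace represents references and measures distance internally is not described here, so this is an illustration of the indicator rather than of the tool.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return one silhouette value s = (b - a) / max(a, b) per point."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is in the sum but not the count)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores


# Hypothetical data: two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
```

For these well-separated toy clusters the mean silhouette exceeds the 0.7 threshold mentioned above; overlapping clusters would push the value toward zero or below.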

Table 4 Information of the co-cited references clusters

From the cluster evolution shown in Fig. 6, the co-cited reference clusters were divided into three stages in Table 4, which are shown in different background colors.

Clusters #7, #11, #9 and #4 constituted the early stage (mean years 1986–1992). Using the LLR tag semantic analysis, it can be seen that the early research was mainly focused on the ITS applications associated with knowledge representation and CAI, such as adaptive systems and business management gaming simulations, and the development of cutting-edge computer technologies, such as unsupervised learning and neural networks. This early stage research indicated that ITS research originated from an education base and was established through the associations of education and computer technologies, with the core themes being related to how the education and computing technologies could be combined.

Clusters #5, #8, #4, #6, #23, #13, #10, #18, #16 and #0 constituted the second stage (mean years 1997–2006), with the research themes being an extension of the earlier stage. Due to the spillover effect of continuing technological progress, many ITS applications such as AutoTutor and Cognitive Tutor began to emerge. Consequently, the research directions began to focus on the integration of pedagogy, cognitive psychology, and linguistics theories into ITS applications (Anderson et al., 1990; Johnson & Richel, 2000). The 2006 publications were related to AI technology breakthroughs (Hinton et al., 2006) and the application of deep learning technologies to ITSs (Conati, 2009; Desmarais & Naceur, 2013; Koedinger et al., 2012). Research also strengthened and deepened in various subfields, as shown by the several co-citation reference cluster branches: #0 human–computer interactions, #5 probabilistic models, #8 computer-mediated communication, #10 Cognitive tutors, #18 meta-cognitive skills, #23 semantic web-based educational systems, #13 latent semantic analyses, #4 STEM learning, #16 data mining, and #6 computer-supported collaborative learning. Of these, human–computer interactions was the largest research cluster in this stage.

The appearance of these multiple research branches in the second stage was directly related to AI technological breakthroughs, which opened up the field to different types of ITSs. For example, the progress made in the development of probability models improved the construction of student models, and the multidisciplinary pedagogy, cognitive psychology, and linguistics theory developments deepened the ITS application research possibilities.

Two research branches appeared as the intellectual bases in the third stage (#1, #3, #14, #15, #2). Cluster #1 was associated with developments in computational linguistics and was an extension of clusters #6 and #23 and of semantic analysis. The references in cluster #1 were mainly associated with ITS text analysis applications such as Coh-Metrix. The second branch (#3, #14, #15, #2) was an extension associated with the combination of computer technology and pedagogy in ITSs, such as problem-centered instruction and STEM. Some of the themes that appeared in clusters #4 and #5 reappeared in clusters #2, #3, and #14, which suggested that STEM learning and model construction were common ITS research concerns. The two research branches in the third stage, therefore, appeared to be related to computer technology spillover effects, as they represented progress in areas such as the associations between natural language processing (NLP) and social science and ITS text analysis applications such as Coh-Metrix; that is, the research in this stage deepened the research directions identified in the previous stages.

The scientometric analyses of ITS research revealed the multidisciplinary stage of discipline formation, as highlighted in Shneider’s theory (2009). Shneider proposed that discipline formation may pass through four stages: the initial conceptualization stage, the multidisciplinary stage, the expansion stage, and the final stage of decay (Shneider, 2009). Researchers were applying methods from other disciplines to deepen the ITS application-oriented research. Several studies (Choi & Clin, 2006; Garnder, 1987; Núñez et al., 2019) claimed that multidisciplinary science was a weaker version of interdisciplinary research, which creates its own theoretical, conceptual, and methodological identity (Núñez et al., 2019). It could also be argued that ITS research is currently multidisciplinary, as specific ITS communication systems or specific theories have not yet been developed. Overall, the main characteristics of the studies in this period were originality and creativity, which tended to indicate that to effectively extend into new research areas, researchers needed to tolerate high risk when choosing their tasks (Shneider, 2009).

RQ 4: Significant references in the co-citation networks

Frequently co-cited references imply advanced ideas and developments in a given research field (Anwar et al., 2019). Table 5 shows the top 10 most frequently co-cited references and gives a brief description of the articles. Articles with high betweenness centrality scores generally indicate a fundamental transition in the research knowledge domain paradigm (Chen & Guan, 2011) and the significant influence these references have on the co-citation structure (Li & Chen, 2016). In CiteSpace, betweenness centrality is computed algorithmically and nodes with high values are marked with magenta circles, which allows researchers to understand the changes in a domain’s knowledge structures over time (Chen, 2005).
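As an illustration of why a reference that connects otherwise separate clusters scores highly, the following minimal sketch computes normalized betweenness centrality on a small undirected, unweighted graph using Brandes' algorithm. The graph, the node names, and the function name `betweenness` are hypothetical, and CiteSpace's own implementation and normalization may differ.

```python
from collections import deque


def betweenness(adj):
    """Normalized betweenness centrality for an undirected, unweighted graph."""
    totals = {v: 0.0 for v in adj}
    for source in adj:
        order = []                        # nodes in order of BFS discovery
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        sigma = {v: 0 for v in adj}       # number of shortest paths from source
        depth = {v: -1 for v in adj}
        sigma[source], depth[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                totals[w] += delta[w]
    n = len(adj)
    # Each unordered pair was counted from both endpoints; dividing by
    # (n - 1)(n - 2) removes the double count and scales to [0, 1].
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: total * scale for v, total in totals.items()}


# Hypothetical co-citation graph: two small clusters joined by one reference.
graph = {
    "a1": ["a2", "bridge"],
    "a2": ["a1", "bridge"],
    "bridge": ["a1", "a2", "c1", "c2"],
    "c1": ["bridge", "c2"],
    "c2": ["bridge", "c1"],
}
centrality = betweenness(graph)
```

Here every shortest path between the two clusters passes through the node named `bridge`, so it is the only node with nonzero centrality, which mirrors the role of a reference sitting at the junction of several clusters.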

Table 5 Top 15 co-cited references

The article with the highest betweenness centrality (centrality = 0.32) was “Cognitive tutors: Lessons learned” (Anderson et al., 1995), which presented several empirical studies on ITS learning effects, the development of procedures when cognitive models were incorporated into tutoring systems, and the development of individual instruction. The article also reviewed the use of knowledge tracing (KT) and Bayesian technologies to assess the probabilities that students had learned the principles in the cognitive model. This research article belonged to cluster #5 (probabilistic models, adaptive testing) and sat at the connection point of clusters #9, #5, and #8 in Fig. 6. Therefore, this study was a notable junction in the intellectual base, marking the transition from single-chain development to multiple research subfields.

RQ 5: Emerging ITS trends

Emerging research‐front concepts were identified using CiteSpace’s Burst detection of the article citations (Chen, 2006), and were specifically detected using the Kleinberg algorithm (Feng et al., 2015). Co-cited references with high Burst values indicate that the number of co-citations of an article suddenly increased sharply (Chen, 2012; Hou et al., 2018). The citation reference themes with high Burst values in a specific period, therefore, reflect the research trends in specific domains for the next few years (Hou et al., 2018).
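The idea behind burst detection can be sketched with a two-state version of Kleinberg's batched burst model: each time period is either in a baseline state or in a burst state with an elevated citation rate, entering the burst state carries a penalty, and the cheapest state sequence is found by dynamic programming. The sketch below is an assumption-laden illustration, not CiteSpace's implementation: the parameters `s` and `gamma`, the yearly counts, and the function name `burst_states` are all hypothetical.

```python
from math import lgamma, log


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each time batch 0 (baseline) or 1 (burst) via a two-state Viterbi pass."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # baseline rate over the whole period
    if not 0 < p0 < 1:
        raise ValueError("need a baseline rate strictly between 0 and 1")
    p1 = min(s * p0, 0.9999)           # elevated rate in the burst state
    up_cost = gamma * log(n)           # penalty for entering the burst state

    def cost(p, r, d):
        # Negative log of the binomial probability of r hits among d items.
        log_choose = lgamma(d + 1) - lgamma(r + 1) - lgamma(d - r + 1)
        return -(log_choose + r * log(p) + (d - r) * log(1 - p))

    best = [0.0, float("inf")]         # the sequence starts in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        stay0, drop = best[0], best[1]            # leaving a burst is free
        rise, stay1 = best[0] + up_cost, best[1]
        prev0 = 0 if stay0 <= drop else 1
        prev1 = 0 if rise <= stay1 else 1
        best = [min(stay0, drop) + cost(p0, r, d),
                min(rise, stay1) + cost(p1, r, d)]
        back.append((prev0, prev1))

    # Trace the cheapest path backwards from the final period.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for prev in reversed(back[1:]):
        state = prev[state]
        states.append(state)
    return states[::-1]


# Hypothetical yearly data: citations to one reference out of 20 citing papers.
citations = [1, 1, 1, 8, 9, 1, 1]
citing_papers = [20] * 7
states = burst_states(citations, citing_papers)
```

With these toy counts, the fourth and fifth periods are labeled as a burst because the gain in fit from the elevated rate outweighs the cost of entering the burst state.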

Figure 7 shows the co-cited reference areas with high Burst values, most of which were located between 2019 and the first half of 2020, with the red dots representing the references with high Burst values. The image in the larger red circle is an enlarged drawing of the area within the smaller red circle, and the thickened lines in the larger red circle represent the citation relationships between the co-cited references with high Burst values from 2019 to 2020. The latest co-cited reference clusters with high Burst values were mainly clusters #1 and #2, the themes of which reflect the emerging ITS research trends.

Fig. 7 Evolution of co-cited reference with high Burst values

Cluster #1 was mainly associated with Coh-Metrix research (Graesser et al., 2014; Graesser, Conley, et al., 2011; Graesser, Mcnamara, et al., 2011; McNamara et al., 2010, 2014). Graesser et al.’s (2014) review, “Coh-Metrix measures text characteristics at multiple levels of language and discourse,” summarized how five factors (narrativity, syntactic simplicity, word concreteness, referential cohesion, and deep/causal cohesion) accounted for text variations, and also reported on analyses that augmented Coh-Metrix. McNamara et al. (2010) investigated the validity of Coh-Metrix as a measure of cohesion and coherence in texts, finding that the Coh-Metrix cohesion indexes were able to significantly distinguish high-cohesion from low-cohesion text versions. This research revealed that Coh-Metrix was one of the most popular research applications in recent years.

In cluster #2, the top terms were “problem-centered instruction” and “STEM,” and three research articles were prominent once the frequently co-cited reviews were excluded.

The first identified paper was the “Effectiveness of Cognitive Tutor Algebra I at Scale” (Pane et al., 2014) (burst = 4.15), which reported on a two-year study of Cognitive Tutor Algebra I (CTAI) to compare the learning effects of personalized, mastery learning, blended learning, and CTAI alone. It was found that there was no noticeable learning effect in the first year, but positive effects were found in the second year. A second prominent paper was “Deep knowledge tracing” (Piech et al., 2015) (burst = 4.05), which introduced Recurrent Neural Networks (RNNs) for student modeling in large-scale online teaching environments. The RNN model family was found to capture the complex representations of student knowledge more easily than earlier models and was able to substantially improve student performance predictions. The third paper was “Stupid Tutoring Systems, Intelligent Humans” (Baker, 2016) (burst = 4.02), which examined the importance of human intelligence in ITS research from a critical perspective, proposed a new teaching paradigm based on educational data mining and learning analytics methods, and emphasized the value of human wisdom when developing ITS applications.

Of the top five references with the strongest citation bursts in 2019–2020 in clusters #1 and #2, there were three meta-analytic reviews: “Intelligent Tutoring Systems and Learning Outcomes: A Meta-Analysis” by Ma et al. (2014) (burst = 10.16); the “Effectiveness of Intelligent Tutoring Systems: A Meta-Analytic Review” by Kulik (2015) (burst = 9.1); and “A meta-analysis of the effectiveness of intelligent tutoring systems on college students’ academic learning” by Steenbergen-Hu and Cooper (2014) (burst = 9.1). These references analyzed the effectiveness of the ITS-related learning effect evaluations in different environments.

Kulik (2015) conducted a meta-analysis of 50 controlled intelligent computer tutoring system evaluations and found that the intelligent tutoring had raised test scores by 0.66 standard deviations over the conventional level. Ma et al. (2014) examined 107 findings from 73 separate reports, finding that the average ITS effect resulted in an improvement in test scores of 0.43 standard deviations, but also found that the improvements depended to a great extent on whether they were measured using locally developed or standardized tests. The meta-analytic review by Steenbergen-Hu and Cooper (2014) analyzed 35 ITS effectiveness evaluations in colleges, finding that the ITS applications resulted in increases in the overall test scores by approximately 0.35 standard deviations, but that the type of control group strongly influenced the evaluation results.
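The "standard deviations" reported in these meta-analyses are standardized effect sizes. As a minimal sketch of how such a value arises for a single study, the following computes a standardized mean difference (Cohen's d) from two groups of test scores. The scores and the function name are hypothetical, and the cited reviews may have used a different estimator or small-sample correction, so this illustrates the unit of measurement rather than their exact procedure.

```python
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Hypothetical test scores for an ITS group and a conventional-instruction group.
its_scores = [2, 4, 6]
control_scores = [1, 3, 5]
d = standardized_mean_difference(its_scores, control_scores)  # 0.5 for this toy data
```

An effect of 0.5 here means that the ITS group's mean score sits half a pooled standard deviation above the control group's mean, which is the same scale on which the values 0.66, 0.43, and 0.35 above are expressed.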

The other two references with the strongest citation bursts were a research article and a book. The article “Coh-Metrix: Providing Multilevel Analyses of Text Characteristics” (Graesser et al., 2011) (burst = 13.09) identified “word concreteness,” “syntactic simplicity,” “referential cohesion,” “causal cohesion,” and “narrativity” as the main factors accounting for the most variance in texts across grade levels and text categories using Coh-Metrix. The book “Automated Evaluation of Text and Discourse with Coh-Metrix” (McNamara et al., 2014) (burst = 12.37) provided a comprehensive introduction to Coh-Metrix from both theoretical and practical perspectives, and commented that the development of Coh-Metrix had resulted in a new paradigm that integrated language, corpus analyses, computational linguistics, education, and cognitive science research.

The top ten co-cited references with high burst values from 2019 and the first half of 2020 are listed in Table 6. Because most of these were reviews (7 reviews, 2 articles, and 1 book), the proportion of original research was relatively small, which showed that scholars in this field paid more attention to reviews in this period. Generally speaking, reviews do not produce new knowledge as they only analyze previous studies, which seems to indicate that there have been fewer original research breakthroughs in recent years, or that any breakthroughs have not received widespread attention. This situation was further confirmed by a comparison with Table 5, in which the top co-cited references over the whole period were research articles. Liu and Hu (2018) believed that although there had been wide-ranging discussions on education ITS developments, there was a great deal of high-level research repetition but few innovative breakthroughs. This was possibly because the ITS field has lacked a set of adaptive theoretical frameworks and basic guiding theories or assumptions, which should not be broad and descriptive, but should directly reflect the characteristics and essence of ITS-associated learning and education (Liu & Hu, 2018). Thus, we could boldly infer that there have been few obvious breakthroughs in ITSs in recent years and that there may be a high level of research repetition.

Table 6 Top 10 co-cited references with high burst value in 2019–2020.5


Publication growth patterns and top contributors

In response to RQ 1, it was revealed that there had been fluctuating growth in yearly ITS publications and that the number of citations had grown exponentially, which indicated that the ITS field was still in its initial developmental stage based on the Price literature exponential growth curve. However, high developmental potential was evident, which indicated that ITSs and ITS research are expected to flourish in the coming years.

The results implied that the milestones in the development of AI led to the three-stage development of ITS research. ITSs were first proposed in 1963, and after the Lighthill report (1973) was published, research into AI was reduced, with the study of ITSs commensurately reducing until 1985, as shown in Fig. 1. The development of the backpropagation (BP) algorithm made the training of large-scale neural networks possible, which resulted in increased interest in AI research and a renewed interest in ITSs. However, AI research experienced a second downturn in the late 1980s and early 1990s before recovering. When Deep Blue defeated the world chess champion in 1997, ITS research increased. When Hinton’s Deep Belief Net made a breakthrough in AI algorithms in 2006, there was an explosion in ITS research publications.

The most productive country/region for ITS research has been the USA, the most productive author has been Graesser, the top journal in terms of the number of publications has been Computers & Education, and the University of Memphis has contributed the most ITS research studies.

The multidisciplinary integration of ITSs

To answer RQ 2, a new viewpoint was introduced to explain the subdiscipline composition. It was found that computer science, education research, psychology, and engineering have been the main ITS research knowledge sources, with computer science taking the dominant position. Due to their application-oriented characteristics, ITSs have had a unique discipline integration feature, which means that ITS research has tended to harness the latest achievements from other disciplines (Shneider, 2009). Given the continuing rapid technological developments, several ITS research subdivisions emerged associated with educational theory, cognitive psychology, and linguistics. This study explicated the path of multidisciplinary integration of ITS research across the social and natural science fields. Social science research in ITSs developed along with the evolution of natural science and technology.

The category analyses also clearly revealed ITS’s multidisciplinary integration path. While ITSs originated from an educational discipline, since 1986, computer science research has become its most important knowledge base; however, from 2007, social science-based publications have exceeded natural science-based publications.

The most popular research issues

The top keywords by co-occurrence frequency were identified as interactive learning environments, student modeling, teaching/learning strategies, machine learning, human–computer interface, and Coh-Metrix, which reflected the most popular issues in ITS research.

Intellectual base and emerging research trends of ITSs

The intellectual base evolution was analyzed by examining the co-cited references, the clusters for which were divided into three chronological research stages: the first stage mainly focused on ITS applications; the second stage led to various subfields, namely human–computer interaction, probabilistic models, computer-mediated communication, cognitive tutors/meta-cognitive skills, semantic web-based educational systems/latent semantic analysis, STEM learning, data mining, and computer-supported collaborative learning; and the third stage gave rise to two main research subfields, namely computational linguistics and the combination of computer technology and pedagogy.

The latest co-cited reference clusters containing references with high Burst values were clusters #1 and #2, the themes of which reflected the emerging ITS research trends. Cluster #1 was mainly focused on Coh-Metrix, indicating that text analysis/NLP and other research around Coh-Metrix was the current research trend. Cluster #2 was focused on problem-centered instruction and STEM learning, which have been traditional ITS research concerns. Model construction and critical thinking for the development of ITS applications were also found to be important recent research themes.

From the scientometric view, it was argued that ITS research is currently multidisciplinary and that researchers should therefore implement ideas and tolerate high risk when choosing research tasks (Shneider, 2009).


Here we wish to raise a question for future research. What has been the effect of ITSs on education? To answer this question, we need to consider the effects and functions that ITSs are seeking to achieve, the extent to which technology can be applied, and the initial desire for ITSs from an educational viewpoint. This question comes from the vision of education. It does not need to be answered immediately, and there might be more questions to be addressed, especially given the great challenges that the global COVID-19 pandemic has brought to education. Many students around the world are still taking classes at home. What new requirements do these circumstances place on ITSs? We hope that raising these questions will inspire researchers and educators to explore more undeveloped research areas and to have more in-depth discussions.

Limitations and future research

This article had some limitations, as the data were only collected from the SCIE/SSCI indices in the WoS database. Although the WoS database contains a large volume of high-impact research articles, some valuable research has also been published in books or collected in other indices. Diversified data sources should therefore be considered in the future. This study also excluded reviews from the data collection. Future studies could analyze the trends in review research, including systematic reviews, literature reviews, and meta-analyses.