1 Introduction

English is not a primary language in Korea. As a result, very few people use English for communication in their daily lives, and English is taught, particularly in the early stages of education, purely as an academic subject. Korea’s primary education is provided by both public and private schools, and the two types differ in the quality of education they provide. In the case of English, public schools often rely on Korean teachers lecturing to students in a non-native context, whereas private schools can afford to hire native English teachers and provide a wide array of extracurricular activities to further foster learning. Because of this discrepancy in budgets, private schools are more expensive than public schools. Consequently, children’s English proficiency is influenced by parental socioeconomic status (henceforth, SES). Such gaps are even larger in other subjects, such as mathematics [1, 2]; for English, however, the gap widens as more English education takes place outside the regular school curriculum, such as at private institutions or at home. Many Korean children learn English at private institutions fully funded by their parents, often starting before they enter elementary school and continuing alongside the regular curriculum.

EBS, the South Korean public educational broadcasting service, has made continuous efforts to compete with expensive private education by providing quality online education for free. It provides recorded lectures on many subjects, including English. This service came into the spotlight as the need for online education grew with the COVID-19 outbreak. At the onset of the pandemic, many schools were forced to switch from offline classes to online classes to prevent the spread of COVID-19. However, most schools at the time lacked the capacity and infrastructure to offer quality online classes. To prevent the complete suspension of public education, the government requested that EBS provide online class materials for elementary, middle, and high school students. Yet recorded one-way lectures that lacked interaction between teachers and students struggled to motivate students or capture their academic interest. In addition to concerns about an overall decline in the quality of education, the gap in students’ learning associated with differences in parental SES was expected to widen during the pandemic. Such gaps were likely to be larger for elementary English education, which is strongly influenced by parents. A study conducted in Japan in 2002 showed that the gap widened when Saturday school classes were cancelled and students consequently spent more time at home [3].

Before the outbreak of COVID-19, EBS had started a project to develop an artificial intelligence-based English tutoring platform called AI-Pengtalk with technical support from the Electronics and Telecommunications Research Institute (ETRI) and the NHN consortium. Pengsoo, the EBS mascot that is very popular with children, is an AI English teacher that teaches English conversation through the AI-Pengtalk platform. Pengsoo is equipped with ETRI’s perception and learning algorithms, and AI-Pengtalk adopts NHN’s game-design elements and game principles (Footnote 1). This project was funded by the Ministry of Education in South Korea. EBS originally planned to provide AI-Pengtalk as a supplement for English classes in schools. Because of the COVID-19 pandemic, AI-Pengtalk was introduced earlier than planned to supplement online English classes, and a study to test its effectiveness was accordingly conducted ahead of schedule. Through nationwide recruitment, 108 fourth-grade classes in 54 elementary schools voluntarily participated in this study. In each school, one class was randomly selected to use AI-Pengtalk in English classes, while the other class was assigned to use EBS’s conventional online materials. The experiment was conducted over four weeks, from April 27 to May 22, 2020.

This paper investigates whether AI-Pengtalk improves students' English proficiency and attitudes towards English. It also examines whether AI-Pengtalk mitigates the disparity in English ability caused by differences in students’ endowments, including parental SES. Data obtained from the experiment, such as pre- and post-experiment English test scores, were used for the empirical analysis.

2 Related literature

Artificial intelligence (AI), enhanced by recent developments in big data and machine learning, has been applied to education services. AI designed explicitly for tutoring goes by various names. For example, a chatbot, a portmanteau of “chat” and “robot”, is a computer programme that can interact with a human using natural language [4]. Early chatbots were chiefly text-based programmes that understood and answered text entered by users. Recently, however, they have evolved into voice-based or multimedia systems that communicate by combining STT (speech-to-text) and TTS (text-to-speech) technologies. Chatbots can give users immediate feedback through dialogue during learning and provide customized content based on that feedback, which makes interaction easy. Because of these characteristics, they are used in education mainly for language learning. Duolingo, the quintessential example, uses AI technology to conduct real-time conversations with learners in specific situations. Eggbun supports learning through an interactive interface and provides feedback through preset menus and answers for word and sentence learning. Other chatbots currently being developed and used for language study include Alexa (developed by Amazon), Google Assistant (developed by Google), and Cleverbot (developed by the British scientist Rollo Carpenter).

Recently, research has been conducted to determine the possibility of using AI-based software for learning English and its educational effectiveness [5,6,7,8,9,10,11]. Randall [12] provides an in-depth literature review on the effects of using robots in language learning. One of the most representative studies is Köse and Arslan [5], which was conducted among university students in Turkey, Italy, and Romania. In this study, researchers compared a control group in which English classes were conducted using only the conventional face-to-face method and an experimental group that used both the face-to-face method and AI-based online learning software, as shown in Fig. 1. They found that higher academic performance and test scores were achieved when AI-based online learning software was combined with the existing method [5].

Fig. 1 The main steps of analysis and processing to perform human–computer conversation (source: Abdul-Kader and Woods, 2015)

The main advantages of AI-based learning software are that learners can learn autonomously at their own pace; learning materials are tailored to each learner's level; learning is possible anytime and anywhere, which is convenient for repetitive learning; and it can increase motivation, foster a sense of ease and comfort in the learning environment, stimulate essential learning behaviours, and smooth information and communication processes [5, 6, 13,14,15,16,17]. However, AI chatbots remain limited in their capacity to train students’ conversational English skills. It has been observed that AI chatbots often fail to recognize non-native speakers’ English pronunciation and that conversations on free topics remain limited because of the lack of a knowledge database for dialogue [18,19,20,21,22].

Despite the rapid expansion of AI technology in second-language education, few studies evaluate its effectiveness using participants' actual academic performance data. Some studies use test scores to show the effect of robot-assisted language learning. However, the data used in these studies do not appear to come from randomized experimental designs and/or are based on limited samples. For example, Ruan et al. [9] used data from 60 voluntary participants at a university in China to statistically evaluate the learning benefits of EnglishBot. Wang et al. [23] used data from 327 students at a primary school in China, so their sample represented only a limited part of the population or peer group. To our knowledge, this study is the first to conduct a nationwide randomized experiment, collect students’ academic performance data, and empirically test whether using AI-based software improves students’ conversational English learning.

3 Study design

3.1 AI-Pengtalk, a two-way English tutoring AI

AI-Pengtalk is a chatbot (chat robot) powered by AI that interacts with students, perceives their personality and language skills, and provides customized English conversation classes. As shown in Fig. 2, EBS’s popular mascot Pengsoo is introduced as a native English teacher and/or as a friend in AI-Pengtalk. According to Randall [12], humans’ perception of the emotional expressions of AI plays a crucial role in human–robot interaction. AI-Pengtalk therefore employs Pengsoo, who uses facial expressions and body movements to convey actions, intentions, and emotions.

Fig. 2 Pengsoo in “AI-Pengtalk”

AI-Pengtalk works in both online and offline educational environments so that students can learn English regardless of their location. Its content for fourth graders was developed based on the five most commonly selected English textbooks in Korea. It is built on AI technologies, including machine learning and voice recognition. Students can hold real conversations with the 3D character “Pengsoo” and see the scores of their conversation practice immediately. AI-Pengtalk is linked with a scientific LCMS (Learning Content Management System), which uses learning big data to enable students to learn English with minimum effort (Fig. 3). When students interact with an AI-embodied tutor (software), they feel less nervous about making mistakes [9, 23, 24]. Furthermore, AI-Pengtalk encourages students to talk and practise repeatedly and regularly, which helps them build confidence in English over time [25].

Fig. 3 AI-Pengtalk system

Intonation, one of the most difficult aspects of English for second-language learners, can be improved through real-time visualization of a student’s own intonation compared with that of a prerecorded native speaker (see Fig. 4).
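The paper does not describe how AI-Pengtalk compares a learner's intonation with the native recording. As a minimal sketch, assuming pitch (F0) contours in hertz have already been extracted from both recordings, one simple comparison converts each contour to semitones, resamples both to a common length, and z-normalizes them so that only the shape of the melody matters, not the speaker's pitch range. The function names, the 100 Hz reference, and the 50-point resampling are illustrative assumptions, not AI-Pengtalk's actual method.

```python
import math
from statistics import mean, pstdev


def to_semitones(f0_hz, ref_hz=100.0):
    """Convert voiced pitch values (Hz) to semitones; 0 Hz frames are treated as unvoiced and dropped."""
    return [12 * math.log2(f / ref_hz) for f in f0_hz if f > 0]


def resample(values, n):
    """Linearly resample a contour to n points (n > 1) so contours of different lengths align."""
    if len(values) == 1:
        return values * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def zscore(values):
    """Remove the speaker's average pitch and range so only the contour shape remains."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 for _ in values] if sd == 0 else [(v - mu) / sd for v in values]


def intonation_distance(learner_f0, native_f0, n=50):
    """Hypothetical score: mean absolute gap between normalized contours (0 = identical shape)."""
    a = zscore(resample(to_semitones(learner_f0), n))
    b = zscore(resample(to_semitones(native_f0), n))
    return mean(abs(x - y) for x, y in zip(a, b))
```

Because of the normalization, a learner whose pitch rises and falls like the native model scores close to zero even when the two voices differ in pitch range. A production system might instead align the contours with dynamic time warping to tolerate differences in speaking rate.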

Fig. 4 English conversation analysis in AI-Pengtalk

The effectiveness of immediate feedback with visualization has been well documented by previous studies [26, 27]. Teachers can closely monitor students’ learning status through the teacher’s portal on the website. Moreover, teachers can check the overall status of students’ progress before class and plan upcoming classes accordingly. Teachers can trace how students have studied with AI-Pengtalk, how long they have studied, and their progress in more than five categories, such as “Topic World”, “Speaking”, “Let’s Talk”, “Scan It”, and “School Talk”. During class, teachers can use AI-Pengtalk for both individual and group work. After class, students can continue their studies and reflect on feedback from their teachers (Fig. 5).
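As an illustration of the kind of aggregation a teacher's portal performs, the sketch below totals study time per student and per learning category from usage logs. The (student_id, category, seconds) record format and the function name are hypothetical; the paper does not document AI-Pengtalk's log schema.

```python
from collections import defaultdict


def summarize_progress(log_records):
    """Aggregate hypothetical (student_id, category, seconds) log records.

    Returns {student_id: {"total_seconds": int, "by_category": {category: seconds}}}.
    """
    summary = defaultdict(lambda: {"total_seconds": 0, "by_category": defaultdict(int)})
    for student_id, category, seconds in log_records:
        entry = summary[student_id]
        entry["total_seconds"] += seconds
        entry["by_category"][category] += seconds
    # Convert the nested defaultdicts to plain dicts for display in a portal view
    return {
        sid: {"total_seconds": e["total_seconds"], "by_category": dict(e["by_category"])}
        for sid, e in summary.items()
    }


# Illustrative records using category names from the text
records = [("s1", "Speaking", 300), ("s1", "Let's Talk", 120), ("s2", "Speaking", 60)]
print(summarize_progress(records))
```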

Fig. 5 AI-Pengtalk learning system

3.2 Experimental design

We used a randomized controlled trial that randomly assigned the use of AI-Pengtalk. Figure 6 depicts the study design. To reflect the regional distribution of all elementary schools nationwide, we set a target number of participating schools for each of 18 regions. There are 17 provinces in Korea, but we assigned two regions to Seoul given the high level of heterogeneity between the Gangnam and non-Gangnam areas of Seoul. With the help of the Ministry of Education and EBS’s school network, we recruited schools nationwide to participate voluntarily in the study. Interested schools could join the experiment through the EBS English website, and when the target number of schools for a region was met, we closed the application window for that region. In total, 54 schools took part. In each school, two fourth-grade classes taught by an English teacher were selected; one was randomly assigned to the treatment group and the other to the control group. A total of 940 fourth graders across the country participated in our AI-Pengtalk experiment. Students in the treatment group (T) used AI-Pengtalk in English conversation classes, whereas students in the control group (C) used EBS’s traditional online content. This separation was maintained throughout the study period from April 27 to May 22, 2020. After the study period, AI-Pengtalk became available to every fourth-grade student in Korea.
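The within-school assignment described above (two classes per school, one treated) can be sketched as a paired, or blocked, randomization. The school and class identifiers and the fixed seed are hypothetical; the seed simply makes the assignment reproducible and auditable.

```python
import random


def assign_within_schools(schools, seed=2020):
    """Randomly assign one class per school to treatment (T) and the other to control (C).

    `schools` maps a hypothetical school ID to a pair of class IDs.
    Returns {(school_id, class_id): "T" or "C"}.
    """
    rng = random.Random(seed)  # dedicated generator so the draw can be reproduced
    assignment = {}
    for school_id, classes in sorted(schools.items()):  # sorted for a deterministic order
        if len(classes) != 2:
            raise ValueError(f"{school_id}: expected exactly two classes")
        treated = rng.choice(classes)
        for cls in classes:
            assignment[(school_id, cls)] = "T" if cls == treated else "C"
    return assignment


# 54 hypothetical schools, each contributing two fourth-grade classes
schools = {f"S{i:02d}": (f"S{i:02d}-A", f"S{i:02d}-B") for i in range(54)}
assignment = assign_within_schools(schools)
```

Blocking on school balances school-level factors such as region and school resources by design, since every school contributes one class to each arm.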

Fig. 6 Setting samples for this study

Two English tests were administered, one before and one after the experiment (Footnote 2), to measure students' improvement in English proficiency. In addition to test scores, responses to two surveys were used in our analysis. The surveys were conducted using Google Online Survey. The pre-experiment survey included questions about attitudes towards English (such as confidence in English, subjective evaluation of English skills, interest, motivation, and desire for English skills), self-evaluated level of English proficiency, and previous personal efforts for improvement (such as study hours and English learning methods). The questionnaire was developed by EBS with the assistance of teachers participating in this study. The post-experiment survey contained identical questions, and questions on the experience of using AI-Pengtalk were added for students in the treatment group. As shown in Table 1, not all participants completed both surveys and both tests. In the control group, only 60.2% of first-test participants took the second test, and the proportion that completed both surveys was even lower (51.2%). In the treatment group, retention was higher: 70.6% of participants took both tests, and 58.4% completed both surveys.

Table 1 The number of students who took part in surveys and tests

3.3 Sample characteristics and tests for randomization balance

The first plot of Fig. 7 compares the distributions of pre-experiment English conversation test scores for students in the control and treatment groups and shows no significant difference between the two groups. Table 2 summarizes the learning-related aptitudes and personal characteristics of students in both groups. The last column of Table 2 reports t test results: there was no statistically significant difference in these attributes between the two groups at baseline, which is consistent with successful randomization.
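Balance checks of this kind compare baseline means between the two groups. A minimal sketch of Welch's two-sample t statistic, which does not assume equal variances, is given below using only the Python standard library. The p-value uses a normal approximation, which is reasonable only for large degrees of freedom such as those in this sample; `scipy.stats.ttest_ind(x, y, equal_var=False)` gives exact t-distribution p-values. The paper does not state which t test variant was used, so Welch's version is an assumption here.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors of each mean
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; adequate only for large df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))
```

For a baseline covariate, a large p-value is what successful randomization would lead one to expect; with many covariates, a few significant differences can also arise by chance.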

Fig. 7 Distribution of samples across treatment and control groups at baseline

Table 2 T test results: learning-related aptitude and personality

Table 3 also summarizes survey responses on whether participants had learned English conversation at school or outside school before the experiment, and its last column reports the corresponding t test outcomes. As shown in the shaded area of Table 3, in some cases the proportion of students who had learned English conversation with native or Korean English teachers, either at school or at private institutes, differed significantly between the two groups over the period up to fourth grade. With the exception of these five cases, there were no significant differences in English-speaking learning experiences between the two groups. Ideally, the two groups would not differ in prior learning experience at all. However, students’ experiences with conversational English education immediately before the experiment (the first semester of fourth grade) did not differ significantly between the two groups (see Fig. 7). Therefore, if students with poor endowments, such as unfavourable parental SES, physical disability, or introverted personalities, showed improvement in their conversational English abilities after using AI-Pengtalk for four weeks, this would suggest that AI-Pengtalk has the potential to narrow the gap in English proficiency associated with unequal endowments. A second null hypothesis was established to test this differentiated impact of AI-Pengtalk.
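The second hypothesis concerns whether the DID differs across subgroups defined by endowment. One way to express this, assuming cell means are available for each subgroup, group, and period, is a triple difference: the DID for a focal subgroup minus the DID for a reference subgroup. The paper does not specify that this exact estimator was used, and the subgroup labels and numbers below are hypothetical toy values, not results from this study.

```python
def did(means, subgroup):
    """(T1 - C1) - (T0 - C0) for one subgroup; `means` is keyed by (subgroup, group, period)."""
    def cell(group, period):
        return means[(subgroup, group, period)]
    return (cell("T", 1) - cell("C", 1)) - (cell("T", 0) - cell("C", 0))


def triple_difference(means, focal, reference):
    """DID of the focal subgroup minus DID of the reference subgroup."""
    return did(means, focal) - did(means, reference)


# Toy cell means (hypothetical, not results from this study)
toy_means = {
    ("no_native_tutor", "T", 0): 50, ("no_native_tutor", "T", 1): 60,
    ("no_native_tutor", "C", 0): 50, ("no_native_tutor", "C", 1): 52,
    ("native_tutor", "T", 0): 60, ("native_tutor", "T", 1): 65,
    ("native_tutor", "C", 0): 60, ("native_tutor", "C", 1): 63,
}
print(triple_difference(toy_means, "no_native_tutor", "native_tutor"))  # prints 6
```

A positive value would indicate that the AI-Pengtalk effect is larger for the focal subgroup; its significance would still need to be assessed with an appropriate standard error.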

Table 3 T test outcomes: the proportion of students with experience in conversational English education

3.4 Testing hypothesis and estimation model

If AI-Pengtalk is an effective tool for teaching conversational English, students in the treatment group should show greater improvements in their test scores and in their attitudes towards English after the experimental period than students in the control group. Therefore, changes in test scores, attitudes towards English, and self-confidence in English were set as the variables of interest. We first describe how the effect of AI-Pengtalk on a student’s English conversation skills was evaluated. In Table 4, \({C}_{0}\) and \({C}_{1}\) are the expected test scores of a student in the control group before and after the experimental period, respectively, and \({T}_{0}\) and \({T}_{1}\) are their counterparts for a student in the treatment group. The difference between the two periods, \(({T}_{1} - {T}_{0})\), reflects two combined learning effects for a fourth-grade student in the treatment group: the effect of conventional English classes and the effect of studying with AI-Pengtalk. In other words, it does not isolate the net effect of AI-Pengtalk. The group difference after the experimental period, \(({T}_{1} - {C}_{1})\), includes not only the effect of AI-Pengtalk but also differences in students’ characteristics that can influence test scores.
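A simple additive model makes the two confounds explicit. Assume, for illustration, that expected scores take the form \(Y_{gt} = \alpha + \gamma D_g + \lambda t + \delta D_g t\), where \(D_g = 1\) for the treatment group and 0 otherwise, \(t \in \{0, 1\}\) indexes the period, \(\gamma\) is a baseline group difference, \(\lambda\) is the common change over time (for example, conventional classes and a harder post-test), and \(\delta\) is the effect of AI-Pengtalk. The symbols \(\alpha, \gamma, \lambda, \delta\) are introduced here for the derivation and are not part of the paper's notation.

$$\begin{aligned} C_0 &= \alpha, \quad C_1 = \alpha + \lambda, \quad T_0 = \alpha + \gamma, \quad T_1 = \alpha + \gamma + \lambda + \delta \\ T_1 - T_0 &= \lambda + \delta \\ T_1 - C_1 &= \gamma + \delta \\ \left( T_1 - C_1 \right) - \left( T_0 - C_0 \right) &= (\gamma + \delta) - \gamma = \delta \end{aligned}$$

Under this model, the before-after difference mixes in \(\lambda\) and the post-period group difference mixes in \(\gamma\), whereas the DID recovers \(\delta\). The key assumption is that, absent AI-Pengtalk, both groups would have changed by the same \(\lambda\) (parallel trends).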

Table 4 DID: pre- and post-comparison between two groups

The difference-in-differences (DID) estimator is frequently used to account for these confounders. After obtaining the mean values of the variables of interest for the two time periods (0 and 1) in the control and treatment groups (C and T), DID is calculated as \(({T}_{1} - {C}_{1}) - ({T}_{0} - {C}_{0})\), as shown in Table 4. Assuming that, without the intervention, both groups would have followed the same trend over time, this method extracts the net learning effect of AI-Pengtalk. The null hypothesis (\({H}_{0}\)) states that the DID value, \(({T}_{1} - {C}_{1}) - ({T}_{0} - {C}_{0})\), equals zero. If the null hypothesis is not rejected, we find no evidence that the use of AI-Pengtalk offers additional benefits in improving the English conversation skills of students in the treatment group. If the null hypothesis is rejected and the DID value is significantly positive, this implies that the use of AI-Pengtalk improves students’ English conversation skills.

$$H_{0} :E\left[ {\overline{Y}_{t = 1} - \overline{Y}_{t = 0} |T = 1} \right] = E\left[ {\overline{Y}_{t = 1} - \overline{Y}_{t = 0} |T = 0} \right] \to \left( {T_{1} - C_{1} } \right) - \left( {T_{0} - C_{0} } \right) = 0$$
$$H_{A} :\left( {T_{1} - C_{1} } \right) - \left( {T_{0} - C_{0} } \right) \ne 0$$
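For students observed in both periods, the DID in \(H_0\) equals the difference between the groups' mean change scores, which also yields a convenient standard error. The sketch below assumes paired pre- and post-experiment scores for each student (that is, completers only, which the attrition reported in Table 1 makes relevant) and uses a Welch-type standard error that ignores clustering by class or school; it is an illustration, not the authors' estimation code.

```python
import math
from statistics import mean, variance


def did_change_scores(pre_t, post_t, pre_c, post_c):
    """DID for a two-period panel as the difference in mean change scores.

    Equals (T1 - C1) - (T0 - C0) when the same students appear in both periods.
    Returns (estimate, standard_error); the SE ignores clustering.
    """
    if len(pre_t) != len(post_t) or len(pre_c) != len(post_c):
        raise ValueError("pre and post scores must be paired student by student")
    d_t = [b - a for a, b in zip(pre_t, post_t)]  # treatment-group changes
    d_c = [b - a for a, b in zip(pre_c, post_c)]  # control-group changes
    estimate = mean(d_t) - mean(d_c)
    se = math.sqrt(variance(d_t) / len(d_t) + variance(d_c) / len(d_c))
    return estimate, se
```

For a balanced panel, regressing the outcome on group, period, and their interaction gives the same point estimate; in that form, standard errors clustered by school or class would be a natural refinement.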

4 Analysis of experimental outcomes

4.1 AI-Pengtalk and English test scores

The effectiveness of AI-Pengtalk in improving English test scores was examined by comparing pre- and post-experiment test scores between the control and treatment groups. As shown in Table 5, both groups showed a large decrease in average scores on the post-experiment test. The pre-experiment test evaluated students’ English proficiency at a third-grade level, whereas the post-experiment test evaluated proficiency at the level of a first-semester fourth grader and was therefore substantially more difficult, which explains the drop in both groups’ average scores. The average score of the control group decreased by 22.1, while that of the treatment group decreased by 19.7. Thus, the treatment group’s average score decreased by 2.4 points less than that of the control group, although this difference was not statistically significant (Fig. 8).
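Because \((T_1 - C_1) - (T_0 - C_0)\) can be rearranged as \((T_1 - T_0) - (C_1 - C_0)\), the 2.4-point figure follows directly from the two reported changes in average scores:

$$\widehat{DID} = \left( T_1 - T_0 \right) - \left( C_1 - C_0 \right) = (-19.7) - (-22.1) = 2.4$$

To the extent that the harder post-experiment test lowers both groups' scores by the same amount, it cancels out of this difference, which is why the DID remains interpretable despite the large raw declines.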

Table 5 The DID value: English Test Scores

Fig. 8 Box plots for English test scores

4.2 AI-Pengtalk and self-evaluated improvement in English conversational skills

A DID approach was employed to test whether the use of AI-Pengtalk gave students the feeling that their English conversational skills had improved. To answer this question, we used survey data on students’ self-assessments of how much their English conversational skills had improved. Responses were measured on a 4-point scale (0 = not improved at all, 1 = slightly improved, 2 = somewhat improved, 3 = improved a lot). After the experimental period, the average self-assessed level of improvement was 0.34 for the treatment group, considerably larger than that of the control group (0.13). The resulting DID value of 0.21 was positive and statistically significant, rejecting the null hypothesis at the 1 per cent level (p < 0.01). This is depicted in Fig. 9.

Fig. 9 Box plots for students’ self-assessments of their English conversational skill improvement

We further investigated students’ responses to seven questions asked in both the pre- and post-surveys. Responses were obtained on a 5-point scale (1 = strongly disagree, 2 = disagree, 3 = uncertain, 4 = agree, 5 = strongly agree). As shown in Table 6, which summarizes the questions and the students’ responses, the degree of self-assessed improvement in conversational English skills was higher in the treatment group than in the control group. This finding is supported by the positive DID values in the last column of Table 7. The DID value was significantly positive for three questions, for which we reject the null hypothesis: “I can listen and repeat simple English sentences”; “If I go to a grocery market in a foreign country, I can order and pay for five of my favourite fruits in English”; and “If I go to a lost and found in a foreign country, I can ask if they found my lost blue hat in English”. It should be noted that the proportion of students who had learned English from native teachers was lower in the treatment group than in the control group (see Table 3 and Fig. 7). Studying with AI-Pengtalk may help offset this disadvantage.

Table 6 The DID value: Self-Evaluation of English Conversational Skill Improvement
Table 7 DID values: self-assessed English-speaking ability

4.3 AI-Pengtalk and English study hours

The pre- and post-experiment surveys asked students how much time they spent studying English, including time at school or private institutions, online classes, and self-study. If the DID value is positive and statistically significant, we can reject the null hypothesis and conclude that students’ English study time increases when they study with AI-Pengtalk. Table 8 summarizes students’ average English study days and hours and the corresponding DID values. Because the number of study days included private tutoring days, which were assumed to have decreased in frequency due to COVID-19, a decrease in the number of English study days was expected. However, the use of AI-Pengtalk appears to have limited this decrease: the average number of days the control group spent studying English decreased by 3.68, whereas it decreased by only 2.34 for the treatment group. The average number of English study hours increased by 0.07 for the treatment group and decreased by 0.02 for the control group. Similarly, the average number of hours spent self-studying English increased by 0.09 for the treatment group but decreased by 0.02 for the control group. Each DID value was statistically significant, so we rejected the corresponding null hypotheses. These results suggest that the use of AI-Pengtalk increases the time students spend studying English.
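Applying the same arithmetic, \((T_1 - T_0) - (C_1 - C_0)\), to the changes reported above gives the three DID values:

$$\begin{aligned} \text{Study days:} &\quad (-2.34) - (-3.68) = 1.34 \\ \text{Study hours:} &\quad 0.07 - (-0.02) = 0.09 \\ \text{Self-study hours:} &\quad 0.09 - (-0.02) = 0.11 \end{aligned}$$

These values are computed from the rounded changes given in the text, so the figures in Table 8 may differ slightly because of rounding.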

Table 8 Comparisons of English study time between the two groups

Not all students completed the full four-week experiment, so comparing the average test scores of the two groups may obscure the effectiveness of AI-Pengtalk. We therefore also compared English study time during the experimental period, obtained by examining the log files of participants in the control and treatment groups on EBS’s conventional English class website and the AI-Pengtalk platform, respectively (Footnote 3). As shown in Table 9, students in the treatment group spent more time studying with AI-Pengtalk (4,123 s on average, or about 1.15 h) than control-group students spent on conventional online classes (363 s on average). Within the treatment group, the standard deviation of AI-Pengtalk usage time was very large: the minimum was 10 s, indicating a student who effectively did not use the service, and the maximum was 42,118 s (11.7 h). Therefore, the effectiveness of AI-Pengtalk was remeasured using a regression model with a value-added specification. As emphasized in [28], the concept of value added is highly useful in the modern economics of education; here, additional study hours represent increased input into the education production function. Table 10 shows the relationship between post-experiment test scores and AI-Pengtalk usage time (Footnote 4). Controlling for pre-experiment test scores, post-experiment scores increased by 2.34 points for each additional hour of AI-Pengtalk use (see Model (1)). Models (2) and (3) add explanatory variables such as learning-related aptitude (Footnote 5), and their estimates also support the effectiveness of study time with AI-Pengtalk in improving English test scores.
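The value-added specification in Model (1) regresses the post-experiment score on the pre-experiment score and AI-Pengtalk usage time in hours. A dependency-free sketch of that regression, solving the ordinary least squares normal equations directly, is shown below. The function names and the synthetic data in the usage check are hypothetical, standard errors are omitted, and Models (2) and (3) would simply add further columns (such as learning-related aptitude) to the design matrix. In practice, a statistics package such as statsmodels (`statsmodels.api.OLS`) would be used so that standard errors are reported as well.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def value_added(pre_scores, usage_seconds, post_scores):
    """Fit post = b0 + b1 * pre + b2 * usage_hours; b2 is the points gained per hour of use."""
    X = [[1.0, pre, sec / 3600.0] for pre, sec in zip(pre_scores, usage_seconds)]  # log seconds -> hours
    return ols(X, post_scores)
```

Including the pre-experiment score on the right-hand side lets \(b_2\) be read as the gain associated with an extra hour of use for students who started at the same level, which is the value-added interpretation described above.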

Table 9 AI-Pengtalk Use Time (unit: second)
Table 10 AI-Pengtalk use time (unit: hour) and English test scores after the experiment

4.4 AI-Pengtalk and students’ attitudes towards English

A positive attitude towards English is important for students to study steadily and achieve long-term learning outcomes. In this regard, students responded to the following six items: “I am good at English”; “Studying English will help me do what I want to do later”; “I am interested in studying English”; “I do my best when I study English”; “I like to speak English”; and “I want to improve my English skills a lot”. Table 11 reports summary statistics of the students’ responses and the DID values. Before the experiment, the control group generally had more positive attitudes towards English than the treatment group. However, the control group’s attitudes became less positive for all questions after the experiment. Students in the treatment group showed similar changes for most questions, but their declines were smaller than those of the control group. The DID values were uniformly positive, and four were statistically significant, allowing us to reject the null hypothesis for those items. These results suggest that AI-Pengtalk helped students maintain more positive attitudes towards English.

Table 11 Students’ attitudes towards English

5 Conclusion

5.1 Summary

This paper examined whether AI-Pengtalk, an AI-based conversational English programme, is useful for learning conversational English and narrowing the English ability gap associated with differences in parental SES. To test this, 108 fourth-grade classes in 54 elementary schools voluntarily participated in a four-week experiment from 27 April to 22 May 2020. In each school, one of the two classes was designated as the treatment group and the other as the control group. For the treatment group, a tablet with AI-Pengtalk installed was provided. Surveys and English tests were administered before and after the experiment. After four weeks of using AI-Pengtalk, the test scores, log files, and survey responses of the participants were analysed. A series of DID analyses showed that the use of AI-Pengtalk improved students’ self-confidence in their English skills and overall interest in English and increased their English study time.

This study suggests that AI-Pengtalk is worth applying to customized learning and/or remote education. AI-Pengtalk is designed to evaluate a student’s English proficiency level, adjust the class accordingly, and enhance learning with pedagogy customized to each student. If students feel that something is lacking after class, they can repeat the material in the AI-Pengtalk app. This is even more valuable in exceptional situations such as the COVID-19 pandemic, when social distancing policies create a considerable need for non-face-to-face education. Such smart education can help underperforming or introverted students learn the basics remotely, and it can motivate students through immediate feedback and video game-like immersion. AI-based customized education may be especially helpful for academically low-achieving students and/or students with low parental SES. Therefore, this study advocates introducing such a system in schools through public support (Footnote 6) to increase the efficacy of English education and narrow the educational gap.

5.2 Discussion and limitations of this study

Young students from low-income households or from families living in less-developed rural counties are unlikely to have opportunities to learn English conversation with native English tutors. This results in a gap in English conversation ability between students with high and low parental SES. Our study finds that the effect of AI-Pengtalk on English conversational learning is larger for students without experience with native English-speaking tutors than for their counterparts. Given that education is one of the most representative public goods, we recommend that the government expand the use of AI technology in English education in line with the central pledge of the Sustainable Development Goals, “Leave No One Behind”. It should also be noted that students’ endowments include not only their parents’ SES but also their physical conditions and/or social skills. Compared with offline learning, the obstacles that students with disabilities face in AI-based learning are quite marginal. Additionally, consistent with previous studies [9, 23, 24], participants in this study reported that they were more comfortable having English conversations with AI-Pengtalk than with a human. The emotional benefits were particularly large for students who described themselves as shy. As suggested in a previous study [30], AI-Pengtalk can provide more personalized learning and guidance to students with special needs.

English conversation is one of the most difficult subjects to teach with a robot because it involves many unstructured elements and no predetermined conclusions. The effectiveness of robotic teaching in this area suggests that robots have great potential to improve learning in other subjects as well. Especially in subjects with many predetermined outcomes, such as mathematics, physics, and chemistry, robot teachers can enhance learning by visualizing or repeating existing knowledge at a pace appropriate to the student's level. However, this does not mean that robot teachers can replace human teachers entirely. Robot teachers are effective for students who are already self-regulated or self-motivated but are less effective for those without these characteristics. In addition, AI shows little promise for native-language education, because habit formation and repetitive practice, which are essential to language learning, are naturally supplied in everyday life for one’s mother tongue.

This study has several limitations. The outcome measures used in this study were test scores and responses to a post-experiment survey conducted after the four-week experiment. This study therefore tested the impact of AI-Pengtalk only on short-term changes in students’ English conversational performance and their attitudes towards English conversation; long-term effects were beyond its scope. In addition, because AI-Pengtalk provides students with various learning modules, its effectiveness may depend on which modules students choose. This study did not take this into account when evaluating the impact of AI-Pengtalk. Follow-up studies are therefore needed to increase the credibility of the findings by measuring whether actual AI usage data and actual learning results match [12]. We plan to address these topics in future studies and to provide guidelines for developing AI-assisted language learning software.