1 Introduction

In many Asian countries, students begin learning English in early childhood. In Taiwan, for example, English is taught as a foreign language (EFL) from the third grade of elementary school. English courses in Taiwan focus not only on building language skills but also on applying the language in daily communication. Therefore, building students’ English speaking skills and encouraging them to speak more English in daily life are priorities. However, EFL education usually ignores real-life scenarios in the design of learning activities, and teaching materials are often not natural enough for students to connect learning with their daily lives [1, 2]. In addition, EFL learners have few chances to speak English in daily life, so they often lose confidence and are afraid of speaking English in real-life situations. This points to a gap: EFL education often lacks real-life scenarios and natural learning materials that connect to students’ daily lives. On the other hand, although some researchers have focused on students’ speaking and listening performance and achieved remarkable improvements in EFL learning with intelligent speech signal processing technology [3, 4], students’ speaking mistakes have received little attention. Moreover, such technology has the potential to help students overcome difficulties and to give them instant feedback when practicing English speaking [5]. To address these challenges, we propose an English speaking practice system, the UEnglish app, which aims to connect English speaking practice to students’ daily lives. The UEnglish app uses automatic speech recognition (ASR) and text-to-speech (TTS) technologies to facilitate students’ English speaking and pronunciation and to provide them with instant feedback for revision.

In this study, we investigated the effects of the proposed UEnglish app on English speaking by comparing two groups that learned with and without UEnglish support. We also investigated students’ learning behaviors while using UEnglish and their influence on learning achievement in English speaking. The data analysis showed that UEnglish enhanced students’ learning achievement in English speaking, as well as their motivation, creativity, and overall well-being. These results imply that the UEnglish app, supported by speech recognition, can facilitate students’ English speaking in authentic contexts.

We expect that readers of this paper, such as educators, researchers, and practitioners in EFL education and mobile-assisted language learning, will recognize the following key takeaways. From the pedagogical aspect, the UEnglish app has the potential to improve students’ English speaking skills in authentic contexts, as well as their motivation, creativity, and health. From the technological aspect, instant feedback and effective support through speech recognition and text-to-speech technologies are significant.

The paper is organized into six sections. Sect. 1 presents the general background and points out the research gaps this study aims to fill. Sect. 2 reviews recent related studies relevant to the research objectives. Sect. 3 proposes the UEnglish app to help students practice English speaking in their daily lives. Sect. 4 describes the experimental design, research measurements, and research tools used in this study. Sect. 5 presents the data analysis results and discusses them in relation to previous studies. Finally, Sect. 6 highlights the unique findings and main contributions of the study.

2 Literature Review

In this section, we survey existing research on technology-assisted EFL learning to establish the context of our research, highlight the research gaps, and point out the unresolved research questions that our study aims to address. The first subsection provides an overview of previous studies on how mobile devices facilitate EFL learning in authentic contexts. The second subsection focuses on the potential of advanced technologies to support EFL speaking. The third subsection evaluates popular questionnaires suitable for authentic contextual learning and selects the one that best fits our study. The final subsection synthesizes the basis for the research questions.

2.1 Facilitating EFL Learning with Mobile Devices in Authentic Context

EFL learning in this study is viewed through sociocultural/constructivist theory, which draws on ideas from both biology and sociology to interpret language acquisition [6]. This theory states that children learn language out of a desire to communicate with their surrounding physical environment and culture. It also states that the environment and culture in which we grow up heavily influence how quickly and how well we learn to talk. This implies that EFL learning in authentic contexts, including the physical environment and culture, could provide more meaningful cognitive support than traditional indoor classrooms. Moreover, according to enactivism, which combines constructivism and cognition, the learning environment and cognition are inseparable: students do not only absorb knowledge from the environment passively but also interact with it actively to obtain knowledge [7]. Enactivism thus also supports learning in authentic contexts over traditional indoor classrooms.

Mobile technology allows learners to practice speaking actively, and many previous studies have reported that mobile technologies can enhance second language acquisition [8, 9]. Students therefore have the potential to practice English based on their surrounding daily-life contexts. However, authentic contexts still have limitations, such as the lack of instant guidance and support for practicing effectively and efficiently. Currently, with mobile devices supported by cloud services such as ASR, machine translation (MT), and TTS, students can practice English speaking anytime and anywhere based on the authentic contexts surrounding them in daily life. In this study, authentic contexts mean physical environments in the real world. Accordingly, authentic contextual learning is defined as a learning curriculum that engages learners in activities supported by authentic contexts. Authentic contextual learning can thus help learners gain rich experience from resources in authentic contexts with the support of advanced educational technologies.

Normally, students build on previous knowledge to acquire new knowledge and apply what they have learned to solve problems in daily life. However, EFL students usually hesitate to speak English and to apply what they have learned in their lives. Thus, the question facing teachers and researchers is how to help students learn EFL in authentic contexts. In this regard, authentic contextual learning supported by mobile devices was found not only to improve students’ language learning ability but also to increase their learning motivation [4].

2.2 Advanced Technologies Support EFL Speaking in Authentic Contexts

With the rapid development of natural language processing (NLP), more and more learning-assistance tools have been investigated to help students learn EFL in authentic contexts in a more student-centered and self-regulated way. In mobile-assisted language learning with NLP, several technologies, such as ASR, TTS, and MT, have facilitated English learning and improved students’ intrinsic motivation.

With the help of ASR, students receive immediate feedback so that they can revise and practice their speaking repeatedly [10]. They can not only improve the accuracy and fluency of their speaking but also produce more complicated sentences when expressing their thoughts and feelings. Moreover, ASR was found to help EFL learners improve pronunciation in a shorter time than traditional teaching methods [11]. Because ASR offers students more authentic opportunities to speak English, they can speak more interactively and use more contextualized language in scenario-based tasks [5]. Although ASR has previously been applied in English learning, there has been no in-depth analysis of the impact of speech recognition on learning English speaking. Therefore, this study aims to investigate the impact of speech recognition on students’ learning behaviors and learning achievement in English speaking.

Lee [12] found that MT is a useful tool for reducing students’ lexico-grammatical errors, lowering their anxiety toward language learning, and producing positive learning outcomes. However, Lee [12] also stated that teachers need to be aware of its reliability and should provide appropriate guidance to students, especially low-proficiency EFL learners, who require sufficient listening input to construct speaking output. In these cases, TTS technology, which converts text to spoken language and thus provides auditory input, can act as an important additional support tool [13]. With TTS, the text is presented aurally in an authentic way, so students may encounter less difficulty reading the content and experience a lower extraneous load [14].

Recently, many tools and applications have been proposed, and their impacts on English learning have been investigated. Hwang et al. [6] designed a mobile game-based learning system, a jigsaw-like game, that enabled students to practice English listening and speaking. The learning material was specifically tailored to authentic contexts such as classrooms and schoolyards, making it meaningful to the students. The findings demonstrated that learning in authentic contexts significantly increased students’ motivation and improved the accuracy and clarity of their spoken English. Nguyen et al. [2] developed the ezTranslate mobile app for practicing English speaking. This app allowed students to input Chinese sentences by voice, which were then translated into English. The translated results were read aloud using text-to-speech technology. Additionally, the app incorporated automatic speech recognition for pronunciation exercises. By engaging in this practice both inside and outside the classroom, students had the opportunity to enhance their English language skills. Zhang et al. [15] created a system that enabled students to annotate prepared learning materials in authentic contexts. Through an activity called “English Drama,” students could take photos, upload them, and annotate them using voice recordings. This approach allowed students to practice sentences and develop their English-speaking abilities. Liu et al. [16] developed an annotatable multimedia E-reader (AME) to enhance learners’ efficiency, motivation, and interactivity in learning English. The study found that using the AME benefited English learning and that students who engaged in a greater diversity of learning behaviors during drama activities achieved higher learning scores across various dimensions. In general, the tools and apps proposed in the aforementioned studies proved highly beneficial and showed potential to support learners in learning English.

2.3 The SSACL Questionnaire

Regarding tools for investigating students’ perceptions, many previous studies relied on well-known instruments such as TAM and ARCS [17]. The Technology Acceptance Model (TAM) questionnaire was first proposed by Davis and Smith [18] to investigate the impact of technology on users’ behavior. The model focuses on the process of using technology, where Perceived Usefulness and Perceived Ease of Use are the two key factors that affect an individual’s intention to use a technology [19]. Perceived Usefulness means that the user believes the technology will improve his/her performance, while Perceived Ease of Use refers to the belief that using the technology will be free of effort [18]. The ARCS questionnaire is an instrument for investigating students’ learning motivation toward a proposed learning system in terms of Attention, Relevance, Confidence, and Satisfaction [20]. The first dimension, Attention (or Interest), investigates students’ curiosity and sustained attention toward the learning activities. The second dimension, Relevance (or Relationship), investigates how the learning subject connects with students’ learning interests. The third dimension, Confidence, investigates how much confidence students build through the learning activities. The fourth dimension, Satisfaction, investigates how well students manage intrinsic and extrinsic reinforcement through the learning activities. Although widely used, the TAM and ARCS instruments do not cover some aspects that are specific to learning in authentic contexts (for example, healthy learning).

In this study, the learning activities were designed for authentic contexts and should therefore be evaluated with a questionnaire specific to authentic contextual learning. Consequently, the Sustainable and Scalable Authentic Contextual Learning (SSACL) questionnaire was adopted from Hwang et al. [21] to investigate students’ perceptions of the proposed learning approach. The SSACL questionnaire consists of six dimensions: learning by applying, healthy learning, collaborative learning, creativity, sustainability, and scalability (Appendix 3). The constructs of the SSACL questionnaire were validated in Ref. [21], and the developed framework has good predictive power for evaluating the effectiveness of an SSACL design. Therefore, SSACL was suitable in this study for examining students’ meaningful learning by applying in authentic contexts.

2.4 Research Questions

Having reviewed the influence of authentic contexts on language learning and the potential of mobile technologies to enhance EFL learning, we highlight that authentic contextual learning has limitations, specifically the lack of immediate guidance and support for practicing EFL effectively and efficiently. Moreover, while the impact of speech recognition on learning English speaking has been recognized in many of the aforementioned studies, an in-depth analysis of its impact on students’ learning behaviors and achievements is still needed. Therefore, we raise three research questions as follows:

  (1) Are there any significant differences in learning achievements of English speaking between students who use UEnglish and students who learn English with a traditional teaching method?

  (2) What is the correlation between learning behaviors and learning achievement of English speaking in the experimental group (EG), and which learning behaviors significantly predict learning achievement of English speaking?

  (3) What are students’ perceptions toward using UEnglish, and why can UEnglish facilitate students’ English speaking learning in authentic contexts?

3 System Design and Implementation

The system architecture of the UEnglish app is shown in Fig. 1. First, with the aid of ASR, UEnglish gives immediate assistance: whenever students encounter something new around them in authentic contexts and do not know how to say it in English, they can use their mother tongue to ask for the correct English words and sentences. Second, MT translates the text into English, but low-proficiency EFL learners may find it difficult to know how to pronounce the result. Text-to-speech technology therefore plays an important role: it provides auditory input so that these learners can learn how to pronounce the words or sentences correctly. Last but not least, through ASR, the system gives students immediate feedback on their English speaking. With the scoring and correction system, students can revise and improve their speaking.

Fig. 1 UEnglish system architecture
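The flow described for Fig. 1 (speech recognition, then translation, then speech synthesis) can be illustrated with a minimal sketch. This is not the UEnglish implementation: the function `speech_translate`, the `TranslationResult` type, and the three `fake_*` stand-ins are hypothetical names introduced here, and the stand-ins only imitate the cloud services with a one-entry lookup table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationResult:
    source_text: str   # what the recognizer heard (e.g. Chinese)
    target_text: str   # the translated sentence (e.g. English)
    audio: bytes       # synthesized speech for the translated sentence


def speech_translate(
    audio_in: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> TranslationResult:
    """Chain the three services: ASR -> MT -> TTS."""
    source_text = recognize(audio_in)
    target_text = translate(source_text)
    audio_out = synthesize(target_text)
    return TranslationResult(source_text, target_text, audio_out)


# Hypothetical stand-ins for the cloud services (assumptions, not real APIs).
def fake_recognize(audio: bytes) -> str:
    # Pretend the audio bytes are simply the UTF-8 text that was spoken.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    lookup = {"我看到一張地圖在黑板上": "I see a map on the blackboard."}
    return lookup.get(text, text)


def fake_synthesize(text: str) -> bytes:
    # Pretend the synthesized audio is the UTF-8 encoding of the text.
    return text.encode("utf-8")


result = speech_translate(
    "我看到一張地圖在黑板上".encode("utf-8"),
    fake_recognize,
    fake_translate,
    fake_synthesize,
)
print(result.target_text)
```

Passing the services in as callables keeps the chaining logic separate from any particular provider, so a real recognizer, translator, or synthesizer could be substituted without changing `speech_translate`.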

The major functions of UEnglish are described as follows:

  (1) Speech Translation: The speech translation function of the UEnglish app was built by combining three cloud services, Google speech recognition, Google Translate, and text-to-speech, to instantly translate Chinese speech into English speech or English speech into Chinese speech. Students could use their mother tongue to look up words or sentences that they were unfamiliar with. For instance, when a student said “我看到一張地圖在黑板上,” UEnglish would show “I see a map on the blackboard.” with both auditory and visual output (Fig. 2).

    Fig. 2 Speaking translation

  (2) Speech Shadowing and Instant Feedback: The speech shadowing function allows students to imitate reference audio, train their phoneme perception through shadowing, and practice repeatedly until they feel satisfied. When students imitated the reference audio, they received two kinds of feedback: pronunciation accuracy and speaking fluency. The first type, pronunciation accuracy, is a word-for-word comparison between the student’s shadowed sentence and the system’s translated sentence. It marks the differences between the sentences with strikethrough and underscores and provides a percentage of accuracy. For instance, when students shadowed the reference audio “I see a map on the blackboard”, UEnglish gave instant feedback with a pronunciation accuracy of 93/100 (Fig. 3). UEnglish also highlighted mispronounced words so that students would pay more attention to the words they had difficulty pronouncing. The other type, speaking fluency, reports students’ fluency on a 100-point scale (Fig. 4). After students’ speech was recorded and uploaded to the cloud, the uploaded recordings were recognized automatically and scored using Diff Match Patch, a Google open-source library (https://opensource.google/projects/diff-match-patch). With this feedback, students can know how fluent their speaking is and improve it through repeated listening and speaking practice.

    Fig. 3 Speaking accuracy

    Fig. 4 Speaking feedback

  (3) Adding photos, flashcards, history, and assignment: With the photo-adding function of UEnglish (Fig. 5), students could add photos they took in authentic contexts to UEnglish. These photos served as flashcards that students made on their own, so they could recall and memorize words or sentences more easily. The usage history keeps all of a student’s learning records, and students could submit their speech assignments as many times as they wanted (Fig. 6).

    Fig. 5 Adding photos

    Fig. 6 History list
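The word-for-word comparison behind the pronunciation feedback in function (2) can be illustrated with a short sketch. The paper states that UEnglish uses Diff Match Patch; the sketch below swaps in Python's standard-library `difflib` instead, and the function name `compare_words`, the tokenization rule, and the percentage formula (matched reference words over total reference words) are assumptions made for illustration, not the app's actual scoring.

```python
import difflib
import re


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and keep only alphabetic words (assumed rule)."""
    return re.findall(r"[a-z']+", sentence.lower())


def compare_words(reference: str, spoken: str) -> tuple[int, list[str], list[str]]:
    """Return (accuracy percentage, missed reference words, unexpected words)."""
    ref = tokenize(reference)
    hyp = tokenize(spoken)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)

    matched = 0
    missed: list[str] = []   # reference words not recognized (shown struck through)
    extra: list[str] = []    # recognized words absent from the reference (underlined)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        else:
            missed.extend(ref[i1:i2])
            extra.extend(hyp[j1:j2])

    accuracy = round(100 * matched / len(ref)) if ref else 0
    return accuracy, missed, extra


accuracy, missed, extra = compare_words(
    "I see a map on the blackboard",
    "I see a mat on the blackboard",
)
print(accuracy, missed, extra)
```

In this example six of the seven reference words match, so the sketch reports 86, with "map" flagged as missed and "mat" as unexpected.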

4 Research Methods

4.1 Experimental Procedure

In line with the proposed research questions, this study aims to demonstrate causality between an intervention (learning English with or without UEnglish app support) and an outcome (students’ learning achievements). Thus, a quasi-experimental design was used to investigate the differences in learning outcomes between differently treated groups. A total of 56 fifth-grade elementary school students (10–11 years old) from two classes volunteered to participate in this experiment and were divided into two groups, an experimental group (EG; N = 28, Male = 12, Female = 16) and a control group (CG; N = 28, Male = 13, Female = 15), which learned English with and without UEnglish app support, respectively. The experiment lasted for one and a half months, with three English sessions every week (40 min each). Figure 7 presents the experimental flow diagram. The content of the learning activities followed the structure of the school textbook and consisted of two topics: (1) Things around students’ school environment and (2) Lunch Time. Every topic lasted for 3 weeks and was divided into two stages. For the first learning topic, EG used UEnglish to look up words that they were interested in and that related to the given topics and daily life. They could practice and repeat what they heard by clicking the listening and speaking buttons in the UEnglish app (Figs. 2, 3, and 4). With UEnglish support and feedback, participants could improve their English speaking immediately. They could also take pictures related to the given topics to support the content of their speaking (Fig. 5). For the second learning topic, participants handed in their assignments and uploaded their voice recordings to a cloud service for storage. CG had the same speaking assignments as EG at the same time. However, they used the voice recording function of their devices (without the UEnglish app) to record their voices and uploaded the recordings to the cloud system. While EG could look up words, take pictures, and get instant feedback to help them correct their speaking errors, CG could turn to textbooks, teachers, and peers for help.

Fig. 7 Experimental flow diagram

4.2 Research Structure and Research Variables

Three main types of variable data were recorded and stored: control variables, independent variables, and dependent variables (Fig. 8).

Fig. 8 Experimental variables data flow diagram

The independent variable was the group: EG (who practiced English speaking with UEnglish) or CG (who practiced English with traditional teaching methods). The control variables were ‘learning time’, ‘learning materials’, and ‘teacher’; that is, the activities designed for EG and CG had the same learning time, the same learning materials, and the same teacher.

The dependent variables fell into two categories: ‘Learning achievement’ and ‘Learning behaviors’. The ‘Learning achievement’ category was the test score used to evaluate students’ learning achievement in speaking (i.e., ‘performance in the test’) and included three dimensions: Read out words (ROW), Describe comic pictures (DCP), and Describe authentic pictures (DAP). For ROW, students had to read vocabulary correctly based on the provided pictures. For DCP, students had to make at least two sentences to describe comic pictures. For DAP, students had to try their best to say more than three logical and contextually appropriate sentences about authentic pictures, that is, pictures taken in authentic contexts.

In the ‘Learning behaviors’ category, the variables were selected on the basis of previous studies [2, 22, 23] and comprised two dimensions: basic ‘learning behaviors’ variables (‘Number of translations’, ‘Number of taking pictures’, and ‘Number of speaking practices’) and ‘performance in practice’ variables (‘Speaking accuracy’, ‘Speaking Fluency’, ‘Weighted speaking accuracy’, and ‘Weighted speaking fluency’). ‘Number of translations’ is the number of times students used the translation function, whether or not the translated content was related to their learning. ‘Number of taking pictures’ is the number of times students used the picture-taking function. ‘Number of speaking practices’ is the number of voice recordings that students practiced in authentic contexts and uploaded to the cloud service. ‘Speaking accuracy’ is scored automatically by the Google speech service. The score represents the confidence value (0 to 100) of the Google speech API (Application Programming Interface) that the given transcription is correct. For example, if the system shows “Your accuracy = 93/100”, the Google speech API has 93% confidence that the transcription is correct. The maximum speaking accuracy score is 100. If students got 100 points for speaking accuracy, UEnglish would give them a fluency score, recorded as the ‘Speaking Fluency’ variable. ‘Weighted speaking accuracy’ was defined as students’ accuracy in speaking practice multiplied by the complexity of that practice. The complexity was calculated automatically by a formula in Microsoft Excel. Most readability formulas (methods of measuring or predicting the difficulty level of a text by analyzing sample passages, for example, the spelling and grammar checkers in Microsoft Office products) use the number of words in a sentence to measure its difficulty. In addition, the study in Ref. [15] used the length of the T-unit (complexity = number of words in a sentence/T-unit) to calculate sentence complexity. ‘Weighted speaking fluency’ was defined as students’ fluency in speaking practice multiplied by the complexity of that practice. After multiplying by the complexity, the value was divided by the number of practices.
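The weighted variables can be made concrete with a small sketch. It rests on two assumptions that the text does not spell out: the per-practice products (score times complexity) are summed before being divided by the number of practices, and each practiced sentence is treated as a single T-unit, so its complexity equals its word count. The function names and the sample scores are hypothetical.

```python
def complexity(sentence: str, t_units: int = 1) -> float:
    """Words per T-unit; by assumption, one T-unit per practiced sentence."""
    return len(sentence.split()) / t_units


def weighted_score(practices: list[tuple[str, float]]) -> float:
    """Average of (score x complexity) over all practices.

    `practices` holds (sentence, score) pairs, where the score is either the
    speaking accuracy or the speaking fluency of that practice (0 to 100).
    """
    if not practices:
        return 0.0
    total = sum(score * complexity(sentence) for sentence, score in practices)
    return total / len(practices)


# Hypothetical practice log for one student: (sentence, accuracy score).
practice_log = [
    ("I saw a board", 90.0),                    # 4 words -> 90 * 4 = 360
    ("I saw an electronic whiteboard", 80.0),   # 5 words -> 80 * 5 = 400
]
print(weighted_score(practice_log))  # (360 + 400) / 2 = 380.0
```

Under this reading, a student who attempts longer sentences can obtain a higher weighted score than one who repeats a short sentence, even when the raw accuracy is lower.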

4.3 Research Tools

Three tools were used in this study: the pre- and post-test of English speaking, the SSACL questionnaire, and a semi-structured interview. Each of the pre- and post-tests had 10 items to test speaking abilities, ordered from easy to difficult (Appendix 1). The tests were composed by an English teacher with more than 10 years of teaching experience and were validated by consulting an expert in English education. The evaluation of students’ speaking tests was adapted from Ref. [24] for the oral proficiency test; the scoring categories and scoring criteria are detailed in Appendix 2. To investigate students’ perceptions of using UEnglish in authentic contexts, the SSACL questionnaire [21] was used. The questionnaire consisted of 38 items divided into six dimensions (learning by applying, healthy learning, collaborative learning, creativity, sustainability, and scalability). Students rated each item on a five-point Likert scale: 1. Strongly disagree, 2. Disagree, 3. Neutral, 4. Agree, and 5. Strongly agree (Appendix 3). The third research tool was an interview following the focus group research method [25]. Our semi-structured interview included 8 open-ended questions to explore in depth the reasons behind the statistical results. We selected three high-achieving students and three low-achieving students for the interview to learn their comments on and perceptions of using UEnglish.

5 Results and Discussion

To answer research question 1, regarding the differences in learning achievements in English speaking between students who used UEnglish and students who learned English with a traditional teaching method, we used t-tests to compare the pre- and post-test scores of the two groups; the results are presented in Sect. 5.1 (Learning Achievements Results of the Two Groups). To answer research question 2, regarding the correlation between learning behaviors and learning achievement in English speaking in EG and which learning behaviors significantly predict that achievement, we applied Pearson correlation analysis and stepwise multiple regression to the learning behavior data and the learning achievement data; the results are presented in Sect. 5.2 (Learning Behaviors and Performance in Practice in Correlation with Learning Achievement) and Sect. 5.3 (Learning Behaviors and Performance in Practice to Predict Learning Achievement). To answer research question 3, regarding students’ perceptions of using UEnglish and the reasons why UEnglish can facilitate students’ English speaking learning in authentic contexts, we relied on the SSACL questionnaire; the results are presented in Sect. 5.4 (Students’ Perception toward UEnglish).

5.1 Learning Achievements Results of the Two Groups

Tables 1 and 2 show no statistically significant differences between the two groups in pre-test scores, either in total score (t = − 0.126, p = 0.90) or in each dimension (ROW, DCP, and DAP). These non-significant results mean that the two groups did not differ significantly in prior knowledge. On the other hand, EG’s post-test total score was significantly higher than CG’s (t = 4.268, p = 0.00), which suggests that UEnglish was useful and meaningful for students’ learning achievements (Table 1).

Table 1 The investigation of comparison of pre-test and post-test between two groups
Table 2 Analysis of three dimensions of pre-test between two groups
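For readers unfamiliar with the group comparison reported in Tables 1 and 2, the sketch below computes an independent-samples t statistic. The text does not say which t-test variant was used; the pooled-variance (Student's) form is assumed here, and the scores are made-up illustration values, not study data. A p-value would additionally require the t distribution, which statistical packages provide.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance (assumes the two groups have equal variances).
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Made-up scores for illustration only.
eg_scores = [2.0, 4.0, 6.0]
cg_scores = [1.0, 2.0, 3.0]
t_value, dof = independent_t(eg_scores, cg_scores)
print(round(t_value, 4), dof)
```

A positive t value indicates that the first group's mean is higher; with 28 students per group, as in this study, the degrees of freedom would be 54 under this variant.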

Looking further at each dimension of the post-test score, Table 3 shows that the ROW scores of EG (M = 57.5000) and CG (M = 56.0714) were quite similar, whereas there were remarkable differences in DCP (t = 5.652, p = 0.00) and DAP (t = 5.192, p = 0.00). There was no significant difference in ROW between the two groups because ROW is less difficult than DAP: ROW only required students to read out simple words, while DAP required students to form and say complete sentences. Moreover, because all the participants were fifth-grade students, both EG and CG were capable of reading out single words. Although the difference in ROW was not significant, EG still scored higher than CG, and EG students reported that they used UEnglish functions to recall words. One EG student said: “Whenever I forget the word which I look up, I can recall the words through the pictures which I took.” In the DAP dimension, EG’s post-test scores were significantly higher than CG’s, which means that UEnglish helped students describe surrounding objects in English better. This is because UEnglish provided students with appropriate scaffolds for making English sentences and immediately gave them the words for things around them, which gave them opportunities to broaden their vocabulary, connect it with what they saw, and apply it in their English speaking.

Table 3 Analysis of three dimensions of post-test between two groups

UEnglish was found to help students make longer sentences and to motivate them to use more words. For example, according to students’ records, an EG student could say “I saw an electronic whiteboard” while a CG student said the simpler sentence “I saw a board” in a similar context. By learning more and more words with UEnglish, students could make more meaningful and richer sentences to describe and convey what they saw. One EG student stated, “I can look up the English words I need through UEnglish so that I can name the objects I see”. As students learned many words from UEnglish to describe their surroundings, their English speaking improved.

5.2 Learning Behaviors and Performance in Practice in Correlation with Learning Achievement

In this subsection, Pearson correlation was used to analyze the correlations among three groups of variables: the basic ‘learning behaviors’, the ‘performance in practice’, and the ‘performance in the test’ (described in Sect. 4.2). The results are presented in Table 4.

Table 4 Pearson correlation between research variables and post-test in experiment group

In Table 4, among the three basic ‘learning behaviors’ variables (i.e., ‘Number of translations’, ‘Number of taking pictures’, and ‘Number of speaking practices’), only ‘Number of speaking practices’ correlated significantly with ‘performance in the test’ (i.e., post-test score), which means that the more students practiced speaking English with UEnglish, the better their learning outcomes were. A likely explanation is that students who were willing to practice more improved their speaking skills and thereby achieved better learning outcomes. The other two variables (‘Number of translations’ and ‘Number of taking pictures’) showed no significant correlation, because the post-test was designed to test speaking skills, while the ‘translation’ and ‘taking pictures’ functions did not directly support speaking practice.
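As a reminder of what the coefficients in Table 4 measure, the following sketch computes Pearson's r between a behavior variable and post-test scores. The function name and the numbers are hypothetical and chosen only to show the calculation; they are not values from the study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: number of speaking practices and post-test score.
num_practices = [1.0, 2.0, 3.0, 4.0]
post_test = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(num_practices, post_test)
print(round(r, 3))
```

Values of r near 1 mean that students with more practices tended to score higher, values near 0 mean no linear relationship, and values near -1 mean the opposite trend; significance testing is a separate step that depends on the sample size.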

In the ‘performance in practice’ dimension (comprising the ‘Speaking accuracy’, ‘Speaking fluency’, ‘Weighted speaking accuracy’, and ‘Weighted speaking fluency’ variables), ‘speaking accuracy’ and ‘weighted speaking accuracy’ showed significant correlations with the post-test score, which means that the more accurately students spoke, the higher their ‘performance in the test’ was (Table 4). This is likely because UEnglish’s instant feedback informed each student individually of how accurately they spoke and which words they needed to pronounce better. Consequently, their English speaking performance improved effectively and efficiently. Intriguingly, we found no significant correlation between ‘speaking fluency’ and post-test scores (Table 4). A likely reason is that ‘speaking fluency’ is a considerably more difficult English speaking skill than ‘speaking accuracy’, since it involves not only speaking accuracy but also speaking rate and prosody. For this reason, UEnglish gave ‘speaking fluency’ scores only once students had obtained 100 points for ‘speaking accuracy’, and there was a significant correlation between ‘speaking fluency’ and ‘speaking accuracy’ (Table 4). However, a closer investigation of the correlation between ‘speaking fluency’ and the three components of ‘performance in the test’ (i.e., ROW, DCP, and DAP) revealed that ‘speaking fluency’ was still significantly correlated with DCP and DAP (Table 4), which were more difficult than ROW. This means that students who were able to speak English more fluently using UEnglish could handle more difficult speaking tasks (such as DCP and DAP).

Although there was no significant correlation between ‘speaking fluency’ and post-test scores, ‘weighted speaking fluency’ still showed a significant correlation with post-test scores (Table 4). This was because the ‘weighted speaking fluency’ variable considered not only fluency but also the complexity and quality of the speaking: the more complex the sentences students attempted, the higher the post-test scores they obtained. For instance, given the same number of practices, students who practiced speaking on varied content would get a higher ‘weighted speaking fluency’ score than students who only repeated the same content again and again. This means that students who were willing to practice more diverse speaking content obtained significantly higher post-test scores. When more advanced variables such as ‘weighted speaking fluency’ are considered, the helpfulness of UEnglish becomes more evident. With the assistance of UEnglish, students tried to say more complicated sentences and practiced repeatedly until they were satisfied. One student stated: “Through the functions of feedback and scoring system, I would like to try again and again to revise my English speaking.” Another student mentioned that the speaking scores shown in UEnglish made them practice more: “When I see my friends get much higher scores than me, I try my best to catch her/him up.”

Regarding the correlations among the basic ‘learning behaviors’ variables and the ‘performance in practice’ variables, ‘speaking accuracy’ was found to have significant correlations with ‘Number of speaking practices’, ‘weighted speaking accuracy’, ‘speaking fluency’ and ‘weighted speaking fluency’ (Table 4). Because UEnglish provided tools for self-regulated speaking practice, students could shadow and mimic the sound of the MT system and get instant feedback via ASR and the scoring system. With UEnglish support, students were guided to make longer sentences while speaking English and to improve their speaking fluency. During speaking practice, students not only learned how to express their thoughts in English through instant translation but also received immediate feedback for revising their pronunciation and the content of their speaking. This considerable assistance from UEnglish encouraged students to try their best to gradually improve their English speaking.

5.3 Learning Behaviors and Performance in Practice to Predict Learning Achievement

In this subsection, stepwise multiple regression was used to further investigate whether the basic ‘learning behaviors’ and/or the ‘performance in practice’ variables predicted learning achievement. The only predictor found to significantly influence students’ post-test scores was ‘speaking accuracy’, a ‘performance in practice’ variable (Table 5). This result is reasonable because ‘speaking accuracy’ is the basis of speaking skill for EFL learners, and students need to speak correctly before learning to speak fluently. When students put more effort into practicing their English speaking, they could speak more accurately and, therefore, obtain better ‘performance in the test’ (which focuses on speaking skill). With UEnglish support, students learned many more words related to their real life, which were easier for them to remember. Moreover, the MT service and ASR functions in UEnglish gave students opportunities for self-learning: they could repeat words or sentences with which they were not familiar and come to understand their pronunciation, intonation, and meaning.
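To make the idea of stepwise selection concrete, the sketch below implements a simplified forward selection in plain Python. It is only an illustration under stated assumptions: the entry rule (a minimum gain in R², the hypothetical `min_gain` parameter) stands in for the F-test or p-value criteria that statistical packages typically use, the exact procedure used in the study is not specified here, and all data values and variable names are invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(predictors, y, min_gain=0.05):
    """Add the predictor that raises R^2 the most until the gain drops below min_gain."""
    selected, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {name: r_squared([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        best = max(remaining, key=lambda name: scores[name])
        if scores[best] - best_r2 < min_gain:
            break
        selected.append(best)
        best_r2 = scores[best]
        remaining.remove(best)
    return selected, best_r2

# Hypothetical data for eight students (not the study's data).
predictors = {
    "speaking_accuracy": [60, 70, 80, 90, 100, 75, 85, 95],
    "number_of_practices": [5, 3, 8, 2, 7, 4, 6, 1],
}
post_test = [62, 69, 81, 88, 99, 76, 84, 96]

selected, r2 = forward_stepwise(predictors, post_test)
print(selected, round(r2, 3))
```

With this invented data, only the accuracy variable enters the model, which mirrors the pattern reported in this subsection; a full stepwise procedure would also re-test already entered predictors for removal.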

Table 5 Regression model summary coefficients

5.4 Students’ Perception Toward UEnglish

The reliability of the SSACL questionnaire in this study was examined with Cronbach’s alpha coefficient to assess its internal consistency; the Cronbach’s alpha value for SSACL was 0.908, which confirmed the reliability of the questionnaire survey. The descriptive results of the SSACL questionnaire showed that the EG students gave relatively high scores to all six dimensions: 4.44 for learning by applying, 3.68 for healthy learning, 4.274 for collaborative learning, 4.44 for creativity, 4.39 for sustainability and 4.47 for scalability. In general, this means that students were satisfied with learning in authentic contexts using UEnglish. Based on the interviews, the instant feedback of UEnglish was found to be very valuable to students’ learning achievement. One student stated: “I can not only know the scores of my English but also can know some words that I always cannot say it well and practice it over and over.” After practicing English speaking in authentic contexts, EG students reported feeling more confident in their English speaking when making sentences; one EG student said: “I feel more confident to practice speaking with UEnglish than my teacher, I can speak again and again”. Students were also found to prefer practicing new or unfamiliar words with UEnglish rather than asking teachers, because they could learn by themselves immediately. An EG student stated: “I can not only learn new words but also know how to pronounce them. Through UEnglish, I can practice again and again. I think it is a good way to improve my English speaking”.
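For readers unfamiliar with the reliability check mentioned at the start of this subsection, Cronbach’s alpha can be computed from item-level responses as sketched below. This is a minimal illustration using the standard formula with sample variances; the function name and the response values are hypothetical and do not reproduce the SSACL data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(scores) != n for scores in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical 5-point Likert responses: three items, five respondents.
demo_items = [
    [4, 5, 3, 4, 5],
    [4, 4, 3, 5, 5],
    [5, 5, 3, 4, 4],
]
demo_alpha = cronbach_alpha(demo_items)
print(round(demo_alpha, 3))
```

Values closer to 1 indicate that the items vary together and therefore measure a consistent underlying construct.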

Scalability received the highest score, possibly because UEnglish facilitated students’ self-regulated English speaking, so that they became familiar with English words in authentic contexts and understood them better. Consequently, UEnglish motivated students to explore more and to look for a wider variety of authentic contexts in which to practice their English. An EG student expressed in the interview: “Whenever using UEnglish to explore the surroundings, I can learn more and more words around me in my daily life”.

Learning by applying had the second highest mean among all dimensions. This result indicated that UEnglish can help students apply the concepts of English speaking when practicing their English in authentic contexts. Students can apply what they learned in authentic contexts because UEnglish gave them more opportunities to learn English related to their daily life. A student stated: “When I compare the textbook with UEnglish, I find that UEnglish gives me more chances to learn the words around me and I think it is much more useful than only learning the words in my textbook”.

It was found that, with the aid of translation and TTS technology in UEnglish, students had more confidence in speaking more complex sentences to express their feelings or thoughts, and they could engage in self-regulated English speaking to revise their pronunciation of some easily mispronounced English words:

“Sometimes I don’t know how to use complex sentences to convey my feeling and thoughts, UEnglish can do me a favor immediately.”

“If I am curious about the words around me, I can look up the words immediately. I don’t have to ask my teacher; I can learn all the words by myself.”

Furthermore, from the interview content, we found that the scoring and instant feedback functions played an important role in students’ English speaking. With UEnglish, students were more willing to practice English speaking in authentic contexts by themselves until they felt satisfied. It was a good way to push students toward self-regulated English learning because they could monitor their learning progress and revise immediately:

“I think it is a good way to push me to practice again and again because when I see my friends get much higher scores than me, I try my best to catch her/him up.”

“I made much progress with my English through the feedback function.”

5.5 Discussion

Overall, our analysis results support the positive impact of UEnglish on students' English speaking skills, since EG outperformed CG and their learning achievements improved in both quality and quantity. More specifically, EG students showed proficiency in describing comic and authentic pictures. We also found that the proposed UEnglish app enhanced the abilities of EG students in terms of using more English words, constructing more complex sentences, and applying learned knowledge to similar contexts. Finally, we confirmed the benefits of authentic contextual English learning with the UEnglish app in terms of facilitating English speaking practice related to real-life situations and improving students' confidence in speaking English actively. These findings suggest that students could repeatedly practice the words they encountered in their daily life with UEnglish and hence retained a strong impression of those words and sentences. This result is consistent with previous studies, which found that when students get plenty of exposure to situational materials in authentic contexts, it is much easier for them to recall what they have learned when they encounter similar scenarios [2, 4, 26, 27].

5.6 Implications and Suggestions

This study addresses the problem of facilitating EFL learning with mobile devices in authentic contexts because of a main drawback of traditional EFL education: it often lacks connection to real-life situations, which makes it difficult for students to relate what they learn to their daily lives. On the other hand, the advancement of mobile technology and the potential of ASR, MT, and TTS to enhance language learning provide a strong rationale for exploring their application in facilitating English speaking in authentic contexts. Therefore, this study aims to utilize advanced mobile technology to provide students with suitable environments for practicing English speaking in authentic contexts. Consequently, the main contributions of this study to the literature lie in emphasizing the significance of learning in authentic contexts and highlighting the important role of instant feedback in facilitating English speaking practice and improving students’ speaking abilities.

From the pedagogical aspect, the results implied that the conversation function included in both the designed talk and the free talk was helpful to the students’ English-speaking ability. Flexible conversation, sustainable conversation, adaptive conversation, and learning feedback enable students to practice conversation more effectively and maintain their learning motivation. Flexible conversation allows students to practice conversations more smoothly without being frustrated by minor errors in their sentences that would otherwise prevent them from having a conversation. Sustainable conversation allows students to have conversations on the same topic, which they find more interesting than pronunciation practice. Adaptive conversation allows students to get a variety of responses in a conversation, making it feel more like practicing with a real person. Learning feedback allows students to know instantly how well they speak and helps them improve. The following are some suggestions for researchers who want to conduct experiments in this area. First of all, designing a system to be used in authentic contexts will give learners a stronger sense of interaction with the environment, which is conducive to students' memory and learning. Moreover, activities must be conducted at an appropriate time and place, so that the surroundings match the conversation practice. For example, if the sentence pattern in the textbook is “What do you do after class?”, students should be brought to the playground where they usually play after class and allowed to look at the familiar facilities during conversation practice.

From the technology aspect, equipping chatbots with flexible conversation, sustainable conversation, adaptive conversation, and learning feedback features would make them smarter. In addition, according to the regression analysis, the relations of pronunciation and fluency with free talk were not significant, because free talk was a more advanced function and, unlike designed talk, did not provide sentence patterns and words for students to refer to. Therefore, researchers who want to develop a system for students to practice English conversation in the future should consider scaffolding.

6 Conclusion

In this study, UEnglish was proposed with functions of taking pictures, recording voices, ASR, MT service, TTS technology, instant scoring feedback, and correction to facilitate students’ English speaking learning in authentic contexts. In general, this study found that the use of UEnglish in authentic contextual settings had a positive impact on students' English speaking skills. The experimental group exhibited improvements in both the quality and quantity of their English speaking, and students reported that the platform facilitated their ability to practice speaking English in relation to their real-life situations. Specifically, the results of the experiment demonstrated that EG performed better than CG in both quality and quantity of English speaking when describing comic pictures and authentic pictures in contextual settings. The results also revealed that EG students were able to utilize a wider range of English words and construct more complex sentences to express their thoughts. Moreover, EG students demonstrated the ability to easily apply their learned knowledge and skills when encountering similar contexts. Finally, through the interviews, students expressed that engaging in authentic contextual English learning using the UEnglish app had been beneficial for them in practicing English speaking that was relevant to their surroundings. A summary of all the results, aligned with the research questions, is illustrated in Fig. 9.

Fig. 9 Results summary aligning with research questions (RQ)

This study had some limitations that should be addressed. First of all, students could not listen to other students’ assignments or view other students’ usage and scoring rankings. If students had such opportunities to learn from their peers, it could facilitate collaborative learning, motivate students to learn more, and encourage them to take the initiative in learning to speak English. Second, in order to encourage students to say English sentences that were as long as possible and to build up their confidence in speaking English actively, we did not strictly correct their grammatical errors. To overcome students’ grammatical errors, more sentence structures and patterns that scaffold and facilitate their English speaking should be provided in the future. Third, to promote healthier learning, the duration of learning activities should be extended to give students more time to explore their surroundings. Finally, the number of participants in this study was small, which may not provide sufficient evidence to infer the effectiveness of UEnglish. Therefore, the number of participants should be increased in future studies. In summary, the future scope of this study involves exploring peer learning, addressing grammar errors through scaffolding, expanding learning activities, increasing sample sizes, and considering the integration of additional features to further enhance the effectiveness and usability of UEnglish for English speaking learning in authentic contexts.