Research on auxiliary training system of oral English pronunciation based on data extraction

Qin, Xiaomei

doi:10.1007/s42452-023-05306-x

Research Article
Open access
Published: 17 February 2023

Volume 5, article number 84, (2023)
SN Applied Sciences

Xiaomei Qin¹

Abstract

Because of limitations in the language environment and other factors, students often pronounce English inaccurately when learning to speak it. To address this problem, an auxiliary training system for oral English pronunciation based on data extraction is designed. The overall framework follows the B/S three-tier structure and comprises a data layer, a business logic layer, and a display layer. In the hardware design, a pickup, an audio Bluetooth chip, and a programmable logic controller collect, transmit, and process the pronunciation audio. Three business processing programs are designed to correct oral English pronunciation: a pronunciation audio input processing program, a pronunciation audio data extraction program, and a pronunciation auxiliary training program. The experimental results show that the system can improve students' oral English pronunciation and their oral learning performance.

1 Introduction

As education levels continue to rise, China pays increasing attention to the cultivation of talent. To cultivate diverse talent that meets the needs of society, schools offer a wide range of courses. English is one of the basic courses studied from primary school through university in China, and it is also one of the key skills examined when students seek employment [1]. English is also an international and widely used language. For these reasons, schools in China attach great importance to students' English learning. Compared with written English, however, students generally have more problems with spoken English. This is because China lacks an English language environment: students have few opportunities to communicate in English, which leads to non-standard pronunciation.

In English communication, pronunciation problems prevent speakers from expressing their ideas accurately, which results in poor communication. For this reason, primary and secondary schools have begun to attach importance to pronunciation training; that is, while cultivating students' writing ability, they also cultivate their spoken expression and try to provide a good environment for it. Oral pronunciation practice is, however, a long-term task. At present, teachers address students' pronunciation problems mainly by drawing on their own learning and long-term experience, relying on their own knowledge and ability to instruct students. Because teachers differ in ability, knowledge, and language expression, and students differ in how well they understand the material, the effect of pronunciation training is not ideal. The main cause is the lack of a complete, standardized training method.

Meanwhile, information technology has developed, become widespread, and been applied successfully in many fields. To help students improve their oral English pronunciation and the quality of their spoken English, information technology can be used to develop pronunciation training methods and to establish a unified, standardized training mode. How to support students' oral English pronunciation training has therefore become an urgent problem for universities.

To apply information technology to the auxiliary training of spoken English pronunciation, the early-stage collection, transmission, and processing of pronunciation audio can be realized with pickups, audio Bluetooth chips, programmable logic controllers, and similar devices, which together give students usable, standardized pronunciation data. The pronunciation audio is first fed into an input processing program. A pronunciation audio data extraction program and a pronunciation auxiliary training program then complete the application of information technology to pronunciation training and form a standardized, targeted training mode.

2 Research status

With the development of information technology, researchers in education have applied it to oral English pronunciation training. Some have collected training samples from teachers and built a database of oral English pronunciation training samples. Students retrieve samples from the database according to the content they need to practise, which improves the effect of oral English training [2]. However, this method lacks a unified standard: because the samples from different teachers are inconsistent, the database itself may contain non-standard pronunciation. Other researchers use a BP neural network to establish a forgetting curve for oral English training. By analysing students' training process, they obtain the training cycle and schedule training at the key points of the forgetting curve [3]. This method studies only the training process, not the standardization of pronunciation training. Still other researchers use sensor equipment to collect samples of spoken English pronunciation, filter out samples with poor pronunciation, retain those with standard pronunciation, and then use sample matching to recommend an appropriate learning level and method to each student [4]. This method focuses on matching suitable pronunciation samples and does not study the training itself or the training process in depth.

Some researchers have focused on building spoken English pronunciation systems. One approach establishes an evaluation index system for spoken English pronunciation, designs a questionnaire, and enters the evaluation scoring table into the system. When students complete the questionnaire as the system requires, the system automatically reports the defect level of their pronunciation and gives suggestions for improvement [5]. However, this method uses the system only to calculate evaluation scores and does not effectively analyse pronunciation training. Another approach establishes a system based on pronunciation matching and comparison [6]. Using case-based reasoning and a matching algorithm, it compares the pronunciation data entered into the system with historical pronunciation data, and then outputs the most similar pronunciation data together with the problems found in the pronunciation. This system, however, has not been widely used.

In summary, current research on oral English pronunciation training focuses mainly on building pronunciation databases and on evaluating pronunciation defects through questionnaires. Existing systems merely calculate the evaluation level of pronunciation defects, which limits their usefulness. To address these problems, this study establishes a "data layer-logic layer-display layer" system that records students' oral English pronunciation, compares it with accurate pronunciation using suprasegmental processing, scores it according to the comparison results, and finally gives correction suggestions. In this way, online automatic evaluation and correction of English speech is realized. The study also adapts and extends a data extraction algorithm to support the scientific training of spoken English pronunciation. The application of the system to oral English pronunciation training is completed through the design of its hardware and software.

3 Design of English spoken pronunciation auxiliary training system based on data extraction

The fundamental function of English as a language is communication with others [7]. Purposeful language use and accurate pronunciation are both very important in a language system, but Chinese students generally pronounce English inaccurately. To solve this problem, this paper designs an auxiliary training system for spoken English pronunciation based on data extraction. The system records the user's pronunciation, extracts its features, judges whether the pronunciation is accurate, gives a pronunciation score, and points out the mistakes.

3.1 System framework design

The system framework is the overall structure of the system design and provides guidance and reference for the whole design. The framework of this system follows the B/S (browser/server) three-layer structure, which simplifies client design, reduces the network load, and is easy to maintain and upgrade [8]. The framework comprises a data layer, a business logic layer, and a display layer.

1. Data layer: The data layer takes in the user's spoken English pronunciation, turns it into operable and accessible data, and provides data services for the other two layers [9].
2. Business logic layer: The business logic layer designs and runs the business logic of the functional modules to handle the various processing tasks [10].
3. Display layer: The display layer displays the results of the business logic and provides the window for user operation [11].
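The division of responsibilities among the three layers can be illustrated with a minimal Python sketch. All class and method names here (`DataLayer`, `BusinessLogicLayer`, `DisplayLayer`, `mean_amplitude`) are hypothetical and chosen only for illustration; the paper does not describe the actual implementation, and the "business logic" shown is a placeholder rather than the pronunciation scoring itself.

```python
from dataclasses import dataclass, field


@dataclass
class DataLayer:
    """Stores recorded samples and serves them to the other layers."""
    recordings: dict = field(default_factory=dict)

    def save(self, user_id, samples):
        self.recordings[user_id] = list(samples)

    def load(self, user_id):
        return self.recordings[user_id]


class BusinessLogicLayer:
    """Runs processing on data obtained from the data layer."""

    def __init__(self, data_layer):
        self.data_layer = data_layer

    def mean_amplitude(self, user_id):
        # Placeholder computation standing in for the real business logic.
        samples = self.data_layer.load(user_id)
        return sum(abs(s) for s in samples) / len(samples)


class DisplayLayer:
    """Formats business-logic results for the user."""

    def __init__(self, logic_layer):
        self.logic_layer = logic_layer

    def render(self, user_id):
        value = self.logic_layer.mean_amplitude(user_id)
        return f"user={user_id} mean_amplitude={value:.2f}"
```

Each layer talks only to the layer directly beneath it, which is the property that makes a three-layer design easy to maintain and upgrade.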

3.2 System hardware design

3.2.1 English pronunciation input equipment

Oral English pronunciation training compares the user's pronunciation with accurate pronunciation to judge whether it is accurate, so that it can be corrected [12]. The key to the training system is therefore a recording device that collects the user's pronunciation. The recording device in this system is a low-noise pickup composed of a microphone and an audio amplification circuit. It uses a high-fidelity, low-noise processing chip that suppresses environmental noise through multiple frequency selection, and it has a built-in automatic gain control (AGC) circuit to keep the recorded sound clean. The technical parameters of the device are shown in Table 1.

Table 1 Pickup technical parameters

3.2.2 Audio Bluetooth chip

The functions of the audio Bluetooth chip include two aspects: one is to transmit the recorded and processed spoken English pronunciation audio signal to the auxiliary training center for further processing and analysis; the other is to output the voice command of the auxiliary training center and transmit the correct spoken English pronunciation audio [13]. The audio Bluetooth chip in this system is CX950B, which is mainly used for short-distance audio signal transmission. It can be easily connected with notebook computers, mobile phones, PDAs and other devices to achieve wireless transmission of audio signals. The specific parameters of the chip are shown in Table 2:

Table 2 Audio bluetooth chip working parameters

3.2.3 Programmable logic controller

The auxiliary training system of spoken English pronunciation requires many kinds of business logic operations, so the programmable logic controller (PLC) is an important piece of hardware in the system design. It uses a programmable memory and is responsible for data acquisition and extraction, self-diagnosis, control execution, external communication, and external output [14]. The performance characteristics of the PLC in this system are as follows:

1. With a 32-bit 2001-2421 CPU, it offers higher speed and performance, supports basic control instructions, and can meet diversified control requirements.
2. The CPU, I/O signals, communication network, and power supply all adopt isolation protection measures.
3. A miniature embedded real-time multi-task operating system supports multi-task distribution and makes reasonable use of CPU resources.
4. Open network: 100M Ethernet, multi-serial communication, MODBUS support, and custom protocols enable wireless data transmission with wireless terminal equipment.

3.3 System software design

The business logic programs define the operation process of each functional module in the system and complete the whole oral English pronunciation training procedure in a logic-driven way [15]. The business logic programs in the system comprise a pronunciation audio input processing program, a pronunciation audio data extraction program, and a pronunciation auxiliary training program.

3.3.1 Pronunciation audio input processor

The pronunciation audio input processing program is the first important program to be run after the user logs in. The program includes the logic operation of the whole early stage from the input of spoken English pronunciation to the processing and then to the sending of audio files. The program is shown in Fig. 1 below.

3.3.2 Pronunciation audio data extraction program

Previous oral English pronunciation training systems performed poorly mainly because they could not effectively compare the standard pronunciation with the user's pronunciation, so users could not fully understand their pronunciation errors. To support the subsequent pronunciation comparison, the pronunciation audio data extraction program is therefore the core program of the system [16]. The specific procedure is as follows:

Step 1: Receive and decode the spoken English pronunciation audio file.
Step 2: Segment the spoken English pronunciation audio into syllables.
Step 3: Extract the audio features, which include the pitch period, the Mel cepstrum coefficients, and the formant frequency.

1) Pitch period. The pitch period feature is extracted with the autocorrelation function. The extraction formula is as follows:

$$f_{i} \left( k \right)^{1} = \sum\limits_{t = 1}^{N} {x_{i} \left( t \right)x_{i} \left( {t + k} \right)}$$

(1)

Here $f_{i} \left( k \right)^{1}$ represents the pitch period feature of the $i$-th audio signal; $k$ represents the time delay (lag); $N$ represents the frame length; $x_{i} \left( t \right)$ represents the spoken English pronunciation audio signal; and $t$ represents time.
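As an illustration of how the autocorrelation in Eq. (1) can yield a pitch period, the following Python sketch evaluates the autocorrelation of one frame and takes the lag with the largest value inside a plausible pitch range. The function names, the 60 to 400 Hz search range, and the truncation of the sum at the end of the frame are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np


def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag (sum truncated at frame end)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(max_lag + 1)])


def pitch_period(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch period in seconds as the lag that maximizes the autocorrelation.

    f_min and f_max bound the search and are assumed values for speech.
    The frame must be longer than sample_rate / f_min samples.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    acf = autocorrelation(frame, lag_max)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return best_lag / sample_rate
```

Restricting the search to lags above `lag_min` avoids the trivial maximum at lag zero, where any signal correlates perfectly with itself.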

2) Mel cepstrum coefficient. The Mel cepstrum coefficients of the audio are extracted with a Mel filter bank [17]. The extraction formula is as follows:

$$f_{i} \left( k \right)^{2} = \sum\limits_{i = 0}^{M} G \left( i \right) \cdot \sin \left( {\frac{n\pi }{M}} \right)$$

(2)

Here $f_{i} \left( k \right)^{2}$ represents the Mel cepstrum coefficient of the $i$-th audio signal; $G\left( i \right)$ represents the logarithmic energy output by the $i$-th Mel filter; $M$ represents the number of Mel filters; and $n$ represents the order of the coefficient.
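For comparison, the conventional textbook way of turning log Mel filter-bank energies into cepstral coefficients is a discrete cosine transform (DCT-II) over the filter index. The sketch below implements that conventional form, which differs from Eq. (2) as printed (cosine rather than sine, and the filter index appears inside the argument); it is offered as an illustration only, not as the formula the system necessarily used. The function name and the unnormalized scaling are assumptions.

```python
import numpy as np


def cepstral_coefficients(log_energies, num_coeffs):
    """Unnormalized DCT-II of log Mel filter-bank energies.

    log_energies: one log energy per Mel filter, G(0)..G(M-1).
    num_coeffs:   number of cepstral coefficients to return (orders 0..num_coeffs-1).
    """
    g = np.asarray(log_energies, dtype=float)
    num_filters = len(g)
    m = np.arange(num_filters)
    return np.array([
        np.sum(g * np.cos(np.pi * n * (m + 0.5) / num_filters))
        for n in range(num_coeffs)
    ])
```

With a flat set of log energies, only the order-0 coefficient is non-zero, which reflects the fact that higher orders capture the shape of the spectral envelope rather than its overall level.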

3) Formant frequency. The formant frequency of the audio is extracted with the linear predictive coding method [18]. The extraction formula is as follows:

$$f_{i} \left( k \right)^{3} = \frac{{F\left( {L_{i} } \right)}}{2\pi } \cdot T$$

(3)

Here $f_{i} \left( k \right)^{3}$ represents the formant frequency; $T$ represents the signal sampling period; $F$ represents the prediction error filter; and $L_{i}$ represents the bandwidth of the $i$-th audio signal.

During processing, the first three formant peaks of each audio frame can be connected to form the formant trajectory.
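One common way to estimate formants with linear predictive coding is to fit the prediction coefficients by the autocorrelation method and read the formant frequencies off the angles of the complex roots of the prediction polynomial. The Python sketch below follows that general recipe; the function names, the direct solution of the normal equations, and the conversion of root angles to hertz via the sampling rate are assumptions of this sketch, not details confirmed by the paper.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: x[n] is approximated by sum_j a[j] * x[n-1-j]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz matrix of autocorrelation values (normal equations).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])


def formant_frequencies(frame, order, sample_rate):
    """Candidate formant frequencies in Hz from the roots of the prediction polynomial."""
    a = lpc_coefficients(frame, order)
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 1e-8]  # keep one root of each conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    return np.sort(freqs)
```

In practice the candidates are usually filtered further, for example by bandwidth, before the lowest three are kept as the formants of the frame.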

Step 4: Fuse the extracted features and normalize the result.

The steps above constitute the pronunciation audio data extraction program. After extraction, the system proceeds to the next program.
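The paper does not specify how the features are fused or normalized. As one plausible reading, the sketch below scales each feature vector to the range [0, 1] with min-max normalization and then concatenates the results into a single vector. Both the choice of min-max scaling and the concatenation are assumptions, and the function names are hypothetical.

```python
import numpy as np


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant vector maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def fuse_features(pitch, mel_cepstrum, formants):
    """Normalize each feature group separately, then concatenate them."""
    return np.concatenate([
        min_max_normalize(pitch),
        min_max_normalize(mel_cepstrum),
        min_max_normalize(formants),
    ])
```

Normalizing each group separately keeps a feature with large numeric values, such as a formant frequency in hertz, from dominating a later distance computation.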

3.3.3 Pronunciation aid training program

The pronunciation auxiliary training program takes the extracted pronunciation audio feature parameters as input. The procedure is as follows:

Step 1: Input the audio feature parameters of the user's spoken English pronunciation.

Step 2: Extract the feature parameters of the standard spoken English pronunciation audio.

Step 3: Calculate the similarity between the user's actual pronunciation audio features and the standard spoken English pronunciation audio features according to a distance formula. The calculation formula is as follows:

$$d\left( {f_{i} ,f_{i}^{^{\prime}} } \right) = \frac{{\sqrt {\sum\limits_{i = 1}^{N} {\left( {f_{i} - f_{i}^{^{\prime}} } \right)^{2} } } }}{N}$$

(4)

Here $d\left( {f_{i} ,f_{i}^{^{\prime}} } \right)$ represents the distance between the user's actual pronunciation audio features $f_{i}$ and the standard spoken English pronunciation audio features $f_{i}^{^{\prime}}$, which serves as the similarity measure (a smaller distance means more similar pronunciation); $N$ represents the frame length.
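Equation (4) can be written directly in Python as the Euclidean distance between the two feature vectors divided by their length. The function name `feature_distance` and the requirement that both vectors have the same shape are assumptions of this sketch.

```python
import numpy as np


def feature_distance(user_features, standard_features):
    """Eq. (4): Euclidean distance between two feature vectors divided by N."""
    user = np.asarray(user_features, dtype=float)
    standard = np.asarray(standard_features, dtype=float)
    if user.shape != standard.shape:
        raise ValueError("feature vectors must have the same shape")
    n = user.size
    return float(np.sqrt(np.sum((user - standard) ** 2)) / n)
```

For example, the vectors (0, 0) and (3, 4) are 5 apart in Euclidean terms, so with N = 2 the function returns 2.5, while identical vectors give a distance of 0.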

Step 4: Calculate the corresponding pronunciation score according to the feature distance. The calculation formula is as follows:

$$S = 1 + \frac{100}{{\delta \cdot d^{\gamma } }}$$

(5)

Here $S$ represents the pronunciation score, and $\delta$ and $\gamma$ are constant parameters. Both take values between 0 and 1 and satisfy $\delta + \gamma = 1$.

Step 5: Calculate the comprehensive score of the user's oral English pronunciation audio. The calculation formula is as follows:

$$S^{\prime} = w_{1} S_{1} + w_{2} S_{2} + w_{3} S_{3}$$

(6)

Here $S^{\prime}$ represents the comprehensive score; $w_{1}$, $w_{2}$, and $w_{3}$ represent the weights; and $S_{1}$, $S_{2}$, and $S_{3}$ represent the pronunciation scores corresponding to the pitch period, the Mel cepstrum coefficient, and the formant frequency.
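The two scoring formulas can be sketched in Python as follows. The code follows Eq. (5) and Eq. (6) as printed; the default parameter values, the function names, and the example weights in the usage lines are assumptions, since the paper does not report the values it used. Because the expression in Eq. (5) grows without bound as the distance approaches zero, the sketch rejects non-positive distances instead of guessing how the system handles that case.

```python
def pronunciation_score(distance, delta=0.5, gamma=0.5):
    """Eq. (5) as printed: S = 1 + 100 / (delta * d**gamma).

    delta and gamma are assumed example values with delta + gamma = 1.
    """
    if distance <= 0:
        raise ValueError("distance must be positive for Eq. (5) to be defined")
    return 1.0 + 100.0 / (delta * distance ** gamma)


def comprehensive_score(scores, weights):
    """Eq. (6): weighted sum of the per-feature pronunciation scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical usage: scores for pitch period, Mel cepstrum, and formant frequency.
example_scores = (80.0, 90.0, 70.0)
example_weights = (0.3, 0.3, 0.4)  # assumed weights
example_total = comprehensive_score(example_scores, example_weights)
```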

Step 6: Judge whether the user's pronunciation is qualified according to the comprehensive score, as shown in Table 3 below:

Table 3 Standard pronunciation scale

Step 7: Use the SVM classifier to further determine the types of pronunciation errors and mine the error rules.

Step 8: Generate the corresponding pronunciation report.

Step 9: According to the report, the user identifies the pronunciation errors and performs corrective exercises against the standard pronunciation.

The recording, scoring, and correcting process is repeated until the pronunciation reaches the qualified level or above, which completes the pronunciation auxiliary training process [16].
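The repeat-until-qualified loop described above can be sketched as follows. The callables `record` and `score`, the `threshold` value, and the `max_rounds` safety limit are all hypothetical placeholders: the qualifying scores of Table 3 are not reproduced in this text, so the threshold must be supplied by the caller.

```python
def training_loop(record, score, threshold, max_rounds=10):
    """Repeat record -> score until the score reaches the threshold.

    record:     callable returning one new recording of the user's pronunciation
    score:      callable mapping a recording to a comprehensive score
    threshold:  qualifying score (hypothetical; see Table 3 in the paper)
    max_rounds: safety limit on the number of attempts (an assumption)
    Returns the list of scores, one per attempt.
    """
    history = []
    for _ in range(max_rounds):
        current = score(record())
        history.append(current)
        if current >= threshold:
            break
    return history
```

Returning the whole score history, rather than only the final score, makes it possible to show the user how the pronunciation changed across attempts.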

4 System implementation and testing

The key to oral English pronunciation training is to find the user's pronunciation errors accurately and correct them. In the experiment, test data are collected, the classifier is trained on training samples, the features of the data are analysed, and the results of the system are tested.

4.1 System test sample collection

Twenty students were taken as test subjects. Twenty oral English pronunciation samples with the same test content (see Fig. 3) were collected in the environment shown in Fig. 2.

4.2 Training sample

The training samples for the SVM classifier are selected from the TIMIT speech database. The training sample set is shown in Fig. 4 below.

4.3 Data characteristics

The pronunciation audio data extraction program is executed to extract three features of the audio data, namely a pitch period, a Mel cepstrum coefficient and a formant frequency. Take the standard spoken English pronunciation sample as an example. The three data characteristics are shown in Fig. 5.

4.4 System test results

The pronunciation auxiliary training program was run to calculate the distance between the audio features of the 20 students' spoken English pronunciation and those of the standard pronunciation, and the scores were calculated. The SVM classifier was then used to classify the types of pronunciation errors and mine the error rules. The results are shown in Table 4:

Table 4 System application test results

Table 4 shows that, with the system applied, the oral English pronunciation of 12 of the 20 students is above the qualified line: 3 are excellent, 5 are good, and 4 are qualified. The remaining 8 students are unqualified, and their common problems are stress errors and light reading errors.

5 Conclusion

Students in China generally pronounce English inaccurately, which leads to difficulties in English communication. To improve students' oral English pronunciation, this paper designs an auxiliary training system for oral English pronunciation based on data extraction. The conclusions are as follows:

1. The system picks up students' pronunciation audio, compares it with the standard pronunciation, determines where the pronunciation is inaccurate, and gives corrective suggestions.
2. The system trains users through repeated recording and correction. Testing shows that the auxiliary training function works well and meets the system design objectives.

Data availability statement

Data will be made available on reasonable request.

References

1. Jiang Z (2021) Spoken English assessment using confused phoneme assessment model. Mob Inf Syst 2021:1–10
2. Biber D (2000) Longman grammar of spoken and written English. TESOL Q 34(4):787–788
3. Brown G, Yule G, Mckelvie N (2000) Teaching the spoken language. RELC J 17(1):97–99
4. Fangfang X, Chun Z (2020) Critical thinking-oriented college oral English teaching reform. In: International forum of teaching and studies. American Scholars Press Inc
5. Dong J (2020) Study on ways to improve oral English of freshmen. Sci Technol Vis 7:42–46
6. Jothilakshmi S, Ramalingam V, Palanivel S (2012) A hierarchical language identification system for Indian languages. Digital Signal Process 22(3):544–553
7. Weiss EH (2005) The elements of international English style: a guide to writing correspondence, reports, technical documents, and internet pages for a global audience. M.E. Sharpe 52(4):478–478
8. Adachi C, Tokito S, Tsutsui T et al (2014) Electroluminescence in organic films with three-layer structure. Jpn J Appl Phys 27(Part 2, No. 2):L269–L271
9. Greenstein S (2014) Baking the data layer. IEEE Micro 34(4):56–57
10. Lee S (2008) Collocation and collation of business logic for web application development. J Comput Inform Syst 49:57–66
11. Chen Y, Setiadi D, Li H et al (2011) Generic non-volatile service layer. US patent US7966581 B2
12. Camastra F, Vinciarelli A (2008) Audio acquisition, representation and storage. Springer, London, pp 13–50. https://doi.org/10.1007/978-1-84800-007-0(Chapter2)
13. Multer DL, Garner RE, Ridgard LA et al (2012) Data transfer and synchronization system. US patent US6694336 B1
14. Basnight Z, Butts J, Jr JL (2013) Firmware modification attacks on programmable logic controllers. Int J Crit Infrastruct Prot 6(2):76–84
15. Du R, Song L, Li NN et al (2009) Measurement of spoken language training, learning and testing. US patent application US20090204398 A1
16. Dongge L et al (2001) Classification of general audio data for content-based retrieval. Pattern Recogn Lett 22(5):533–544
17. Garcia JO, Garcia C (2003) Mel-frequency cepstrum coefficients extraction from infant cry for classification of normal and pathological cry with feed-forward neural networks. In: International joint conference on neural networks. IEEE
18. Meredith N, Books K, Fribergs B et al (2010) Resonance frequency measurements of implant stability in vivo. A cross-sectional and longitudinal study of resonance frequency measurements on implants in the edentulous and partially dentate maxilla. Clin Oral Implants Res 8:226–233

Funding

The study was supported by the project “The innovation and practice of translation teaching model under ‘new liberal art + information technology’” (Grant No. 2021ND0605).

Author information

Authors and Affiliations

College of Translation Studies, Xi’an Fanyi University, Xi’an, 710100, Shaanxi, China
Xiaomei Qin

Contributions

The article was written independently by the author.

Corresponding author

Correspondence to Xiaomei Qin.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Qin, X. Research on auxiliary training system of oral English pronunciation based on data extraction. SN Appl. Sci. 5, 84 (2023). https://doi.org/10.1007/s42452-023-05306-x

Received: 24 October 2022
Accepted: 08 February 2023
Published: 17 February 2023
DOI: https://doi.org/10.1007/s42452-023-05306-x
