Introduction

Dialogic teaching is an approach that prioritizes interactive dialogue between teachers and students to jointly explore subject knowledge and address problems (Kim & Wilkinson, 2019; Mercer & Littleton, 2007). It aims to foster students’ abilities in inquiry, reasoning, and critical thinking (Mercer, 2000), which can further promote their knowledge construction and enhance their self-confidence and learning motivation (Wegerif, 2011). Discourse is a fundamental component of dialogic teaching, and its quality and effectiveness directly influence teaching outcomes and students’ learning experiences.

As high-quality classroom dialogue is crucial in modern education, improving its quality is an essential direction in classroom interaction research (Gagné & Parks, 2013; Walsh, 2013). In practice, however, there is wide variation in the quality of discourse (Howe et al., 2019). To improve the effectiveness of dialogue, one common approach is to transcribe discourse and code it into a series of categories, which enables the use of appropriate strategies to analyse and explore deeper information (Mercer, 2010; Song et al., 2021), such as the structure and interaction process of classroom dialogue. In this way, the following aspects of the classroom can be revealed: the content of discourse, the process of teaching practice and student learning, and the interaction patterns between teachers and students. Moreover, discourse analysis can provide key information for teaching-related activities such as teachers’ self-reflection, which can ultimately support the revision of pedagogical strategies and enhance teaching effectiveness.

However, classroom discourse analysis research also faces many challenges, such as the complexity of dialogue coding and modelling (Calcagni & Lago, 2018; Song et al., 2021). To address these issues, researchers have proposed several analytical methods. For instance, Flanders (1970) proposed a framework for analysing discourse types, but it does not fully cover the range of classroom situations and is no longer suitable for contexts in which AI technology is involved. Gu and Wang (2004) expanded the framework and created a more comprehensive dialogue analysis framework for a smart teaching environment. Moreover, Calcagni and Lago (2018) summarized the essential principles of classroom dialogue in combination with existing discourse classification models. The above analyses of classroom dialogue rely on manual coding, which is inefficient and cannot be applied to large datasets (Blanchard et al., 2015). As a result, teachers cannot get timely feedback on their discourse, which hinders the improvement of classroom dialogue quality (Blanchard et al., 2015; Wang et al., 2014).

The emergence of artificial intelligence technology, with core technologies such as deep learning and natural language processing, has brought new methods for classroom dialogue analysis. Suresh et al. (2022) used natural language processing (NLP) technology to achieve automatic classification instead of manual coding, which significantly improved the analysis accuracy. Song et al. (2021) proposed a dialogue classification model based on a deep neural network, which can analyse classroom dialogue more comprehensively and intelligently. However, current applications of AI technology in education tend to be framed as classification tasks, and the link between technology and teaching is not explicitly explored. With the gradual rise of the concept of Artificial Intelligence Generated Content (AIGC) (Brown et al., 2020), large language models can automatically generate dialogues according to contextual information. By using this technology, teachers can receive timely feedback and adjust their pedagogical strategies, which has great research value.

This paper explores the systematic design of technology-supported analytical frameworks for classroom dialogue. A novel integrated framework is proposed to achieve discourse analysis automatically, combining pedagogical theory and information technology. The proposed framework includes two main components, a dialogue-oriented interactive classroom and an artificial intelligence-powered analysis system, which are connected through dialogues and feedback. We summarize key domains of the dialogue-oriented interactive classroom, each of which consists of several essential principles. We also emphasize the importance of the interplay between technology and the classroom (Major et al., 2018), presenting a “Guide of AI” to supervise the operation of AI systems. The collected dialogue data are analysed, and the feedback is returned to the classroom. In particular, AI feedback can be combined with the relevant essential principles to help teachers adjust their pedagogical strategies and improve the quality of teaching, forming a closed loop of the overall system. Based on the above background and purpose, this study mainly addresses the following questions:

  1. Which domains and principles should be included in an AI-supported framework for classroom dialogue analysis? What is the role of AI and its relationships to other elements?

  2. What effect can AI assistance have on the quality of classroom dialogue? What are the attitudes of the teachers towards the AI’s feedback and principles in the framework?

Literature Review

In this section, we will introduce relevant concepts and recent research progress that apply to this paper. First, we will give a research review on classroom dialogue that mainly consists of theoretical foundation and several dialogic educational viewpoints. Second, coding schemes and analysis methods for dialogue are also discussed for further explanation of the subject of classroom dialogue analysis.

Research on Classroom Dialogue

This work is rooted in the growing body of literature on classroom dialogue from the perspectives of sociocultural theory and social constructivism. The sociocultural theory proposed by Vygotsky (2012) contains core concepts such as private speech and the zone of proximal development. The theory views learning as a process of social practice mediated by language and symbols, emphasizing learning through social and communicative processes. Language plays a key role as a tool for thinking and a mediator of activity (Hirtle, 1996; Mercer, 2000; Vygotsky, 2012); therefore, the quality of spoken interactions is critical (Mercer & Littleton, 2007). Social constructivism is a significant branch of constructivism influenced by Gergen and Vygotsky (Hirtle, 1996; Jaramillo, 1996; Vygotsky, 2012), which integrates perspectives on knowledge, learning, language, and teaching. Social constructivism holds that knowledge is “co-constructed” by learning subjects, and that discourse serves as the basis of knowledge (Hirtle, 1996; Jaramillo, 1996), allowing teachers and students to acquire, store, express and transfer knowledge through language communication.

The viewpoints and empirical research on classroom dialogue are mainly based on sociocultural theory and social constructivism, and explore the elements involved in classroom dialogue from different perspectives. Instead of imparting fixed knowledge, classroom dialogue emphasizes the importance of the teacher’s words and values students’ free expression. It aims to create an environment of positive and collaborative dialogue, forming a community that promotes students’ free expression and understanding (Cazden, 1988).

The “Cam Talk” research involves working with teachers to open up their dialogue so that students can build knowledge together rather than having the teacher impose content on them (Higham et al., 2014). Dialogic teaching proposes five major principles: collective, reciprocal, supportive, cumulative and purposeful. These five principles emphasize the power of dialogue to develop students’ thinking, learning and understanding abilities (Alexander, 2008). “Accountable talk” argues that teacher–student dialogue involves learning community, rigorous thinking and accurate knowledge (Michaels et al., 2008). Wolf et al. (2005) hold that dialogic classroom interaction, specifically fostering accountable talk, is positively correlated with reading comprehension. “Thinking together” argues that engaging students in collective thinking can effectively promote students’ thinking skills and a sense of shared participation (Mercer & Littleton, 2007). The “Ground rule” of “Exploratory talk” is for participants to pool ideas, opinions and information, and think aloud together to create new meanings, knowledge and understanding (Edwards & Mercer, 2013). Research has also shown that exploratory talk improves students’ individual reasoning and group thinking skills (Mercer, 2008). However, teachers’ awareness of how to conduct productive dialogue is limited (Nystrand et al., 2003). One study found that only 19% of teachers had a medium or strong understanding of dialogic teaching, despite having participated in teacher professional development (TPD) sessions with this focus (Hennessy et al., 2018). Therefore, some studies have proposed the procedure of the “Dialogic video cycle”, in which teachers use video as a reflection tool to improve classroom dialogue and enhance professional skill (Gröschner et al., 2015). It is clear that improving the quality of classroom dialogue and promoting teachers’ professional development is an urgent issue in the field of classroom dialogue.

Coding Scheme and Analysis Methods

There are several coding schemes for classroom dialogue, including the widely used Flanders Interaction Analysis System (FIAS) (Flanders, 1970) and the ITIAS coding scheme, which adds a technology category to improve on the FIAS scale (Gu & Wang, 2004). There is also an in-depth analysis of the impact of classroom interactive dialogue on students’ thinking development and knowledge construction (Zhao & Chan, 2014). We list several classical and recent coding schemes for classroom dialogue analysis in Table 1. In this paper, we use a discourse analysis framework for China’s elementary school mathematics teachers as the coding scheme (Huang et al., 2021). The scheme was developed through literature review and expert interviews, and has passed strict tests of reliability and validity. It divides teacher discourse into five main categories: Question, Statement, Feedback, Transition, and Management. The further subdivision of the Question and Feedback categories enhances the effectiveness of classroom observation and analysis. Compared with other available frameworks, it mainly focuses on the background of teacher–student interaction and exhibits a clearly differentiated classification of dialogue categories, which can be better recognized by AI technology. Notably, if this framework is verified to be effective, it can be generalized to other tasks that apply artificial intelligence technology to dialogic teaching.

Table 1 Brief accounts for several coding schemes

Early analysis of classroom dialogue is often based on manual coding with a scheme (Flanders, 1970). However, this method can be time consuming, leading to longer waiting periods for teachers to receive feedback on their actual performance in dialogic teaching. By comparison, automatic classification can be conducted in a faster and more cost-effective way (Wang et al., 2014). With the emergence of deep learning (LeCun et al., 2015) and natural language processing (Hirschberg & Manning, 2015), technology is used to automate the analysis of dialogue content, such as hierarchical speech-act classification (Kang et al., 2013) and automatic systems for discourse processing (Wang et al., 2014). It is also possible to use technical analysis to explore deeper information in dialogues (Zhao & Chan, 2014), such as lag sequential analysis (Yang et al., 2018) and sequence pattern mining methods (Boeheim et al., 2021). However, current applications of AI technology in education tend to focus on automating discourse analysis and do not explore the connection between technology and teaching. By introducing artificial intelligence technology into the classroom dialogue analysis system, teachers can get timely feedback to improve quality and efficiency, which is of great significance to the progress of education.

Fig. 1 The architecture of the proposed framework, which mainly consists of two components, the dialogue-oriented interactive classroom and the AI-powered analysis system. Dialogues and feedback connect the mentioned two components and drive them into more positive developments

Framework

Current research still faces some challenges, including the lack of a comprehensive overview of classroom dialogue analysis tools, delays in manual coding feedback, and limited integration of artificial intelligence technology with education and teaching. In response to these challenges, this paper presents a comprehensive framework that combines pedagogical theory with information technology to automate discourse analysis. By exploring a technology-supported analysis framework for classroom dialogue, this approach aims to help teachers enhance their ability to manage classroom discourse and to provide timely feedback for teaching, ultimately improving the overall quality of dialogue in the classroom.

The proposed framework for classroom dialogue analysis consists of two components: a dialogue-oriented interactive classroom and an artificial intelligence-powered analysis system, as shown in Fig. 1. In the dialogue-oriented interactive classroom, the four domains interact and rely on each other to promote high-quality dialogue. The dialogue-oriented interactive classroom connects to the AI-powered analysis system through classroom dialogue and feedback. Dialogue is fed into the AI-powered analysis system, which performs intelligent analysis of the discourse under the guidance of a “Guide of AI” and produces instructional feedback that is then applied back to the classroom. This loop is repeated to improve classroom dialogue iteratively. The framework does not isolate the various domains, as they are interdependent and interact with each other.

Fig. 2 Illustration of the proposed dialogue-oriented interactive classroom, in which essential principles of each domain are presented

Dialogue-Oriented Interactive Classroom

We have summarized several essential principles that should be included in the dialogue-oriented interactive classroom, as shown in Fig. 2. Environment, community and teaching–learning are three key domains for improving the quality of dialogue, which are placed at the top of the figure. The “Guide of AI”, another important domain that is specific to the AI system, is proposed to provide a better connection with the AI system described later.

It is worth noting that a smart classroom equipped with technology can enhance the diversity of classroom interaction and improve the quality of dialogue. Intelligent teaching provides interactive whiteboards, microphones and other equipment, and also supports activities such as questioning and voting (Dai, 2019; Saini & Goel, 2019). Based on the collected data, intelligent analysis is carried out, and feedback reports generated by the AI system are returned to the class.

Environment

An open, equitable and supportive classroom environment is the necessary foundation of teaching and learning, as it provides the soil in which classroom dialogues germinate and are nurtured (Cui & Teo, 2021; Lefstein & Snell, 2013). It includes three main aspects: creating opportunities for students to freely exchange information and express their ideas (Bouhnik & Deshen, 2014), respecting the ideas of others (Pianta et al., 2008), and responding positively to and praising the ideas of others (Michaels et al., 2008).

Community

From a sociocultural perspective, community is an important way for teachers and students to integrate into the classroom (Sánchez et al., 2013). Building a teacher–student community mainly includes three steps. Firstly, a collaborative relationship should be established to jointly accomplish learning tasks (Barron, 2000; Looi et al., 2010). Secondly, a sense of community should also be developed, which focuses on different discursive identities (Cobb & Hodge, 2011; Hufferd-Ackles et al., 2004). Finally, interactive dialogue should be an important mediator for sharing and negotiating ideas.

Teaching–Learning

The generation of dialogue in the interactive classroom is a process in which the community participates and collaborates to complete teaching–learning. Exploring dialogue through teaching–learning (Mercer, 1996) achieves the goal of jointly constructing knowledge and reaching a consistent understanding of it (Lossman & So, 2010). The teaching–learning process needs reasonable teaching objectives and rich teaching activities to promote the effectiveness of dialogic teaching (Cazden, 1988; Lefstein & Snell, 2013). In addition, classroom assessments can more clearly reflect the quality of the classroom.

Guide of AI

Recent dialogue analysis studies, especially DNN-based methods using data-driven strategies (Song et al., 2021), have significantly improved the accuracy of classification systems, but have only a limited connection with educational principles and pedagogical methods. In order to better leverage AI for classroom dialogue analysis, strong collaboration among technologists, designers, teachers, researchers and students is necessary to fully explore the potential of digital tools to interact with classroom dialogues (Major et al., 2018). Therefore, we present the “Guide of AI”, a concept that unites the above-mentioned people in education and technology. It offers proper guidance for the intelligent analysis performed by the AI model for the classroom, enhancing the usefulness and reliability of the technology. It receives professional knowledge from experts and requirements from teachers, and uses this information to guide the operation of the AI system. In the proposed framework, teachers can act as the “Guide of AI”, proposing their requirements to the AI system. More specifically, the training of the AI system relies on coding schemes defined by education experts or teachers.

Fig. 3 Workflow of the AI-powered analysis system. The dialogues received from the classroom are analysed by the LLM-enabled DNN model, generating statistical feedback. The whole process is under the control of the “Guide of AI”

AI-Powered Analysis System

Artificial intelligence has shown promising ability in information processing tasks, including in the education field (Langley, 2019; Sapci & Sapci, 2020). Generally, AI-based methods can greatly reduce the cost of manual labelling and improve the accuracy of system identification (Alam, 2021; Chen et al., 2020). Therefore, we propose to use an AI-powered analysis system to efficiently process the dialogue data received from classrooms and give statistical feedback. In this section, we introduce the workflow of the AI-powered system and the main logic of how it works.

Workflow

The main workflow of the proposed AI-powered analysis system is shown in Fig. 3. There are two inputs from the interactive classroom: guidance and dialogue data. The guidance is passed into the central control system, which issues instructions to each component of the system. We use the deep neural network (DNN) method to further process the dialogue data (LeCun et al., 2015). In brief, a DNN is a network-like model containing a large number of neurons that imitates people’s learning behaviour (LeCun et al., 2015). The data processed by the DNN model can be passed forward along two main routes.

The dotted line refers to the training route of the DNN-based discourse analysis model. Before deployment, the DNN model should be trained on a dataset from the relevant domain to learn to recognize patterns. In our case, the training data are the processed discourse text. The training process can use the classification criteria set under the guidance of the specific classroom. As shown by the solid line in the AI system of Fig. 3, once training is finished, the trained model with fixed parameters can automatically perform inference on real-world discourse data. Finally, the analysed dialogues are summarized statistically to provide feedback to the classroom teachers. The timely feedback can iteratively improve the quality of the dialogue and enhance teacher professionalism. In deep learning-based methods in particular, it is essential to process data appropriately because this affects the training of the model. In general, the preparation of data can be summarized in two steps: data collection and preprocessing. After data processing, a corpus of classroom dialogue is constructed for the subsequent intelligent analysis. The details of the dataset used in this study are discussed further in the next section.

Fig. 4 Architecture of the LLM-supported DNN method for intelligent dialogue analysis

Large Language Model Supported DNN Method

In the natural language processing field, several large language models (LLMs) have shown promising performance on text processing tasks, such as BERT (Kenton & Toutanova, 2019), GPT (Radford et al., 2018) and ERNIE (Zhang et al., 2019). LLMs are models trained on large collections of language material, and they usually have a strong ability to understand and analyse language. Therefore, we propose to use the LLM-based DNN method for the logical implementation of our AI system, as shown in Fig. 4.

The input text is first fed into the LLM to obtain hidden features, in which the context of the text is comprehensively extracted. The fine-tuning backbone then extracts target information for discourse analysis as the downstream task; its output is commonly processed by a fully connected (FC) layer and a softmax function to obtain the posterior probabilities for all dialogue categories. “Fine-tuning” is an AI-related term defined as the process of training neural networks on downstream tasks (classroom dialogue classification in our case). The “FC layer” and “softmax” are components of neural networks used for the final prediction. The parameters of the LLM are frozen, and only the fine-tuning part is trained using dialogue data. In our case, as the quality of the classroom discourse improves after feedback is received, the LLM remains fixed and only the fine-tuning part needs to be updated to adapt to the new dialogues. The fine-tuning part has relatively few parameters, so its training can be finished very quickly (Chalkidis et al., 2019; Yu et al., 2019). Therefore, the iteration of the overall system covering the classroom and technology can be achieved efficiently.
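The split between a frozen LLM and a small trainable head can be illustrated with a minimal sketch. It is not the implementation used in this study: the feature vector below is a hypothetical stand-in for the hidden features that a frozen LLM would produce, the class `ClassificationHead` and its methods are illustrative names, and the head is reduced to a single FC layer trained by plain gradient descent on the cross-entropy loss.

```python
import math
import random

CATEGORIES = ["Question", "Statement", "Feedback", "Transition", "Management"]


def softmax(logits):
    """Convert raw scores into posterior probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ClassificationHead:
    """Trainable FC layer + softmax placed on top of frozen LLM features."""

    def __init__(self, feature_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def predict_proba(self, features):
        logits = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def train_step(self, features, label_index, lr=0.1):
        """One gradient step on the cross-entropy loss; only the head changes."""
        probs = self.predict_proba(features)
        for k in range(len(self.bias)):
            grad = probs[k] - (1.0 if k == label_index else 0.0)
            self.bias[k] -= lr * grad
            for j, x in enumerate(features):
                self.weights[k][j] -= lr * grad * x
        return -math.log(probs[label_index])  # loss before the update


# Hypothetical hidden features of one utterance; in the real system these
# would come from the frozen LLM and are never updated during training.
features = [0.2, -0.4, 0.7, 0.1]
target = CATEGORIES.index("Question")

head = ClassificationHead(feature_dim=len(features), num_classes=len(CATEGORIES))
first_loss = head.train_step(features, target)
for _ in range(50):
    last_loss = head.train_step(features, target)

probs = head.predict_proba(features)
predicted = CATEGORIES[probs.index(max(probs))]
```

Because only the weights and biases of the head are touched in `train_step`, adapting to newly collected dialogues amounts to repeating these cheap updates while the feature extractor stays fixed.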

Continuous Improvement by Information Circulation

It is necessary to strengthen the connection between the classroom and the AI-powered analysis system. The two main components of the framework generate two outputs: dialogue and feedback. These two elements can be used to further improve the quality of classroom dialogue and the performance of the AI system, although this is not discussed in most previous studies.

Specifically, dialogues are generated in interactive classrooms and fed into the AI system. Intelligent analysis provides systematic information about teacher–student dialogue, such as the distribution of classroom discourse categories. The feedback is closely related to the essential principles of the classroom. Once teachers receive intelligent feedback, they can adjust their pedagogical strategies and improve subsequent lessons in a timely manner. Meanwhile, the intelligent system continuously collects data in subsequent lessons taught with the improved pedagogical strategies, and these data are fed into the analysis system again. In this way, a positive cycle is formed that improves the quality of education. The method allows for extensive analysis and timely feedback to teachers, thus improving the quality of dialogue and enhancing teacher professionalism.

Methodology

To validate the proposed framework, an empirical study involving the actual teaching process has been conducted, which also serves as an application of the proposed method. We mainly investigate the effectiveness of the essential principles and the AI system in the framework. The details of the research method are explained in this section.

Participants

The study employs an experimental design involving 6 pre-service teachers whose ages range from 20 to 25. All participants in the study have completed the examination to obtain the Mathematics Teachers’ Teacher Qualification Certificate, which is regarded as an important credential for educators in the Chinese education sector. Although the pre-service teachers have acquired practical teaching experience, they receive limited guidance in developing discourse skills. Before their involvement in this study, the participants possessed no prior exposure to the specific teaching materials under investigation, thereby eliminating any potential bias from prior experimentation. Upholding ethical standards, all participants provide their explicit consent by signing an informed consent form before their participation.

Design

For the content of the lessons, we select three primary school mathematics knowledge points, sharing similar difficulty levels and conceptual frameworks with the prescribed curriculum materials, which have been assessed by related primary school educators. The cohort of participants is subsequently randomized into three separate groups: Experimental Group A, Experimental Group B and Control Group C.

As shown in Table 2, three groups are each provided with essential principles for constructing a dialogue-oriented interactive classroom. Each participant is involved in a series of three lessons, each of which is focused on designated knowledge topics. Experimental Group A receives informative feedback from an AI-powered analysis system following the completion of lesson 1 and lesson 2. Participants in Experimental Group B receive feedback from the AI system only after lesson 1. In contrast, Control Group C exclusively benefits from the essential principles of cultivating a dialogue-oriented interactive classroom before each teaching preparation, devoid of any feedback.
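The feedback configuration described above can be written down as a small lookup, which may help when reading the per-lesson results. This is only an illustrative encoding of the design; the names `FEEDBACK_AFTER` and `informed_by_feedback` are hypothetical and do not come from the study.

```python
# Lessons after which each group receives AI feedback, as described above.
FEEDBACK_AFTER = {
    "A": {1, 2},  # Experimental Group A: after lesson 1 and lesson 2
    "B": {1},     # Experimental Group B: after lesson 1 only
    "C": set(),   # Control Group C: essential principles only, no AI feedback
}


def informed_by_feedback(group, lesson):
    """Return True if `lesson` is taught right after the group received AI feedback."""
    return (lesson - 1) in FEEDBACK_AFTER[group]


# Which lessons (1 to 3) of each group directly follow a round of AI feedback.
schedule = {
    group: [lesson for lesson in (1, 2, 3) if informed_by_feedback(group, lesson)]
    for group in FEEDBACK_AFTER
}
```

Under this encoding, lesson 3 of Group B is the only lesson of an experimental group that follows a lesson without new feedback.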

Table 2 Experimental configuration of AI feedback for the teaching process

Research Procedure

The study follows the steps of Teaching Preparation, Teaching and Revision, and Questionnaires. The focus of the research is to examine whether essential principles and feedback from the AI system improve teacher dialogue quality. The main steps are listed as follows:

  1. The participants prepare their teaching based on the curriculum materials and essential principles, and the duration of the preparation period is standardized for all groups.

  2. Teachers conduct teaching activities. The groups involved in the interaction with the AI system should provide audio records after the lessons are finished, and feed the dialogue records into the AI system. Subsequently, the AI system generates analytical results and delivers feedback to teachers. Experimental Groups are provided with feedback from the AI system, which prompts teachers to revise their instruction in terms of discourse content, classroom roles, and other relevant aspects.

  3. In order to obtain more information for analysis, a questionnaire is assigned to the participants after the three lessons. The survey includes the participants’ recognition of the provided principles and their attitudes towards the effectiveness of the feedback.

Instruments

Data

As described above, the dialogue data to be analysed are collected from the practical teaching of 6 pre-service teachers. In addition, more dialogue data are required for the development of the AI system. This study utilizes data from China’s National Public Education Platform, an extensive collection of high-quality curriculum and classroom teaching resources designed for primary and secondary schools. Teachers voluntarily contribute to the platform by recording and sharing videos of their classroom teaching process. We select a total of 10 national-level exemplary primary school mathematics lessons for the training of the AI system. These lessons undergo meticulous transcription, compilation and analysis, yielding a corpus of nearly 3,000 classroom dialogue entries. To ensure an understanding of the dialogic context and improve the accuracy of the coding, the researchers thoroughly reviewed the lessons and coding outputs throughout the procedure.

Analytical Coding Scheme

This research aims to evaluate whether the proposed framework improves the quality of dialogue among primary school mathematics teachers. Based on the coding scheme proposed by Huang et al. (2021), the analysis is carried out using the characteristics of high-quality primary school mathematics teacher discourse. The method codes discourse into five categories: Question, Statement, Feedback, Transition, and Management. The study claims that a high-quality primary school mathematics classroom mainly has the following characteristics:

  1. The Question category is used most often, and the Management category is basically not involved.

  2. Within the Question category, Comprehension Question is the most common, and Mechanical Question is not involved.

  3. The Statement category avoids lengthy explanations, while the Feedback category emphasizes Strong Verbal Feedback.

Based on this previous research and our AI-supported setting, we design several main variables to statistically describe the quality of classroom dialogue, which cover two aspects. The first aspect focuses on the overall distribution of positive and interactive dialogue categories, and includes the percentage of “Question” (\(P_q\)), the percentage of “Feedback” (\(P_f\)) and the sum of the two (\(P_{q \vee f}\)). The second aspect focuses on the detailed characteristics of “Question” and “Feedback”. For the “Question” category, we calculate the distribution of its sub-categories and use the percentage of Non-Mechanical Question (\(P_{\lnot m}\)) to measure quality. For the “Feedback” category, we use the percentage of “Strong Verbal Feedback” (\(P_{sv}\)) to measure quality. These five key variables are positively correlated with the quality of classroom dialogue.
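As a rough illustration of how the five variables could be computed from coded utterances, consider the sketch below. It rests on assumptions that the text does not state explicitly: \(P_q\) and \(P_f\) are taken over all teacher utterances, whereas \(P_{\lnot m}\) and \(P_{sv}\) are taken within the Question and Feedback categories, respectively. The function name, the data layout and the sub-category label "Other Feedback" are hypothetical.

```python
from collections import Counter


def dialogue_metrics(utterances):
    """Compute the five variables from (category, sub_category) pairs.

    Assumption: P_q and P_f use all utterances as the denominator, while
    P_not_m and P_sv use the Question and Feedback counts, respectively.
    """
    total = len(utterances)
    categories = Counter(category for category, _ in utterances)
    questions = [sub for category, sub in utterances if category == "Question"]
    feedback = [sub for category, sub in utterances if category == "Feedback"]

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    p_q = pct(categories["Question"], total)
    p_f = pct(categories["Feedback"], total)
    return {
        "P_q": p_q,
        "P_f": p_f,
        "P_q_or_f": p_q + p_f,
        "P_not_m": pct(sum(s != "Mechanical Question" for s in questions),
                       len(questions)),
        "P_sv": pct(sum(s == "Strong Verbal Feedback" for s in feedback),
                    len(feedback)),
    }


# Hypothetical coded lesson fragment with ten teacher utterances.
sample = (
    [("Question", "Comprehension Question")] * 2
    + [("Question", "Mechanical Question")]
    + [("Feedback", "Strong Verbal Feedback"), ("Feedback", "Other Feedback")]
    + [("Statement", None)] * 4
    + [("Transition", None)]
)
metrics = dialogue_metrics(sample)
```

For this made-up fragment, three of ten utterances are questions and two are feedback, so \(P_q\) is 30%, \(P_f\) is 20% and \(P_{q \vee f}\) is 50%.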

AI System Feedback

The AI system is first trained on the dialogue corpus described in the “Data” section and can then be used to give feedback to teachers, based on the above metrics, during the teaching procedure. Some example feedback items are mapped to the proposed essential principles of the dialogue-oriented interactive classroom, as shown in Table 3. The essential principles belong to the environment, community and teaching–learning domains.

Table 3 Feedback-principle mapping examples used for teacher revision

Questionnaires

A post-study questionnaire includes the following sub-scales: recognition of essential principles and attitude towards AI feedback. The questionnaire is based on a 5-point Likert scale. Scores on the scale range from strongly disagree (1) to strongly agree (5). Cronbach’s alpha coefficient is used to test the reliability of the questionnaire to ensure internal consistency and reliability of each subscale.

The first part of the questionnaire is the recognition of essential principles, which include three dimensions: environment, community and teaching–learning, and three questions for each dimension. The Cronbach’s \(\alpha\) scores of the three dimensions were 0.857, 0.701 and 0.750, respectively, indicating a high level of internal consistency and reliability (Cohen, 2013).

The second part of the questionnaire measures attitude towards the AI feedback with three questions. It is adapted from the study design of Van Ginkel et al. (2020) and modified for the purposes of our study. Cronbach’s \(\alpha\) for this part is 0.909, which demonstrates high internal consistency.
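
The reliability check described above can be sketched as follows. This is a minimal illustration of the standard Cronbach’s \(\alpha\) formula, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)\), where \(k\) is the number of items, \(s_i^2\) the variance of item \(i\) and \(s_T^2\) the variance of respondents’ total scores. The function name and input layout are assumptions, not the procedure used in the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Sample variances (n - 1 denominator) are used for both the items and the
    total scores, so the ratio is consistent.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")

    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

With three items per dimension, as in the first part of the questionnaire, each respondent would contribute one row of three Likert scores per dimension.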

Results

The results of our empirical study come mainly from the classroom dialogue data and the questionnaire data. First, the distribution of dialogue categories is measured across lessons and groups to assess the effectiveness of the AI system in improving dialogue quality, and the two dialogue categories Question and Feedback are analysed in more detail. We then analyse the results of the questionnaire survey, which covers teachers’ degree of recognition of the essential principles proposed in this study and their attitude towards the AI system feedback. The results are presented in the following parts.

Measurement of Teaching Discourse

Overall Distribution of Discourse Categories

To examine whether teachers improve the quality of their teaching dialogue after receiving feedback from the AI system, we compare the percentages of teachers’ discourse categories across groups and lessons. The results are shown in Fig. 5, and the statistical analysis of the key variables defined in the “Analytical coding scheme” section is discussed below.

Fig. 5 Overall distribution of discourse categories

All three groups are given the essential principles that should be followed in the classroom for teaching preparation. We first analyse the distribution of discourse categories across three successive lessons.

Experimental Group A receives feedback from the AI system at the end of each lesson, and the structure of its classroom discourse changes continuously over the two rounds of post-feedback correction. The values of \(P_q\) and \(P_f\) move from 21% and 16% in lesson 1 to 28% and 20% in lesson 2, and finally to 30% and 22% in lesson 3, so \(P_{q \vee f}\) rises by 11 percentage points at lesson 2 and by a further 4 percentage points at lesson 3. The Question and Feedback categories increase steadily while the Statement category decreases.

Experimental Group B receives feedback from the AI system only after lesson 1. Its values of \(P_q\) and \(P_f\) move from 22% and 15% to 31% and 18%, and finally to 29% and 18%. \(P_{q \vee f}\) rises by 12 percentage points at lesson 2 and falls by 2 percentage points at lesson 3, which indicates a notable improvement after the initial round of feedback correction that proves difficult to sustain. The last lesson, taught without feedback, shows a rebound in which the Statement category increases slightly and the Question category decreases slightly. It is nevertheless still better than the first lesson, with \(P_{q \vee f}\) 10 percentage points higher.

In Control Group C, which does not receive any AI system feedback, \(P_{q \vee f}\) in the three successive lessons is 35%, 38% and 39%. This is a slighter improvement than in either experimental group.
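
The lesson-to-lesson changes quoted above follow directly from the reported percentages. The short sketch below recomputes \(P_{q \vee f}\) and its changes for the two experimental groups. The figures are copied from the text, while the variable and function names are illustrative only.

```python
# Reported (P_q, P_f) per lesson, in percent, for lessons 1 to 3.
reported = {
    "A": [(21, 16), (28, 20), (30, 22)],
    "B": [(22, 15), (31, 18), (29, 18)],
}


def combined_and_changes(pairs):
    """Return P_q_or_f per lesson and the change between successive lessons."""
    combined = [p_q + p_f for p_q, p_f in pairs]
    # Differences are in percentage points, not relative percentages.
    changes = [later - earlier for earlier, later in zip(combined, combined[1:])]
    return combined, changes


for group, pairs in reported.items():
    combined, changes = combined_and_changes(pairs)
    print(f"Group {group}: P_q_or_f = {combined}, changes = {changes}")
```

For Group A this gives 37%, 48% and 52% (changes of +11 and +4 points), and for Group B 37%, 49% and 47% (changes of +12 and -2 points), which leaves Group B 10 points above its first lesson.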

Table 4 Independent samples t-test for experimental group and control group

We also use an independent samples t-test to compare the Experimental and Control groups after feedback has been received (the Experimental Group statistics combine Experimental Groups A and B). The results, based on the percentage of each discourse category, are shown in Table 4. For the Statement category, the t value is \(-2.862\) and p is 0.017. For the Question and Feedback categories (evaluated by \(P_{q \vee f}\)), t is 4.092 and p is 0.002. Both p values are below 0.05, indicating a significant difference between the experimental and control groups. In terms of mean values (M), the Statement category of the Experimental Group (\(M=28.77\)) is significantly lower than that of the Control Group (\(M=36.63\)), and the Question and Feedback categories of the Experimental Group (\(M=48.97\)) are significantly higher than those of the Control Group (\(M=39.01\)). The standard deviation (SD) values of the Experimental Group are generally higher than those of the Control Group, which reflects greater variation in dialogue structure in the Experimental Group.
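
For readers unfamiliar with the test, the following is a minimal sketch of how an independent samples t statistic can be computed. It assumes the pooled-variance (Student) form; the text does not state whether equal variances were assumed. The sample values are invented for illustration and are not the study data.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("both samples have zero variance")

    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Hypothetical P_q_or_f percentages per teacher; NOT the values behind Table 4.
experimental = [47.0, 52.5, 49.0, 50.5, 46.0, 51.0]
control = [38.0, 40.5, 39.0, 37.5, 41.0, 38.5]
t_value, dof = independent_t(experimental, control)
print(f"t = {t_value:.3f}, df = {dof}")
```

Turning the statistic into a p value requires the t distribution with the returned degrees of freedom. A statistics package can do this directly, for example `scipy.stats.ttest_ind` with its `equal_var` parameter.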

Combining these two sets of results, we conclude that introducing AI feedback can improve the quality of classroom dialogue: it leads to a significant reduction in Statement discourse, an increase in Question and Feedback, and positive changes in the structure of classroom discourse. The improvement is sustained only while the feedback process continues.

Detailed Analysis of Question and Feedback

To further identify specific areas of improvement in discourse, the Question and Feedback categories are subdivided according to the criteria outlined in the “Analytical coding scheme” section, and a statistical analysis of the key variables is performed. The results are shown in Fig. 6. Question is subdivided into Mechanical, Memorization, Comprehension and Application Questions. Similarly, Feedback is categorized into three levels: Weak Verbal Feedback, Strong Verbal Feedback and Repetitive Feedback.

Fig. 6 Detailed analysis of question and feedback

We mainly analyse Experimental Group A since it shows the most significant variation, as shown in Fig. 6. From lesson 1 to lesson 3, the value of \(P_{\lnot m}\) rises from 65% to 79% and finally to 90%, with Comprehension Questions being the most frequent. The function of Question shifts from Mechanical, lower-order questions towards Application and other higher-order questions. Similarly, Feedback shifts from Weak Verbal to Strong Verbal, with the value of \(P_{sv}\) rising from 50% to 54% and finally to 59%. These results show that teachers’ discourse patterns shift steadily towards high-quality characteristics, increasingly characterized by Comprehension Questions and Strong Verbal Feedback.

This finding demonstrates that teachers take up the feedback effectively and then revise their teaching. The improvement in discourse quality appears not only in the overall distribution of discourse types but also in the detailed function of specific discourse categories such as Question and Feedback.

Questionnaire

The questionnaire results reflect teachers’ subjective assessment of the elements we are interested in. First, we examine the perceived effectiveness of the essential principles and AI feedback. The means and standard deviations of the scores are calculated to describe teachers’ overall attitudes towards these two aspects, and the results are shown in Table 5. The Control Group’s score for AI feedback is not listed because teachers in this group did not experience AI feedback. When AI feedback is available, teachers score AI feedback (\(M=4.75\)) higher than the essential principles (\(M=3.75\)), which suggests that teachers in the Experimental Group are more inclined to recognize the pivotal role of AI system feedback in improving the quality of discourse. Without the AI system, teachers also score the essential principles highly (\(M=4.5\)), which indicates that teachers recognize the essential principles as promoting the quality of discourse.

Table 5 Overall statistics of teachers’ attitude towards essential principles and AI feedback

We further analyse teachers’ endorsement of the individual essential principles, which span three classroom domains: environment, community and teaching–learning. The descriptive statistics are shown in Table 6. The mean values of the three key domains are 4.17, 3.73 and 4.89, respectively. “Community” is the most important factor, followed by “Environment” and “Teaching–learning”. All factors receive relatively high scores, indicating that teachers widely recognize these three key domains as relevant to classroom teaching and as contributing to the quality of classroom discourse from different perspectives, which is the basis for building a high-quality classroom.

Table 6 The score of essential principles from three domains

To further explore teachers’ attitudes towards AI feedback, correlation analyses are conducted on three aspects of AI feedback: the degree of recognition, the degree of absorption and the degree of effectiveness. Table 7 shows that the three aspects are positively and significantly correlated, with correlation coefficients ranging from 0.853 to 0.979. This suggests that teachers actively accept AI feedback and make corrections, thus improving the quality of discourse.
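
The pairwise coefficients in such an analysis can be computed as in the sketch below. It assumes the Pearson product-moment coefficient, which the text does not name explicitly, and the function and aspect names are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


def pairwise_correlations(scores):
    """Map each pair of aspect names to the correlation of their scores."""
    return {
        (first, second): pearson_r(scores[first], scores[second])
        for first, second in combinations(scores, 2)
    }
```

Passing a dictionary with one list of teacher scores per aspect (recognition, absorption, effectiveness) returns the three pairwise coefficients that a table such as Table 7 would report.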

Table 7 Results of correlation analysis of AI feedback

Discussion

This study investigates the impact of an artificial intelligence-supported discourse analysis system on the quality of classroom dialogue. The findings indicate that the feedback generated by the AI system gives teachers useful information for improving their teaching and facilitates a shift towards the characteristics of high-quality classroom discourse. Notably, the iterative cycle of corrections in the proposed framework is expected to improve the quality of classroom dialogue sustainably. In addition, participants have a fairly positive attitude towards timely feedback and recognize its effectiveness in improving the quality of their classroom discourse.

Many previous studies have shown that classroom dialogue analysis is an important means of exploring classroom quality, which makes it highly significant for both teacher and student development (Mercer & Littleton, 2007). Traditional classroom dialogue analysis methods usually rely on manual coding, which is relatively slow. This study therefore adopts modern artificial intelligence technology to replace traditional manual coding, significantly improving the efficiency of analysis. In addition, unlike previous studies that did not explicitly explore the relationship between technology and education (Song et al., 2021; Suresh et al., 2022), this paper proposes a comprehensive framework that combines pedagogical theory with information technology.

The framework emphasizes the importance of essential principles and timely feedback. The essential principles are not only the basis for building a high-quality classroom but also frame the timely feedback on pedagogical strategies that is derived from the results of discourse analysis, thereby enabling teachers to scrutinize and revise their discourse. By automating discourse analysis and providing timely feedback for teaching based on the essential principles, the quality of classroom dialogue can be continuously improved. The results show that this improvement in discourse quality is evident not only in the change in discourse distribution but also in the transformation of discourse function, especially in the Question and Feedback categories, which is consistent with previous studies (Huang et al., 2021). In particular, multiple rounds of continuous discourse feedback play a key role in improving the quality of discourse.

Technology-enabled classroom dialogue analysis is a promising area of research (Hao et al., 2020). As several studies have shown, teachers have a positive attitude towards AI technology and consider it a useful teaching tool, which further contributes to improving the quality of classroom discourse (Başöz & Çubukçu, 2014; Scherer et al., 2018). This study also shows that participants have a positive attitude towards technology integration in the classroom and believe in the beneficial impact of timely feedback and the essential principles on their teaching efficacy.

Conclusion

In this paper, we study classroom dialogue analysis and introduce a novel framework that incorporates artificial intelligence. The results demonstrate that AI brings efficiency and effectiveness to dialogue analysis systems.

The proposed framework has two main components. The first component, the dialogue-oriented interactive classroom, forms the basis of the framework and is organized into four domains. The first three domains belong to the basic interactive classroom: environment, community and teaching–learning. They clarify the essential principles for higher-quality dialogue. To connect the classroom with the AI system, we propose a fourth domain, the “Guide of AI”, which serves as the bridge between the classroom and AI technology and can also supervise the operation of an AI system. The second component is an artificial intelligence analysis system that automatically processes the dialogue data gathered in the classroom. Compared with manual coding, automatic analysis is faster and more cost-effective. The AI system uses natural language processing technology to analyse the collected dialogue data and then gives teachers feedback on the key aspects of building an effective classroom atmosphere.

Our empirical research shows that the comprehensive framework proposed in this paper can help teachers develop better discourse structures. When AI feedback is received in a timely manner, classroom dialogue tends to show more high-quality characteristics, such as more Question and Feedback and fewer Statements. The essential principles are not only an important foundation for creating an open and interactive classroom but also provide effective suggestions for teaching. The questionnaire results show that teachers have a high level of awareness of the value of the essential principles and AI feedback for improving the quality of discourse.

This study contributes to exploring the integration of pedagogical theory and technology in discourse analysis through the theoretical design and empirical study of an artificial intelligence-supported framework for classroom dialogue analysis. However, the research has certain limitations. It verified the effectiveness of the AI system and framework only in primary school mathematics classrooms, and their effectiveness in other types of classrooms with more coding schemes remains to be extended and explored. In the proposed method, the AI system is more likely to identify the quantity than the quality of different discourse types, and the latter may be inseparable from human judgement. We will further explore the possibility of a more direct and more accurate assessment of dialogue quality by the AI-enabled discourse analysis system.