1 Introduction

Online English teaching and learning is commonly offered to students so that they can continue to gain knowledge outside regular class days. It requires a stable Internet connection and a smart device for learning English through online applications [1], and it reduces the overall complexity of the English learning process. Online English learning analysis examines the large volume of data generated within an application [2, 3]. The analytics process reduces latency and energy consumption during classification and identification. Online English learning systems use a learning analytics approach [4], which is mainly used to analyze students’ skill sets through an evaluation process. The approach predicts students’ outcomes and behavior patterns, which produce feasible data for further evaluation [5]. Online behavior patterns are calculated from online classes, reducing the latency of the evaluation process. The learning analytics approach improves the performance and feasibility of online English teaching and learning applications [6, 7].

Big data analytics (BDA) is commonly used to analyze large amounts of data. BDA identifies the key values that need to be analyzed from the database and reduces latency in various processes and functions [8, 9]. Handling BDA is complicated in every online English learning application and system. Artificial intelligence (AI)-based big data analysis methods are widely used in online English learning systems [10, 11]. AI uses feature extraction to obtain the necessary values and data from the database, and the extracted data provide optimal information for the analysis process. AI improves the efficiency and mobility of online English learning applications [12]. AI-based BDA improves students’ English learning ability and the teaching and learning quality of the systems [13]. Big data environment for students and teachers (BEST) is a method used in online learning systems. The BEST method uses BDA to identify the necessary variables and features from the database [14]. It provides suitable learning modes and modules for students, which reduces the complexity of English learning, and it improves the effectiveness and robustness of online English learning systems [15,16,17,18].

BDA is widely used in various fields and applications, including the improvement of online English learning. BDA analyzes the learning characteristics required for English teaching systems [19]. It identifies the datasets in online applications that need improvement, as well as the variables and characteristics of online English learning applications [20]. Data mining-based BDA is used in various online learning systems. The data mining technique detects the key values and patterns that contain optimal information relevant to English teaching [21,22,23]. Data mining reduces latency in subsequent improvement and evaluation processes and provides feasible data that improve students’ learning abilities during online classes [24]. A k-means clustering algorithm-based BDA method is also used for improving online English learning. The k-means clustering algorithm predicts the data required for the learning improvement process and increases the accuracy of detection and prediction. K-means algorithm-based BDA improves the overall performance of online English learning systems [14, 25]. The contributions of this article are summarized as follows:

  1. Introducing a manifold learning data analytics model for improving the quality of English learning through online medium.

  2. Performing an improvement process based on the teaching session impact and the students’ experience of prolonged sustainability.

  3. Improving the session initialization through impacting recommendations for performance improvement through better understandability.

  4. Performing a comparative and data analysis using precise web sources and verifying the proposed methods’ consistency.

The remainder of this article is organized as follows: Sect. 2 discusses the related work, and Sect. 3 describes the proposed manifold learning data analytics model. The proposed model is analyzed with evaluation metrics in the results and discussion in Sect. 4. Finally, Sect. 5 concludes the article.

2 Related Works

Goodsett [26] proposed a best practice for teaching and assessing critical thinking using online learning objects. The literature review technique is used here to review the exact content and characteristics range gathered. The provided best practices evaluate the information and literacy information required for teaching systems. Literacy instruction and library instructions are provided to both students and teachers, which maximizes the performance range of the systems. The raw score (RS) mean for the 158 online learning objects sample was 9.19, and the overall element score (ES) mean was 4.37. However, a lack of in-depth investigation of potential variances in different learning environments or situations potentially restricts adapting the proposed best practices.

Fitria [27] emphasized how the online learning system might improve classroom dynamics. The importance of such methods is highlighted in unprecedented instances like the COVID-19 outbreak by the positive support of institutions and the good attitudes of lecturers. Educators’ willingness to try new methods and venues for teaching English online is a testament to the field’s flexibility. However, the qualitative research neglected important quantitative data about students’ progress and outcomes; the evaluation would have been more complete with the addition of numerical data. The outcomes suggest that the online learning system has the potential to help educators and students in the teaching and learning process.

Shwartz-Asher et al. [28] designed teaching and assessing active learning strategies for online academic outcomes. The same learning experience of students is stored and identified from the database that produces feasible data for other processes. The actual goal is to create an active learning environment for the students. Twenty-first-century skills and knowledge are provided to students by teaching proper content. The designed strategy maximizes the performance and effectiveness level of online learning systems. Conversely, this study tends to be specific to a given course or subject, reducing its applicability to other online learning environments.

Insights into e-learning are provided by Bailey et al.’s [29] investigation of intrinsic value, drives, and their effect on satisfaction with the course and technology uptake. The identified connections aid in expanding our knowledge of the elements that affect students’ motivation and choice of online language courses. These results are important for developing instructional training and e-learning strategies that encourage life-changing learning opportunities. Nevertheless, possible differences in motivation and experience among study participants may limit the generalizability of results from studies examining the relationship between intrinsic value and learner satisfaction.

Tang et al. [30] introduced a mediation model to evaluate students’ self-efficacy for online learning systems. The main aim of the introduced model is to identify the effects and self-efficacy of online English learning. A multivariate correlational analysis method is used here to detect the necessary variables for the evaluation process. The analysis method reduces the latency in the computation process. The self-efficacy range maximizes the performance proficiency level of the students during academic examinations. Although self-efficacy is the primary emphasis of the mediation model, other important psychological aspects that influence students’ success with online learning may not be fully examined.

Zhang et al. [31] developed a core competency-based English major practical teaching system (CCEPTS). The developed model is to improve the overall curriculum development range of students. The artificial intelligence (AI) technique identifies the multi-level collaboration variables for teaching practices. The developed CCEPTS detects students’ authentic experiences, perspectives, capabilities, and activities. The proposed model improves the overall performance and efficiency level of teaching systems. However, the established core competency-driven practical teaching method may rely largely on AI algorithms, reducing its efficacy when AI technologies are not adequately integrated.

Zhai [32] proposed a big data content recommendation algorithm-based oral English training system. The proposed design is a cloud-computation-based architecture that provides various oral learning practices to the students. The proposed design is commonly used in smart classrooms, which require proper teaching practices for the students. Compared with other methods, the proposed method increases students’ oral language skills and knowledge levels. Conversely, the suggested big data content recommendation method for the spoken English training system does not consider changes in learner preferences and needs, which potentially restricts its adaptability.

Shen and Qin [33] designed a data mining-based source language sentence extraction (SLSE). The proposed approach increases the bar for the effectiveness and performance of SLSE systems. A deep neural network (DNN) algorithm is used here to evaluate the important variables necessary for data mining. The DNN approach reduces the processing time to energy usage ratio. DNN trains the datasets that are retrieved from the database. Nonetheless, results from the suggested data mining and regulated mediation analysis using deep neural network methods may be affected by the availability and quality of the underlying data.

Yang et al. [34] developed a new feedback evaluation method for online English learning during the COVID-19 pandemic. Feedback gathers the information that is relevant to teaching and learning processes. The feedback method evaluates the actual content of the feedback, which produces feasible data for further improvement. The developed method increases learners’ self-regulated experience, reducing the complexity of understanding the English syllabus. Although the established feedback evaluation method proved useful during the COVID-19 pandemic, its applicability and relevance in less chaotic and non-pandemic settings remain to be explored.

The literature review examined research into several methods for improving the effectiveness of online English language instruction and testing. Instruction in critical thinking and other basic competencies is the primary research focus of these studies. However, they depend on algorithms, are often applicable only in certain situations, lack quantitative data, and are not easily generalized. These restrictions can only be overcome with an all-encompassing strategy. The current study fills these gaps by introducing an innovative manifold learning data analytics (MLDA) model that can evaluate and utilize data characteristics through real-time correlated analysis. The methodology integrates multi-dimensional data extraction, categorization learning, and impact evaluation to boost the efficacy of online English language training. The concept improves upon existing virtual classrooms using the power of comparative analysis to make them more versatile and comprehensive.

Manifold data analysis for learning improvement and online-based student performance requires suggestions, modifications, and session initializations. Such data analysis uses data from distinct sources and varying factors like understandability and session reachability. Data utilization and trivial feature identification for real-time correlated analysis receive little consideration in the methods discussed above. This omission increases the chance that existing data are discarded rather than reused when a new analysis is initialized. Therefore, pre- and post-classification of a session become mandatory in handling the discarding rate. The proposed MLDA achieves this by improving the chances of trivial and non-trivial feature classifications.

3 Manifold Learning Data Analytics Model

The proposed MLDA model is introduced to improve the classification accuracy of session reachability, suggestions, and students’ understanding during English language teaching sessions to identify their impact on the learning experience. The features mentioned above are analyzed in consecutive sessions based on observed data from before and after the learning sessions.

3.1 Data Collection

The proposed model is analyzed using data collected from (https://www.kaggle.com/datasets/dfydata/the-online-plain-text-english-dictionary-opted) [35]. The data source provides 176,009 words and their corresponding definitions for online synonym sessions. A total of four fields (word, count, part of speech, and definition) are used for handling the sessions: (1) Word: this field most likely contains the list of words, which might be the main pieces of data in the dataset. (2) Count: this field could list the frequency or total number of times each term appears in the dataset; it might shed light on the use or ubiquity of particular words. (3) Part of speech: this field may list the grammatical class that each word belongs to; nouns, adjectives, verbs, adverbs, and other basic speech constituents are common. (4) Definition: this field probably contains the meanings of the words, with one definition for each term.
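As a rough illustration of the four-field layout described above, the following sketch parses a few made-up rows and tallies the part-of-speech labels. The column names (`word`, `count`, `pos`, `definition`) and the sample rows are assumptions for illustration only; the actual file may use different headers and delimiters.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the four-field layout (word, count, part of speech, definition).
# The real dataset's header names and delimiter may differ.
SAMPLE = """word,count,pos,definition
abandon,1,v.,To give up entirely
abbey,1,n.,A monastery or convent
able,1,a.,Having sufficient power or skill
ably,1,adv.,In an able manner
"""


def load_entries(text):
    """Parse CSV text into a list of dicts, one per dictionary entry."""
    return list(csv.DictReader(io.StringIO(text)))


def pos_distribution(entries):
    """Count how many entries fall under each part-of-speech label."""
    return Counter(row["pos"] for row in entries)


entries = load_entries(SAMPLE)
distribution = pos_distribution(entries)
print(len(entries), dict(distribution))
```

Reading from an in-memory string keeps the sketch self-contained; pointing `csv.DictReader` at an opened file would work the same way.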

The data observed from the platform and students are independently analyzed using classifier tree learning. The proposed model is illustrated in Fig. 1.

Fig. 1 Proposed MLDA model

Figure 1 depicts the flow of the educational process in three primary portions: pre-session data collection, data analysis and classification, and post-session outcomes. Each portion is described in detail below.

3.1.1 Pre-session Data Collection

This section outlines the initial stages of a learning process, including the online session, session reachability, and understanding. Icons represent diverse students, while arrows indicate the flow of data from students to session reachability and understanding. A target icon represents student comprehension, while a direct arrow points to the next section, indicating the transition of data for further processing. These symbols help visualize the learning process and ensure accessibility and understanding.

3.1.2 Data Analysis and Classification

Data processing and analysis form the central part of the figure, comprising post-session data gathering, suggestion input, and classifier training. Icons stand for post-session data, and the suggestion input is where suggestions originate. The analytics wheel, a visual representation of continuous analysis, classifies results as either non-trivial or trivial. Trivial outputs are deemed less important and may not necessitate quick action, whereas non-trivial outputs are significant and warrant additional attention.

3.1.3 Post-session Outcomes

The final section showcases the data analysis results and applications. It evaluates the significance of findings, presenting both non-trivial and trivial outputs. The experience outcome, represented by a person icon with thought bubbles, reflects students’ knowledge or skills gained from improved teaching methods. The improvements, represented by an upward arrow with gears, indicate system or process enhancements based on the analysis’s insights.

3.2 Data Preparation

The manifold data extracted before and after an English learning session are analyzed to achieve better accuracy in feature analysis. Common data analytics features, such as session reachability, student understandability, and suggestion accuracy, are considered as factors whose impacts are controlled through recurrent data analysis. The MLDA model (MLDAM) uses classifier tree learning to classify the trivial and non-trivial features observed in the manifold data extracted before and after the session. In big data-based English language teaching sessions, MLDAM accumulates, observes, and analyzes data to identify the impact on the learning experience. The manifold data are extracted for classification by classifier tree learning using improvements and experience observation. The classifier learning output identifies the impacts through classification of the accumulated data analytics. The input data observed from the online session are represented as

$$EON^{SS} \left( {DA} \right) = \frac{1}{t}\left( {\sum\limits_{xy = 1}^{t} {H_{v} \left( {x,y} \right) - L_{v} \left( {x,y} \right)} } \right),$$
(1)

where

$$\left. {\begin{array}{*{20}c} {H_{v} \left( t \right) = \frac{1}{2\pi }\sum\nolimits_{x}^{t} {\frac{{H_{v} \left( x \right)}}{{t\left( {SS} \right)}}dt} } \\ {{\text{and}}} \\ {L_{v} \left( t \right) = \frac{1}{2\pi }\sum\nolimits_{y}^{t} {\frac{{L_{v} \left( y \right)}}{{t\left( {SS} \right)}}dt} } \\ \end{array} } \right\}.$$
(2)

In Eqs. (1) and (2), the variables \(H_{v}\left(x\right)\) and \({L}_{v}(y)\) represent the high level and low level of the English online teaching session data analytics \({EON}^{SS}(DA)\); the session reachability and students’ understandability observed from the past session suggestion are indicated in the \(x\) and \(y\) planes. Based on \(x\) and \(y\) rising and falling for the current English teaching session time \(t\), \(x \in \left( {x, - y} \right)\) and \(y \in \left( { - x,y} \right)\), as shown in Eq. (3)

$$\left. {\begin{array}{*{20}c} {H_{v} \left( {x,y} \right) = \frac{1}{2xy}\int_{ - y}^{x} {\frac{{t\left( {x^{2} + y^{2} + 2xy} \right)}}{t}dt} } \\ {{\text{and}}} \\ {L_{v} \left( {x,y} \right) = \frac{1}{2xy}\int_{y}^{ - x} {\frac{{t\left( {x^{2} + y^{2} - 2xy} \right)}}{t}dt} } \\ \end{array} } \right\}.$$
(3)

Based on the data analytics, the initial impact is mitigated for all \(H_{v} \left( {x,y} \right) - L_{v} \left( {x,y} \right)\) that represents a complete sequence based on session reachability, students’ understandability, and suggestions from a past session at any interval on \(\left( {C \times t} \right)\). Here \(C\) denotes the complete classification process. The classification flow is illustrated in Fig. 2.
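Taking Eq. (3) at face value, and assuming that \(x\) and \(y\) are constant with respect to the integration variable \(t\), that \(t \ne 0\) so the factor \(t\) cancels, and that \(xy \ne 0\), the two integrals reduce to closed forms. This is a reading aid under those assumptions rather than a result stated in the source:

$$H_{v} \left( {x,y} \right) = \frac{1}{2xy}\int_{ - y}^{x} {\left( {x + y} \right)^{2} dt} = \frac{{\left( {x + y} \right)^{3} }}{2xy},\quad L_{v} \left( {x,y} \right) = \frac{1}{2xy}\int_{y}^{ - x} {\left( {x - y} \right)^{2} dt} = - \frac{{\left( {x + y} \right)\left( {x - y} \right)^{2} }}{2xy},$$

so that each term of the sum in Eq. (1) becomes

$$H_{v} \left( {x,y} \right) - L_{v} \left( {x,y} \right) = \frac{{\left( {x + y} \right)\left[ {\left( {x + y} \right)^{2} + \left( {x - y} \right)^{2} } \right]}}{2xy} = \frac{{\left( {x + y} \right)\left( {x^{2} + y^{2} } \right)}}{xy}.$$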

Fig. 2 Classification flow illustration

The system in Fig. 2 is designed for continuous, real-time assessment of student engagement in online learning environments. It uses real-time data analytics and historical session summaries to collect student interactions, engagement metrics, and potentially biometric data. The central component is a machine learning classification model, which receives inputs from both real-time data analytics and historical session summaries. The system operates in a time-series manner, continuously analyzing data as the session progresses. The time-series analysis graph indicates the evolution of certain features or metrics over the course of the session. The system extracts input features at time \(t\), which are derived from the time-series data and fed into the classification model. The predicted output is represented by the model output, with decision boundaries indicating threshold-based decision-making. The system classifies the engagement state into two categories: (a) rising plane, indicating increasing engagement or performance, and (b) falling plane, potentially signaling decreasing engagement or a need for intervention. The output is visualized through various user interfaces or dashboards, allowing for real-time monitoring of student engagement.
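The threshold-based decision described above can be sketched as follows. The slope estimate, the window of scores, and the threshold value are illustrative assumptions; the source does not specify how the decision boundary is computed.

```python
def mean_slope(series):
    """Average first difference of a sequence of engagement scores."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return (series[-1] - series[0]) / (len(series) - 1)


def classify_plane(series, threshold=0.0):
    """Label a window 'rising' if its mean slope exceeds the threshold, else 'falling'."""
    return "rising" if mean_slope(series) > threshold else "falling"


# Hypothetical engagement scores sampled during one session.
rising_label = classify_plane([0.2, 0.4, 0.5, 0.7])
falling_label = classify_plane([0.9, 0.6, 0.5, 0.3])
print(rising_label, falling_label)
```

A flat window is labeled as falling under this rule because its slope does not exceed the threshold; a different tie-breaking choice would be equally defensible.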

The \(DA\) and \(SS\) from the sessions are extracted for two classifications, viz. \(x(t)\) and \(y\left(t\right)\). The \(H_{v}\) and \(L_{v}\) are obtained by matching \(x\left( t \right)\) and \(y\left( t \right)\forall EON^{SS}\). Across the varying \(t(SS)\), the range is modified as \(\left(x,-y\right)\) and \(\left(y, -x\right)\) depending on the planes (refer to Fig. 2). Classification is performed to reduce the impact of online English teaching at each session and is observed for final student experience validation. The impact is due to the trivial feature observation from the extracted features in independent students’ learning experiences \({EON}^{SS}\) at any \(t\) interval. This classification follows a high level of data analysis that is expressed as follows:

$$x\left( t \right) = \frac{{x*2^{{\frac{{{\text{Im}} p}}{2}}} }}{t} \omega_{i} \left( {C \times t - 2^{{{\text{Im}} p}} } \right)$$
(4)
$$y\left( t \right) = \frac{{y*2^{{\frac{{{\text{Im}} p}}{2}}} }}{t} \omega_{j} \left( {C \times t - 2^{{{\text{Im}} p}} } \right),$$
(5)

where

$$\left. {\begin{array}{*{20}c} {\omega_{i} = Tf\left( t \right) \left| {\frac{{{\text{Im}} p}}{2}} \right|Tf\left( t \right)_{{{\text{Im}} p - 1}} } \\ {{\text{and}}} \\ {\omega_{j} = NTf(t)^{ - 1} \left| {\frac{{{\text{Im}} p}}{2}} \right|Tf\left( t \right)_{{{\text{Im}} p - 1}} } \\ \end{array} } \right\}.$$
(6)

Based on Eqs. (4), (5), and (6), the variables \(\omega_{i}\) and \(\omega_{j}\) denote the classifier tree learning for high-level and low-level data analytics based on \(x\) and \(y\). The variables \(Tf\left( t \right)\) and \(NTf(t{)}^{-1}\) represent the trivial and non-trivial features from the manifold data extracted for \(\omega_{i}\) and \(\omega_{j}\). For varying \(\omega_{i}\) and \(\omega_{j}\), \(x\left( t \right)\) and \(y(t)\) are analyzed in Fig. 3.
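A minimal numerical sketch of Eqs. (4) and (5) is given below. It assumes that \(\omega\) multiplies the bracketed term \(\left( {C \times t - 2^{Imp} } \right)\) and that the weights are supplied as plain numbers rather than derived from Eq. (6); the function name and the sample values are hypothetical.

```python
def plane_value(level, imp, t, c, omega):
    """Evaluate Eq. (4)/(5) as level * 2**(imp/2) / t * omega * (c*t - 2**imp)."""
    if t == 0:
        raise ValueError("t must be non-zero")
    return level * 2 ** (imp / 2) / t * omega * (c * t - 2 ** imp)


# Hypothetical values: same session time and impact, different levels and weights.
x_t = plane_value(level=2.0, imp=2, t=4, c=3, omega=0.5)
y_t = plane_value(level=1.0, imp=2, t=4, c=3, omega=0.25)
print(x_t, y_t)
```

The same function serves both equations because they differ only in the level (\(x\) or \(y\)) and the weight (\(\omega_{i}\) or \(\omega_{j}\)).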

Fig. 3 Analyses of \(x\left( t \right)\) and \(y\left( t \right)\)

The variations observed \(\forall x\left( t \right)\) and \(y\left( t \right) \in \left( {w_{i} ,w_{j} } \right)\) are tabulated in Fig. 4. As the classifications are across \((x,y)\), the \({w}_{i}\) in \(\left(Imp-1\right)\) achieves high output compared to that of \({w}_{j}\). If this is feasible, reachability is high, confining the level of understandability. It generates an expectation over the available \(Tf \left(t\right)\) and \(NTf{\left(t\right)}^{-1}\). Therefore, the impact is less analyzed for preparing \(y(t)\). The understandability is analyzed for identifying Imp over the \(w_{i}\) and \(w_{j}\), such that weight is adjusted. Considering the \(Tf \left( t \right)\) and \(NTf\left( t \right)^{ - 1}\), the \(y\left( t \right)\) varies for the post-session. Based on the classification of trivial and non-trivial features (i.e.) \(x\) or \(y\), online English learning is observed to improve the consecutive session. The variable \(Imp\) denotes the impact of the past session suggestion used in both the analysis of session reachability and students’ understandability. Now, the normalized classifier learning output based on consecutive \({EON}^{SS}(DA)\) is defined as in Eq. (7)

$$\left. {\begin{array}{*{20}c} {EON^{SS} \left( {Tf\left( t \right)} \right) = \frac{{2^{{\frac{{{\text{Im}} p}}{t}}} \left( {\left( {C \times t} \right) - 2^{{{\text{Im}} p}} } \right)}}{{t^{2} }} \times \left( {\omega_{i} - \omega_{j} } \right)} \\ {{\text{and}}} \\ {EON^{SS} \left( {NTf(t)^{ - 1} } \right) = 2^{{\frac{{{\text{Im}} p}}{t}}} \left( \begin{gathered} \mathop \int \nolimits_{y}^{x} \frac{{\omega_{i} \left( {\left( {C \times t} \right) - 2^{{{\text{Im}} p}} } \right)}}{xy.t}dt - \hfill \\ \mathop \int \nolimits_{ - y}^{ - x} \frac{{\omega_{j} \left( {\left( {C \times t} \right) - 2^{{{\text{Im}} p}} } \right)}}{xy.t}dt \hfill \\ \end{gathered} \right)} \\ \end{array} } \right\},$$
(7)

where the normalized impact-less \({EON}^{SS}\left(Tf\left(t\right)\right)\) and \(EON^{SS} \left( {NTf(t)^{ - 1} } \right)\) are observed after classifying the learning experience. From \(EON^{SS} \left( {Tf\left( t \right)} \right)\) and \(EON^{SS} \left( {NTf(t)^{ - 1} } \right)\), two features, namely experience and improvement, are extracted for the impacting feature analysis. Equations (8) and (9) are used to compute the student’s learning experience \(\left( {E_{p} } \right)\) and improvement \(\left( {IM_{p} } \right)\)

$$E_{p} = \frac{1}{2\pi C}\mathop \sum \limits_{xy = 1}^{t} \left( {x\left( t \right) + y\left( t \right)} \right) - {\text{Im}} p,\quad \forall j = i + 1,\quad i \in {\text{Im}} p$$
(8)
Fig. 4 Learning process illustrations

and

$$IM_{p} = \sum\limits_{{i = \Delta_{l} }}^{{\Delta_{h} }} {\log E_{{p_{i} }} \left( {EON^{SS} } \right)} ,$$
(9)

where \(N\) is the normal student experience observed in online English teaching sessions. The variables \(\Delta_{h}\) and \(\Delta_{l}\) denote the high and low session reachability, understandability, and suggestion in experience observed at any time interval. The log normalization of the students’ experience analysis output improves student learning \(Tf\left(t\right)\). The log-based analysis is represented as follows:

$$IM_{p} \left( {Tf\left( t \right)} \right) = \log \left( {\frac{{\Delta_{h} - \Delta_{l} }}{t}} \right)*EON^{SS} \left( {E_{p} } \right).$$
(10)

In Eq. (10), log normalization is performed to identify the impact on the learning experience; \(EON^{SS} \left( {Tf\left( t \right)} \right)\) and \(E_{p}\) are independently analyzed over different time intervals and sessions. The classification is performed based on \(E_{p}\) and \({IM}_{p}\left(Tf\left(t\right)\right)\) using classifier tree learning. This classification process helps to segregate the trivial and non-trivial features for all low-level to high-level data analysis. The learning process is illustrated in Fig. 4.
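Equations (8) and (10) can be sketched numerically as follows. The sketch assumes that \(Imp\) is subtracted outside the sum in Eq. (8), that the logarithm is natural (the source does not state the base), and that \(EON^{SS}(E_{p})\) is supplied as a precomputed number; the function names and sample values are hypothetical.

```python
import math


def learning_experience(x_vals, y_vals, c, imp):
    """Eq. (8): E_p = (1 / (2*pi*C)) * sum(x(t) + y(t)) - Imp."""
    total = sum(x + y for x, y in zip(x_vals, y_vals))
    return total / (2 * math.pi * c) - imp


def improvement(delta_h, delta_l, t, eon_ep):
    """Eq. (10): IM_p(Tf(t)) = log((delta_h - delta_l) / t) * EON^SS(E_p)."""
    ratio = (delta_h - delta_l) / t
    if ratio <= 0:
        raise ValueError("log argument must be positive")
    return math.log(ratio) * eon_ep


# Hypothetical per-interval plane values and session parameters.
e_p = learning_experience([4.0, 3.0], [1.0, 2.0], c=3, imp=0.1)
im_p = improvement(delta_h=5.0, delta_l=1.0, t=2.0, eon_ep=1.5)
print(e_p, im_p)
```

The guard on the log argument makes explicit that Eq. (10) is only defined when \(\Delta_{h} > \Delta_{l}\) for positive \(t\).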

The classification is performed for \(DA\forall x\left( t \right)\) and \(SS\forall y\left( t \right)\). The first outcome is the \(E_{p} \forall w_{ij}\) and the second is the \(Tf(t)\) extraction \(\forall y\). The \(Imp\) over the available and unavailable \(y\) is measured depending on the \(E_{p}\) gained. Therefore, the successive classification relies on \(E_{p}\) through \(w_{i + 1}\) and \({w}_{j+1}\) until \(x\left( t \right) > y\left( t \right)\forall DA\) is achieved. In the alternating iteration, if \(E_{p}\) is not matched with \(y\), then \(Tf (t)/NTf{\left(t\right)}^{-1}\) is classified. It significantly identifies the \(Imp\) (adverse/continual) over available \(DA\) and \(SS\) (refer to Fig. 4). In this classifier learning, the available features of each student are independently analyzed at each online English teaching session, followed by impacting feature analysis. The observed input from the active online English teaching session and the session initialization time are determined for \(E_{p}\) and \(IM_{p} \left( {Tf\left( t \right)} \right)\); the computation is given as

$$\left. {\begin{array}{*{20}c} {CL\left( {E_{p} ,IM_{p} \left( {Tf\left( t \right)} \right)} \right) = \sum\nolimits_{i = 1}^{x} {t_{i} } + \sum\nolimits_{j = 1}^{y} {t_{i} } - \sum\nolimits_{i = 1}^{x} {\sum\nolimits_{j = 1}^{y} {t_{i} \left( {\Delta_{h} - \Delta_{l} } \right)_{i} } } } \\ {\begin{array}{*{20}c} {{\text{and}}} \\ {SIN\left( {E_{p} ,t} \right) = \frac{{{\text{Im}} p^{t} *CL\left( {E_{p} ,IM_{p} \left( {Tf\left( t \right)} \right)} \right)}}{{\mathop \sum \nolimits_{xy = 1}^{C \times t} {\text{Im}} p^{i} *CL\left( {E_{p} ,IM_{p} \left( {Tf\left( t \right)} \right)} \right)_{i} }}} \\ \end{array} } \\ \end{array} } \right\}.$$
(11)

In Eq. (11), \(CL\left( . \right)\) denotes the classifier learning operation for \(E_{p}\), and \(SIN\left(.\right)\) is the session initialization at \(t\). Similarly, the initial session time of online English teaching and suggestion is given in Eq. (12)
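The second part of Eq. (11) has the form of a normalized share: the \(Imp\)-weighted classifier output at interval \(t\) divided by the sum of the weighted outputs over all intervals. The sketch below assumes that the per-interval values \(CL_{i}\) are already available as numbers and that the sum runs over the listed intervals; the function name and sample values are hypothetical.

```python
def session_initialization(imp, cl_values, t):
    """Share of interval t in the Imp-weighted sum of CL values (Eq. 11, second part).

    cl_values[i - 1] holds the classifier-learning output CL_i for interval i = 1..n.
    """
    weighted = [imp ** i * cl for i, cl in enumerate(cl_values, start=1)]
    denom = sum(weighted)
    if denom == 0:
        raise ValueError("weighted sum of CL values is zero")
    return weighted[t - 1] / denom


# Hypothetical classifier outputs for three intervals with Imp = 2.
shares = [session_initialization(2.0, [1.0, 1.0, 1.0], t) for t in (1, 2, 3)]
print(shares)
```

With positive inputs the shares sum to one, so later intervals receive a larger share whenever \(Imp > 1\).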

$$CL\left( {E_{p} ,IM_{p} \left( {Tf\left( t \right)} \right),IM_{p} } \right) = \left\{ {\begin{array}{*{20}c} {\sum\nolimits_{i = 1}^{xy} {SS_{{t_{i} }} \Delta_{i} \frac{1}{{\omega_{i} }},\quad {\text{if}}\quad x\left( t \right) + y\left( t \right) \in \left( {0,\infty } \right)} } \\ {\sum\nolimits_{j = 1}^{xy} {t_{j} \Delta_{j} \omega_{j} ,\quad {\text{if}}\quad x\left( t \right) + y\left( t \right) \notin \left( {0,\infty } \right)} } \\ \end{array} } \right.,$$
(12)

such that

$$SIN\left( {IM_{p} \left( {Tf\left( t \right)} \right)} \right) = \frac{{x^{2} + y^{2} - {\text{Im}} p^{{ - CL\left( {E_{p} ,IM_{p} \left( {Tf\left( t \right)} \right)} \right)}} }}{{\mathop \sum \nolimits_{i = 1}^{t} {\text{Im}} p^{{ - CL\left( {E_{p} ,IM_{p} \left( {Tf\left( t \right)} \right)} \right)_{i} }} }}.$$
(13)

In Eq. (13), the initial online session time and classifier learning output are processed to be analyzed for both \(x\) and \(y\). This classification-based analysis helps to distinguish the student learning experience based on teaching and session \(t\) to identify possible classification instances. The consecutive data analysis relies on gathering sequences \(x\) and \(y\), such that successful online English teaching is achieved in all the sessions. The linear output of \(x\) and \(y\) is the classifying data analytics for maximizing \(\left( {C \times t} \right)\). The past session’s suggestion-based mediate output \(({PS}^{Out})\) and classifier learning final output \(({F}^{Out})\) are crucial to successful sessions. Based on the dataset described in Sect. 3.1, the initial and final suggestions/sessions are tabulated in Table 1.

Table 1 Initial and final suggestions/session

The suggestions in the \(CL\) and final outputs are required for the joint identification of \(w_{i}\) and \(w_{j}\). Considering the classification across \(E_{p}\) and \({\left(t\right)}^{-1}\) as presented in Fig. 4, the first classification is high if \(w_{i}\) is above the negative marks; similarly, it is high if \(w_{j}\) exceeds the last observed output and eventually increases it. Therefore, the number of classifications required gradually increases. The available \(DA\) and \(SS\) are discarded depending on their triviality. Therefore, the suggestions for the new sessions are modified for their high \(Imp\) (refer to Table 1).

3.3 Classifier Learning Process

Classifier learning analyzes students’ online English learning experience and improvement. Both factors vary based on the conditions \(Imp \ne 0\), \({EON}^{SS}\left(NTf(t{)}^{-1}\right)=(xy-Imp)Tf\left(t\right)\), and \(SIN\left(.\right)\). If the online English session is available in the allocated time, the output is 1; otherwise, it is \(0\). The mediate output in the first online session \(C \in t\) generates a high learning experience and improvement in students, whereas \(\left( {C,{\text{Im}} p} \right) \in t\) extracts the features of \(x\) and \(y\) from the classification process with \(Imp \ne 0\). Equations (14a) and (14b) evaluate the mediate and final outputs for achieving two successive sessions. This computation satisfies both the session reachability and student understandability assessments with \(SIN = 1\) or \(SIN = 0\) in the \(t\) interval. Therefore, the outputs are required for the entire time interval \(t\) allocated to the online English teaching session. In the above classification process, \(Imp\) serves as an input for the experience analysis, and the output after the detection of \(Imp\) in \(C \in t\) is given as

$$\left. {\begin{array}{*{20}c} {PS^{{Out^{1} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{1} t_{1} + x_{1} y_{1} } \\ {PS^{{Out^{2} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{2} t_{2} - {\text{Im}} p_{1} + x_{2} y_{2} } \\ {PS^{{Out^{3} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{3} t_{3} - {\text{Im}} p_{2} + x_{3} y_{3} } \\ \vdots \\ {PS^{{Out^{t} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{t} t_{t} - {\text{Im}} p_{t - 1} + x_{t} y_{t} } \\ \end{array} } \right\}$$
(14a)
$$\left. \begin{gathered} \begin{array}{*{20}c} {F^{{Out^{1} }} = PS^{{Out^{1} }} } \\ {F^{{Out^{2} }} = PS^{{Out^{2} }} - {\text{Im}} p_{1} } \\ {F^{{Out^{3} }} = PS^{{Out^{3} }} - {\text{Im}} p_{2} } \\ \vdots \\ {F^{{Out^{t} }} = PS^{{Out^{t} }} - {\text{Im}} p_{t} } \\ \end{array} \hfill \\ \left| \begin{gathered} F^{{Out^{1} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{1} t_{1} + Tf\left( t \right)_{1} SIN_{1} \hfill \\ F^{{Out^{2} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{2} t_{2} + Tf\left( t \right)_{2} SIN_{2} - \left( {xy} \right)^{1} {\text{Im}} p_{1} \hfill \\ F^{{Out^{3} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{3} t_{3} + Tf\left( t \right)_{3} SIN_{3} - \left( {xy} \right)^{2} {\text{Im}} p_{3} \hfill \\ F^{{Out^{t} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{t} t_{t} + Tf\left( t \right)_{t} SIN_{t} - \left( {xy} \right)^{t} {\text{Im}} p_{t - 1} \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} \right\}.$$
(14b)
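
To make the recursion in Eqs. (14a) and (14b) concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It treats \(EON^{SS}(NTf(t)^{-1})_{k}\) as a precomputed scalar per session (the hypothetical argument `eon`), follows only the first block of Eq. (14b), and assumes that every session \(k \ge 2\) subtracts the previous session's \(Imp_{k-1}\), as in the second and third rows of Eq. (14a). All function and argument names are illustrative.

```python
from typing import List, Sequence, Tuple


def mediate_and_final_outputs(
    eon: Sequence[float],  # assumed precomputed EON^SS(NTf(t)^-1)_k per session
    t: Sequence[float],    # allocated time t_k per session
    x: Sequence[float],    # x_k per session
    y: Sequence[float],    # y_k per session
    imp: Sequence[float],  # impact Imp_k per session
) -> Tuple[List[float], List[float]]:
    """Return (PS_out, F_out) per session, following Eq. (14a) and the
    first block of Eq. (14b). Index 0 corresponds to session 1."""
    if not (len(eon) == len(t) == len(x) == len(y) == len(imp)):
        raise ValueError("all inputs must have one entry per session")

    ps_out: List[float] = []
    f_out: List[float] = []
    for k in range(len(eon)):
        base = eon[k] * t[k] + x[k] * y[k]
        if k == 0:
            # First session: no earlier impact is available.
            ps = base
            f = ps
        else:
            # Later sessions: subtract the impact of the previous session
            # (assumed index pattern, see the lead-in).
            ps = base - imp[k - 1]
            f = ps - imp[k - 1]
        ps_out.append(ps)
        f_out.append(f)
    return ps_out, f_out
```

Under this reading, the previous session's impact is removed once in the mediate output and once more in the final output, so a session that follows a high-impact session starts from a lower final output.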

As per the above equations, the linear output is computed from the past session suggestion and the classifier learning output. If \(Imp = 0\), then \(NTf(t)^{ - 1} = Tf\left( t \right)\), and hence, \(F^{{Out^{t} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{t} t_{t} + Tf\left( t \right)_{t} SIN_{t} - \left( {xy} \right)^{t} Imp_{t - 1}\) is the reliable output for two successive sessions, \(SUCC_{s} = 1\). Therefore, the previous non-trivial solutions for revamping a new online English teaching session are retained regardless of the training level \(1\). The big data store all the information observed from the students at the time of the English teaching session to provide reliable teaching, and the data analytics determines the manifold data extracted before and after the session using the learning process. Instead, for the \(\left( {C,{\text{Im}} p} \right) \in t\)-based case, the mediate and final outputs are estimated as in Eqs. (15a) and (15b)

$$\left. {\begin{array}{*{20}c} {PS^{{Out^{1} }} = \left( {NTf(t)^{ - 1} } \right)_{1} } \\ {PS^{{Out^{2} }} = \left( {NTf(t)^{ - 1} } \right)_{2} - SIN_{1} *{\text{Im}} p_{1} - xy_{1} } \\ {PS^{{Out^{3} }} = \left( {NTf(t)^{ - 1} } \right)_{3} - SIN_{2} *{\text{Im}} p_{2} - xy_{2} } \\ \vdots \\ {PS^{{Out^{t - 1} }} = \left( {NTf(t)^{ - 1} } \right)_{t - 1} - SIN_{t - 1} *{\text{Im}} p_{t - 1} - xy_{t - 1} } \\ \end{array} } \right\}$$
(15a)
$$\left. {\begin{array}{*{20}c} {F^{{Out^{1} }} = PS^{{Out^{1} }} = \left( {NTf(t)^{ - 1} } \right)_{1} } \\ \begin{gathered} F^{{Out^{2} }} = PS^{{Out^{2} }} + t_{1} - \left( {NTf(t)^{ - 1} } \right)_{1} = \hfill \\ Tf\left( t \right)_{1} SIN_{1} *{\text{Im}} p_{1} - xy_{1} + t_{1} - {\text{Im}} p_{1} \hfill \\ \end{gathered} \\ \begin{gathered} F^{{Out^{3} }} = PS^{{Out^{3} }} + t_{2} - \left( {NTf(t)^{ - 1} } \right)_{2} = \hfill \\ Tf\left( t \right)_{2} - SIN_{2} *{\text{Im}} p_{2} - xy_{2} + t_{2} - {\text{Im}} p_{2} \hfill \\ \end{gathered} \\ \vdots \\ \begin{gathered} F^{{Out^{t} }} = PS^{{Out^{t - 1} }} + t_{t - 1} - \left( {NTf(t)^{ - 1} } \right)_{t - 1} = \hfill \\ Tf\left( t \right)_{t - 1} - SIN_{t - 1} *{\text{Im}} p_{t - 1} - xy_{t - 1} + t_{t - 1} - {\text{Im}} p_{t - 1} \hfill \\ \end{gathered} \\ \end{array} } \right\}.$$
(15b)
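
The \(\left( {C,{\text{Im}} p} \right) \in t\) case in Eqs. (15a) and (15b) can be sketched in the same way. This is a hedged illustration rather than the authors' code: it assumes \(\left(NTf(t)^{-1}\right)_{k}\) is available as a scalar per session (hypothetical argument `ntf`), and it implements only the left-hand forms of the second and third rows of Eq. (15b), \(F^{Out^{k}} = PS^{Out^{k}} + t_{k-1} - \left(NTf(t)^{-1}\right)_{k-1}\). All names are illustrative.

```python
from typing import List, Sequence, Tuple


def impact_based_outputs(
    ntf: Sequence[float],  # assumed scalar (NTf(t)^-1)_k per session
    sin: Sequence[int],    # session availability SIN_k, 0 or 1
    imp: Sequence[float],  # impact Imp_k per session
    xy: Sequence[float],   # combined feature term xy_k per session
    t: Sequence[float],    # allocated time t_k per session
) -> Tuple[List[float], List[float]]:
    """Return (PS_out, F_out) per session for the (C, Imp) case.
    Index 0 corresponds to session 1."""
    if not (len(ntf) == len(sin) == len(imp) == len(xy) == len(t)):
        raise ValueError("all inputs must have one entry per session")

    ps_out: List[float] = []
    f_out: List[float] = []
    for k in range(len(ntf)):
        if k == 0:
            # Eq. (15a), first row: the mediate output is (NTf(t)^-1)_1.
            ps = ntf[0]
            f = ps
        else:
            # Eq. (15a): subtract the previous session's SIN * Imp and xy.
            ps = ntf[k] - sin[k - 1] * imp[k - 1] - xy[k - 1]
            # Eq. (15b), left-hand form: add t_{k-1}, remove (NTf(t)^-1)_{k-1}.
            f = ps + t[k - 1] - ntf[k - 1]
        ps_out.append(ps)
        f_out.append(f)
    return ps_out, f_out
```

When \(SIN_{k-1} = 0\), the impact term drops out of the mediate output, so only \(xy_{k-1}\) is subtracted.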

The outputs of Eqs. (15a) and (15b) are obtained by verifying the learning experience condition \(EON^{SS} \left( {NTf(t)^{ - 1} } \right) = \left( {xy - Imp} \right)Tf\left( t \right)\) and \(SIN = 1\) or \(SIN = 0\) in a step-by-step manner to identify the impact of each student on English learning. If \(SIN = 0\), then \(F^{{Out^{t} }} = Tf\left( t \right)_{t - 1} - SIN_{t - 1} *Imp_{t - 1} - xy_{t - 1} + t_{t - 1} - Imp_{t - 1}\) is the final classifier learning output, whereas if \(SIN = 1\), then \(Imp = 0\) and hence the learning output is \(F^{{Out^{t} }} = PS^{{Out^{t - 1} }} + t_{t - 1} - \left( {NTf(t)^{ - 1} } \right)_{t - 1}\). Therefore, if \(C \in t\), then \(F^{Out} = \left( {NTf(t)^{ - 1} } \right) + 1\) is the non-trivial solution and \(F^{{Out^{t} }} = PS^{{Out^{t - 1} }} + t_{t - 1} - \left( {NTf(t)^{ - 1} } \right)_{t - 1}\) represents the output for the set of extracted features. From this non-trivial solution, \(SUCC_{s} = \left( {\frac{SIN\left( 1 \right) - xy - Imp}{t}} \right)\) is the revamping value, and the training is updated with all the online English teaching sessions using \(PS^{Out}\) and \(F^{Out}\) as in Eqs. (15a) and (15b). This condition is not applicable to the first English teaching session assessment as per Eqs. (14a) and (14b), because the analysis depends on individual student understandability and session reachability over time. The revamping process through the classification is illustrated in Fig. 5.

Fig. 5 Classification for revamping

The revamping process is classified and extracted for \(Imp_{t} \forall \left( {x,y} \right)\), and therefore, two sequences \(\left( t \right)\) and \(\left( {t^{ - 1} } \right)\) are identified. Based on the recurrent process, new suggestions are then produced \(\forall F^{out}\) and \(PS^{out}\). In this process, \(P^{out}\) is extracted from \(PS^{out}\) \(\forall\) revamping initializations. The multiple combinations of \(xy \forall \left( {1\;{\text{to}}\;t} \right)\) and \(\left( {1\;{\text{to}}\;t - 1} \right)\) are required to prevent unnecessary discarding. Therefore, \(Tf\left( t \right)\) and \(NTf\left( t \right)^{ - 1}\) are required for improving the solutions during training (refer to Fig. 5). Therefore, \(SUCC_{s}\), along with \(EON^{SS} \left( {NTf(t)^{ - 1} } \right)\) and \(t\), is observed using the big data, and hence, the learning experience can change at any time. In the following data analysis sequence, \(SUCC_{s}\), based on its past session suggestion, determines the improvements in an online session. If \(Imp > xy\) is observed in the sequence, the extracted features are discarded to prevent a trivial impact on session reachability and student understanding. The classifier learning generates additional training for students with a weaker learning experience to ensure that appropriate tests/exams are conducted to address the impacting feature analysis. The data accumulated from the online English teaching platform and students are analyzed using classifier tree learning depending on English language learning and assessment in each session, preventing impacts from gathering trivial features, whereas the successive session is high. The controlled trivial feature extraction ensures impact-less data analysis within the learning analysis scenario. However, the chance for experience and improvement modification in the learning session is improved, and therefore, the successive session is achieved.
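
The revamping value and the discard rule described above can be illustrated with a short Python sketch. It is a literal reading of the text under stated assumptions, not the authors' implementation: \(SIN(1)\) is taken to be the 0/1 session indicator, \(SUCC_{s}\) is computed as \((SIN - xy - Imp)/t\), and a feature is discarded exactly when \(Imp > xy\). The `Feature` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for one extracted feature."""
    name: str
    xy: float   # combined x, y term for the feature
    imp: float  # impact Imp observed for the feature


def revamping_value(sin: int, xy: float, imp: float, t: float) -> float:
    """SUCC_s = (SIN - xy - Imp) / t, read literally from the text."""
    if t == 0:
        raise ValueError("t must be non-zero")
    return (sin - xy - imp) / t


def retain_features(features: List[Feature]) -> List[Feature]:
    """Apply the discard rule: drop a feature when Imp > xy."""
    return [f for f in features if not f.imp > f.xy]
```

A feature with \(Imp = xy\) is retained under this reading, since the text states the discard condition as a strict inequality.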
The following steps outline the proposed MLDA methodology:

1. Data Collection and Preparation: Collect and organize information on online English lessons, such as accessibility, student comprehension, and suggestions.
2. Consecutive Session Feature Analysis: Using the learning process, analyze the session’s reachability, comprehension by students, and suggestion accuracy before and after each session.
3. Classifier Tree Learning: Training a classifier tree to recognize influences on student learning allows you to quickly and accurately categorize characteristics from large datasets.
4. Influence Identification and Examination: Analyze the results of the classifier’s learning to determine the effects on session accessibility and student comprehension, and keep an eye on the elements influencing classification to learn from and improve upon them.
5. Sequential Impact Scrutiny: Analyze the effects of different session ideas on the quality of the learning experience for the students and the intermediate and final results using sequence analysis of the data.
6. Learning Experience Assessment: Normalize impact-free and impact-bearing activities and evaluate students’ learning experiences utilizing computed outputs.
7. Upgrading Analysis: Analyze the extent to which the learner has progressed and standardize the resulting data. Consider the effect that students have on their English proficiency.
8. Revamping and Training: Improve the training process by including non-trivial solutions and utilizing extracted features and classification output.
9. Data Accumulation and Influence Mitigation: Accumulate data from the online English education platform and learners applying classifier tree learning to regulate the extraction of trivial features and mitigate potential effects.
10. Successive Session Achieved: Maximize the value of your experience by adjusting your sessions.
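
As a rough illustration of the classifier tree learning step, the sketch below fits a depth-one tree (a decision stump) that separates trivial from non-trivial features using a single impact score. This is a deliberately simplified stand-in: the paper does not specify the tree structure, the split criterion, or the input features, so the threshold search, the 0/1 labelling (1 for non-trivial), and all names here are assumptions.

```python
from typing import List, Sequence


def fit_stump(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the threshold that minimizes misclassifications for the rule
    'predict non-trivial (1) when score > threshold'."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and aligned")

    distinct = sorted(set(scores))
    # Candidate thresholds: below the minimum, midpoints between distinct
    # neighbouring scores, and the maximum (which predicts all trivial).
    candidates = [distinct[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    candidates.append(distinct[-1])

    best_threshold = candidates[0]
    best_errors = len(scores) + 1
    for threshold in candidates:
        errors = sum(
            (score > threshold) != bool(label)
            for score, label in zip(scores, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(scores: Sequence[float], threshold: float) -> List[int]:
    """Label each score as non-trivial (1) or trivial (0)."""
    return [1 if score > threshold else 0 for score in scores]
```

A full classifier tree would apply such splits recursively over several features; the stump only shows how one split can partition features before the discard and revamping steps.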

The proposed MLDA approach uses classifier tree training and multivariate data analysis to isolate and manage relevant influences on the efficacy of English language instruction delivered via digital mediums. The model’s sequential analysis boosts session accessibility, student comprehension, and suggestion accuracy, contributing to better educational outcomes.

4 Results and Discussion

Based on the available words and their “parts of speech,” four levels under 36 sessions are classified. For ease of representation, the initialization (for a session) is represented as in Fig. 6.

Fig. 6 Representation of session initialization

The data source is utilized for a new session in a level-by-level manner for performance assessment. The performance outcome is measured by identifying \(DA\) and \(SS\). The flaws in session instigation are prevented by upgrading the application/providing ease of access and grade assessments. Contrarily, the understandability portrays the required improvements by identifying the \(Imp \forall E_{p}\). From the available data, the \(Imp\) of \(Tf\left( t \right)\) and \(NTf\left( t \right)^{ - 1}\) over the performance is analyzed in Fig. 7.

Fig. 7 \(Tf\left( t \right)\) and \(NTf \left( t \right)^{ - 1}\) analysis

\(Tf\left( t \right)\) is comparatively less than \(NTf \left( t \right)^{ - 1}\) for both \(w_{i}\) and \(w_{j}\) across multiple \(t\). Depending upon \(xy_{t}\) and \(xy_{t - 1}\), the revamping for \(PS^{out}\) is performed, and \(w_{i}\) and \(w_{j}\) are either incremented or decremented. Considering the changes in \(F^{out}\) instead of \(PS^{out}\), the initialization for the consecutive session is performed. Therefore, \(NTf\left( t \right)^{ - 1}\) is significant for improving the learning assistance over \(t\) (Fig. 7). Now, the \(Imp\) and \(E_{p}\) for \(w_{i}\) and \(w_{j}\) are presented in Fig. 8.

Fig. 8 \(Imp\) and \(E_{p}\) analyses

\(w_{i}\) shows high variation for \(Imp\) and \(E_{p} \forall t\). This is due to the \(xy\) combination being split into \(\left( {1\;{\text{to}}\;t} \right)\) and \(\left( {1\;{\text{to}}\;t - 1} \right)\) across \(CL \left( . \right)\). The classifier learning segregates \(SIN\)-based \(PS^{out}\) and \(F^{out} \forall \left( {c,\;Imp} \right)\), such that the session improvements are made. Depending on the available \(w_{j}\) configuration, the next \(CL\left( . \right)\) is planned for a new session for better improvements (Fig. 8). The comparative analysis is presented in the following section using the metrics classifications, data extraction, feature discard, initialization time, and classification time. The input is analyzed for 36 sessions and 14 data instances per session. The methods SLSE [33], CCEPTS [31], and TA-AL [28] from the related works section are considered for comparison.

4.1 Classification

In Fig. 9, the impacting features are detected from the online English teaching sessions using manifold data extraction to improve the classification accuracy based on session reachability and students’ understanding. The past session suggestions are classified as trivial or non-trivial impacts using classifier tree learning and the learning experience computed in different intervals. The manifold data observed before and after the session are extracted based on the consecutive session processing to identify the impact. Classifier learning is used for analyzing individual student learning experiences and improvement in online English teaching sessions. Both factors vary based on the conditions \(Imp \ne 0\), \(EON^{SS} \left( {NTf(t)^{ - 1} } \right) = \left( {xy - Imp} \right)Tf\left( t \right)\), and \(SIN\left( . \right)\). This condition is verified between two successive sessions based on accumulated data from the sessions to prevent impact. Therefore, the impacts detected before and after the session are reduced, leading to high classification due to changes in the learning experience.

Fig. 9 Classifications

4.2 Data Extraction

The trivial and non-trivial features observed from the extracted data are analyzed to identify impacts on English language learning and assessment, as the first input for computing each student’s learning capacity (Fig. 10). The learning capacity is analyzed based on session reachability, student understandability, and suggestions, which are observed and monitored in English teaching sessions with audio and visual representation. The proposed model achieves high manifold data extraction by classifying the trivial and non-trivial features in the session at any instance.

Fig. 10 Data extraction

This estimation satisfies the session reachability and student understandability assessments with \(SIN = 1\) or \(SIN = 0\) in the interval \(t\), and it is processed until the learning experience is improved. Therefore, the outputs are obtained from the time interval \(t\) allocated to the online English teaching session. Hence, changes in data analytics are identified to maximize the session initialization time for training the previous suggestion through classification-based analysis and to achieve high data extraction.

4.3 Feature Discard

The identification of impacts in online learning, based on learning analysis in big data for college English language teaching and using the non-trivial inputs for extracting manifold data before and after the session, is represented in Fig. 11. This observation is analyzed using the condition \(Imp = 0\), for which \(F^{{Out^{t} }} = EON^{SS} \left( {NTf(t)^{ - 1} } \right)_{t} t_{t} + Tf\left( t \right)_{t} SIN_{t} - \left( {xy} \right)^{t} Imp_{t - 1}\) is the reliable output between two successive sessions, \(SUCC_{s} = 1\). If the addressed impact is nil or small, the extracted feature from the online session is discarded. After that, the classification process is applied for impacting feature analysis, from high-level to low-level data analytics, wherein the learning experience of different sessions is analyzed using Eqs. (14a), (14b), (15a), and (15b). Based on this classification of impacting feature analysis in online English teaching sessions, the feature discard is less than the other factors.

Fig. 11 Feature discard

4.4 Initialization Time

In the proposed manifold learning data analytics for online English teaching sessions, the above data analytics reduces the session initialization time by changing trivial features to non-trivial features to retain the previous training. The impacts identified from a past session are addressed to improve college students’ online English learning experience through classifier tree learning for linear computation. The impacting feature identified before and after the session, \(\left( {C,Imp} \right) \in t\), is verified for satisfying both the mediate and final outputs. The session initialization time is computed using the learning process, which identifies trivial and non-trivial features from the extracted data in online sessions; the learning experience impacts the training instance for improving consecutive sessions. The proposed model analyzes both the platform and students through learning for impact detection, in which the non-trivial solution achieves less initialization time, as presented in Fig. 12.

Fig. 12 Initialization time

4.5 Classification Time

Online English teaching for college students using data analytics is performed through manifold data extracted before and after the English teaching session, depending upon trivial and non-trivial feature classification instances in different intervals, as represented in Fig. 13. In this model, the impact on the learning experience is identified for analyzing individual students’ understandability and improvement based on data analytics, such that \(PS^{Out}\) and \(F^{Out}\) are computed between two successive sessions. The following feature extraction instance relies on data analysis; \(SUCC_{s}\), based on its past session, determines the improvements in online English sessions. If the impact identified in the current session satisfies \(Imp > xy\), the extracted features are discarded to prevent trivial-based impact based on the session reachability and student understanding computation. Therefore, the impact is less than other factors in online English learning. This model’s classification time is lower owing to consecutive session processing. Tables 2 and 3 summarize the above comparative analysis discussion.

Fig. 13 Classification time

Table 2 Summary of # sessions
Table 3 Summary of # data instances (/session)

Findings: The proposed model improves classifications and data extraction ratio by 12.82% and 9.12%, respectively. The feature discard, initialization time, and classification time are reduced by 12.27%, 12.31%, and 11.69%, respectively.


Implications: Reduced feature discard indicates that MLDA can efficiently keep pertinent features for analysis, minimizing information loss. Lower startup and classification durations suggest that the MLDA may quickly process and analyze data, enabling real-time modifications during lessons. Less time is needed for initialization and classification, which helps educators and institutions respond to students’ needs more quickly and enhances online learning quality.


Findings: The proposed model improves classifications and data extraction ratio by 14.74% and 8.73%, respectively. The feature discard, initialization time, and classification time are reduced by 11.84%, 10.57%, and 13.17%, respectively.


Implications: The increased classification accuracy raises the possibility that MLDA will be more adept at spotting essential components in online English instruction sessions. Improved data extraction suggests that MLDA can extract pertinent data from sessions, resulting in more thorough analysis and improved decision-making. The adoption of MLDA by educators and organizations could improve the caliber of online English instruction sessions.

The approach assesses impact detection, session recommendations, clarity, relevance, and efficiency and then pinpoints where enhancements might occur. It employs classifier tree learning to categorize and examine influential characteristics, emphasizing data extraction, feature discard, initialization time, and classification time. A new model called Manifold Learning Data Analytics (MLDA) has been developed to make online English lessons more efficient and successful. The model takes a fresh approach to the problems and restrictions of the prior literature by emphasizing correlated analysis, data usage, and feature recognition in real time. This novel method expands existing literature and provides a targeted strategy to improve online English classes.

5 Conclusion

The Manifold Learning Data Analytics Model (MLDAM) introduced in this study represents a significant advancement in online English language instruction. By utilizing classifier tree learning to differentiate between trivial and non-trivial elements, the model substantially improves the precision of impact evaluations, enabling targeted enhancements in subsequent lessons. The iterative training process, incorporating manifold data extraction based on student performance and feedback, promotes continuous improvement of teaching strategies. The findings demonstrate notable improvements in classification accuracy, data extraction ratio, and efficiency metrics. These results underscore MLDAM’s capacity to optimize data utilization and create a more responsive, adaptable learning environment. The model’s ability to provide personalized educational experiences tailored to individual student needs marks a crucial step toward more effective online language learning. Future research should focus on assessing MLDAM’s long-term sustainability and effectiveness across diverse educational settings and extended time periods. This will provide deeper insights into its potential and guide its optimal integration into virtual classrooms. Additionally, exploring the model’s applicability to other language learning contexts and its potential for integration with emerging educational technologies could further expand its impact. Investigating the model’s role in supporting learner autonomy and motivation in online environments also presents an exciting avenue for future study.