
1 Introduction

Artificial intelligence is advancing rapidly and is becoming part of everyday life. The components of artificial intelligence that matter most for building a successful chatbot are natural language processing (NLP) and machine learning (ML) (Satheesh et al., 2020; Ranavare & Kamath, 2020; Lalwani et al., 2018).

According to Caldarini et al. (2022), a chatbot takes natural language text as input and is expected to produce output that is relevant to the user’s input. Natural language processing enables the machine to understand the user’s request, while machine learning allows the chatbot to keep learning from past data as it answers more questions, improving its performance without being explicitly programmed (Samuel, 1967). Chatbots now take on multiple roles in education, such as teaching, advising, and providing real-time 24/7 support to students and parents (Abdelhamid et al., 2020). Moreover, Assayed et al. (2023a, b) developed a chatbot classifier for identifying students’ emotions during the Covid-19 pandemic. High school is one of the most important stages in students’ lives: at this stage, students select the academic streams and advanced courses that can shape their future careers and lead to their next important milestone, post-secondary education. Accordingly, high school students need to be advised and guided effectively before making their final choice of college major.

During the busy periods of university applications, some students need a one-to-one adviser to guide them through their applications within a limited time, while others need to know the minimum admission test scores before submitting their applications. These students need to receive effective answers from their schools’ representatives without a long wait. This paper therefore aims to develop a novel chatbot that improves student services in high schools by transferring each student enquiry to a particular agent based on the enquiry type. This technique is called intent classification: it identifies the intent behind the student’s question and transfers the request to the assigned agent accordingly. The main role of the chatbot in this study is to use state-of-the-art machine learning algorithms together with natural language processing (NLP) to understand and identify the intention behind the student’s request and classify it into the proper class for further processing. A comparison between a machine learning model and a neural network model is conducted to identify the more accurate model. To do so, we created a new dataset of 1004 high school student queries annotated with two classes: (1) high school courses and (2) majors & universities. Each request is transferred to the correct agent based on its class. For example, if a student wants to know the deadline for a university application, the request is transferred immediately to the majors & universities agent; if a student asks to register for a new advanced course during the A-level year, the request is transferred to the high school courses agent.

This paper is organized as follows: Sect. 2 explores the related work, Sect. 3 describes the experiment and development, Sect. 4 describes the results, and Sect. 5 presents the conclusions and future work.

2 Related Works

Several researchers have studied chatbots as conversational assistants and as classifiers of the intent behind users’ input, with the aim of improving services in different industries. For instance, Henfy et al. (2020) developed an intent classifier for chat messages exchanged between software development teams. Schuurmans and Frasincar (2020) used multiple machine learning algorithms to enhance the intent classification of a dialogue system. Pérez-Vera et al. (2017) developed a chatbot classifier that answers FAQs, using 5000 tweets from the Twitter accounts of electricity companies, in order to enhance their services and customer satisfaction. Researchers have used different algorithms for intent classification. For example, Setyawan et al. (2018) compared chatbot classifiers based on the Naive Bayes model and Logistic Regression; the data in that study were collected from reports submitted by the public, and the authors then took data samples from the test dataset and determined the class recognition used as training data. Deep learning techniques have also been adopted in chatbot classifiers: Lhasiw et al. (2021) developed a chatbot classifier that uses a bidirectional LSTM model to classify a question message into five intention classes. Although high school is an essential stage in shaping students’ future careers, few researchers have studied chatbot classifiers for advising high school students. Hamzah et al. (2022) developed an educational chatbot that uses the SVM algorithm to provide instant feedback in interactive sessions with students, and Assayed et al. (2023a, b) developed a deep-learning-based chatbot for classifying students’ enquiries into two classes.

In this paper we aim to fill this gap by providing a novel chatbot classifier that uses state-of-the-art machine learning algorithms along with neural network techniques; NLP and natural language understanding (NLU) are the main components in processing the intent classification. In addition, a new corpus of high school advising questions and enquiries is created. Fig. 1 illustrates the main architecture of the chatbot.

Fig. 1. The main architecture of the classifier chatbot

3 Experiment and Development

3.1 Corpus Selection

The corpus was collected from unstructured sources such as social media, students, parents and schools’ websites. The selected data comprise 1004 new questions and enquiries. Most of the questions were asked by high school students and parents, since high school students usually have many concerns about university requirements and majors.

Data Annotation

Each question was annotated manually. If the question relates to the school’s curriculum or to advanced courses in high school, it is tagged “high school courses”; if it relates to the selection of majors and universities, it is tagged “majors & universities”. Table 1 shows the distribution of the annotated data.

Table 1. The description of the data annotation

3.2 Preprocessing the Dataset

This step prepares the dataset for the machine learning models and improves the performance of data processing. It includes several subtasks: cleaning the data, removing stop words, normalization and tokenization. Figure 2 shows the main preprocessing tasks.

Fig. 2. The main tasks of preprocessing the data.
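The preprocessing subtasks can be illustrated with a minimal sketch using only the Python standard library. The stop-word list, the regular expression and the function names `preprocess` and `vocabulary` are assumptions made for illustration; the paper does not state which tools or stop-word list were used.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real system would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "for", "of", "in", "i", "do", "what"}


def preprocess(text: str) -> list[str]:
    """Clean, normalize, tokenize and remove stop words from one enquiry."""
    text = text.lower()                       # normalization: lowercase everything
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # cleaning: replace punctuation with spaces
    tokens = text.split()                     # tokenization on whitespace
    return [tok for tok in tokens if tok not in STOP_WORDS]


def vocabulary(texts: list[str]) -> list[str]:
    """Sorted list of the unique tokens found across all preprocessed texts."""
    return sorted({tok for text in texts for tok in preprocess(text)})


queries = [
    "What is the deadline for the university application?",
    "Can I register for an advanced course in the A-level year?",
]
print(preprocess(queries[0]))
print(len(vocabulary(queries)))
```

Counting the entries returned by `vocabulary` over a whole corpus gives the kind of unique-token figure that is reported after preprocessing.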

Afterward, all tokens are normalized and identified. Figure 3 shows 1036 unique tokens from the 1004 texts after applying all the preprocessing techniques.

Fig. 3. The vector dimension after preprocessing the data.

3.3 Features Extraction

Feature extraction quantifies the features by converting unstructured text into a structured dataset so that it can be processed by the machine learning model (Liu et al., 2018). In this paper the authors applied the TfidfTransformer function to improve the performance of the model.

TfidfTransformer

This function reweights each feature according to how many texts it appears in. It reduces the impact of frequent but less informative words and enhances the impact of infrequent but more informative words (Zhao et al., 2018). Figure 4 shows an example of the TF-IDF weights computed for all 1036 unique tokens.

Fig. 4. The TFIDF features after using the TfidfTransformer.
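To make the weighting concrete, the following is a standard-library sketch of TF-IDF, not the authors’ code. It assumes raw term counts, the smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1 and L2 row normalization, which are the default settings of scikit-learn’s TfidfTransformer; the function name `tfidf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, matrix) of L2-normalized TF-IDF weights.

    Assumes idf(t) = ln((1 + n) / (1 + df(t))) + 1 with raw term counts.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    matrix = []
    for doc in docs:
        tf = Counter(doc)                                  # raw term counts
        row = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0   # guard against empty rows
        matrix.append([w / norm for w in row])
    return vocab, matrix


# Toy tokenized enquiries (illustrative only).
docs = [
    ["university", "application", "deadline"],
    ["university", "admission", "score"],
    ["advanced", "course", "registration"],
]
vocab, matrix = tfidf(docs)
weights = dict(zip(vocab, matrix[0]))
# "university" occurs in two documents, so it is weighted below "deadline".
print(weights["university"] < weights["deadline"])
```

In this toy corpus the token shared by two enquiries receives a lower weight than the tokens that occur in only one, which is the effect described above.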

3.4 Building the Model

Tools and Techniques

In this study we used Python from the Anaconda platform to write and run the code for the models and algorithms. Anaconda is an open-source Python distribution platform that comes with useful packages and libraries (“Anaconda Software Distribution,” 2020). In particular, we used JupyterLab 3.0.14 to write the Python code.

Naive Bayes Machine Learning Algorithm

Naive Bayes is a probabilistic algorithm based on Bayes’ theorem. It calculates the probability of each feature based on past results, as shown below:

figure a

Naive Bayes assumes that all features are equally important, which can lead to low performance. Therefore, in this study we applied Tfidf feature extraction to improve the accuracy of this model.
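As an illustration of how such a classifier scores an enquiry, the sketch below implements a multinomial Naive Bayes with add-one (Laplace) smoothing using only the standard library. The paper does not state which Naive Bayes variant was used, so the variant, the class name `NaiveBayes` and the toy training data are assumptions; for simplicity the sketch works on raw token counts rather than TF-IDF weights.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)        # used for the priors P(c)
        self.token_counts = defaultdict(Counter)   # per-class token counts
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.vocab = {tok for doc in docs for tok in doc}
        return self

    def predict(self, doc: list[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(c) + sum of log P(token | c); tokens never seen in training are skipped
            score = math.log(n_docs / total_docs)
            for tok in doc:
                if tok in self.vocab:
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (illustrative only, not taken from the paper's corpus).
train_docs = [
    ["university", "application", "deadline"],
    ["admission", "score", "university"],
    ["advanced", "course", "register"],
    ["course", "curriculum", "year"],
]
train_labels = [
    "majors & universities",
    "majors & universities",
    "high school courses",
    "high school courses",
]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["university", "deadline"]))
print(model.predict(["advanced", "course"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.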

Recurrent Neural Networks (RNN)-LSTM

The long short-term memory (LSTM) network is an advanced form of recurrent neural network (RNN) with an extended memory. LSTM is able to remember inputs from long texts over a long period of time by learning the order dependence in input sequences (Datta et al., 2022); the extended memory is responsible for remembering the inputs from long sentences. Because our corpus is small, we constructed a simple architecture. The model starts by connecting the input data to an embedding layer. Before feeding the input sequences to the LSTM, we used a padding function to give all sequences the same length (200). Finally, we used a sigmoid function for the output layer. Figure 5 shows the structure of this model.

Fig. 5. LSTM architecture for students’ intent classification.
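The padding step can be sketched in plain Python as follows. The paper does not state which side of a sequence is padded or truncated; this sketch assumes left padding with zeros and keeping the last items of overlong sequences, which mirrors the default behaviour of Keras’ `pad_sequences`. The function name `pad_sequence` and the token ids are hypothetical.

```python
def pad_sequence(seq: list[int], maxlen: int = 200, value: int = 0) -> list[int]:
    """Left-pad `seq` with `value`, or keep only its last `maxlen` items."""
    if len(seq) >= maxlen:
        return seq[len(seq) - maxlen:]
    return [value] * (maxlen - len(seq)) + seq


# Hypothetical token ids for one short enquiry.
padded = pad_sequence([12, 7, 305])
print(len(padded), padded[-3:])
```

After this step every enquiry has exactly 200 positions, so the sequences can be stacked into a single rectangular array before they reach the embedding layer.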

The training process of this model was improved by repeatedly tuning the hyperparameters. As a result, the model uses an embedding layer with 32 dimensions, an LSTM layer with 10 neurons, and a single output layer, as displayed in Fig. 6.

Fig. 6. The architecture of the layers in the RNN-LSTM model.
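The size of such a network can be estimated with a short calculation. The sketch below uses the standard parameter-count formulas for an embedding layer, an LSTM layer (four gates, each with input weights, recurrent weights and a bias) and a dense output layer. The vocabulary size of 1037 is an assumption (the 1036 unique tokens plus one padding index); the paper does not report the value that was actually used, so the total is illustrative only.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    """One `dim`-sized vector per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units


vocab_size = 1037  # assumption: 1036 unique tokens + 1 padding index
total = (
    embedding_params(vocab_size, 32)  # embedding layer with 32 dimensions
    + lstm_params(32, 10)             # LSTM layer with 10 neurons
    + dense_params(10, 1)             # single sigmoid output unit
)
print(lstm_params(32, 10), dense_params(10, 1), total)
```

Under these assumptions almost all trainable parameters sit in the embedding layer, which is consistent with keeping the recurrent part small for a corpus of about a thousand enquiries.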

The Adam optimizer was selected to compile the model, which was trained for 20 epochs. The model reached a high training accuracy (99%), as shown in Fig. 7.

Fig. 7. Training reaches a high accuracy level (99%) using only 20 epochs.

After that, the model is evaluated by passing the test data to the prediction function and computing the accuracy metric; accuracy is the most popular metric for classification tasks, particularly in deep learning models. The accuracy score is calculated as follows:

Accuracy = Number of correct predictions / Total number of predictions.
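The accuracy formula translates directly into code. The sketch below is a minimal standard-library version; the 0.5 threshold used to turn sigmoid outputs into class labels and the helper name `to_labels` are assumptions, since the paper does not state how the outputs were thresholded.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Number of correct predictions divided by the total number of predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def to_labels(probabilities: list[float], threshold: float = 0.5) -> list[int]:
    """Turn sigmoid outputs into 0/1 class labels (threshold is an assumption)."""
    return [int(p >= threshold) for p in probabilities]


# Toy example: five enquiries with known classes and model outputs.
y_true = [1, 0, 1, 1, 0]
y_pred = to_labels([0.91, 0.12, 0.40, 0.77, 0.08])
print(accuracy(y_true, y_pred))
```

The length check matters in practice: accuracy is only meaningful when every prediction is paired with exactly one true label.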

3.5 Testing the Model

Both models, Naive Bayes and RNN-LSTM, are tested using unseen data from students’ questions, without including the students’ intention classes. The following performance metrics are used for the Naive Bayes model: precision, recall and F1-score. For the LSTM, only the accuracy metric is used, since it is the most popular metric for classification tasks in neural network models.
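For reference, precision, recall and F1-score for one class can be computed as in the following standard-library sketch. The function name `precision_recall_f1`, the short label strings and the toy predictions are illustrative assumptions, not results from the paper.

```python
def precision_recall_f1(
    y_true: list[str], y_pred: list[str], positive: str
) -> tuple[float, float, float]:
    """Precision, recall and F1-score, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (illustrative only).
y_true = ["courses", "courses", "majors", "majors", "majors"]
y_pred = ["courses", "majors", "majors", "majors", "courses"]
p, r, f = precision_recall_f1(y_true, y_pred, positive="majors")
print(round(p, 3), round(r, 3), round(f, 3))
```

Unlike accuracy, these metrics are reported per class, so they reveal whether one of the two intent classes is recognized noticeably worse than the other.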

4 Results and Discussion

Both models in this study show a high level of accuracy. The Naive Bayes algorithm with the Tfidf feature extraction technique scores highly on all performance metrics: accuracy and the other metrics are all close to 92%, as shown in Table 2.

Table 2. The performance of the Naive-Bayes algorithm

The deep learning technique with the LSTM layer also shows a high level of accuracy. Before calculating the accuracy of the neural network model (RNN-LSTM), however, the number of predicted classes from the test dataset must match the number of test records from the students’ enquiries (X). Therefore, we selected the first 518 records from both lists to ensure that both have the same number of records, as shown in Fig. 8.

Fig. 8. The accuracy score in evaluating the RNN-LSTM model.

The accuracy then reaches a high score of 91.4%, as shown in the Python code in Fig. 8.

For clarity, Table 3 compares the accuracy of the two models along with the main features used in each:

Table 3. The comparisons of the accuracy level for the performance in both models.

The chatbot was evaluated successfully using both models. Figure 9 shows real requests asked by students. In the first example, the student asks about the requirements of BUiD, and the chatbot is able to understand the question and classify it into the appropriate class, Majors & Universities. In the second example, the student enquires about the AP exam in year 12, and the chatbot again understands the question and assigns it to the class HighSchool Course.

Fig. 9. Testing the chatbot using real student enquiries.
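Once an enquiry has been classified, handing it to the assigned agent is a simple lookup. The sketch below shows one way this could be wired; the agent identifiers, the `route` function and the keyword-based stand-in classifier are all hypothetical and only illustrate the data flow, not the trained models evaluated in this study.

```python
from typing import Callable

# Hypothetical agent identifiers; the paper does not name the agents' systems.
AGENTS = {
    "high school courses": "courses-agent",
    "majors & universities": "universities-agent",
}


def route(enquiry: str, classify: Callable[[str], str]) -> str:
    """Classify an enquiry and return the agent assigned to its intent."""
    label = classify(enquiry)
    if label not in AGENTS:
        raise KeyError(f"no agent assigned to intent {label!r}")
    return AGENTS[label]


def keyword_classifier(enquiry: str) -> str:
    """Stand-in for a trained model: a crude keyword rule, for illustration only."""
    text = enquiry.lower()
    if "university" in text or "major" in text:
        return "majors & universities"
    return "high school courses"


print(route("When is the university application deadline?", keyword_classifier))
print(route("How do I register for the AP exam in year 12?", keyword_classifier))
```

Passing the classifier in as a function keeps the routing logic independent of the model, so either of the two trained classifiers could be plugged in without changing `route`.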

4.1 Validation Models

In this study, the models were validated in a pilot test involving high school students as well as experts in education. The researchers presented six questions and enquiries to the chatbots for evaluation. Table 4 shows how each model classified them:

Table 4. The validation of the models

According to the educational experts’ answers, the chatbot with the Naive Bayes model outperformed the RNN-LSTM; for instance, the RNN-LSTM model’s response to enquiry #6 does not align with that of the human educational experts.

5 Conclusions and Future Work

Chatbots and conversational agents play an essential role in enhancing students’ success from different perspectives. High school is the most crucial stage in students’ lives, as high school students have higher stress levels regarding their future than other students. These students therefore need advice more than others; they need help finding the universities and courses that best fit their passions and goals. In this study, a novel classifier chatbot is developed to understand the intention behind a student’s request or enquiry and to categorize it into the right class. As a result, it can add value to high schools by providing students with 24/7 support services, which can increase the efficiency of student service processes.

Two novel classifier chatbots are developed and evaluated: the first uses the Naive Bayes machine learning algorithm, while the second uses a recurrent neural network (RNN) with LSTM. Different techniques are used to improve the performance of each model. For the Naive Bayes model, the Tfidf function is used as a feature engineering technique. For the neural network, the hyperparameters are tuned to improve accuracy, for example the number of epochs, the batch size, and the number of neurons in the LSTM layer. Both models achieve high accuracy scores exceeding 91%. This study can add value for researchers and developers who are interested in using state-of-the-art algorithms to recognize different aspects of data. Moreover, schools and educational institutions can benefit from these techniques by improving their student services. In the future, the chatbot’s intent classification will be extended to accept voice input so that it can be embedded into phone services, which can assist schools by transferring student calls immediately to the right agent.