Introduction

University teaching practices are a major area of interest for educational researchers (Harbour et al., 2015; Slavin & Lake, 2008). Teaching practices play a role in stimulating students’ interest, engagement, learning, and academic performance (Vercellotti, 2018). A paradigm shift in university teaching is currently taking place; expository teaching practices are being questioned and gradually replaced by active methodologies, professional simulation practices and interactive practices, among others (Carr et al., 2015; Roberts, 2019). However, it is also necessary to understand students’ preferences for teaching practices, because such preferences affect students’ emotional engagement and performance (Smith & Baik, 2021).

A suitable way to assess students’ preferences for teaching practices is through open-ended questions (Hills et al., 2016). However, the manual coding this method requires limits the number of samples that can be handled and demands considerable processing time (Rahman, 2016). Currently, these problems can be overcome thanks to advances in artificial intelligence (AI), which facilitate and optimise the performance of these tasks (Hirschberg & Manning, 2015). Employing a language processing model makes it possible to accurately and efficiently analyse large amounts of text. This, in turn, allows researchers to gain a deeper understanding of the topic of study and thus draw more meaningful conclusions (Johnson & Onwuegbuzie, 2004).

Therefore, the aims of this paper are (1) to assess university students’ preferences for teaching practices through open-ended questions and (2) to encode the information using an AI-based tool. In this way, we can assess whether the tool is sufficiently reliable to analyse data collected through open-ended questions. In addition, this method enables us to identify which teaching practices are preferred by university students, which could help researchers and teachers account for these aspects when designing teaching and learning programmes.

The introduction is divided into two parts. The first part elaborates on the importance of learner preferences, while the second provides a more detailed description of the AI-based tools capable of analysing large amounts of text.

Students’ preferences for teaching practices

Previous studies have addressed the need for academic communication that fosters the emotional engagement of university students through the teaching practices employed by their teachers (Chalmers et al., 2018; Könings et al., 2011; Tronchoni et al., 2021). Several studies advocate reducing the use of lectures for large groups and employing active methodologies with regular feedback for students (Carr et al., 2015; Chalmers et al., 2018; Hardman, 2016; Moliní Fernández & Sánchez-González, 2019; Roberts, 2019; Steen-Utheim & Wittek, 2017). To date, however, there has been little discussion about students’ preferences within these methodologies. A study that assesses how students experience their university classes, how they value their active learning experiences and what preferences they have in this respect is needed to maximise their emotional engagement with the subject matter and ensure their success (Alegre & Villar, 2017; Könings et al., 2011; Slater & Davies, 2020).

Previous studies on students’ preferences have mainly focused on specific areas of knowledge or specific teaching practices. For instance, Minhas et al. (2012) assessed the preferences of health students for teaching practices and found a preference for seminar-based learning over lectures. Another study conducted by Opdecam et al. (2014) assessed the preferences of first-year university students for teamwork-based activities over lectures. Their results showed that the students’ preferences clearly differed depending on gender (women preferred teamwork more than men did), level (students with a lower profile preferred teamwork) and motivation (teamwork had a high acceptance among the most intrinsically motivated students). Nevertheless, a global understanding of preferred teaching practices in general is lacking.

In regard to the study of students’ preferences, we encounter a major problem: the existence of different names for similar constructs and similar names for different constructs. This problem is known as the jingle-jangle fallacy (Marsh, 1994; Marsh et al., 2003) and leads to confusion, information elusiveness and misinterpretation. To avoid these fallacies, in this study, we build on the categories identified in a recent systematic review of teaching practices in universities (Smith & Baik, 2021). The nine categories of teaching practices applied in this research are as follows:

1. Clarity: Teaching practices in which the structure and content of knowledge are clear to students. These practices require planning, organisation and structure in the content and delivery of lectures and practical classes.

2. Research: The use of methodological approaches that encourage problem solving, enquiry and testing, such as problem- or case-based learning.

3. Application: Exercises and activities that require the use of knowledge gained through active learning in different situations or contexts.

4. Experiential: A particular type of application in which practical and experiential, authentic or real learning is developed through the learner’s own experience (e.g., work placements in other institutions).

5. Challenges: Practices meant to achieve interest and deep cognitive engagement through didactic proposals that challenge students’ thinking, expression, or action.

6. Relevance: A set of teaching practices that highlight the value, purpose or impact of the interventions to be addressed in a learning or professional development process.

7. Interaction and relationships: A set of communicative and relational processes between teachers and students and among the students themselves (collaborative learning, peer tutoring, classroom dialogue, etc.).

8. Consolidation: A set of correction, recovery and revision practices meant to help identify errors and other comprehension problems to address them correctly and adequately.

9. Self-regulation: Self-assessment and self-monitoring practices conducted independently by the students themselves to plan, organise and correct their own comprehension errors, which leads to the achievement of cognitive training and a greater awareness of progress.

These practices capture Smith and Baik’s (2021) findings at a general level of abstraction. By employing these categories, we can classify the information from student responses to open-ended questions at a level that can be easily understood by both teachers and students. In this way, it may be easier to integrate students’ preferred practices into teachers’ professional performance.

Text analysis with AI

We can take different approaches to answer the question of which teaching practices are preferred by university students. For instance, some studies attempt to answer this question through the use of self-report questionnaires (Aridah et al., 2017), while others do so through open collection methods (Hills et al., 2016). Consequently, the method of data analysis differs. In the first case, the analysis is quantitative and usually performed by using statistical techniques, while in the second case, the analysis is qualitative and conducted through content analysis of emerging categories (Cohen et al., 2000; Lodico et al., 2010).

Between these two approaches, the use of open collection methods such as open-ended questions or interviews facilitates a better expression of ideas among students, allowing researchers to gain a deeper understanding of the relevant issue (Johnson & Onwuegbuzie, 2004). However, studies following this methodology are often limited in sample size or data processing time due to the large amounts of collected information that need to be coded and analysed (Rahman, 2016).

A review of the various studies related to the nine categories of good practice, as identified by Smith and Baik (2021), shows that, thus far, most studies designed to collect large amounts of information from a large number of samples used questionnaires or standardised scales as data collection techniques for quantitative analysis (Könings et al., 2011; Minhas et al., 2012; Opdecam et al., 2014; Vercellotti, 2018). In contrast, most other studies designed to collect information from a small number of samples or more easily manageable amounts of data used qualitative analysis (Steen-Utheim & Wittek, 2017) and descriptive statistics (Hardman, 2016). However, another form of data analysis is emerging thanks to AI.

Advances in AI over the last decade have made it easier to solve problems involving the processing of large amounts of data across all fields of knowledge. This is especially notable in biology, where the development of AlphaFold 2, an AI-based tool capable of predicting the three-dimensional structure of proteins, has been a milestone (Callaway, 2020; Jumper et al., 2021). Since researchers have gained access to this tool, the number of preprints and scientific publications in the field has increased significantly, as has our knowledge (Callaway, 2022). Similarly, great achievements are being made in fields related to textual analysis due to advances in natural language processing (Hirschberg & Manning, 2015).

The use of AI-based tools for textual analysis in education has yielded positive results for a decade. Consider, for instance, sentiment analysis, a technique capable of extracting sentiment (positive, negative, or neutral) from large amounts of text (Rani & Kumar, 2017). Over the last few years, researchers have used this technique to study students’ evaluations of massive open online courses (MOOCs) and teachers through the use of open-ended questions (Geng et al., 2020; Rybinski & Kopciuszewska, 2021; Zhou et al., 2020). Some of this research has concluded that sentiment analysis is useful for course satisfaction evaluations (Cunningham-Nelson et al., 2019), teaching analysis (Leong et al., 2012) and course improvement (Pong-inwong & Songpan, 2019). This might be an effective way to assess students’ preferences for teaching practices, but the transformer revolution (Vaswani et al., 2017) has opened a door to an even more thorough analysis of responses to open-ended questions.

Transformers allowed language processing models to shift from depending on human training (in a supervised way) to being trained automatically (in a self-supervised way) on large corpora of textual data. This paradigm shift has resulted in large pretrained models with millions of parameters that are able to understand human language much better than their predecessors (Qiu et al., 2020). In this context, models that use deep learning to understand and generate high-quality text, such as the Generative Pre-trained Transformer-3 (GPT-3), have emerged (Floridi & Chiriatti, 2020). GPT-3 has two strengths: (1) its ability to understand written instructions in natural language (as one person would speak to another); and (2) its flexibility, since, having been trained only to understand and generate human-like text, it can perform many tasks for which it has not been specifically trained. These tasks include classification, sentiment analysis, programming, and textual summarisation, among many others (OpenAI, 2022). Despite the novelty of these models, several authors have already called for applying them to tasks such as coding large amounts of information and then studying the reliability of that coding (Qiu et al., 2020). Moreover, by using these novel models, many of the problems associated with the coding and analysis of information collected through open methods can be overcome.

Objectives of the current study

In this study, using AI (GPT-3), we analysed students’ answers to an open-ended question regarding their preferred teaching practices. Furthermore, we identified the university teaching practices that most satisfy students. The findings will show whether the tool is reliable enough to be used in analysing open-ended student responses. This could open a door for the use of AI-based tools in the analysis of large amounts of qualitative data. Moreover, by using this method, we can discover students’ preferences for university teaching practices and thus better guide processes of methodological change.

Methods

Participants

Participants were undergraduate and postgraduate students from 90 classes (42 in the first term and 48 in the second term) at the University of Cantabria, Spain, representing different disciplines. The total number of respondents was 1081 (601 women and 480 men).

Procedure

We informed both teachers and students of the study objectives, and then we visited each class so that students could complete the questionnaires. Students completed the surveys in the classroom under the supervision of the teacher and researchers. These surveys consisted of several scales (Álvarez-Álvarez et al., 2022), but only the open-ended question on best teaching practices was considered in this study. The data were treated ethically and in accordance with the guidelines of academic university research, which stipulate confidentiality and objectivity.

Instruments

Teaching practices

Following previous studies in which specific open-ended questions are asked and the answers are then analysed using AI (Hynninen et al., 2019), we assessed best teaching practices from a student’s point of view by reviewing responses to the following open-ended question: “Comment and explain in your own words the best practice you have seen in this class and explain why you think it is successful in as much detail as you can so that other teachers can imitate it”.

To code the information collected through the open-ended question, we used the classification system for teaching practices developed by Smith and Baik (2021) in their systematic review. This system consists of nine categories of teaching practices, to which we added a tenth category, “0” (“None”), for responses stating that the class included no good teaching practice. The resulting rubric is detailed in Table 1.

Table 1 Rubric used to classify teaching practices

GPT-3

We used the GPT-3 model (Brown et al., 2020) to code the responses to our open-ended question. Specifically, we used text-davinci-002, with the temperature set to 0.1 and the Top P set to 1. The instructions for the model included the sentence “Classify the comments in one of the following categories:”, followed by the categories defined in the rubric (Table 1). Afterwards, we provided the answers to the open-ended question for classification.
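For illustration, a minimal sketch of this classification call is shown below using the legacy (pre-1.0) OpenAI Python SDK. The model name, temperature, Top P and instruction sentence are those reported above; the short category labels (standing in for the full Table 1 definitions), the comment/category framing and the max_tokens value are our assumptions rather than the exact prompt used in the study.

```python
import openai  # legacy (pre-1.0) OpenAI Python SDK

openai.api_key = "YOUR_API_KEY"  # assumption: supplied by the researcher

# Short labels standing in for the full rubric definitions in Table 1.
CATEGORIES = [
    "0. None", "1. Clarity", "2. Research", "3. Application",
    "4. Experiential", "5. Challenges", "6. Relevance",
    "7. Interaction and relationships", "8. Consolidation",
    "9. Self-regulation",
]

def classify_response(comment: str) -> str:
    """Ask GPT-3 (text-davinci-002) to assign one rubric category to a comment."""
    prompt = (
        "Classify the comments in one of the following categories:\n"
        + "\n".join(CATEGORIES)
        + f"\n\nComment: {comment}\nCategory:"
    )
    completion = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        temperature=0.1,  # settings reported in the text
        top_p=1,
        max_tokens=10,    # assumption: a category label is only a few tokens
    )
    return completion.choices[0].text.strip()
```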

Data analysis

To calculate the reliability of the coding conducted with GPT-3, two researchers independently coded a random selection equal to 10% of the total sample following the procedure conducted by other researchers to assess the reliability of coding (Russ, 2018). Both GPT-3 and our coders classified each response into a single category of teaching practices. Reliability was calculated as the percentage of agreement (Brownell et al., 2013; King & La Paro, 2015) using the ReCal3 tool (Freelon, 2010). After coding students’ responses with GPT-3, we conducted a descriptive analysis of the results using JASP 0.16.2 (JASP Team, 2022).
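ReCal3 reports pairwise percent agreement for nominal codes from multiple coders. As an illustrative sketch (the codings below are invented, not the study’s data), the same statistic can be reproduced in a few lines of Python:

```python
from itertools import combinations

def percent_agreement(codes_a, codes_b):
    """Percentage of items assigned the same category by two coders."""
    assert len(codes_a) == len(codes_b), "coders must rate the same items"
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)

# Hypothetical codings of the same subsample by the three coders.
codings = {
    "researcher_1": ["1", "7", "3", "1", "0", "7"],
    "researcher_2": ["1", "7", "3", "2", "0", "7"],
    "gpt3":         ["1", "7", "3", "1", "0", "2"],
}

# Pairwise agreement for every pair of coders, as ReCal3 reports it.
for (name_a, a), (name_b, b) in combinations(codings.items(), 2):
    print(f"{name_a} vs {name_b}: {percent_agreement(a, b):.2f}%")
```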

Results

Coding reliability

The overall percentage of agreement between the two researchers and GPT-3 was 89.07%, which is considered satisfactory (O’Connor & Joffe, 2020). Agreement varied by category, from 64.81% for “Interaction and relationships” to 97.53% for “Relevance” (Table 2).

Table 2 Percentage of agreement in coding 10% of the total sample

Descriptive analysis

The frequency of each category used to classify responses according to the rubric is presented below (Table 3). In addition, one representative example from each category is presented.

Table 3 Frequency of each category of teaching practice

Two categories of teaching practices stand out: “Interaction and relationships” and “Clarity”, into which 30.53% and 28.86% of the responses were classified, respectively. The next highest-ranking categories were “Application”, with 13.14% of responses, “Research”, with 10.64%, “Experiential”, with 7.12%, “Self-regulation”, with 3.42%, “Consolidation”, with 2.40%, and “None”, with 2.31%. Finally, the categories “Challenges” and “Relevance” were marginal, with 0.93% and 0.65% of responses, respectively.

Discussion

The present study aimed (1) to analyse students’ answers to an open-ended question on their preferred teaching practices using AI (GPT-3); and (2) to identify the university teaching practices that most satisfy students to better understand their preferences. The results showed that GPT-3 was able to classify responses to the open-ended question with a reliability remarkably similar to that of humans. They also showed that university students prefer practices that focus on clarity and those that focus on interaction and relationships. These findings are discussed below.

First, it is necessary to comment on the reliability of the coding performed by the AI-based model. As Qiu et al. (2020) note, one of the remaining challenges following the recent emergence of large pretrained language models is determining how to use them to code large amounts of information and then studying the reliability of that coding. Surprisingly, the percentage of agreement between humans was remarkably similar to that between humans and AI, even in the category with the lowest agreement. The average percentage of agreement between researchers 1 and 2 was 90.56%, which was not far from that between researcher 2 and GPT-3 (89.81%) or between researcher 1 and GPT-3 (86.85%). These findings address the challenge posed by Qiu et al. (2020) by demonstrating GPT-3’s usefulness and reliability in coding large amounts of information. This opens the door to the use of AI-based models for coding and data analysis in other types of qualitative research. In this way, it will be possible to work with larger samples and shorter analysis times without losing the richness of the information obtained through open-ended collection methods, which often allows researchers to reach more elaborate conclusions (Johnson & Onwuegbuzie, 2007).

Regarding the second objective, the results show a preference for practices that focus on (1) clarity and (2) interaction and relationships. Students demand clear teaching practices in which ideas and activities are presented unambiguously and reflect order, design and planning. They also advocate the use of practices based on interaction and relationships (between teachers and students and among the students themselves) to share their concerns and doubts and to support their learning processes in university classrooms. It is encouraging to compare these findings with those of Hattie (2008), who, after analysing more than 800 meta-analytic studies, found that effective teachers communicate clear content and assessment criteria and apply feedback both among students and between teachers and students. The results also showed students’ preferences for teaching practices that focus on the research and application of knowledge; according to previous studies, these practices support student learning (Ambrose et al., 2010). These findings have implications for university teaching, demonstrating student interest in clear and active methodologies (Minhas et al., 2012; Opdecam et al., 2014). University teachers who wish to meet students’ preferences and achieve greater engagement in their teaching need to rethink both aspects.

Limitations and future perspectives

Despite its contributions, this study has some limitations that need to be addressed. According to a report by UNESCO’s education sector (2019), the use of AI in educational research poses several challenges that need to be considered by researchers working with AI. Among them is the creation of inclusive models that are not biased by training on inadequate databases. This dimension is not considered in the present study, so future studies need to test for possible biases in the use of AI-based models. At the same time, another challenge set by UNESCO is to increase the use of AI in educational research, so future research should continue to use this type of model to further explore and realise the advantages of this methodology for researchers. In addition, the results of this study were obtained using a pretrained base model. This implies that there is still room for improvement if the model were fine-tuned with data previously classified by the researchers.

Another limitation of the current study is that only the preferences of students from Spain were assessed. Previous research has shown cultural differences in students’ preferences for teaching practices (Macfayden et al., 2003; Yang & Tsai, 2008). A cross-cultural study that includes students from other countries is needed to assess any such differences. Similarly, the global view taken in this study prevented us from analysing the results by student attributes such as area of knowledge or gender, even though previous studies have shown differences in students’ preferences across these variables (Opdecam et al., 2014). In future studies, it will be worthwhile to identify cohort trends in students’ preferences.

Finally, the findings of this research provide insights for the design of future interventions aimed at modifying university teaching practices. A methodological change in teaching practices could improve student interest, engagement, and performance (Smith & Baik, 2021; Vercellotti, 2018). One possible means of achieving this is through feedback-based interventions that leverage technology (Falcon et al., 2023; Rodgers et al., 2019). Students could provide feedback on their preferred teaching practices, which could be analysed instantly with GPT-3 so that teachers can adapt to their students’ preferences. Further research should be undertaken to explore this possibility.

Conclusions

In the present study, we used an AI-based tool to automatically classify student responses to open-ended questions regarding their preferences for teaching practices. We then examined the results to determine which teaching practices university students prefer. First, we found that the reliability of the AI model in the classification task was similar to that of humans. Second, the results showed that students preferred practices that focus on clarity and those that focus on interaction and relationships. These findings open the door to the use of pretrained text generation models for large-scale textual analysis and classification tasks. In addition, they provide university teachers with guidelines for developing their teaching practices. Learning how to better plan and develop lessons has been and will remain a professional challenge for university teachers, and this study contributes to the identification and categorisation of students’ preferences by highlighting the importance of clarity and interaction.