Background

Team teaching is a widely used teaching method in integrated curricula. This method is one of the unique features of medical education, in which each course consists of a series of classes run by multiple instructors who majored in medical subjects [1, 2]. The advantage of this method is that each instructor conveys his or her specialized knowledge to medical students. However, students must adapt to the different styles of each instructor [3]. In addition, if deviations exist in instructors’ teaching styles, these differences can affect academic productivity [4]. Taking this into account, researchers have shown increased interest in integrated curriculum management to maintain consistency across courses [5, 6].

Given the team teaching practice in integrated curricula, a single course evaluation cannot capture students’ perceptions of classes taught by multiple faculty members. One reason individual class evaluation is required is that, because there are differences in instructors’ teaching methods and proficiency levels [7], the results of course evaluations cannot include students’ perceptions of all instructors or classes. Another reason is that it is practically impossible for students to remember and comment on multiple instructors’ classes. Thus, the course evaluation, which has been used so far, has limitations in properly assessing the integrated curriculum.

One of the intended goals of medical school lectures is to enhance academic achievement. It was reported that medical students expected to acquire facts through lectures [8]. Schools also expect students to have medical expertise. Therefore, in this study, we tried to find effective teaching characteristics in class by examining which areas of class evaluation were related to academic achievement. Previous research has reported that medical student satisfaction with a seminar was not correlated with academic achievement [7]. However, the study did not present an operational definition of satisfaction with the seminar. Readers could not discern the criteria used in the course evaluation. Another limitation of this approach was that, since the research setting was a discussion seminar, it was difficult to generalize the result to a lecture setting, which is common at Korean medical schools. More recently, another investigation examined the effects of team teaching on medical student performance [9]. Students who participated in team teaching sessions responded that this approach improved their comprehension and concentration and enhanced their interest in the topics. Because measurement of performance was based on students’ subjective responses, it could not be determined whether team teaching was related to objective academic performance.

The present study qualitatively analyzed students’ comments about each class. The reason for the qualitative analysis is to reduce errors when they reply to each item with the same answers [10], and to recognize strengths or weaknesses of instructors who influenced students’ learning processes. Several attempts were made to identify critical features of effective instructors. Ullian and colleagues [11] reported that students considered ten categories important for clinical instructors, including knowledge and clinical competence. Lin [12] also analyzed behaviors of problem-based learning tutors and reported that medical students preferred instructors with tutoring skills, medical and clinical knowledge, and positive personality. A qualitative study in Korea analyzed important characteristics of excellent lecturers at one medical school [13]. In the study, adequately summarizing learning content was regarded as the most important characteristic of a good lecturer. Some studies were carried out on excellent instructor’s features; however, to date, no studies have been conducted that investigated the association between students’ subjective comments on a class and the objective outcomes of the class.

The aim of this study was to establish the necessity of individual class evaluation and to confirm teaching characteristics related to academic achievement. For this, the authors qualitatively classified students’ comments on classes by connotation (positive or negative) and by subject, and analyzed if the test scores of positively mentioned classes were higher than those of negatively mentioned classes within each subject category. The results for class evaluation were compared to those for course evaluation. We hope that this research supports medical schools in managing the quality of classes in integrated curricula, and instructors to recognize the things to consider in class preparation.

Methods

Basic medical education curriculum of Catholic University, College of Medicine in Seoul, Republic of Korea consists of a four-year program. Excluding clinical clerkship, 47 courses and 3329 classes are taught each year. The median number of classes and instructors per course is 49 (range, 15 to 135) and 17 (range 3–71), respectively. The median number of students per class is 95. Teaching methods in our curriculum included clinical clerkship, lecture, practice, case discussion, team-based learning (TBL), and student presentation, etc. Except for clinical clerkship, the most commonly used teaching method is lecture (71%). The teaching methods in each course showed similar distributions. Out of all courses from the first to the fourth year, we excluded fourteen courses that did not use computer-based test (CBT) as well as clinical clerkship, because it was not possible to calculate scores for each class in non-CBT classes. Accordingly, thirty-three courses were chosen for statistical analysis.

Two evaluation instruments were used to collect students’ evaluations of both classes and courses (Table 1). The items were developed by our school’s integrated curriculum committee of medical education specialists. The evaluation tools were composed of Likert scale ratings and open-ended questions. During 2014 and 2015, class assessments were completed online within 24 h after conclusion of the classes. Courses were evaluated within one week after completion. Class evaluations and course evaluations were conducted before students identified their scores so that knowing their scores did not affect class or course evaluations. It was assumed that because course evaluation was conducted at the end of courses, it would not be easy for students to recall all classes retrospectively. Thus, students were encouraged to comment on each instructor in class evaluations.

Table 1 Course and class evaluation tool

Students’ comments in each evaluation were manually reviewed and grouped by connotation and subject. The comments had positive, negative, or neutral connotations. Positive comments were encoded as 1, and negative comments were encoded as 0. We identified 19 themes in their comments (Table 2). Brief descriptions of the categories are as follows. Evaluation consists of comments on assignments and exams. Schedule includes class arrangement within a course and class-hour management, and workload/pace covers academic workload and pace of progress. Difficulty refers to whether an instructor easily explained contents. Punctuality reflects whether an instructor observed class hours, and main points indicates if learning objectives were clearly conveyed. Speaking refers to an instructor’s voice volume and speed of speaking. Attitude refers to an instructor’s attitude toward students, and media/contents involves effects of utilizing media and clinical cases. Practice/participation includes degree of student participation in simulations and discussions. Faculty consists of comments on coherence of classes by instructors, and interest covers how interesting a course or class was. Communication refers to degree of interaction between an instructor and students, teaching methods involves effects of class activities, and materials is about whether an instructor uploaded the materials in advance and requests to improve materials. Physical environment includes the physical characteristics of classroom settings. General comments include overall remarks such as “Thank you.” Preparation covers an instructor’s readiness for a course or class; finally, quizzes was about effectiveness and difficulty of quizzes.

Table 2 Frequency of students’ comments across categories

During each course, student academic achievement was assessed with several exams. Most exams were online, structured as CBT containing multiple choice questions (MCQs). Properly structured MCQs clearly reflect predefined learning objectives and assess students’ abilities from knowledge to problem-solving [14], so test scores in this study means the degree of achievement of learning objectives. In addition, MCQs were used as a primary student evaluation tool in our basic medical education curriculum. Because of difficulty of standardizing test results, only MCQs were used as the evaluation tool for academic achievement. To produce score for class, we divided the MCQs used in a course’s exams by the corresponding classes, and re-calculated an average score for the class. Course score was an average score for exams taken during the course. The scores were presented as percent.

To determine significant associations between positive-negative comments and scores, we applied a few statistical tests to the data. First, we tested if students’ positive and negative comments were associated with their scores in classes and courses. Furthermore, after dividing the comments into categories, we verified that positive and negative comments within each category were related to the scores in classes and courses. Data distribution was assessed by the Kolmogorov-Smirnov test and the Shapiro-Wilk test. The score variable did not satisfy the normality. Thus, differences in scores between positively and negatively evaluated classes/courses were tested using the Mann-Whitney U test, with significance noted at p < 0.05. SPSS 19 software was used for analysis.

The study protocol (MC15EISI0121) was approved by the institutional review board of the Catholic University of Korea, College of Medicine. In accordance with the Declaration of Helsinki and its later amendments, all personal information was anonymized.

Results

The descriptive data

From 2014 to 2015, 79.3% of students completed class evaluations. Of the 43,109 class evaluations, 7832 comments (18.2%) were made. In terms of course evaluations, 70.8% of students responded. Of the 2847 course evaluations, 1714 comments (60.2%) were provided. The median scores of students who did not comment and who commented on classes were 80% and 65%. Mann-Whitney U test showed a significant difference in the scores of two groups (P = 0.000). For course evaluations, the median scores of students who did not comment and who commented were 77.4% and 76.8%. The test of Mann-Whitney U showed a significant difference in the scores (P = 0.047). The statistical difference in course evaluations might be due to the large sample size. Of the 7832 comments on classes, we excluded 528 items that were classified as neutral comments. Besides, 2922 items from non-CBT classes that did not produce scores were excluded. Finally, 4382 items were used for statistical analysis. The same criteria were applied to course evaluation data, so 955 of 1714 items were used for analysis. Although the rates of leaving comments were low, there was a reason to investigate them. We found that students’ survey fatigue from Likert scale rating increased with time, so all the results from Likert scale rating could not be used. Yet, making comments was less affected by survey fatigue than Likert scale rating.

Ratios of positive comments to total comments on each course were calculated, and we divided a higher group and a lower group by an average (0.52) of the ratios. In the same way, ratios of positive comments to total comments on each class were made. Figure 1 showed distribution of ratios of positive comments on each class in the higher and the lower group. Classes with higher ratios of positive comments were found in the lower group as well as in the higher group. However, the lower group had more classes with lower ratios of positive comments.

Fig. 1
figure 1

Distribution of ratios of positive comments on each class in higher and lower groups

The categorization of comments

As shown in Table 2, students made the most comments in course evaluations in order of schedule, general comments, media/contents, practice/participation categories. In class evaluations, on the other hand, the most comments were made in order of Difficulty, Main points, Interest, Media/contents.

The distribution of positive and negative comments by teaching methods was illustrated in Table 3. The Chi-square test result showed that students’ positive or negative comments were significantly related to types of teaching methods. Additionally, we conducted Kruskal-Wallis test to determine if comments on specific categories were indicative of test scores. The test results showed that there were significant differences in test scores across different category groups (P = 0.000).

Table 3 Chi square test result between students’ comments and teaching methods

Identifying effective teaching characteristics in course and class

Table 4 provides results of the Mann-Whitney U test between positively and negatively mentioned courses/classes. Classes with positive descriptions exhibited higher class scores compared to classes with negative descriptions (P = 0.043). The same was true for course evaluation data. There was a significant difference in course scores between positively and negatively mentioned courses (P = 0.000).

Table 4 The results of the Mann-Whitney U test between positively and negatively mentioned groups

Test results for each subject category revealed that effective teaching characteristics in class differed from effective teaching characteristics in course. As shown in Table 5 and Fig. 2, only the practice/participation category in course evaluation indicated a significant difference in course scores (P = 0.005). In the other categories, course scores were not statistically different between positively and negatively described courses. In contrast, analysis of the class data showed that significant differences in class scores existed in six categories (Table 5, Fig. 3): difficulty (Fig. 3a; P = 0.000), main points (Fig. 3b; P = 0.001), attitude (Fig. 3c; P = 0.04), media/contents (Fig. 3d; P = 0.000), interest (Fig. 3e; P = 0.022), and materials (Fig. 3f; P = 0.01). Table 6 provides examples of students’ comments on the six significant subjects. The five categories of schedule, workload/pace, speaking, communication, and teaching methods showed slight differences in class scores but were not statistically significant.

Table 5 Comparison of scores in a positively and negatively mentioned courses/classes across categories
Fig. 2
figure 2

Score distribution of positively and negatively mentioned courses (practice/participation)

Fig. 3
figure 3

Score distribution of positively and negatively mentioned classes in six significant categories. a Difficulty. b Main points. c Attitude. d Media/contents. e Interest. f Materials

Table 6 Examples of students’ comments on the six significant subjects

Discussion

The present study was set out to prove the necessity of class evaluation in integrated curricula and to identify effective teaching characteristics in class. The results of this study supported the following conclusions.

Students’ evaluations of multiple-instructor courses in a single, retrospective manner might not adequately represent their true experience in classes. The importance of class evaluation was confirmed in two aspects. First, effective teaching characteristics in class differed from those in course. For course evaluation, it was only one category that was associated with students’ scores, but six categories in class evaluations were significantly related to class scores. This result is in line with that of a preliminary study that compared prospective class assessment and retrospective course assessment [15]. In the study, the researchers concluded that evaluation immediately after classes was higher than retrospective course evaluation in quality, since medical students could not remember and rate every class in course evaluation. Similarly, the course evaluation data in this study showed that many students mentioned the most memorable point that was good, rather than providing specific comments on classes that consisted of the course. Second, the fact that classes with higher ratios of positive comments were found within the lower ratio course group suggests that a course may be negatively evaluated by a handful of classes with lower ratios of positive comments (Fig. 1). This interpretation cannot be derived from course evaluation alone.

According to previous studies [1, 16], course evaluation has typically been conducted at the end of courses. Findings of this study, however, imply that continuous class evaluation is a practical tool for integrated curriculum management. Recently, researchers have begun to describe their experiences with an integrated curriculum at medical schools [5, 6]. One medical school in the research has set one of its goals as effective curriculum management in the process of restructuring into integrated curriculum [5]. As part of this process, teaching activities and productivity were monitored through student comments, performance in exams, and class audits to sustain uniformity and harmony between courses and within a course. The medical school in this study also concluded that there was a limit to utilizing course evaluations to manage the quality of classes in an integrated curriculum. Through this research, we demonstrated quantitative and qualitative differences between class and course evaluations.

Second, key elements that instructors should keep in mind were identified to enhance learning outcomes. Classes that received positive comments from students in the categories of difficulty, main points, attitude, media/contents, interest, and materials produced higher academic achievement than classes that received negative comments.

The effects of difficulty and main points on academic achievement represent needs of medical students and show that these two factors are related to medical student academic achievement. The results of this study support previous research that investigated requisites for good teaching [17]. The study found that the most satisfactory type of lecture for medical students was that in which they could easily learn important contents. Students and professors answered that if they had the opportunity to attend other lectures, they would see whether the lecture would make sure students knew main points. Results of the current study also reveal that medical students want instructors to easily and clearly explain knowledge, and that if instructors fail to control the difficulty of classes and deliver main points, student academic achievement can be affected.

In high-scoring classes, students’ positive comments on media/contents showed that contents were well remembered when students were exposed to related pictures, videos, and various clinical examples. These results are consistent with recent research results, suggesting use of multimedia and case presentations to increase the effects of teaching in medical schools [18, 19]. Further, the result can be explained by the multi-media theory. In other words, activation of information processing by utilizing multiple channels of visual/pictorial and auditory/verbal learning contents leads to meaningful learning [20].

The emotional domains such as attitude of instructors and degree of interest were found to be related to academic achievement. It has been reported that instructors’ respect for students and linguistic or nonverbal enthusiasm influenced pharmacological students’ potential vitality, inner motivation, and final course grades [21]. With respect to interest, comments from the high-scoring classes in this study included “the professor was witty,” and “the contents of the class were really interesting.” Whether the comments came from humor of instructors or from academic interest, the students in the classes seemed to focus indirectly on interesting objects and to acquire knowledge through exploratory activities with the objects [22].

Finally, students often left negative comments on class materials in the classes that received low scores. Comments included, “I wanted the professor to upload materials before the class starts,” or “I didn’t know what the professor was explaining because there were not indicators at all on the slides.” Because lectures are still one of the major instructional forms in medical schools [2, 8, 13, 23], presentation slides as a supplementary material needs to be properly designed according to students’ needs [23].

The results presented here should be interpreted in a setting of online MCQ evaluation. The researchers excluded data of classes that used something other than an online CBT system including MCQs. That is, it is difficult to say that effective teaching characteristics in class found in this study are valid in other classes that used essays and project evaluations. In addition, students’ neutral comments such as suggestions for the online system and for future directions were not included in the analysis. Thus, the results of this study do not reflect the opinions of all medical students in our school.

Notwithstanding these limitations, the present study demonstrated that effective teaching characteristics in class and those in course were different from each other, suggesting the need for individual class evaluation for quality control in an integrated curriculum. In addition, we confirmed six characteristics of classes related to medical student academic achievement.

Conclusions

In integrated curricula, course evaluation may not comprehensively reflect student perception of the courses. From this research, it was found that individual class evaluation is required to manage courses run by multiple instructors. School authorities can use these findings to ensure that instructors have effective teaching characteristics for students to develop medical expertise through classes.