Participants
To determine whether the use of reflective writing in heterogeneous cluster grouping for literature study could increase students’ empathy, critical thinking, and reflective writing, two homogeneous and normally distributed classes were used as the experimental group (43 students) and as the control group (43 students). Assignment to group was determined by coin flip. All students were freshmen or juniors; therefore, they had little clinical experience. Information about the research was provided before the students agreed to participate. The participants were informed about the research purpose and were required to sign an informed consent form in accordance with the Declaration of Helsinki [27]. To ensure confidentiality, the students’ personal identities were protected and all data were identified only by numbers. However, to avoid the Hawthorne effect or the John Henry effect, the students were not informed which group they were in. All 86 participants (mean age = 18.32; SD = 0.42) were students at the colleges of medicine, medical science and technology, health care and management, and medical humanities and social sciences, with at least six years of English learning experience.
Cluster algorithm for heterogeneous cluster grouping
The cluster algorithm for heterogeneous cluster grouping was developed as follows.
At this step, the pretest scores on the empathy scale and the critical thinking disposition assessment were collected. To avoid using different standards to measure students’ different empathy levels and critical thinking disposition levels, Eq. (1) was used to normalize the data.
$$ {Z}_{ij}=\frac{X_{ij}}{X_j^{\max }}, $$
(1)
where \( {X}_j^{\max }=\max \left\{{X}_{ij},\ i=1,2,\dots, n\right\} \), j = 1 for the empathy scale score, 2 for the critical thinking disposition assessment score.
The notation is described as follows:
\( {Z}_{ij} \) is the ith student’s normalized score on the jth measure (empathy scale/critical thinking disposition assessment).
\( {X}_{ij} \) is the ith student’s initial score on the jth measure (empathy scale/critical thinking disposition assessment).
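The normalization in Eq. (1) can be illustrated with a minimal Python sketch. The function name `normalize_by_max` and the raw scores are hypothetical and are not taken from the study; the sketch assumes all raw scores are positive.

```python
def normalize_by_max(scores):
    """Divide each raw score X_ij by the column maximum X_j^max, as in Eq. (1).

    scores: one row per student, one column per measure
            (column 0 = empathy scale, column 1 = critical thinking disposition).
    """
    n_measures = len(scores[0])
    # Largest raw score observed on each measure.
    col_max = [max(row[j] for row in scores) for j in range(n_measures)]
    return [[row[j] / col_max[j] for j in range(n_measures)] for row in scores]


# Made-up raw scores for three students (illustration only).
raw = [[150.0, 120.0], [180.0, 90.0], [90.0, 150.0]]
z = normalize_by_max(raw)
```

After this step both measures lie on a common scale with a maximum of 1, so they can be combined in a single weighted index.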
To encourage “diverse thinking” in heterogeneous learning clusters, students were asked to follow two guidelines when creating their heterogeneous learning clusters. First, each learning cluster should contain four to five students, including at least two females and two males. Second, each student in a learning cluster should come from a different department and college. The researchers could then compute the diverse thinking effect in each learning cluster from the normalized scores on the empathy scale and the critical thinking disposition assessment. Equation (2) provides the index.
$$ \left|{\alpha}_1{\displaystyle \sum_{i=1}^n{\overline{Z}}_{i1}+{\alpha}_2{\displaystyle \sum_{i=1}^n{\overline{Z}}_{i2}}}\right|\le \vartheta $$
(2)
where α1 and α2 are the fuzzy weights and ϑ is the threshold.
\( {\overline{Z}}_{ij}={Z}_{ij}-{\overline{Z}}_j \) for j = 1, 2.
\( {\overline{Z}}_j \) is the mean of the normalized scores on the jth measure (empathy scale/critical thinking disposition assessment).
To arrange the derived heterogeneous learning clusters, the researchers computed the index of the diverse thinking effect and checked whether the index was less than or equal to the threshold. If necessary, the researchers assigned or adjusted the clusters based on the guidelines to compose the heterogeneous learning clusters.
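As a rough illustration of the check in Eq. (2), the sketch below computes the index for one cluster. It makes several assumptions that the text does not state: the means are taken over the whole class while the sums run over the members of one cluster, and the weights `ALPHA`, the threshold `THRESHOLD`, and all scores are invented for the example.

```python
def diverse_thinking_index(cluster_z, class_means, weights):
    """|a1 * sum_i(Z_i1 - mean_1) + a2 * sum_i(Z_i2 - mean_2)| for one cluster."""
    total = 0.0
    for j, (weight, class_mean) in enumerate(zip(weights, class_means)):
        # Sum of centered normalized scores of the cluster members on measure j.
        total += weight * sum(row[j] - class_mean for row in cluster_z)
    return abs(total)


# Made-up normalized scores for a class of six students (illustration only).
class_z = [[0.9, 0.8], [0.5, 0.6], [0.7, 0.7], [0.6, 0.9], [0.8, 0.5], [0.7, 0.7]]
class_means = [sum(row[j] for row in class_z) / len(class_z) for j in range(2)]

cluster_z = class_z[:4]   # hypothetical cluster: the first four students
ALPHA = (0.5, 0.5)        # assumed weights, not reported in the study
THRESHOLD = 0.1           # assumed threshold, not reported in the study

index = diverse_thinking_index(cluster_z, class_means, ALPHA)
is_heterogeneous = index <= THRESHOLD
```

Under this reading, a small index means that high and low scorers balance out within the cluster, which is what the less-than-or-equal criterion rewards.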
Cluster algorithm for non-heterogeneous cluster grouping
The cluster algorithm for non-heterogeneous cluster grouping was developed as follows.
After collecting the pretest scores on the empathy scale and the critical thinking disposition assessment, in order to avoid using different standards to measure students’ different empathy levels and critical thinking disposition levels, Eq. (3) was used to normalize the data.
$$ {Z}_{ij}=\frac{X_{ij}}{X_j^{\max }}, $$
(3)
where \( {X}_j^{\max }=\max \left\{{X}_{ij},\ i=1,2,\dots, n\right\} \), j = 1 for the empathy scale score, 2 for the critical thinking disposition assessment score.
The notation is described as follows:
\( {Z}_{ij} \) is the ith student’s normalized score on the jth measure (empathy scale/critical thinking disposition assessment).
\( {X}_{ij} \) is the ith student’s initial score on the jth measure (empathy scale/critical thinking disposition assessment).
To create the non-heterogeneous learning clusters, students were asked to follow two guidelines. First, each learning cluster should contain four to five students, including at least three females or three males. Second, students in each learning cluster should come from the same department or the same college; that is, they should have similar majors. The researchers could then compute the same thinking effect in each learning cluster from the normalized scores on the empathy scale and the critical thinking disposition assessment. Equation (4) provides the index.
$$ \left|{\beta}_1{\displaystyle \sum_{i=1}^n{\overline{Z}}_{i1}+{\beta}_2{\displaystyle \sum_{i=1}^n{\overline{Z}}_{i2}}}\right|\ge \theta $$
(4)
where β1 and β2 are the fuzzy weights and θ is the threshold.
\( {\overline{Z}}_{ij}={Z}_{ij}-{\overline{Z}}_j \) for j = 1, 2.
\( {\overline{Z}}_j \) is the mean of the normalized scores on the jth measure (empathy scale/critical thinking disposition assessment).
To arrange the derived non-heterogeneous learning clusters, the researchers computed the index of the same thinking effect and checked whether the index was greater than or equal to the threshold. If necessary, the researchers assigned or adjusted the clusters based on the guidelines to compose the non-heterogeneous learning clusters.
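The composition guidelines and the threshold check in Eq. (4) could be combined into a single acceptance test, sketched below. The function name, the dictionary keys (`gender`, `department`, `college`), the placeholder labels, and the numeric values of the index and threshold are all hypothetical. The index is assumed to have been computed beforehand from the normalized scores.

```python
from collections import Counter


def accept_non_heterogeneous_cluster(members, index, theta):
    """Return True if a cluster meets the two guidelines and the Eq. (4) criterion."""
    # Guideline 1: four to five students, at least three of the same gender.
    size_ok = 4 <= len(members) <= 5
    gender_counts = Counter(m["gender"] for m in members)
    gender_ok = any(count >= 3 for count in gender_counts.values())
    # Guideline 2: all members share a department or a college.
    same_department = len({m["department"] for m in members}) == 1
    same_college = len({m["college"] for m in members}) == 1
    # Eq. (4): the index must be greater than or equal to the threshold.
    return size_ok and gender_ok and (same_department or same_college) and index >= theta


# Made-up cluster with placeholder labels (illustration only).
cluster = [
    {"gender": "F", "department": "Dept A", "college": "College X"},
    {"gender": "F", "department": "Dept B", "college": "College X"},
    {"gender": "F", "department": "Dept A", "college": "College X"},
    {"gender": "M", "department": "Dept C", "college": "College X"},
]
accepted = accept_non_heterogeneous_cluster(cluster, index=0.30, theta=0.20)
```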
Experimental design
The study investigated whether the use of heterogeneous cluster grouping in reflective writing about medical humanities literature could reveal any differences between the experimental group and the control group in terms of empathy, critical thinking, and reflective writing. The researchers used a quasi-experimental design because in classroom research neither random selection nor random assignment of students to classes was possible. In addition, in classroom interactions, male students speak longer and more frequently than female students [28, 29]. To ensure that each gender’s voice was heard, and following an expert panel discussion, the number of male students in heterogeneous grouping was made equivalent to the number of female students. Each heterogeneous cluster therefore included at least two males and two females so that both male and female students could voice their opinions. In contrast, each non-heterogeneous learning cluster included at least three males or three females. After conducting the cluster algorithms for heterogeneous and non-heterogeneous cluster grouping, heterogeneous learning clusters (experimental group) and non-heterogeneous learning clusters (control group) were derived. The students in the control group worked in non-heterogeneous clusters on reflective writing for the medical humanities literature study, while the students in the experimental group worked in heterogeneous clusters. Therefore, the post-experimental difference could be attributed to the intervention of heterogeneous cluster grouping.
Before the experiment, an Empathy Scale in Patient Care (ES-PC), a critical thinking disposition assessment (CTDA-R), and a reflective writing test were administered to both groups to collect, analyze, and compare the test results. The pretest results showed that both groups were homogeneous in empathy, critical thinking disposition, and reflective writing (see Tables 1, 3, and 5). In the experiment, one researcher taught both the control group and the experimental group. In order to prevent coercion and experimenter bias, the researchers did not get involved in the survey data collection. One well-trained research assistant was responsible for collecting the survey data.
Table 1 F-test for the homogeneity of regression slope assumption for Group*the empathy pre-test
Both groups of students could read the literature they preferred, along with the requested teaching content and discussion topics (shown in Additional file 1). The learning activities included independent study, discussion forums, reflective writing, class presentations, and so on. The treatment lasted 15 weeks, with two class sessions per week plus at least two hours of e-learning, in which a discussion forum was facilitated. After reading medical humanities literature, including short stories, novels, and film literature, both groups went to the Medical Humanities and English Learning website (as Additional file 2) to post their reflections about troubling medical care situations or moral/ethical dilemmas in the lecture topics or the material they read. They were required to write one reflection per week, in which they responded to, or provided reasons for, the actions taken, using critical thinking and justifications.
The discussion forum served as a collaborative learning tool through which students who were less reflective could get involved in discussion and writing, thus learning better writing and communication. The instructor monitored the discussion but did not participate in it. Each student acted as a facilitator of his/her learning cluster, critiquing and discussing ideas with the other members in the forum. In 15-min class presentations, students were randomly selected to share their ideas, thoughts, and feelings with their peers. After the intervention, the two groups’ post-test results on the empathy scale, critical thinking disposition assessment, and reflective writing were compared to evaluate their learning performance.
Instrumentation
Empathy Scale in Patient Care (ES-PC)
To measure students’ levels of empathy, the researchers developed the Empathy Scale in Patient Care (ES-PC; see Additional file 3) to test students’ empathy awareness. After exploratory factor analysis, the researchers extracted 23 items and three factors, based on a nine-point rating scale, with nine meaning “strongly agree” and one meaning “strongly disagree.” The three factors were: behavioral empathy (nine items), affective empathy (seven items), and intellectual empathy (seven items). The higher the score, the more importance a student places on the issues related to empathy presented in the survey. The Cronbach’s alpha values for the three subscales were 0.93, 0.87, and 0.88, respectively, and the Cronbach’s alpha for the entire questionnaire was 0.94. The test-retest reliability of the final version of the scale was 0.89.
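For readers unfamiliar with the reliability coefficient reported above, the following minimal sketch shows the standard formula for Cronbach’s alpha, α = k/(k − 1) · (1 − Σ s_i² / s_total²), where k is the number of items. The response matrix is invented for illustration and is not the study’s data; the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix with one row per respondent and one column per item."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up nine-point responses from four respondents to three items (illustration only).
toy_responses = [[9, 8, 9], [5, 6, 4], [7, 7, 8], [3, 2, 3]]
alpha = cronbach_alpha(toy_responses)
```

Values close to 1 indicate that the items vary together across respondents, which is how the subscale and whole-scale coefficients reported here are usually interpreted.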
Critical Thinking Disposition Assessment (CTDA-R)
Yuan et al.’s [30] nine-point Likert scale of critical thinking disposition assessment (CTDA-R; see Additional file 4) was used to measure the participants’ levels of critical thinking disposition, ranging from 9 (strongly agree) to 1 (strongly disagree). The CTDA-R included 19 items and three factors: systematicity and analyticity (eight items), inquisitiveness and conversance (six items), and maturity and skepticism (five items). Higher scores on the scale were interpreted as indicating higher levels of critical thinking disposition. The Cronbach’s alpha values for the three subscales were 0.93, 0.88, and 0.88, respectively, and the Cronbach’s alpha for the entire questionnaire was 0.95. The test-retest reliability of the final version of the scale was 0.87.
Analytic Reflective Writing Scoring Rubric (ARWSR)
To understand the performance of reflective writing in heterogeneous cluster grouping, after an extensive literature review and expert panel discussion, an Analytic Reflective Writing Scoring Rubric (ARWSR), using a 0- to 5-point rating, was designed to assess students’ reflective writing ability in focus & context structure, ideas, voice & point of view, critical thinking & representation, depth of reflection on personal growth, and language & conventions. In the analytical scoring rubric, the students’ performances were divided into essential dimensions or components and scored separately. To verify the content validity, a multidisciplinary team went through rounds of panel discussion to reach consensus on the descriptions and the psychometric properties of the rubric. Scores between 0 and 6 indicated a failure or unacceptable reflective writing; scores between 7 and 12, poor reflective writing; scores between 13 and 18, acceptable reflective writing; scores between 19 and 24, strong reflective writing; scores between 25 and 30, excellent reflective writing. Two raters independently rated seven reflective writing pre-test performances. Based on the data from the baseline (n = 7) ratings, the mean score was 19.81, with a maximum possible score of 30. The inter-rater reliability using Cronbach’s alpha was 0.82. In the study of 86 participants, the inter-rater reliabilities between the ratings of the first and second rater were between 0.82 and 0.88 (p < .001), revealing the overall agreement of the two raters.
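The score bands described above can be expressed as a small lookup, sketched here in Python. The function name `arwsr_band` is hypothetical, and the sketch assumes six dimension scores of 0 to 5 each, which matches the six listed dimensions and the maximum possible score of 30.

```python
def arwsr_band(dimension_scores):
    """Sum six 0-5 dimension scores and map the total to the rubric's band."""
    if len(dimension_scores) != 6:
        raise ValueError("expected one score for each of the six dimensions")
    if any(not 0 <= s <= 5 for s in dimension_scores):
        raise ValueError("each dimension score must lie between 0 and 5")
    total = sum(dimension_scores)
    if total <= 6:
        band = "failure or unacceptable"
    elif total <= 12:
        band = "poor"
    elif total <= 18:
        band = "acceptable"
    elif total <= 24:
        band = "strong"
    else:
        band = "excellent"
    return total, band


# Made-up ratings for one piece of writing (illustration only).
total, band = arwsr_band([3, 3, 4, 3, 3, 4])
```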
Data analysis
The quantitative data were analyzed using the Statistical Package for the Social Sciences (SPSS) Version 14.0. Analysis of covariance (ANCOVA) was used to compare the post-test scores of the two groups in terms of empathy, critical thinking disposition, and reflective writing, with pre-test scores as a covariate to control for probable initial group differences. A significance level of 0.05 (95 % confidence level) was used as the criterion for statistical significance.
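To make the role of the covariate concrete, the sketch below computes ANCOVA-adjusted post-test means using the pooled within-group regression slope of post-test on pre-test scores. It is not the SPSS procedure used in the study and it omits the F-test and p-value; the function name and all scores are invented for illustration.

```python
from statistics import mean


def ancova_adjusted_means(pre, post, group):
    """Adjust each group's post-test mean for pre-test differences.

    Uses the pooled within-group slope b_w and returns
    adjusted_mean_g = mean(post_g) - b_w * (mean(pre_g) - grand mean of pre).
    """
    grand_pre = mean(pre)
    sxy = sxx = 0.0
    group_means = {}
    for g in sorted(set(group)):
        xs = [x for x, label in zip(pre, group) if label == g]
        ys = [y for y, label in zip(post, group) if label == g]
        mx, my = mean(xs), mean(ys)
        # Accumulate within-group cross-products and sums of squares.
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
        group_means[g] = (mx, my)
    slope = sxy / sxx
    adjusted = {g: my - slope * (mx - grand_pre) for g, (mx, my) in group_means.items()}
    return adjusted, slope


# Made-up pre-test and post-test scores (illustration only).
pre = [1, 2, 3, 2, 3, 4]
post = [3, 4, 5, 7, 8, 9]
group = ["control"] * 3 + ["experimental"] * 3
adjusted, slope = ancova_adjusted_means(pre, post, group)
difference = adjusted["experimental"] - adjusted["control"]
```

The adjusted difference is the group effect that remains after both groups are placed at the same pre-test level, which is the quantity the ANCOVA tests for significance.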