Disciplinary Differences in Teaching

The practice of evaluating teaching and courses in higher education through Course and Teaching Evaluation Questionnaires (CTEQ) is now so widespread that it is rare to find colleges which do not routinely evaluate courses. As the practice has become so extensive, a substantial literature has developed. Several reviews have concurred that course evaluation questionnaires give quite reliable and valid measures of teaching effectiveness (Aleamoni 1999; Feldman 1996, 1997, 2007; Marsh 1987, 2007; Marsh and Dunkin 1997; McKeachie 1997).

For all the research, though, there is a well-substantiated finding which does not seem to have been adequately explained. Comparisons of large samples of CTEQ data consistently show disciplinary differences. Teaching in the arts, humanities and social sciences tends to receive higher ratings than teaching in engineering and science, with other discipline groupings in intermediate positions. As reviews have concluded that CTEQ data are valid and reliable, this seems to indicate that teaching quality varies between disciplines. There is, however, no consensus as to whether this is the case and, if so, why.

Surveys or reviews of studies on teaching or course evaluation questionnaires commonly show variations in ratings by academic discipline. Feldman (1978) reviewed 11 studies which compared ratings across disciplines and found that the humanities and arts tended to be rated higher than sciences, engineering and business administration. Cashin (1990) analysed large data sets from two widely used course evaluation instruments in the US and found that ratings tended to be higher in the arts and humanities than in science, engineering and business. Barnes and Patterson (1988) produced closely related findings. There does, therefore, seem to be widespread evidence of disciplinary differences in ratings, with some consistency about which disciplines tend to receive better ratings. Arts, the humanities and the social sciences tended to be in a higher-rated group, while science and engineering were on average lower. Other disciplines were intermediate or less consistent.

There is evidence of disciplinary differences on measures of teaching and learning other than CTEQ instruments. For example, both of the most widely used instruments for measuring approaches to learning have indicated disciplinary effects. Entwistle and Ramsden (1983) reported scores for the Approaches to Studying Inventory. English and history were higher on deep approach and lower on surface approach than physics and engineering. Biggs (1987) gathered data from a wide sample of Australian universities with the Study Process Questionnaire. Arts and science students had equal mean scores for deep approach in the first year, but in each subsequent year the arts students had higher mean scores.

The grouping of the disciplines has most commonly been interpreted using Biglan’s (1973) classification of disciplines (e.g. Neumann and Neumann 1985). Biglan proposed a 2 × 2 × 2 categorisation scheme for disciplines. The three distinguishing criteria were:

  • degree of consensus on paradigm development, hard versus soft;

  • presence of practical application, pure versus applied; and

  • presence of living organisms, life versus non-life.

The research into disciplinary differences in ratings has most often drawn upon the hard versus soft distinction. This may be because the broad field of disciplinary research has most commonly examined research-related concepts and the behaviour of the professoriate (e.g. Becher and Trowler 2001). For undergraduate teaching, particularly in the initial years, these research-related constructs may not be particularly pertinent. For example, applied degrees commonly start by building a solid foundation of pure basic knowledge, leaving application to the final parts (Schön 1987).

The large survey by Cashin (1990) found that disciplinary differences explained only a modest percentage of the variance in ratings, but one of sufficient magnitude that the effect could not be ignored. As Kwan (1999, p. 184) and Marsh (1987, p. 309) point out, the percentage of variance shows the strength of the relationship between variables, and hence the explanatory power. A more practical issue, however, is the absolute effect on ratings, because that is what is taken into account when teacher evaluation ratings are used in appraisal judgements. That is better measured by an effect size.

Kwan’s (1999) study gives both the percentage of variance and effect sizes for the effects of factors on evaluation ratings. Discipline explained about 10% of the variance. Effect sizes for the greatest differences between the five discipline groupings were computed for each of the six scales in the particular CTEQ instrument. They ranged from 0.43 to 0.86, which means that disciplinary differences in CTEQ ratings have to be treated as quite appreciable effects according to the conventions of Cohen (1988).
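To see why a modest share of variance can still correspond to an appreciable effect, the standard two-group conversion between a correlation and Cohen's d can be applied as a rough illustration (an approximation for expository purposes, not a figure from Kwan's analysis):

$$ d = \frac{2r}{\sqrt{1 - r^{2}}}, \qquad r^{2} = 0.10 \;\Rightarrow\; r \approx 0.316, \quad d \approx \frac{2 \times 0.316}{\sqrt{0.90}} \approx 0.67, $$

a value which sits squarely within the reported 0.43–0.86 range.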

Potential Explanations

Turning to attempted explanations for the disciplinary differences, Marsh's (1987) review of the teaching evaluation questionnaire literature originally treated disciplinary differences in ratings as a bias. Marsh (2007), though, does not include discipline in an extensive set of potential biases. Marsh (2007) uses Centra’s (2003) definition of bias: ‘Bias exists when a student, teacher, or course characteristic affects the evaluations made, either positively or negatively, but is unrelated to any criteria of good teaching, such as increased student learning’ (p. 350).

This definition is readily applicable to other factors which have been discussed as potential biases, such as the size of classes and whether a course is taken as an elective major or compulsory minor. However, treating discipline as a bias is debatable. It does not seem reasonable to interpret discipline as having no relationship to good teaching. There is concurrence that course evaluation questionnaires give quite reliable and valid measures of teaching effectiveness (Aleamoni 1999; Feldman 1996, 1997, 2007; Marsh 1987, 2007; Marsh and Dunkin 1997; McKeachie 1997). If this position is accepted, it implies that teaching tends to be more effective in some disciplines than in others.

The original suggestion that disciplinary influence could be interpreted as a bias seems to have resulted in limited discussion of the underlying reason for the differences in ratings. Neumann and Neumann (1985) had no evidence of an underlying cause, but speculated on why their study found differences between ‘hard’ and ‘soft’ disciplines. They thought that soft subjects might have less agreement on knowledge development, so classes would require discussion of major arguments. Hard subjects would be presented in a more routine format requiring a narrower range of teaching skills. However, if the necessary range of teaching skills is narrower, it might seem reasonable to assume that a higher proportion of teachers would have attained these necessary skills.

Feldman’s (1997) review suggested a number of possible explanations, while making clear that these were purely speculative:

Among possible causes of these differences are the following: some courses are harder to teach than others; some fields have better teachers than others; and students in different major fields rate differently because of possible differences in their attitude, academic skills, goals, motivations, learning styles, and perceptions of good teaching. (p. 48)

Murray and Renaud (1995) investigated disciplinary variations in the frequency with which classroom teaching behaviours were utilised. They reported that teachers in the arts and humanities made greater use of behaviours classified as interaction, rapport and mannerisms. Teachers in the natural sciences had a high incidence of organisation and pacing behaviours. The study then examined correlations between the frequencies of ten teacher behaviour dimensions and student ratings of teaching effectiveness for three discipline groups. Their inference was that the correlations were best interpreted as random, indicating no significant differences between disciplinary groups. The conclusion was, therefore, that differences in the nature of teaching between disciplines were manifest in differing frequencies of types of classroom behaviour. However, students’ views of the effectiveness of these types of behaviour did not differ significantly between disciplines. This suggests that there is a common model of what constitutes good teaching which is independent of discipline.

This is consistent with Kember and McNaught (2007), who analysed interviews with 62 award-winning university teachers in Australia and Hong Kong. The teachers came from across the major discipline areas, yet it was possible to derive ten principles of good teaching which were consistent with the beliefs and practices of the whole sample. More stringent checks of consistency between teachers and disciplines were possible with a sub-sample of 18 Hong Kong award-winning teachers from all faculties of a comprehensive university (Kember et al. 2006). Transcripts were searched for utterances consistent with the principles and high levels of matching were found. Each of the 18 teachers was also asked to examine the conclusions for compatibility with their practices as a teacher and there was no disagreement.

The notion of a common inter-disciplinary model of good teaching is also consistent, implicitly at least, with the CTEQ literature. Major reviews of the field (Aleamoni 1999; Feldman 1996, 1997, 2007; Marsh 1987, 2007; Marsh and Dunkin 1997; McKeachie 1997) all discuss the nature of good university teaching, as the design of a valid instrument is conditional on identifying good teaching practice. There is recognition that there are alternative models of good teaching used to frame instruments, with the variation arising from differing theoretical perspectives of the originators. However, once developed from a model, it has been normal to use questionnaires across a broad range of disciplines. None of the reviews indicate that it is necessary to have instruments specific to particular disciplines and indeed there is a general advocacy of the common use of well-designed questionnaires with multi-factor structures corresponding to the identified facets of effective teaching.

It ought to be noted that there are contrary views to the idea of a view of good teaching common across disciplines. Shulman (1986, 1987) advocated the concept of pedagogical content knowledge. Hativa and Marincovich (1995) interpret Shulman’s work as implying that faculty developers who counsel academic staff about their CTEQ results need to develop expertise with respect to discipline. While the notion of disciplinary differences in what constitutes good teaching might appeal intuitively to many academics, it is hard to find clear empirical evidence to support the claim.

It is possible that any differences between disciplines may be magnified by academics’ perceptions of them. Becher and Trowler (2001) argued that disciplines should be viewed from a perspective intermediate between realist and phenomenological positions. Their view of differences between disciplines was that there was an element of social construction. ‘We need to take into account narratives, “stories”, about disciplinary epistemology as well as disciplinary epistemology itself’ (p. 38). Wareing (2009) could find little relevant evidence of differences between disciplines in how students learn. However, academics were prone to hold perceptions that good teaching and learning in their discipline differed from that in others.

Aims

The literature review clearly establishes that there have been consistent findings of disciplinary differences in student ratings of teaching. The Biglan (1973) classification scheme seems to provide a viable way of grouping the disciplines, with the hard versus soft classification particularly apposite. There is evidence of commonality in students’ perceptions of what constitutes good teaching across disciplines, but also contrary claims. It, therefore, seems particularly appropriate to test whether data on students’ perceptions of the quality of the teaching and learning environment they experience fit a common model across disciplines. If a common model were tenable, it would then be apt to examine whether there were disciplinary effects on the magnitude of variables within the teaching and learning environment. It would be appropriate to include within a tested model a measure of perceptions of learning outcomes, so as to have a focus upon student learning rather than teacher performance. For all variables, student perceptions would be an adequate measure, as the aim is to explain differences in student ratings by discipline. These are the perception measures widely used to appraise teaching quality and they have provided the evidence of disciplinary differences, so it is appropriate to utilise them within a study of the phenomenon.

The aims of the study were formulated as:

  1. To test whether a common model of good teaching applies across contrasting disciplinary groups.

  2. To examine the extent of deployment between disciplines of variables within a broadly conceived teaching and learning environment.

  3. To determine whether students perceive differing learning outcomes by discipline from any differences in the teaching and learning environment they experience.

Kember and his colleagues (Kember and Leung 2005a, b, 2006, 2009; Kember et al. 2007; Leung and Kember 2005a, b, 2006) had used structural equation modelling (SEM) to investigate the impact of a broadly defined teaching and learning environment upon the development of generic capabilities. The models tested in these studies incorporated a broadly conceived teaching and learning environment, including teaching and curriculum variables, together with variables pertinent to teacher–student and student–student relationships. It, therefore, encompassed the type of variables which Murray and Renaud’s (1995) study indicated as being deployed to differing extents by discipline. The model hypothesised that the teaching and learning environment impacted upon the development of a set of generic capabilities. These capabilities were of the type that all college graduates are now expected to possess (Barrie 2006; Candy and Crebert 1991; Leckey and McGuigan 1997; Longworth and Davies 1996; Tait and Godfrey 1999), so this introduced an outcomes measure which could reasonably be expected to result from all disciplines.

The model hypothesised that the nine scales of the teaching and learning environment influence the development of the six capabilities. The teaching and learning environment scales acted as indicators for three latent constructs, while the capabilities were subsumed under two latent constructs. The model, shown in Fig. 1, aims to investigate the way capabilities could be nurtured through a teaching and learning environment of appropriate configuration and quality (Kember and Leung 2005a; Kember et al. 2007). The model, and a similar precursor, had previously been tested with large multi-disciplinary samples in two universities (Kember and Leung 2005a, b, 2006, 2009; Kember et al. 2007; Leung and Kember 2005a, b, 2006). These studies did not specifically test for disciplinary differences, but the model showed a good fit to the data for samples with a wide range of disciplines, suggesting that the model is applicable across disciplines.

Fig. 1 The conceptual model relating the teaching and learning elements and the development of capabilities. Note: variances/disturbance terms of the latent constructs are not displayed for simplicity

The survey instrument was used to gather data from students in all undergraduate degrees within a comprehensive university in Hong Kong. The comprehensive university offered a wide range of disciplines, similar to that of other comprehensive research-intensive universities. It was, then, possible to compare models across broad groupings of disciplines, along traditional lines, using the group comparison techniques of SEM.

Four discipline areas were selected: arts, education and social science (labelled humanities); business administration (business); engineering and science (hard science); and health sciences and medicine (health). These disciplinary groupings were chosen to be consistent with the literature reviewed above and the Biglan (1973) category scheme. The hard versus soft distinction was the principal guide to the groupings, as it had been most relevant in the CTEQ literature. Engineering and science were grouped together because initial courses in engineering are often used to build basic foundation knowledge, so are relatively pure in nature. The separate grouping for the health sciences was the most obvious manifestation of the life category. Finer-grained groupings would have been undesirable, as they would have complicated interpretation and made generalisation more difficult, while the smaller numbers in the resulting groups would have reduced the reliability of the SEM tests.

The humanities group included 11 departments or disciplines in arts and eight in social science, together with education. The business group comprised programmes in business administration and finance. Hard science included students in 14 undergraduate degrees in science and from five engineering disciplines, with an emphasis on computer-based engineering. The health sciences group was made up of medicine, nursing and pharmacy.

The aims given above can now be re-formulated in more specific terms, taking into account the nomination of the groups and the SEM techniques to be employed.

  1. to test the model (Fig. 1) for configural invariance between the four groups;

  2. to compare latent and observed means for environment variables;

  3a. to compare latent and observed means for capability variables;

  3b. to compare the effects of the environment constructs on the capabilities.

Method

Sample and Procedures

Participants in the study were full-time undergraduate students from the 50 undergraduate programmes offered by a university in Hong Kong. The universities in Hong Kong are governed by the University Grants Committee (UGC), which has an international membership. All the universities were founded while Hong Kong was a British colony and are consistent with UK standards and practice. The UGC has been entrusted with ensuring that the standards and independence of Hong Kong universities have been maintained since the handover to China. Given the importance attached to globalisation and student exchange, the leading Hong Kong universities have become highly international in outlook, so can be seen as comparable to good quality universities elsewhere.

The questionnaires were administered to all 5,613 first- and third-year undergraduates, and 3,341 completed and returned them. Deletion of 36 cases with missing data yielded a final sample of 3,305, 59% of the total population, for the analysis. Because of the nature of the Hong Kong education system, most of the students were Chinese and aged between 18 and 22. Details of the sampling procedure can be found in Kember and Leung (2006). The final sample was divided into the four discipline groups: humanities (n = 1,182), business (n = 694), hard science (n = 1,056) and health sciences (n = 373). The 50 undergraduate programmes were quite discrete. Apart from a small number of general education courses, most students would take courses associated with their major and few would take courses from more than one of the disciplinary groupings.

Measures

The 33-item Student Engagement Questionnaire (SEQ) was used to seek feedback on students’ perceptions of the development of six generic capabilities and of nine elements in the teaching and learning environment (Kember and Leung 2009). The instrument and its precursor have been used extensively (Kember and Leung 2005a, b, 2006, 2009; Kember et al. 2007; Leung and Kember 2005a, b, 2006). Reliability, validity and other psychometric properties of the instrument have been dealt with in detail in Kember and Leung (2009).

The SEQ was designed to provide feedback at the level of a programme or degree. This distinguishes it from CTEQ-type instruments which focus at the level of individual teachers or courses. The most common equivalent is probably the Course Experience Questionnaire, originally developed by Ramsden and Entwistle (1981) and subsequently adapted for use nationally in Australia and the UK. The SEQ evaluates a broadly conceived teaching and learning environment (Kember and Leung 2009). As such it is responsive to diverse forms of teaching and learning, which appears to make it suitable for the present study.

Sample items from the instrument are presented in Table 1. Items were rated on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). In the following text, we present the measures of internal consistency for each of the constructs, or latent variables, used in the structural equation models.

Table 1 Student Engagement Questionnaire—sample items

Intellectual

The intellectual capability was measured by eight items, grouped into four two-item scales, representing students’ perceptions of the development of generic capabilities needed for a knowledge-based economy (Candy and Crebert 1991; Leckey and McGuigan 1997; Longworth and Davies 1996; Tait and Godfrey 1999): (a) critical thinking (α = 0.78), (b) self-managed learning (α = 0.74), (c) adaptability (α = 0.61) and (d) problem solving (α = 0.70).

Working Together

Four items were used to assess the development of students’ capabilities in communication and teamwork. These items were grouped into two scales: communication skills (α = 0.72) and interpersonal skills and groupwork (α = 0.57).

Teaching

Students’ perceptions of the teaching they received were assessed by ten items, grouped into four scales: active learning (α = 0.70), teaching for understanding (α = 0.81), assessment (α = 0.57) and coherence of curriculum (α = 0.79). The measure tapped both the teaching inside the classroom and the curriculum of the programme.

Teacher–Student Relationship

Students’ perceptions of their relationship with teaching staff were measured by seven items comprising three scales: teacher–student interaction (α = 0.88), assistance from teaching staff (α = 0.85) and feedback to assist learning (α = 0.80).

Student–Student Relationship

The student–student relationship was measured by four items comprising two scales, reflecting perceptions of bonding with fellow students and engagement in learning activities with them: relationship with other students (α = 0.86) and cooperative learning (α = 0.71).

The Cronbach alpha values for the scales range from 0.57 to 0.88; 13 of the 15 alphas exceed 0.60 and the other two are only marginally lower, in a range which has been argued to be acceptable (>0.5; Schmitt 1996).
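For reference, the internal consistency statistic reported above is Cronbach's alpha, which for a scale of $k$ items is conventionally defined as

$$ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right), $$

where $\sigma^{2}_{Y_i}$ is the variance of item $i$ and $\sigma^{2}_{X}$ is the variance of the total scale score. This is the standard formula rather than anything specific to the SEQ.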

Data Analysis

Data analyses were performed in a series of steps. First, we constructed summary scales based on the mean scores of the items in each scale. Table 2 presents bivariate correlations among all 15 continuous variables in the hypothesised model for the overall sample (n = 3,305). The correlations range from small (0.11) to moderate (0.65).

Table 2 Bivariate correlations among the six scales in capabilities and nine scales in the teaching and learning environments of the overall sample (n = 3,305)
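As an illustration of the scale construction step above, the sketch below shows how the summary scales and the correlation matrix might be computed. The item names and the `responses` data frame are hypothetical placeholders, not the actual SEQ variable names.

```python
import pandas as pd

# Hypothetical layout: one row per student, one column per SEQ item
# rated 1-5. The item names used here are illustrative only.
scales = {
    "critical_thinking": ["ct1", "ct2"],
    "self_managed_learning": ["sml1", "sml2"],
    # ... definitions for the remaining 13 scales would follow
}

def scale_scores(responses: pd.DataFrame) -> pd.DataFrame:
    """Summary scale score = mean of the constituent item ratings."""
    return pd.DataFrame({name: responses[items].mean(axis=1)
                         for name, items in scales.items()})

# scores = scale_scores(responses)
# corr = scores.corr()  # 15 x 15 Pearson correlation matrix, as in Table 2
```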

Next, we used SEM to address the research questions of our study using EQS 6.0 (Bentler 2006). First, data from each of the four discipline groups were separately tested to examine the degree of fit with the prior model (Fig. 1) and to demonstrate that the hypothesised model fitted the data from the four discipline groups simultaneously. Given the evidence for invariance across discipline groups, we next used multigroup SEM to compare factor loadings of our model across groups by constraining the factor loadings to be equal across groups while structural paths were allowed to be freely estimated (metric invariance). Then, observed mean-level differences among groups were compared by computing Cohen’s d as a measure of effect size. Finally, we tested latent mean-level differences in the five constructs among the groups in two steps using mean and covariance structure analysis (MACS), which models both the pattern of means and the covariations among the scales simultaneously (Bentler 2006; Byrne 2006). In the first step, we tested for intercept invariance by equating the intercepts of the 15 scales across groups in the final metric invariance model. If intercept invariance was established, latent means could then be compared. Latent means and mean differences for the five latent constructs were compared by selecting the humanities as the reference group and fixing the latent means of this group at zero.
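The observed mean comparisons use Cohen's d; a minimal sketch of the conventional pooled-SD form is given below (the commented values are placeholders, not figures from Table 4).

```python
import numpy as np

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d between two groups from summary statistics,
    using the pooled standard deviation."""
    pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                        / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Placeholder usage with two groups' scale mean, SD and n:
# d = cohens_d(3.6, 0.6, 1182, 3.4, 0.6, 1056)
```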

Assessment of overall model fit was based on multiple criteria, including both absolute misfit and relative fit indices. The absolute misfit indices included the root mean square error of approximation (RMSEA; Browne and Cudeck 1993) and the standardized root mean squared residual (SRMR; Bentler 2006). Values of RMSEA and SRMR <0.08 are indicative of an acceptable fit. The relative goodness-of-fit index was the comparative fit index (CFI; Bentler 1990). As a rule of thumb, CFI values greater than 0.9 are considered to indicate an acceptable fit (Hoyle 1995), and values approaching 0.95 are indicative of a good fit (Hu and Bentler 1999). Models with both SRMR and CFI, or both SRMR and RMSEA, values indicating acceptable fit were not rejected.
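For reference, the two indices cited most often in what follows have standard definitions (the conventional single-group forms, not anything specific to EQS):

$$ \mathrm{RMSEA} = \sqrt{\frac{\max(\chi^{2} - df,\ 0)}{df\,(N-1)}}, \qquad \mathrm{CFI} = 1 - \frac{\max(\chi^{2}_{M} - df_{M},\ 0)}{\max(\chi^{2}_{0} - df_{0},\ 0)}, $$

where $M$ denotes the tested model and $0$ the baseline (independence) model.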

The normality of all 15 scales was investigated for each of the four disciplinary groups and the distributions of all observed variables were found to be within the levels recommended for SEM with maximum likelihood estimation (skewness < 2 and kurtosis < 7) (West et al. 1995). Converged solutions with no out-of-range parameter estimates were obtained for all the analyses.
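A screening of this kind can be sketched as follows. Note that scipy reports Fisher (excess) kurtosis, which is the scale on which the West et al. thresholds are usually applied; treat that as an assumption to verify against the original source.

```python
from scipy.stats import skew, kurtosis

def screen_normality(scores_by_group):
    """Flag scales whose skewness or excess kurtosis exceeds the
    thresholds suggested for ML estimation (West et al. 1995)."""
    for group, scores in scores_by_group.items():  # dict of DataFrames
        for scale in scores.columns:
            sk = skew(scores[scale], nan_policy="omit")
            ku = kurtosis(scores[scale], nan_policy="omit")  # normal ~ 0
            if abs(sk) >= 2 or abs(ku) >= 7:
                print(f"{group}/{scale}: skew={sk:.2f}, kurtosis={ku:.2f}")
```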

Results

Configural Invariance

The procedure involved initial SEM analyses to test the hypothesised model shown in Fig. 1 separately with data from each of the four discipline groups. Factor loadings for active learning, feedback to assist learning, relationship with other students, critical thinking, and communication skills were fixed to 1 for identification. The hypothesised model provided adequate approximations to the data from each of the four groups (Model 0 for each group), as indicated by the fit indices in Table 3.

Table 3 Results of testing invariance of the 5-factor conceptual model to the six scales in the capabilities and nine scales in the teaching and learning environment across disciplinary groups

The next analysis tested for configural invariance, which involved showing that the data from the four groups were consistent with the established model structure (Fig. 1). Multiple-group SEM was performed to assess configural invariance across the four disciplinary groups by analysing the four samples simultaneously without imposing any constraints across the groups (Bentler 2006). The goodness-of-fit results for testing configural invariance are also shown in Table 3 (Model 1) and indicated an acceptable fit to the data (χ2(332) = 1439.18, p < 0.001, CFI = 0.92, RMSEA = 0.06, SRMR = 0.05). All indicators loaded significantly on their respective factors. Hence, it can be claimed that the structural forms of the models are the same for the four disciplinary groups. This implies a common mechanism for good teaching within the four disciplinary groups.

Metric Invariance

Given this preliminary evidence for invariance across discipline groups, we ran SEM to test metric invariance by further constraining factor loadings to be equal across groups (Model 2). The constrained model also showed acceptable fit (χ2(362) = 1531.04, p < 0.001, CFI = 0.92, RMSEA = 0.06, SRMR = 0.06); however, the difference in chi-square between the constrained and unconstrained models was significant, Δχ2(30) = 91.86, p < 0.001, so full metric invariance was not supported.
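The difference test here is simply the change in chi-square evaluated against the change in degrees of freedom between the nested models:

$$ \Delta\chi^{2} = 1531.04 - 1439.18 = 91.86, \qquad \Delta df = 362 - 332 = 30, $$

which far exceeds the critical value $\chi^{2}_{0.05}(30) \approx 43.77$, hence the rejection of full metric invariance.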

We then tested for partial invariance (Byrne et al. 1989) by freeing the paths which differed significantly across groups, identified using multivariate Lagrange multiplier (LM) tests (Steiger et al. 1985). A model in which five factor loadings of teaching for understanding (hard sciences vs health sciences, and hard sciences vs humanities), assessment (hard sciences vs humanities), cooperative learning (hard sciences vs business), and coherence of curriculum (hard sciences vs business) were allowed to vary provided an adequate fit to the data (χ2(357) = 1465.05, p < 0.001, CFI = 0.92, RMSEA = 0.06, SRMR = 0.05), and did not differ significantly from the unconstrained model (Δχ2(25) = 25.87, p > 0.05). This suggested that partial metric invariance held across the groups. Factor loadings of this final model for each discipline group are shown in Fig. 2.

Fig. 2 Unstandardized solution (partial metric invariance) (humanities; business; hard science; health sciences). Note: variances/disturbance terms of the latent constructs are not displayed for simplicity. A single value indicates that the parameter estimate is invariant across the four groups; an underlined value indicates that the estimate differs from the other groups

Observed and Latent Means Comparison

Means and standard deviations for the capability variables and the variables in the teaching and learning environment for each of the four discipline groups are given in Table 4. Table 5 permits comparison of the mean values through effect sizes, using Cohen’s d as the measure, across the four discipline groups. Following Cohen (1988), absolute values of d of about 0.2 are conventionally regarded as small, about 0.5 as medium and about 0.8 as large.

Table 4 Means and standard deviations of the six scales in capabilities and nine scales in the teaching and learning environment by disciplinary group
Table 5 Effect sizes (Cohen’s d) of mean differences in the six scales in capabilities and nine scales in the teaching and learning environment across disciplinary group

To obtain a clearer picture, we further assessed disciplinary mean-level differences in the latent constructs using multigroup MACS. The intercept invariance model fitted the data adequately, χ2(402) = 2524.48, p < 0.001, CFI = 0.92, RMSEA = 0.06, SRMR = 0.06. The results suggested that students who have the same value on a construct would obtain the same value on the observed variable regardless of their group membership (Vandenberg and Lance 2000). We then proceeded to examine the mean-level differences in the five latent constructs across the four discipline groups. The latent mean difference model also provided a reasonable fit to the data, χ2(387) = 1959.02, p < 0.001, CFI = 0.93, RMSEA = 0.06, SRMR = 0.05. Latent mean difference estimates of the model are presented in Table 6. Bonferroni corrections were applied to account for the risk of capitalisation on chance (Bollen 1989; Green and Babyak 1997) and a cut-off p-value of <0.003 (0.05/15 constraints) was used in assessing the significance of the latent mean differences. Compared with the humanities group, the business and hard science groups exhibited significantly lower latent means on all five constructs, with two exceptions: the business group had a significantly higher mean for working together, and the hard science group showed no significant difference for student–student relationship. The health sciences group scored significantly higher in teaching, student–student relationship and working together, significantly lower in intellectual, and showed no difference in teacher–student relationship.

Table 6 Latent mean-level differences in the five latent factors in the conceptual model by disciplinary group

Comparing Effects of the Environment on Capability Development

The influence of the teaching and learning environment on capability development can, firstly, be examined by inspecting the models for each of the discipline groups under partial metric invariance (Model 2a). Figure 2 shows the unstandardized solution for each of the four groups, giving values for the unstandardized coefficients. Variances and disturbance terms of the latent variables are not included in the diagram, to keep it conceptually simpler. The diagram includes the parameters which are central to the research questions and the discussion which follows.

Table 7 enables comparison of the influence of the teaching and learning environment on capability development for the four discipline groups. The table gives direct, indirect and total effects for each of the five relevant paths in the model. The upper part of the table gives unstandardized effects, which are normally used to examine the significance of effects. All of the factor loading estimates of the models were statistically significant. However, the pattern of structural paths between the five latent constructs differed between the four groups, in that the path from teaching to intellectual in the business group and the path from teaching to working together in the hard sciences group were statistically non-significant.

Table 7 Unstandardized and standardized total, direct and indirect effects of the three teaching and learning latent constructs on the two latent capability constructs by disciplinary group

Discussion

Epistemological Beliefs

One of the frameworks drawn upon in this discussion is that of epistemological beliefs. The Biglan (1973) classification took into account epistemological differences in its derivation. It has subsequently been used to interpret epistemological issues and phenomena related to them. Becher and Trowler (2001) coined the notion of academic tribes to highlight differences between academic disciplines. They argued that values, ways of behaving and practices are related to the nature of knowledge and ideas in a discipline. Their book examined a range of aspects of academic life in terms of this framework, but paid relatively limited attention to teaching.

Smart and Ethington (1995) found differences by discipline in importance attached to goals for undergraduate education. Hard disciplines attached more importance to knowledge application, whereas soft ones thought knowledge integration was more important. Neumann and Neumann (1985) suggested that the better ratings of soft disciplines might occur because knowledge within them was less established and, therefore, more likely to be discussed in class. Since interaction and discussion had been associated with positive ratings (Murray and Renaud 1995) this suggested soft disciplines would tend to be rated better than hard ones.

Lindblom-Ylänne et al. (2006) studied variations in approaches to teaching by discipline. They used the Approaches to Teaching Inventory, which has two main scales: one for a conceptual change/student-focussed approach and the other for an information transmission/teacher-focussed approach. Teachers from hard disciplines were more likely to report a teacher-focussed approach, while those from soft disciplines made greater use of student-focussed approaches.

Humanities

In the models for the humanities and health groups, all paths between the teaching and learning environment and the capabilities latent variables were significant. These groups might, therefore, be regarded as conforming to the model in its purest form.

The humanities group had the highest latent mean for the intellectual capabilities and noticeably higher ratings for critical thinking than any of the other three groups. This is readily explainable in terms of the epistemological framework. The humanities group would be the one in which knowledge was most contested. Teaching was, therefore, likely to involve relatively high levels of discussion of alternative positions. This would particularly nurture critical thinking, but would also help in the development of the other intellectual capabilities (Kember and Leung 2005a; Kember et al. 2007). The interaction in class also tended to strengthen student–student relationships.

Health

The health group had intellectual capability latent means lower than those of the humanities and comparable to business. All capability ratings were higher than for hard sciences. The ratings for the teaching and learning environment for health were the highest of the four groups. These data and the significance of all paths suggest no deviation from the pure model. The teaching and learning environment is optimised and, as a consequence, students perceive that the generic capabilities are nurtured.

The latent mean for the working together capabilities was comparable in value with business and higher than the other two groups. This is presumably because the teaching involves group activities giving practice in the working together capabilities, thus promoting their development, as suggested by the highest standardised coefficient for the path between teaching and the working together capabilities among the four groups.

In terms of the epistemological framework, the health disciplines require students to acquire a body of basic knowledge which is reasonably well established. It is also important, though, that the necessary practitioner skills are developed. These can best be developed through practice, so the health programmes contain periods of clinical or professional practice as well as activities that simulate the practice of clinical skills. There are, therefore, significant portions of teaching devoted to active learning and interaction. There may also be an influence from these being caring professions. The teachers, most of whom would also be, or have been, practitioners in the field, may have developed an attitude of care towards their students and a heightened ability to counsel them. This would further contribute to the degree of teacher–student interaction.

Hard Science

The hard science disciplinary grouping has the highest standardised coefficient for the direct link from the teaching latent variable to the intellectual capabilities. The reason for this is possibly indicated by the high loading of teaching for understanding on the teaching latent variable. Presumably students perceive a high importance being attached to ensuring that they have a good understanding of important constructs. Science has a foundation body of well-established knowledge that students need to understand as a prerequisite for learning more advanced concepts.

The direct path from teaching to the working together capability latent variable was non-significant for hard science, which indicates that the method of teaching did not encourage the development of the working together capabilities. This might be because the hard science teachers concentrate upon instilling knowledge of well-established concepts through a predominantly didactic form of teaching. This provides limited opportunities for students to work together in groups or engage in discussion and hence limited opportunities to practise and develop the working together capabilities.

Indications of the didactic nature of the teaching come from the low values for the latent means for teaching and teacher–student relationships. Teachers in the disciplines are not effectively making use of the range of teaching behaviours and learning activities encompassed within the broadly conceived teaching and learning environment. The low latent mean for teacher–student relationships suggests a limited degree of interaction because of didactic teaching.

For the hard science group, the standardised coefficient for the direct path from student–student relationships to working together was particularly high at 0.74. The coefficient for the indirect path from teaching, through working together, to the intellectual capabilities was also comparatively high. Presumably this path is too indirect, though, to compensate for the non-significant teaching to working together path. The outcome is a negligible indirect effect of teaching on the development of intellectual capabilities in hard science.

While the direct effect of teaching on intellectual capabilities for hard science was the highest of the four groups, the total effect ends up as the least because of the non-contribution of indirect effects. This might go some way to explaining the relatively negative student perceptions of hard science teaching in the literature.

The didactic teaching is presumably adopted because teachers in the disciplines feel that, in the early years of the degree particularly, they are teaching a well-established body of knowledge. They conceive their role as teachers to transmit this body of knowledge to their students. As the knowledge is well established the need for discussion and active learning experiences might not seem apparent.

Hard vs Soft

In relating the findings of this study to the existing literature, it is the comparison of hard and soft disciplines which is most pertinent. Most studies which offer any insights into why there are disciplinary differences conceptualise the issue as a hard/soft spectrum, so there is little to compare to the results for the health and business groupings of this study.

Epistemological differences have been used as an explanation for disparities in ratings and approaches to teaching between science and the humanities. Smart and Ethington (1995) found that, compared to hard disciplines, academics in soft disciplines placed more emphasis on knowledge acquisition and integration and less on application. This presumably links to the finding that teachers in soft disciplines are more likely to employ active learning methods (Braxton et al. 1988; Lattuca and Stark 1995).

However, academics in hard disciplines placed more value on undergraduate research projects (Lattuca and Stark 1995). The research behaviour of scientists and engineers has also been characterised by relatively high numbers of publications compared to their counterparts in the arts and social sciences (Becher and Trowler 2001).

The comparisons seem somewhat contradictory. The teaching behaviour of academics in science appears to be more didactic because of claims that knowledge in the disciplines is more certain. However, undergraduate research is more valued and, if publications are a reliable indicator, there is more research activity. This might give credence to the idea that disciplinary behaviours are formed as much by socially constructed stories as by real differences (Becher and Trowler 2001; Wareing 2009). The observation could be particularly applicable to teaching behaviour. Requirements for training in teaching for university teachers are limited and often non-existent. The major influence on the teaching of new academics is, then, the behaviour of their former teachers. Teachers in hard disciplines have been found to be more likely to hold teacher-centred conceptions of teaching (Lindblom-Ylänne et al. 2006), which means that, as students, new academics will have had considerable exposure to didactic teaching.

This suggested explanation attests to the influence of academic tribes (Becher and Trowler 2001). Disciplinary tribes construct their own culture of teaching, influenced by real and perceived epistemological differences. These are passed on to succeeding generations of the tribe through exposure to beliefs and resulting practices during their education.

Business Administration

For the business administration discipline, the latent mean for the working together capabilities is the highest of the four groups. The reason seems to be connected with the nature of teaching and learning, since the standardised coefficient from teaching to the working together capabilities is the highest of the four groups. This might be interpreted as meaning that the business administration discipline set a relatively high proportion of learning activities, such as group projects, which involve students working together. There might also be requirements for students to make presentations, which would help the development of communication skills. The programmes also organised extra-curricular activities, such as visits and competitions, which help nurture the working together capabilities. This also provides a reason for the non-invariance of the factor loadings of the student–student relationship with those of the other two disciplinary groups.

The business administration group had the lowest latent mean for teaching. The loadings on assessment and coherence of the curriculum were the lowest of the four groups, which presumably indicates that students’ perceptions of these were less strong. The latent means for intellectual capability development for the business administration and the other groups were not significantly different. The lower teaching latent mean implied a smaller direct contribution to the development of intellectual capabilities. The indirect contribution for business was higher, though, because of the stronger impact of teaching upon the working together capabilities.

The teaching in business therefore placed an emphasis on developing the working together capabilities through learning activities. There was less stress than in science on teaching a body of knowledge. This is presumably because the disciplinary knowledge is not as well established, particularly in management and marketing, while greater importance is attached to the development of business skills.

Conclusion

There is an abundant literature attesting to variations in students’ ratings of teaching by discipline, and the patterns of variation have been quite consistent over numerous studies. However, there have been limited insights into why these occur, and few of the attempted explanations have been backed with empirical evidence. The basis of using CTEQs as quality assurance measures is that they give reasonably reliable and valid measures of teaching quality. This suggests that students commonly report perceptions of variations in teaching quality which show quite systematic distinctions by discipline.

In this study we established that data from four disciplinary groups fitted a common model of good teaching influencing the development of generic capabilities. Use of multiple-group SEM also showed configural invariance, which reinforced the conclusion that the nature of an effective teaching and learning environment was consistent between disciplines. There were, however, differences in the magnitude of structural paths and latent means. This implies that there were differences between disciplines in the extent to which elements within the teaching and learning environment were brought into play. It was possible to suggest reasons for these disciplinary variations in terms of the epistemological nature of the disciplines, though it appears that socially constructed stories might play as much a part in epistemological beliefs as real disciplinary differences.

The study was confined to one university, so generalisation is debatable. However, the patterns of variation in ratings within the sample were entirely consistent with those found in large studies and meta-analyses elsewhere. There is also prior evidence to back the inference of a common model of good teaching across disciplines, but differing degrees of implementation. Interpreting disciplinary differences in terms of epistemological distinctions has been common for other phenomena. It is, therefore, plausible that the suggested explanation could also apply in other contexts. This is particularly the case if there is acceptance of the suggested influence of disciplinary tribes in forming socially constructed beliefs about epistemology. Disciplinary tribes are international, so their influence should also be international.