Introduction

Gender is a determinant of health inequalities either alone or in combination with other determinants such as socioeconomic status, age, and disability. While sex refers to the biological and physiological characteristics that differentiate men and women, gender refers to socially determined roles and behaviors, activities and attributes that a society considers appropriate for men and women (https://www.who.int/fr/news-room/fact-sheets/detail/gender).

Gender is an explanatory factor that is often accounted for in research on medical practices and health outcomes. It is frequently confused with sex, and the question of the interaction between gender and sex is often overlooked1. Gender is not a binary term. It includes an understanding that in many people, traits of masculinity and femininity coexist and are expressed to different degrees2. Currently, there is no universally accepted validated tool for measuring gender1. Canadian researchers have developed a gender index in a population of people with premature acute coronary events based on the gender concept developed by the Canadian Institutes for Health Research3 which comprises the four interrelated aspects of gender roles (e.g. childcare), identity (e.g., personality traits), relationships (e.g., social support) and gender social position (e.g., education level, personal income)4. Using this index, the authors showed that after adjusting for sex, female roles and personality traits were associated with a higher risk of recurrent acute coronary events5. This article demonstrates that these interrelated aspects of gender are determinants of patient care2.

The question of whether health professionals are aware of gender issues is important to avoid gender bias. Gender blindness and gender stereotypes are recognized as the main causes of gender bias6. Gender blindness is the failure to take gender into account whenever relevant. Gender stereotypes influence the interpretation of clinical signs and the management of conditions. For example, doctors are likely to interpret men's symptoms as organic and women's as psychosocial7. Thus, it is essential to take gender into account in medical practices to ensure quality and appropriate health care for men and women. This consideration, which is insufficient in routine practice, has led several European countries to integrate the gender issue into medical education to raise awareness among future doctors8. Gender awareness means that physicians acquire the knowledge and ability to recognize and integrate gender as a determinant of health and disease in their daily practice9. This is consistent with an awareness of the stereotypical beliefs about men’s and women’s behaviour, skills and needs that are incorrectly held in society. Since gender stereotypes can bias medical assessments, gender awareness involves reflecting on one’s own attitudes and preconceptions about men and women as well as patients and doctors6.

To our knowledge, no research has been conducted on gender awareness among physicians, particularly general practitioners (GPs). GPs are community-based practitioners who are in a leading position to address health inequalities10, including those related to gender bias. They are not only clinicians operating at all levels of care from prevention to palliative care, but they are also consulted very often. Across the EU, almost 3 in 10 (28.6%) males aged 15 years and over and more than one-third (36.3%) of females consulted a GP during the 4-week period leading up to the European health interview survey in 201911. One reason for the lack of research may be the difficulty of measuring gender awareness. To our knowledge, only two scales have been developed and validated in the literature. The Gender Awareness Inventory-Veterans Affairs (GAI-VA) scale was developed specifically to assess health professionals' gender awareness regarding women veterans12, which hinders its more widespread use for both women and men in other care settings. The Nijmegen Gender Awareness in Medicine Scale (N-GAMS) measures medical students' gender awareness in terms of gender sensitivity and gender-role ideology towards patients and doctors (gender stereotypes). The Dutch team that validated the N-GAMS in English9 subsequently used it in collaboration with a Swedish team8, and cultural differences in the students’ responses to the questions were highlighted. The scale was then adopted by a team from French-speaking Switzerland, who translated it and validated it in French. Their study concluded that medical students’ gender sensitivity seemed to improve throughout the medical curriculum, and that female students had fewer stereotypes towards patients than male students13. It has recently been used in Portugal14 and in Italy15. No validated scale is available in French for the population of GPs.

This study aimed to validate the N-GAMS in a representative population of French GPs and to analyse GPs' gender sensitivity and gender-role ideology towards patients and doctors.

Materials and methods

The original N-GAMS scale

The original N-GAMS scale9 is based on two attitudinal aspects of gender awareness: gender sensitivity (GS) and gender-role ideology, which is assessed towards patients (GRIP) or doctors (GRID). GS is defined as the “ability to perceive existing gender differences, issues and inequalities and incorporate these into strategies and actions”16. GRIP and GRID refer to gender stereotypes towards patients and doctors, respectively. These three dimensions contain 14, 11 and 8 items, respectively, which GPs rated on a 5-point Likert scale (ranging from 1 “Strongly disagree” to 5 “Strongly agree”) (Table 1). Some items were worded in the reverse direction and were therefore reverse-scored before analysis. The higher the item scores, the greater the gender sensitivity (GS) and the stronger the gender stereotypes (GRIP and GRID).
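
Reverse scoring on a 1 to 5 Likert scale can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say which items are reverse-worded, so the item names below are hypothetical placeholders.

```python
# Illustrative sketch only: the item names are hypothetical, because the
# list of reverse-worded N-GAMS items is not given in the text.
LIKERT_MIN, LIKERT_MAX = 1, 5


def reverse_score(value: int) -> int:
    """Mirror a Likert response around the scale midpoint (1<->5, 2<->4, 3 stays 3)."""
    if not LIKERT_MIN <= value <= LIKERT_MAX:
        raise ValueError(f"response {value} is outside the 1-5 scale")
    return LIKERT_MIN + LIKERT_MAX - value


def apply_reverse_scoring(responses: dict, reversed_items: set) -> dict:
    """Return a copy of the responses with the listed items reverse-scored."""
    return {
        item: reverse_score(value) if item in reversed_items else value
        for item, value in responses.items()
    }


# One hypothetical respondent; only "item_b" is treated as reverse-worded.
example = {"item_a": 2, "item_b": 5, "item_c": 3}
scored = apply_reverse_scoring(example, reversed_items={"item_b"})
print(scored)
```

After this step, a high value on every item points in the same direction, which is what allows item values to be averaged within a dimension.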

Table 1 Nijmegen Gender Awareness in Medicine Scale (N-GAMS)9.

GP recruitment

In total, 3530 GPs were invited to participate by a polling company specialized in health surveys (https://www.b3tsi.com). Participants were recruited by e-mail and by phone.

Validation of the N-GAMS scale

Translation-retranslation of the N-GAMS

The English version of the original N-GAMS was translated into French by the study team and then back-translated into English by a professional translator. The two English versions (initial and back-translated) were compared, item by item, by all participants, and disagreements about the French version were discussed and resolved, yielding the final French version (vf N-GAMS).

Exploratory factor analysis (EFA)

Given the potential uncertainty surrounding the structure of the N-GAMS scale for French practising GPs, we chose to perform an exploratory rather than confirmatory factor analysis17.

Appropriateness of the Spearman correlation matrix for EFA

First, we checked the appropriateness of the Spearman correlation matrix for EFA using (i) visual examination of the correlations (% of significant correlations and of correlations \(\ge\) 0.30)18,19, (ii) Bartlett’s test of sphericity which tests the overall significance of the correlation20, and (iii) the Kaiser–Meyer–Olkin measure of sampling adequacy (MSA) for the entire correlation matrix (overall MSA) and for the 33 individual items (item specific MSA). MSA values above 0.50 indicate appropriateness for performing factor analysis on the overall set of items or specific items21. The overall MSA quantifies the degree of intercorrelations among items, and an item-specific MSA quantifies the item’s correlation with the other items in the analysis.
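
For reference, the usual textbook forms of these two statistics are given below. The notation is ours, not the authors': \(n\) is the number of respondents, \(p\) the number of items, \(R\) the correlation matrix with entries \(r_{ij}\), and \(a_{ij}\) the partial correlation between items \(i\) and \(j\) controlling for all other items.

\[
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad
\mathrm{df} = \frac{p(p-1)}{2}
\]

\[
\mathrm{MSA}_{\text{overall}} = \frac{\sum_{i}\sum_{j \ne i} r_{ij}^2}{\sum_{i}\sum_{j \ne i} r_{ij}^2 + \sum_{i}\sum_{j \ne i} a_{ij}^2},
\qquad
\mathrm{MSA}_{j} = \frac{\sum_{i \ne j} r_{ij}^2}{\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} a_{ij}^2}
\]

With \(p = 33\) items, the number of distinct correlations, and hence the degrees of freedom of Bartlett's test, is \(33 \times 32 / 2 = 528\). An MSA close to 1 means that the partial correlations are small relative to the raw correlations, that is, that the items share common variance.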

Determination of the number of factors to retain for rotation

Common factor analysis (iterated principal axis extraction) was used because the purpose of this study was to uncover the latent structure underlying these 33 measured items22. To determine the number of factors to retain, we first referred to the original validation study9 (a priori criterion), which suggested that the scale had three dimensions. Since its development, the N-GAMS has been used several times, and researchers have extracted the same number of factors8,13,14,15. We also used a visual scree test23 supplemented by a modified latent root criterion18. The classical latent root criterion, also known as the Kaiser rule, is a stopping rule where all factors with eigenvalues (latent roots) greater than 1 are retained, whereas the modified version recommends that only the factors with eigenvalues greater than the average of the item-specific MSA are considered significant, which is argued to be more appropriate in a common factor analysis18. This criterion is usually considered reliable when the item-specific MSA values are above 0.40 and the number of variables is between 20 and 50, as was the case in this study (33 items)18.
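
The difference between the classical and the modified latent root criteria can be sketched in a few lines. The eigenvalues and MSA values below are hypothetical and were chosen only to show that the two rules can disagree; they are not the study's data.

```python
from statistics import mean


def n_factors_latent_root(eigenvalues, threshold=1.0):
    """Count the factors whose eigenvalue exceeds the threshold (Kaiser rule by default)."""
    return sum(1 for value in eigenvalues if value > threshold)


def n_factors_modified(eigenvalues, item_msa):
    """Modified rule: the threshold is the mean item-specific MSA instead of 1."""
    return n_factors_latent_root(eigenvalues, threshold=mean(item_msa))


# Hypothetical eigenvalues and item-specific MSA values (illustration only).
eigenvalues = [8.4, 3.9, 1.6, 0.95, 0.7, 0.4]
item_msa = [0.93, 0.88, 0.91, 0.90, 0.94, 0.90]

print("classical rule:", n_factors_latent_root(eigenvalues))
print("modified rule:", n_factors_modified(eigenvalues, item_msa))
```

Because the mean MSA is below 1, the modified rule can only retain the same number of factors as the Kaiser rule or more, never fewer.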

Model acceptability

For both theoretical and empirical reasons, it was assumed that factors would be correlated24,25. Thus, an oblique Promax rotation with a k value of 4 was selected26. An oblique rotation honours the ubiquity of intercorrelations among social science variables25 and “Promax rotation is almost always a good choice”27. The threshold for salience for loadings was set at 0.30 to meet the minimal level for interpretation of structure18, that is, variables with approximately 9% (factor loading squared) of their variance explained by the factor.

Following guidelines for model acceptability, (i) three salient item loadings (pattern coefficients) are necessary to form a factor24, (ii) the root mean squared residual (RMSR), which is a measure of overall residual misfit values, must be ≤ 0.0828, (iii) the proportion of nonredundant residual correlations greater than the absolute value of 0.10 should be small 29, and (iv) the results across alternative extraction (iterated principal axis, ordinary least squares, weighted least squares, minimum residual) and rotation methods (promax, oblimin) must be robust.
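
Criteria (ii) and (iii) can be computed from the observed and model-reproduced correlation matrices. The sketch below assumes that RMSR is taken over the nonredundant off-diagonal residuals, which is a common convention but may differ from what a given software package reports. The matrices are hypothetical.

```python
import math


def residual_diagnostics(observed, reproduced, cutoff=0.10):
    """Return (RMSR, share of nonredundant residuals with |residual| > cutoff)."""
    p = len(observed)
    # Nonredundant residuals: upper triangle only, diagonal excluded.
    residuals = [
        observed[i][j] - reproduced[i][j]
        for i in range(p)
        for j in range(i + 1, p)
    ]
    rmsr = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    share_large = sum(abs(r) > cutoff for r in residuals) / len(residuals)
    return rmsr, share_large


# Hypothetical 3-item example (not the study's data).
observed = [[1.0, 0.50, 0.30], [0.50, 1.0, 0.40], [0.30, 0.40, 1.0]]
reproduced = [[1.0, 0.47, 0.42], [0.47, 1.0, 0.38], [0.42, 0.38, 1.0]]

rmsr, share = residual_diagnostics(observed, reproduced)
print(f"RMSR = {rmsr:.3f}, share of residuals above 0.10 = {share:.0%}")
```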

Item reliability

Item reliability was assessed through two diagnostic measures of internal consistency. First, a Cronbach’s alpha coefficient was calculated for each sub-scale. It should be at least 0.70. Second, for each subscale, item-rest score correlations between the items and the rest scores of the subscale (i.e., the score computed from the items of the dimension deleting that item) were computed, using Spearman correlations. The absolute value of the item-rest correlations should be above 0.1. Absolute values between 0.1 and 0.3 are considered “fair”, while those above 0.3 are deemed “good”20.
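
Both diagnostics can be sketched with the standard library alone. The data are hypothetical, and the implementation is only one straightforward way of computing them: Cronbach's alpha from item and total-score variances, and Spearman correlation as a Pearson correlation on average ranks.

```python
import math
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def average_ranks(values):
    """Rank the values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; inputs are assumed to be non-constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_rest_spearman(rows, j):
    """Spearman correlation between item j and the score on the remaining items."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(average_ranks(item), average_ranks(rest))


# Hypothetical subscale: 5 respondents, 3 items (illustration only).
rows = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(rows)
r_item0 = item_rest_spearman(rows, 0)
print(f"alpha = {alpha:.3f}, item-rest correlation of item 0 = {r_item0:.3f}")
```

The rest score is computed here as a sum rather than a mean; since Spearman correlation depends only on ranks, the two choices give the same result.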

Bivariate correlations between the subscores

Bivariate correlations between the dimensions were also conducted. p values were adjusted for multiple comparisons with the Holm method.
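
The Holm step-down adjustment can be sketched as follows. The p values in the example are hypothetical, not the study's.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity: an adjusted p value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Three hypothetical raw p values, one per pair of subscores.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

With three pairwise correlations (GS with GRIP, GS with GRID, and GRIP with GRID), the smallest raw p value is multiplied by 3, the next by 2, and the largest is left unchanged unless monotonicity raises it.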

Analysis of French GPs’ gender sensitivity and stereotypes

Descriptive analysis

We first performed a descriptive analysis of GPs’ characteristics and scores on the N-GAMS by dimension. We used dummy coding for all categorical variables, and age was divided into four classes. For each GP, the score in a dimension (called the subscore) was obtained by averaging the observed values of the items in that dimension. Mean scores (with 95% confidence intervals) for the N-GAMS were calculated by dimension.
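
The subscore and its mean with a confidence interval can be sketched as follows. The method used for the confidence interval is not stated in the text; the normal approximation below is an assumption, and the responses are hypothetical.

```python
import math
from statistics import mean, stdev


def subscore(item_values):
    """Subscore of one GP in one dimension: the mean of that dimension's item values."""
    return mean(item_values)


def mean_ci95(values, z=1.96):
    """Mean with a normal-approximation 95% confidence interval (an assumption)."""
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical responses of four GPs to a three-item dimension.
responses = [[3, 4, 5], [2, 2, 3], [4, 4, 4], [1, 3, 2]]
subscores = [subscore(r) for r in responses]
m, low, high = mean_ci95(subscores)
print(f"mean subscore {m:.2f} (95% CI {low:.2f} to {high:.2f})")
```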

Linear regressions

The relationship between GPs’ characteristics and subscores was analysed through univariate linear regressions. GPs’ characteristics with a p value of 0.2 or less were included in multiple linear regression models. A p value < 0.05 was considered statistically significant.
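
The screening step can be sketched as a simple filter over the univariate results. The variable names and p values below are hypothetical, and the rule "retained if the threshold is met for at least one of the three subscores" is our reading of how the criterion was applied.

```python
def select_for_multivariable(univariate_p, threshold=0.2):
    """Keep a characteristic if its univariate p value is <= threshold for any subscore.

    univariate_p maps each characteristic to a dict {subscore name: p value}.
    """
    selected = []
    for variable, p_by_outcome in univariate_p.items():
        if any(p <= threshold for p in p_by_outcome.values()):
            selected.append(variable)
    return selected


# Hypothetical univariate p values (illustration only; "region" is a made-up variable).
univariate_p = {
    "sex": {"GS": 0.45, "GRIP": 0.01, "GRID": 0.001},
    "age": {"GS": 0.30, "GRIP": 0.001, "GRID": 0.001},
    "region": {"GS": 0.62, "GRIP": 0.35, "GRID": 0.81},
}

kept = select_for_multivariable(univariate_p)
print(kept)
```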

Ethics statements

This study was approved by the Inserm Ethics Evaluation Committee (Comité d’évaluation éthique de l’Inserm) under reference 22-895. We confirm that informed consent to participate was obtained from all of the participants in the study and that all methods were performed in accordance with the relevant guidelines and regulations.

Results

A total of 3530 GPs were contacted (3454 received a link via the internet, and 76 were contacted by phone). Of these, 2479 refused to answer (2425 and 54, respectively). The response rate was 30%. Of the 1051 respondents, 151 had incomplete questionnaires. The study population was 900 GPs.
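
The participant flow above can be checked with a few lines of arithmetic, using only the counts reported in this paragraph.

```python
# Counts taken from the paragraph above.
contacted = 3454 + 76          # link sent via the internet + contacted by phone
refused = 2425 + 54            # refusals in each channel
respondents = contacted - refused
response_rate = respondents / contacted
analysed = respondents - 151   # respondents with incomplete questionnaires excluded

print(contacted, refused, respondents, f"{response_rate:.1%}", analysed)
```

The response rate is therefore 1051/3530, or about 29.8%, which is reported as 30%.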

Validation of the N-GAMS scale

Translation-retranslation of the N-GAMS

The vf N-GAMS is available upon request.

Exploratory factor analysis (EFA)

Appropriateness of the data for EFA

Inspection of the correlation matrix reveals that 366 of the 528 correlations (69%) were significant at the 0.01 level, and 168 of the 528 correlations (32%) were ≥ 0.30. Bartlett’s test of sphericity rejected the hypothesis that the correlation matrix was an identity matrix (Bartlett χ2 (528) = 12,708.279, p < 0.001). Therefore the correlation matrix contained statistically significant correlations. The overall MSA value fell in the meritorious range (above 0.80) with a value of 0.938. Examination of the item-specific MSA values for each variable yielded a range from 0.68 to 0.97, with 31 out of 33 MSA values above 0.80, and an average of 0.91. Taken together, these measures indicate that our data were appropriate for EFA.

Determination of the number of factors to retain for rotation

The average of the item-specific MSA was 0.91. Three factors had eigenvalues greater than this average. Figure 1 shows that both the modified latent root criterion and the scree test suggest retaining three factors, confirming the a priori criterion.

Figure 1 Scree test and modified latent root criteria.

Model acceptability

Table 2 shows the results of the EFA. The measured items were distributed across the three factors as predicted by prior theory, and the structure was interpretable and theoretically meaningful. One item, GRIP 11, fell below the threshold for salience (0.22). Three items, GRID 3 (0.19), GRID 5 (0.20) and GRID 6 (0.14), did not load sufficiently on their own dimension but loaded on the GRIP dimension instead (with loadings > 0.30). These four items were removed. Two items (GS2 and GS13) had loadings at the limit of salience (0.29), but we retained them because Hair et al.18 suggested that the threshold for loading salience is 0.30 for a sample size of n = 350. The sample size of our study was n = 900; therefore, a loading of 0.29 clearly meets the minimal level for the interpretation of structure. In summary, 4 loadings (14%) met the minimal level for the interpretation of structure (0.29–0.40), 15 loadings (52%) were practically significant (0.40–0.70), and 10 loadings (34%) were indicative of well-defined structure (> 0.70). Altogether, the three latent factors accounted for 41.52% of the total variance of the original data. The first factor (GRIP) explained 25.99% of the total variance, the second factor (GS) explained 11.49%, and the third factor (GRID) explained 4.04% of the total variance.

Table 2 Exploratory factor analysis of the N-GAMS (n = 900 general practitioners).

The RMSR was low at 0.036, and only 4 (0.99%) of the residuals exceeded 0.10 in absolute value, suggesting that no additional factor was present.

The results across alternative extraction and rotation methods were robust (i.e., gave similar solutions). Factor analysis on different subsamples (gender, age) also gave similar solutions.

Item reliability

The Cronbach’s α values were αGRIP = 0.925 [0.917, 0.932] for the GRIP subscale, αGS = 0.806 [0.787, 0.824] for the GS subscale, and αGRID = 0.849 [0.833, 0.864] for the GRID subscale.

Twelve GS items out of 14 (86%), 100% of GRIP items and 100% of GRID items had an item-rest correlation greater than 0.30.

Bivariate correlations between the subscores

The GRIP and GRID scores were positively correlated (\(r = 0.611\); p < 0.001). No significant correlation was found between the GS and GRID scores (\(r = -0.027\); p = 0.426) or between the GS and GRIP scores (\(r = -0.066\); p = 0.096).

Analysis of French GPs’ gender sensitivity and stereotypes

Descriptive analysis

The 900 recruited GPs were representative in terms of age, sex and urban/rural practice of the general population of GPs in France. Their characteristics are summarized in Table 3. Scores for the N-GAMS by dimension are presented in Fig. 2. The mean GS, GRIP, and GRID scores were 3.23 (3.18–3.27), 2.33 (2.28–2.39) and 2.46 (2.40–2.51), respectively. GS items were scored 3 or more in 69% of cases, which suggests medium to high gender sensitivity, while GRIP and GRID items were scored 4 or 5 in 17.4% and 19.8% of cases, respectively, reflecting significant gender stereotypes towards patients and doctors in this population. The descriptive statistics of the N-GAMS items for the whole study population and by sex are summarized in Tables 4 and 5.

Table 3 General practitioners’ characteristics (n = 900).
Figure 2 Distribution of item values and mean scores for the N-GAMS by dimension.

Table 4 Descriptive statistics of the N-GAMS (n = 900 general practitioners).
Table 5 Descriptive statistics of the N-GAMS by sex (n = 900 general practitioners).

Linear regressions

The results of the univariate linear regressions are presented in Table 6. The following variables had a p value below 0.2 in at least one of the three models and were therefore included in the multiple linear regression models: sex, age, type of practice, training supervisor status, number of years of practice, gynaecological practice, and involvement of patients in medical decisions. In the multivariate analysis (Table 7), gender sensitivity was lower for doctors who did not involve their patients at all or involved them moderately in medical decisions than for those who involved them a lot (p = 0.007) and was higher for doctors working in settings other than private practices (p = 0.049) and for those with a gynaecological practice (p = 0.036). Gender stereotypes towards patients increased significantly with the doctors’ age, with an increasing gradient of GRIP scores (p < 0.001). They were also higher among male doctors than among female doctors (p = 0.023), among those who did not involve or only moderately involved their patients in decisions (p = 0.01) and among those who were not training supervisors (p = 0.05). Gender stereotypes towards doctors were also associated with age (p < 0.001) and were higher among those who did not involve or only moderately involved their patients in decisions (p = 0.014). They were lower among male doctors than among female doctors (p < 0.001).

Table 6 Relationship between general practitioners’ characteristics (n = 900) and N-GAMS subscores through univariate linear regression.
Table 7 Relationship between general practitioners’ characteristics (n = 900) and N-GAMS subscores through multivariate linear regression.

Discussion

We validated the N-GAMS in a population of French GPs and used this scale to analyse gender awareness in this population. We showed that GS was positively associated with care practices: involving patients more in decisions, working in health centres (centres de santé) or medical homes (maisons de santé pluriprofessionnelles) rather than in private practice, and practising gynaecology. GRIP was also associated with care practices (involving patients not at all, little or moderately in decisions, and not being a training supervisor), as well as with GPs’ sociodemographic characteristics (being male, being older). For GRID, the results were quite similar to those for GRIP, except that male doctors had fewer gender stereotypes towards doctors than female doctors.

An EFA was used to identify the latent structure underpinning the instrument's 33 items. Although the authors of the original validation study9 reported using principal component analysis, we chose principal axis factoring instead, as it appears to be a particularly suitable means of extracting latent factors based on the shared variance of the variables22. The final set of 29 accepted items produced a 3-dimensional structure of gender awareness: GS (14 items), GRIP (10 items), and GRID (5 items). These three subscales had satisfactory internal consistency (alpha > 0.80). We found a fairly strong association between the GRID and GRIP subscales, as have other authors, suggesting a common ground for GRIP and GRID. However, the data did not show any evidence of correlation between GS and the other two subscales, which is also consistent with earlier findings9,14. This indicates that these are separate aspects of the attitudinal component of gender awareness, which may need to be targeted and addressed independently in interventions14.

The N-GAMS has never been validated among GPs. Therefore, there are no comparative measures of gender awareness in this population. It should be noted that the sample of 900 GPs we surveyed was representative in terms of age, gender, rural/urban practice and type of practice (private surgery/others) of GPs in France30.

With regard to GS, our finding of an independent association between GS and three variables describing care practices (involving patients in decisions, practising gynaecology, and working in health centres or medical homes) is original. Its originality lies both in the fact that it has never been shown before and in the interpretation that could be drawn from it, namely, that these associations could be mediated by patient-centred practices. According to Lindsay et al., GS is a key component of patient-centred care31. Applying a gender-sensitive perspective in patient-centred care requires that healthcare providers can perceive existing sex and gender differences, issues and inequalities and incorporate these into strategies and actions. The patient-centred approach applied to ambulatory care32 implies respect for the patient's values, preferences, and expressed needs; information and education; access to care; emotional support to relieve fear and anxiety; involvement of family and friends; continuity and secure transition between health care settings; physical comfort; and coordination of care. In this framework, shared decision making (i.e., when the patient is involved to reach the optimal decision) is seen as the pinnacle of patient-centred care33. Moreover, prior experiences, notably with patient-centred medical homes in the US, show that these healthcare organizations should allow better monitoring of the care provided to ensure that it corresponds to the best standards, increase the possibilities of interaction and communication with patients, encourage their participation in the care process, and better coordinate the intervention of the various stakeholders in the care process32. These organizations are the equivalent of the multiprofessional health centres in France, whose number is increasing.
The GPs who work in these healthcare organizations are part of this process of providing patient-centred rather than disease-centred care, even if the organizational practices needed to implement and adopt patient-centred care remain incomplete34. Finally, regarding OB/GYN practices, developmental issues such as menstruation, contraceptive initiation, pregnancy, childbirth, and menopause that are addressed by GPs can be transitional periods of difficulty such as unwanted pregnancy, infertility, pregnancy loss, chronic illness and pain, mood and sleep disorders, interpersonal trauma, and poverty, and may serve as opportunities to address the complexity of patients’ needs35, reflecting the interest of these physicians in the patient-centred approach32.

With regard to GRIP, and the association between being a male doctor and having more gender stereotypes, in four of the NGAMS validation studies conducted among medical students8,9,13,15, male medical students held stronger gender stereotypes than female medical students. These findings may explain, at least in part, the effects of patients’ and physicians’ gender discordance on patient management observed in general practice. Indeed, several studies conducted in general practice have shown that the management of female patients by male doctors (as opposed to female doctors) may be detrimental in the area of cancer screening for women in the USA36, health promotion (advice on physical exercise and weight loss) and the management of hypertension in France37,38, or physicians’ perception of uncertainty about a diagnosis and hidden agenda beyond the reason(s) for the visit in New Zealand39. Our results also show that, all other things being equal, gender stereotypes towards patients increase with the age of the GP, with a very clear gradient, with a mean GRIP score from 2.09 (between 27 and 40 years old) to 2.54 (between 66 and 79 years old). Looking at the validation studies of the N-GAMS among medical students, the results of the effect of age on gender stereotypes towards patients vary from one study to another. In multivariate linear models, stereotypes towards patients decreased linearly with students getting older in Switzerland (ranging from 18 to 32 years old) (coefficient − 0.03, p = 0.035) (Rrustemi et al.) and in Sweden (coefficient − 0.189, p < 0.001) (Andersson et al.). In contrast, older students expressed more stereotypical thinking about patients in Italy (coefficient 0.04, p value = 0.012)15. No significant relationships were found in Portugal14. As Bert et al. suggested, Swedish and Swiss medical students reduced their stereotypes probably due to a good theoretical and practical teaching system15. 
It is not clear from our study whether stereotypes towards patients are the result of an ageing effect or a cohort effect. Indeed, given that the oldest GPs we interviewed belonged to the 1943–1956 birth cohort (aged 66–79 years in 2022) while the youngest belonged to the 1982–1995 cohort (aged 27–40 years in 2022), it can be assumed that growing up during such distinctive historical times influenced gender stereotypes towards patients40. There are no data in the literature to support this hypothesis. We would need longitudinal data to disentangle these ageing and cohort effects. Additionally, gender stereotypes towards patients were stronger when GPs involved their patients not at all, little or only moderately in medical decisions, with a mean GRIP score ranging from 2.13 (patients systematically involved) to 2.57. This result is consistent with those in the literature. Within the framework of an ecological model of communication in medical encounters, Street41 stated that gender-based perceptions and stereotypes can play a prominent role in medical encounters, although we still know very little about the scope of these beliefs and their impact. Sandhu et al.42 further claimed that non-concordant gender dyads may be characterized by perceived differences in power, status, dominance, gender stereotypes, and attitudes towards the other sex that may lead to higher levels of tension and a lower communication quality. In a study performed in primary care where the consultations were videotaped, the provision of patient-centred care (measured by coding the videotapes using the Davis Observation Code) was shown to be influenced by gender concordance: female concordant dyads were associated with a greater amount of patient-centred care43. Finally, not being a training supervisor was associated with more gender stereotypes towards patients, with a mean GRIP score ranging from 2.16 (being a training supervisor) to 2.39.
This result, which, to our knowledge, has not previously been reported, could be explained in several ways. GPs who choose to be supervisors differ from others, particularly in terms of gender, professional practice and training: women are overrepresented, as are GPs practising in multiprofessional health centres and those with additional training44; in other words, doctors with this profile may have fewer gender stereotypes towards patients. Furthermore, being a supervisor implies a triangular doctor-patient relationship that can have numerous potential benefits, not only for the training of residents. The presence of residents introduces a new perspective on medical situations and practice as well as on a given doctor-patient relationship45.

Regarding GRID, our results show that, all other things being equal, gender stereotypes towards doctors increase with the age of the GP, with a very clear gradient: the mean GRID score rose from 2.19 (between 27 and 40 years old) to 2.59 (between 66 and 79 years old). As with GRIP, a cohort effect can probably partly explain this result. The GP profession has been undergoing feminization for several decades. In France, the proportion of female GPs almost doubled between 2000 and 2021 alone, rising from 24 to 43%30. Young male and female doctors today have studied side by side at medical school and practise in an environment consisting almost equally of male and female doctors, which makes them less prone than older doctors to gender stereotyping of their colleagues. However, all other things being equal, the mean GRID score was significantly higher for female doctors (2.57) than for male doctors (2.38). In particular, female doctors were considered more empathetic with patients than male doctors, and male doctors were believed to be too rushed during consultations. Female doctors (through feedback from patients or discussions with colleagues) may have internalized the differences in practice between male and female doctors that have been demonstrated46.

Conclusion

This study is the first to measure gender awareness in a population of GPs. All the results discussed above allow us to conclude that it seems necessary to teach gender issues in medical schools. This is already the case in some countries, such as Sweden and Switzerland. We may perhaps go further by suggesting that gender be taught as part of continuing medical education.