Background

Midwife empathy and Shared Decision Making (SDM) are fundamental components of woman-centered midwifery care [1, 2]. For pregnant women, women giving birth and women in the postpartum period who are cared for by midwives, the professional relationship with their midwife is an essential element in providing high quality professional midwifery care [3,4,5]. From the perspective of the women in care, a good communication structure, trustful interaction, an empathic and sensitive midwife, and participatory involvement in decision-making form the basis for this relationship.

In Germany, midwives are assigned a central role (primary care provider) within an interprofessional woman-centered care context to ensure the regular course of pregnancy, birth and maternity (midwife-led continuity of care) [6]. By law, all women are entitled to continuous midwifery care during pregnancy, childbirth, postpartum and lactation (§ 134a Code of Social Law V). However, structural characteristics (e.g., shift work in clinical obstetrics) may lead to different midwives being responsible for the woman’s care. The range of services provided by midwives is multifaceted and includes independent and comprehensive counseling, care and monitoring of women. In addition, midwives are responsible for the autonomous management of physiological birth as well as the examination, care and monitoring of newborns and infants (§ 1 Midwifery Law). In case of deviations in the physiological course of pregnancy, birth and puerperium, midwives are obliged to consult physicians. The midwife’s presence during childbirth is required by law (§ 4 Midwifery Law).

Considering the midwife’s empathy, it is important to be aware that empathy has to be understood as a complex multidimensional construct [7], consisting of multiple components: (i) the emotional, (ii) the moral, (iii) the cognitive, and (iv) the behavioral components. Accordingly, empathy is a learnable, professional skill of communicating and is distinct from the view of a solely emotional experience on a subjective level [7]. It represents a key element in health care and is an important predictor of identifying women’s needs and fears as well as sharing information [8]. As a result, health outcomes are positively affected [9]. In the midwifery care context, the positive effect of empathy has been identified in terms of (i) empowering the woman for the upcoming birth, (ii) increasing the likelihood of positive birth experience and overall well-being, (iii) strengthening vital parameters (e.g., blood pressure), and (iv) enhancing pain management [10, 11].

Whereas clinical patient care has a strong focus on pathological aspects, midwives usually provide preventive care for healthy women, who depend on the midwife’s emotional, communicative, interactive and physical support before and during the birth process [9]. Empathy skills enable the midwife to understand issues from the woman’s perspective. Childbirth represents an intense, powerful, and unpredictable life experience for the mother, which affects her subsequent life and daily routine [12]. Empathy is especially important when women do not want to speak in certain situations or when verbal communication is not possible (e.g., certain stages of birth) [10]. Thus, the nonverbal form of empathy (e.g., holding hands, intuitive gestures, and facial expressions) plays a key role in woman-centered care, especially in the birth care setting. This provides evidence that the construct structure and modes of expression of empathy in terms of critical indicators may vary across care settings [10].

SDM is a process in which the midwife, the woman, and her partner are in a trusting relationship. Hence, informed birth- and health-related decisions can be made interactively and consensually [13,14,15]. This process requires that (i) the midwife and the woman are able to define the problem or care options (e.g., mode of delivery), (ii) the best available evidence and the pros and cons (risks, benefits, costs) are known, (iii) the information is communicated to the woman in an understandable way, (iv) the woman’s values and preferences are identified and taken into account, and finally, (v) recommendations are made and a joint decision is made for further care [14, 16, 17]. Measuring SDM is considered a standard element of quality of care assessment [18]. Particularly in perinatal care, participatory decision making is considered a helpful element to prevent overmedicalization and inappropriate use of interventions during birth [19]. Evidence suggests that lack of involvement during decision making is associated with negative birth experiences [20, 21]. In contrast, women’s active role during SDM processes is associated with decreased perinatal depressive symptoms, lower likelihood of low birth weight, and preterm birth [22]. Especially in the stage of labor, the SDM process is challenging particularly due to the pressure of witnessing and unpredictable situational aspects that may affect the successful and low-risk birth process as much as possible [18]. In addition, it is unclear to what extent labor pain might override perceptions of interaction with the midwife during decision making and thus bias valid assessments of satisfaction with the SDM process [18].

Assessments measuring midwife empathy (e.g., Consultation and Relational Empathy (CARE) scale [8, 23]) and SDM (e.g., SDM-Q-9 questionnaire [24]) can be used for comparative measurement across different care settings (pregnancy vs. birth). However, a mandatory prerequisite for valid comparisons is that responses on the empathy and SDM scales are given against the same frame of reference in both care settings. Thus, the construct definition should remain constant across both settings. The association of the indicators with the latent construct (e.g., construct: empathy → indicator: active listening) should be identical in prenatal and obstetric care: active listening should reflect the woman’s empathy experience to the same degree. This is not necessarily the case with self-reported assessments. If, for example, the positive birth experience affects the response to the individual items in the sense of a halo effect, although there is actually no objective difference with regard to the specific item content, the invariance assumption would be violated. Furthermore, women may prefer a different kind of decision making participation in the childbirth preparation phase than in the birth situation itself [25, 26]. When assessing differences in patient-reported experience of received care, different forms of invariance must be considered. For example, comparing scores at the time of birth care with baseline scores (care during pregnancy) may be misleading with regard to SDM, as the observed differences at item level may not reflect true differences in the SDM process at construct level across care settings [27].

Sprangers and Schwartz [25, 26] distinguish three forms of response shift (RS): (i) Recalibration (changing women’s internal measurement standards): For example, because a woman may have had a particularly positive experience with the midwife during childbirth, her judgment of prenatal care may be somewhat more critical, owing to the new standard of comparison, than it would have been without that experience. (ii) Reprioritization (changing women’s values): E.g., the woman perceives the construct empathy differently in the two care settings. In pregnancy, the cognitive-communicative aspects could be considered the central element of empathy, whereas in childbirth emotional closeness could be crucial for the construct. (iii) Reconceptualization (redefining the target construct): This would imply that in the birth situation, other aspects would represent valid indicators of empathy than in prenatal care. E.g., in the birth situation it could be crucial that the midwife is sensitive to the specifics of the medical care context in the interaction with the woman, whereas this is considered obsolete in prenatal care.

These processes can lead to underestimation or overestimation of observed effects when comparing different care settings [28,29,30]. A variety of methods exist for detecting RS [26]. In addition to the then-test (design approach), the structural equation modeling method of Oort et al. [28] is widely used to detect all three types of RS [31]. Moreover, this method makes it possible to account for the three different types of RS and to measure the true change or true difference [28].

Invariance measures are mainly used in the medical sector to assess the change in quality of life over the course of treatment from the patient’s perspective (e.g., [27, 29, 32]). To our knowledge, there are no studies in the maternity care setting that address RS in change measures related to indicators of woman-centered care. In addition, there are no RS studies for the CARE and SDM-Q-9 scales. There is evidence that women’s values and perceptions of self-reported experience may vary depending on the care setting (pregnancy, childbirth, postpartum) [33]. The study aimed to examine response shift for the Consultation and Relational Empathy (CARE) scale and the SDM-Questionnaire (SDM-Q-9) among pregnant women and women giving birth.

The analysis contributes to (i) improving knowledge on psychometric characteristics of accepted standards of patient-centered care, (ii) transferring and testing the established CARE and SDM-Q-9 instruments in the field of midwifery care, and (iii) quantifying the true difference and the RS contribution to the differences across care settings observed in the self-reported assessments of SDM and midwife empathy from pregnancy care to birth care.

Methods

Study design and participants

Data were collected in a cross-sectional survey, which was carried out from June to July 2019. N = 2368 young families from a district in Germany, identified via the register of residents, received a written invitation and study information with a hyperlink to the digital questionnaire. Participant inclusion criteria comprised (1) age of majority (≥ 18 years), (2) having given birth in 2018, and (3) use of midwifery services during pregnancy and childbirth. N = 273 women (response rate 11.5%) chose to participate in the study. N = 150 women retrospectively completed the questionnaire 6 to 12 months after the child’s birth for both care settings, prenatal care and care at childbirth. For N = 123 cases, data were only available for one of the two measurement time points; these cases were excluded from data analysis. The questions referred to the experience of the women during pregnancy and childbirth. Because birth experience may result in biasing effects (e.g., halo effects), retrospective assessments were used for both care settings [34]. The Ethics Committee of the German Society of Psychology classified the project as ethically acceptable (MAW 022019). All participants completed a digital informed consent form.

Measures

9-item Shared decision making-questionnaire (SDM-Q-9)

The generic SDM-Questionnaire (SDM-Q-9, German version) assesses the one-dimensional construct SDM in the interaction of midwives and women by 9 single items (Cronbach’s α = .94) [26]. These 9 individual items are scored on a 6-point Likert scale from 0 = “Completely disagree” to 5 = “Completely agree”. Higher scores are associated with a higher degree of involvement of the woman in decision-making processes, e.g., “My midwife helped me to understand all the information.” (SDM-M-5). The instrument showed good psychometric properties in different clinical settings [35,36,37].

Consultation and relational empathy (CARE)

The validated German version of the Consultation and Relational Empathy scale (CARE; Cronbach’s α: .92–.94 [8, 23]) was used to assess midwives’ empathy in care. The original one-dimensional scale includes 10 items on emotional, moral, cognitive and behavioral aspects of midwives’ empathy from the perspective of the women cared for (e.g., “Was the midwife interested in you as a whole person?” (CARE-M-4)). The women answered the items on a 5-point scale from 1 = “fully applies” to 5 = “does not apply at all”. Thus, low scores indicate high empathy. A previous study found that the last two items, which in contrast to the remaining items assess aspects of decision-making processes, had to be deleted in order to ensure the unidimensionality of the assessment in the field of midwifery care [38] (see also [39]). Detailed information on the midwifery-specific adaptation of the CARE and SDM-Q-9 scales and the results of psychometric validation for the pregnancy setting (N = 201 mothers) have been published elsewhere [38]. Accordingly, only the 8 non-decision-related items were included in the analyses.

Statistical analysis

Structural equation modelling

First, the transferability of the models valid for care in pregnancy [38] to the assessment of quality of care in childbirth was tested by confirmatory factor analysis. Measures of global and local goodness of fit were used to assess model fit [40]. The main criteria were the root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMR) and the incremental fit measures Tucker-Lewis Index (TLI) and Comparative Fit Index (CFI) [40]. Models with RMSEA < .05 exhibit good model fit because less than 5% of the information in the variance-covariance matrix remains unexplained. RMSEA < .08 indicates acceptably fitting models [40]. The same reference values are valid for SRMR. If the TLI and CFI values are > .95, the model can be considered acceptable because more than 95% of the information that cannot be explained by the independence model (assumption: uncorrelated variables) is systematically explained [40]. Values > .97 indicate a good model fit. Indicator reliability (IR; sufficient: IR ≥ .40), factor reliability (FR; sufficient: FR ≥ .60), and average explained variance (AVE; sufficient: AVE ≥ .50) were used as measures of convergent local fit [40]. To ensure psychometrically solid separability of the constructs, the Fornell-Larcker criterion was considered [41]. This requires that each construct shares more variance with its own indicators than with the other model constructs.
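The local fit measures named above can be illustrated with a minimal sketch. It assumes standardized factor loadings, so that each error variance equals 1 minus the squared loading, and uses the common textbook formulas for indicator reliability, factor (composite) reliability and AVE. The function name `local_fit` and the loadings are hypothetical and are not taken from the study.

```python
def local_fit(loadings):
    """Return (indicator reliabilities, factor reliability, AVE) for one
    construct, given standardized factor loadings (an assumption)."""
    ir = [l ** 2 for l in loadings]       # indicator reliability = squared loading
    errors = [1.0 - r for r in ir]        # standardized error variances
    sum_l = sum(loadings)
    fr = sum_l ** 2 / (sum_l ** 2 + sum(errors))   # factor (composite) reliability
    ave = sum(ir) / (sum(ir) + sum(errors))        # average variance extracted
    return ir, fr, ave


# Hypothetical standardized loadings, for illustration only.
loadings = [0.80, 0.85, 0.90, 0.75]
ir, fr, ave = local_fit(loadings)

# Cutoffs as stated in the text: IR >= .40, FR >= .60, AVE >= .50.
meets_cutoffs = min(ir) >= 0.40 and fr >= 0.60 and ave >= 0.50
print(f"FR = {fr:.3f}, AVE = {ave:.3f}, cutoffs met: {meets_cutoffs}")
```

With standardized loadings, the AVE reduces to the mean of the squared loadings, which is why it can be read as the average share of item variance explained by the construct.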

The analysis was performed in 4 steps following Oort [28, 42]. (1) Baseline model testing: The measurement models are related to each other without parameter restriction. Insufficient model fit indicates a situation-specific structure of the construct that cannot be transferred from pregnancy to birth (reconceptualization). (2) Zero model: Completely restricted models (loadings, intercepts, error variances) between the compared situations (pregnancy vs. birth). A χ2-difference test is used to examine whether there is complete measurement invariance, i.e., no RS at all. A significant test result indicates that further steps of RS detection are required. (3) Selective release of model restrictions: All factor loadings, all intercepts, and all measurement error variances were released separately, resulting in parameter-specific nested model comparisons. Violations of invariance were tested by χ2-difference test. Inhomogeneities of factor loadings indicate reprioritization, inhomogeneities of intercepts indicate uniform recalibration, and inhomogeneities of error variances indicate non-uniform recalibration. (4) Definition of the final RS model: All inhomogeneities and parameter restriction releases identified as relevant in step 3 are integrated into a common model. This model can be used to determine the true or RS-adjusted difference [28]. Cohen’s d is calculated as effect size measure (d = 0.2, 0.5 and 0.8 are regarded as “small”, “medium” and “large”, respectively [43]).
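The nested model comparisons in steps 2 and 3 rest on the χ2-difference test. The following is a minimal sketch of that test using only the Python standard library. The helper names `chi2_sf` and `chi2_difference_test` and all fit statistics are hypothetical and are not taken from Table 2. The upper-tail probability is computed for integer degrees of freedom via the recurrence Q(a+1, z) = Q(a, z) + z^a e^(-z) / Γ(a+1).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free, alpha=0.05):
    """Compare a restricted model with a less restricted model it is nested in."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    p = chi2_sf(d_chi2, d_df)
    return d_chi2, d_df, p, p < alpha


# Hypothetical fit statistics: fully restricted (zero) model vs. unrestricted model.
d_chi2, d_df, p, invariance_violated = chi2_difference_test(310.4, 152, 262.1, 125)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.4f}")
```

A significant result (here `invariance_violated` is true) would mean that the equality restrictions worsen the fit, so that individual restrictions are then released one at a time as in step 3. For very large degrees of freedom a statistics library would be the more robust choice.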

Descriptive statistics as well as mean value comparisons (t-tests) were computed using SPSS 26.0. Structural equation models were estimated using the maximum likelihood estimation implemented in the software Amos 26.0 [44]. To avoid systematic bias due to missing data, missing values were imputed using the expectation-maximization (EM) algorithm [45].

Results

Of the 2368 young families contacted, a total of 273 persons (11.5%) completed the survey. Data for both pregnancy and birth care were available for 160 cases (58.6%). After excluding cases with missing values > 5 in the 17 scale items (> 16% missing values), N = 150 complete cases for both care settings were included in the analysis. The remaining missing values on the scale items were imputed using the EM algorithm [46]. The descriptive statistics of the analysis sample are presented in Table 1.

Table 1 Characteristics of the sample (N = 150)

Identification of integrated cross-setting structural models for the SDM-Q-9-M and the CARE-8-M

First, the fit of the birth data was tested for the underlying model, which had already been shown to fit the pregnancy data appropriately [38]. For the birth data, moderate to satisfactory model fit was demonstrated for both scales (SDM-Q-9-M: CFI = .96, TLI = .95, RMSEA = .139, SRMR = .030; CARE-8-M: CFI = .93, TLI = .89, RMSEA = .195, SRMR = .041; Table 2). The analysis of residual correlations identified violations due to local item dependencies (medium to strong ≥ .39) for 3 item pairs of the CARE-8-M scale (CARE-M-2 “letting tell your story” & CARE-M-3 “really listening”; CARE-M-4 “interested in you as whole person” & CARE-M-6 “showing care and compassion”; CARE-M-1 “making you feel at ease” & CARE-M-7 “being positive”). For the SDM-M scale, two item pairs (SDM-M-7 “joint considerations of options” & SDM-M-9 “agreement for further care”; SDM-M-8 “joint selection of the option” & SDM-M-9 “agreement for further care”) proved to be locally dependent. These item pairs showed stronger associations with each other in the birth setting than in the pregnancy care setting. Nevertheless, these items exhibited high item-construct associations, which is consistent with the assumed unidimensional structure. After considering these model modifications, good model fit was achieved for both the pregnancy and the birth setting (integrated model pregnancy/birth: SDM-Q-9-M: CFI = .99/.98, TLI = .98/.97, RMSEA = .084/.103, SRMR = .018/.021; CARE-8-M: CFI = .97/.97, TLI = .94/.95, RMSEA = .125/.129, SRMR = .034/.032; Table 2).

Table 2 Measures of global fit for all estimated single CFA-models (N = 150)

Detecting RS from pregnancy care to birth care

Step 1: Satisfactory model fits were confirmed for the unrestricted baseline models in which the measurement models are estimated simultaneously for both care settings (overall model; SDM-Q-9-M/CARE-8-M: CFI = .96/.97, TLI = .95/.96, RMSEA = .094/.076, SRMR = .045/.048; Table 2). All items indicated high corrected item-total correlations (rit,c = .72 to .93; Table 3).

Table 3 Descriptive statistics and measures of local fit for the CFA of the RS-scale structure (N = 150)a

Both scales proved to be highly reliable in both settings (α = .94 to .97; FR = .94 to .97). The latent constructs account for a high proportion of the variance in all individual items (AVE = .67 to .78) and were found to be highly separable according to the Fornell-Larcker criterion (max. latent correlation: .71 < min. square root of AVE: .82; Table 4).
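The Fornell-Larcker comparison reported above can be retraced with a short sketch. It uses only the summary values given in the text (AVE range .67 to .78, maximum latent correlation .71) and applies a conservative simplification: the smallest square root of AVE is compared with the largest latent correlation, rather than checking each construct pair separately against Table 4. The function name `fornell_larcker_holds` is hypothetical.

```python
import math


def fornell_larcker_holds(ave_values, latent_correlations):
    """Conservative check: smallest sqrt(AVE) must exceed the largest
    absolute correlation between latent constructs."""
    min_sqrt_ave = math.sqrt(min(ave_values))
    max_corr = max(abs(r) for r in latent_correlations)
    return min_sqrt_ave > max_corr, min_sqrt_ave, max_corr


# Endpoints of the reported AVE range and the reported maximum latent correlation.
holds, min_sqrt_ave, max_corr = fornell_larcker_holds([0.67, 0.78], [0.71])
print(f"min sqrt(AVE) = {min_sqrt_ave:.2f}, max correlation = {max_corr:.2f}, holds: {holds}")
```

The square root of the lowest reported AVE (.67) rounds to .82, which matches the value stated in the text and exceeds the maximum latent correlation of .71.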

Table 4 Intercorrelations of the scales and relevant scale properties for the pregnancy and birth setting (N = 150)

Step 2: The zero model, in which all model parameters are assumed to be invariant across settings (cross-setting measurement invariance), showed a significantly worse data fit for SDM-Q-9-M and CARE-8-M, respectively (χ2/df of the overall model = 2.31/1.85; χ2/df of the zero model = 2.51/3.43; Table 2). Thus, measurement invariance does not hold for either construct, so that significant RS effects are analyzed in the subsequent steps by means of a comprehensive RS model.

Step 3: To identify RS effects, each parameter that was restricted as invariant between the pregnancy and birth setting was successively released. The significance of removing each equating restriction was tested in a nested model comparison using the χ2-difference test.

For the SDM-Q-9-M scale, ten of the 27 parameters showed a significant difference between the two care settings (Table 5): Two factor loadings (RS-type: reprioritization; higher value at birth: SDM-M-3: information about different options; higher value at pregnancy: SDM-M-1: informed that a decision must be taken), five intercepts (RS-type: uniform recalibration; higher value at birth: SDM-M-1; SDM-M-6: asked which option I preferred; SDM-M-8: joint selection of the option; higher value at pregnancy: SDM-M-7: joint considered options; SDM-M-9: agreement for further care) and three error variances (RS-type: non-uniform recalibration; higher value at birth: SDM-M-4: explanation assets & drawbacks of the options; SDM-M-9: agreement for further care; higher value at pregnancy: SDM-M-8: joint selection of the option). Although the RS model, in which all RS-related violations are removed simultaneously, has 15 additional degrees of freedom, the global fit criteria actually indicate a slightly better model fit than the much less parsimonious overall model (CFI = .96, TLI = .95, RMSEA = .089, SRMR = .046; Table 2).

Table 5 Estimation of response shift parameters for the SDM-Q-9-M scale (N = 150)

For the CARE-8-M scale, RS-differences between the two care settings were evident for nine of the 24 parameters (Table 6). Reprioritization was present for three items (higher loading at birth: CARE-M-2: letting you tell your story, CARE-M-4: being interested as whole person; higher loading at pregnancy: CARE-M-6: showing care and compassion). Uniform recalibration was found for three items CARE-M-2, CARE-M-4 (both higher intercept at birth), and CARE-M-6 (higher intercept at pregnancy). Non-uniform recalibration was detected for the items CARE-M-1: making you feel at ease, CARE-M-2, and CARE-M-4 due to higher error variances at birth. Global measures of goodness of fit indicate the more parsimonious comprehensive RS model to fit as well as the unrestricted overall model (CFI = .97, TLI = .96, RMSEA = .076, SRMR = .051; Table 2).

Table 6 Estimation of response shift parameters for the CARE-8-M scale (N = 150)

Step 4: Based on the RS model, the contribution of the RS to the observed difference between care settings was determined [30]. The estimate of the true item-specific difference results from the assumption that the measurement models in pregnancy care can be transferred to birth care (setting-invariant measurement models). At the item level, the standardized true difference of pregnancy versus birth for the SDM-Q-9-M scale ranged from dtrue = − 0.184 to − 0.240 (Table 7). For the CARE-8-M scale, the standardized true difference of the midwife’s empathy of pregnancy compared with birth ranged from dtrue = 0.265 to 0.385 (Table 8).
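How an observed item mean difference splits into RS components and a true difference can be sketched under the usual linear measurement model, in which an item mean equals its intercept plus its loading times the latent mean. This is an algebraic identity for that model and is offered only as an illustration of the logic of step 4; it is not claimed to reproduce the exact computation or standardization behind Tables 7 and 8. The function name `decompose_item_difference` and all parameter values are hypothetical.

```python
def decompose_item_difference(tau_p, lam_p, kappa_p, tau_b, lam_b, kappa_b):
    """Split the birth-minus-pregnancy item mean difference into components.

    tau = item intercept, lam = unstandardized factor loading,
    kappa = latent mean; suffix _p = pregnancy, _b = birth.
    """
    recalibration = tau_b - tau_p                   # intercept shift (uniform recalibration)
    reprioritization = (lam_b - lam_p) * kappa_b    # loading shift
    true_difference = lam_p * (kappa_b - kappa_p)   # latent mean shift under invariant loading
    observed = (tau_b + lam_b * kappa_b) - (tau_p + lam_p * kappa_p)
    return {
        "recalibration": recalibration,
        "reprioritization": reprioritization,
        "true_difference": true_difference,
        "observed": observed,
    }


# Hypothetical parameters; the pregnancy latent mean is fixed to 0 for identification.
parts = decompose_item_difference(
    tau_p=1.50, lam_p=0.50, kappa_p=0.0,
    tau_b=1.62, lam_b=0.56, kappa_b=0.40,
)
total = parts["recalibration"] + parts["reprioritization"] + parts["true_difference"]
print(parts, "sum of components:", round(total, 3))
```

The three components add up to the observed difference, so the true difference is what remains once the intercept and loading shifts are removed. Non-uniform recalibration concerns error variances and therefore does not appear in this decomposition of means.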

Table 7 Evaluation of RS and true difference for SDM-Q-9-M scale from pregnancy to birth (N = 150)
Table 8 Evaluation of RS and true difference for CARE-8-M scale from pregnancy to birth (N = 150)

RS corresponds to the difference between the observed and the true change. Positive (vs. negative) RS values indicate that the item-level difference is more positive (vs. more negative) than would be expected based on latent construct values. For the SDM-Q-9-M scale, four marginal RS effects point in the negative direction (dRS = − 0.012 to − 0.067; Table 7). The items with negative RS indicate a more pronounced decrease in shared decision making from pregnancy to birth than would be expected based on the true difference. The three positive total RS effects are generally higher (d = 0.070 to 0.318). The effect is particularly pronounced for item SDM-M-1: informed that a decision must be taken, as a positive difference was measured despite the negative true difference (dtrue = − 0.184; dobserv = 0.134). Recalibration is present for all items with RS. Only the items SDM-M-1 and SDM-M-3 show reprioritization due to setting-specific associations of the item with the underlying construct (unstandardized factor loading).
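The arithmetic behind the SDM-M-1 example can be written out as a small sketch. It assumes that the RS effect is the observed standardized difference minus the true standardized difference, which is the sign convention implied by the statement that positive RS values mean the item-level difference is more positive than expected. The function name `response_shift_effect` is hypothetical; the two input values are those reported in the text.

```python
def response_shift_effect(d_observed, d_true):
    """RS effect as observed minus true standardized difference (assumed convention)."""
    return d_observed - d_true


# Values reported in the text for item SDM-M-1.
d_true = -0.184
d_observed = 0.134

d_rs = response_shift_effect(d_observed, d_true)
sign_reversed = (d_observed > 0) != (d_true > 0)
print(f"d_RS = {d_rs:.3f}, sign reversed by RS: {sign_reversed}")
```

The result of 0.318 coincides with the upper end of the reported range of positive RS effects, and the sign reversal shows why the observed item difference alone would be misleading for this item.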

For the CARE-8-M items, a violation of the invariance of the factor loading occurs for three of the eight items. Overall, the factor loadings are more dependent on the respective setting than for SDM-Q-9-M (reprioritization; CARE-M-2, CARE-M-4 and CARE-M-6; |d| = 0.017–0.079; Table 8). For example, the item CARE-M-2: letting you tell your story (pregnancy: FL = 0.488 vs. birth: FL = 0.551; Table 6) is significantly more highly associated with the latent construct in the birth setting. Together with the recalibration effects, which affect all items with RS, this results in total RS effects for all items. For the items CARE-M-2 and CARE-M-4 there were positive RS effects (dRS = 0.219/0.171; Table 8). This means that the true difference is overestimated by these items. The remaining two items with RS (CARE-M-1 and CARE-M-6) underestimate the true difference (dRS = − 0.042 to − 0.139). When interpreting the values, it must be taken into account that high values on the CARE-8-M scale reflect a low level of empathy.

At the latent mean level, both the SDM-Q-9-M (Cohen’s d = 0.26, p < .001) and CARE-8-M (Cohen’s d = 0.37, p < .001) scales showed significantly worse scores for obstetric care compared to pregnancy care.

Discussion

For both the CARE-M and SDM-Q-9-M scales, a subset of the items represents the underlying latent constructs empathy and SDM, respectively, in a comparable and fair manner independent of the care setting. However, individual items indicate a stronger or weaker difference between care settings than should be expected based on differences at the latent construct level. Thus, in diagnostic application in maternity care practice, a distinction must be made between (i) item information that actually represents genuine empathy differences (true setting-dependent empathy differences) and (ii) item information that represents differences dissociated from the overall empathy difference. This contributes to a setting-specific understanding of which behavioral and interactional aspects characterize the midwife’s empathy and SDM behavior from the women’s perspective.

The fact that the women perceive the empathy of the midwife during the birth situation as significantly weaker is reflected in all 8 items of the CARE-8-M scale. For the five empathy indicators “making you feel at ease” (CARE-M-1), “really listening” (CARE-M-3), “fully understanding your concerns” (CARE-M-5), “being positive” (CARE-M-7), and “explaining things clearly” (CARE-M-8), there are no RS effects. Thus, these items validly represent the differences in equal strength. These empathy aspects, which mainly address the interactional-communicative behavioral component as well as the nonverbal empathy form, can thus be considered as setting-independent core aspects of midwifery empathy [1, 11]. For these items, it can be ruled out that (i) setting-specific halo effects [47] might cause varying item-construct associations, (ii) a changing woman’s assessment standard might lead to an unexpected lower- or higher-than-average score, and (iii) that empathy perception might change structurally [28, 46].

“Letting you tell your story” (CARE-M-2) and “being interested in you as a whole person” (CARE-M-4), however, are rated significantly worse in the birth situation than would be expected based on the general empathy effect (mean difference ≥ .55, recalibration). This may be due to the fact that these two empathy aspects are difficult to implement by the midwife in direct birth care [33]. Perhaps the importance of these communication aspects attenuates during the birth process because women consider them to be less relevant [48]. Additionally, the variance of these two items is most significantly increased in the birth setting compared to the pregnancy setting (reduction of the ceiling effect). This increase in information variance is reflected in stronger item-construct association in the birth setting (reprioritization). This strengthens the association of the item information in CARE-M-2 and CARE-M-4 with the general empathy construct. Thus, the validity of the items increases because differences in the midwife’s empathic behavior are reflected in a more differentiated way due to the higher trait variance in the birth setting. The opposite effect appears for the item “showing care and compassion” (CARE-M-6). The difference in the item means between the settings is unexpectedly weak and the item variance increases the least. This is in line with the basic orientation of the salutogenetic professional ethos of midwives, in which the biopsychosocial health of women is the focus of practical action, and their rights and dignity are preserved respectfully and with compassion during this vulnerable period [1, 49]. After considering these RS effects, these three items also contribute to a more reliable and differentiated diagnosis of empathy-related differences.

SDM also scores significantly lower during birth care compared to pregnancy care. “Desired participation in decision making” (SDM-M-2), “explanation assets & drawbacks of the options” (SDM-M-4), and “helped to understand all information” (SDM-M-5), prove to be valid indicators independent of the setting context (no item-specific RS effects). Thus, the overall differences of SDM between care settings can be assessed fairly by using these three items [50]. Overall, RS effects are present for 6 SDM items. In contrast to the empathy scale, the RS effects for 5 of these 6 SDM items are considerably weaker (|dRS| = .012–.164). While the mean score of these five SDM items is significantly lower in the birth situation, the SDM aspect “has expressly informed that a decision must be taken” (SDM-M-1) is rated markedly better than expected. In fact, SDM-M-1 is the only one of the 9 SDM-items that measures higher scores in the birth setting. This does not validly represent the general difference in SDM between settings (recalibration). Accordingly, this increase in the mean must be considered an artifact due to the recalibration effect. The increase results exclusively from the fact that the women’s assessment standard for the birth setting changes. It is reasonable to assume that this communicative aspect is of higher importance to women during pregnancy than during childbirth. Particularly when caring for women who express great fear of childbirth, the participatory communication aspect is important in childbirth preparation to preventively mitigate fears and empower women regarding childbirth preparation [51]. During childbirth, women are particularly vulnerable and unanticipated health-related decisions must be made under time pressure [52]. In this situation, women are more dependent on the expertise and experience of health care providers for decision making than during pregnancy [53]. These setting-specific characteristics, as well as the positive birth experience itself, may result in a woman’s evaluation standard being more critical in pregnancy than for childbirth [33].

Overall, the detected RS effects are weak, as expected (except for SDM-M-1), because they are superimposed by the true differences and the different forms of RS (reprioritization, forms of recalibration) may influence each other [42]. However, significant RS should always be considered independently of the true differences for the analysis of the significance of differences in terms of content since they systematically reflect different difference components. In order to be able to depict the true differences in a valid way, considering RS separates theoretically significant difference components that can make the meaning of the construct or its setting dependent change identifiable [46]. RS indicates that, due to these differential components, the observed differences must not be interpreted as a homogeneous change in a characteristic that is erroneously assumed to be stable. The results illustrate the psychometric advantages of invariance testing in the assessment of patient-reported outcomes by means of self-assessments. This procedure should be established as a standard of analysis in the health care sector in order to provide fairness and validity of diagnostic assessments in different clinical contexts and settings [30].

The present study represents a central step toward a better understanding of the constructs of empathy and SDM in midwifery care from the perspective of women’s experiences, and toward identifying valid construct indicators in the context of setting-specific characteristics. It becomes evident that woman-centered care elements such as empathy and SDM may be more difficult to implement satisfactorily in the birth setting than in prenatal care. This challenge is in line with existing research findings and highlights the need to assume that assessment behaviors are influenced by situational characteristics (e.g., labor, negative/positive birth experiences), which limits comparability across care settings [18, 33, 54].

In general, across both care settings, women appear to rate the midwife’s empathy better than the midwife’s SDM behavior. Although SDM effects are well documented in health services research and the concept is considered a central component of high-quality patient care, implementation in routine care is lacking [55]. SDM is often considered an add-on to discipline-specific care delivery in the care setting. Here, clarification is needed on whether the use of standardized procedures can simplify SDM processes and increase their efficiency to ensure women’s rights [55]. This marks a starting point for clinical practice to align maternity care more closely with women’s needs [56]. In addition, given the complexity of the delivery system and the increasing workload density in midwifery care, advancing professional education and promoting interprofessional collaboration among providers within obstetric care are useful ways to improve SDM secondarily [56].

Furthermore, a deeper understanding of the diagnostic properties of the empathy construct and of the SDM process was gained by identifying valid and change-sensitive indicators that are suitable for fairly capturing differences between care settings. As a result, the interactional-communicative as well as the nonverbal behavioral components of midwifery become more prominent in the empathy domain. In SDM, regardless of the care setting, support in information processing and clarification of individual participation preferences are central.

Limitations

Data were collected retrospectively and for both care settings at a single measurement point (6 to 12 months after birth). Accordingly, biased memory (recall bias), an insufficiently differentiated judgment (halo effect), or an overly contrasted judgment of the separate settings (contrast effects) cannot be excluded [57]. The data were collected in a district in southwestern Germany, so they cannot be assumed to be representative of midwifery care in general. The analysis of data from further parent samples would be desirable to critically examine the generalizability of the findings. Local dependencies of the items were considered to ensure construct validity for both settings (see Additional file 1). Although this is associated with uncertainties in the modeled covariance structure, it is in line with the recommended procedure for RS analysis [27, 42].

Furthermore, it has to be noted that despite the SEM approach, the RS analysis must be understood as at least partially exploratory [58]. The chosen analysis approach ensured that all significant moderation effects of the settings were considered in an integrated manner in the overall model. Hence, all decisions could be substantiated transparently, and all forms of RS could be determined. As an alternative to our procedure of full parameter-specific significance testing, the equally exploratory, modification-index-based iterative “specification search” procedure by Oort [28] or the “Then” test [59] could be used. RS analysis can only identify which RS are present; it cannot identify their causes [46]. To test the significance of RS effects for the quality of obstetric care provided by midwives, theory-based structural models in which not only SDM and empathy but also their consequences are modeled should be empirically investigated [60]. Furthermore, theoretically relevant covariates should be considered, or interventions with item-selective effect hypotheses (e.g., targeted promotion of emotional components of midwives’ empathy) should be tested. Other care settings, such as postnatal care or care after miscarriage, should also be investigated in order to empirically analyze the interaction of setting characteristics with the SDM-Q-9-M and CARE-8-M measurement instruments.

Conclusion

Empathy and SDM are of particular importance in care settings because they represent process characteristics that are essential determinants of primary treatment outcomes (patient-reported outcomes, e.g., quality of life, patient satisfaction, quality of care) [60,61,62]. The results indicate that the process-oriented instruments for measuring SDM (SDM-Q-9-M) and empathy (CARE-8-M), which are widely accepted in patient-centered care [62], could be successfully transferred to the field of midwifery care. We were able to identify item-specific RS effects for both instruments. When these are taken into account, valid comparisons between care settings can be made [28, 46, 58]. In midwifery practice, the two instruments can be used to assess women’s views of their SDM and empathy needs in a setting-specific manner. Thus, the midwife’s professional role can be better understood in terms of the midwife-woman relationship [63]. The scales allow the identification of setting-specific strengths and weaknesses in everyday care in order to develop targeted measures for promoting midwifery competencies in high-quality woman-centered care [64]. For this purpose, the results reveal relevant starting points for interventions: (i) strengthening the focus on woman-centered care aspects in the birth setting, and (ii) optimizing SDM processes independent of the care setting. In contrast to existing assessment tools (e.g., Midwifery Empathy Scale [64]), the present scales can be used for both external and self-assessment (e.g., [63]). The systematic comparison of the two leads to a better understanding of perspective-related differences and to improved midwife-woman interactions [65].