Background

A patient has about seven medical consultations per year [1]. Physicians are increasingly involved in the quality assessment of daily practice [2], but few questionnaires have been validated to assess patients’ experiences with medical interviews [3, 4]. Patient-reported outcomes are considered valuable measures of healthcare and a step in the development of patient-centered care [5]. The assessment of patients’ experiences could enhance comparisons of strategies about physicians’ communication [6], treatment [7], or accountable care [8]. Available questionnaires in the field are not psychometrically sound [9], rely on expert-generated items [3], and focus on specific physicians’ specialties [10] or specific patient courses [11, 12]. The Medical Interview Satisfaction Scale (MISS) was developed to assess patients’ experiences with interviews in primary care [4]. The authors used a rigorous method for item generation with patient interviews, but the factorial structure relied on the classical test theory (CTT) used at the time of questionnaire development [13]. Although it was intended for general practice, the original questionnaire has tended to become a reference for the evaluation of patient-centered consultations [14, 15, 16]. Some questions were raised about the internal validity and acceptability of its 29-item form [17], and other authors suggested the factorial structure may differ across populations, stressing the need for a new validation process [18].

The main objective of the Generic Medical Interview Satisfaction Scale (G-MISS) study was to validate a generic version of the MISS questionnaire in general practice, medical specialties, and surgical specialties. The secondary objectives were to reduce the number of items and to explore the determinants of experience and satisfaction across patient groups, medical conditions, and consultation settings. The null hypothesis, defined as the lack of difference with the original questionnaire structure, was tested using Item Response Theory (IRT) with exploratory factorial analysis to assess patients’ experiences and satisfaction.

Methods

Patients

The protocol and statistical plan were approved by the Cerar ethical committee, Paris, France, ref. IRB 00010254-2016-023. The requirements of the Declaration of Helsinki were respected, and there was no interference in the physician-patient relationship.

All physicians registered on the online health insurance server in the city of Marseille, France, were invited to participate in the study. Two thousand seventy-two physicians from various medical specialties were asked to enroll patients between May 2016 and July 2016. All consecutive adult patients undergoing medical consultations and able to complete a self-reported questionnaire were eligible. Non-inclusion criteria were the inability to fill out an electronic form, cognitive impairment, and hospitalization.

Protocol and data collection

All patients received written information at the time of online registration. Patient consent was obtained by electronic signature and stored in the server. Various specialties were represented including general practice, anesthesia, cardiology, dermatology, gynecology, gastroenterology, neurology, pulmonology, rheumatology, and the following surgeries: neurosurgery, cardiac, thoracic, maxillofacial, ear-nose-throat (ENT), orthopedic, plastic, urologic, vascular, visceral, and ophthalmologic. Physicians who actively participated in the study recruited the patients by giving them a single connection voucher.

Patients were asked to fill out an electronic form about their individual experience within 1 week after their consultation with the physician. Patients received no assistance in formulating their answers.

In accordance with the principle of the self-reported questionnaire, patients had to complete the form on their own, so that their answers reflected only their feelings about the consultation. An introductory sentence mentioned that there was no right or wrong answer, that the questionnaire was anonymous, and that their responses were confidential.

The questionnaire relied upon the original MISS-29, which contained 29 items structured in four dimensions named after their content: “Distress Relief” (11 items), “Communication comfort” (4 items), “Rapport” (10 items), and “Compliance intent” (4 items). The 29 items of the Medical Interview Satisfaction Scale were generated after interviews with patients, according to guidelines about item generation (see Appendix) [13]. Demographics were gathered along with sociocultural and medical condition data.

All personal information was anonymized before being sent to the server, according to the recommendations of the French national commission on information technology and human rights [19].

Questionnaire reduction and validation

The questionnaire consisted of items structured into dimensions exploring the various aspects of patients’ experiences and satisfaction with medical consultations across three subgroups of physicians in general practice, medical specialties, and surgical specialties.

Item selection

We used the MISS-29 as an item bank to build the new questionnaire. A forward-backward translation was performed. A native English translator produced the first draft according to the original items. A bilingual expert back-translated the items to cross-validate the French version.

Items were considered for deletion if they loaded on two or more factors, or had a correlation of less than 0.40 with their own dimension according to the exploratory factorial analysis. Item deletion also relied on other standard criteria, including inter-item correlation (lower than 0.40 or higher than 0.80), floor and ceiling effects (higher than 15% and 40%, respectively), or a low response rate (more than 20% missing data). After the questionnaire’s multidimensional structure was identified, any item whose deletion would lead to a 0.02 increase in Cronbach’s alpha coefficient was removed.
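The distribution-based deletion criteria can be illustrated with a minimal sketch. It applies only the floor, ceiling, and missing-data thresholds stated above, and it assumes that floor and ceiling rates are computed among answered items, which the text does not specify. The function name `screen_item` and its arguments are illustrative and are not taken from the study's software.

```python
from typing import List, Optional, Sequence

# Thresholds as stated in the text.
FLOOR_MAX = 0.15    # more than 15% of answers in the lowest category
CEILING_MAX = 0.40  # more than 40% of answers in the highest category
MISSING_MAX = 0.20  # more than 20% missing answers


def screen_item(responses: Sequence[Optional[int]],
                lowest: int = 1, highest: int = 5) -> List[str]:
    """Return the reasons (possibly none) for flagging one item for deletion.

    `responses` holds one Likert answer per patient; None marks a skipped item.
    Assumption: floor and ceiling rates are computed among answered items only.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    answered = [r for r in responses if r is not None]
    reasons: List[str] = []
    missing_rate = 1 - len(answered) / len(responses)
    if missing_rate > MISSING_MAX:
        reasons.append("missing")
    if answered:
        floor = sum(r == lowest for r in answered) / len(answered)
        ceiling = sum(r == highest for r in answered) / len(answered)
        if floor > FLOOR_MAX:
            reasons.append("floor")
        if ceiling > CEILING_MAX:
            reasons.append("ceiling")
    return reasons
```

An item returning an empty list passes these three checks; the factor-loading, inter-item correlation, and Cronbach's alpha criteria would still need to be assessed separately.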

Internal validity

An exploratory principal component factor analysis with varimax rotation [20] was performed along with inter-item, item-dimension, and inter-dimension correlations (Pearson r), to identify the questionnaire’s multidimensional structure. Each item was correlated with its own dimension and with the others.

If an item correlated higher (>0.4 after overlap correction) with its own dimension than with the others, the item internal consistency was supported, confirming its discriminant validity [21]. The internal consistency reliability of each potential dimension was assessed by Cronbach’s alpha coefficient, with an expected threshold of 0.7 [22]. We used the polytomous Rasch model from IRT to explore the unidimensionality of the scale. This model assessed the ability of items to measure a “trait” or dimension of the scale. The Partial Credit Model, using threshold and discrimination parameters, was applied as an extension of the Rasch model [23, 24]. Each dimension’s scalability was explored by the pattern of item goodness-of-fit statistics (ranging from 0.5 to 1.5), ensuring that items belonging to the same dimension measured the same trait or concept [25].
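As an illustration of the two classical indices used here, the following sketch computes Cronbach's alpha for a dimension and the correlation of an item with the sum of the other items of that dimension (the overlap correction). It assumes complete data and sample variances; the function names are illustrative, and this is not the authors' code.

```python
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one dimension.

    `items` is a list of item columns (one list per item), all covering the
    same respondents in the same order, with no missing values.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(row) for row in zip(*items)]          # per-respondent sum
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items: Sequence[Sequence[float]], index: int) -> float:
    """Correlate one item with the sum of the *other* items (overlap corrected)."""
    rest = [sum(v for j, v in enumerate(row) if j != index)
            for row in zip(*items)]
    return pearson(items[index], rest)
```

Under the thresholds quoted above, a dimension would be considered reliable when `cronbach_alpha` reaches 0.7, and an item consistent when `corrected_item_total` exceeds 0.4.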

Differential item functioning

We explored the differential item functioning (DIF) to assess the questionnaire properties across three settings of consultation, i.e. general practice, medical specialties, and surgical specialties. The DIF analysis sought to determine whether items and dimensions varied in their performance to assess patient satisfaction in these subgroups.

An increase in DIF would mean that the evaluated item functioned differently in the subgroup. The uniform DIF was calculated to determine the probability of giving a specific answer at a given level of satisfaction across physicians’ specialties.

The DIF was detected and the magnitude of the effect was quantified using the Crane and Larson model [26]. A significant DIF indicates an increase in the explained variance of a given item when the subgroup variable, i.e. physicians’ specialties, is included.

In case of statistical significance, Zumbo’s DIF classification was used to assess the DIF magnitude by computing delta R2. The DIF magnitude was considered negligible if delta R2 was <0.13, moderate if between 0.13 and 0.25, and large if >0.25.
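The classification rule can be written as a short sketch. The cut-offs are those given in the text (0.13 and 0.25); treating the boundary values as "moderate" follows the wording "between 0.13 and 0.25", and the function name and the `significant` flag are illustrative assumptions.

```python
def classify_dif(delta_r2: float, significant: bool = True) -> str:
    """Classify DIF magnitude from the change in explained variance (delta R2).

    The magnitude is only graded when the DIF test is statistically
    significant; otherwise the item is reported as showing no DIF.
    """
    if not significant:
        return "none"
    if delta_r2 < 0.13:
        return "negligible"
    if delta_r2 <= 0.25:
        return "moderate"
    return "large"
```

For example, an item with a significant test and a delta R2 of 0.02 would be classified as showing negligible DIF.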

External validity and acceptability

The original version of the MISS questionnaire reported correlation with occupational level [4]. Accordingly, we explored the external validity of the G-MISS across various groups of employment type. The G-MISS discriminant validity was further explored by comparisons between dimension scores and demographics, sociocultural levels, or medical conditions of patients using analysis of variance, Mann-Whitney U-test, and Pearson’s correlation. The online form of the questionnaire allowed the patient to skip any item without responding. The rate of missing data was assessed as an objective measure of acceptability [20].

Records with a response rate below 80% were excluded from the validation analysis to ensure the quality of the data.

Scoring

Items were answered using a five-point Likert scale, defined from 1 to 5 as “strongly disagree,” “disagree,” “neutral,” “agree,” and “strongly agree.” A dimension score was obtained by computing the mean of item scores for the dimension.

If fewer than half of the items of a given dimension were missing, the mean of the non-missing items was used to score the dimension. Each dimension score was linearly transformed into a 0–100 scale, with 0 indicating the worst level of satisfaction and 100 the best.

The global index was calculated as the mean of the dimension scores. Dimensions were not weighted: each contributed equally to the global index. Negatively phrased items were reversed when scored, so that higher item scores represented higher satisfaction (items 1, 5, 11, 15, 16, 18, 22, 24, 28, and 29) (see Appendix).
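The scoring rules can be summarized in a minimal sketch. Several details are assumptions rather than statements from the text: the linear transformation is taken to be (mean − 1) / 4 × 100, a reversed item is scored as 6 minus the answer, and the global index is left undefined when any dimension cannot be scored. The item-to-dimension mapping is not given in this section, so the item numbers in the usage note are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Negatively phrased items, as listed in the text.
REVERSED_ITEMS = {1, 5, 11, 15, 16, 18, 22, 24, 28, 29}


def item_score(item_no: int, answer: int) -> int:
    """Reverse negatively phrased items so that 5 always means best (assumed 6 - x)."""
    return 6 - answer if item_no in REVERSED_ITEMS else answer


def dimension_score(answers: Dict[int, Optional[int]],
                    item_numbers: Sequence[int]) -> Optional[float]:
    """Score one dimension on a 0-100 scale, or None if it cannot be scored.

    `answers` maps item number to a 1-5 Likert answer (None or absent = missing).
    The dimension is scored only if fewer than half of its items are missing.
    """
    scored = [item_score(i, answers[i]) for i in item_numbers
              if answers.get(i) is not None]
    missing = len(item_numbers) - len(scored)
    if missing * 2 >= len(item_numbers):
        return None
    raw_mean = sum(scored) / len(scored)
    return (raw_mean - 1) / 4 * 100  # assumed linear map of 1..5 onto 0..100


def global_index(dimension_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Unweighted mean of dimension scores; None if any dimension is unscored (assumption)."""
    if not dimension_scores or any(s is None for s in dimension_scores):
        return None
    return sum(dimension_scores) / len(dimension_scores)
```

With a hypothetical three-item dimension made of items 2, 3, and 5, the answers `{2: 5, 3: 4, 5: 1}` give item scores of 5, 4, and 5 (item 5 is reversed), a mean of about 4.67, and a dimension score of about 91.7.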

Results

A total of 2055 patients were included in the study between May 2016 and July 2016 (see Fig. 2, flow diagram). The baseline characteristics of the patients and types of consultations appear in Table 1. The psychometric validation resulted in a final version comprising 16 items structured into three dimensions, named after their content: Relief (eight items), Communication (six items), and Compliance (two items) (Table 2). This short form explained 54.5% of the total variance.

Table 1 Patient’s characteristics and type of consultations (n = 1822)
Table 2 Principal Component Analysis (Varimax rotation) and DIF of the G-MISS

Sample characteristics

Patients enrolled in the validation process (n = 1822) were seen in various medical and surgical specialties (Table 1). The mean patient age was 37.1 ± 14.7 years, and 451 consultations took place in an emergency setting (25%). Five hundred ninety-eight patients (33%) reported a long-course treatment, and 378 had been hospitalized in the last 6 months (21%). This was the first consultation in 6 months for 283 patients (16%). Four hundred and ten patients were unemployed at the time of the consultation (22%) (Table 1).

Internal validity

The G-MISS final version contained 16 items structured in a three-factor questionnaire determined by exploratory factor analysis (Fig. 1). Thirteen items were removed according to their factor loadings and the retained deletion criteria. The three dimensions were named after their item content, according to the original version of the MISS-29 questionnaire: “Relief” (eight items), “Communication” (six items), and “Compliance” (two items). The “Rapport” dimension from the original MISS-29 was removed and its remaining items were merged into the “Communication” dimension according to the exploratory factorial analysis. All the fit indices from the statistical model were satisfactory (RMSEA = 0.03, CFI = 0.98, SRMR = 0.06). The overall scalability was good, with items showing a good fit to the Rasch model in each dimension. Item internal consistency (IIC) was satisfactory for all dimensions; each item achieved the 0.40 standard threshold for IIC (ranging from 0.45 to 0.77, Table 3). The correlation of each item with its own dimension was higher than its correlations with the other dimensions (item discriminant validity, IDV). Cronbach’s alpha coefficients ranged from 0.73 to 0.86, indicating satisfactory reliability for each dimension.

Fig. 1 Scree plot of the number of components for the 16-item G-MISS

Table 3 Internal validity

Floor effects ranged from 0.3 to 13.2%, and ceiling effects ranged from 18.2 to 36.9% (data not shown). Patients who finished the questionnaire responded to all items. The rate of missing values was low (see Applicability). According to the definition of DIF, six items (q8, q11, q19, q25, q26, q28) showed a statistically significant difference in their behavior according to the consultation specialty, but the magnitude of the DIF was negligible for each (Table 2).

External validity

The educational level and type of employment correlated with the “Communication” dimension and with the global index of the G-MISS. The type of employment also correlated with the “Relief” dimension.

Patients seen in medical and surgical specialties showed higher satisfaction scores in the “Relief” dimension than those who attended general practices (83.0 ± 11.6 and 82.4 ± 11.6 vs. 73.2 ± 16.7, respectively; P < .001), but the most relevant discrepancies were reported for the global index (80.5 ± 9.9, 79.2 ± 10.4 vs. 74.1 ± 12.1, respectively; P < .001, Table 4). There were no differences in the “Compliance” dimension between physician specialties. There was no difference in terms of age, sex, or body mass index (BMI). Smokers showed lower satisfaction scores in the “Communication” and “Compliance” dimensions, and in the global index. Consultations shorter than 5 min correlated highly with low patient satisfaction in the “Relief” and “Communication” dimensions, and in the global index (P < .001). The “Compliance” dimension was not correlated with the duration of consultations (Table 4). The emergency context correlated with the “Relief” dimension and the global index. There was a linear association between the number of consultations in the last 6 months and the level of satisfaction in the “Relief” dimension (P = .001).

Table 4 Comparisons of G-MISS scores according to patient’s characteristics and type of consultations (n = 1822)

Hospitalization in the last 6 months was associated with a higher level of satisfaction in the “Relief” dimension (P = .005). A long course of treatment was not correlated with overall satisfaction scores on the G-MISS.

Applicability

Seventy-eight patients declined to participate after being screened (3.8%). One hundred and seven patients showed a rate of missing values >20% (5.2%). The percentage of missing values increased as the questionnaire progressed. Patients with a rate of missing values >20% did not respond to any items from item 15 onward (data not shown). Forty-five patients (2.2%) were excluded because they took more than 1 h to fill out the 29-item questionnaire (see Fig. 2, flow diagram). The mean filling duration for the 16 items of the G-MISS was 6 min 49 s (± 2 min 41 s) vs. 12 min 23 s (± 5 min 13 s) for the original set of 29 items, P < .001.

Fig. 2 Flow diagram of the G-MISS study

Discussion

Patient-reported outcome evaluations have become mandatory [27, 28]. Most regulatory authorities in Organisation for Economic Co-operation and Development (OECD) countries have included such outcomes in their quality assessment framework. In the United States, the Primary Care Assessment Survey (PCAS) was initially developed to measure the quality of service in seven domains of primary care through 11 scales [29] and was further developed and tested to assess performance [30]. The Consumer Assessment of Healthcare Providers & Systems (CAHPS) is used in the hospital setting to discriminate between strategies in Medicare Accountable Care Organizations [8].

In Europe, the UK’s Quality and Outcomes Framework was introduced in 2004 [31]. The General Practice Assessment Questionnaire (GPAQ) was derived from the original PCAS [32, 33, 34] and its psychometric properties were retrospectively tested [35]. The GPAQ is used by the UK department of Public Health and Primary Care as a survey for general practitioners’ revalidations and practices [36]. These tools are oriented toward quality of service and performance assessments, but there is no generic tool intended for the evaluation of patient experience itself across various practices. Moreover, patient satisfaction could be misleading if evaluated with tools that are validated to assess quality of service or inherited from the consumer field [5]. Taking patient expectations into account should help physicians with patient interactions and shared decision-making [37, 38].

In our study, the null hypothesis was rejected according to the differences that emerged between the original MISS-29 structure and that of the G-MISS short form reported here. The MISS-29 was specifically developed to assess patients’ experiences and expectations about physician consultations. The item generation relied on patient interviews to assess the cognitive, affective, and behavioral dimensions of patient-physician interactions [4]. The original set of items demonstrated good wording and comprehension compared with the Consultation Satisfaction Questionnaire [3]. The 29-item form was criticized for its acceptability, with a mean filling time of 12 min. This encouraged some authors to develop a short form of the MISS-29, for use only in general practice [17].

The purpose of the present study was to develop a questionnaire suitable for most consultation settings, by using IRT to select items and structure them into dimensions of experience.

One particular advantage of IRT over CTT is its independence from the population being tested. The G-MISS final version is a self-reported questionnaire of 16 items structured into three dimensions of patients’ experiences and satisfaction with doctors’ consultations. One strength of the questionnaire is its large validation sample of 1822 patients, who were seen in various general practices as well as medical and surgical specialties. The selection process deleted items with equivocal loading in the factorial structure. No item switched from one dimension to another. The items belonging to the original “Rapport” and “Communication comfort” dimensions were merged into a single dimension, “Communication”, after the exploratory factorial analysis. The G-MISS questionnaire showed high levels of internal validity across its three dimensions, confirming that patients’ experiences are a multidimensional concept. Accordingly, the reliability indices of the three dimensions were satisfactory.

No notable differential item functioning was reported, suggesting that the questionnaire is equally reliable in every consultation setting it was intended for, i.e. general practice, medical specialties, and surgical specialties. The three-dimensional structure assesses different aspects of patients’ experiences. The G-MISS dimensions were named after the original MISS-29 questionnaire [13].

The “Relief” dimension assesses the alleviation of illness-related stress. It has been reported that patients experiencing symptoms of chronic or acute conditions worry about the potential impact of their disease on their life [39]. By relieving stress, physicians could improve health status and well-being [40]. The “Communication” dimension assesses the communication comfort between the patient and doctor.

It has been previously reported that patients consider communication to be one of the most important physician skills [41], but even if technically sound, the physician’s communication may appear inappropriate to the patient [42]. A recent systematic review stressed the lack of tools to assess physicians’ communication [43]. The “Compliance” dimension reports the patient’s intent to follow the doctor’s recommendations. It is well known that medical consultation influences patient compliance [44]; for example, specific attention should be given to verbal interactions, especially in long-term treatments like blood pressure control programs [45].

The items from the “Compliance” dimension were below the usual range of acceptance for the Infit in the general practice population [25]. One could hypothesize that there is an overlap between compliance and communication or relief, because of an indirect connection between these dimensions. A systematic review pointed out the link between physician-patient communication and outcomes in primary care [43].

The level of explanation given by the physician has been correlated with patient compliance [45] and could have influenced the “Compliance” and “Relief” or “Communication” dimensions assessed by the G-MISS scale. In contrast, inappropriate behavior from a physician demonstrating nervousness or anger has been associated with less compliance from the patient and could also influence several dimensions of the patient’s experience [46]. The level of Infit was satisfactory in the global population, medical specialties, and surgical specialties, for all dimensions.

The level of satisfaction in the “Relief” dimension was significantly lower in general practice than in medical or surgical specialties.

Some patients may not feel sufficiently reassured by their general practitioner’s explanations and need a referral to a specialist [39]. Specialists could be more likely to use situation-specific reassurance strategies while standard communication skills tend to stay generic [47]. These “Relief” dimension findings translated into the results for the global index with specialists’ interviews reporting higher satisfaction scores. Lack of practitioner’s time is classically reported as a cause of dissatisfaction for patients [48].

Although the perceived consultation duration has been associated with better patient satisfaction, the actual length of consultation has produced controversial results [49, 50]. One could hypothesize that the use of non-validated questionnaires may explain the lack of difference [51]. In our study, interviews that lasted less than 5 min were significantly correlated with lower satisfaction scores in the “Relief” and “Communication” dimensions and in the global index. The G-MISS correlated with patients’ type of employment, which is consistent with the original questionnaire. The external validity of the G-MISS scale was further supported by correlations with the emergency context, the educational level, and the number of consultations in the last 6 months.

Among patients with a rate of missing values >20%, the number of items answered fell sharply from item 15 to item 29. This reinforces the case for a short form to improve the applicability of a questionnaire intended for everyday practice. The final version of the G-MISS questionnaire is 16 items long and took about 6 min to complete. Despite this short form, the G-MISS questionnaire still explained 54.5% of the total variance.

The level of education was not correlated with the rate of missing answers, suggesting that the questionnaire can be widely administered to patients able to read and complete an online survey. Its applicability in various medical fields of primary care underlines the value of the questionnaire in a patient-centered care approach.

Conclusion

The G-MISS questionnaire is a valid and reliable short-form questionnaire to assess patients’ experiences and satisfaction with physician consultations in general practices, medical specialties, and surgical specialties. The multidimensional structure relies on IRT and assesses different aspects of patients’ experiences that could be useful in clinical practice and research settings.