Background

Improving the quality of clinical practice requires measuring quality, and these measurements must be accurate, valid and feasible. The advantages and disadvantages of different methods of measuring the process of care, including both the competence of the clinician and what the clinician actually does, have been well described. Methods include chart abstraction, standardized patients and clinical vignettes. Substantial inaccuracies in administrative data are common, which makes data extraction expensive and validation difficult [1–3]. Compared with standardized patients and chart abstraction, clinical vignettes are an accurate, valid, feasible and inexpensive tool for measuring quality of health care [4, 5]. Thus, clinical vignettes have been used widely to compare quality of clinical care and to assess variation in practice across countries, health care systems, specialties or clinicians [6–11].

Vignette-based surveys for physicians feature open-ended questions rather than multiple-choice questions or checklists. In this way, physicians can give a personal response to each question, which ensures that the survey captures the full range of practice variation [11, 12]. Open-ended questions avoid the "cueing" inherent in multiple-choice questions, which could lead to overestimation of real physician performance. However, the accuracy of closed-ended questionnaires has been demonstrated in several domains: in a study of eyewitness confidence, a closed-ended format yielded a significantly higher rate of accuracy than an open-ended format [13]. Closed-ended questionnaires also maximize response rate and ensure questionnaire completeness [14].

Moreover, different formats yield different answers: in one study, a closed-ended questionnaire produced results reflecting higher willingness to pay for a health care intervention, with different justifications for those evaluations, than did an open-ended one [15]. In a public opinion survey using two response modalities to ask subjects about the most important problem facing the United States, respondents to the open-ended format most often cited political leadership, whereas those responding to the closed-ended format considered violence most important [16].

Given these data, we wanted to evaluate how the response format of clinical vignette-based surveys influences physician responses. Assuming that closed-ended (multiple-choice) questions for vignettes produce different responses than open-ended questions, leading to an overestimation of professional performance, we aimed to determine whether including deceptive response items in closed-ended questionnaires results in a better assessment of professional performance.

Methods

We conducted a prospective randomized study aimed at comparing three response modalities for a vignette-based survey: open-ended questionnaire, closed-ended questionnaire (with only correct response items) and closed-ended questionnaire with deceptive response items mixed with correct items.

Survey

The survey was composed of two parts. The first part was short, identical in each questionnaire, and collected demographic characteristics and specialties of physicians. The second part was the clinical vignette.

Vignette

The vignette reported the history of a fictitious 50-year-old woman with active rheumatoid arthritis who was a candidate for therapy with tumor necrosis factor (TNF)-blocking agents. Considering that TNF-blocking treatment was planned, physicians were asked to answer four questions about their pre-treatment assessment: 1) What specific data are you searching for in this patient's history? 2) What clinical data are you personally searching for during the physical examination? 3) Which biological, radiographic or other tests do you request? 4) What other preventive measures do you take? Physicians were given these questions in one of three questionnaire formats: an open-ended questionnaire (questionnaire A) [see Additional file 1], a closed-ended (multiple-choice) questionnaire with deceptive response items mixed with correct items (n = 73 items) (questionnaire B) [see Additional file 2], or a closed-ended questionnaire with only correct items (n = 35 items) (questionnaire C) [see Additional file 3]. Correct and deceptive response items were formulated by three experts (XM, TP and FL), who based their work on published international and national recommendations for the care of patients receiving this treatment [17–21]: they first determined the correct items and then proposed deceptive items. Each expert drafted 20 deceptive items within 4 categories: patient's history, physical examination, biological, radiographic or other tests, and other preventive measures. After elimination of duplicates, 53 candidate items remained, of which only the most believable were kept, yielding the 38 deceptive items that were mixed with the 35 correct items in questionnaire B.

Scoring

Responses to questionnaire A were coded to allow comparison with those of the other two questionnaires. For each item, the response was classified as "item correctly selected", "item incorrectly selected", "item correctly not selected" or "item incorrectly not selected." We classified each item according to three sources: an evidence-based literature search of clinical practice concerning TNF-blocking drug management, international and national guidelines, and a French clinical tool guide on the use of TNF-blocking agents developed by an expert panel of academic and community physicians [22]. From these sources, we developed two checklists of items for the pretreatment assessment for TNF-blocker therapy: a long version extracted from the French clinical tool guide on the use of TNF-blocking agents [22], with detailed data on the search for possible contraindications (Table 1), and a short version extracted from the same clinical tool guide and from the French agency for health care recommendations, containing the items mandatory in France (Table 2) [23].

Table 1 Long version of the checklist of correct items in the pretreatment assessment for TNF-blocker therapy, extracted from the French clinical tool guide [22]
Table 2 Short version of the checklist of correct items in the pretreatment assessment for TNF-blocker therapy, extracted from the French agency for health care recommendations (mandatory in France) [23]
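
For illustration, the scoring scheme described above amounts to classifying each questionnaire item against the checklist of correct items. The following minimal sketch (written in Python, with hypothetical item names, and not the coding procedure actually used in the study) makes the four categories explicit:

```python
# Minimal sketch of the four-way scoring scheme.
# Item names are hypothetical examples, not the study's checklist.

CORRECT_ITEMS = {"tuberculin skin test", "chest X-ray", "history of recurrent infection"}
DECEPTIVE_ITEMS = {"blood sugar level", "systematic lung specialist advice"}
ALL_ITEMS = CORRECT_ITEMS | DECEPTIVE_ITEMS

def score_response(selected, correct=CORRECT_ITEMS, universe=ALL_ITEMS):
    """Classify every item of the questionnaire for one respondent."""
    scores = {}
    for item in universe:
        if item in correct:
            scores[item] = ("item correctly selected" if item in selected
                            else "item incorrectly not selected")
        else:
            scores[item] = ("item incorrectly selected" if item in selected
                            else "item correctly not selected")
    return scores

# A respondent who selects two correct items and one deceptive item:
print(score_response({"tuberculin skin test", "chest X-ray", "blood sugar level"}))
```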

Questionnaire administration

During the 2004 French Society of Rheumatology meeting, rheumatologists were asked to participate in a survey concerning the pretreatment assessment for TNF-blocker therapy, which aims at detecting contraindications to treatment. The survey was conducted on behalf of the Club Rhumatismes et Inflammation (CRI), the division of the French Society of Rheumatology dedicated to musculoskeletal inflammatory diseases.

Questionnaire distribution was randomized until the targeted sample size was achieved, with each physician receiving only one questionnaire format (A, B or C). Rheumatologists were blinded to the hypothesis: in particular, they were unaware of the existence of different response modalities, of the deceptive items in questionnaire B, and of the fact that all items of questionnaire C were correct. Time to complete the survey was limited to fifteen minutes.

Four interviewers were responsible for encouraging participation in the survey, explaining its official nature, checking that all questionnaires were correctly completed in the time allowed, and checking that the randomization was respected. Participation was voluntary, and physicians' responses were kept anonymous.

Statistical analysis

A chi-square test was used to compare the proportion of items selected or not selected across the three questionnaire formats. A p < 0.05 was considered statistically significant. Pairwise chi-square tests with Bonferroni correction (corrected significance level of 0.017) were used to compare variables showing statistically significant differences. Statistical analyses involved use of SAS release 8.2 and S-Plus 6.2.
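
As an illustration of this analysis plan, the sketch below (in Python with scipy, using invented counts rather than the study data, and not the SAS/S-Plus code actually used) runs the overall chi-square test on a 3 x 2 table of item selection by questionnaire format, followed by pairwise comparisons judged against the Bonferroni-corrected level of 0.017:

```python
# Illustrative chi-square comparison of one response item across formats A, B and C.
# Counts are invented for the example; rows are formats, columns are
# [item selected, item not selected].
from itertools import combinations
from scipy.stats import chi2_contingency

counts = {"A": [75, 39], "B": [101, 17], "C": [113, 5]}

# Overall 3 x 2 test at the 0.05 level.
chi2, p, dof, _ = chi2_contingency([counts[k] for k in ("A", "B", "C")])
print(f"overall: chi2 = {chi2:.2f}, p = {p:.4f}")

# Pairwise 2 x 2 tests with the Bonferroni-corrected threshold of 0.017.
for g1, g2 in combinations(("A", "B", "C"), 2):
    _, p_pair, _, _ = chi2_contingency([counts[g1], counts[g2]])
    print(f"{g1} vs {g2}: p = {p_pair:.4f}, significant: {p_pair < 0.017}")
```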

Sample size calculation: Three sets of 100 questionnaires, one set for each of the three questionnaires, were planned for the analysis. Specifically, for pairwise comparisons of the response item "tuberculin skin test," with a sample size of 100 in each of two groups, a two-group chi-square test with a 0.017 two-sided significance level would have 80% power to detect a difference between a 65% proportion in one group and an 85% proportion in the other. Because we expected 15% of questionnaires to be incomplete or non-analyzable, we distributed 350 questionnaires.
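
This figure can be approximately reproduced with the classical two-proportion sample-size formula; the following sketch (in Python, based on the normal approximation, not the exact routine used for the study) illustrates the calculation for p1 = 0.65, p2 = 0.85, a two-sided alpha of 0.017 and 80% power:

```python
# Approximate per-group sample size for comparing two proportions
# (normal approximation; not the exact routine used for the study).
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.017, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)   # critical value for a two-sided test
    z_b = norm.ppf(power)           # quantile corresponding to the desired power
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((num / abs(p1 - p2)) ** 2)

print(n_per_group(0.65, 0.85))  # about 97 per group, close to the 100 planned above
```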

Results

Respondents

Of the 350 questionnaires distributed (114 copies of questionnaire A, 118 of questionnaire B and 118 of questionnaire C), all were completed, and all responses were eligible for further analysis. Table 3 displays the demographic and specialty characteristics of physicians as collected in the identical first part of the survey. Physicians were similar in terms of sex, practice duration and practice modalities. Questionnaire A respondents were younger than respondents to the other two questionnaires. Only two questions were asked about rheumatologists' experience with TNF-blocking drugs: 69.4% had already prescribed anti-TNF therapy and 43.1% had access to a checklist for screening potential contraindications in their department.

Table 3 Demographic characteristics and specialties of physicians completing questionnaires A, B and C.

Questionnaire responses

Although we expected 15% of questionnaires to be incomplete or non-analyzable, we did not observe any missing data for either the open-ended or the closed-ended questionnaires.

Reporting of the pre-treatment assessment differed significantly by questionnaire format. Compared with the two closed-ended questionnaires, the open-ended questionnaire yielded lower rates of items correctly selected and correctly not selected (Table 4).

Table 4 Percentage of physicians who correctly selected all items of the short or long checklist for pretreatment assessment for TNF-blocker therapy, and who selected specific items, by questionnaire (A, B and C), with pairwise chi-square comparisons

In terms of global results, none of the questionnaire A respondents proposed all the response items of the long checklist, whereas 5.0% and 5.9% of questionnaire B and C respondents, respectively, correctly selected all of these items (Table 4).

For questionnaires A, B and C, 50.4%, 84.0% and 95.0% of respondents, respectively, correctly selected all mandatory response items of the short checklist. When focusing on individual items within the short checklist, questionnaires B and C did not differ in responses to the item "order chest X-rays," although a difference was observed between the open-ended and closed-ended questionnaires (p < 0.0001) (Table 4). In contrast, respondents to questionnaires A, B, and C all differed significantly for another mandatory item, "obtaining a tuberculin skin test": 65.8%, 85.7% and 95.8% of respondents, respectively, identified this item.

Rheumatologists completing the closed-ended questionnaire B, with deceptive response items, often chose these items, such as systematically seeking the advice of a lung specialist (26.1%) or determining blood sugar level (40.3%), whereas none of the questionnaire A respondents spontaneously proposed them. Questionnaire B respondents tended to select a lower percentage of correct items than questionnaire C respondents. The open-ended format allowed collection of qualitative data on items that we did not propose in the closed-ended questionnaires, such as "give information to the patient on potential adverse effects" or "give information to the patient on monitoring these drugs."

Discussion

We compared three clinical vignette-based survey response formats: an open-ended questionnaire, a closed-ended (multiple-choice) questionnaire with cued correct items only, and a closed-ended questionnaire with deceptive items mixed with correct items. As expected, use of a closed-ended questionnaire with cued items overestimated physicians' performance as compared with an open-ended questionnaire, which is considered the gold standard in assessing practice [5, 12, 24]. Also as expected, the open-ended questionnaire supplied more information on clinical practice than the closed-ended questionnaires; for example, physicians spontaneously reported that they would provide information to the patient. Conversely, although we included response items on examinations or tests, such as cutaneous examination, in the closed-ended questionnaires, none of the respondents to the open-ended questionnaire suggested them.

Our study highlights the difficulty of evaluating the quality of physician performance in specific domains with open-ended questionnaires. Physicians may be briefer with open-ended formats, and their responses may be less accurate. Of the 114 questionnaire A respondents, 74.4% responded with "tuberculosis" to the "other tests" question but gave no specific description of the test or clinical examination they would use to evaluate this tuberculosis risk. For the closed-ended format, we assumed that including deceptive items would influence respondents' answers and reduce the overestimation inherent in closed-format surveys. To our knowledge, this is the first time that deceptive items have been mixed with cued items in a closed-ended questionnaire. Questionnaire B respondents indeed selected fewer correct items than did questionnaire C respondents. However, these results remained very different from those obtained with the open-ended questionnaire (A), which are probably closer to reality.

The influence of the framing of questionnaire items remains crucial in clinical practice evaluation. An acquiescence bias in responses has been reported in a study of two versions of a training satisfaction questionnaire randomly distributed to medical residents: in one version, half the items were stated positively and half negatively; in the other, all items were stated positively. The results showed a significant effect of positive versus negative framing [25].

Conclusion

In conclusion, even if closed-ended questionnaires may provide more accurate data for clinical practice evaluation, the general open-question format retains value in such evaluation. Strategies for generating quantitative and qualitative data from open-ended questionnaires, whether or not combined with closed-ended questionnaires, and for facilitating survey analysis, would be worth developing to improve the evaluation of physician performance [25].