1 Introduction

Waiting room (WR) surveys are conducted in the waiting room of a health care facility before or after clinical encounters (Hogg et al. 2010). These surveys are commonly used to study patients' attitudes, knowledge, and behaviors related to consultations, as well as other characteristics. However, in many studies the methodological choices for WR surveys are not explained, and response rates are not always reported (Pirotta et al. 2002). Guiding literature on WR surveys for practitioners is scarce and outdated (Pirotta et al. 2002; Hogg et al. 2010). Taking the Total Survey Error approach (Groves et al. 2004; Groves and Lyberg 2010), in this paper we synthesize practical guidelines from prior studies and from our own experience with WR surveys.

To get a better understanding of the advantages and disadvantages of WR surveys, we compare them with two other kinds of surveys: (1) drop-off-pick-up surveys (DOPU; Trentelman et al. 2016) and (2) public intercept surveys (PI; Cowan 1989). In DOPU surveys, surveyors hand-deliver a self-administered paper-and-pencil questionnaire at sample members' homes. In a WR survey, patients waiting for their appointment in a hospital or health care centre are requested to self-complete a questionnaire on particular health care topics; in terms of data collection, WR surveys thus resemble DOPU surveys.

A PI survey entails selecting a public place such as a shopping mall, a public park, an airport or a gas station, and approaching people who happen to be present. In general, public places provide useful opportunities for data collection on research topics with a place-based focus (Jackson-Smith et al. 2016; Rookey et al. 2012). WR surveys therefore resemble PI surveys in terms of sampling: the sampling takes place in a public environment and is based on convenience. These kinds of convenience sampling methods have been criticized (Cowan 1989; Jackson-Smith et al. 2016), but they also offer benefits.

In our comparison of WR versus DOPU and PI surveys, we take both the criticism and the benefits into account by discussing the Total Survey Error approach (Groves et al. 2004; Groves and Lyberg 2010), while also discussing three examples, one of each type of survey (Table 1). These examples (Due et al. 2018; Lin et al. 2014; Lee et al. 2019) were selected because they study similar populations (elderly people) and similar survey topics (social relations, loneliness, and/or accessibility).

Table 1 Examples of WR, PI, and DOPU surveys

2 A comparison of WR versus DOPU and PI surveys

The Total Survey Error approach (Groves et al. 2004; Groves and Lyberg 2010) is a helpful framework for comparing different survey designs on their statistical error properties. The model distinguishes between error sources related to measurement (validity, measurement error, and processing error) and to representation (coverage, sampling, and nonresponse error). Since most criticism of WR surveys concerns error sources related to representation (e.g., the sampling method), we discuss this component first. Table 2 summarizes our comparison of the three survey methods (WR, DOPU, and PI surveys); the comparison is explained in more detail in the next subsections.

Table 2 Comparison of data collection characteristics of WR, DOPU and PI surveys

2.1 Representation: coverage error in WR versus DOPU and PI

Coverage error refers to a mismatch between the sampling frame and the target population. DOPU surveys allow for sampling frames that enable random sampling methods like multi-stage sampling with field listing or address-based sampling using residential address lists (Kalton et al. 2014). Thus, a target population in DOPU surveys can include the general population, and topics for data collection are not restricted. Administration of intricate survey designs, implementation of complex eligibility criteria for respondents, and introduction of respondents to low-interest topics can be reasons to choose a DOPU design rather than a traditional mail survey (Trentelman et al. 2016). In the DOPU example case (Lee et al. 2019), homeowners were the target group, which made the sampling procedure using addresses from the homestead tax exemption recipient list very useful. In WR surveys and PI surveys, the sampling frame is limited to people who happen to be present at the facility during data collection. Thus, sampling is based on a selection of places and periods, though stratified samples of places can be implemented (Bator et al. 2011). In the WR example case (Due et al. 2018), patients were the target group, while the aim was to relate their responses to a matching questionnaire filled out by the general practitioner. Because of that match, with both persons filling out the questionnaire at the same time, recruiting participants in 12 different general practices was a useful choice. In the PI example case (Lin et al. 2014), elderly train users were the target group. Although detailed procedures are not given in the paper, we assume that all train users were approached regardless of their perceived age, since of the original sample of 940 train users, only 122 were elderly. Thus, a target population in WR and PI surveys does not include the general population, and topics for data collection are restricted to place-based topics.

2.2 Representation: sampling error in WR versus DOPU and PI

Sampling error refers to the error that occurs because a sample rather than the entire population of interest is studied. Sampling error can be reduced by drawing a random sample, as is commonly done in DOPU surveys. The sample size could also be increased; however, risks of selection bias then still remain (Taherdoost 2017). The DOPU example case (Lee et al. 2019) indeed used simple random sampling, with the sample size determined by applying the probability sampling formula for a simple random sample (a 95% confidence level and a 5% margin of error, applied to the population size). In PI and WR surveys, the sample size has more impact on the length of the field period and also depends on the number of facilities included; obviously, in waiting rooms the selection of times and days of the week is restricted to the opening hours of the facility. The WR example case (Due et al. 2018) used a three-week period and 12 facilities; the times and days of the week were not mentioned in the paper. Based on statistics available on the average frequency of GP visits in Denmark, the authors argue that their target group were likely to be frequent GP visitors. Such information is useful in estimating sampling error.
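To make the sample size determination concrete, the following is a minimal sketch of the formula referred to above (Cochran's formula for a simple random sample, with finite population correction); the population size used here is hypothetical, as it is not reproduced from Lee et al. (2019).

```python
import math

def srs_sample_size(population_size, margin_of_error=0.05,
                    confidence_z=1.96, proportion=0.5):
    """Required simple random sample size: Cochran's formula
    with finite population correction."""
    # Sample size for an infinite population; p = 0.5 is the most
    # conservative assumption about the population proportion.
    n0 = (confidence_z ** 2 * proportion * (1 - proportion)) / margin_of_error ** 2
    # Correct for a finite population of known size N.
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))

# Hypothetical population of 5,000 addresses: about 357 are needed
# at a 95% confidence level and a 5% margin of error.
print(srs_sample_size(5000))
```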

Problems in PI surveys may arise in terms of frequency bias and length bias: individuals who visit a public space more frequently or who spend more time at such a facility are more likely to be oversampled (Nowell and Stanley 1991). Sampling bias may be adjusted for by using the frequency of visits to a facility in sample weights. Length bias in WR surveys may be affected by unpunctual patients, no-shows, walk-ins, and/or emergencies. These factors may be distributed unevenly over time, but patients' unpunctuality is generally assumed to be independent of their scheduled appointment times (Cayirli and Veral 2003). Both in WR and PI surveys, time-based sampling can be helpful in reducing frequency and length bias in sampling (Bruwer et al. 1996). A simple, but not very efficient design is to sample all eligible time periods with equal probabilities (Sudman 1980). Sampling time periods with probabilities proportionate to the number of sample members expected in the time period is more efficient (Bruwer et al. 1996). The PI example case (Lin et al. 2014) sampled during only two days, between 6 am and 6 pm, and the authors acknowledge the possibility of sampling error in their conclusion.
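As an illustration of the more efficient design, the following is a minimal sketch of sampling time periods with probability proportional to size (PPS); the slot labels and expected patient counts are hypothetical.

```python
import random

# Hypothetical expected patient counts per two-hour slot,
# e.g., derived from appointment records.
slots = {"Mon 8-10": 40, "Mon 10-12": 25, "Mon 12-14": 10,
         "Tue 8-10": 35, "Tue 10-12": 30, "Tue 12-14": 15}

# Draw k survey sessions with probability proportional to the expected
# number of sample members in each slot (PPS, with replacement).
k = 3
sessions = random.choices(list(slots), weights=list(slots.values()), k=k)
print(sessions)

# A respondent's design weight is then (approximately) the inverse of
# the slot's selection probability, k * n_slot / total.
total = sum(slots.values())
design_weight = {s: total / (k * n) for s, n in slots.items()}
```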

2.3 Representation: nonresponse error in WR versus DOPU and PI

Nonresponse error refers to biases that occur because not all sample members agree to participate in the survey. As Eckman and Kreuter (2017) note, there is often a connection between nonresponse and coverage error. For instance, cases may inadvertently be registered as ineligible when in reality they should have been considered nonresponse. These inadequate registrations may occur as a result of sample members learning the eligibility criteria (i.e., when hearing that participants need to be of a certain age, a sample member may report an age outside that range). Especially in surveys where eligible sample members can observe other potential respondents being approached, they are likely to find out about the eligibility criteria and adapt their responses to selection questions accordingly. Thus, to calculate response rates adequately, it is important to correctly assess the eligibility of sample members. In DOPU surveys, delivery of surveys and enumerating the number of eligible respondents depend on actual contact with sample members, and therefore a distinction is often made between the response rate (including non-contacts) and the cooperation rate (excluding non-contacts). Although in the DOPU example case (Lee et al. 2019) no response rate was mentioned, it can be derived from the reported numbers of usable questionnaires and approached addresses (27.5%). In PI surveys, lack of control over access points complicates tracking response rates (i.e., the proportion of actual respondents among eligible sample members). The PI example case (Lin et al. 2014) demonstrates this difficulty, as it does not provide a response rate and does not discuss the assessment of respondents' eligibility. For WR surveys, registration of eligible respondents is more convenient, since they allow for much more control through observation of patients entering and exiting the waiting room. In the WR example case (Due et al. 2018), eligibility of respondents could be assessed in order to compute an adequate response rate (62%).
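The distinction between the two rates can be made explicit in a small calculation; the following sketch uses simplified versions of the standard (AAPOR-style) outcome rates, with hypothetical counts chosen to reproduce the 27.5% figure above.

```python
# Hypothetical disposition counts for a DOPU survey; correctly
# screened-out ineligible cases are excluded from both rates.
completes   = 110
refusals    = 40
noncontacts = 250   # e.g., nobody home at the sampled address

eligible = completes + refusals + noncontacts
response_rate    = completes / eligible                  # includes non-contacts
cooperation_rate = completes / (completes + refusals)    # contacted cases only

print(f"response rate:    {response_rate:.1%}")     # 27.5%
print(f"cooperation rate: {cooperation_rate:.1%}")  # 73.3%
```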

2.3.1 Possibility of advance notification

Response rates can be increased when respondents are notified about the survey some time before the request to participate, especially when this notification is personalized (Lynn 2016; Goldstein and Jennings 2002; Dillman et al. 2014; Vogl et al. 2019). Advance (and also personalized) notification of the survey is possible in WR surveys when respondents make their appointment, whether by phone, mail or email. Likewise, in DOPU surveys it is possible to send personalized advance notice letters. PI surveys cannot be personally announced, but announcements in local newspapers or billboard advertising are possible. Advance notification was not mentioned in any of our three example cases.

2.3.2 Social influence on participation

In WR surveys and PI surveys, multiple sample members are usually present at the same time. This can have both beneficial and detrimental effects on response rates. On the one hand, social validation effects may increase response rates (i.e., believing others are willing to comply increases compliance; Groves et al. 1992); on the other hand, viewing others already complying makes the request less scarce, and as a result sample members view their contribution as less useful. Ongena et al. (2021) indeed found evidence for this scarcity effect: when fewer patients were present in the waiting room, relatively more patients were willing to participate in their survey. In addition, patients accompanied by one or more caregivers were less likely to fully complete the questionnaire than patients waiting on their own. Reasons that Ongena et al. (2021) suggest for this effect are reduced anxiety or increased boredom among patients waiting alone. In PI surveys, people with an aversion to PI surveys (i.e., avoiders; Keillor and Sutton 1993) can avoid eye contact or walk away from surveyors as soon as they notice a PI survey is being conducted. In DOPU surveys, potential respondents are usually unaware of the behaviors of other sample members because they are contacted individually.

2.3.3 Staff requirements

In a WR survey, costs can be kept relatively low when reception staff, instead of specifically hired research assistants, approach potential respondents for the survey. However, this reduces the possibilities of persuading reluctant respondents. In any survey method where surveyors personally approach potential respondents, specifically trained staff are better able to verbally persuade reluctant respondents (Dijkstra and Smit 2002; Ongena and Haan 2016), which can increase response rates (Bowling 2005). Nonetheless, surveyors may differ in their impact on the representativeness of the achieved sample (e.g., Blom et al. 2011; Jäckle et al. 2013; Durrant et al. 2010). Because WR surveys are relatively small-scale, experiments on surveyor behavior can easily be conducted (see Ongena et al. 2021). As the WR example case (Due et al. 2018) shows, staff at the facility can be instructed to recruit participants and distribute questionnaires, which is very commonly done in WR surveys. However, Due et al. (2018) do note that it was difficult to recruit more than 12 practices for participation, though the GP practices that did participate appeared to be representative in terms of geographical spread and solo versus partnership practices. In DOPU and PI surveys, staff need to be hired to visit respondents at their homes or at public places. In PI surveys, the staff usually also administer the survey (i.e., face-to-face), which requires more time and hence increases costs. Neither the PI (Lin et al. 2014) nor the DOPU (Lee et al. 2019) example case provides information on the surveyors.

2.3.4 Time constraints in completion

Time constraints are largest in PI surveys, where immediate participation is usually required, interrupting the respondents' current activities. In WR surveys, the survey mostly aims at preprocess administration (i.e., from the time of arrival to when the patient is called for their appointment; Becker and Douglass 2008). Postprocess completion (i.e., after the appointment, before leaving the hospital) is an option, but it is not recommended, as surveyors would have less control over actual completion. Thus, in WR surveys time is only constrained within the waiting time, and respondents might find filling out a survey a useful task while waiting for their appointment. In WR surveys, the request is also not as sudden an interruption as in PI surveys. Especially when waiting times are long, the willingness to participate may be increased, though when waiting times are exceptionally long, frustration may negatively influence compliance. In addition, the surveyor can monitor the process of filling out the questionnaire from some distance. In DOPU surveys, respondents have the most time to complete a survey. Using a "doorknob" bag allows the surveyor to pick up the completed questionnaire without further contact with the respondent. Nevertheless, a pick-up time, for instance 24 to 72 h later, needs to be agreed upon (Trentelman et al. 2016).

2.3.5 Collection of information from non-respondents

Non-response bias can be adjusted for by weighting survey outcomes with characteristics of non-respondents, though this adjustment is still a topic of debate (Kreuter et al. 2011). The ability to collect information on non-respondents in DOPU surveys depends on interviewers' contact abilities, although information on, for example, neighborhood characteristics (i.e., urbanicity level, presence of gates, etc.) can also be collected when no contact is made. In the DOPU example case (Lee et al. 2019), such neighborhood characteristics seem not to have been registered, as they are not reported. In a WR survey, and to some extent also in a PI survey, it is possible to collect characteristics like the non-respondent's gender, the number of people present in the waiting room or public area, the time of survey administration and (in WR only) the length of time the patient had waited before being approached. Characteristics like age, race, insurance status, employment status, triage acuity level, chief complaint, and triage time can be obtained from non-respondents who are willing to orally answer a small number of questions from a surveyor or who agree to access to their electronic health records (Shaikh et al. 2013). In the WR example case (Due et al. 2018), elaborate information is available on non-respondents, on the basis of which it could be concluded that participants were younger (59% under 75) than non-participants (43% under 75), while gender was distributed equally. In the PI example case (Lin et al. 2014), characteristics of non-respondents seem not to have been recorded.
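As a minimal sketch of how such non-respondent information can feed into weighting, the following adjusts for differential response by age group, loosely echoing the Due et al. (2018) figures; all counts are hypothetical.

```python
# Hypothetical numbers of eligible sample members approached and of
# respondents, split by the age groups reported in Due et al. (2018).
approached = {"<75": 100, "75+": 100}
responded  = {"<75": 59,  "75+": 43}

# Weight = inverse of the group-specific response propensity, so that
# under-represented groups count more in adjusted estimates.
weights = {g: approached[g] / responded[g] for g in approached}
print(weights)  # {'<75': ~1.69, '75+': ~2.33}
```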

2.4 Measurement: validity in WR versus DOPU and PI

Measurement component validity refers to the adequacy of conceptualizations. Survey researchers are interested in concepts that can be measured by means of standardized questionnaire items, to be self-reported by respondents. Valid measurement often requires longer questionnaires. However, long questionnaires are usually not feasible in PI and WR surveys, whereas DOPU surveys can accommodate them. At the same time, when surveyors are distributing questionnaires, it is possible to combine this task with onsite observation of events and readily observable non-verbal behaviors. For instance, Bator and colleagues (2011), using pairs of observers and interviewers, were able to unobtrusively observe littering behavior and survey the same individuals without bias in their PI survey of litterers and disposers. The PI example case (Lin et al. 2014) combined data collected by means of surveys with data on land use, road networks, public transport networks, services, and patronage from governmental institutions. The WR example case (Due et al. 2018) also shows a useful form of data enrichment: responses from patients were matched with a questionnaire filled out by the general practitioner.

2.5 Measurement: measurement error in WR versus DOPU and PI

Measurement error refers to the errors, both systematic and random, that may arise in obtaining responses to items. In general, social desirability concerns are lower for off-site surveys than for onsite surveys; they therefore seem lower for DOPU than for PI and WR surveys. A direct comparison between an on-site WR survey and an off-site mail survey with paper-and-pencil questionnaires revealed that respondents aged below 45 reported higher patient satisfaction ratings on-site than off-site, but no such associations were found for respondents aged over 45 (Burroughs et al. 2005).

In addition, onsite research may have added value especially when specific events and experiences during the event are measured. As Wofinden (2003) states, there is "the need to survey people as they make a journey or 'immediately encounter an experience' in order that their recall of behaviour is better than it would be in a remote location and they focus their attention on the experience." Furthermore, conducting survey research at a location where people are performing activities that are of interest in the survey can create opportunities. For instance, a smart design of the survey procedure allows for combining observation of actual behavior (i.e., littering behavior) with responses to a survey, as shown by Bator et al. (2011). In general, onsite surveys allow researchers to ask people about their behaviours and thoughts as they encounter an experience on the spot. Since respondents are in the midst of the experience, this likely entails optimal recall of behaviour and experiences, though the timing of the survey is essential, since respondents may also adapt their behavior as a consequence of becoming aware of the survey. Of our three example cases, the PI survey (Lin et al. 2014) most clearly shows this connectedness between the experience (being in a train station after just having commuted to that station) and the questions asked in the questionnaire (accessibility of the train station).

2.5.1 Possibilities of mode for data collection

Completion of the survey in WR and DOPU surveys is mostly through a paper-and-pencil questionnaire (as is true for the WR and DOPU example cases), while PI surveys are mostly completed by means of a face-to-face interview (as is true for the PI example case). Moving to electronic web surveys is an obvious strategy in this digital era. Slater and Kiran (2016) conclude that e-mail surveys offer a convenient, low-cost method of regularly surveying patients to improve quality of care, but patients living in low-income neighborhoods are likely to be underrepresented, so this change can also affect coverage rates. Because in WR surveys respondents are seated and stay within view of the surveyor, it is possible to use tablets for digital completion. Although the use of tablets has not been demonstrated to increase response rates for on-site surveys, respondents do enjoy an electronic mode and prefer it over the traditional paper-and-pencil survey (Davis et al. 2012; Fanning and McAuley 2014). The advantages of electronic administration over paper-and-pencil administration in terms of response rates may be greater for longer surveys: the more pages, the less likely respondents will comply. In the early days of tablets, respondents were also fascinated by the technology (Pfaffenberg et al. 2014). For a single survey, the use of tablets may not be very cost-effective, but for multi-year projects with sample sizes over 100 respondents and survey lengths over 5 pages, tablets are expected to be more cost-effective and efficient than paper-based surveys (Hassler et al. 2018; Leisher 2014). Tablet surveys also have a lower average completion time and were found to be fully completed more frequently than questionnaires completed on paper (Hassler et al. 2018). In a mixed-device survey with a student population, however, no effects on data quality and psychometric survey properties were found (Ravert et al. 2015). The expected number of daily survey participants and the time necessary to complete a questionnaire help in estimating the required number of tablets and surveyors (Hassler et al. 2018).
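A rough capacity estimate along those lines can be sketched as follows; this is a back-of-the-envelope calculation, not the procedure of Hassler et al. (2018), and the arrival and completion figures are hypothetical.

```python
import math

participants_per_hour = 12   # expected survey starts per hour (peak)
completion_minutes = 10      # average time to finish on a tablet

# Devices simultaneously in use ~= arrival rate x completion time
# (Little's law); add one spare for peaks, charging and cleaning.
tablets_in_use = participants_per_hour * completion_minutes / 60
tablets_needed = math.ceil(tablets_in_use) + 1
print(tablets_needed)  # 3
```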

However, administration with a tablet used by many respondents requires hygiene measures (Pfaffenberg et al. 2014), and the number of tablets available may be a limiting factor in surveying all sample members at a given time (Hassler et al. 2018). A good alternative to tablets is using QR codes, allowing respondents to complete the survey on their own smartphone or tablet; in DOPU surveys, it is also possible to give a simple URL and login for completion on a PC. In public transport surveys (Guirao et al. 2015) and student surveys (Snyder et al. 2018; Onimowo et al. 2020; Faggiano et al. 2021), QR codes have been shown to reduce administrative time while reaching reasonable response rates (Snyder et al. 2018; Faggiano et al. 2021), and as such are an eco-friendly alternative to paper-and-pencil questionnaires (Onimowo et al. 2020). Since they only need to be scanned, they are also preferred over bit.ly short links, which require users to type and thus allow for errors (Guirao et al. 2015).
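Generating such a code is straightforward; the following is a minimal sketch using the open-source Python package qrcode (the survey URL is hypothetical).

```python
# pip install qrcode[pil]
import qrcode

# Encode the (hypothetical) survey URL and save the image for
# printing on posters or handout cards in the waiting room.
img = qrcode.make("https://example.org/wr-survey")
img.save("waiting_room_survey_qr.png")
```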

Tablets and/or QR codes could also be used in a mixed-mode design allowing respondents to choose their preferred mode. Although in other mixed-mode designs no positive effects on response rates were found (Haan et al. 2014), the highest response rates are commonly found when respondents are offered a choice between paper-and-pencil and web modes (Börkan 2010; Greenlaw and Brown-Welty 2009; Kiernan et al. 2005; McCabe 2004; Schonlau et al. 2003; Sax et al. 2003).

2.6 Measurement: processing error in WR versus DOPU and PI

After data collection, processing errors can occur when preparing data for analysis. These errors, which are often underestimated (Jedinger et al. 2018), can arise in data entry (for non-computer-assisted surveys), in cleaning and editing of data (i.e., coding open-ended answers, assigning variable and value labels, handling missing values, and implementing survey weights), and in tabulation of survey data (Biemer 2010). Computer-assisted data collection, which can be used most easily in WR surveys and only to a lesser extent in DOPU and PI surveys, can prevent a large number of errors, provided that routing errors and implausible values are taken into account when pretesting the software. An important issue in surveys administered in different facilities is that differing selection probabilities due to differences in waiting room circumstances should be corrected for with sampling weights. Decisions on the characteristics used as the basis for the weights, their origin and the weighting method should be described in the methods section (Jedinger et al. 2018).
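A minimal sketch of such a correction is the computation of design weights as inverse selection probabilities per facility; the facility names and counts below are hypothetical.

```python
# facility: (patients present during field period, patients sampled)
facilities = {
    "clinic_A": (800, 200),   # 1 in 4 sampled
    "clinic_B": (300, 200),   # 2 in 3 sampled
}

# Weight = inverse selection probability, so respondents from the
# more lightly sampled facility count proportionally more.
design_weights = {name: present / sampled
                  for name, (present, sampled) in facilities.items()}
print(design_weights)  # {'clinic_A': 4.0, 'clinic_B': 1.5}
```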

3 Discussion

From a comparison of WR, DOPU and PI surveys, we can conclude that although coverage is limited to the target population of health care users, WR surveys yield a useful sample when assessing patients' perspectives on health care. However, a justification of the choices made in selecting potential respondents is important, and problems in representation may arise with specific groups of patients who do not want to participate. An important advantage of WR surveys is that they do not suffer from the problem of non-contact, as household surveys like DOPU surveys do; however, some level of non-contact may arise with patients who did not show up for their appointment at all. Because non-verbal visual contact (and in most cases also verbal contact) is available, eligibility criteria can be clearly assessed in WR surveys, which allows for adequate determination of response rates. The mere availability of paradata, i.e., relevant information on the circumstances of a request to participate in survey research (Kreuter 2013), is helpful in nonresponse research. In addition, WR surveys allow for easy control over the behavior of surveyors, which in turn is useful in experimental designs. The combination of control over surveyors and availability of paradata creates insightful opportunities for investigating the persuasion of respondents.

Importantly, response rates and other relevant methodological details of WR surveys are often not reported. Therefore, it is also not known whether, for instance, frequency and length bias are taken into account in adjusting sampling weights, or whether sampling times were selected with probabilities proportionate to the number of sample members expected in each time period (Bruwer et al. 1996). For potential respondents who refuse to participate, it is still possible to note down their gender and, if they comply with an ultra-brief interview, to quickly ask questions on their age, general frequency of visits and length of stay. Similarly, for respondents who give consent for the full questionnaire, such questions could be asked at the very start of the questionnaire, in order to account for these characteristics in incomplete surveys from respondents who were called to their appointment before answering all items.

The fact that a WR survey is conducted at a specific location where respondents have full access to their thoughts about the experience at that location (i.e., waiting for their health care appointment) could be exploited to a greater extent. Not only does it create an opportunity for optimal measurement in terms of recall (though data could be compromised in terms of social desirability bias), measurement can also be combined with observation of actual behavior and physiological measurement (i.e., biological markers) that can improve the validity of survey measures (Langhaug et al. 2010; Holmes et al. 2007).

Thus, in short, we would recommend:

  • Setting up a WR survey with staff trained as surveyors and observers;

  • In observations, record variables that inform about the circumstances of data collection (i.e., number of people present, differentiated into patients and caregivers, time and date of the request, and number of staff present);

  • In short interviews with (non-)respondents and/or in the main questionnaire, include questions on the frequency and type of visits to the department (i.e., as specific as possible in terms of length of stay), so that frequency and length bias can be adjusted for in sampling weights (see the sketch after this list);

  • Sample time periods with probabilities proportionate to the number of sample members expected in each period.
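A minimal sketch of the frequency-bias adjustment referred to in the third recommendation: a patient who visits the department v times during the field period has roughly v chances of being sampled, so their weight is proportional to 1/v. The data below are hypothetical.

```python
# Hypothetical respondents with their self-reported number of visits
# to the department during the field period.
respondents = [
    {"id": 1, "visits_in_period": 1},
    {"id": 2, "visits_in_period": 4},   # frequent visitor, oversampled
]

# Downweight frequent visitors by the inverse of their visit count.
for r in respondents:
    r["weight"] = 1 / r["visits_in_period"]

print(respondents)   # id 1 -> weight 1.0, id 2 -> weight 0.25
```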