Improving surveys with paradata: analytic uses of response time

This study uses paradata and survey data from the China Family Dynamic Survey to analyze response time in a questionnaire survey. The survey questions were classified into three levels, and we compared inter-group differences in response time among four kinds of questions: information inquiries, logical calculations, intimate information and logical judgments. We used ordinary least squares (OLS) regression to identify the impediments to response time and discussed the roles played by the survey locale and by the characteristics of interviewees and interviewers. The major factors that prolonged response time were those that impeded comprehension of the survey: the difficulties interviewees had understanding questions, questions that did not take into account the different cultural backgrounds of interviewees, the complexity of questionnaires, and the performance of interviewers. Finally, this paper puts forward some suggestions for questionnaire design and investigation processes to reduce impediments to comprehension and improve the ability of interviewers.


Introduction
With large-scale surveys being used in many countries, methods of survey quality assessment are getting attention worldwide and are becoming more diverse. However, assessments mainly focus on the internal consistency of a single survey or on comparisons between surveys. Developing research methods that can evaluate response behavior and non-sampling errors in social surveys remains a major challenge. Interviewees busy coping with the pressures of daily life often lack the time and energy to complete a survey. In addition to the impacts of the survey locale and characteristics of the interviewers, in some cases interviewees give answers to questions that are not entirely truthful (Feng 2010). More and more researchers have come to realize that response behavior is directly related to the quality of survey data (Fowler 2010). Non-sampling errors are generally believed to result from questionnaire design, the interview process and response behavior (Lessler and Kalsbeek 1992). Research on response behavior in a survey needs to rely on paradata from that survey. Paradata concerns the context and processes surrounding survey data collection (Evans 1996; Lin et al. 1996; Kreuter 2013), and it is difficult to observe and record in traditional surveys. Only a handful of surveys have recorded paradata, and the paradata are rarely published, resulting in slow progress in the development of mining techniques for paradata (Zhang et al. 2003; West 2018).
Both paradata and survey data can be used to explain response behavior. Response time, which reflects the understanding interviewees have of survey questions and their willingness to answer questions, provides important information for questionnaire design and is of value for quality evaluations. In addition, data collection methods also affect response rates and the quality of survey data (Mac Elroy 2000). "In particular, the interaction between length of survey (both in terms of content and number of questions) and incentive (either total incentive offered as a prize package or the approximate value of the incentive on an individual basis) has been thought to influence the quality of survey" (Johnson et al. 2006). Specifically, during survey design, the factors affecting the questionnaire include whether the questionnaire is acceptable overall, whether survey plans are well-designed, whether questions are concise and whether questions are put in the correct order. When seeking interviewees, the main factor affecting survey quality is whether the interviewees' contact address and telephone number are obtained. Additionally, special tracking efforts (records of interviewees' tracking and inquiry information) and well-trained interviewers also influence survey quality (Sun et al. 2011). During the data collection stage, the quality of the survey depends on making appointments with interviewees, conducting interviews at different times, being allowed to access interviewees multiple times, the gender of interviewees, establishing trust relationships, and whether there is compensation for participating (He et al. 2008).
Overall, the methodologies used to evaluate survey quality are inadequate, and this creates enormous barriers to evaluating the quality of social surveys. The most common approach to evaluation is to compare key survey variables with data from previous surveys. But to be effective, this approach needs to draw information from many successful investigations. Due to the lack of data support, prior research has focused mainly on survey questions that are not answered for various reasons. Only a few studies have analyzed response time. Response time reflects the understanding interviewees have of the questions and their willingness to answer. Analyzing the characteristics of response time can make good use of paradata for quality assessments, and can also provide valuable reference points for future research.

Concepts
Response time can be defined as the time required for the interviewee to complete the entire questionnaire or answer a single question. The concept can be divided into two types: response time for the questionnaire and response time for each question. Response time for the questionnaire refers to the total time spent on the questionnaire from start to finish. The beginning of the questionnaire is marked by clicking the "Start" button on the tablet and the end is marked by clicking the "Submit" button. The time between the two clicks is the response time for the questionnaire. The response time for a particular question refers to the time between the beginning of the nth question and the beginning of the (n + 1)th question. After answering the (n − 1)th question, interviewees need to click the "Next Question" button to move to the nth question. The time between two "Next Question" clicks is the response time for the nth question. The recording of response times is heavily dependent on the use of CAPI (Computer Assisted Personal Interviewing). CAPI can record the response behaviors of people conducting or taking surveys, e.g., the time interviewers spend asking questions, the time interviewees spend reacting, and the time interviewers spend on clicking. Therefore, the notion of response time used in this paper does not simply refer to the reaction time of interviewees, but to the time spent on all of the actions needed to complete a questionnaire or a single question.
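The two definitions above can be illustrated with a minimal sketch. It assumes that the CAPI log is available as an ordered list of click timestamps ("Start", then one click per completed question, ending with "Submit") and that the "Start" click marks the beginning of the first question; the function names and the sample timestamps are hypothetical and are not taken from the survey's actual paradata format.

```python
from datetime import datetime


def question_response_times(click_times):
    """Per-question response times in seconds.

    click_times is assumed to be an ordered list of datetimes:
    the "Start" click, one "Next Question" click after each question,
    and the final "Submit" click after the last question.
    """
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(click_times, click_times[1:])
    ]


def total_response_time(click_times):
    """Questionnaire response time: seconds from "Start" to "Submit"."""
    return (click_times[-1] - click_times[0]).total_seconds()


# Hypothetical click log for a three-question interview.
clicks = [
    datetime(2014, 7, 1, 9, 0, 0),   # "Start"
    datetime(2014, 7, 1, 9, 0, 9),   # "Next Question" after question 1
    datetime(2014, 7, 1, 9, 0, 21),  # "Next Question" after question 2
    datetime(2014, 7, 1, 9, 0, 40),  # "Submit" after question 3
]
per_question = question_response_times(clicks)
total = total_response_time(clicks)
```

Because every interval between consecutive clicks is attributed to exactly one question, the per-question times in this sketch sum to the questionnaire total, which matches the paper's point that response time covers all actions between clicks rather than the interviewee's reaction time alone.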

Data
The China Family Dynamic Survey, conducted in 2014, used a stratified multistage probability proportional to size (PPS) design. It covered 31 provinces, autonomous regions and municipalities, and targeted 32,500 families from 1624 sample communities in 1560 towns or urban districts distributed among 233 prefecture-level cities and 321 county-level cities and districts. A total of 32,494 families were actually interviewed. During the face-to-face interviews, interviewers used the CAPI system on a laptop or tablet (PAD) to record process information such as the response time for the whole questionnaire or a single question. The Survey used three questionnaires: the Family Questionnaire, the Individual Questionnaire and the Community Questionnaire. The Individual Questionnaire was used in interviews with children, teenagers, adults and the elderly. Adults formed the largest sample of those who completed the Individual Questionnaire, and this study uses these adult questionnaires as the object of analysis. After invalid questions were removed, there were 47 valid questions in the adult questionnaire that focused on the health and lifestyle, work and income, marriage and contraception, and migration and social security of interviewees.

Methods
The purpose of this study is to analyze key determinants of response time in surveys. First, this study used paradata from adult questionnaires that were part of the China Family Dynamic Survey to compare the length of response time for different questions and subjects. We also drew on methods used in other studies to ensure the reliability of measurements. Second, because response time is a continuous variable that is assumed to be approximately normally distributed, OLS regression is an appropriate method to analyze the factors influencing response time.

Response differences by variable
Considering the types of question, content and responses, the adult questionnaire can be divided into information inquiries, logical calculations, intimate information and logical judgments. These questions can also be divided into subcategories for personal information, employment information, time calculations, number calculations, sex and reproductive health, subjective judgments, knowledge judgments and conceptual judgments. Under both the four-category and the eight-category classification, response times differ significantly across groups (Table 1).
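The text does not state which test underlies the group comparison. As one possible illustration, and purely as an assumption, the sketch below computes a one-way ANOVA F statistic for response times grouped by question type. The function name and the sample values (in seconds) are hypothetical and are not drawn from Table 1.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples.

    Each element of groups is a list of response times for one
    question type. This is a textbook one-way ANOVA, used here as an
    assumed stand-in for whatever test the study applied.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical response times (seconds) for the four question types.
samples = [
    [9, 10, 11],   # information inquiries
    [13, 14, 15],  # logical calculations
    [17, 18, 19],  # intimate information
    [21, 22, 23],  # logical judgments
]
f_stat, df_b, df_w = one_way_anova_f(samples)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom would indicate that mean response times differ across question types; obtaining a p-value requires an F table or a statistics library, which this sketch deliberately leaves out.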

Response time for information inquiries
Familiarity with the content of questions is one of the major factors influencing response time. While familiarity depends on an individual's life experiences, memory, and cognitive level, information inquiries are generally easy to answer and had an average response time of 9.73 s. Personal information questions about gender, height and weight are especially easy to understand and had average response times of 9.10 s. However, the response process for employment information, such as job experience, occupation, industry type, and work content, is a bit more complicated and the average response time was longer (10.28 s). It is worth noting that the response time for information inquiry questions varied with the individual's memory and cognitive level. Questions about industry type required interviewees to have an understanding of classification standards, so the average response time for this question was much higher than for other information inquiry questions.

Response time for logical calculations
Unlike information inquiry questions, some questions required both basic knowledge of the question's content and a logical calculation. The response time for these questions was therefore longer than for information inquiry questions. Unit standardization is an important condition for efficient logical calculation. Questions requiring time computation had shorter response times than more complex questions, such as those concerning reading quantity and annual consumption.

Response time for intimate information
Each culture has its own customs and social norms, and as a result, individuals have different attitudes and ways to deal with questions about intimate information. The average response time for intimate information questions was 18.19 s, higher than the response times for information inquiries and logical calculations because personal privacy issues were involved. Moreover, the more intimate the questions were, the longer the response time was. Questions which involved personal memories like the dates of changes in marital status and the first sexual behavior were relatively difficult to answer and the average response time to such questions was more than 20 s. Questions asking whether an individual had had sexual encounters or been pregnant did not require a lengthy thought process to answer, and the average response time was relatively short (generally no more than 10 s).

Response time for logical judgments
Of the four kinds of questions, logical judgment questions had the highest average response time of more than 20 s. This group of questions was more influenced than other kinds of questions by questionnaire design and the investigative process, meaning that interviewees needed more time to figure out the meaning of the concepts. However, the questions based on subjective judgments were relatively easy to answer, and the average response time was 9.73 s. Questions such as "What do you like to read?", "What health problems are you worried about?", and "What chronic diseases do you have?" were harder to answer and the average response time was longer (14.17 s). In general, both interviewers and interviewees had to spend more time understanding questions that touched upon relatively subjective concerns like pressure or relatively abstract and complex concepts like migration experiences, especially in the case of questions that were interconnected with other questions. Response times were longer as a result. In the case of migration, interviewees often had to ask first for clarification of what was meant by the term migration before they could answer questions concerning the frequency, distance and time span of migration. The average response time for these questions approached 40 s and was much longer than the response time for other judgment questions. It is worth noting that once the concept of migration was clarified during the answering of the first question, interviewees had shorter response times answering additional questions about migration.

Response time differences in different survey locales
The internal factors that affected the total response time for the questionnaire were the content and design of the questionnaire. The external factors were related to the survey locale. By comparing total response time for questionnaires completed in the eastern, central, western and northeastern regions of China, we can see that differences in location had an important influence. The total response time for questionnaires completed in Eastern China was the shortest, with an average time of only 7.82 min. The response time in Central China was in the middle (10.73 min), the response time in Western China was higher (11.17 min), and the response time in Northeastern China was the highest (14.9 min). At the same time, the standard deviation of response time in Eastern China was far below that of other regions (Table 2).

Response time difference connected to the characteristics of interviewers
In addition to the effects of questionnaire design and the locale where the survey was administered, individual characteristics of interviewers also influenced the investigative process. Overall, when the interviewers were female, the average response time for completing questionnaires was higher than when interviewers were male. Response times were lower when interviewers had a high education level and higher when their education level was low. It is notable that interviewers who were familiar with CAPI did not complete the questionnaire in less time, and interviewers whose field was family planning spent more time completing questionnaires than interviewers from other occupations (Table 3).

Response time differences connected to the characteristics of interviewees
The characteristics of interviewees were directly related to their level of understanding and how they responded to the questionnaires; these are factors that cannot be ignored. In terms of gender, the average response time of males was lower than that of females, and the standard deviations of both groups were high. Interviewees with agricultural household registrations (hukou) had a lower average response time than interviewees with non-agricultural hukou, and the kurtosis and skewness of the former were higher than the latter (Table 4).
Education is a reflection of the knowledge, understanding and cognitive abilities of interviewees and strongly affects their response time. In general, the higher the education level of an interviewee, the shorter the response time. Specifically, graduate students had the lowest response time, using only 7.5 min to complete the whole questionnaire, a time that was far lower than that of other interviewees with less education. Interviewees who had received no formal education had the highest response time (10.73 min), a time far greater than that of those who had received formal education. However, the average response time of interviewees with a lower level of secondary education was lower than that of interviewees with a higher level of secondary education (Table 5).
Cultural sensitivity also needs to be considered as a factor affecting response time because it has a direct impact on response behavior. This view is supported by the difference in response times between ethnic minorities and people of Han nationality. The response time of many ethnic minorities was much higher than that of people of Han nationality. Among the ethnic minorities, the response time of Tibetan, Dong and Tujia minorities was very high, and this was probably related to the different cultural backgrounds of these interviewees, which affected their understanding of the questionnaire's content (Table 6).

Model design
A descriptive analysis indicates that response time, which is the record of the survey process, is influenced by multiple factors such as questionnaire design, survey locale, and the characteristics of interviewers and interviewees. However, it is difficult to test the diverse ways in which these factors influence response time using descriptive analysis alone. Hence, this research uses OLS regression to analyze the impacts of the survey locale and the characteristics of both interviewee and interviewer on total response time and on the response time for key questions. The influence of the skip pattern is also controlled in the model because the number of questions that interviewees do not answer has an effect on total response time for the questionnaire (Fig. 1).
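A minimal sketch of this kind of model, assuming total response time (in minutes) is regressed on an intercept, an interviewee gender dummy, years of education and the number of skipped questions as the skip-pattern control. The variable names, the synthetic data and the "true" coefficients are all hypothetical; the study's actual specification includes further interviewer and locale variables that are not reproduced here. The solver uses the normal equations with Gauss-Jordan elimination so that the example runs on the standard library alone.

```python
def ols(X, y):
    """Estimate OLS coefficients by solving (X'X) b = X'y.

    X is a list of rows (each including a leading 1 for the intercept)
    and y is a list of outcomes. Gauss-Jordan elimination with partial
    pivoting is used on the augmented matrix [X'X | X'y].
    """
    n, p = len(X), len(X[0])
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular; regressors are collinear")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


# Synthetic design matrix: [intercept, male, years_of_education, skipped_questions]
X = [
    [1, 1, 6, 5],
    [1, 0, 9, 2],
    [1, 1, 12, 8],
    [1, 0, 16, 0],
    [1, 1, 9, 3],
    [1, 0, 6, 10],
]
# Hypothetical coefficients used only to generate noise-free outcomes.
true_beta = [12.0, -1.0, -0.2, -0.1]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta_hat = ols(X, y)
```

Because the synthetic outcomes contain no noise, the estimates recover the generating coefficients; with real paradata the coefficient on the skip count would absorb the mechanical effect of unanswered questions, leaving the other coefficients to be read as effects at a fixed number of skips.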

Characteristics of interviewees
Our findings show that total response time for completing the entire questionnaire was significantly lower for male interviewees than it was for female interviewees. The response time of Han nationality Chinese for completing the entire questionnaire was significantly lower than that of ethnic minority Chinese, and the response time of interviewees with higher levels of education was significantly lower than that of interviewees with lower levels of education. However, interviewees with non-agricultural hukou had markedly longer response times than interviewees with agricultural hukou. Questions about migration are difficult to answer, and this is the main reason that the response time for completing the total questionnaire was longer for non-agricultural interviewees. For instance, interviewers asked "Have you ever had to leave your residence (refers to the town/street where you have lived) for more than six months?" Interviewers also asked "When was the last time you left your residence for more than half a year?" These questions about migration were generally relevant for interviewees from rural areas (most migrants in China have agricultural hukou and move from rural to urban areas), but were likely to confuse interviewees with non-agricultural hukou who had always lived in urban areas. Voice records of the survey process indicate that non-agricultural interviewees often asked interviewers for more details about migration questions and in some cases even terminated the interview. In fact, this study found that the extent to which survey questions were universally applicable could change the response time, something that other studies have also found to be the case (Yan and Tourangeau 2008).
A study of paradata from the US National Family Development Survey indicates that African-American, Hispanic and other ethnic minority interviewees needed more time to answer questions than white interviewees because the design of survey questions embodied the values of mainstream culture. Other studies have found that the reading habits of interviewees have a significant influence on response time. Interviewees who spend more time reading books and news are more likely to be affected by the universality of questions (Couper and Kreuter 2013). Based on the data for the response time to logical calculation questions, retired and unemployed people answered economic questions more quickly than other interviewees. This is because, first, they had a single source of income and the amount was easy to calculate and, second, they were less likely to be concerned about keeping the amount of their income confidential. On the other hand, interviewees experiencing high levels of economic pressure had longer response times to such questions because they needed more time to calculate their incomes and hesitated to report the amount. With respect to the response time for subjective judgment questions, interviewees participating in labor markets reacted faster to questions about economic pressure, work stress and work-life conflicts. Farmers, students and retirees responded more slowly because the questions were not relevant to their current life experiences. In the case of knowledge judgment questions, there was a complex relationship between the education levels of interviewees and their responses. For example, interviewees with higher levels of education took less time to answer questions about chronic disease types, while interviewees with lower education levels took less time to answer questions about health awareness (refers to health issues of concern in Table 1).
A random selection of voice recordings of interviews indicates that interviewees with high education levels paid more attention to their health status and quickly understood questions about chronic diseases (like hypertension, heart disease, diabetes, arthritis, slipped disc, malignant tumors, and cervical spondylosis). However, a number of knowledge judgment questions covering subjects such as smog, water safety, food safety, patient care, medical care, medical insurance, and waste disposal were unclear in intent and confusing. Interviewees usually asked for clarification, for example, asking interviewers whether water shortage problems are part of water safety problems or asking interviewers to explain the difference between "medical care is difficult", "medical care is expensive", and "medical insurance". These questions were raised more frequently by well-educated interviewees who were more likely to ask for clarification than the less educated, who did not ask additional questions and responded more quickly.
With respect to the response times for conceptual judgment questions, interviewees with high education levels took less time than interviewees with low education levels to answer questions about migration experience. Interviewees participating in labor markets needed more time to answer these questions. Moreover, because the concept of migration is multi-faceted and complex, the response time increased rather than decreased for interviewees with good learning and reading habits.

Characteristics of interviewers
With respect to total response time for completing the questionnaire, male interviewers took less time than female interviewers, and interviewers with high levels of education took less time than those with low levels. It is worth noting that interviewers with higher CAPI proficiency actually took more time to complete the questionnaire. Interviewers engaged in family planning work took significantly more time than interviewers working in other fields. It is interesting that interviewers who used CAPI proficiently and had family planning work experience could reduce the response time for many questions, but the total response time for completing the questionnaire was, unexpectedly, greater. A random selection of voice recordings of interviews suggests that family planning workers were more familiar with CAPI than other interviewers and had good relationships with interviewees in their area. Interviewees and interviewers tended to interact more during the part of the survey process that covered subjective judgment questions. Conversations often turned to other topics, resulting in increased response time for completing the whole questionnaire. The characteristics of interviewers did not significantly influence interviewees' response time to logical calculation questions. However, interviewers who were family planning workers could significantly reduce the response time to questions about sex and reproductive health. With respect to knowledge judgment questions, the education and CAPI proficiency of interviewers had no significant influence on the time interviewees needed to think about and respond to questions about whether they suffered from chronic diseases or had other health issues. These findings confirm that it was mainly the characteristics of the interviewees, not of the interviewers, that influenced the response time for knowledge judgment questions.
With respect to conceptual judgment questions, interviewers with high education levels and CAPI proficiency could give quick, concise explanations of the concept of migration and this helped to reduce the response time of interviewees to these questions.

Survey locale
Our results show that interviewees in Northeastern China took the longest time to answer the questionnaires. Interviewees in Western and Central China took less time, and the response time of interviewees in Eastern China was the shortest. These findings reveal that reaction speeds differed from region to region, and also reflect the different backgrounds and habits interviewees from different regions have when they are answering questions. The explanation could be that, compared to people living in the relatively more developed Eastern China, people in Central and Western China have a weaker sense of time and more biased understandings, which lengthened their response times. In contrast, the sense of time and cultural customs are the major reasons for long response times in Northeastern China, where the average education level is high. This conclusion is supported by the much shorter response times in the Northeast for questions about migration experiences and other abstract concepts. Furthermore, the response times for subjective judgment, knowledge judgment, and conceptual judgment questions in Northeastern China were far lower than the response times for these questions in other regions. There was also no significant difference in the response times between Northeastern China and other regions for information inquiry, logical calculation, and intimate information questions. This implies that interviewees in Northeastern China were quick to understand the survey questions. However, the time spans from clicking "start answer" to answering the first question and from completing the final question to clicking "submit" were longer for interviewees from the Northeast than they were for interviewees from other regions. This lengthened the response time for completing the entire questionnaire.
In other words, the interviewers and interviewees in Northeastern China spent less time interacting about questionnaire content, but more time before starting the survey and after completing the process (Table 7).

Conclusions and discussion
This article uses a study of paradata to discuss the relationships between questionnaire design, the characteristics of interviewers and interviewees, the survey locale and the response time for the questionnaire as a whole and for individual questions. First, this study finds that there is a significant correlation between the question type and the response time; for example, interviewees needed more time to answer logical judgment questions than other questions. Second, the characteristics of interviewers and interviewees had a significant impact on response time. Well-educated, male Han interviewees responded more quickly than interviewees with other characteristics, and well-educated male interviewers also played a significant role in decreasing response times. Third, there is a certain similarity in response times for questionnaires completed in the same area. However, response times varied considerably from region to region. The time spent starting and ending the interview process in Northeastern China was remarkably longer than in other regions. These conclusions reflect the fact that variations in survey response times are the result of complex factors, but at the same time, the conclusions also suggest that analyzing response times provides a basis for controlling investigative quality. The conclusions can also support a design process that improves questionnaire quality, especially by reducing non-sampling errors caused by cross-cultural bias and regional differences in large-scale social surveys. The conclusions also provide references in support of innovative decision-making that can help in recruitment and the development of training programs for interviewers, and assist in the implementation of measures to make investigations more robust.

Understanding limitations and response efficiency
The limitations of questionnaires are a major factor affecting response time. There are three principal aspects to limitations: the ability of interviewees to understand the questionnaire, culture compatibility of the questionnaire and the interviewees, and the complexity of the concepts in questions. First, the empirical analysis results show that the higher the education level of interviewees, the shorter their response times. Results for a variety of different types of questions show that interviewees with higher levels of education understood questions more quickly and responded significantly more quickly than other interviewees. Second, although the survey interviewed ethnic minority people whose cultural background differs from that of Han Chinese, questionnaire design was not sensitive to different cultural backgrounds, and this was not conducive to improving response efficiency. The response time of ethnic minority interviewees was quite long. Questionnaire design did not take cultural differences into consideration and, as a result, ethnic minority interviewees often had inaccurate understandings of questions. Third, the more complex the question was, the more difficult it was for interviewees to understand the true meaning of the question and for interviewers to explain the question to interviewees. This problem was particularly noticeable in the category for logical judgment questions. The education levels of interviewers and interviewees had no significant influence on the length of the response time to this type of question. The educational levels and cultural backgrounds of interviewees are givens, meaning that questionnaire design must take these factors into account.
Using simple, everyday language to ask questions, improving the cultural compatibility of the questionnaire, and simplifying the computations needed to answer questions will reduce response time and make the interview process more efficient.

Development of good interviewers
Research into social investigation methods generally concludes there is a positive correlation between the quality of interviewers and the efficiency of investigations. This conclusion, however, does not apply to all cases, because even well-trained interviewers cannot reduce response times for all types of questions. First, careful recruitment and training of skillful, well-educated interviewers can help reduce the response time for logical judgment questions. In other words, there is a positive correlation between the effects of the training interviewers received and their efficiency in dealing with complicated questions. Second, skillful, well-educated interviewers may increase the response time for subjective judgment questions. This shows that interviewers tend to over-interpret, and this can have a negative impact on survey efficiency to some extent, especially when interviewees must make subjective judgments to answer questions. Therefore, training is needed to help interviewers find a middle ground between explanations that provide insufficient detail and excessive, overly detailed interpretations.

The relationship between response time and survey quality
It is important to pay attention to the light paradata sheds on items such as non-response behavior and response time. This is crucial because current social sciences research procedures need more and better data to fully control the various types of errors in the survey process. We must explore every possible way to reduce such errors. The relevance to survey results of response time, which serves as the time record of an investigation, deserves more attention because it can serve as an important indicator to measure non-sampling errors. First, there is an essential correlation between the length of the response time and the results of an investigation. Some studies have concluded that there is a noticeable negative correlation between response time and response enthusiasm; a longer response time for the baseline survey will decrease the probability of follow-up in the second round (Olson 2013). Therefore, starting from the premise of ensuring that a specific amount of data can be collected and then improving questionnaire design to reduce response time should be seen as a method to increase the follow-up rate. Second, the length of the response time has a direct impact on the answers given. The longer the average response time to a question, the greater the discrepancies among the answers to the question and the higher the standard deviation. With logical calculation and knowledge judgment questions, for example, interviewees whose response times were more than two standard deviations from the mean gave less objective answers, and there were more contradictions in the logical associations. The possibility of extreme values in the answers also increased. We should attempt to identify cases of investigations in which there may be problems with response time. Third, there is a significant correlation between the standard deviations of the response time and of the answers. When response times have a high standard deviation, the survey data usually have a high standard deviation as well.
The standard deviation of the response time is therefore an effective measure for evaluating the quality of questionnaire design. This article is limited to discussion of the characteristics of response time and the role it plays in the investigation process and in the relationship between interviewers and interviewees. It does not offer a detailed look at the relationship between response time and the quality of the investigation. Further in-depth studies that consider these issues would have crucial practical value and methodological implications for the improvement of social surveys.
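The two-standard-deviation screen mentioned in this section could be sketched as follows. This is only an illustration under stated assumptions: the threshold is applied to the absolute deviation from the sample mean of a single question's response times, the function name is hypothetical, and the sample times (in seconds) are invented rather than taken from the survey.

```python
from statistics import mean, stdev


def flag_unusual_response_times(times, k=2.0):
    """Return indices of interviews whose response time deviates from
    the mean by more than k sample standard deviations.

    The cutoff k=2 follows the two-standard-deviation rule discussed
    in the text; treating deviations in both directions as unusual is
    an assumption of this sketch.
    """
    m, s = mean(times), stdev(times)
    return [i for i, t in enumerate(times) if abs(t - m) > k * s]


# Hypothetical response times (seconds) to one logical calculation question.
times = [9, 10, 11, 10, 9, 11, 10, 10, 9, 45]
flagged = flag_unusual_response_times(times)
```

Flagged cases would then be candidates for closer review, for instance by checking their answers for extreme values or logical contradictions. Since a single very long time inflates both the mean and the standard deviation, a more robust cutoff (for example one based on the median) may be preferable in practice.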
Funding: This study is funded by the China Social Science Foundation (Grant 16ZDA089), Construction of China's population data platform.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.