Sampling and weighting of the Austrian Psychiatric Prevalence Survey (APPS)

Summary Mental disorders are common and have severe consequences for patients, their relatives, and society. Mental health care planning requires precise knowledge of the prevalence of psychiatric disorders and details of the treatment provided. Because administrative data lack information on persons not in contact with health services, epidemiologic studies are needed that deliver nationwide information on the prevalence of psychiatric disorders. This in turn requires adequate sampling procedures to collect reliable data that allow accurate estimation of mental health care needs as well as over- and underprovision. This is the purpose of the Austrian Psychiatric Prevalence Survey (APPS). The present technical report describes exactly how a nationwide sample was drawn using a stratified cluster sampling scheme. Because such a complex sampling procedure requires adequate weighting to obtain unbiased population estimates, this report also details the steps for calculating the corresponding weights. In this way, the report not only fully discloses the sampling strategy of the APPS but may also serve as a best-practice example for similar endeavours.
Electronic supplementary material The online version of this article (10.1007/s40211-019-0305-6) contains supplementary material, which is available to authorized users.


Introduction
Planning adequate mental health care in Austria requires knowledge of the prevalence of psychiatric disorders, the frequency of treatment provided, and the need for treatment [1]. Numerous surveys have shown that mental disorders are common and frequently have severe consequences. For example, increased rates of sickness absence and costs for society due to mental disorders have been reported [2][3][4]. Based on administrative data, some authors have reported increasing rates of unemployment due to mental disorders [5], which led to the assumption that the prevalence of mental disorders is increasing.
However, administrative data are limited by the fact that they can consider only those who are in contact with health services, but lack information about those not seeking treatment [6]. Thus, the estimation of the frequency of mental disorders among the population and its consequences requires data on the general population. Findings from other countries cannot be transferred, because they differ with respect to their regulatory environment (e.g., health services, training of medical staff or regulations regarding unemployment), population composition, geographical structures, and many other factors. Therefore, the Austrian Psychiatric Prevalence Survey (APPS) was planned in order to assess the frequency of psychiatric illness, of health service utilization, of the need for psychiatric treatment and the validity of psychiatric screening tools among the general population [7][8][9][10].

The Quest for a "Representative" Sample
Although "representativity" is a frequently used term, it should be used with caution because it lacks a clear definition. For example, Stephan [11] tried to narrow down the term but arrived at a descriptive statement (e.g., "resembles the population", p. 32) rather than a mathematically sound definition from which concrete actions could be deduced. In this vein, Kish [12] states that "Representative sampling is a term easier to avoid because it is disappearing from the technical vocabulary." (p. 26).
Putting aside the lack of a clear definition, there is also no single universal procedure that ensures "representativity" for any population. Rather, the specific structure of the population studied and the research question have to be carefully considered. Kish [12] requires the definition of a population "in terms of (1) content, (2) units, (3) extent, and (4) time." (p. 7). He illustrates these terms by means of a consumer survey, in which (1) could refer to all persons, (2) to family units, (3) to the US, and (4) to 1965 (ibid.). For the corresponding specification regarding the present study, see Sect. "Target Population and the Sampling Framework". These requirements cannot be fulfilled with a convenience sample (the outcome of which is entirely unpredictable) or any other simple sampling procedure. Rather, we have to carefully develop a sampling strategy allowing for an adequate collection of prevalence data.
When seeking meaningful population data in the context of mental health epidemiology, one has to consider carefully which population characteristics should be represented adequately. The most fundamental variables to be taken care of are a respondent's sex and age. Next, we have to consider the quality of medical care (including administrative aspects), which we cover by distinguishing between the rural and the urban population. Although several other aspects would be worth considering as well, we have to limit the requirements to available information (see Sect. "Address Source and Time Frame").

Scientific Demand and Standards -the Objective of this Report
To be able to gauge the extent to which study results can be generalized, we have to be aware of how a sample has been drawn. However, information on sampling is frequently incomplete or even entirely lacking. For example, Wancata et al. [9] demand a "checklist of methodological requirements [...] (e.g. sampling methods, [...])" (p. 407). The present report follows this demand and explains in detail the sampling and weighting scheme of the APPS.
The motivation for this report is to give a full account of the intricacies of obtaining a nationwide representative sample, beyond the sparse details usually found in articles (which often claim "representativity" of their sample while providing little convincing evidence, if any). In contrast, the APPS discloses the rationale of how the sample has been drawn in full detail.
This article is structured as follows: After describing the population to be covered in Sect. "Target Population and the Sampling Framework", we explain the sampling procedure in Sect. "The Sampling Procedure". Because the sampling comprises probability sampling [e.g., [13][14][15]], we have to determine the corresponding weights to take the selection probability correctly into account. This step is described in Sect. "Weighting".

Target Population and the Sampling Framework
According to the official governmental data base [17], approximately 5.5 million inhabitants aged 18-65 years (the target age group of the APPS) were living in Austria in 2014 (Table 1). Austria is organized in a total of 9 provinces. One of the nine provinces, Vienna ("Wien"), is both a municipality and a province, and at the same time the capital of Austria. Each of the other eight provinces also has a capital.
Overall, the provinces are organized into a total of 117 political districts (Table 2, column 2) including the capitals, which serve as districts of their own. It is a peculiarity of the Austrian population distribution that the capital Vienna is by far the largest city in the country, with a population of (approximately) 1.8/8.8 million (21.3%), and 1.2/5.6 million aged 18-65 years (21.3% as well). The second largest city is Graz (the capital of Styria ["Steiermark"]) with a population of 250,000 (i.e., about 1/7 of Vienna) and overall just six cities with a population of 100,000 or more. Therefore, we treat Vienna rather as a province than a municipality, covering 23 districts.

Sample Size Considerations
Because several analyses involving various procedures are planned, an overall power analysis cannot be performed. Therefore, we calculated as follows: We expected a prevalence of roughly 10% for each of the two largest diagnostic groups, namely affective disorders (F3 according to ICD-10 [18]) and anxiety, dissociative, stress-related, somatoform and other nonpsychotic mental disorders (F4). Moreover, all analyses shall be performed separately for male and female respondents. Targeting about 50 affected respondents per diagnostic group and sex requires about 500 respondents per sex (50/0.10), resulting in a total of approximately 1,000. This number matches financial and logistic considerations and it is comparable to similar studies [e.g., [19][20][21]]. To ensure that this target is reached even with a low response rate, we decided to contact a gross sample of 18,000 addresses.
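The arithmetic behind these figures can be retraced with a minimal sketch. The function name and its parameters are illustrative assumptions; the inputs (10% prevalence, 50 cases per subgroup, two sex groups, and the gross factor of 18 mentioned in Sect. "Stratification According to Province and Sex") are taken from the text.

```python
import math
from fractions import Fraction


def required_net_sample(prevalence, cases_per_subgroup, n_sex_groups):
    """Respondents needed so that each sex group is expected to contain
    the targeted number of cases of a disorder with the given prevalence."""
    per_sex = math.ceil(cases_per_subgroup / prevalence)
    return per_sex * n_sex_groups


# Exact fraction avoids floating-point surprises when taking the ceiling.
net_target = required_net_sample(Fraction(1, 10), 50, 2)

# Gross number of addresses: net target times 18 (factor stated in the report).
gross_addresses = net_target * 18

print(net_target, gross_addresses)
```

Under these assumptions the sketch yields the net target of 1,000 respondents and the gross sample of 18,000 addresses.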

The Sampling Procedure
To obtain valid prevalence measures, a sample representative of the Austrian population was required. However, a simple random sample was not feasible, because we did not have access to a population register. Moreover, data acquisition was carried out by trained interviewers; hence, a simple random sample would likely have resulted in prohibitive travelling efforts and costs for the interviewers. Therefore, we decided to apply a cluster sampling scheme based on geographical regions [e.g., 22, Ch. 12]. This scheme allowed for employing regional interviewers and thus kept the travelling expenses within affordable limits.
The patient's sex is a key variable determining both the diagnosis of mental illness and the provision of respective health services. We therefore also stratified the sampling with respect to sex (ibid., Ch. 11). Furthermore, because the supply of health services differs considerably between urban and rural areas, we also took this information into account, arriving finally at a multi-stage stratified cluster sampling scheme (ibid., Ch. 13).

Stratification on Province
Due to the federal structure of Austria, the 9 provinces have key responsibilities in certain public health issues. We therefore decided to represent them accordingly in the sample and stratified in a first step with respect to the provinces.

Cluster Sampling of Districts
Data collection is based on face-to-face interviews, so we have to take the interviewers' routes to the respondents' households into account. Cluster sampling requires a full list of predefined clusters from which a random selection can be performed. Our address source includes the respondents' districts; hence, we decided to use the district as primary sampling unit in this step. Austria has a total of 117 districts. Based on logistic and financial considerations, a total of about 40 districts was targeted.
Additionally, the province capitals also play a key role with respect to structural and administrative aspects. Therefore, the following cluster sampling scheme was developed: All 8 provincial capitals, being districts of their own, were used. Due to the specific structure of city sizes mentioned in Sect. "Target Population and the Sampling Framework", this decision was made to represent the urban population accordingly. Due to their structural role, the provinces have to be represented evenly. Therefore, the remaining 32 (= 40 − 8) districts were selected proportional to the number of districts in each province (see Table 2, column 3). After rounding, this calculation resulted in a total of 34 districts to be sampled, 28 rural and 6 urban (see Table 2, columns 4 and 5). These 34 districts were sampled at random from the list of all districts per province, excluding the respective provincial capital (except for Vienna, where 6 districts were sampled at random). Together with the 8 fixed capital districts, we thus arrived at a total of 42 districts, which are listed in Table 2, last column.
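The allocation and selection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy provinces "A", "B", "C", their district counts, the target of 4 districts and the seed are hypothetical and not taken from Table 2, and the helper names are invented for this sketch. It also shows why rounding can push the total above the target, as happened with 34 instead of 32 districts.

```python
import math
import random


def allocate_districts(district_counts, n_to_allocate):
    """Allocate districts to provinces proportionally to their number of
    districts, rounding each share to the nearest integer."""
    total = sum(district_counts.values())
    return {p: math.floor(n_to_allocate * k / total + 0.5)
            for p, k in district_counts.items()}


def sample_districts(districts, capital, n_random, rng):
    """Keep the capital (if any) with certainty and draw n_random of the
    remaining districts at random without replacement."""
    pool = [d for d in districts if d != capital]
    chosen = rng.sample(pool, n_random)
    return ([capital] if capital is not None else []) + chosen


# Hypothetical toy country with three provinces (not the APPS data).
toy_counts = {"A": 8, "B": 8, "C": 5}
allocation = allocate_districts(toy_counts, 4)
print(allocation, sum(allocation.values()))  # rounding overshoots the target of 4

rng = random.Random(2014)  # fixed seed only for reproducibility of the sketch
districts_a = [f"A{i}" for i in range(1, 9)]
selected_a = sample_districts(districts_a, capital="A1",
                              n_random=allocation["A"], rng=rng)
print(selected_a)
```

For a province without a separate capital (the Vienna case), `capital=None` draws all districts at random.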

Stratification According to Province and Sex
The row percentages of Table 1 show that the two sex groups are virtually of equal size if taken across the entire country (49.95 : 50.05), and also the province shares do not exceed a ratio of 51 : 49. We, therefore, decided to target the same overall number of men and women.
Next, we wanted to represent the nine Austrian provinces and the two sex groups adequately in the sample. For that purpose, we applied the proportion of men and women within each province (Table 1, columns headed "col %") to the total sample to be drawn (i.e., 500 men and 500 women, see Sect. "Sample Size Considerations"), obtaining the target sample size for each province. The rounded values are given in the last three columns of Table 1. Next, we split the province target sample size proportionally among the selected districts according to Table S1 in the supplementary file, columns 4 and 6 (headed "%"). The resulting frequencies for each district are given in the last two columns of Table S1 (rounded to integers). These frequencies were multiplied by 18 (i.e., 6 per wave, see Sect. "Address Source and Time Frame") to obtain the gross number of addresses to contact.
Table 3 Selection probability (in %) of male and female respondents by province
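The two-step proportional split and the gross-up by 18 can be illustrated with a short sketch. All population figures, province labels and district labels below are hypothetical placeholders (the actual values are in Table 1 and Table S1), and the helper name is an assumption made for this example.

```python
import math


def proportional_targets(population, total_target):
    """Split total_target proportionally to the given population counts,
    rounding each share to the nearest integer."""
    total = sum(population.values())
    return {key: math.floor(total_target * count / total + 0.5)
            for key, count in population.items()}


# Step 1: 500 men distributed across (hypothetical) provinces.
male_population = {"P1": 600_000, "P2": 300_000, "P3": 100_000}
province_targets = proportional_targets(male_population, 500)

# Step 2: province P1's target split across its (hypothetical) selected districts.
p1_selected_districts = {"d1": 120_000, "d2": 60_000, "d3": 20_000}
district_targets = proportional_targets(p1_selected_districts,
                                        province_targets["P1"])

# Step 3: gross number of addresses per district (factor 18 from the report).
gross_addresses = {d: 18 * n for d, n in district_targets.items()}

print(province_targets)
print(district_targets)
print(gross_addresses)
```

The same steps would be repeated for women, since the sampling is stratified by sex.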

Weighting
From the procedure described above, we obtained a sample covering a proportional share of both districts (see Table 2) and respondents (stratified by sex; see Table 3). Regarding districts, the overall selection probability was 0.36 (ranging across provinces from 0.26 to 0.50, however, because of rounding effects due to the small numbers involved). Regarding respondents, we find a selection probability of 0.018% for both male and female respondents (which, owing to the large numbers involved, could be matched with high precision).
However, notwithstanding the proportional allocation of districts and sex with respect to province, the sample is not self-weighting, because we performed a random selection of districts based on the number of districts in each province. They were not drawn with a probability proportional to their size, which has to be compensated for. Moreover, all provincial capitals were deliberately included, which can be seen as a complete count given the specific city size distribution of Austria (cf. Sect. "Target Population and the Sampling Framework"). Therefore, cities have been selected with a probability of one (with the exception of Vienna, which was treated as a province). Thus, we have to handle the fixed and the randomly selected districts differently.
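These selection probabilities can be checked from the counts given earlier in the report (42 of 117 districts, 6 of Vienna's 23 districts, and about 1,000 respondents out of roughly 5.5 million inhabitants). The following lines are only a plausibility check; that the lower bound of 0.26 belongs to Vienna is an inference from these counts, not a statement of the report.

```python
# Overall share of sampled districts: 42 selected out of 117.
district_rate = 42 / 117

# Vienna: 6 of 23 districts were sampled at random.
vienna_rate = 6 / 23

# Respondents: about 1,000 out of approximately 5.5 million inhabitants.
respondent_rate_percent = 100 * 1_000 / 5_500_000

print(round(district_rate, 2))
print(round(vienna_rate, 2))
print(round(respondent_rate_percent, 3))
```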

Calculating Design Weights
Note: In the following, we will use capital letters to indicate population-based figures and lower case letters for sample-based figures. Stratification is indicated by a superscript in brackets, the subscript d denotes references to the district and subscript p to the province. The symbols N and n denote (true) population and sample frequencies, M projections, and w and W denote weights. The symbols m and f refer to male and female.
We start with the probability of choosing a district at random. This was done with respect to the number of districts of each province. If $K_p$ is the number of all districts of a province (col. 2 of Table 2) and $k_p$ the number of districts chosen from this province (last col. of Table 2), then the probability of drawing a given district is

$$P_d = \frac{k_p - 1}{K_p - 1}. \tag{1}$$

Note that for the special case Vienna, there is no provincial capital, hence we used $K_p$ and $k_p$ rather than $K_p - 1$ and $k_p - 1$, respectively.
Second, we calculated the probability of a person to be drawn from the selected districts. Due to the stratification according to sex, we had to perform this calculation separately for men and women. If $N_d^{(m)}$ and $N_d^{(f)}$ denote the numbers of male and female inhabitants of district $d$, and $n_d^{(m)}$ and $n_d^{(f)}$ the respective sample sizes, then the selection probabilities within the district are

$$P_{i|d}^{(m)} = \frac{n_d^{(m)}}{N_d^{(m)}}, \tag{2a}$$

$$P_{i|d}^{(f)} = \frac{n_d^{(f)}}{N_d^{(f)}}. \tag{2b}$$

Hence, the probability of randomly drawing an individual (so far irrespective of the district's size) is the product

$$P_i^{(m)} = P_d \cdot \frac{n_d^{(m)}}{N_d^{(m)}}, \tag{3a}$$

$$P_i^{(f)} = P_d \cdot \frac{n_d^{(f)}}{N_d^{(f)}}. \tag{3b}$$

Taking the inverse of Eqs. (3a) and (3b) yields intermediate district projection weights $W_d$,

$$W_d^{(m)} = \frac{1}{P_i^{(m)}}, \tag{4a}$$

$$W_d^{(f)} = \frac{1}{P_i^{(f)}}. \tag{4b}$$

Multiplying the $W_d^{(\cdot)}$ with the sample size $n_d$ of the respective district yields the intermediate district projections

$$M_d^{(m|f)} = W_d^{(m|f)} \, n_d^{(m|f)} \tag{5}$$

(introducing the generic notation $(m|f)$ to indicate the separate application of the formula according to the stratification by sex). Eq. (5) lays the foundation to generalize from the chosen districts of a province to the entire province. For that purpose, we have to take the sum of the $M_d$ across all districts of a province $p$ to obtain the (intermediate) province projections

$$M_p^{(m|f)} = \sum_{d \in p} M_d^{(m|f)}. \tag{6}$$

However, these estimates are biased, because we have not yet considered the district size when randomly selecting the districts in Eq. (1). $M_p^{(\cdot)}$ would over-estimate the respective province totals $N_p^{(\cdot)}$ if we sampled (by chance) rather large districts, or under-estimate them if there were more of the small districts of the respective province in our sample (therefore, the projections in Eqs. (5) and (6) were termed "intermediate").
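A numerical sketch may help to follow the chain from selection probabilities to intermediate projections. It assumes that the district selection probability is (k_p - 1)/(K_p - 1) for randomly drawn districts, k_p/K_p for a province without a separate capital, and 1 for a provincial capital, and that the within-district probability is n_d/N_d. The province, its district sizes and sample sizes are hypothetical, and the function names are invented for this illustration.

```python
from fractions import Fraction


def district_probability(K_p, k_p, is_capital=False, has_capital=True):
    """Probability that a district enters the sample (assumed form)."""
    if is_capital:
        return Fraction(1)                 # capitals were included with certainty
    if has_capital:
        return Fraction(k_p - 1, K_p - 1)  # capital removed from both counts
    return Fraction(k_p, K_p)              # Vienna-like case without a capital


def intermediate_projection(N_d, n_d, p_district):
    """Return (W_d, M_d) for one sex group in one district."""
    p_person = p_district * Fraction(n_d, N_d)  # overall selection probability
    W_d = 1 / p_person                          # intermediate projection weight
    M_d = W_d * n_d                             # intermediate district projection
    return W_d, M_d


# Hypothetical province: 7 districts, 3 sampled (capital plus 2 drawn at random).
K_p, k_p = 7, 3
districts = {
    "capital": {"N_d": 40_000, "n_d": 20, "is_capital": True},
    "d1": {"N_d": 30_000, "n_d": 15, "is_capital": False},
    "d2": {"N_d": 12_000, "n_d": 6, "is_capital": False},
}

projections = {}
for name, d in districts.items():
    p = district_probability(K_p, k_p, is_capital=d["is_capital"])
    projections[name] = intermediate_projection(d["N_d"], d["n_d"], p)

# Intermediate province projection: sum of the district projections.
M_p = sum(M_d for _, M_d in projections.values())
print({k: (int(w), int(m)) for k, (w, m) in projections.items()}, int(M_p))
```

If the hypothetical province actually had, say, 150,000 inhabitants in this sex group, the intermediate projection would overshoot it, which is the situation the rescaling step is meant to correct.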
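The district rescaling factor $R_d$ corrects for this bias, again taking into account that the provincial capitals (indexed $c$) were deliberately chosen:

$$R_d^{(m|f)} = \begin{cases} \dfrac{N_p^{(m|f)} - N_c^{(m|f)}}{M_p^{(m|f)} - M_c^{(m|f)}} & \text{if } d \neq c, \\[2ex] 1 & \text{if } d = c. \end{cases} \tag{7}$$

We obtain the corrected district projections $\tilde{M}_d$ by multiplying the intermediate projections (5) by the rescaling factor, i.e.,

$$\tilde{M}_d^{(m|f)} = R_d^{(m|f)} \, M_d^{(m|f)}, \tag{8}$$

and the province projections $\tilde{M}_p$ by taking the sum across all districts of a province, which, as a matter of fact, equal the province size, i.e.:

$$\tilde{M}_p^{(m|f)} = \sum_{d \in p} \tilde{M}_d^{(m|f)} = N_p^{(m|f)}. \tag{9}$$

To obtain point estimates of population parameters, such as the mean or frequency estimates, for example, we need the respective corrected weights. These are obtained analogously by multiplying the intermediate district projection weights by the rescaling factor, i.e.,

$$\tilde{W}_d^{(m|f)} = R_d^{(m|f)} \, W_d^{(m|f)}. \tag{10}$$

However, to remain with the sample frequencies, we may simply apply a sample rescaling factor $r = n/N$ using the sample size $n$ and the population size $N$, and obtain the sample district weights

$$w_d^{(m|f)} = r \, \tilde{W}_d^{(m|f)}. \tag{11}$$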
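The effect of the rescaling step can be sketched numerically. The sketch assumes a rescaling factor of (N_p - N_c)/(M_p - M_c) for randomly drawn districts and 1 for the capital, which is one form consistent with the requirement that the corrected projections sum to the province size; all figures are hypothetical, the function name is invented, and n and N are taken at the level of the toy province purely for illustration.

```python
from fractions import Fraction


def rescaling_factors(M, N_p, N_c=0, capital=None):
    """Factor per district so that the rescaled projections sum to N_p.
    Capitals keep a factor of 1 (assumed form of the rescaling)."""
    M_p = sum(M.values())
    M_c = M[capital] if capital is not None else 0
    R = Fraction(N_p - N_c) / Fraction(M_p - M_c)
    return {d: (Fraction(1) if d == capital else R) for d in M}


# Hypothetical intermediate results for one sex group in one province.
M = {"capital": 40_000, "d1": 90_000, "d2": 36_000}   # intermediate projections
W = {"capital": 2_000, "d1": 6_000, "d2": 6_000}      # intermediate weights
n_d = {"capital": 20, "d1": 15, "d2": 6}              # district sample sizes
N_p, N_c = 150_000, 40_000                            # province and capital size

R = rescaling_factors(M, N_p, N_c=N_c, capital="capital")
M_corrected = {d: R[d] * M[d] for d in M}   # corrected district projections
W_corrected = {d: R[d] * W[d] for d in W}   # corrected projection weights

# Sample rescaling: weights that sum to the sample size instead of N_p.
n = sum(n_d.values())
r = Fraction(n, N_p)
w = {d: r * W_corrected[d] for d in W}

print(int(sum(M_corrected.values())))        # equals the province size
print(int(sum(n_d[d] * w[d] for d in w)))    # equals the sample size
```

Exact fractions are used only to make the two identities hold without rounding noise; floating-point numbers would do in practice.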

Target Weighting for Age
Age was not considered in the sampling design; therefore, the age distribution of the sample may differ from the respective population distribution. To compensate for effects resulting therefrom, we performed post-stratification weighting using official statistics provided by Statistik Austria. We obtained the frequencies of age groups 15-19, 20-24, 25-29, ... for both sexes. Although the target population of APPS was 18-65 years, which differs slightly from the limits used in the official statistics available, the practical impact was negligible, as it turned out that the observed minimum age in the sample was 20 and only 5 respondents were over 65 (four 66, one 67; these were added to the 60-64 group). The age weights can be determined directly, because only one target variable is involved [cf. 23, Ch. 7]. For each age group $a$, the age weighting factor $w_a$ was obtained separately for male and female respondents as the ratio of the population proportion, based on the population frequency $N_a$, to the sample proportion, based on the sample frequency $n_a$:

$$w_a^{(m|f)} = \frac{N_a^{(m|f)} / N^{(m|f)}}{n_a^{(m|f)} / n^{(m|f)}}. \tag{12}$$

To consider both sampling and age distribution, the weights (11) and (12) must be multiplied, i.e.,

$$w^{(m|f)} = w_d^{(m|f)} \cdot w_a^{(m|f)}. \tag{13}$$

Using these weights will exactly reproduce the age distribution of the Austrian population as determined by Statistik Austria [17].
Table S2 in the supplementary file provides two examples of the weighting effect. They were compiled with SPSS, using the WEIGHT BY statement. The examples cover the two demographic variables residents in household and voluntary/unpaid work. Interestingly, we find generally small differences between the weighted and the unweighted results. The example.xlsx in the supplement illustrates the application of the weighting formulas for Upper Austria.
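The post-stratification step can be illustrated with a small sketch. It assumes that the age weight is the population proportion of an age group divided by its sample proportion, computed within one sex group; the age groups shown, the counts and the function names are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def age_weights(sample_counts, population_counts):
    """Post-stratification weight per age group: population share divided
    by sample share (to be computed separately for men and women)."""
    n = sum(sample_counts.values())
    N = sum(population_counts.values())
    return {a: Fraction(population_counts[a], N) / Fraction(sample_counts[a], n)
            for a in sample_counts}


def combined_weight(design_weight, age_weight):
    """Final weight: product of the design weight and the age weight."""
    return design_weight * age_weight


# Hypothetical counts for one sex group (not APPS data).
sample = {"20-24": 10, "25-29": 30, "30-34": 60}
population = {"20-24": 200, "25-29": 300, "30-34": 500}

w_a = age_weights(sample, population)

# Weighted sample shares now match the population shares.
weighted = {a: sample[a] * w_a[a] for a in sample}
total = sum(weighted.values())
weighted_share = {a: weighted[a] / total for a in weighted}

print({a: float(v) for a, v in w_a.items()})
print({a: float(v) for a, v in weighted_share.items()})
```

Under-represented groups receive weights above 1 and over-represented groups weights below 1, so that the weighted sample mirrors the population age structure within each sex group.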

Example Application
Hence, we see that the weights are easy to apply for frequency tables. For more complex analyses and significance tests, one would use the SPSS Complex Samples module, because the standard errors require a modified estimation routine in the context of design weights.
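For readers working outside SPSS, a weighted frequency table amounts to summing the weights per category instead of counting cases. The following minimal sketch uses hypothetical responses and weights; the function name is an assumption of this example. It yields point estimates only, because correct standard errors require design-based routines as noted above.

```python
from collections import defaultdict


def weighted_frequencies(values, weights):
    """Relative frequencies of the categories in `values`, with each case
    contributing its weight instead of a count of one."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}


# Hypothetical answers to a yes/no item and hypothetical final weights.
answers = ["yes", "no", "yes", "no"]
weights = [2.0, 1.0, 1.0, 1.0]

unweighted = weighted_frequencies(answers, [1.0] * len(answers))
weighted = weighted_frequencies(answers, weights)

print(unweighted)
print(weighted)
```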
Users of R [24] may choose the survey package [25,26], for example, which also allows for applying design weights and calculating the correct standard errors and significance tests.

Discussion
In this report, we presented the sampling rationale and weights calculation for a nationwide epidemiological study in Austria. It comprises a combined strategy involving stratification on province, cluster sampling of districts, stratification on sex, and, finally, random sampling.
The procedure has been specifically adapted to the Austrian population structure. It reflects the distributions of inhabitants across the country (organized in provinces and districts) taking into account the specific role of the Austrian provincial capitals. Thus, the chosen procedure provides a sample that can be considered adequate to obtain results representative of the Austrian population. Moreover, subsequent analyses could focus on indicators of representativity (e.g., by means of a non-responder analysis).
One critical issue is the question of whether the data base used for sampling covers the Austrian population to a sufficient extent. Unfortunately, Austrian law (Meldegesetz 1991, §§ 16a+b) [27] does not allow access to the register of residents (Zentrales Melderegister). We, therefore, had to rely on a commercial vendor. According to a spokesperson, the data base covers approximately 80% of the Austrian population. The authors of a similar study [28] covering six European countries (not Austria) faced a similar problem in the case of France. They also chose to buy telephone numbers from a commercial vendor and reported a comparable coverage (unlisted rate approximately 16-18%; ibid., p. 9).
If the strategies presented here were to be applied to a country other than Austria, the procedure might simplify, because the complexities of Eqs. (1) and (7) need not be applied. These extra steps were required because of the disproportional distribution of city sizes, which were the motivation to select all province capitals. This extra effort may not be necessary for larger countries or countries with more large cities. Thus, our complex sampling approach might serve as a best-practice example for future studies pursuing a similar target.
Funding Open access funding provided by University of Klagenfurt.

Conflict of interest
All authors declare that they have no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.