Weight smoothing for nonprobability surveys

Adjustment techniques to mitigate selection bias in nonprobability samples often involve modelling the propensity to participate in the nonprobability sample along with inverse propensity weighting. It is well known that procedures for estimating weights are effective if the covariates selected in the propensity model are related to both the variable of interest and the participation indicator. In most surveys, there are many variables of interest, making weight adjustments difficult to determine, as a suitable weight for one variable may be unsuitable for other variables. The standard compromise is to include a large number of covariates in the propensity model, but this may increase the variability of the estimates, especially when some covariates are weakly related to the variables of interest. Weight smoothing, developed for probability surveys, could be helpful in these situations. It aims to remove the variability caused by overfit propensity models by replacing the inverse propensity weights with predicted weights obtained using a smoothing model. In this article, we study weight smoothing in the nonprobability survey context, both theoretically and empirically, to understand its effectiveness at improving the efficiency of estimates.


Introduction
Probability sampling has been the gold standard for empirical research since its development in the 20th century, based on the work of Neyman (1934) and Horvitz and Thompson (1952), among others. For a sample to be considered probabilistic, and therefore valid for population inferences, it must be drawn under the assumption that all the individuals in the target population have a known and non-null inclusion probability. If any of these conditions do not apply, we have a nonprobability sample instead. The use of such samples in empirical sciences is widespread nowadays thanks to technological development and social media, which allow pollsters and vendors to use new questionnaire administration methods such as online and smartphone surveys. These surveys are usually administered via opt-in panels or by recruiting volunteers via snowball sampling (see Schonlau and Couper 2017 for an extensive review of methods).
Nonprobability survey methods offer several advantages over traditional ones: a critical reduction in costs and time to accomplish the fieldwork (Bosnjak and Tuten 2003; Greenlaw and Brown-Welty 2009; Díaz de Rada 2012), and larger sample sizes in comparison with traditional methods, which are experiencing a decrease in response rates (Kohut et al. 2012). On the other hand, nonprobability sampling induces a selection bias in the estimates, as the participants (or sample individuals) can differ substantially from nonparticipants (Elliott and Valliant 2017).
Several methods are available to reduce selection bias when a probability sample from the same target population is available. Here, we mention Propensity Score Adjustment (PSA), including the tree-based inverse propensity-weighted (TrIPW) estimator proposed by Chu and Beaumont (2019), statistical matching (also referred to as sample matching), as well as doubly robust estimators that combine statistical matching ideas with PSA.
PSA was originally developed to mitigate selection bias in nonrandomized clinical trials (Rosenbaum and Rubin 1983), and it was adapted to the survey nonresponse field shortly after (Little 1986). PSA adapted to the nonprobability survey context as a method to mitigate selection bias was developed by Lee (2006) and Lee and Valliant (2009). With the PSA method, propensities to participate in a nonprobability sample are estimated via classical modelling using a probability sample drawn from the same population. The TrIPW estimator is an extension of the PSA estimator proposed by Chen et al. (2020), where propensities are estimated using a weighted version of the Classification And Regression Trees (CART) methodology (Breiman et al. 1984). The CART algorithm builds a tree that optimizes a homogeneity measure, given a set of covariates, which is then used to estimate propensities.
When the propensity model is properly specified, PSA is able to reduce the bias of nonprobability sample estimates at the potential cost of increasing their variability (Lee 2006; Lee and Valliant 2009; Valliant and Dever 2011; Ferri-García and Rueda 2018). The TrIPW estimator has been shown to be more robust under complex relationships between variables, such as nonlinearities (Chu and Beaumont 2019) and the presence of interactions. An alternative is to pool the probability and nonprobability samples, similar to Lee (2006), and to use machine learning algorithms to model propensities (Ferri-García and Rueda 2020).
Statistical matching follows another model-based approach, whose objective is to predict the unobserved values of the variable of interest in the probability sample. The predictive model is fitted using data from the nonprobability sample. Statistical matching has also been proven to mitigate selection bias in nonprobability samples (Castro-Martín et al. 2020). The combination of both strategies via doubly robust estimators may outperform either approach on its own (Chen et al. 2020).
Despite the benefits of statistical matching techniques, they could have limitations in surveys that collect multiple variables of interest. In those surveys, which are common in practice, each variable of interest may require a specific model to predict its unobserved values in the probability sample. This could become cumbersome if the number of variables of interest is large and increase the risk of model misspecification. The use of weighted estimators, such as the PSA and TrIPW estimators, could provide a reasonable solution, as the same vector of weights would be used to obtain estimates for all of the variables of interest. However, research has shown that propensity techniques are more efficient when the covariates used for modelling the propensities are related to the outcome variables, that is, the variables of interest (Hirano and Imbens 2001; Brookhart et al. 2006). In a survey with multiple variables of interest, a suitable set of covariates may vary between variables. The standard compromise is to include a large number of covariates in the propensity model. This may increase the variability of the resulting estimates due to overfitting, especially when the covariates are weakly related to the variables of interest.
In probability surveys, weight smoothing (Beaumont 2008) has been shown to be effective at reducing the variance of survey-weighted estimators by modelling the survey weights conditional on the variables of interest. The variance of survey-weighted estimators can be large when the design variables are unrelated to the variables of interest. To the best of our knowledge, this technique has not been evaluated in a nonprobability survey context, where the inclusion (or participation) probabilities are unknown and must be estimated. The objective of this study is to examine the adequacy of weight smoothing for nonprobability surveys, both theoretically and empirically, and to explore the situations that could enhance its efficiency.

Weighting in nonprobability surveys
Let U be a target population of size N from which we want to estimate a population parameter, such as the population mean Ȳ = N⁻¹ Σ_{i∈U} y_i, for a given variable of interest y. To this end, we obtain a nonprobability sample s_v of size n_v from the population U. The participation mechanism may depend on features such as self-selection or device availability (computer, internet access, etc.). In this case, the probability that an individual i ∈ U is included in s_v is not known a priori.
Let R_i be the indicator variable which measures whether a given individual i ∈ U has participated or not. We assume that R_i is related to a vector of covariates, x_i (e.g. demographic variables such as region, age and sex, or education), and that participation is not informative, i.e. R_i does not depend on y_i after conditioning on x_i. We define the inclusion (or participation) probability as π_i = P(R_i = 1 | x_i). The participation probability is unknown and assumed to be strictly positive. From these assumptions, if x_i is known for every i ∈ U, the participation probability can be estimated using standard modelling techniques along with maximum likelihood estimation. However, this information is rarely available. An alternative is to use a probability sample s_r of size n_r, drawn from the full population U, that measures x_i for all sample individuals i ∈ s_r. The design weight d_i^r is also available for sample individuals. The covariates x_i are assumed to be observed in both the probability and nonprobability samples, whereas y_i is observed only in the nonprobability sample.
With PSA methods, a parametric model π_i = m(x_i, β) is typically postulated, where m is a known function, such as the logistic function m(x_i, β) = {1 + exp(−x_i'β)}⁻¹, and β is a vector of unknown model parameters. Assuming the participation indicators R_i, i ∈ U, are mutually independent, Chen et al. (2020) proposed a pseudo-maximum likelihood estimator of π_i, computed as π̂_i = m(x_i, β̂), where the estimator β̂ maximizes the pseudo-log-likelihood function

ℓ*(β) = Σ_{i∈s_v} log{m(x_i, β)/(1 − m(x_i, β))} + Σ_{i∈s_r} d_i^r log{1 − m(x_i, β)}   (1)

with respect to β. The design expectation of the pseudo-log-likelihood function (1) is equal to the standard log-likelihood function, which cannot be used unless x_i is observed for all population individuals i ∈ U. The pseudo-log-likelihood function proposed by Chen et al. (2020) may be less efficient but does not require x_i to be observed in the entire population; it only needs x_i to be observed for all individuals in s_v and s_r.
The population mean Ȳ can be estimated by the Hajek estimator

ȳ_w^H = N̂_v⁻¹ Σ_{i∈s_v} w_i y_i,   (2)

where w_i = 1/π̂_i and N̂_v = Σ_{i∈s_v} w_i. Chen et al. (2020) proved the consistency of the weighted estimator ȳ_w^H under regularity conditions, i.e. they proved that ȳ_w^H − Ȳ converges in probability to zero. A number of authors (e.g. Lee 2006; Lee and Valliant 2009) have considered estimating π_i using the pooled sample s = s_r ∪ s_v along with a weighted logistic regression. If the input weights for the logistic regression are chosen as 1 for individuals in s_v and d_i^r for individuals in s_r, the resulting pseudo-log-likelihood function can be written as

ℓ̃(β) = Σ_{i∈s_v} log m(x_i, β) + Σ_{i∈s_r} d_i^r log{1 − m(x_i, β)}.   (3)

It can be observed that (3) is equal to the pseudo-log-likelihood function shown in (1), except for the term Σ_{i∈s_v} log{1 − m(x_i, β)}, which appears (with a negative sign) in (1) only. Beaumont (2020) pointed out that if the participation probabilities π_i are all small, which could be plausible when the participation rate is small, maximizing (3) is approximately equivalent to maximizing the pseudo-log-likelihood function (1) when the logistic function is used. As a result, using (1) or (3) should yield similar estimated participation probabilities when the participation rate is small. However, note that the pseudo-log-likelihood function of Chen et al. (2020) does not require this condition and should be the preferred choice when it is not satisfied.
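As a rough illustration of how such a pseudo-likelihood can be maximized in practice, the sketch below (Python with NumPy; all data are simulated and the sample sizes are hypothetical, not those of the paper) fits the logistic participation model by Newton-Raphson using the gradient and Hessian implied by (1), then computes the Hajek estimate:

```python
import numpy as np

rng = np.random.default_rng(42)
N, n_v, n_r = 10_000, 200, 300
Xv = np.column_stack([np.ones(n_v), rng.normal(size=(n_v, 2))])  # covariates in s_v
Xr = np.column_stack([np.ones(n_r), rng.normal(size=(n_r, 2))])  # covariates in s_r
d_r = np.full(n_r, N / n_r)                                      # SRSWOR design weights

# Under the logistic model, l*(beta) = sum_{s_v} x'beta
#                                      - sum_{s_r} d_i log(1 + exp(x'beta)).
# Newton-Raphson maximization: gradient and Hessian of l*(beta).
beta = np.zeros(3)
for _ in range(100):
    pi_r = 1.0 / (1.0 + np.exp(-(Xr @ beta)))
    grad = Xv.sum(axis=0) - Xr.T @ (d_r * pi_r)
    hess = -(Xr * (d_r * pi_r * (1.0 - pi_r))[:, None]).T @ Xr
    step = np.linalg.solve(hess, grad)
    beta -= step
    if np.max(np.abs(step)) < 1e-10:
        break

pi_v = 1.0 / (1.0 + np.exp(-(Xv @ beta)))   # estimated propensities in s_v
w = 1.0 / pi_v                              # inverse propensity weights

# Hajek estimate of the mean of a y observed only in s_v:
y = Xv[:, 1] + rng.normal(size=n_v)
y_hajek = (w @ y) / w.sum()
```

Because only relative sample sizes drive the fit in this toy setting, the estimated propensities land near n_v/N; the same code structure applies when real covariate signal is present.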
This idea of pooling both samples has recently been used along with nonparametric methods of estimating propensities, such as machine learning classification algorithms (e.g. Ferri-García and Rueda 2020). Similar to logistic regression, these methods are expected to be valid only when the participation rate is small and the design weights d_i^r are used appropriately.
The literature also accounts for other procedures to calculate weights from propensities. For instance, the original literature on PSA for nonprobability samples (Lee 2006; Lee and Valliant 2009) considered the stratification of propensities into g partitions, usually g = 5 following the criteria of Cochran (1968), and the calculation of weights using a correction factor that takes into account the original design weights. Valliant and Dever (2011) considered a similar approach that also involves stratification of propensities. The use of propensity strata may provide some robustness to misspecifications of the logistic model and may reduce the occurrence of extreme weights.
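A minimal sketch of the stratification idea (Python; the propensities are hypothetical and the design-weight correction factor is omitted for brevity): units are grouped into g = 5 quantile strata and each estimated propensity is replaced by its stratum mean before inversion, which caps the most extreme weights.

```python
import numpy as np

rng = np.random.default_rng(3)
pi_hat = rng.beta(2, 50, size=1_000)   # hypothetical estimated propensities
g = 5

# Quintile boundaries of the estimated propensities (g = 5, Cochran 1968):
edges = np.quantile(pi_hat, np.linspace(0.0, 1.0, g + 1))
stratum = np.clip(np.searchsorted(edges, pi_hat, side="right") - 1, 0, g - 1)

# Replace each propensity by its stratum mean, then invert:
stratum_means = np.array([pi_hat[stratum == s].mean() for s in range(g)])
w_strat = 1.0 / stratum_means[stratum]
```

Averaging within strata leaves at most g distinct weights, so the largest stratified weight is necessarily smaller than the largest unstratified weight 1/min(π̂_i).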
The TrIPW estimator of Ȳ, developed in Chu and Beaumont (2019), takes the same form as the Hajek estimator (2), but the estimation of the participation probability π_i is based on an adaptation of the Classification And Regression Trees (CART) algorithm (Breiman et al. 1984). This adaptation accounts for the design weights d_i^r, i ∈ s_r, in a way similar to Chen et al. (2020). As a result, it does not require the participation rate to be small. After the tree has been grown, using an objective function that accounts for the design weights, the nonprobability sample s_v is partitioned into G exhaustive and nonoverlapping homogeneous propensity groups (or terminal nodes), s_{v,g}, g = 1, ..., G. The probability sample is partitioned similarly, using the same decision rules, into G exhaustive and nonoverlapping groups s_{r,g}, g = 1, ..., G. The propensity for each individual i ∈ s_{v,g} is estimated as

π̂_i = n_{v,g} / N̂_g,   (4)

where n_{v,g} is the number of individuals in the nonprobability sample who fall in propensity group g and N̂_g = Σ_{i∈s_{r,g}} d_i^r is the estimated population size of group g. The estimated probability (4) can be obtained using the Chen et al. (2020) method by defining x_i as a G-vector indicating to which group individual i belongs. The creation of homogeneous propensity groups using this weighted CART algorithm is expected to provide some robustness to misspecification of the logistic model. This was shown by Chu and Beaumont (2019) in a simulation study.
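Once terminal-node labels from the tree are available, Eq. (4) reduces to one ratio per node. A toy sketch (Python; the group labels are simulated here rather than produced by an actual weighted CART fit):

```python
import numpy as np

rng = np.random.default_rng(1)
G, N = 3, 10_000
g_v = rng.integers(0, G, size=150)     # terminal-node labels in s_v
g_r = rng.integers(0, G, size=400)     # same decision rules applied to s_r
d_r = np.full(400, N / 400)            # design weights in s_r

pi_g = np.empty(G)
for g in range(G):
    n_vg = np.sum(g_v == g)            # participants falling in node g
    N_g = d_r[g_r == g].sum()          # estimated population size of node g
    pi_g[g] = n_vg / N_g               # Eq. (4)

w_tripw = 1.0 / pi_g[g_v]              # TrIPW weights for the units in s_v
```

In practice a minimum node size (as in the simulations later in the paper) prevents empty or tiny groups, which would make the ratio in (4) unstable.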

Weight smoothing
The application of the weighting methods discussed in Sect. 2 can significantly reduce the participation bias at the possible cost of increasing the variance of the estimates (Lee 2006; Lee and Valliant 2009; Ferri-García and Rueda 2018). This variance is directly tied to the variability of the weights and is amplified when the covariates are weakly associated with the variable of interest. In that case, weighting increases the variance without the benefit of a significant bias reduction. Therefore, it seems reasonable to focus on strategies that reduce the variability of the weights. Beaumont (2008) proposed weight smoothing for probability samples. Our objective is to study this method in the context of nonprobability samples.
Let us assume for a moment that the logistic model holds and that β is known. Assuming N is also known, the population mean Ȳ can be estimated by

ȳ_w^β = N⁻¹ Σ_{i∈s_v} w_i^β y_i, with w_i^β = 1/m(x_i, β).

The superscript β is used to indicate that β is known and to distinguish this case from the one considered throughout this paper, where β is estimated. The weighted estimator ȳ_w^β is unbiased in the sense that E(ȳ_w^β | X, Y) = Ȳ, where X is the N-row matrix formed by the row vectors x_i, i ∈ U, and Y is the N-vector of population y values. Under these assumptions, the nonprobability sample can be viewed as a Poisson sample with known inclusion probabilities π_i = m(x_i, β), and the original weight smoothing method of Beaumont (2008) can be directly applied to improve the efficiency of ȳ_w^β. It consists of replacing the weight w_i^β with the smoothed weight

w̃_i^β = E(w_i^β | s_v, Y).

The basic idea is to extract from the weight w_i^β its relevant component, i.e. the component that is associated with the variable of interest. Assuming the smoothed weight w̃_i^β is known, the population mean Ȳ is estimated by ȳ_w̃^β = N⁻¹ Σ_{i∈s_v} w̃_i^β y_i. The smoothed weight w̃_i^β is generally unknown but can be estimated from sample data by modelling w_i^β conditional on y_i, i ∈ s_v. Beaumont (2008) showed that the resulting estimator of Ȳ remains no less efficient than ȳ_w^β under a linear model. Note that inferences under weight smoothing are conditional on Y alone. As a result, x_i is viewed as random, as well as w_i^β, but only the latter needs to be modelled. In multipurpose surveys, there are multiple variables of interest, so that y_i is a vector and Y is a matrix. The weight smoothing methodology can be extended in a straightforward manner to a vector of variables of interest by modelling the weight w_i^β conditional on the full vector of variables of interest. However, if the number of y variables is large, it may be expected that together they become strongly predictive of the weight w_i^β, thereby reducing the potential efficiency gains. An alternative would be to determine a specific weight smoothing model for each variable of interest, but this would lead to multiple smoothed weights, which may not be attractive to the ultimate users.
With nonprobability samples, the model parameters β are unknown but can be estimated using the Chen et al. (2020) pseudo-likelihood method discussed in Sect. 2. This yields the weight w_i = 1/π̂_i, with π̂_i = m(x_i, β̂), and the corresponding smoothed weight w̃_i = E(w_i | s_v, Y).
The proof of (6) is given in the appendix. Again, the smoothed weight w̃_i = E(w_i | s_v, Y) is generally unknown but can be estimated from sample data by modelling w_i given y_i, i ∈ s_v. For instance, consider the linear model

w_i = h_i'γ + ε_i, with E(ε_i | s_v, Y) = 0 and var(ε_i | s_v, Y) = σ²,   (7)

where the vector of predictors h_i is a function of the variable(s) of interest y_i, and γ and σ² are unknown model parameters. The smoothed weight w̃_i can be estimated by ŵ_i = h_i'γ̂, where

γ̂ = (Σ_{i∈s_v} h_i h_i')⁻¹ Σ_{i∈s_v} h_i w_i

is the least squares estimator of γ. The smoothed estimator of Ȳ becomes ȳ_ŵ = N⁻¹ Σ_{i∈s_v} ŵ_i y_i. After straightforward algebra, it can be shown that

ȳ_ŵ = N⁻¹ Σ_{i∈s_v} w_i ŷ_i,   (8)

where ŷ_i = h_i'α̂ is a predicted value of y_i with

α̂ = (Σ_{i∈s_v} h_i h_i')⁻¹ Σ_{i∈s_v} h_i y_i.   (9)

Therefore, Eq. (8) indicates that smoothing the weight w_i using the predictors h_i is equivalent to smoothing y_i using the same predictors. It can also be shown that

var(ȳ_ŵ | Y) ≤ var(ȳ_w | Y).   (10)

The proof of (10) is given in the appendix. It confirms that the smoothed estimator ȳ_ŵ is never less efficient than ȳ_w when the linear model (7) holds. The magnitude of efficiency gains from weight smoothing depends in part on the strength of the relationship between the weight w_i and the predictors h_i. A weak relationship, and thus a large model variance σ², will tend to increase efficiency gains. The efficiency gains will also tend to be larger when h_i is not a strong predictor of y_i. Instead, if h_i is a perfect predictor of y_i, i.e. there exists a vector α such that y_i = h_i'α, then it can be easily shown that ŷ_i = y_i, i ∈ s_v, and the efficiency gains entirely vanish. Variance reductions may thus vary from one variable of interest to another depending on the strength of their relationship with h_i. On the one hand, overfitting should be avoided as much as possible when choosing the predictors h_i to maximize variance reductions. Variable selection techniques, such as the Least Absolute Shrinkage and Selection Operator (LASSO), can be useful for this purpose. On the other hand, the predictors h_i should be chosen to ensure the linear model (7) holds, at least in its first moment, to avoid introducing bias in the smoothed estimator of the population mean Ȳ.
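To illustrate the mechanism, the following Monte Carlo sketch (Python; all data-generating choices are hypothetical) smooths the weights with a linear model whose single predictor is the variable actually related to the weights, mimicking what a variable-selection step would retain, and compares the variability of the Hajek estimator for a second variable that is unrelated to the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_v, n_sim = 500, 1000
est_w = np.empty(n_sim)                  # Hajek with original weights
est_sm = np.empty(n_sim)                 # Hajek with smoothed weights
for k in range(n_sim):
    y1 = rng.normal(size=n_v)            # related to the weights
    y2 = rng.normal(size=n_v)            # unrelated to the weights
    # Hypothetical inverse-propensity weights: linear in y1 plus noise.
    w = 10.0 + 2.0 * y1 + rng.gamma(shape=2.0, scale=10.0, size=n_v)
    # Smoothing model w_i = gamma0 + gamma1 * y1_i + eps_i, fitted by OLS:
    H = np.column_stack([np.ones(n_v), y1])
    gamma_hat, *_ = np.linalg.lstsq(H, w, rcond=None)
    w_sm = H @ gamma_hat                 # smoothed weights
    est_w[k] = (w @ y2) / w.sum()
    est_sm[k] = (w_sm @ y2) / w_sm.sum()

# Smoothing strips the weight noise that is unrelated to y2,
# so est_sm varies less across replications than est_w.
print(est_w.var(), est_sm.var())
```

Note that including y2 itself among the ordinary-least-squares predictors would leave the estimator for y2 unchanged, consistent with the remark above that efficiency gains vanish when h_i predicts y_i perfectly.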
The most favourable situation for weight smoothing is when none of the variables of interest is related to the weight w_i, so that the smoothed weights are essentially constant. Conversely, the most unfavourable situation would be when the variables of interest are strong predictors of the covariates x_i, and thus of the weight w_i^β, so that w̃_i^β ≈ w_i^β and basically no variance reduction is possible. In particular, this would occur in the extreme and unlikely scenario where all the covariates would also be variables of interest.
An estimator of the variance (10) requires estimating var(ȳ_w | Y). Under regularity conditions given in Chen et al. (2020),

var(ȳ_w | Y) ≈ E{var(ȳ_w | X, Y) | Y}.   (11)

The variance (11) can thus be estimated by estimating the conditional variance var(ȳ_w | X, Y). Chen et al. (2020) proposed linearization and bootstrap estimators of this conditional variance. We denote a consistent estimator of var(ȳ_w | X, Y) by v(ȳ_w). A plug-in estimator of the variance (10) is then obtained by subtracting from v(ȳ_w) an estimate of the variance reduction achieved by smoothing, which involves the residual variance estimator

σ̂² = (n_v − p)⁻¹ Σ_{i∈s_v} (w_i − ŵ_i)²,   (12)

where p is the number of predictors in the vector h_i. Let us now consider the Hajek estimators ȳ_w^H and ȳ_ŵ^H. Using a first-order Taylor linearization, and assuming an intercept is included in the vector of predictors h_i, it can be shown that var(ȳ_ŵ^H | Y) can be approximated by an expression of the same form as (10).   (13)

As a result, the variance reduction for the Hajek estimator ȳ_ŵ^H is asymptotically the same as the variance reduction for ȳ_ŵ. Similar to (12), an estimator of the variance (13) can be obtained by replacing v(ȳ_w) with v(ȳ_w^H), a consistent estimator of the conditional variance var(ȳ_w^H | X, Y), such as the linearization and bootstrap estimators proposed in Chen et al. (2020).
The variance expressions (10) and (13) are valid only for the linear smoothing model (7). In practice, a linear model may not always hold, even after accounting for interactions and/or polynomial effects. Nonlinear smoothing models could also be considered. For variance estimation under nonlinear models, Beaumont (2008) proposed two bootstrap methods that could be adapted to the context of nonprobability surveys. In our simulation studies, described in the next sections, we evaluate the prediction algorithm XGBoost as an alternative to the linear model (7) for the estimation of smoothed weights. The development of theoretical properties of XGBoost for weight smoothing is beyond the scope of this paper.
Artificial data

Nonprobability samples of size n_v were selected without replacement, with probabilities proportional to a function of the covariates x_7, x_8 and x_9, using the sample() function in R. Therefore, the participation probabilities π_i depend on the values of x_7, x_8 and x_9. This participation mechanism was intended to create weights with high variability, a situation where the advantages of weight smoothing might be more visible. The histogram of the propensities π_i, i ∈ U, is provided in Fig. 1; the mean propensity is 0.002, with a standard deviation of 0.00147, and thus a coefficient of variation of 0.7351. The first and third quartiles are 0.00044 and 0.00354, respectively. The variables of interest were created to have different relationships with the covariates and the propensities according to two scenarios:

Sc. 1. No relationship between any variable in (y_1, ..., y_10) and π.
Sc. 2. A relationship between every variable in (y_1, ..., y_10) and π.

Scenario 1 is favourable to weight smoothing, whereas Scenario 2 is unfavourable. In practice, we may expect to have a hybrid between these two scenarios, where some, but not all, covariates that explain π_i are unrelated to the variables of interest.
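In Python, a draw analogous to R's sample(N, n_v, prob = p) can be sketched as follows (the functional form of the selection probabilities below is purely illustrative, not the one used in the simulation):

```python
import numpy as np

rng = np.random.default_rng(11)
N, n_v = 100_000, 200
x7, x8, x9 = rng.uniform(size=(3, N))   # toy covariates

# Illustrative (not the paper's) participation mechanism:
p = np.exp(x7 + x8 + x9)
p = p / p.sum()                          # selection probabilities

# Unequal-probability sampling without replacement:
s_v = rng.choice(N, size=n_v, replace=False, p=p)
```

As in the simulation design, skewed probabilities of this kind produce inverse propensity weights with high variability, which is the setting where weight smoothing is expected to help most.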
The variables of interest were generated according to formulas in which 1_{Sc. 2} is an indicator variable taking the value 1 if the simulation is conducted under Scenario 2 and 0 otherwise. Among (y_1, ..., y_10), there are 6 Bernoulli and 4 Gaussian variables whose parameters depend on the scenario. The vector of population means for Scenario 1 is (0.60, 0.70, 0.50, 0.50, 2.00, 0.18, 0.50, −2.00, 0.18, 0.00), while the vector of population means for Scenario 2 is (0.84, 3.25, 0.21, 0.62, 3.02, 0.28, 0.38, −3.02, 0.12, 1.02). Table 1 contains Pearson's correlation coefficients between the propensities π and each variable of interest for both scenarios. We see that the correlation is nonexistent in Scenario 1 and notable for all the variables in Scenario 2, with different levels of strength caused by the limitations of using this measure on binary variables. Table 2 presents the results of t tests of the equality between the means of π for y_k = 0 and y_k = 1, for k = 1, 3, 4, 6, 7, 9, in Scenario 2.

Real data
The dataset used to experiment in a real-life situation comes from the 2012 edition of the Spanish Life Conditions Survey (National Institute of Statistics 2012). This is an annual survey measuring several aspects of life conditions, such as health status, degree of deprivation and employment conditions, in the Spanish adult population. The survey includes specific modules in each edition; in 2012, the module consisted of a battery of questions regarding household conditions. The sampling design follows a stratified cluster scheme, where the primary units are the households and the secondary units are their members. The total sample size in 2012 was n = 33,579. For its use as a pseudopopulation, the sample dataset was filtered to rule out those individuals and variables with high amounts of missing data. This reduced the dataset to n = 28,210 and 146 variables, from which 61 were selected for the simulations. The sample was subsequently bootstrapped in order to increase its size to 1,000,000. Finally, all the individuals who selected any of the refusal options ("Does not know" or "Does not answer") were also ruled out of the analysis to avoid further problems with rare classes. The final pseudopopulation size for the experiments was N = 990,838.
For the experiments, we chose HS090 (Owning a computer at home) as the volunteering variable, given that its behaviour would be very similar to a variable measuring access to the internet (see Ferri-García and Rueda 2020 for further details on this matter). The extraction of the nonprobability sample s_v was done under two different mechanisms:

- Simple Random Sampling Without Replacement (SRSWOR) from the population who has a computer at home.
- Unequal probability sampling without replacement from the population who has a computer at home, where the probabilities are calculated as

Regarding covariates, two different sets were considered:

- A set of nine demographic variables, namely region, urbanization level, number of members of the household and consumption units (weighted mean of the number of members of the household following OECD criteria, where adults have more weight than teenagers and teenagers have more weight than children), sex, marital status, country of birth, nationality, and whether the individual is currently a student or not.
- A set of eight variables related to economic and material deprivation, namely capacity of the household to make ends meet, minimum income required by the household to make ends meet, whether the household has the capacity to go on holiday, have a meat or fish meal at least every two days, and deal with unforeseen expenses, household under the poverty threshold, person under the poverty threshold, and household in a situation of severe material deprivation.

Experimental design and metrics
The settings of the experiment were kept as equal as possible across all simulation scenarios. Each simulation was run 500 times, drawing probability and nonprobability samples of equal sizes (n_r = n_v = 1,000) using the nonprobability sampling designs described in the previous section to select s_v. The probability sample s_r was selected using SRSWOR in all scenarios, so that d_i^r = N/n_r. Two approaches were applied to estimate the nonprobability sample propensities π_i: weighted logistic regression with main effects only, and the weighted CART described in Sect. 2 (see Eq. 4) with a fixed minimum cell size of 50 and minimum cell impurity of 0.0001. For weighted logistic regression, the model parameters β were estimated by maximizing the pseudo-log-likelihood function (3), which should be approximately equivalent to maximizing the pseudo-log-likelihood function of Chen et al. (2020), given that the participation rates are small in our simulation scenarios (0.002 for the artificial data simulation and 0.001 for the real data simulation).
Two methods were considered for the estimation of the smoothed weights E(w_i | s_v, Y), where the weight w_i = 1/π̂_i is obtained using either weighted logistic regression or weighted CART:

- the XGBoost algorithm (XGB), using the xgboost package in R (Chen and Guestrin 2016);
- Least Absolute Shrinkage and Selection Operator (LASSO) regression, using the glmnet package in R (Friedman et al. 2010).
All 10 variables of interest (described in the previous section) were considered as potential predictors of w_i for both XGBoost and LASSO. The XGBoost algorithm was trained with default hyperparameters: an L1 regularization term of 0.1, an L2 regularization term of 0.0001 and a learning rate of 0.3. The number of rounds was fixed at 50. In the case of LASSO, w_i was modelled using a linear model with main effects only (no interactions). The optimal shrinkage parameter was obtained with a tenfold cross-validation procedure in each run of the simulation.
Relative measures of Monte Carlo bias and Monte Carlo Mean Square Error (MSE) were calculated to allow for the comparison of seven estimators of the population mean Ȳ. These seven estimators can be divided into three categories: the naive unweighted (Unw) estimator, the nonsmoothed (NS) Hajek estimator ȳ_w^H, and the smoothed Hajek estimator

ȳ_ŵ^H = Σ_{i∈s_v} ŵ_i y_i / Σ_{i∈s_v} ŵ_i,

where ŵ_i is the smoothed weight obtained using either XGB or LASSO. The unweighted estimator ȳ_Unw is used as a reference for the comparisons. It can be obtained from the smoothed Hajek estimator by replacing ŵ_i with the average of w_i over the nonprobability sample individuals. It is the most extreme form of smoothing that can be obtained from the linear model (7) with h_i = 1. It is well known that the unweighted estimator may result in significant biases. There are two versions of the NS estimator ȳ_w^H, depending on whether π̂_i was obtained using weighted logistic regression, yielding the PSA estimator of Ȳ, or weighted CART, yielding the TrIPW estimator of Ȳ. There are two smoothed estimators (XGB and LASSO), and each of them has two versions depending on the estimation method for π_i. There are thus four different smoothed estimators.
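The claim that the unweighted estimator is the extreme case of smoothing can be checked numerically: replacing ŵ_i by the average weight in the Hajek form returns the plain sample mean. A short sketch with arbitrary toy values:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=50)                 # toy variable of interest
w = rng.gamma(2.0, 3.0, size=50)        # toy propensity weights

w_const = np.full_like(w, w.mean())     # most extreme smoothing: h_i = 1
unweighted = (w_const @ y) / w_const.sum()

print(np.isclose(unweighted, y.mean()))  # the Hajek form collapses to the mean
```

The constant weight cancels in numerator and denominator, so the result holds for any positive constant, not just the average of the w_i.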
Let ȳ* be any of the seven estimators described above. The Monte Carlo bias, standard deviation and MSE of ȳ* are defined as

B(ȳ*) = (1/500) Σ_{j=1}^{500} (ȳ*_j − Ȳ),
SD(ȳ*) = [(1/500) Σ_{j=1}^{500} {ȳ*_j − (1/500) Σ_{j'=1}^{500} ȳ*_{j'}}²]^{1/2},
MSE(ȳ*) = (1/500) Σ_{j=1}^{500} (ȳ*_j − Ȳ)²,

respectively, where ȳ*_j is the jth simulation replicate of ȳ*, computed from the jth replicates of s_v and s_r. From these quantities, we computed the Monte Carlo absolute relative bias, |B(ȳ*)/Ȳ| × 100%, and the Monte Carlo relative MSE, MSE(ȳ*)/MSE(ȳ_Unw), where ȳ_Unw denotes the unweighted estimator.
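These metrics can be computed with a small helper (Python; the function names are ours, not from the paper, and the relative MSE is taken with respect to the unweighted estimator as in the tables that follow):

```python
import numpy as np

def mc_metrics(estimates, truth):
    """Monte Carlo bias, standard deviation and MSE of replicated estimates."""
    e = np.asarray(estimates, dtype=float)
    bias = e.mean() - truth
    sd = e.std(ddof=0)
    mse = np.mean((e - truth) ** 2)
    return bias, sd, mse

def relative_metrics(estimates, unweighted_estimates, truth):
    """Absolute relative bias (%) and MSE relative to the unweighted estimator."""
    bias, _, mse = mc_metrics(estimates, truth)
    _, _, mse_unw = mc_metrics(unweighted_estimates, truth)
    return abs(bias / truth) * 100.0, mse / mse_unw

# Toy check: replicates centred on the truth have (near-)zero bias,
# and halving the error halves the MSE fourfold.
arb, rmse = relative_metrics([1.1, 0.9], [1.2, 0.8], truth=1.0)
print(arb, rmse)   # approximately 0 and 0.25
```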

Artificial data simulation
The relative bias of the estimators can be consulted in Table 3 for both scenarios. As expected, all the estimators in Scenario 1, where there is no relationship between any of the 10 variables of interest and the participation probability π, show very low bias. This is not the case in Scenario 2, for which each variable of interest is related to π. As expected, the unweighted estimator is the most biased. Both nonsmoothed estimators (PSA and TrIPW) were effective at reducing the bias of the unweighted estimator. The TrIPW estimator achieved reductions of more than half of the original bias for almost every variable. The PSA estimator reduced bias to a lesser extent. The magnitude of bias remains moderate, except for variables y_2 and y_3. Given that participation is not informative and the logistic model is correctly specified, albeit with the inclusion of too many covariates, the bias of the PSA estimator is most likely explained by the presence of very small participation probabilities, so that a non-negligible proportion of individuals (around 47%) never get selected in any of the 500 simulation replicates. Monte Carlo bias occurs if the population mean of those who are never selected is different from the overall population mean. This bias would be expected to decrease if the number of simulation replicates could be significantly increased, so that a smaller proportion of individuals never get selected.
In both scenarios, the application of weight smoothing did not produce significant changes in bias. Weight smoothing is intended to reduce variance, not bias. It is thus not surprising to observe that it did not reduce bias, but it is reassuring to see that it did not significantly increase it either.
The relative MSE, or efficiency, of the estimators for each scenario can be seen in Table 4. Values below 1 indicate that the estimator performed better than the unweighted estimator. Scenario 1 is favourable to weight smoothing, and the unweighted estimator is the most efficient since it corresponds to the most extreme form of smoothing. As expected, the nonsmoothed estimators (PSA and TrIPW) were both less efficient than the unweighted estimator due to the variability of the weights. The TrIPW estimator was less efficient than the PSA estimator, with an MSE around twice that of the unweighted estimator. On the one hand, smoothing using LASSO was very effective at improving efficiency. The LASSO smoothed estimators were almost as efficient as the unweighted estimator, with a relative MSE close to 1. On the other hand, smoothing using XGBoost produced only marginal efficiency improvements. It appears that variable selection is useful for variance reduction, as pointed out in Sect. 3 for the linear model (7). The following hybrid approach might provide better results than LASSO or XGBoost alone: first, select predictors from the variables of interest using LASSO, and then smooth using XGBoost with the predictors selected in the first step.
In Scenario 2, the unweighted estimator is the least efficient due to its bias. Both nonsmoothed estimators improve efficiency by reducing bias, the MSE reduction being more pronounced for the TrIPW estimator. Scenario 2 is unfavourable to weight smoothing, as all variables of interest are related to the participation probability. The smoothed weights ŵ_i are thus expected to lie in the neighbourhood of the original propensity weights w_i. As a result, neither smoothing method produced any significant change in MSE, whether positive or negative.
Overall, considering both scenarios, the TrIPW estimator combined with LASSO smoothing seems to offer the best compromise in terms of both bias and variance. In practice, a scenario in between these two extremes could be expected, where some variables of interest would be related to the propensity weight w_i and others would not. In that case, propensity weighting would contribute to bias reduction for the variables of interest related to the propensity weight, and LASSO smoothing would reduce variance for the other variables, provided the predictors are not too strongly related to the propensity weight.

Real data simulation
The relative bias of the estimators for the two sets of covariates in the real data simulation, when SRSWOR is used to draw s_v from the subpopulation having a computer at home, is reported in Table 5. The unweighted estimator shows a small-to-moderate bias in all cases, except for variable y_5, where the relative bias is slightly above 20%. The nonsmoothed estimators show relative biases similar to those of the unweighted estimator, albeit slightly reduced. This indicates that the covariates are weakly associated with both the variables of interest and having a computer at home. As expected, the smoothed estimators did not further reduce the bias but did not increase it either.

The relative bias of the estimators for the two sets of covariates, when unequal probability sampling is used to select s_v from the subpopulation having a computer at home, is shown in Table 6. The unweighted estimator shows a small-to-moderate bias, except for variables y_3, y_5 and y_6. Again, the nonsmoothed estimators were in general ineffective at reducing bias. Indeed, for variable y_5, the relative bias of the nonsmoothed estimators was significantly larger than that of the unweighted estimator. As expected, the smoothed estimators did not reduce the bias of the nonsmoothed estimators but did not increase it either.

The relative MSE of the estimators for the two sets of covariates under SRSWOR is shown in Table 7. The nonsmoothed estimators are in general moderately more efficient than the unweighted estimator, which may partly be explained by their slightly smaller biases. However, for the artificial variables y_9 and y_10, the relative MSE of the nonsmoothed estimators is larger than 1, albeit only marginally. This indicates that the propensity weights w_i do not exhibit a large variability. As a result, weight smoothing cannot achieve large variance reductions.
The relative MSE of the estimators for the two sets of covariates under unequal probability sampling is shown in Table 8. The nonsmoothed estimators show mixed results, sometimes being more efficient than the unweighted estimator (typically when propensity weighting reduces bias) and sometimes not. For variable y_5, the inefficiency of the nonsmoothed estimators is due to their increased bias, as noted above. Weight smoothing did not achieve large efficiency gains, except in a few cases where the nonsmoothed estimator was less efficient than the unweighted estimator. This limited efficiency improvement might be explained by a somewhat strong relationship between the variables of interest and the covariates, or by a small variability of the propensity weights w_i, so that the smoothed weights may not deviate substantially from w_i.

Discussion
Weight smoothing was introduced by Beaumont (2008) to reduce the variance of estimates from probability samples. It consists of modelling the survey weight conditional on the variables of interest and then replacing the weight with its predicted value. This paper extends this idea to nonprobability samples, where the weight is itself estimated from a propensity model. Our assumption was that weight smoothing could improve efficiency when the propensity model includes covariates that are weakly associated with the variables of interest. First, we showed theoretically that the smoothed estimator is never less efficient than its nonsmoothed version under a linear model for the propensity weight w_i. The magnitude of the efficiency gains depends on how strongly the variables of interest predict the propensity weight. It also depends on how powerful the predictors of the propensity weight are for predicting each of the variables of interest. Then, we designed two simulation studies, based on artificial and real data, to evaluate the properties of weight smoothing. The results showed that weight smoothing may contribute to reducing the MSE, particularly when the nonsmoothed estimators (obtained using weighted logistic regression or weighted CART) are less efficient than the simple unweighted estimator. For instance, this would occur when the variables of interest are weakly related to the covariates used in the propensity model. When the nonsmoothed estimators reduced bias and were more efficient than the unweighted estimator, weight smoothing did not yield significant efficiency gains in our simulation scenarios, although such gains remain theoretically possible.
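Mechanically, the smoothing step amounts to regressing the estimated weight on the variables of interest and substituting the fitted values. The sketch below illustrates this with a simple linear smoother; the data-generating values are arbitrary and not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2_000

# hypothetical variables of interest observed on a nonprobability sample
Y = rng.normal(size=(n, 2))

# hypothetical inverse-propensity weights, partly driven by the first variable
logit = -2.0 + 0.8 * Y[:, 0] + rng.normal(size=n)
w = 1.0 + np.exp(-logit)                 # w = 1/pi for pi = expit(logit)

# weight smoothing: fit a linear model of w on Y, replace w by its fitted value
H = np.column_stack([np.ones(n), Y])     # intercept + variables of interest
alpha, *_ = np.linalg.lstsq(H, w, rcond=None)
w_smooth = H @ alpha

# Hajek-type point estimates of the mean of y1 with both weight sets
y1 = Y[:, 0]
est_nonsmooth = np.sum(w * y1) / np.sum(w)
est_smooth = np.sum(w_smooth * y1) / np.sum(w_smooth)

# the fitted weights strip out the residual weight variability
print(np.var(w_smooth) <= np.var(w))  # True: fitted values vary less than w
```

The variance reduction in the weights is exactly the mechanism exploited: residual weight variability unrelated to the variables of interest contributes variance to the estimator without reducing bias.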
In the real data simulation, there were some notable exceptions to the behaviour described above. In a few cases, the nonsmoothed estimators were largely inefficient, yet weight smoothing could not improve the results. In those cases, propensity weighting contributed to increasing the bias rather than reducing it. This may occur when the propensity model is misspecified (Lee 2006; Ferri-García and Rueda 2020). The resulting increase in MSE was therefore due to an increase in bias, not variance, which explains why weight smoothing could not improve efficiency in those cases, as it is not designed to reduce bias.
Regarding weight smoothing methods, LASSO regression presented better results overall than XGBoost in terms of MSE reduction. LASSO regression involves variable selection, which can be particularly relevant when some variables of interest are weakly related to the propensity weight. As shown theoretically for a linear model, no efficiency gain can be achieved for the estimation of the population mean of a variable when this variable is included in the smoothing model. Therefore, a hybrid method that would first select important variables using LASSO and then apply XGBoost to predict the propensity weight from the variables selected in the first step might be more effective than LASSO or XGBoost alone. This could be investigated in future research.
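The two-step hybrid could be prototyped as below. Everything here is a hypothetical sketch: the data are synthetic, and scikit-learn's GradientBoostingRegressor stands in for XGBoost to keep the example dependency-light (xgboost.XGBRegressor would slot in the same way).

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, p = 1_000, 10

# hypothetical variables of interest; only the first two drive the weight
Y = rng.normal(size=(n, p))
w = 5.0 + 2.0 * Y[:, 0] - 1.5 * Y[:, 1] + rng.normal(scale=0.5, size=n)

# step 1: LASSO screens the variables of interest that predict the weight
lasso = LassoCV(cv=5).fit(Y, w)
selected = np.flatnonzero(lasso.coef_ != 0)

# step 2: flexible smoothing restricted to the selected variables
gbm = GradientBoostingRegressor(random_state=0).fit(Y[:, selected], w)
w_smooth = gbm.predict(Y[:, selected])
```

The screening step keeps variables weakly related to the weight out of the smoothing model, which is where the theoretical variance reduction comes from, while the boosting step can still capture nonlinearities among the retained predictors.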
The nonsmoothed TrIPW estimator (weighted CART) appeared more effective at reducing bias than the nonsmoothed PSA estimator (weighted logistic regression) in a majority of cases. However, the TrIPW estimator was sometimes less efficient than the PSA estimator. In those cases, weight smoothing was effective at reducing the MSE to a level similar to that of the PSA estimator. This suggests that a reasonable default weighting strategy for adjusting for selection bias in nonprobability surveys could be to use weighted CART to obtain the propensity weight w_i, followed by weight smoothing.
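A minimal sketch of that default strategy, assuming a reference probability sample with design weights is available (the sample sizes, covariates and design weights below are made up for illustration, and a simple linear smoother replaces the paper's LASSO/XGBoost step):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n_v, n_ref = 800, 1_200

# hypothetical covariates: the volunteer sample over-represents large x values
X_v = rng.normal(loc=0.3, size=(n_v, 3))
X_ref = rng.normal(size=(n_ref, 3))
d_ref = np.full(n_ref, 50.0)   # hypothetical design weights, reference sample

# stack both samples; z flags participation in the nonprobability sample
X = np.vstack([X_v, X_ref])
z = np.r_[np.ones(n_v), np.zeros(n_ref)]
case_w = np.r_[np.ones(n_v), d_ref]   # reference units represent the population

# weighted CART: each leaf acts as a homogeneous participation-propensity stratum
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50, random_state=0)
tree.fit(X, z, sample_weight=case_w)
pi_hat = tree.predict_proba(X_v)[:, 1]
w = 1.0 / np.clip(pi_hat, 1e-3, None)   # TrIPW-style propensity weights

# follow-up smoothing: regress w on a variable of interest y, keep fitted values
y = X_v[:, 0] + rng.normal(scale=0.5, size=n_v)
H = np.column_stack([np.ones(n_v), y])
alpha, *_ = np.linalg.lstsq(H, w, rcond=None)
w_smooth = H @ alpha
```

The tree's leaves play the role of propensity strata, and the clipping guards against leaves containing no volunteer units; the smoothing step then removes weight variability unrelated to y.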
The present empirical study has some limitations that should be noted. First, we did not consider creating homogeneous propensity strata after logistic regression. This is quite common in the context of survey nonresponse and has the advantage of reducing the occurrence of extreme propensity weights as well as providing some robustness to model misspecification. Second, only two prediction algorithms were considered for weight smoothing. There is currently a wide range of algorithms in the machine learning literature. Further studies could explore other prediction algorithms for weight smoothing or consider other strategies for hyperparameter tuning, which could lead to better results. Finally, the data used for our simulations cover a limited range of situations; for instance, the artificial data simulation only considered a U-shaped distribution for the participation probabilities, and the real data simulation presented a situation where the selection bias was not extremely large. Other, more realistic scenarios should be considered in future research on this topic. In addition, the number of variables of interest in the simulations was fixed at 10. Further research could consider scenarios where the number of variables of interest is significantly larger, as this is likely to have an impact on the properties of weight smoothing estimators.

A Proof of result (6)
Under regularity conditions given in Chen et al. (2020), as a result, and in addition, we also have that, combining (14) and (15), we obtain the first part of the result. To obtain the second part of the result, it suffices to observe that the result var(ȳ_ŵ | Y) ≤ var(ȳ_w | Y) is proven by noting that the second term on the right-hand side of (16) cannot be negative.

B Proof of result (10)
Under the linear model (7), and from result (6) and Eq. (15), we have that, under the linear model (7), we also have that, combining the last two equations, we obtain the following. Assuming the w_i, given s_v and Y, are mutually independent (at least asymptotically), it is also straightforward to show that, and that, where α is given in (9). Combining (17) and (18), noting that ŷ_i = h_i α, and rearranging the terms yields the result.

Table 2
Results of t tests of the equality of the means of π between the binary classes of y_1, y_3, y_4, y_6, y_7 and y_9 in Scenario 2

Table 3
Relative bias (Rel Bias) for each variable, estimator and artificial data scenario

Table 4
Relative MSE (Rel MSE) for each variable, estimator and artificial data scenario

Table 5
Relative bias (Rel Bias) for each variable, estimator and set of covariates when using SRSWOR to draw s_v from the subpopulation having a computer at home

Table 6
Relative bias (Rel Bias) for each variable, estimator and set of covariates when using unequal probability sampling to draw s_v from the subpopulation having a computer at home

Table 7
Relative MSE (Rel MSE) for each variable, estimator and set of covariates when using SRSWOR to draw s_v from the subpopulation having a computer at home

Table 8
Relative MSE (Rel MSE) for each variable, estimator and set of covariates when using unequal probability sampling to draw s_v from the subpopulation having a computer at home