Abstract
We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting, which happens when certain sample averages of the data are close to prescribed values. We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved. Our formulation includes as a special case the selection of a fixed number of the samples, with equal weights, i.e., the problem of selecting a smaller representative subset of the samples. While this problem is combinatorial and not convex, heuristic methods based on convex optimization seem to perform very well. We describe our open-source implementation rsw and apply it to a skewed sample of the CDC BRFSS dataset.
Availability of data and material
All data and material are freely available online at www.github.com/cvxgrp/rsw.
References
Agrawal, A., Verschueren, R., Diamond, S., Boyd, S.: A rewriting system for convex optimization problems. J. Control Decis. 5(1), 42–60 (2018)
Angeris, G., Vučković, J., Boyd, S.: Computational bounds for photonic design. ACS Photonics 6(5), 1232–1239 (2019). https://doi.org/10.1021/acsphotonics.9b00154. ISSN 2330-4022
MOSEK ApS: MOSEK modeling cookbook. https://docs.mosek.com/MOSEKModelingCookbook.pdf (2020)
Bethlehem, J., Keller, W.: Linear weighting of sample survey data. J. Off. Stat. 3(2), 141–153 (1987)
Bishop, Y., Fienberg, S., Holland, P.: Discrete Multivariate Analysis. Springer, New York (2007). ISBN 978-0-387-72805-6
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004). ISBN 978-0-521-83378-3
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2010). https://doi.org/10.1561/2200000016. ISSN 1935-8237, 1935-8245
Centers for Disease Control and Prevention (CDC): Behavioral Risk Factor Surveillance System Survey Data (2018a)
Centers for Disease Control and Prevention (CDC): LLCP 2018 codebook report. https://www.cdc.gov/brfss/annual_data/2018/pdf/codebook18_llcp-v2-508.pdf (2018b)
Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM Rev. 43(1), 129–159 (2001)
Daszykowski, M., Walczak, B., Massart, D.: Representative subset selection. Anal. Chim. Acta 468(1), 91–103 (2002)
Deming, W., Stephan, F.: On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Stat. 11(4), 427–444 (1940)
Deville, J.-C., Särndal, C.-E., Sautory, O.: Generalized raking procedures in survey sampling. J. Am. Stat. Assoc. 88(423), 1013–1020 (1993)
Diamond, S., Boyd, S.: CVXPY: A Python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17(83), 1–5 (2016)
Diamond, S., Takapoui, R., Boyd, S.: A general system for heuristic minimization of convex functions over non-convex sets. Optim. Methods Softw. 33(1), 165–193 (2018)
Domahidi, A., Chu, E., Boyd, S.: ECOS: An SOCP solver for embedded systems. In: 2013 European Control Conference (ECC), pp. 3071–3076, Zurich (2013). IEEE. ISBN 978-3-033-03962-9. https://doi.org/10.23919/ECC.2013.6669541
Estabrooks, A., Jo, T., Japkowicz, N.: A multiple resampling method for learning from imbalanced data sets. Comput. Intell. 20(1), 18–36 (2004)
Fougner, C., Boyd, S.: Parameter selection and preconditioning for a graph form solver. Emerging Applications of Control and Systems Theory, pp. 41–61. Springer, Cham (2018)
Fu, A., Narasimhan, B., Boyd, S.: CVXR: an R package for disciplined convex optimization. J. Stat. Softw. 94, 1–34 (2019)
Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. Recent Advances in Learning and Control. Lecture Notes in Control and Information Sciences, pp. 95–110. Springer, London (2008)
Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1 (2014)
Gurobi Optimization: Gurobi optimizer reference manual. https://www.gurobi.com/wp-content/plugins/hd_documentations/documentation/9.0/refman.pdf (2020)
Heckman, J.: The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Ann. Econ. Soc. Meas. 5, 475–492 (1976)
Holt, D., Smith, F.: Post stratification. J. R. Stat. Soc. Ser. A 142(1), 33–46 (1979)
Horvitz, D., Thompson, D.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952)
Iannacchione, V., Milne, J., Folsom, R.: Response probability weight adjustments using logistic regression. Proc. Surv. Res. Methods Sect. Am. Stat. Assoc. 20, 637–642 (1991)
Jones, E., Oliphant, T., Peterson, P.: SciPy: Open source scientific tools for Python. http://www.scipy.org/ (2001)
Kalton, G., Flores-Cervantes, I.: Weighting methods. J. Off. Stat. 19(2), 81–97 (2003)
Karp, R.: Reducibility among combinatorial problems. Complexity of Computer Computations, pp. 85–103. Springer, Boston (1972). https://doi.org/10.1007/978-1-4684-2001-2_9. ISBN 978-1-4684-2003-6 978-1-4684-2001-2
Kish, L.: Weighting for unequal \(p_i\). J. Off. Stat. 8(2), 183–200 (1992)
Kolmogorov, A.: Sulla determinazione empirica di una legge di distribuzione. G. Ist. Ital. Attuari 4, 83–91 (1933)
Kruithof, J.: Telefoonverkeersrekening. De Ingenieur 52, 15–25 (1937)
Kullback, S., Leibler, R.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Lambert, J.: Observations variae in mathesin puram. Acta Helvitica, physico-mathematico-anatomico-botanico-medica 3, 128–168 (1758)
Lavallée, P., Beaumont, J.-F.: Why we should put some weight on weights. In: Survey Methods, Insights from the Field (SMIF) (2015)
Lepkowski, J., Kalton, G., Kasprzyk, D.: Weighting adjustments for partial nonresponse in the 1984 SIPP panel. In: Proceedings of the Section on Survey Research Methods, pp. 296–301. American Statistical Association, Washington, DC (1989)
Löfberg, J.: YALMIP: A toolbox for modeling and optimization in MATLAB. In: IEEE International Conference on Robotics and Automation, IEEE, pp. 284–289 (2004)
Lumley, T.: Complex surveys: a guide to analysis using R, vol. 565. Wiley, Hoboken (2011)
McKinney, W.: Data structures for statistical computing in Python. In: Proceedings of the 9th Python in Science Conference, pp. 56–61 (2010). https://doi.org/10.25080/Majora-92bf1922-00a
Mercer, A., Lau, A., Kennedy, C.: How different weighting methods work. https://www.pewresearch.org/methods/2018/01/26/how-different-weighting-methods-work/ (2018)
Neyman, J.: On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. R. Stat. Soc. 96(4), 558–625 (1934)
O’Donoghue, B., Chu, E., Parikh, N., Boyd, S.: Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl. 169(3), 1042–1068 (2016)
Parikh, N., Boyd, S.: proximal GitHub repository. https://github.com/cvxgrp/proximal (2013)
Parikh, N., Boyd, S.: Block splitting for distributed optimization. Math. Program. Comput. 6(1), 77–102 (2014a)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends® Optim. 1(3), 127–239 (2014b). https://doi.org/10.1561/2400000003. ISSN 2167-3888, 2167-3918
Peyré, G., Cuturi, M.: Computational optimal transport: with applications to data science. Found. Trends® Mach. Learn. 11(5–6), 355–607 (2019)
She, Y., Tang, S.: Iterative proportional scaling revisited: a modern optimization perspective. J. Comput. Graph. Stat. 28(1), 48–60 (2019)
Stella, L., Antonello, N., Falt, M.: ProximalOperators.jl. https://github.com/kul-forbes/ProximalOperators.jl (2020)
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: qdldl: a free LDL factorization routine. https://github.com/oxfordcontrol/qdldl (2020a)
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: An operator splitting solver for quadratic programs. Math. Program. Comput. 12, 637–672 (2020b). https://doi.org/10.1007/s12532-020-00179-2
Teh, Y., Welling, M.: On improving the efficiency of the iterative proportional fitting procedure. In: AISTATS (2003)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001). https://doi.org/10.1023/A:1017501703105. ISSN 0022-3239, 1573-2878
Udell, M., Mohan, K., Zeng, D., Hong, J., Diamond, S., Boyd, S.: Convex optimization in Julia. Workshop on High Performance Technical Computing in Dynamic Languages (2014)
Valliant, R., Dever, J., Kreuter, F.: Practical Tools for Designing and Weighting Survey Samples. Springer, New York (2013)
Vanderbei, R.: Symmetric quasidefinite matrices. SIAM J. Optim. 5(1), 100–113 (1995)
van der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13(2), 22–30 (2011)
Wittenberg, M.: An introduction to maximum entropy and minimum cross-entropy estimation using stata. Stata J. Promot. Commun. Stat. Stata 10(3), 315–330 (2010). https://doi.org/10.1177/1536867X1001000301. ISSN 1536-867X, 1536-8734
Yu, C.: Resampling methods: concepts, applications, and justification. Pract. Assess. Res. Eval. 8(1), 19 (2002)
Yule, U.: On the methods of measuring association between two attributes. J. R. Stat. Soc. 75(6), 579–652 (1912)
Acknowledgements
The authors would like to thank Trevor Hastie, Timothy Preston, Jeffrey Barratt, and Giana Teresi for discussions about the ideas described in this paper.
Funding
Shane Barratt is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1656518.
Ethics declarations
Conflicts of interest
Not applicable.
Code availability
All computer code is freely available online at www.github.com/cvxgrp/rsw.
Appendices
A Iterative proportional fitting
The connection between iterative proportional fitting, initially proposed by Deming and Stephan (1940), and the maximum entropy weighting problem has long been known and has been explored by many authors (Teh and Welling 2003; Fu et al. 2019; She and Tang 2019; Wittenberg 2010; Bishop et al. 2007). Our presentation is similar to that of She and Tang (2019, Sect. 2.1), though we show that the iterative proportional fitting algorithm as commonly implemented is actually a block coordinate descent algorithm on the dual variables, rather than a direct coordinate descent algorithm. Writing this update in terms of the primal variables gives exactly the usual iterative proportional fitting update over the marginal distribution of each property.
Maximum entropy problem. In particular, we will analyze the application of block coordinate descent on the dual of the problem

\[
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^n w_i \log w_i \\
\text{subject to} & Fw = f^\mathrm{des}, \quad {\mathbf{1}}^T w = 1,
\end{array} \tag{6}
\]
with variable \(w \in {\mathbf{R}}^n\), where the problem data matrix is Boolean, i.e., \(F \in \{0, 1\}^{m \times n}\). This is just the maximum entropy weighting problem given in Sect. 3.1, specialized to the case where F has Boolean entries.
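As a concrete (hypothetical) instance, problem (6) can be posed directly in CVXPY (Diamond and Boyd 2016); the dimensions, data matrix, and target below are synthetic placeholders for illustration, not the rsw implementation itself.

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
n, m = 100, 5

# Synthetic Boolean data matrix F and a feasible target f_des, built from a
# random distribution w0 so that the constraints of (6) can be met exactly.
F = (np.random.rand(m, n) < 0.5).astype(float)
w0 = np.random.rand(n)
w0 /= w0.sum()
f_des = F @ w0

# Problem (6): minimize sum_i w_i log w_i subject to F w = f_des, 1^T w = 1.
# cp.entr(w) is the elementwise entropy -w log w, so its negated sum is the
# negative entropy objective; its domain also enforces w >= 0.
w = cp.Variable(n)
constraints = [F @ w == f_des, cp.sum(w) == 1]
prob = cp.Problem(cp.Minimize(-cp.sum(cp.entr(w))), constraints)
prob.solve()
print(prob.value, np.abs(F @ w.value - f_des).max())  # residuals ~ 0
```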
Selector matrices. We will assume that there are several properties \(k=1, \dots , N\) over which the user has stratified, and we will define selector matrices \(S_k \in \{0,1\}^{p_k \times m}\) which ‘pick out’ the rows of F corresponding to property k. For example, if the first three rows of F specify the data entries corresponding to the first property, then we can take

\[
S_1 = \begin{bmatrix} I_3 & 0 \end{bmatrix},
\]
and each column of \(S_1F\) is a unit vector, i.e., a vector whose entries are all zeros except at a single entry, where it is one. This is the same as saying that, for each property k, each data point is allowed to be in exactly one of the \(p_k\) possible classes. Additionally, since this should be a proper probability distribution, we will also require that \({\mathbf {1}}^TS_k f^\mathrm {des} = 1\), i.e., the desired marginal distribution for property k should itself sum to 1.
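As a toy numerical illustration of these definitions (all dimensions and matrices here are invented), suppose \(m = 5\), with the first property having \(p_1 = 3\) categories and the second \(p_2 = 2\):

```python
import numpy as np

# Selector matrices: S_1 picks rows 1-3 of F (property 1), S_2 picks rows 4-5.
S1 = np.hstack([np.eye(3), np.zeros((3, 2))])   # S_1 = [I_3  0]
S2 = np.hstack([np.zeros((2, 3)), np.eye(2)])   # S_2 = [0  I_2]

# Three data points; each is in exactly one category of each property.
F = np.array([[1, 0, 0],    # property 1, category 1
              [0, 1, 0],    # property 1, category 2
              [0, 0, 1],    # property 1, category 3
              [1, 1, 0],    # property 2, category 1
              [0, 0, 1]],   # property 2, category 2
             dtype=float)

# Each column of S_k F is a unit vector, so the column sums are all one.
assert np.allclose((S1 @ F).sum(axis=0), 1)
assert np.allclose((S2 @ F).sum(axis=0), 1)
```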
Dual problem. To show that iterative proportional fitting is equivalent to block coordinate ascent, we first formulate the dual problem (Boyd and Vandenberghe 2004, Ch. 5). The Lagrangian of (6) is

\[
L(w, \nu, \lambda) = \sum_{i=1}^n w_i \log w_i + \nu^T\left(Fw - f^\mathrm{des}\right) + \lambda\left({\mathbf{1}}^T w - 1\right),
\]
where \(\nu \in {\mathbf{R}}^m\) is the dual variable for the first constraint and \(\lambda \in {\mathbf{R}}\) is the dual variable for the normalization constraint. Note that we do not need to include the nonnegativity constraint on \(w_i\), since the domain of \(w_i \log w_i\) is \(w_i \ge 0\).
The dual function (Boyd and Vandenberghe 2004, Sect. 5.1.2) is given by

\[
g(\nu, \lambda) = \inf_{w} \; L(w, \nu, \lambda), \tag{7}
\]
which is easily computed using the Fenchel conjugate of the negative entropy (Boyd and Vandenberghe 2004, Sect. 3.3.1):

\[
g(\nu, \lambda) = -{\mathbf{1}}^T \exp\left(-F^T\nu - (\lambda + 1){\mathbf{1}}\right) - \nu^T f^\mathrm{des} - \lambda, \tag{8}
\]
where \(\exp \) of a vector is interpreted componentwise. Note that the optimal weights \(w^\star \) are exactly those given by

\[
w^\star = \exp\left(-F^T\nu - (\lambda + 1){\mathbf{1}}\right). \tag{9}
\]
Strong duality. Since strong duality holds, the maximum of the dual function (7) has the same value as the optimal value of the original problem (6) (Boyd and Vandenberghe 2004, Sect. 5.2.3). It therefore suffices to find an optimal pair of dual variables \(\lambda \) and \(\nu \), which can then be used to recover an optimal \(w^\star \) via (9).
To do this, first partially maximize g with respect to \(\lambda \), i.e.,

\[
g^p(\nu) = \sup_{\lambda} \; g(\nu, \lambda).
\]
We can find the maximum by differentiating (8) with respect to \(\lambda \) and setting the result to zero. This gives

\[
\lambda^\star = \log\left({\mathbf{1}}^T \exp(-F^T\nu)\right) - 1,
\]
while

\[
g^p(\nu) = -\nu^T f^\mathrm{des} - \log\left({\mathbf{1}}^T \exp(-F^T\nu)\right).
\]
This also implies that, after using the optimal \(\lambda ^\star \) in (9),

\[
w^\star = \frac{\exp(-F^T\nu)}{{\mathbf{1}}^T \exp(-F^T\nu)}. \tag{10}
\]
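Continuing the CVXPY sketch above, (10) can be checked numerically by recovering \(w^\star \) from the multiplier on the constraint \(Fw = f^\mathrm{des}\). This assumes the usual Lagrangian sign convention for the equality-constraint dual reported by CVXPY; if a solver reports the negated multiplier, the sign of \(\nu \) must be flipped.

```python
# nu: dual variable attached to F @ w == f_des in the sketch above.
nu = constraints[0].dual_value

# Equation (10): w* is the normalized, elementwise exponential of -F^T nu.
w_from_dual = np.exp(-F.T @ nu)
w_from_dual /= w_from_dual.sum()
print(np.abs(w_from_dual - w.value).max())   # should be ~ 0
```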
Block coordinate ascent. In order to maximize the dual function \(g^p\), we will use the simple method of block coordinate ascent with respect to the dual variables corresponding to the constraints for each of the N properties. Equivalently, we will consider updates of the form

\[
\nu^{t+1} = \nu^t + S_t^T \xi^t,
\]
where \(\nu ^t\) is the dual variable at iteration t, while \(\xi ^t \in {\mathbf{R}}^{p_t}\) is the optimization variable we consider at iteration t. To simplify notation, we have used \(S_t\) to refer to the selector matrix at iteration t, if \(t \le N\), and otherwise set \(S_t = S_{((t-1) \bmod N) + 1}\), i.e., we choose the selector matrices in a round-robin fashion. The updates result in an ascent algorithm, which is guaranteed to converge to the global optimum since \(g^p\) is a smooth concave function (Tseng 2001).
Block coordinate update. In order to apply the update rule to \(g^p(\nu )\), we first work out the optimal step, defined as

\[
\xi^\star = \mathop{\mathrm{argmax}}_{\xi \in {\mathbf{R}}^{p_t}} \; g^p(\nu^t + S_t^T \xi).
\]
To do this, set the gradient of \(g^p\) with respect to \(\xi \) to zero,

\[
\nabla_\xi \, g^p(\nu^t + S_t^T \xi) = 0,
\]
which implies that

\[
\frac{\sum_{i=1}^n \exp\left(-f_i^T(\nu^t + S_t^T\xi)\right) S_t f_i}{\sum_{i=1}^n \exp\left(-f_i^T(\nu^t + S_t^T\xi)\right)} = S_t f^\mathrm{des}, \tag{11}
\]
where \(f_i\) is the ith column of F and the division is understood to be elementwise.
To simplify this expression, note that, for any unit basis vector \(x \in {\mathbf{R}}^{p_t}\) (i.e., \(x_i = 1\) for some i and 0 otherwise), we have the simple equality

\[
\exp(-\xi^T x)\, x = \exp(-\xi) \circ x,
\]
where \(\circ \) indicates the elementwise product of two vectors. Using this result with \(x = S_t f_i\) on each term of the numerator of the left-hand side of (11) gives

\[
\sum_{i=1}^n \exp\left(-f_i^T(\nu^t + S_t^T\xi)\right) S_t f_i = \exp(-\xi) \circ y,
\]
where \(y = \sum _{i=1}^n \exp (-f_i^T\nu ^t)\,S_t f_i\); the same identity shows that the denominator equals \({\mathbf {1}}^T(\exp (-\xi ) \circ y)\). We can then rewrite (11) in terms of y by multiplying both sides by the denominator:

\[
\exp(-\xi) \circ y = \left({\mathbf{1}}^T\left(\exp(-\xi) \circ y\right)\right) S_t f^\mathrm{des},
\]
which implies that

\[
\exp(-\xi) \circ y \propto S_t f^\mathrm{des}.
\]
Since \({\mathbf {1}}^TS_t f^\mathrm {des} = 1\), and shifting \(\xi \) by a constant does not change the weights in (10), we can take

\[
\exp(-\xi) \circ y = S_t f^\mathrm{des},
\]
or, after solving for \(\xi \),

\[
\xi^\star = \log\left(\frac{y}{S_t f^\mathrm{des}}\right), \tag{12}
\]
where the logarithm is taken elementwise. The resulting block coordinate ascent update can be written as

\[
\nu^{t+1} = \nu^t + S_t^T \log\left(\frac{y}{S_t f^\mathrm{des}}\right),
\]
where the logarithm and division are performed elementwise. This update can be interpreted as changing \(\nu \) in the entries corresponding to the constraints given by property t by adding the elementwise log ratio of the (unnormalized) marginal distribution implied by the previous iterate to the desired distribution for this property. This follows from (10), which implies \(w_i^t \propto \exp (-f_i^T\nu ^t)\) for each \(i=1, \dots , n\), where \(w^t\) is the distribution given by \(\nu ^t\) at iteration t.
Resulting update over w. We can rewrite the update for the dual variables \(\nu \) as a multiplicative update for the primal variable w, which is exactly the update given by the iterative proportional fitting algorithm. More specifically, from (10),

\[
w_i^{t+1} = \frac{\exp(-f_i^T\nu^{t+1})}{\sum_{j=1}^n \exp(-f_j^T\nu^{t+1})}, \quad i = 1, \dots, n.
\]
For notational convenience, we will write \(x_{t i}= S_t f_i\), which is a unit vector denoting the category to which data point i belongs, for property t. Plugging in update (12) gives, after some algebra,

\[
w_i^{t+1} = \frac{\exp(-f_i^T\nu^{t})\exp\left(x_{t i}^T \log\left(S_t f^\mathrm{des}/y\right)\right)}{\sum_{j=1}^n \exp(-f_j^T\nu^{t})\exp\left(x_{t j}^T \log\left(S_t f^\mathrm{des}/y\right)\right)}.
\]
Since \(x_{t i}\) is a unit vector, \(\exp (x_{t i}^T \log (z)) = x_{t i}^Tz\) for any vector \(z > 0\), so

\[
w_i^{t+1} = \frac{\exp(-f_i^T\nu^{t})\, x_{t i}^T\left(S_t f^\mathrm{des}/y\right)}{\sum_{j=1}^n \exp(-f_j^T\nu^{t})\, x_{t j}^T\left(S_t f^\mathrm{des}/y\right)}.
\]
Finally, using (10) with \(\nu ^t\) (so that \(\exp (-f_i^T\nu ^t)\) is proportional to \(w_i^t\) and \(y\) is proportional to \(S_tFw^t\), which makes the denominator above equal to one) gives

\[
w_i^{t+1} = w_i^t \, \frac{x_{t i}^T S_t f^\mathrm{des}}{x_{t i}^T S_t F w^t}, \quad i = 1, \dots, n,
\]
which is exactly the multiplicative update of the iterative proportional fitting algorithm, performed for property t.
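The equivalence is easy to verify numerically. The following self-contained sketch (with invented dimensions and a randomly generated, feasible \(f^\mathrm{des}\)) runs the dual update (12) and the primal multiplicative update side by side, and checks that they produce the same weights:

```python
import numpy as np

np.random.seed(0)
n = 50

# Two properties with 2 and 3 categories; F stacks their one-hot encodings.
cat1 = np.random.randint(2, size=n)
cat2 = np.random.randint(3, size=n)
F = np.vstack([np.eye(2)[:, cat1], np.eye(3)[:, cat2]])
S = [np.hstack([np.eye(2), np.zeros((2, 3))]),
     np.hstack([np.zeros((3, 2)), np.eye(3)])]

# A feasible desired marginal vector f_des, generated from a random distribution.
w0 = np.random.rand(n)
w0 /= w0.sum()
f_des = F @ w0

# Dual block coordinate ascent: nu^{t+1} = nu^t + S_t^T log(y / S_t f_des).
nu = np.zeros(F.shape[0])
for t in range(200):
    St = S[t % len(S)]
    y = (St @ F) @ np.exp(-F.T @ nu)          # unnormalized marginal y
    nu += St.T @ np.log(y / (St @ f_des))     # update (12)
w_dual = np.exp(-F.T @ nu)
w_dual /= w_dual.sum()

# Primal IPF: scale each w_i by desired / current marginal of its category.
w = np.full(n, 1.0 / n)
for t in range(200):
    St = S[t % len(S)]
    w *= (St @ F).T @ ((St @ f_des) / (St @ F @ w))

assert np.allclose(w, w_dual)                 # same iterates, as derived above
assert np.allclose(F @ w, f_des, atol=1e-6)   # marginals are matched
```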
B Expected values of BRFSS data
Cite this article
Barratt, S., Angeris, G. & Boyd, S. Optimal representative sample weighting. Stat Comput 31, 19 (2021). https://doi.org/10.1007/s11222-021-10001-1
Keywords
- Sample weighting
- Iterative proportional fitting
- Convex optimization
- Distributed optimization