pp 1–44

Kernel density estimation from complex surveys in the presence of complete auxiliary information

  • Sayed A. Mostafa
  • Ibrahim A. Ahmad


Auxiliary information is widely used in survey sampling to enhance the precision of estimators of finite population parameters, such as the finite population mean, percentiles, and distribution function. In the context of complex surveys, we show how auxiliary information can be used effectively in kernel estimation of the superpopulation density function of a given study variable. We propose two classes of “model-assisted” kernel density estimators that make efficient use of auxiliary information. For one class we assume that the functional relationship between the study variable Y and the auxiliary variable X is known, while for the other class the relationship is assumed unknown and is estimated using kernel smoothing techniques. Under the first class, we show that if the functional relationship can be written as a simple linear regression model with constant error variance, the mean of the proposed density estimator will be identical to the well-known regression estimator of the finite population mean. If we drop the intercept from the linear model and allow the error variance to be proportional to the auxiliary variable, the mean of the proposed density estimator matches the ratio estimator of the finite population mean. The properties of the new density estimators are studied under a combined design-model-based inference framework, which accounts for the underlying superpopulation model as well as the randomization distribution induced by the sampling design. Moreover, the asymptotic normality of each estimator is derived under both design-based and combined inference frameworks when the sampling design is simple random sampling without replacement. For the practical implementation of these estimators, we discuss how data-driven bandwidth estimators can be obtained. The finite sample properties of the proposed estimators are addressed via simulations and an example that mimics a real survey. 
These simulations show that the new estimators perform very well compared to standard kernel estimators which do not utilize the auxiliary information.
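As a point of reference for the two finite population mean estimators named above, the following is a minimal sketch of the classical regression and ratio estimators under simple random sampling without replacement. It assumes the population mean of the auxiliary variable X is known. The function names and the simulated population are illustrative assumptions, and the sketch does not implement the proposed density estimators themselves.

```python
import numpy as np


def regression_estimator(y, x, x_pop_mean):
    """Classical regression estimator of the finite population mean of y.

    Uses the sample least-squares slope of y on x and the known
    population mean of x (x_pop_mean).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    return y_bar + slope * (x_pop_mean - x_bar)


def ratio_estimator(y, x, x_pop_mean):
    """Classical ratio estimator of the finite population mean of y.

    Corresponds to a no-intercept linear model whose error variance
    is proportional to x.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return y.mean() / x.mean() * x_pop_mean


if __name__ == "__main__":
    # Hypothetical finite population, used only for illustration.
    rng = np.random.default_rng(0)
    N, n = 5000, 200
    x_pop = rng.gamma(shape=4.0, scale=2.0, size=N)
    y_pop = 1.0 + 2.0 * x_pop + rng.normal(scale=1.5, size=N)

    # Simple random sample without replacement of size n.
    idx = rng.choice(N, size=n, replace=False)
    x_s, y_s = x_pop[idx], y_pop[idx]

    print("true mean:           ", y_pop.mean())
    print("sample mean:         ", y_s.mean())
    print("regression estimator:", regression_estimator(y_s, x_s, x_pop.mean()))
    print("ratio estimator:     ", ratio_estimator(y_s, x_s, x_pop.mean()))
```

When Y is an exact linear function of X, the regression estimator recovers the population mean of Y without error, which is one way to see why a strong auxiliary relationship improves precision.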


Auxiliary information · Combined inference · Complex survey data · Kernel density estimation

Mathematics Subject Classification

62D05 · 62G08



The authors are grateful to the Editor and two anonymous referees for their insightful comments and suggestions which helped to improve this paper.

Compliance with ethical standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Supplementary material

184_2018_703_MOESM1_ESM.pdf (181 kb)
Supplementary material 1 (pdf 180 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Statistics, Indiana University, Bloomington, USA
  2. Department of Mathematics, North Carolina A&T State University, Greensboro, USA
  3. Department of Statistics, Oklahoma State University, Stillwater, USA
