
Single- and two-stage cross-sectional and time series benchmarking procedures for small area estimation

  • Invited Paper

Abstract

This article is divided into two parts. In the first part, we review and study the properties of single-stage cross-sectional and time series benchmarking procedures that have been proposed in the literature in the context of small area estimation. We compare cross-sectional and time series benchmarking empirically, using data generated from a time series model which complies with the familiar Fay–Herriot model at any given time point. In the second part, we review cross-sectional methods proposed for benchmarking hierarchical small areas and develop a new two-stage benchmarking procedure for hierarchical time series models. The latter procedure is applied to monthly unemployment estimates in Census Divisions and States of the USA.


References

  • Battese GE, Harter RM, Fuller WA (1988) An error-components model for prediction of county crop areas using survey and satellite data. J Am Stat Assoc 83:28–36

  • Bell WR, Datta GS, Ghosh M (2012) Benchmarking small area estimators. Biometrika 100:189–202

  • Butar FB, Lahiri P (2003) On measures of uncertainty of empirical Bayes small area estimators. J Stat Plan Inference 112:63–76

  • Cholette PA, Dagum EB (1994) Benchmarking time series with autocorrelated survey errors. Int Stat Rev 62:365–377

  • Dagum EB, Cholette PA (2006) Benchmarking, temporal distribution and reconciliation methods for time series data. Springer, New York

  • Datta GS, Ghosh M, Steorts R, Maples J (2011) Bayesian benchmarking with applications to small area estimation. TEST 20:574–588

  • Di Fonzo T, Marini M (2011) Simultaneous and two-step reconciliation of systems of time series: methodological and practical issues. J R Stat Soc Ser C 60:143–164

  • Doran HE (1992) Constraining Kalman filter and smoothing estimates to satisfy time-varying restrictions. Rev Econ Stat 74:568–572

  • Durbin J, Quenneville B (1997) Benchmarking by state space models. Int Stat Rev 65:23–48

  • Fay RE, Herriot RA (1979) Estimates of income for small places: an application of James–Stein procedures to census data. J Am Stat Assoc 74:269–277

  • Ghosh M, Steorts RC (2013) Two-stage Bayesian benchmarking as applied to small area estimation. TEST 22:670–687

  • Hall P, Maiti T (2006) On parametric bootstrap methods for small area prediction. J R Stat Soc Ser B 68:221–238

  • Harvey AC (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge

  • Hillmer SC, Trabelsi A (1987) Benchmarking of economic time series. J Am Stat Assoc 82:1064–1071

  • Isaki CT, Tsay JH, Fuller WA (2000) Estimation of census adjustment factors. Survey Methodol 26:31–42

  • Lahiri P (1990) "Adjusted" Bayes and empirical Bayes estimation in finite population sampling. Sankhyā Ser B 52:50–66

  • Nandram B, Sayit H (2011) A Bayesian analysis of small area probabilities under a constraint. Survey Methodol 37:137–152

  • Pfeffermann D (2013) New important developments in small area estimation. Stat Sci 28:40–68

  • Pfeffermann D, Nathan G (1981) Regression analysis of data from a cluster sample. J Am Stat Assoc 76:681–689

  • Pfeffermann D, Burck L (1990) Robust small area estimation combining time series and cross-sectional data. Survey Methodol 16:217–237

  • Pfeffermann D, Barnard CH (1991) Some new estimators for small area means with application to the assessment of farmland values. J Bus Econ Stat 9:73–83

  • Pfeffermann D, Tiller RL (2006) Small area estimation with state-space models subject to benchmark constraints. J Am Stat Assoc 101:1387–1397

  • Prasad NGN, Rao JNK (1990) The estimation of the mean squared error of small-area estimators. J Am Stat Assoc 85:163–171

  • Rao JNK (2003) Small area estimation. Wiley, New York

  • Steorts RC, Ghosh M (2013) On estimation of mean squared errors of benchmarked empirical Bayes estimators. Stat Sin (in press)

  • Ugarte MD, Militino AF, Goicoa T (2009) Benchmarked estimates in small areas using linear mixed models with restrictions. TEST 18:342–364

  • Wang J, Fuller WA, Qu Y (2008) Small area estimation under a restriction. Survey Methodol 34:29–36

  • You Y, Rao JNK (2002) A pseudo-empirical best linear unbiased prediction approach to small area estimation using survey weights. Can J Stat 30:431–439

  • You Y, Rao JNK, Dick P (2004) Benchmarking hierarchical Bayes small area estimators in the Canadian census undercoverage estimation. Stat Transit 6:631–640

  • You Y, Rao JNK, Hidiroglou M (2013) On the performance of self-benchmarked small area estimators under the Fay–Herriot area level model. Survey Methodol 39:217–229


Acknowledgments

We are very grateful to three anonymous reviewers for providing many excellent comments, which enhanced the quality of this article.

Corresponding author

Correspondence to Danny Pfeffermann.

Additional information

This invited paper is discussed in comments available at: doi:10.1007/s11749-014-0382-6; doi:10.1007/s11749-014-0384-4; doi:10.1007/s11749-014-0386-2; doi:10.1007/s11749-014-0400-8.

Appendices

Appendix A: Computation of \(\tilde{\Sigma }_{tt}^d =E(\tilde{{\mathbf {e}}}_t^d \tilde{{\mathbf {e}}}_t^{d\prime })\)

The matrix \(\tilde{\Sigma }_{tt}^d =\left[ \begin{array}{cc} \Sigma _{tt}^d &{} {\mathbf {h}}_{tt}^d \\ {\mathbf {h}}_{tt}^{d\prime } &{} v_{tt}^d \end{array} \right] \) has as its main block the \(S\times S\) diagonal V–C matrix of the State sampling errors, \(\Sigma _{tt}^d =E({\mathbf {e}}_t^d {\mathbf {e}}_t^{d\prime })=\hbox {diag}[\sigma _{ds,tt}^2 ]\), where \({\mathbf {e}}_t^d =(e_{d1,t} ,\ldots ,e_{dS,t})^{\prime }\) (the direct CPS sampling errors are independent between the States). The computation of the other elements of \(\tilde{\Sigma }_{tt}^d \), \({\mathbf {h}}_{tt}^d =E({\mathbf {e}}_t^d r_{dt}^\mathrm{bmk} )\) and \(v_{tt}^d =\hbox {Var}(r_{dt}^\mathrm{bmk} )\), requires revisiting the first-stage benchmarking.

Consider first the computation of \(v_{tt}^d \). Denote \(\varvec{l}_{dt}^\mathrm{bmk} =(\hat{\varvec{\alpha }}_{dt}^\mathrm{bmk} -\varvec{\alpha }_{dt} )\), such that \(r_{dt}^\mathrm{bmk} ={{\mathbf {z}}}^{\prime }_{dt} \varvec{l}_{dt}^\mathrm{bmk} \) and \(v_{tt}^d ={{\mathbf {z}}}^{\prime }_{dt} E(\varvec{l}_{dt}^\mathrm{bmk} \varvec{l}_{dt}^{\mathrm{bmk}^{\prime }}){\mathbf {z}}_{dt}\). Let \(\varvec{l}_t^\mathrm{bmk} =(\tilde{\varvec{\alpha }}_t^\mathrm{bmk} -\varvec{\alpha }_t )=(\varvec{l}_{1t}^{\mathrm{bmk}{^\prime }} ,\ldots ,\varvec{l}_{Dt}^{\mathrm{bmk}^{\prime }})^{\prime }\) and \(J_d \) be the indicator matrix of zeroes and ones satisfying, \(\varvec{l}_{dt}^\mathrm{bmk} =J_d \varvec{l}_t^\mathrm{bmk} \). It follows that \(v_{tt}^d ={{\mathbf {z}}}^{\prime }_{dt} J_d P_t^\mathrm{bmk} {J}^{\prime }_d {\mathbf {z}}_{dt} \), where \(P_t^\mathrm{bmk} =E(\varvec{l}_t^\mathrm{bmk} \varvec{l}_t^{\mathrm{bmk}^{\prime }})\) is obtained recursively as defined below (15).
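For concreteness, the quadratic form for \(v_{tt}^d \) can be sketched in a few lines of numpy; the function name and array layout below are illustrative assumptions, not part of the model definition, and \(P_t^\mathrm{bmk} \) is assumed to have been produced already by the first-stage recursion.

```python
import numpy as np

def v_tt(z_dt, J_d, P_bmk):
    """Benchmark-error variance v_tt^d = z'_dt J_d P_t^bmk J'_d z_dt.

    z_dt  : (p,) design vector of division d at time t
    J_d   : (p, Dp) 0/1 selection matrix with l_dt^bmk = J_d l_t^bmk
    P_bmk : (Dp, Dp) matrix P_t^bmk = E(l_t^bmk l_t^bmk') from the
            first-stage recursion (below Eq. 15)
    """
    sel = J_d @ P_bmk @ J_d.T        # E(l_dt^bmk l_dt^bmk'), division d block
    return float(z_dt @ sel @ z_dt)  # scalar variance of r_dt^bmk
```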

Next consider the computation of \({\mathbf {h}}_{tt}^d =E({\mathbf {e}}_t^d r_{dt}^\mathrm{bmk} )\). By Eq. (D.2) in P–T, the error \((\tilde{{\mathbf {\alpha }}}_t^\mathrm{bmk} -{\mathbf {\alpha }}_t )\) when predicting the division state vectors can be written as

$$\begin{aligned} \varvec{l}_t^\mathrm{bmk} =\left( \tilde{\varvec{\alpha }}_t^\mathrm{bmk} -\varvec{\alpha }_t\right) =G_t \tilde{T}\left( \tilde{\varvec{\alpha }}_{t-1}^\mathrm{bmk} -\tilde{\varvec{\alpha }}_{t-1}\right) -G_t \varvec{\eta }_t +K_t \tilde{{\mathbf {e}}}_t, \end{aligned}$$
(35)

with \(G_t \) and \(K_t \) defined in P–T and \(\tilde{{\mathbf {e}}}_t =(e_{1t} ,\ldots ,e_{Dt} ,\sum \nolimits _{d=1}^D {b_{dt} e_{dt} } )^{\prime }\). Let \(U_t =G_t \tilde{T}\), so that (35) can be written as \(\varvec{l}_t^\mathrm{bmk} =U_t \varvec{l}_{t-1}^\mathrm{bmk} -G_t \varvec{\eta }_t +K_t \tilde{\varvec{e}}_t \). Repeated substitutions in the last equation yield,

$$\begin{aligned} \varvec{l}_t^\mathrm{bmk}&=A_{t,2} \varvec{l}_1^\mathrm{bmk} +B_{t,2} \tilde{{\mathbf {e}}}_2 +\cdots +B_{t,t-1} \tilde{{\mathbf {e}}}_{t-1}\nonumber \\&\quad +K_t \tilde{{\mathbf {e}}}_t -D_{t,2} \varvec{\eta }_2 -\cdots -D_{t,t-1} \varvec{\eta }_{t-1} -G_t \varvec{\eta }_t,\quad t=2,3,\ldots , \end{aligned}$$
(36)

where \(A_{t,j} =U_t \times U_{t-1} \times \cdots \times U_j ,\,\,j=2,\ldots ,t,\,\,B_{t,j} =A_{t,j+1} K_j ,\,\,D_{t,j} =A_{t,j+1} G_j \). Suppose that the GLS filter is initialized at time \(t=1\) by \(\tilde{T}\tilde{\varvec{\alpha }}_0^\mathrm{bmk}\), independently of the rest of the series. Then \(C_{1,0}^\mathrm{bmk} =E[(\tilde{T}\tilde{\varvec{\alpha }}_0^\mathrm{bmk} -\varvec{\alpha }_1 )\tilde{{\mathbf {e}}}_{1,0}^{\prime }]=0\) and, by Eq. (D.1) in P–T, \(\varvec{l}_1^\mathrm{bmk} =[I-P_{1\vert 0}^\mathrm{bmk} \tilde{Z}_1^{\prime } R_{1,0}^{-1} \tilde{Z}_1 ](\tilde{T}\tilde{\varvec{\alpha }}_0^\mathrm{bmk} -\varvec{\alpha }_1 )+P_{1\vert 0}^\mathrm{bmk} \tilde{Z}_1^{\prime } R_{1,0}^{-1} \tilde{{\mathbf {e}}}_1,\) where \(R_{1,0} =[\tilde{Z}_1 P_{1\vert 0}^\mathrm{bmk} \tilde{Z}_1^{\prime } +\tilde{\Sigma }_{11,0} ]\). Denote \(K_{1\vert 0} =P_{1\vert 0}^\mathrm{bmk} \tilde{Z}_1^{\prime } R_{1,0}^{-1},\,\,\,B_{t,1} =A_{t,2} K_{1\vert 0} ,\,\,\,D_{t,1} =A_{t,2} (I-K_{1\vert 0} \tilde{Z}_1 ),\,\,\,B_{t,t} =K_t ,\,\,\,D_{t,t} =G_t \). Ignoring the term \(D_{t,1} \tilde{T}(\tilde{\varvec{\alpha }}_0^\mathrm{bmk} -\varvec{\alpha }_0 )\), which is independent of the rest of the series and does not enter any of the computations that follow, Eq. (36) can be rewritten as

$$\begin{aligned} \varvec{l}_t^\mathrm{bmk} =\sum \limits _{j=1}^t {B_{t,j} } \tilde{{\mathbf {e}}}_j -\sum \limits _{j=1}^t {D_{t,j} } \varvec{\eta }_{j}. \end{aligned}$$
(37)

The relationship in (37) allows us to express the division prediction error \(r_{dt}^\mathrm{bmk} ={{\mathbf {z}}}^{\prime }_{dt} J_d \varvec{l}_t^\mathrm{bmk} \) as a difference between linear functions of the division sampling errors and the state vector errors, which are independent of the sampling errors. We thus have,

$$\begin{aligned} {\mathbf {h}}_{tt}^d =E\left( {\mathbf {e}}_t^d r_{dt}^\mathrm{bmk} \right) =E\left[ {\mathbf {e}}_t^d {l}{'}_t^\mathrm{bmk} {J}{'}_d {\mathbf {z}}_{dt} )\right] =E\left( {\mathbf {e}}_t^d \sum \limits _{j=1}^t {{\tilde{{\mathbf {e}}}}{'}_j {B}^{\prime }_{t,j} } \right) {J}^{\prime }_d {\mathbf {z}}_{dt}. \end{aligned}$$
(38)

Now, \({\mathbf {e}}_t^d =(e_{d1,t},\ldots ,e_{dS,t})^{\prime }\), \({\tilde{{\mathbf {e}}}}{'}_j =(e_{1j} ,\ldots ,e_{Dj} ,\sum \nolimits _{k=1}^D {b_{kj} e_{kj} } )\) and \(e_{dj} =\sum \nolimits _{s=1}^S {b_{ds,j} e_{ds,j} } \), implying

$$\begin{aligned} E({\mathbf {e}}_t^d \tilde{{\mathbf {e}}}_j^{\prime } )=\left[ {\mathbf {0}}_{(1)},\ldots ,{\mathbf {0}}_{(d-1)} ,E\left( {\mathbf {e}}_t^d e_{dj} \right) ,{\mathbf {0}}_{(d+1)},\ldots ,{\mathbf {0}}_{(D)} ,b_{dj} E\left( {\mathbf {e}}_t^d e_{dj} \right) \right] =\tilde{H}_{t,j}^{d}, \end{aligned}$$
(39)

where \({\mathbf {0}}_{(k)} \) is the null vector of length S in position (column) \(k\), and

$$\begin{aligned} E({\mathbf {e}}_t^d e_{dj} )=E\left[ {\mathbf {e}}_t^d \left( \sum \limits _{s=1}^S {b_{ds,j} e_{ds,j} } \right) \right] =\left[ b_{d1,j} \sigma _{d1,tj}^2 ,\ldots ,b_{dS,j} \sigma _{dS,tj}^2\right] ^{\prime }. \end{aligned}$$
(40)

Substituting (40) in (39) and then in (38) gives the expression for the vector \({\mathbf {h}}_{tt}^d \),

$$\begin{aligned} {\mathbf {h}}_{tt}^d =E\left( {\mathbf {e}}_t^d r_{dt}^\mathrm{bmk} \right) =\sum \limits _{j=1}^t \left( \tilde{H}_{t,j}^d {B}{'}_{t,j}\right) {J}{'}_d {\mathbf {z}}_{dt}. \end{aligned}$$
(41)
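
The following numpy sketch assembles \({\mathbf {h}}_{tt}^d \) from (38)–(41), under the assumption that the filter by-products \(B_{t,j} \) of (36) and the covariance vectors of (40) are supplied in illustrative dict containers; it is a reading aid, not the production implementation used for the CPS series.

```python
import numpy as np

def h_tt_d(t, d, D, z_dt, J_d, B, E_e, b_div):
    """Sketch of (38)-(41): h_tt^d = sum_{j<=t} (H~_{t,j}^d B'_{t,j}) J'_d z_dt.

    B     : dict {j: B_{t,j}} coefficient matrices from (36)-(37)
    E_e   : dict {j: E(e_t^d e_dj)}, the length-S vectors of (40)
    b_div : dict {j: b_{dj}}, division weights in the national benchmark
    d     : zero-based index of the division (H~ has D+1 block columns)
    """
    S = E_e[1].shape[0]
    h = np.zeros(S)
    for j in range(1, t + 1):
        H = np.zeros((S, D + 1))        # H~_{t,j}^d of (39)
        H[:, d] = E_e[j]                # column d: E(e_t^d e_dj), from (40)
        H[:, D] = b_div[j] * E_e[j]     # last column: b_dj E(e_t^d e_dj)
        h += H @ B[j].T @ (J_d.T @ z_dt)
    return h
```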

Appendix B: Computation of \(C_{dt}^\mathrm{bmk} =E[(\tilde{T}^d\tilde{\varvec{\alpha }}_{t-1}^{d,\mathrm{bmk}} -\varvec{\alpha }_t^d )\tilde{{\mathbf {e}}}_t^{d\prime }]\)

The computation of the covariance matrix \(C_{dt}^\mathrm{bmk} \) is more involved since it requires computing the covariances between the division benchmark error and the State sampling errors. We first express the prediction error \((\tilde{T}^d\tilde{\varvec{\alpha }}_t^{d,\mathrm{bmk}} -\varvec{\alpha }_{t+1}^d)\) corresponding to time \(t+1\) as a function of the sampling errors and the state vector errors, similarly to (37). Under the model, \(\varvec{\alpha }_{t+1}^d =\tilde{T}^d\varvec{\alpha }_t^d +\varvec{\eta }_{t+1}^d \). Hence,

$$\begin{aligned} {\mathbf {m}}_t^d =\left( \tilde{T}^d \tilde{\varvec{\alpha }}_t^{d,\mathrm{bmk}} -{\varvec{\alpha }}_{t+1}^d \right) =\tilde{T}^d\left( \tilde{\varvec{\alpha }}_t^{d,\mathrm{bmk}} -\varvec{\alpha }_t^d\right) -\varvec{\eta }_{t+1}^d =\tilde{T}^d \varvec{l}_t^{d,\mathrm{bmk}} -\varvec{\eta }_{t+1}^d,\quad \quad \end{aligned}$$
(42)

where \(\varvec{l}_t^{d,\mathrm{bmk}} =(\varvec{l}_{t1}^{{d,\mathrm{bmk}}{^\prime }} ,\varvec{l}_{t2}^{{d,\mathrm{bmk}}^{\prime }} ,\ldots ,\varvec{l}_{tS}^{{d,\mathrm{bmk}}^{\prime }})^{\prime }\) is the vector of the benchmark errors for the state vectors in the division. Now, using a similar decomposition to Eq. (D.2) in P–T, we have,

$$\begin{aligned} {\mathbf {m}}_t^d =\tilde{T}^d\left( G_t^d {\mathbf {m}}_{t-1}^d +K_t^d \tilde{{\mathbf {e}}}_t^d\right) -\varvec{\eta }_{t+1}^d =W_t^d {\mathbf {m}}_{t-1}^d +\tilde{T}^d K_t^d \tilde{{\mathbf {e}}}_t^d -\varvec{\eta }_{t+1}^d , \end{aligned}$$
(43)

where \(W_t^d =\tilde{T}^dG_t^d \), \(\tilde{{\mathbf {e}}}_t^d =(e_{d1,t} ,\ldots ,e_{dS,t} ,r_{d,t}^\mathrm{bmk})^{\prime }=({\mathbf {e}}_t^{d\prime } ,r_{d,t}^\mathrm{bmk} )^{\prime }\) and \(r_{d,t}^\mathrm{bmk} ={{\mathbf {z}}}^{\prime }_{dt} J_d \varvec{l}_t^\mathrm{bmk} \) is the division benchmark error at time \(t\), with \(\varvec{l}_t^\mathrm{bmk} \) defined by (37). The matrices \(G_t^d \) and \(K_t^d \) are defined similarly to \(G_t \) and \(K_t \) in Eq. (D.2) of P–T, but now refer to the States in a given division instead of to the divisions. By repeated substitutions in (43), and ignoring the term \(\tilde{T}^d(\tilde{\varvec{\alpha } }_0^{d,\mathrm{bmk}} -\varvec{\alpha }_0^d )\) for time \(t=0\), which is independent of the rest of the series and drops out of all the computations that follow, we obtain

$$\begin{aligned} {\mathbf {m}}_t^d =\sum \limits _{k=1}^t {B_{t,k}^d } \tilde{{\mathbf {e}}}_k^d -\sum \limits _{k=1}^{t+1} {D_{t,k}^d } \varvec{\eta }_k^d , \end{aligned}$$
(44)

where \(D_{t,k}^d =W_t^d \times W_{t-1}^d \times \cdots \times W_k^d ,\,\,k=2,\ldots ,t\), \(D_{t,t+1}^d =I_{Sq}\), \(B_{t,k}^d =D_{t,k+1}^d \tilde{T}^dK_k^d ,\,\,k=1,\ldots ,t-1\), and \(B_{t,t}^d =\tilde{T}^dK_t^d \); here \(I_{Sq} \) is the identity matrix of order \(Sq\), with \(q=\dim (\eta _{ds})\).
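
As a reading aid, the recursive products defining \(D_{t,k}^d \) and \(B_{t,k}^d \) can be sketched as follows; the dict containers and names are illustrative assumptions.

```python
import numpy as np

def coeff_matrices_44(t, W, TK, Sq):
    """Sketch of the coefficient matrices in (44).

    W  : dict {k: W_k^d = T~^d G_k^d}, k = 1..t
    TK : dict {k: T~^d K_k^d},         k = 1..t
    Returns dicts D = {k: D_{t,k}^d} and B = {k: B_{t,k}^d}.
    """
    D = {t + 1: np.eye(Sq)}                  # D_{t,t+1}^d = I_{Sq}
    for k in range(t, 1, -1):                # D_{t,k}^d = W_t^d ... W_k^d
        D[k] = D[k + 1] @ W[k]
    B = {k: D[k + 1] @ TK[k] for k in range(1, t)}   # B_{t,k}^d, k < t
    B[t] = TK[t]                             # B_{t,t}^d = T~^d K_t^d
    return D, B
```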

At time \(t+1\) we need to compute \(C_{d,t+1}^\mathrm{bmk} =E[(\tilde{T}^d \tilde{\varvec{\alpha }}_t^{d,\mathrm{bmk}} -\varvec{\alpha }_{t+1}^d ){\tilde{{\mathbf {e}}}}{'}_{t+1}^d ]=E({\mathbf {m}}_t^d \tilde{{{\mathbf {e}}}{'}}_{t+1}^d )\). By (44)

$$\begin{aligned} C_{d,t+1}^\mathrm{bmk}&= \sum \limits _{k=1}^t {B_{t,k}^d } E\left( \tilde{{\mathbf {e}}}_k^d {\tilde{{\mathbf {e}}}}{'}_{t+1}^d\right) -\sum \limits _{k=1}^{t+1} {D_{t,k}^d } E\left( \varvec{\eta }_k^d {\tilde{{\mathbf {e}}}}{'}_{t+1}^d\right) \nonumber \\&= \sum \limits _{k=1}^t {B_{t,k}^d } E\left[ \left( {{\mathbf {e}}}{'}_k^d ,r_{d,k}^\mathrm{bmk}\right) ^{\prime }\times \left( {{\mathbf {e}}}{'}_{t+1}^d ,r_{d,t+1}^\mathrm{bmk} \right) \right] \nonumber \\&-\sum \limits _{k=1}^{t+1} {D_{t,k}^d } E\left[ \varvec{\eta }_k^d \times \left( {{\mathbf {e}}}{'}_{t+1}^d ,r_{d,t+1}^\mathrm{bmk}\right) \right] . \end{aligned}$$
(45)

Next, we evaluate each of the expectations in (45). Define

$$\begin{aligned} \tilde{\Sigma }_{k,t+1}^d =E\left( \tilde{{\mathbf {e}}}_k^d \tilde{{\mathbf {e}}}_{t+1}^{d\prime } \right) =\left[ \begin{array}{cc} \Sigma _{k,t+1}^d &{} {\mathbf {h}}_{t+1,k}^d \\ {\mathbf {h}}_{k,t+1}^{d\prime } &{} v_{k,t+1} \end{array} \right] . \end{aligned}$$
(46)

By (29),

$$\begin{aligned} \Sigma _{k,t+1}^d =E\left( {\mathbf {e}}_k^d {\mathbf {e}}_{t+1}^{d\prime } \right) =\hbox {diag}\left[ \sigma _{d1,k,t+1}^2,\ldots ,\sigma _{dS,k,t+1}^2 \right] . \end{aligned}$$
(47)

By (37) and (39) and noting that \(E({\mathbf {e}}_j^d {\varvec{\eta }}^{\prime }_k )=0\) for all \((j,k)\),

$$\begin{aligned} {\mathbf {h}}_{t+1,k}^d&= E\left( {\mathbf {e}}_k^d r_{d,t+1}^\mathrm{bmk} \right) =E\left[ {\mathbf {e}}_k^d \left( \sum \limits _{j=1}^{t+1} {{\tilde{{\mathbf {e}}}}'_j {B}'_{t+1,j} } \right) \right] {J}'_d {\mathbf {z}}_{d,t+1} =\sum \limits _{j=1}^{t+1} {\tilde{H}_{k,j}^d {B}^{\prime }_{t+1,j} } {J}^{\prime }_d {\mathbf {z}}_{d,t+1},\nonumber \\ \end{aligned}$$
(48)
$$\begin{aligned} {{\mathbf {h}}}{'}_{k,t+1}^d&= E\left( r_{dk}^\mathrm{bmk} {{\mathbf {e}}}{'}_{t+1}^d\right) ={{\mathbf {z}}}'_{dk} J_d \sum \limits _{j=1}^k {B_{k,j} } E\left( \tilde{{\mathbf {e}}}_j {{\mathbf {e}}}{'}_{t+1}^d\right) ={{\mathbf {z}}}{'}_{dk} J_d \sum \limits _{j=1}^k {B_{k,j} \tilde{H}_{j,t+1}^d }. \end{aligned}$$
(49)

Remark 18

\({{\mathbf {h}}}{'}_{k,t+1}^d \ne {{\mathbf {h}}}{'}_{t+1,k}^d \).

Recalling that the error vectors \(\tilde{{\mathbf {e}}}_j =(e_{1j} ,\ldots ,e_{Dj} ,\sum \nolimits _{d=1}^D {b_{dj} e_{dj} })^{\prime }\) are only functions of the division sampling errors and thus independent of the state error vectors \(\{\varvec{\eta }_l\}\), and that under the model \(E(\varvec{\eta }_j {\varvec{\eta }}^{\prime }_j )=Q\); \(E(\varvec{\eta }_j {\varvec{\eta }}^{\prime }_{\,l} )=0\), \(j\ne l\), and defining \(\tilde{\Sigma }_{i,j} =E(\tilde{{\mathbf {e}}}_i {\tilde{{\mathbf {e}}}}{'}_j )\),

$$\begin{aligned} v_{k,t+1}&=E\left( r_{dk}^\mathrm{bmk} r_{d,t+1}^\mathrm{bmk}\right) ={{\mathbf {z}}}{'}_{dk} J_d E\left[ \left( \sum \limits _{j=1}^k {B_{k,j}} \tilde{{\mathbf {e}}}_j-\sum \limits _{j=1}^k {D_{k,j} } \varvec{\eta }_j \right) \right. \nonumber \\&\quad \qquad \qquad \qquad \qquad \qquad \left. \times \left( \sum \limits _{l=1}^{t+1} {\tilde{{{\mathbf {e}}}{'}}_l {B}'_{t+1,l} } -\sum \limits _{l=1}^{t+1} {{\varvec{\eta }}{'}_{\,l} {D}'_{t+1,l} }\right) \right] {J}'_d {\mathbf {z}}_{d,t+1}\nonumber \\&={{\mathbf {z}}}^{\prime }_{dk} J_d \left[ \sum \limits _{j=1}^k {\sum \limits _{l=1}^{t+1} {B_{k,j} E\left( \tilde{{\mathbf {e}}}_j \tilde{{{\mathbf {e}}}}{'}_l \right) {B}'_{t+1,l} } } +\sum \limits _{j=1}^k {\sum \limits _{l=1}^{t+1} {D_{k,j} E\left( \varvec{\eta }_j {\varvec{\eta }}^{\prime }_{\,l}\right) {D}^{\prime }_{t+1,l} } } \right] {J}^{\prime }_d {\mathbf {z}}_{d,t+1} \nonumber \\&={{\mathbf {z}}}^{\prime }_{dk} J_d \left[ \sum \limits _{j=1}^k {\sum \limits _{l=1}^{t+1} {B_{k,j} \tilde{\Sigma }_{j,l} {B}^{\prime }_{t+1,l} } } +\sum \limits _{j=1}^k {D_{k,j} Q{D}^{\prime }_{t+1,j} } \right] {J}^{\prime }_d {\mathbf {z}}_{d,t+1}. \end{aligned}$$
(50)

Remark 19

Equation (50) refers to the first step of the benchmarking process.
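
A direct transcription of (50) into numpy reads as follows; the coefficient matrices of (37) and the covariances \(\tilde{\Sigma }_{j,l} \) are assumed precomputed, and the container names are illustrative.

```python
import numpy as np

def v_k_t1(k, t, z_dk, z_dt1, J_d, B, D, Sig, Q):
    """Sketch of (50): v_{k,t+1} = E(r_dk^bmk r_{d,t+1}^bmk).

    B, D : dicts {(a, j): B_{a,j}} and {(a, j): D_{a,j}} from (37),
           held for a = k and a = t+1
    Sig  : dict {(j, l): Sigma~_{j,l} = E(e~_j e~'_l)}
    Q    : V-C matrix of the state error vector eta_t
    """
    M = sum(B[k, j] @ Sig[j, l] @ B[t + 1, l].T
            for j in range(1, k + 1) for l in range(1, t + 2))
    M = M + sum(D[k, j] @ Q @ D[t + 1, j].T for j in range(1, k + 1))
    return float(z_dk @ J_d @ M @ (J_d.T @ z_dt1))
```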

Finally, by (37) and noting that \(E(\varvec{\eta }_k^d {\varvec{\eta } }^{\prime }_j )=0\) for \(j\ne k\),

$$\begin{aligned} E\left( \varvec{\eta }_k^d {\tilde{{\mathbf {e}}}}{'}_{t+1}^d\right)&= E\left[ \varvec{\eta }_k^d \left( {{\mathbf {e}}}{'}_{t+1}^d ,r_{d,t+1}^\mathrm{bmk} \right) \right] =\left[ 0_{Sq\times S} ,E\left( {\mathbf {\eta }}_k^d r_{d,t+1}^\mathrm{bmk} \right) \right] \nonumber \\&= \left[ 0_{Sq\times S} ,-E\left( \varvec{\eta }_k^d {\varvec{\eta }}^{\prime }_k \right) {D}^{\prime }_{t+1,k} {J}^{\prime }_d {\mathbf {z}}_{d,t+1} \right] , \end{aligned}$$
(51)

where \(0_{Sq\times S} \) is the null matrix of dimension \(Sq\times S\). We assume throughout that \(\varvec{\alpha }_{dk} = \sum \nolimits _{s=1}^S {b_{ds,k} \varvec{\alpha }_{ds,k}}\), implying \(\varvec{\eta }_k^d =J_d \varvec{\eta }_k = \sum \nolimits _{s=1}^S {b_{ds,k} \varvec{\eta }_{ds,k} } \) and hence,

$$\begin{aligned} E\left( \varvec{\eta }_k^d \varvec{\eta }_k^{\prime }\right) =\tilde{Q}_k^d, \end{aligned}$$
(52)

where \(\tilde{Q}_k^d \) (say) is the matrix of dimension \(Sq\times Dq\) with \([b_{d1,k} Q_{d1,k},\ldots ,b_{dS,k} Q_{dS,k}]^{\prime }\) in columns \((d-1)q+1,\ldots ,dq\) and zeroes elsewhere; \(Q_{ds,k} =E(\varvec{\eta }_{ds,k} \varvec{\eta }_{ds,k}^{\prime } )\) (Eq. 29).

Substituting (47)–(50) in (46), and then (46) and (52) in (45) completes the computation of the covariance matrix \(C_{d,t+1}^\mathrm{bmk} \).
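
The assembly in (45) is then a pair of weighted sums over these pieces; a minimal sketch, under the same illustrative containers as above:

```python
def C_bmk_45(t, B, D, Sig_cross, E_eta):
    """Sketch of (45): C_{d,t+1}^bmk assembled from (46)-(52).

    B, D      : dicts {k: B_{t,k}^d}, {k: D_{t,k}^d} from (44)
    Sig_cross : dict {k: Sigma~_{k,t+1}^d of (46), built from (47)-(50)}
    E_eta     : dict {k: E(eta_k^d e~'_{t+1}^d) of (51)-(52)}
    """
    C = sum(B[k] @ Sig_cross[k] for k in range(1, t + 1))   # first sum in (45)
    C = C - sum(D[k] @ E_eta[k] for k in range(1, t + 2))   # second sum in (45)
    return C
```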

Appendix C: Computation of \(P_t^{d,\mathrm{bmk}} =E[(\hat{\varvec{\alpha }}_t^{d,\mathrm{bmk}} -\varvec{\alpha }_t^d )(\hat{\varvec{\alpha }}_t^{d,\mathrm{bmk}} -\varvec{\alpha }_t^d )^{\prime }]\)

The computation of the true V–C matrix of the benchmarked state prediction errors follows the same steps as the computation of the V–C matrix \(P_t^\mathrm{bmk} \) in the first-stage benchmarking (Appendix in P–T). First, rewrite the benchmarked predictor (33) as,

$$\begin{aligned} \tilde{\varvec{\alpha }}_t^{d,\mathrm{bmk}}&= \left[ I-\left( P_{t\vert t-1}^{\,d,\mathrm{bmk}} \tilde{{Z}'}_t^d -C_{dt,0}^\mathrm{bmk} \right) R_{dt}^{-1} \tilde{Z}_t^d \right] \tilde{T}_d \tilde{\varvec{\alpha }}_{t-1}^{d,\mathrm{bmk}}\nonumber \\&+\left( P_{t\vert t-1}^{\,d,\mathrm{bmk}} \tilde{{Z}'}_t^d -C_{dt,0}^\mathrm{bmk} \right) R_{dt}^{-1} \tilde{{\mathbf {y}}}_t^d =G_t^d \tilde{T}_d \tilde{\varvec{\alpha }}_{t-1}^{d,\mathrm{bmk}} +K_t^d \tilde{{\mathbf {y}}}_t^d. \end{aligned}$$
(53)

Next, substitute \(\tilde{{\mathbf {y}}}_t^d =\tilde{Z}_t^d \varvec{\alpha }_t^d +\tilde{{\mathbf {e}}}_t^d \) and decompose \(\varvec{\alpha }_t^d =G_t^d \varvec{\alpha }_t^d +K_t^d \tilde{Z}_t^d \varvec{\alpha }_t^d \), implying

$$\begin{aligned} \tilde{\varvec{\alpha }}_t^{d,\mathrm{bmk}} -\varvec{\alpha }_t^d =G_t^d \left( \tilde{T}_d \tilde{\varvec{\alpha } }_{t-1}^{d,\mathrm{bmk}} -\varvec{\alpha }_t^d\right) +K_t^d \tilde{{\mathbf {e}}}_t^{d}. \end{aligned}$$
(54)

The expression for \(P_t^{d,\mathrm{bmk}} \) in (34) follows straightforwardly.
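
A single step of the resulting second-stage filter (53) can be sketched as follows, with \(R_{dt} \) supplied as computed in the text; all names are illustrative assumptions.

```python
import numpy as np

def second_stage_step(a_prev, P_pred, C, Z, R, T_d, y):
    """One step of the second-stage benchmarked GLS filter, per (53).

    a_prev : alpha~_{t-1}^{d,bmk};  P_pred : P_{t|t-1}^{d,bmk}
    C      : C_{dt,0}^bmk (Appendix B);  Z : Z~_t^d;  R : R_dt
    y      : benchmarked observation vector y~_t^d
    """
    K = (P_pred @ Z.T - C) @ np.linalg.inv(R)   # K_t^d in (53)
    G = np.eye(P_pred.shape[0]) - K @ Z         # G_t^d in (53)
    return G @ (T_d @ a_prev) + K @ y           # alpha~_t^{d,bmk}
```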

Appendix D: Derivation of Eq. (24)

In what follows we drop for convenience the area index \(i\) from the notation. Let \(\beta _t =T\beta _{t-1} +\eta _t \), where \(\eta _t\) is white noise. Repeated substitutions imply

$$\begin{aligned} \beta _t =T^t\beta _0 +\sum \limits _{l=1}^t {T^{t-l}\eta _l }. \end{aligned}$$
(55)

For the model defined by (23b)–(23c), \(\beta _t =({\beta }{'}_t^{(1)} ,{\beta }{'}_t^{(2)})^{\prime }\), where \(\beta _t^{(1)} =(L_t ,R_t )^{\prime }\) and \(\beta _t^{(2)} =(S_{1,t} ,S_{1,t}^*,\ldots ,S_{5,t} ,S_{5,t}^*,S_{6,t} )^{\prime }\). Accordingly, let \(\eta _t =({\eta }{'}_t^{(1)} ,{\eta }{'}_t^{(2)} )^{\prime }\), where \(\eta _t^{(1)} =(\eta _{_{Lt}} ,\eta _{_{Rt}})^{\prime }\), \(\eta _t^{(2)} =(\eta _{_{1t}} ,\eta _{_{1t}}^*,\ldots ,\eta _{_{5t}} ,\eta _{_{5t}}^*,\eta _{_{6t}} )^{\prime }\). Define \(h=(1,0,1,0,\ldots 1,0,1)^{\prime }\) such that by (23c) \(S_t ={h}^{\prime }\beta _t^{(2)} \). Then, by (23b) the population value at time \(t\) is

$$\begin{aligned} Y_t =x_t \beta _1 +L_t +S_t =x_t \beta _1 +(1,0)\beta _t^{(1)} +{h}^{\prime }\beta _t^{(2)}. \end{aligned}$$
(56)

Under the model (23b)–(23c), \(T=\left[ \begin{array}{cc} T_{(1)} &{} 0 \\ 0 &{} T_{(2)} \end{array} \right] \), where \(T_{(1)} =\left[ \begin{array}{cc} 1 &{} 1 \\ 0 &{} 1 \end{array} \right] \) and

$$\begin{aligned} T_{(2)} =\mathrm{diag}\left( C_1 ,C_2 ,\ldots ,C_5 ,\cos \pi \right) ,\qquad C_j =\left[ \begin{array}{cc} \cos \frac{j\pi }{6} &{} \sin \frac{j\pi }{6} \\ -\sin \frac{j\pi }{6} &{} \cos \frac{j\pi }{6} \end{array} \right] ,\quad j=1,\ldots ,5, \end{aligned}$$

that is, \(T_{(2)} \) is block diagonal, with the \(2\times 2\) rotation \(C_j \) acting on the \(j\)th seasonal harmonic \((S_{j,t} ,S_{j,t}^*)\) for \(j=1,\ldots ,5\), and the scalar \(\cos \pi =-1\) acting on \(S_{6,t} \).

At time \(t\), \(T^t=\left[ \begin{array}{cc} T_{(1)}^t &{} 0 \\ 0 &{} T_{(2)}^t \end{array} \right] \), where \(T_{(1)}^t =\left[ \begin{array}{cc} 1 &{} 1 \\ 0 &{} 1 \end{array} \right] ^t=\left[ \begin{array}{cc} 1 &{} t \\ 0 &{} 1 \end{array} \right] \) and \(T_{(2)}^t \) is obtained from \(T_{(2)} \) by replacing every angle \(\frac{j\pi }{6}\) in \(T_{(2)} \) by \(\frac{j\pi t}{6}\), \(j=1,\ldots ,6\).
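
This angle-advancing property is easy to verify numerically; the sketch below builds \(T_{(2)} \) with plain numpy (block layout as reconstructed above) and checks that \(T_{(2)}^t \) equals the matrix with every angle \(j\pi /6\) replaced by \(j\pi t/6\).

```python
import numpy as np

def rot(theta):
    """2x2 rotation block for one seasonal harmonic."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def T2(t=1):
    """T_(2)^t: blocks rot(j*pi*t/6), j = 1..5, plus cos(pi*t) for j = 6."""
    blocks = [rot(j * np.pi * t / 6) for j in range(1, 6)]
    blocks.append(np.array([[np.cos(np.pi * t)]]))
    M = np.zeros((11, 11))
    pos = 0
    for b in blocks:                 # place blocks on the diagonal
        n = b.shape[0]
        M[pos:pos + n, pos:pos + n] = b
        pos += n
    return M

# raising T_(2) to the power t just advances every angle
t = 7
assert np.allclose(np.linalg.matrix_power(T2(1), t), T2(t))
```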

It follows that \(T_{(1)}^t \beta _0^{(1)} =(L_0 +tR_0 ,R_0 )^{\prime }\), and by familiar trigonometric equalities \({h}^{\prime }\beta _t^{(2)} ={h}^{\prime }T_{(2)}^t \beta _0^{(2)} =(\cos \frac{\pi }{6}t,\sin \frac{\pi }{6}t,\ldots ,\cos \pi t)\beta _0^{(2)} =\sum \nolimits _{j=1}^6 {S_{j0} \cos \frac{\pi j}{6}t} +\sum \nolimits _{j=1}^5 {S_{j0}^*\sin \frac{\pi j}{6}t}.\)

Similar computations imply,

$$\begin{aligned} T_{(1)}^{t-l} \eta _l^{(1)}&= (\eta _{Ll} +(t-l)\eta _{Rl} ,\eta _{Rl} )^{\prime };\\{h}^{\prime }T_{(2)}^{t-l} \eta _l^{(2)}&= \sum \limits _{j=1}^6 {\eta _{jl} \cos \frac{\pi j}{6}(t-l)} +\sum \limits _{j=1}^5 {\eta _{jl}^*\sin \frac{\pi j}{6}(t-l)}. \end{aligned}$$

By (55) and (56)

$$\begin{aligned} Y_t -x_t \beta _1&= (1,0)\beta _t^{(1)} +{h}'\beta _t^{(2)} =L_0 +tR_0 +R_0 +\sum \limits _{j=1}^6 {S_{j0} \cos \frac{\pi j}{6}t} \nonumber \\&+\sum \limits _{j=1}^5 {S_{j0}^*\sin \frac{\pi j}{6}t}+\sum \limits _{l=1}^t {\eta _{_{Ll}} } +\sum \limits _{l=1}^t {(t-l)\eta _{_{Rl}} } +\sum \limits _{l=1}^t {\eta _{_{Rl}} } \nonumber \\&+\sum \limits _{l=1}^t {\left[ \sum \limits _{j=1}^6 {\eta _{jl} \cos \frac{\pi j}{6}(t} -l)+\sum \limits _{j=1}^5 {\eta _{jl}^*\sin \frac{\pi j}{6}(t-l)}\right] }. \end{aligned}$$
(57)

Equation (24) follows from (57).

Appendix E: Census Divisions and States in the USA

Census divisions and their States:

  • New England: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont

  • Middle Atlantic: New Jersey, New York, Pennsylvania

  • East North Central: Illinois, Indiana, Michigan, Ohio, Wisconsin

  • West North Central: Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, South Dakota

  • South Atlantic: Delaware, District of Columbia, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, West Virginia

  • East South Central: Alabama, Kentucky, Mississippi, Tennessee

  • West South Central: Arkansas, Louisiana, Oklahoma, Texas

  • Mountain: Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, Wyoming

  • Pacific: Alaska, California, Hawaii, Oregon, Washington


Cite this article

Pfeffermann, D., Sikov, A. & Tiller, R. Single- and two-stage cross-sectional and time series benchmarking procedures for small area estimation. TEST 23, 631–666 (2014). https://doi.org/10.1007/s11749-014-0398-y
