
Small Area Estimation with Correctly Specified Linking Models

Abstract

It is possible to improve the precision of a sample estimator for a small area based on sparse area-specific data by combining it with a model of its estimand, provided that this model is correctly specified. A proof of this result and the method of correctly specifying the models of the estimands of sample estimators are given in this paper. Widely used two-step estimation is shown to yield inconsistent estimators. The accuracies of different sample estimates of a population value can be improved by simultaneously estimating the population value and sums of the sampling and non-sampling errors of these sample estimates.


Notes

  1. Let \( f_{\tau_i}(\theta,\cdot) \), \( \tau_i = 1, 2, \ldots \), be a family of sequences of probability measures on \( \mathbb{R}^k \), a k-dimensional Euclidean space. Let the parameter \( \theta \) take values in a compact metric space I. Let \( \phi_{\tau_i}(t,\theta) \) denote the characteristic function of \( f_{\tau_i}(\theta,\cdot) \). Then \( f_{\tau_i}(\theta,\cdot) \) is said to converge to \( f_0(\theta,\cdot) \) in the UC* sense relative to \( \theta \in I \) if (a) \( \sup_{\theta \in I}|\phi_{\tau_i}(t,\theta)-\phi_0(t,\theta)|\to 0 \) as \( \tau_i \to \infty \), where the characteristic function \( \phi_0(t,\theta) \) of \( f_0(\theta,\cdot) \) is equicontinuous in \( \theta \) at t = 0, and (b) \( \phi_0(t,\theta) \) is a continuous function of \( \theta \) for each t.

References

  • Basmann, R.L. (1988). Causality Tests and Observationally Equivalent Representations of Econometric Models. Journal of Econometrics, 39, 69–104.
  • Brown, L.D. (1990). An Ancillarity Paradox Which Appears in Multiple Linear Regression (with discussion). The Annals of Statistics, 18, 471–538.
  • Cavanagh, C.L. and Rothenberg, T.J. (1995). Generalized Least Squares with Nonnormal Errors. In G.S. Maddala, P.C.B. Phillips and T.N. Srinivasan (eds), Advances in Econometrics and Quantitative Economics. Oxford, UK: Blackwell.
  • Cochran, W.G. (1977). Sampling Techniques, 3rd edition. New York: John Wiley & Sons.
  • Crainiceanu, C.M., Ruppert, D. and Wand, M.P. (2004). Bayesian Analysis for Penalized Spline Regression Using WinBUGS. Posted on the Internet.
  • de Finetti, B. (1974). The Theory of Probability, Vol. 1. New York: John Wiley & Sons.
  • Durbin, J. and Koopman, S.J. (2001). Time Series Analysis by State Space Methods. Oxford: Oxford University Press.
  • Greene, W.H. (2008). Econometric Analysis, 6th edition. Upper Saddle River, New Jersey: Pearson Prentice Hall.
  • Hall, S.G., Swamy, P.A.V.B. and Tavlas, G.S. (2012a). Generalized Cointegration: A New Concept with an Application to Health Expenditure and Health Outcomes. Empirical Economics, 42, 603–618.
  • Hall, S.G., Swamy, P.A.V.B. and Tavlas, G.S. (2012b). Milton Friedman, the Demand for Money, and the ECB's Monetary Policy Strategy. Federal Reserve Bank of St. Louis Review, 94, 153–185.
  • Hall, S.G., Kenjegaliev, A., Swamy, P.A.V.B. and Tavlas, G.S. (2012). The Forward Rate Premium Puzzle: A Case of Misspecification? Studies in Nonlinear Dynamics & Econometrics, forthcoming.
  • Hwang, J.S. and Dempster, A.P. (1999). A Stochastic System for Modeling Labor Force Series of Small Areas. Statistica Sinica, 9, 297–324.
  • Judge, G.G., Griffiths, W.E., Hill, R.C., Lütkepohl, H. and Lee, T. (1985). The Theory and Practice of Econometrics, 2nd edition. New York: John Wiley & Sons.
  • Kariya, T. and Kurata, H. (2004). Generalized Least Squares. Hoboken, New Jersey: John Wiley & Sons.
  • Kiefer, J. (1977). Conditional Confidence Statements and Confidence Estimators. Journal of the American Statistical Association, 72, 789–808.
  • Lehmann, E.L. (1999). Elements of Large-Sample Theory. New York: Springer.
  • Lehmann, E.L. and Casella, G. (1998). Theory of Point Estimation, 2nd edition. New York: Springer.
  • Lent, J. (1991). Variance Estimation for Current Population Survey Small Area Labor Force Estimates. Proceedings of the Section on Survey Research Methods, American Statistical Association.
  • Little, R.J. (2004). To Model or Not to Model? Competing Modes of Inference for Finite Population Sampling. Journal of the American Statistical Association, 99, 546–556.
  • Montgomery, A.L., Zarnowitz, V., Tsay, R.S. and Tiao, G.C. (1998). Forecasting the U.S. Unemployment Rate. Journal of the American Statistical Association, 93, 478–493.
  • Murphy, K. and Topel, R. (2002). Estimation and Inference in Two-Step Econometric Models. Journal of Business and Economic Statistics, 20, 88–97.
  • Nelson, C.R. and Startz, R. (2006). The Zero-Information-Limit Condition and Spurious Inference in Weakly Identified Models. Posted on the Internet.
  • Pearl, J. (2000). Causality. Cambridge, UK: Cambridge University Press.
  • Pratt, J.W. and Schlaifer, R. (1988). On the Interpretation and Observation of Laws. Journal of Econometrics, 39, 23–52.
  • Rao, C.R. (1973). Linear Statistical Inference and Its Applications, 2nd edition. New York: John Wiley & Sons.
  • Rao, J.N.K. (2003). Small Area Estimation. Hoboken, New Jersey: John Wiley & Sons.
  • Rothenberg, T.J. (1984). Approximate Normality of Generalized Least Squares Estimates. Econometrica, 52, 811–825.
  • Sethuraman, J. (1961). Some Limit Theorems for Joint Distributions. Sankhya, Series A, 379–386.
  • Skyrms, B. (1988). Probability and Causation. Journal of Econometrics, 39, 53–68.
  • Swamy, P.A.V.B. and Mehta, J.S. (1969). On Theil's Mixed Regression Estimator. Journal of the American Statistical Association, 64, 273–276.
  • Swamy, P.A.V.B., Mehta, J.S., Chang, I. and Zimmerman, T.S. (2009). An Efficient Method of Estimating the True Value of a Population Characteristic from its Discrepant Estimates. Computational Statistics & Data Analysis, 53, 2378–2389.
  • Swamy, P.A.V.B., Tavlas, G.S., Hall, S.G.F. and Hondroyiannis, G. (2010). Estimation of Parameters in the Presence of Model Misspecification and Measurement Error. Studies in Nonlinear Dynamics & Econometrics, 14, 1–33.
  • Swamy, P.A.V.B. and Hall, S.G.F. (2012). Measurement of Causal Effects. Economic Change and Restructuring, 45, 3–23.
  • Wolter, K.M. (1985). Introduction to Variance Estimation. New York: Springer.
  • U.S. Department of Labor, Bureau of Labor Statistics (1997). BLS Handbook of Methods, Bulletin 2490. Washington, DC: Government Printing Office.
  • U.S. Department of Labor, Bureau of Labor Statistics (2006). Design and Methodology: Current Population Survey, Technical Paper 66. Washington, DC: Government Printing Office.
  • van Dijk, D., Teräsvirta, T. and Franses, P.H. (2000). Smooth Transition Autoregressive Models: A Survey of Recent Developments. Posted on the Internet.


Acknowledgement

We thank Jean Roth of the NBER and Roslyn Gorin of Temple University for helping J. S. Mehta retrieve the data used in this paper from NBER Public Use Data Files and thank David Hill of Temple University for teaching J. S. Mehta how to use MATLAB.

Data Sources: NBER Public Use Data Files are the sources of our monthly data on direct state CPS estimates of employment, \( BP_{it} \), \( HP_{it} \), \( TP_{it} \), and the "a" and "b" parameters of the CPS Generalized Variance Functions. The BLS website is the source of our monthly data on state CES and two-step estimates of employment.

Author information

Correspondence to P. A. V. B. Swamy.

Appendices

Appendix 1: Proofs of Propositions 4 and 7

For simplicity, set K = 2 and \( L_{it} = 3 \) so that there is only one included explanatory variable and one omitted regressor in (10.2). In this simpler case, the equation \( Y_{it}=\alpha_{0it}^{*}+\alpha_{1it}^{*}x_{1it}^{*}+\ldots+\alpha_{L_{it},it}^{*}x_{L_{it},it}^{*} \) of Section 10.2.2 becomes

$$ Y_{it}=\alpha_{0it}^{*}+\alpha_{1it}^{*}x_{1it}^{*}+u_{it} $$
(10.20)

where, following the usual econometric practice, the error term \( u_{it} \) represents \( \alpha_{2it}^{*}x_{2it}^{*} \), the net effect of the omitted regressor on \( Y_{it} \) (Greene 2008, p. 9). Equation (10.20) suffices to prove Propositions 4 and 7.

Now change the slope coefficient of (10.20) and make the offsetting change in its error term, so that (10.20) and its dependent and independent variables remain unchanged by mathematical necessity. This amounts to adding and subtracting \( \alpha_{2it}^{*}x_{1it}^{*} \) on the right-hand side of (10.20), which gives

$$ Y_{it}=\alpha_{0it}^{*}+(\alpha_{1it}^{*}+\alpha_{2it}^{*})x_{1it}^{*}+\alpha_{2it}^{*}(x_{2it}^{*}-x_{1it}^{*}) $$
(10.21)

Equations (10.20) and (10.21) are the same equation, but they have different coefficients and error terms. The omitted variable in (10.21) is \( (x_{2it}^{*}-x_{1it}^{*}) \), which differs from that in (10.20); the coefficient of \( x_{1it}^{*} \) in (10.21) is \( (\alpha_{1it}^{*}+\alpha_{2it}^{*}) \), which differs from that in (10.20); and the error term in (10.21) is \( \alpha_{2it}^{*}(x_{2it}^{*}-x_{1it}^{*}) \), which differs from that in (10.20). However, the dependent variable and the included regressor of (10.21) are the same as those of (10.20). From these results and our definition of uniqueness in Section 10.2.2, it follows that the coefficient of \( x_{1it}^{*} \), the omitted variable, and the error term in (10.20) are not unique. It is incorrect to say "the" omitted variable when referring to a nonunique omitted variable. It is also incorrect to say that a regression equation with a nonunique coefficient and error term is a real-world relationship, since such relationships cannot have nonunique coefficients and error terms (see Basmann 1988, p. 73).

Even though (10.20) and (10.21) are the same, they can have different implications for the correlation between \( x_{1it}^{*} \) and \( u_{it} \). If \( u_{it} \) equals \( \alpha_{2it}^{*}x_{2it}^{*} \) and the coefficient of \( x_{1it}^{*} \) equals \( \alpha_{1it}^{*} \), as in (10.20), then \( u_{it} \) may or may not be independent of \( x_{1it}^{*} \). On the other hand, if \( u_{it} \) equals \( \alpha_{2it}^{*}(x_{2it}^{*}-x_{1it}^{*}) \) and the coefficient of \( x_{1it}^{*} \) equals \( (\alpha_{1it}^{*}+\alpha_{2it}^{*}) \), as in (10.21), then \( u_{it} \) is clearly not independent of \( x_{1it}^{*} \). One cannot know a priori which of the omitted variables in (10.20) and (10.21) \( u_{it} \) represents, since the coefficient of \( x_{1it}^{*} \) is unknown. Without knowing whether \( u_{it} \) equals \( \alpha_{2it}^{*}x_{2it}^{*} \) or \( \alpha_{2it}^{*}(x_{2it}^{*}-x_{1it}^{*}) \), one cannot make the right assumption about the correlation between \( u_{it} \) and \( x_{1it}^{*} \). This difficulty is a direct consequence of the nonuniqueness of \( u_{it} \) and \( \alpha_{1it}^{*} \) in (10.20). Because of this nonuniqueness, (10.20) is not a correctly specified model. This completes the proof of Proposition 4. ∎
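
To make the nonuniqueness tangible, here is a minimal symbolic sketch (ours, not part of the chapter; it assumes the sympy library and uses hypothetical symbol names) confirming that (10.20) and (10.21) are one and the same equation even though their slope coefficients and error terms differ:

```python
# Minimal symbolic check of Proposition 4 (a sketch; symbol names are ours).
import sympy as sp

a0, a1, a2, x1, x2 = sp.symbols('alpha0 alpha1 alpha2 x1 x2')

# (10.20): coefficient alpha1 on x1, error term alpha2*x2.
y_20 = a0 + a1*x1 + a2*x2
# (10.21): coefficient (alpha1 + alpha2) on x1, error term alpha2*(x2 - x1).
y_21 = a0 + (a1 + a2)*x1 + a2*(x2 - x1)

# The two right-hand sides are identical as functions of (x1, x2) ...
assert sp.simplify(y_20 - y_21) == 0
# ... yet the coefficient of x1 and the error term differ between the two forms.
print('coef of x1 in (10.20):', a1, '   error term:', a2*x2)
print('coef of x1 in (10.21):', a1 + a2, '   error term:', a2*(x2 - x1))
```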

These difficulties can be avoided by changing the interpretation of \( u_{it} \) in (10.20), as we now show. If \( x_{2it}^{*} \) is the symbol used to denote the nonunique omitted variable, then we assume that

$$ x_{2it}^{*}=\lambda_{20it}^{*}+\lambda_{21it}^{*}x_{1it}^{*} $$
(10.22)

which is a special case of the equation \( x_{git}^{*}=\lambda_{g0it}^{*}+\lambda_{g1it}^{*}x_{1it}^{*}+\ldots+\lambda_{g,K-1,it}^{*}x_{K-1,it}^{*} \) of Section 10.2.2. Replacing \( x_{2it}^{*} \) in (10.20) by the right-hand side of (10.22) gives

$$ \begin{aligned} Y_{it}&=\alpha_{0it}^{*}+\alpha_{1it}^{*}x_{1it}^{*}+\alpha_{2it}^{*}(\lambda_{20it}^{*}+\lambda_{21it}^{*}x_{1it}^{*})\\ &=\alpha_{0it}^{*}+\alpha_{2it}^{*}\lambda_{20it}^{*}+(\alpha_{1it}^{*}+\alpha_{2it}^{*}\lambda_{21it}^{*})x_{1it}^{*} \end{aligned} $$
(10.23)

where \( \alpha_{2it}^{*}\lambda_{20it}^{*} \) is the error term. This is a special case of (10.4) of Section 10.2.2.

If \( (x_{2it}^{*}-x_{1it}^{*}) \) is the symbol used to denote the nonunique omitted variable, then the appropriate form of (10.22) is

$$ (x_{2it}^{*}-x_{1it}^{*})=\lambda_{20it}^{*}-x_{1it}^{*}+\lambda_{21it}^{*}x_{1it}^{*} $$
(10.24)

Substituting the right-hand side of (10.24) for \( (x_{2it}^{*}-x_{1it}^{*}) \) in (10.21) gives

$$ \begin{aligned} Y_{it}&=\alpha_{0it}^{*}+(\alpha_{1it}^{*}+\alpha_{2it}^{*})x_{1it}^{*}+\alpha_{2it}^{*}(\lambda_{20it}^{*}-x_{1it}^{*}+\lambda_{21it}^{*}x_{1it}^{*})\\ &=\alpha_{0it}^{*}+\alpha_{2it}^{*}\lambda_{20it}^{*}+(\alpha_{1it}^{*}+\alpha_{2it}^{*}\lambda_{21it}^{*})x_{1it}^{*} \end{aligned} $$
(10.25)

This equation is exactly the same as (10.23). Thus, in the presence of (10.22), the coefficients and error term of (10.23) are unique in the sense of uniqueness made clear in Section 10.2.2. This is a property of all real-world relationships, that is, of all correctly specified models. Equation (10.22) is what is needed to correctly specify (10.20). This completes the proof of Proposition 7. The proofs given in this section extend to any K and \( L_{it} \). ∎
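
A companion sketch (again ours, under the same assumptions) verifies Proposition 7: once (10.22) is imposed, both parameterizations collapse to the single equation (10.23) = (10.25), with a unique slope and error term:

```python
# Symbolic check of Proposition 7 (a sketch; symbol names are ours).
import sympy as sp

a0, a1, a2, l20, l21, x1 = sp.symbols('alpha0 alpha1 alpha2 lambda20 lambda21 x1')

# Substitute (10.22), x2 = lambda20 + lambda21*x1, into (10.20):
y_23 = a0 + a1*x1 + a2*(l20 + l21*x1)
# Substitute (10.24) into (10.21):
y_25 = a0 + (a1 + a2)*x1 + a2*(l20 - x1 + l21*x1)

# Both parameterizations collapse to the same equation (10.23) = (10.25),
# with a unique intercept-plus-error term and a unique slope on x1.
assert sp.expand(y_23 - y_25) == 0
print('intercept + error term:', sp.expand(a0 + a2*l20))
print('unique slope on x1:    ', sp.expand(a1 + a2*l21))
```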

Appendix 2: Derivation of the MSE of (10.16)

Let \( x_{it} \) be the 2-vector \( (1,\hat{Y}_{2it})' \), \( z_{it} \) the 5-vector \( (1,z_{1it},\ldots,z_{4it})' \), \( \Pi_t \) the (2 × 5) matrix having \( (\pi_{10t},\pi_{11t},\pi_{12t},0,0) \) and \( (\pi_{20t},0,0,\pi_{23t},\pi_{24t}) \) as its first and second rows, respectively, and \( \zeta_{it} \) the 2-vector \( (\zeta_{1it},\zeta_{2it})' \), where a prime denotes transposition. With these definitions, the m equations in (10.15) under the restrictions \( \pi_{13t}=\pi_{14t}=0 \) and \( \pi_{21t}=\pi_{22t}=0 \) can be written as

$$ \hat{Y}_{1t}=X_{zt}\pi_{t}^{Long}+D_{xt}\zeta_{t} $$
(10.26)

where \( \hat{Y}_{1t} \) is the m-vector \( (\hat{Y}_{11t},\ldots,\hat{Y}_{1mt})' \); \( X_{zt} \) is the (m × 10) matrix having the Kronecker product \( (z_{it}'\otimes x_{it}') \) as its ith row and has rank 10; \( \pi_{t}^{Long} \) is the 10-vector given by a column stack of \( \Pi_t \); \( D_{xt} \) is the (m × 2m) matrix \( \mathrm{diag}[x_{1t}'\,\ldots\,x_{mt}'] \) of rank m; and \( \zeta_t \) is the 2m-vector \( (\zeta_{1t}'\,\ldots\,\zeta_{mt}')' \). The zero restrictions on the elements of \( \pi_{t}^{Long} \) are stated as \( R\pi_{t}^{Long}=0 \), where R is the 4 × 10 matrix of full row rank having ones as its (1, 4)th, (2, 6)th, (3, 7)th, and (4, 9)th elements and zeros elsewhere, and 0 is the 4-vector of zeros.

Now a (6 × 10) matrix C of full row rank can be found such that \( RC'=0 \). Under Assumptions V and VI, \( E_m(D_{xt}\zeta_t|X_{zt})=0 \) and \( V_m(D_{xt}\zeta_t|X_{zt})=D_{xt}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'=\Sigma_{\omega_t} \), where \( I_m \) is the \( m\times m \) identity matrix and \( \omega_t \) is the 3-vector having \( \sigma_{\zeta t}^2 \) times each of the distinct elements of \( \Delta_{\zeta t} \) as its elements.

Identification: The coefficient vector \( \pi_{t}^{Long} \) is identifiable if \( X_{zt} \) has full column rank. The error vector \( \zeta_t \) is unidentifiable because \( D_{xt} \) does not have full column rank; this implies that \( \zeta_t \) is not consistently estimable (see Lehmann and Casella 1998, p. 57). Coefficient drivers are used in (10.13) and (10.14) to reduce the unidentifiable portions of the coefficients of (10.12). However, \( D_{xt}\zeta_t \) is identifiable if \( D_{xt} \) has full row rank. The EBLUP of \( D_{xt}\zeta_t \) can be used to obtain a consistent estimator of \( \omega_t \).
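
To fix ideas, the following sketch (ours; simulated data, with scipy's null_space standing in for the unspecified construction of C) builds \( X_{zt} \), \( D_{xt} \), R, and a C satisfying \( RC'=0 \), and checks the rank conditions behind the identification claims:

```python
# Sketch of the design objects in (10.26); simulated data, dimensions as in the text.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
m = 40                                    # number of small areas

x = np.column_stack([np.ones(m), rng.normal(size=m)])        # x_it = (1, Yhat_2it)'
z = np.column_stack([np.ones(m), rng.normal(size=(m, 4))])   # z_it = (1, z_1it..z_4it)'

# X_zt: ith row is kron(z_it', x_it'); D_xt: block diagonal of the rows x_it'.
X_z = np.vstack([np.kron(z[i], x[i]) for i in range(m)])     # (m, 10)
D_x = np.zeros((m, 2 * m))
for i in range(m):
    D_x[i, 2 * i:2 * i + 2] = x[i]

# R imposes the zero restrictions on the column stack of Pi_t.
R = np.zeros((4, 10))
for r, c in enumerate([3, 5, 6, 8]):      # 0-based (1,4), (2,6), (3,7), (4,9)
    R[r, c] = 1.0

C = null_space(R).T                       # (6, 10), full row rank, R C' = 0

assert np.linalg.matrix_rank(X_z) == 10   # pi_t^Long identifiable
assert np.linalg.matrix_rank(D_x) == m    # D_xt zeta_t identifiable
assert np.allclose(R @ C.T, 0.0)
print('rank and restriction checks passed')
```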

Case 1.

\( \omega_t \) is known and \( \zeta_t \) may not be normal

The generalized least squares estimator of \( \pi_t^{Long} \) subject to the restriction \( R\pi_t^{Long}=0 \) is

$$ \hat{\pi}_{tR}^{Long}(\omega_t)=C'(C\Psi_{\omega_t}^{-1}C')^{-1}CX_{zt}'\Sigma_{\omega_t}^{-1}\hat{Y}_{1t} $$
(10.27)

where the subscript R of \( \hat{\pi}_{tR}^{Long} \) is shorthand for "restricted," \( \Psi_{\omega_t}=(X_{zt}'\Sigma_{\omega_t}^{-1}X_{zt})^{-1} \), and use is made of the identity \( C'(C\Psi_{\omega_t}^{-1}C')^{-1}C=\Psi_{\omega_t}-\Psi_{\omega_t}R'(R\Psi_{\omega_t}R')^{-1}R\Psi_{\omega_t} \) in C. R. Rao (1973, p. 77, Problem 33). Estimator (10.27) is unbiased with the model covariance matrix

$$ V_m(\hat{\pi}_{tR}^{Long}(\omega_t)|X_{zt})=C'(C\Psi_{\omega_t}^{-1}C')^{-1}C. $$
(10.28)

The BLUP of \( \zeta_t \) is

$$ \hat{\zeta}_{tR}(\omega_t)=(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}M_{\omega_t}\hat{Y}_{1t} $$
(10.29)

where \( M_{\omega_t}=I_m-X_{zt}C'(C\Psi_{\omega_t}^{-1}C')^{-1}CX_{zt}'\Sigma_{\omega_t}^{-1} \). The matrix \( M_{\omega_t} \) is idempotent (though not symmetric) with the property that \( M_{\omega_t}X_{zt}C'=0 \). It can be shown that \( E_m(\hat{\zeta}_{tR}(\omega_t)|X_{zt})=0 \), \( \mathrm{Cov}_m[(\hat{\pi}_{tR}^{Long}(\omega_t),\hat{\zeta}_{tR}(\omega_t))|X_{zt}]=0 \), and \( V_m(\hat{\zeta}_{tR}(\omega_t)|X_{zt})=(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}M_{\omega_t}D_{xt}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t}) \), because \( M_{\omega_t}\Sigma_{\omega_t}M_{\omega_t}'=M_{\omega_t}\Sigma_{\omega_t} \).
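The stated properties of \( M_{\omega_t} \) (idempotency, \( M_{\omega_t}X_{zt}C'=0 \), and \( M_{\omega_t}\Sigma_{\omega_t}M_{\omega_t}'=M_{\omega_t}\Sigma_{\omega_t} \)) can be checked numerically with the same simulated objects; the sketch below is ours:

```python
# Numerical check of the properties of M_omega_t claimed above (a sketch).
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)
m, p = 40, 10

X = rng.normal(size=(m, p))                 # stands in for X_zt
Sigma = np.diag(rng.uniform(0.5, 2.0, m))   # stands in for Sigma_omega_t
R = np.zeros((4, p))
for r, c in enumerate([3, 5, 6, 8]):
    R[r, c] = 1.0
C = null_space(R).T

inv = np.linalg.inv
Si = inv(Sigma)
Psi = inv(X.T @ Si @ X)
core = C.T @ inv(C @ inv(Psi) @ C.T) @ C
M = np.eye(m) - X @ core @ X.T @ Si         # M_omega_t

assert np.allclose(M @ M, M)                       # idempotent
assert np.allclose(M @ X @ C.T, 0.0)               # M X_zt C' = 0
assert np.allclose(M @ Sigma @ M.T, M @ Sigma)     # M Sigma M' = M Sigma
print('M properties verified')
```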

Rewrite (10.16) as \( \hat{Y}_{1it}^{*}(\omega_t)=\hat{Y}_{1it}-\hat{\alpha}_{1it}(\omega_t)=\hat{Y}_{1it}-(z_{it}'\otimes l_1')\hat{\pi}_{tR}^{Long}(\omega_t)-j_i'D_{l_1}\hat{\zeta}_{tR}(\omega_t) \), where \( j_i \) is the m-vector having 1 as its ith element and zeros elsewhere, \( l_1 \) is the 2-vector \( (1,0)' \), and \( D_{l_1} \) is the (m × 2m) matrix \( (I_m\otimes l_1') \).

The MSE of \( \hat{Y}_{1it}^{*}(\omega_t) \) is

$$ \begin{aligned} E_m[\{\hat{Y}_{1it}^{*}(\omega_t)-Y_{it}\}^2|X_{zt}]&=E_m[\{\hat{Y}_{1it}-\hat{\alpha}_{1it}(\omega_t)-\hat{Y}_{1it}+\alpha_{1it}\}^2|X_{zt}]\\&=g_1(\omega_t)+g_2(\omega_t) \end{aligned} $$
(10.30)

where

$$ \begin{aligned} g_1(\omega_t)&=j_i'D_{l_1}[(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})-(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}D_{xt}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})]D_{l_1}'j_i\\ &\quad+[j_i'D_{l_1}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}X_{zt}-2(z_{it}'\otimes l_1')]C'(C\Psi_{\omega_t}^{-1}C')^{-1}\\ &\qquad\times CX_{zt}'\Sigma_{\omega_t}^{-1}D_{xt}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{l_1}'j_i \end{aligned} $$
(10.31)

and

$$ g_2(\omega_t)=(z_{it}'\otimes l_1')C'(C\Psi_{\omega_t}^{-1}C')^{-1}C(z_{it}\otimes l_1) $$
(10.32)

which arises as a consequence of using an estimate of \( \pi_t^{Long} \) in place of its true value.

The gain in efficiency: Suppose that (1) the elements of \( \sigma_{\zeta t}^2\Delta_{\zeta t} \) are small in magnitude; (2) \( \hat{Y}_{1it}^{*} \) in (10.16) has the right sign and a magnitude near the interval between the design-based lower and upper confidence limits of \( y_{1it} \), so that \( g_1(\omega_t) \) is much smaller than the design variance of \( e_{1it} \); and (3) the elements of \( \Sigma_{\omega_t} \) are uniformly bounded and the largest diagonal element of \( (I_m-M_{\omega_t}) \) is \( O(m^{-1}) \), so that \( g_2(\omega_t) \) tends to 0 as \( m\to\infty \) (see Rao 2003, p. 117, (7.1.9), (7.1.10)). Then MSE (10.30) is smaller than the design MSE of \( e_{1it} \) for large m.
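The behavior claimed in (3), namely that \( g_2(\omega_t) \) in (10.32) vanishes as the number of areas m grows, is easy to illustrate. The sketch below (ours; a simulated design with hypothetical dimensions) evaluates (10.32) for increasing m:

```python
# Illustration that g_2 in (10.32) shrinks as m grows (a simulated sketch).
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
p = 10
R = np.zeros((4, p))
for r, c in enumerate([3, 5, 6, 8]):
    R[r, c] = 1.0
C = null_space(R).T
inv = np.linalg.inv

for m in (25, 100, 400, 1600):
    z = np.column_stack([np.ones(m), rng.normal(size=(m, 4))])
    x = np.column_stack([np.ones(m), rng.normal(size=m)])
    X = np.vstack([np.kron(z[i], x[i]) for i in range(m)])    # X_zt
    Si = np.diag(1.0 / rng.uniform(0.5, 2.0, m))              # Sigma_omega_t^{-1}
    Psi = inv(X.T @ Si @ X)
    core = C.T @ inv(C @ inv(Psi) @ C.T) @ C
    v = np.kron(z[0], np.array([1.0, 0.0]))                   # (z_it' ⊗ l_1')
    print(f'm = {m:5d}   g2 = {v @ core @ v:.6f}')            # roughly O(1/m)
```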

Case 2.

\( \omega_t \) is known and \( \zeta_t \) is normal

The best unbiased predictor (BUP) of \( \zeta_t \) is

$$ E_m(\zeta_t|M_{\omega_t}\hat{Y}_{1t})=(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'M_{\omega_t}'(M_{\omega_t}\Sigma_{\omega_t}M_{\omega_t}')^{-}M_{\omega_t}\hat{Y}_{1t} $$
(10.33)

where \( (M_{\omega_t}\Sigma_{\omega_t}M_{\omega_t}')^{-} \) is a generalized inverse of \( M_{\omega_t}\Sigma_{\omega_t}M_{\omega_t}' \), as defined in C. R. Rao (1973, p. 24). The right-hand side of (10.33) is equal to the BLUP in (10.29). Thus, when \( \zeta_t \) is normal, its BLUP is the same as its BUP.

Let A be an \( m\times(m-6) \) matrix of full column rank such that \( A'X_{zt}C'=0 \). Then

$$ E_m(\zeta_t|A'\hat{Y}_{1t})=(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'A(A'\Sigma_{\omega_t}A)^{-1}A'\hat{Y}_{1t} $$
(10.34)

It follows from C. R. Rao (1973, p. 77, Problem 33) that \( A(A'\Sigma_{\omega_t}A)^{-1}A'+\Sigma_{\omega_t}^{-1}X_{zt}C'(CX_{zt}'\Sigma_{\omega_t}^{-1}X_{zt}C')^{-1}CX_{zt}'\Sigma_{\omega_t}^{-1}=\Sigma_{\omega_t}^{-1} \). Inserting this identity into (10.34) shows that (10.34) is equal to the BLUP of \( \zeta_t \).

Case 3.

\( \omega_t \) is unknown and \( \zeta_t \) is normal

We use \( A'\hat{Y}_{1t} \) to estimate \( \sigma_{\zeta t}^2\Delta_{\zeta t} \). Therefore, our estimator of \( \sigma_{\zeta t}^2\Delta_{\zeta t} \), denoted by \( \hat{\sigma}_{\zeta t}^2\hat{\Delta}_{\zeta t} \), is a quadratic function of \( A'\hat{Y}_{1t} \) or \( M_{\omega_t}D_{xt}\zeta_t \). Let \( \hat{\omega}_t \) be the 3-vector having the distinct elements of \( \hat{\sigma}_{\zeta t}^2\hat{\Delta}_{\zeta t} \) as its elements. Then \( \hat{\omega}_t \), viewed as a function of the right-hand side of (10.26), does not depend on \( \pi_t^{Long} \) and is an even function of \( D_{xt}\zeta_t \). Therefore, we can assume that

Assumption VII.

\( \hat{\omega}_t \) does not depend on \( \pi_t^{Long} \) and is an even function of \( D_{xt}\zeta_t \).

Estimator (10.16) can be written as \( \hat{Y}_{1it}^{*}(\hat{\omega}_t)=\hat{Y}_{1it}-\hat{\alpha}_{1it}(\hat{\omega}_t)=\hat{Y}_{1it}-(z_{it}'\otimes l_1')\hat{\pi}_{tR}^{Long}(\hat{\omega}_t)-j_i'D_{l_1}\hat{\zeta}_{tR}(\hat{\omega}_t) \). The MSE of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \) is

$$ \begin{aligned} E_m([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it}]^2|X_{zt})&=E_m([\hat{Y}_{1it}^{*}(\omega_t)-Y_{it}]^2|X_{zt})\\&\quad+E_m([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)]^2|X_{zt})\\&\quad+2E_m([\hat{Y}_{1it}^{*}(\omega_t)-Y_{it}][\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)]|X_{zt}) \end{aligned} $$
(10.35)

In (10.30), we have already evaluated the first term on the right-hand side of this equation. To show that the third term on the right-hand side of (10.35) vanishes, we first note that

$$ \begin{aligned} \hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)&=-\hat{\alpha}_{1it}(\hat{\omega}_t)+\hat{\alpha}_{1it}(\omega_t)\\ &=-(z_{it}'\otimes l_1')\hat{\pi}_{tR}^{Long}(\hat{\omega}_t)-j_i'D_{l_1}\hat{\zeta}_{tR}(\hat{\omega}_t)+(z_{it}'\otimes l_1')\hat{\pi}_{tR}^{Long}(\omega_t)+j_i'D_{l_1}\hat{\zeta}_{tR}(\omega_t)\\ &=(z_{it}'\otimes l_1')\{C'(C\Psi_{\omega_t}^{-1}C')^{-1}CX_{zt}'\Sigma_{\omega_t}^{-1}-C'(C\Psi_{\hat{\omega}_t}^{-1}C')^{-1}CX_{zt}'\Sigma_{\hat{\omega}_t}^{-1}\}\\ &\qquad\times\{\hat{Y}_{1t}-X_{zt}C'(C\Psi_{\omega_t}^{-1}C')^{-1}C\Psi_{\omega_t}^{-1}\hat{\pi}_{tR}^{Long}(\omega_t)\}\\ &\quad+j_i'D_{l_1}[(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}\{\hat{Y}_{1t}-X_{zt}\hat{\pi}_{tR}^{Long}(\omega_t)\}\\ &\qquad-(I_m\otimes \hat{\sigma}_{\zeta t}^2\hat{\Delta}_{\zeta t})D_{xt}'\Sigma_{\hat{\omega}_t}^{-1}\{\hat{Y}_{1t}-X_{zt}\hat{\pi}_{tR}^{Long}(\hat{\omega}_t)\}] \end{aligned} $$

is a function of \( A'\hat{Y}_{1t} \), because \( \hat{\omega}_t \), \( \hat{Y}_{1t}-X_{zt}C'(C\Psi_{\omega_t}^{-1}C')^{-1}C\Psi_{\omega_t}^{-1}\hat{\pi}_{tR}^{Long}(\omega_t) \), \( \hat{Y}_{1t}-X_{zt}\hat{\pi}_{tR}^{Long}(\omega_t) \), and \( \hat{Y}_{1t}-X_{zt}\hat{\pi}_{tR}^{Long}(\hat{\omega}_t) \) are all functions of \( A'\hat{Y}_{1t} \). Furthermore, in the equation \( \hat{Y}_{1it}^{*}(\omega_t)-Y_{it}=-(z_{it}'\otimes l_1')[\hat{\pi}_{tR}^{Long}(\omega_t)-\pi_t^{Long}]-j_i'D_{l_1}[\hat{\zeta}_{tR}(\omega_t)-\zeta_t] \), the first term on the right-hand side is independent of \( A'\hat{Y}_{1t} \) because of the condition \( A'X_{zt}C'=0 \) (see Swamy and Mehta 1969), and the second term on the right-hand side can be shown to equal \( j_i'D_{l_1}[E_m(\zeta_t|A'\hat{Y}_{1t})-\zeta_t] \) using the result in (10.34).
Hence, the third term on the right-hand side of (10.35) equals

$$ \begin{aligned} &2E_m\big([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)]\,E_m\{-(z_{it}'\otimes l_1')[\hat{\pi}_{tR}^{Long}(\omega_t)-\pi_t^{Long}]\,|\,A'\hat{Y}_{1t},X_{zt}\}\,\big|\,X_{zt}\big)\\ &\quad+2E_m\big([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)]\,E_m\{-j_i'D_{l_1}[E_m(\zeta_t|A'\hat{Y}_{1t})-\zeta_t]\,|\,A'\hat{Y}_{1t},X_{zt}\}\,\big|\,X_{zt}\big), \end{aligned} $$

which vanishes because both inner conditional expectations are zero.

Because of the second term on the right-hand side of (10.35), the MSE of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \) is always larger than that of \( \hat{Y}_{1it}^{*}(\omega_t) \) in the normal case; the MSE of \( \hat{Y}_{1it}^{*}(\omega_t) \) understates the MSE of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \). Unfortunately, exact evaluation of the second term on the right-hand side of (10.35) is generally not possible except in some special cases, as Rao (2003, p. 103) has pointed out. It is therefore necessary to find an approximation to the MSE of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \). We begin the derivation of such an approximation by imposing sufficient regularity conditions on \( \hat{\omega}_t \), \( X_{zt} \), and \( \Sigma_{\omega_t} \) to ensure the validity of an expansion of \( \hat{\alpha}_{1it}(\hat{\omega}_t) \) about \( \hat{\alpha}_{1it}(\omega_t) \) with bounded coefficients (see Lehmann and Casella 1998, p. 430, Theorem 1.1). Using a Taylor approximation, we obtain

$$ \hat{\alpha}_{1it}(\hat{\omega}_t)-\hat{\alpha}_{1it}(\omega_t)\approx d(\omega_t)'(\hat{\omega}_t-\omega_t) $$
(10.36)

where \( d(\omega_t)=\partial\hat{\alpha}_{1it}(\omega_t)/\partial\omega_t \), and it is assumed that the terms involving higher powers of \( \hat{\omega}_t-\omega_t \) are of lower order relative to \( d(\omega_t)'(\hat{\omega}_t-\omega_t) \); for an appropriate restriction on the remainder term of (10.36), see Rothenberg (1984, p. 817). Let \( b_{1t}'=j_i'D_{l_1}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1} \). Then \( \hat{\alpha}_{1it}(\omega_t)=[(z_{it}'\otimes l_1')-b_{1t}'X_{zt}][\hat{\pi}_{tR}^{Long}(\omega_t)-\pi_t^{Long}]+(z_{it}'\otimes l_1')\pi_t^{Long}+b_{1t}'(\hat{Y}_{1t}-X_{zt}\pi_t^{Long}) \). Under normality,

$$ d(\omega_t)\approx(\partial b_{1t}'/\partial\omega_t)(\hat{Y}_{1t}-X_{zt}\pi_t^{Long})=d^{*}(\omega_t) $$
(10.37)

since the terms involving the derivatives of \( \hat{\pi}_{tR}^{Long}(\omega_t)-\pi_t^{Long} \) with respect to \( \omega_t \) are of lower order. Therefore,

$$ \begin{aligned} E_m[d(\omega_t)'(\hat{\omega}_t-\omega_t)]^2&\approx E_m[d^{*}(\omega_t)'(\hat{\omega}_t-\omega_t)]^2\approx \mathrm{tr}[E_m(d^{*}(\omega_t)d^{*}(\omega_t)')\bar{V}(\hat{\omega}_t)]\\ &=\mathrm{tr}[(\partial b_{1t}'/\partial\omega_t)\Sigma_{\omega_t}(\partial b_{1t}'/\partial\omega_t)'\bar{V}(\hat{\omega}_t)]=g_3(\omega_t) \end{aligned} $$
(10.38)

where \( \bar{V}(\hat{\omega}_t) \) is the asymptotic covariance matrix of \( \hat{\omega}_t \), and the neglected terms are of lower order. It now follows from (10.36), (10.37), and (10.38) that

$$ E_m([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)]^2|X_{zt})=E_m([\hat{\alpha}_{1it}(\hat{\omega}_t)-\hat{\alpha}_{1it}(\omega_t)]^2|X_{zt})\approx g_3(\omega_t) $$
(10.39)

Inserting (10.30) and (10.39) into (10.35) gives a second-order approximation to the MSE of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \),

$$ E_m([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it}]^2|X_{zt})\approx g_1(\omega_t)+g_2(\omega_t)+g_3(\omega_t) $$
(10.40)

where the terms \( g_2(\omega_t) \) and \( g_3(\omega_t) \) arise as a direct consequence of using the estimators of \( \pi_t^{Long} \) and \( \omega_t \), respectively, rather than their true values in \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \). Lemma 1.14 in Lehmann and Casella (1998, p. 437) shows that the mean square error of the limiting distribution of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it} \) is less than or equal to \( \lim_{m\to\infty}E_m([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it}]^2|X_{zt}) \), if the latter exists. The limiting distribution of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it} \) is derived in Appendix 6 below.
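Since \( g_3(\omega_t) \) in (10.38) involves derivatives of \( b_{1t}' \) with respect to \( \omega_t \), a finite-difference evaluation is a natural check. The sketch below is ours: it parameterizes \( \sigma_{\zeta t}^2\Delta_{\zeta t} \) by the 3-vector \( \omega_t \) and uses a hypothetical value for \( \bar{V}(\hat{\omega}_t) \):

```python
# Finite-difference evaluation of g_3 in (10.38); a sketch with simulated inputs
# and a hypothetical Vbar for the asymptotic covariance of omega_hat.
import numpy as np

rng = np.random.default_rng(4)
m, i = 30, 0                                   # number of areas; area of interest

x = np.column_stack([np.ones(m), rng.normal(size=m)])   # x_it = (1, Yhat_2it)'
D_x = np.zeros((m, 2 * m))
for a in range(m):
    D_x[a, 2 * a:2 * a + 2] = x[a]
D_l1 = np.kron(np.eye(m), np.array([[1.0, 0.0]]))        # I_m ⊗ l_1'
j_i = np.zeros(m); j_i[i] = 1.0

def b1(omega):
    """b_1t'(omega) = j_i' D_l1 (I ⊗ sigma^2 Delta) D_x' Sigma_omega^{-1}."""
    S = np.array([[omega[0], omega[2]], [omega[2], omega[1]]])  # sigma^2 Delta_zeta
    V = np.kron(np.eye(m), S)
    Sigma = D_x @ V @ D_x.T                                     # Sigma_omega_t
    return j_i @ D_l1 @ V @ D_x.T @ np.linalg.inv(Sigma), Sigma

omega = np.array([1.0, 0.8, 0.2])              # true omega_t (hypothetical)
_, Sigma = b1(omega)

# Row k of J is the partial of b_1t' w.r.t. omega_k, by central differences.
h = 1e-6
J = np.vstack([(b1(omega + h * e)[0] - b1(omega - h * e)[0]) / (2 * h)
               for e in np.eye(3)])

Vbar = 0.01 * np.eye(3)                        # hypothetical Vbar(omega_hat)
g3 = np.trace(J @ Sigma @ J.T @ Vbar)          # g_3 in (10.38)
print('g3 ≈', g3)
```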

Appendix 3: Estimation of the MSE of (10.16)

It follows from Rao (2003, p. 104) that \( E_m g_2(\hat{\omega}_t)\approx g_2(\omega_t) \) and \( E_m g_3(\hat{\omega}_t)\approx g_3(\omega_t) \) to the desired order of approximation, but \( g_1(\hat{\omega}_t) \) is usually a biased estimator of \( g_1(\omega_t) \), with a bias that is generally of the same order as \( g_2(\omega_t) \) and \( g_3(\omega_t) \). To evaluate this bias, we make all the assumptions that permit a Taylor expansion of \( g_1(\hat{\omega}_t) \) about \( g_1(\omega_t) \) with bounded coefficients (see Lehmann and Casella 1998, p. 430). Under these assumptions,

$$ g_1(\hat{\omega}_t)=g_1(\omega_t)+(\hat{\omega}_t-\omega_t)'\nabla g_1(\omega_t)+\frac{1}{2}(\hat{\omega}_t-\omega_t)'\nabla^2 g_1(\omega_t)(\hat{\omega}_t-\omega_t) $$
(10.41)

where \( \nabla g_1(\omega_t) \) is the vector of first-order derivatives of \( g_1(\omega_t) \) with respect to \( \omega_t \) and \( \nabla^2 g_1(\omega_t) \) is the matrix of its second-order derivatives. The estimator \( \hat{\omega}_t \) is generally biased for \( \omega_t \), and hence the model expectation of the second term on the right-hand side of (10.41) is generally nonzero. Consequently,

$$ E_m g_1(\hat{\omega}_t)\approx g_1(\omega_t)+E_m(\hat{\omega}_t-\omega_t)'\nabla g_1(\omega_t)+\frac{1}{2}\mathrm{tr}[\nabla^2 g_1(\omega_t)\bar{V}(\hat{\omega}_t)] $$
(10.42)

If \( \Sigma_{\omega_t} \) has a linear structure, then (10.42) reduces to

$$ E_m g_1(\hat{\omega}_t)\approx g_1(\omega_t)+E_m(\hat{\omega}_t-\omega_t)'\nabla g_1(\omega_t)-g_3(\omega_t). $$
(10.43)

This result shows that an estimator of the MSE of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \) to the desired order of approximation is given by

$$ g_1(\hat{\omega}_t)-\mathrm{estimate\ of\ }[E_m(\hat{\omega}_t-\omega_t)'\nabla g_1(\omega_t)]+g_2(\hat{\omega}_t)+2g_3(\hat{\omega}_t) $$
(10.44)

The model expectation of (10.44) is approximately equal to the MSE of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \). The second term in (10.44) can be ignored if it is of lower order than \( g_3(\omega_t) \).
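Operationally, (10.44) is just an assembly of plug-in pieces. A schematic helper (ours; the g-functions and the bias estimate are assumed to be supplied by the surrounding computation) might look like:

```python
# Schematic assembly of the MSE estimator (10.44); the callables g1, g2, g3
# and the bias estimate are assumed to come from the surrounding computation.
from typing import Callable
import numpy as np

def mse_estimate(omega_hat: np.ndarray,
                 g1: Callable[[np.ndarray], float],
                 g2: Callable[[np.ndarray], float],
                 g3: Callable[[np.ndarray], float],
                 bias_term_estimate: float = 0.0) -> float:
    """Plug-in MSE estimator (10.44): g1(w) - bias + g2(w) + 2*g3(w).

    The doubling of g3 offsets the -g3(omega_t) bias of g1(omega_hat)
    shown in (10.43)."""
    w = omega_hat
    return g1(w) - bias_term_estimate + g2(w) + 2.0 * g3(w)
```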

Appendix 4: Derivation of the MSE of (10.17)

When \( \omega_t \) is known, estimator (10.17) of \( Y_{it} \) in (10.1) can be written as \( \hat{Y}_{it}(\omega_t)=[(z_{it}'\otimes l_2')\hat{\pi}_{tR}^{Long}(\omega_t)+j_i'D_{l_2}\hat{\zeta}_{tR}(\omega_t)]y_{2it} \), where \( l_2 \) is the 2-vector \( (0,1)' \), \( j_i \) is the m-vector having 1 as its ith element and zeros elsewhere, and \( D_{l_2} \) is the (m × 2m) matrix \( (I_m\otimes l_2') \). The MSE of (10.17) is

$$ E_m[\{\hat{Y}_{it}(\omega_t)-Y_{it}\}^2|X_{zt}]=E_m[\{\hat{\alpha}_{2it}(\omega_t)-\alpha_{2it}\}^2(y_{2it})^2|X_{zt}]=f_1(\omega_t)+f_2(\omega_t) $$
(10.45)

where

$$ \begin{aligned} f_1(\omega_t)&=j_i'D_{l_2}\{(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})-(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}D_{xt}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})\}D_{l_2}'j_i(y_{2it})^2\\ &\quad+\{j_i'D_{l_2}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}X_{zt}-2(z_{it}'\otimes l_2')\}C'(C\Psi_{\omega_t}^{-1}C')^{-1}\\ &\qquad\times CX_{zt}'\Sigma_{\omega_t}^{-1}D_{xt}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{l_2}'j_i(y_{2it})^2 \end{aligned} $$
(10.46)

and

$$ f_2(\omega_t)=(z_{it}'\otimes l_2')C'(C\Psi_{\omega_t}^{-1}C')^{-1}C(z_{it}\otimes l_2)(y_{2it})^2 $$
(10.47)

A second-order approximation to the MSE of (10.17) based on \( \hat{\omega}_t \) is

$$ E_m([\hat{Y}_{it}(\hat{\omega}_t)-Y_{it}]^2|X_{zt})\approx f_1(\omega_t)+f_2(\omega_t)+f_3(\omega_t) $$
(10.48)

where \( f_3(\omega_t)=\mathrm{tr}[(\partial b_{2t}'/\partial\omega_t)\Sigma_{\omega_t}(\partial b_{2t}'/\partial\omega_t)'\bar{V}(\hat{\omega}_t)](y_{2it})^2 \) with \( b_{2t}'=j_i'D_{l_2}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1} \).

Appendix 5: Estimation of the MSE of (10.17) based on \( \hat{\omega}_t \)

An estimator of the MSE of \( \hat{Y}_{it}(\hat{\omega}_t) \) to the desired order of approximation is given by

$$ f_1(\hat{\omega}_t)-\mathrm{estimate\ of\ }[E_m(\hat{\omega}_t-\omega_t)'\nabla f_1(\omega_t)]+f_2(\hat{\omega}_t)+2f_3(\hat{\omega}_t) $$
(10.49)

Appendix 6: Approximate Normality of (10.16)

Case 3.

(continued)

Equation (10.36), which is based on Lehmann and Casella's (1998, p. 430) Theorem 1.1, is invalid if \( E_m([\hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it}]^2|X_{zt}) \) does not exist. In this case, we should not consider the MSE in (10.35) but instead the second-order moment of a higher-order asymptotic approximation to the distribution of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \), derived below. It follows from (10.26), (10.27), (10.28), (10.29), and (10.30) and the normality of \( \zeta_t \) that the standardized estimator \( \{\hat{Y}_{1it}^{*}(\omega_t)-Y_{it}\}/\sqrt{g_1(\omega_t)+g_2(\omega_t)} \) is normal with zero mean and unit variance. A sufficient condition for \( \{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it}\}/\sqrt{g_1(\omega_t)+g_2(\omega_t)} \) to be asymptotically normal with zero mean and unit variance is that

$$ \{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)\}/\sqrt{g_1(\omega_t)+g_2(\omega_t)} $$
(10.50)

converges in probability to zero (see Rao 1973, p. 122, (x)(d)). To obtain higher-order asymptotic approximations to the distribution of \( \{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it}\}/\sqrt{g_1(\omega_t)+g_2(\omega_t)} \), somewhat stronger assumptions are necessary, as shown by Rothenberg (1984). One such assumption is that, for i = 1, …, m and fixed t,

Assumption VIII.

The standardized difference (10.50) can be written as \( \frac{W_m}{\sqrt{m}}+\frac{\mathit{remainder1}_m}{m^2\sqrt{m}} \), where \( W_m \) possesses bounded moments as m tends to infinity, and \( \mathit{remainder1}_m \) is stochastically bounded with \( \mathrm{Pr}_m[|\mathit{remainder1}_m|>(\log m)^q]=o(m^{-2}) \) for some constant q.

Following Rothenberg (1984), we impose sufficient regularity conditions on \( \hat{\omega}_t \), \( X_{zt} \), and \( \Sigma_{\omega_t} \) to ensure that Assumption VIII is satisfied. Without such conditions, (10.36) is not valid. We can write

$$ \begin{aligned} \hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)&=[-(z_{it}'\otimes l_1')C'(C\Psi_{\hat{\omega}_t}^{-1}C')^{-1}CX_{zt}'\Sigma_{\hat{\omega}_t}^{-1}\\ &\quad+j_i'D_{l_1}\{(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}\\ &\quad-(I_m\otimes \hat{\sigma}_{\zeta t}^2\hat{\Delta}_{\zeta t})D_{xt}'\Sigma_{\hat{\omega}_t}^{-1}\}]M_{\omega_t}D_{xt}\zeta_t \end{aligned} $$
(10.51)

Under Assumption VII and the normality of \( \zeta_t \), we established in (10.35) that (1) \( \hat{Y}_{1it}^{*}(\omega_t)-Y_{it}=-(z_{it}'\otimes l_1')\{\hat{\pi}_{tR}^{Long}(\omega_t)-\pi_t^{Long}\}-j_i'D_{l_1}\{E_m(\zeta_t|A'\hat{Y}_{1t})-\zeta_t\} \) is normal with mean zero and variance equal to (10.30); (2) \( -(z_{it}'\otimes l_1')[\hat{\pi}_{tR}^{Long}(\omega_t)-\pi_t^{Long}] \) is independent of \( A'\hat{Y}_{1t} \) (or \( M_{\omega_t}D_{xt}\zeta_t \)); (3) each of \( \{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)\} \) and \( \hat{\omega}_t \) is a function of \( A'\hat{Y}_{1t} \) alone; and (4) \( \{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)\} \) is uncorrelated with \( \{\hat{Y}_{1it}^{*}(\omega_t)-Y_{it}\} \). It follows from these results that both \( \{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)\} \) in (10.51) and \( \sqrt{m}(\hat{\omega}_t-\omega_t) \) are asymptotically independent of \( \{\hat{Y}_{1it}^{*}(\omega_t)-Y_{it}\} \) if \( \{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-\hat{Y}_{1it}^{*}(\omega_t)\} \) is asymptotically normal. Equation (10.30) defines \( g_1(\omega_t) \) and \( g_2(\omega_t) \).

Theorem 7.

Under Assumptions V–VIII,

$$ \mathrm{Pr}_m\left[\frac{\hat{Y}_{1it}^{*}(\hat{\omega}_t)-Y_{it}}{\sigma_m}\leq\tau\right]=\Phi\left(\tau-\frac{\tau^3-3\tau}{24m^2}\eta_m\right)+o(m^{-2}) $$
(10.52)

where \( \Phi(\cdot) \) is the distribution function of a standard normal variable, \( \eta_m \) is the fourth cumulant of \( W_m \), and

$$ \sigma_m^2=\{g_1(\omega_t)+g_2(\omega_t)\}\left(1+\frac{\operatorname{var}W_m}{m}\right) $$
(10.53)

Proof (Rothenberg 1984). Approximate expressions for \( \eta_m \) and \( \sigma_m^2 \) are as follows: for j = 1, 2, 3 and k = 1, 2, 3, let B be the 3 × 3 matrix having

$$ \begin{aligned} \beta_{jk}&=\frac{1}{g_1(\omega_t)+g_2(\omega_t)}\Big[-(z_{it}'\otimes l_1')\{C'(C\Psi_{\omega_t}^{-1}C')^{-1}CX_{zt}'\}\frac{\partial\Sigma_{\omega_t}^{-1}}{\partial\omega_{jt}}\\ &\quad-j_i'D_{l_1}\frac{\partial(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})D_{xt}'\Sigma_{\omega_t}^{-1}}{\partial\omega_{jt}}\Big]\\ &\quad\times M_{\omega_t}\Sigma_{\omega_t}M_{\omega_t}'\Big[-\frac{\partial\Sigma_{\omega_t}^{-1}}{\partial\omega_{kt}}\{X_{zt}C'(C\Psi_{\omega_t}^{-1}C')^{-1}C\}(z_{it}\otimes l_1)\\ &\quad-\frac{\partial\Sigma_{\omega_t}^{-1}D_{xt}(I_m\otimes \sigma_{\zeta t}^2\Delta_{\zeta t})}{\partial\omega_{kt}}D_{l_1}'j_i\Big] \end{aligned} $$

as its (j, k)th element. Suppose that \( \sqrt{m}(\hat{\omega}_t-\omega_t)=\xi_t+(\mathit{remainder2}_m/m^2) \), where \( \xi_t \) has bounded moments and is asymptotically normal with covariance matrix \( \Lambda \), and \( \mathrm{Pr}_m[|\mathit{remainder2}_m|>(\log m)^h]=o(m^{-2}) \) for some constant h. Then

$$ \eta_m=6\,\mathrm{tr}\,\Lambda B\Lambda B+O(m^{-1}), $$
(10.54)
$$ \sigma_m^2=\{g_1(\omega_t)+g_2(\omega_t)\}\left[1+\frac{\mathrm{tr}\,\Lambda B}{m}+O(m^{-2})\right] $$
(10.55)

When \( \zeta_t \) is not normal, a higher-order asymptotic approximation to the distribution of \( \hat{Y}_{1it}^{*}(\hat{\omega}_t) \) can be found, as in Cavanagh and Rothenberg (1995).
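
As a small numerical companion to Theorem 7 (ours; the value of \( \eta_m \) is a hypothetical input here), the corrected distribution function in (10.52) is straightforward to evaluate:

```python
# Evaluating the Edgeworth-type approximation (10.52); eta_m is hypothetical here.
from math import erf, sqrt

def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def corrected_cdf(tau: float, m: int, eta_m: float) -> float:
    """Pr_m[(Yhat*_1it - Y_it)/sigma_m <= tau] per (10.52), up to o(m^-2)."""
    return std_normal_cdf(tau - (tau**3 - 3.0 * tau) / (24.0 * m**2) * eta_m)

for tau in (1.645, 1.96, 2.576):
    # For moderate m the correction to Phi(tau) is already tiny.
    print(tau, corrected_cdf(tau, m=50, eta_m=2.0))
```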


Copyright information

© 2014 Springer Science+Business Media New York

Cite this chapter

Swamy, P.A.V.B., Mehta, J.S., Tavlas, G.S., Hall, S.G. (2014). Small Area Estimation with Correctly Specified Linking Models. In: Ma, J., Wohar, M. (eds) Recent Advances in Estimating Nonlinear Models. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8060-0_10
