Generalized Estimator for Population Mean Using Auxiliary Attribute in Stratified Two-Phase Sampling

In sampling theory, auxiliary information is widely used to increase the precision of estimators when the study variable is correlated with the auxiliary variable. In several practical situations, the auxiliary information is available in the form of auxiliary attribute(s). In this paper, we propose a generalized estimator for the finite population mean using an auxiliary attribute under stratified two-phase sampling. The mathematical expressions for the approximate bias and mean square error (MSE) of the proposed estimator are obtained to the first order of approximation. Many special cases of the proposed estimator are also obtained using known parameters of the auxiliary attribute. The MSE of the proposed estimator is compared algebraically with the MSEs of the competing estimators. The performance of the proposed estimator is evaluated using real data. The results illustrate the good performance of the proposed estimator and its sub-cases.


Introduction
Simple random sampling is a widely used method for collecting data from a population with homogeneous characteristics. However, it may not be an appropriate technique for estimating population parameters when the population is heterogeneous. Stratification is the natural remedy in such situations, and stratified random sampling (StRS) is considered one of the best options because it applies different weights to the sampling units drawn from different strata of the population. In survey sampling, field investigators commonly collect auxiliary information (correlated with the study variable) together with the main variable of interest to increase the precision and consistency of the estimators. Many estimation methods, such as the ratio, regression and product methods, use the auxiliary information and provide comparatively better results than estimators that do not use such supporting information.
There are several real-world circumstances in which the supporting information is available in the form of an auxiliary attribute. For example: (i) the study variable may be the production of a crop and the attribute may be a particular variety of seed; (ii) the study variable may be the amount of milk produced per day and the attribute may be a particular breed of cow; (iii) the study variable may be the yield of a wheat crop and the attribute may be a particular variety of wheat.

Background and Plan of the Research
Two-phase sampling, first used by [16], is a powerful and cost-effective technique for generating reliable estimates of the unknown population parameters of the auxiliary variable(s) from a first-phase sample. It is particularly useful when the population parameter of an auxiliary variable is not available and/or the cost of observing the auxiliary characteristic is relatively high. In stratified double sampling, the first-phase sample is taken to estimate the auxiliary attribute of the population in the various strata. The second-phase sample is selected from the first-phase sample to obtain observations of the study variable and the auxiliary attribute from each stratum. Cochran [2] described classical double sampling for stratification as a method that uses two random samples, where the second sample is a stratified sub-sample of the first sample. In the literature, Sahoo et al. [8], Kadilar and Cingi [4], Shabbir and Gupta [11], Samiuddin and Hanif [9], Singh and Vishwakarma [14], Koyuncu and Kadilar [6], Singh and Choudhary [1], Hamad et al. [3], Sanaullah et al. [10], Malik and Singh [7] and Shabbir and Gupta [12] have suggested various estimators using stratified and two-phase sampling. Recently, Zaman and Kadilar [15] suggested a family of exponential ratio and exponential product estimators to estimate the population mean using an auxiliary attribute in stratified two-phase sampling.
From the stratification point of view, we present a generalized estimator for the population mean using an auxiliary attribute under stratified two-phase sampling.
The remainder of the paper is organized as follows. The stratified sampling procedure, with the notation and some existing estimators, is presented in Sect. 3. Mathematical expressions for the bias and MSE of the proposed estimator under the stratified two-phase sampling scheme are derived using a Taylor series in Sect. 4. Many special cases, as a family and sub-families of the proposed estimator, are also discussed in the same section. Mathematical comparisons of the proposed estimator with the existing estimators in terms of the MSE are given in Sect. 5. A numerical study is conducted in Sect. 6 to evaluate the performance of the proposed estimator against the existing estimators. Section 7 presents the conclusions and recommendations of the paper.

Sampling Procedure with Basic Notations and Existing Estimators
Consider a finite population $U = \{U_1, U_2, U_3, \ldots, U_N\}$ divided into $L$ homogeneous groups or strata, with $N_h$ $(h = 1, 2, 3, \ldots, L)$ units in the $h$th stratum such that $\sum_{h=1}^{L} N_h = N$. Let $Y$ and $P$ be the main study variable and the auxiliary attribute respectively. Let $y_{hi}$ and $p_{hi}$ be the observed values and $Y_{hi}$ and $P_{hi}$ be the true values of the study variable and the auxiliary attribute respectively associated with the $i$th $(i = 1, 2, 3, \ldots, N_h)$ unit of the $h$th stratum, and let $\bar{Y}_h$ and $P_h$ be the corresponding population stratum mean and stratum proportion. Let $W_h = N_h / N$ be the stratum weight. When $P_h$ is unknown, a first-phase stratified sample of size $n'_h$ is selected from the $h$th stratum by simple random sampling without replacement (SRSWOR) to estimate $P_h$, such that $\sum_{h=1}^{L} n'_h = n'$. A second-phase stratified sample of size $n_h$ is then selected from the $n'_h$ first-phase units to obtain information on the study variable as well as the auxiliary attribute.
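The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `two_phase_stratified_sample`, the data layout (each stratum is a list of `(y, p)` pairs with `p` equal to 0 or 1) and the returned quantities (the stratum-weighted second-phase mean, second-phase proportion and first-phase proportion) are hypothetical choices made for the example.

```python
import random


def two_phase_stratified_sample(strata, n1, n2, seed=0):
    """Draw a two-phase SRSWOR sample in every stratum (illustrative sketch).

    strata : list of strata, each a list of (y, p) pairs with p in {0, 1}
    n1     : first-phase sample sizes n'_h, one per stratum
    n2     : second-phase sample sizes n_h, with n_h <= n'_h <= N_h
    Returns the stratum-weighted estimates (y_st, p_st, p1_st), where
    p1_st comes from the first phase and y_st, p_st from the second.
    """
    rng = random.Random(seed)
    total_units = sum(len(units) for units in strata)  # N
    y_st = p_st = p1_st = 0.0
    for units, n1_h, n2_h in zip(strata, n1, n2):
        w_h = len(units) / total_units        # stratum weight W_h = N_h / N
        first = rng.sample(units, n1_h)       # phase 1: SRSWOR of n'_h units
        second = rng.sample(first, n2_h)      # phase 2: sub-sample of n_h units
        p1_st += w_h * sum(p for _, p in first) / n1_h
        p_st += w_h * sum(p for _, p in second) / n2_h
        y_st += w_h * sum(y for y, _ in second) / n2_h
    return y_st, p_st, p1_st
```

In practice only the attribute would be recorded for the first-phase units; the sketch stores both values for every unit purely to keep the example short.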
Let us define

such that

The classical ratio and product estimators in stratified two-phase sampling are given by

and

The MSEs of $\hat{y}^{st}_{r}$ and $\hat{y}^{st}_{p}$ are given by

and

The exponential ratio and exponential product estimators under stratified two-phase sampling are given by

and

The approximate MSEs of $\hat{y}^{st}_{er}$ and $\hat{y}^{st}_{ep}$ are given by

Zaman and Kadilar [15] suggested the following families of exponential estimators under stratified two-phase sampling to estimate the population mean of the study variable using an auxiliary attribute:

The approximate MSE of $\hat{y}^{st}_{zkj}$ is given by

where $j = 1, 2, 3, \ldots, 9$, and
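For orientation, the four classical estimators named in this section can be written as short Python functions. The sketch assumes the usual textbook combined forms, in which the first-phase proportion stands in for the unknown population proportion; the paper's own expressions may differ in notation or detail. The function and argument names (`y_st`, `p_st`, `p1_st`) are hypothetical and denote the stratum-weighted second-phase mean, second-phase proportion and first-phase proportion.

```python
import math


def ratio_estimator(y_st, p_st, p1_st):
    # Assumed combined ratio form: scale the mean by p'_st / p_st.
    return y_st * p1_st / p_st


def product_estimator(y_st, p_st, p1_st):
    # Assumed combined product form: scale the mean by p_st / p'_st.
    return y_st * p_st / p1_st


def exp_ratio_estimator(y_st, p_st, p1_st):
    # Assumed exponential ratio form.
    return y_st * math.exp((p1_st - p_st) / (p1_st + p_st))


def exp_product_estimator(y_st, p_st, p1_st):
    # Assumed exponential product form.
    return y_st * math.exp((p_st - p1_st) / (p_st + p1_st))
```

When the two phases give the same proportion, all four functions return the stratified mean unchanged, which is a convenient sanity check.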

Proposed Generalized Estimator and its Properties
Motivated by Khoshnevisan et al. [5], we suggest a generalized mean estimator under stratified two-phase sampling to estimate the mean of the study variable using an auxiliary attribute, given in Eq. (11), in which one constant is assumed to be unknown, with its value determined from optimality considerations. The constants α (α ≠ 0) and β may take the values of different population parameters of the auxiliary attribute. Further, g and k are suitably chosen constants, which may take different values to yield the classical ratio, product and regression-type estimators respectively.
In order to obtain the bias and MSE of the proposed estimator, we may rewrite the proposed generalized estimator given in Eq. (11). It is noted that the proposed estimator can produce many ratio-type and product-type estimators as special cases using different values of g, k and the other constants. A family of ratio-type and product-type estimators obtained from different choices of the constants is given in Table 3 (Appendix 1). Because the point bi-serial correlation coefficient between the study variable and the auxiliary attribute is positive, we consider only the ratio-type estimators, listed in Table 4 (Appendix 1), as the sub-family of the proposed estimator.

Mathematical Comparisons
In this section, we compare the generalized estimator with some existing estimators in stratified two-phase sampling, as follows:

i. The proposed generalized estimator will perform better than the combined mean estimator under two-phase sampling if

which implies

or

ii. The proposed generalized estimator will perform better than the combined ratio estimator under two-phase sampling if

which implies

or

iii. The proposed generalized estimator will perform better than the combined product estimator under two-phase sampling if

which implies

or

iv. The proposed generalized estimator will perform better than the combined exponential ratio estimator under two-phase sampling if

which implies

or

v. The proposed generalized estimator will perform better than the combined exponential product estimator under two-phase sampling if

which implies

or

vi. The proposed generalized estimator will perform better than the family of modified exponential estimators proposed by Zaman and Kadilar [15] if

which implies

or

Numerical Study
In this numerical study, we consider a real population with apple production as the main variable of interest and the number of apple trees being more than 15,000 as the auxiliary attribute, for the year 1999 (original source: Institute of Statistics, Republic of Turkey). The data are taken from Kadilar and Cingi [4] to judge the performance of the proposed estimator in terms of MSE and percentage relative efficiency (PRE) for the population mean under stratified two-phase sampling. The whole population is classified into six regions of Turkey: Marmara, Aegean, Mediterranean, Central Anatolia, Black Sea and Southeast Anatolia. From each stratum, the samples were selected at random and the sample size for each stratum was determined by the Neyman allocation method. The summary statistics of the data are given in Table 1. The main study variable is defined as $Y_h$ = apple production amount in the six regions in 1999. The proposed estimator and its special cases are assessed against the combined mean estimator using the PRE. The general formula for computing the PRE of any estimator is

$$\mathrm{PRE}(\Omega) = \frac{\mathrm{MSE}(\text{combined mean estimator})}{\mathrm{MSE}(\Omega)} \times 100,$$

where $\Omega$ denotes any of the estimators presented in this section and $R$ (= 10,000) is the total number of iterations.
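The Neyman allocation used to fix the stratum sample sizes follows the standard rule $n_h = n \, N_h S_h / \sum_{h} N_h S_h$, where $S_h$ is the stratum standard deviation of the study variable. A minimal sketch is given below; the function name `neyman_allocation` is hypothetical, and the source does not state whether the rule was applied to the first-phase sizes, the second-phase sizes or both, so the total `n` here is simply whichever total is being allocated.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Return real-valued Neyman allocations n_h = n * N_h * S_h / sum(N_h * S_h).

    stratum_sizes : population sizes N_h
    stratum_sds   : stratum standard deviations S_h of the study variable
    n             : total sample size to distribute over the strata
    """
    products = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * value / total for value in products]
```

The allocations are returned unrounded; rounding to integers, and capping any $n_h$ that exceeds $N_h$, is left to the user because the rounded values need not sum exactly to $n$.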
The MSE and PRE results for all the estimators considered in this paper are summarized in Table 2. The results in Table 2 show that the proposed estimator and its sub-cases ($\hat{y}^{st(A)}_{q3}$ and $\hat{y}^{st(A)}_{q6}$) are more efficient than the combined ratio, product, exponential ratio and exponential product estimators and the class of exponential estimators proposed by Zaman and Kadilar [15]. The product-type estimators did not perform well because the point bi-serial correlation coefficient between the study variable and the auxiliary attribute is positive in each stratum. The MSE of the generalized estimator is the smallest among its special cases and the other competing estimators considered in this paper.
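The MSE and PRE figures of a simulation study of this kind can be computed as sketched below. The empirical MSE is assumed here to be the average squared deviation of the replicated estimates from the true population mean over the R iterations; the source's exact definition did not survive, so this is a labeled assumption, and the function names `empirical_mse` and `percentage_relative_efficiency` are hypothetical.

```python
def empirical_mse(estimates, true_mean):
    """Average squared deviation of replicated estimates from the true mean.

    Assumed definition: (1 / R) * sum((estimate_r - true_mean) ** 2).
    """
    return sum((e - true_mean) ** 2 for e in estimates) / len(estimates)


def percentage_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator, in percent.

    The reference is the combined mean estimator in the numerical study.
    """
    return mse_reference / mse_estimator * 100.0
```

A PRE above 100 means the estimator has a smaller MSE than the reference, and the reference itself always scores exactly 100.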

Conclusions
In general, double sampling is very useful when the population parameters of the auxiliary characteristic are either unknown or expensive to obtain. Stratification is very important for increasing the precision of estimators when the population units are heterogeneous. In this paper, we have proposed a generalized estimator for the population mean using two-phase stratified sampling. We have used a single auxiliary attribute and derived the expressions for the bias and MSE of the proposed estimator to the first order of approximation. From the results of the numerical study, we conclude that the proposed estimator and a few of its special cases are more efficient than the existing classical ratio and product estimators, the exponential ratio and product estimators, and the class of exponential ratio-type estimators proposed by Zaman and Kadilar [15]. The proposed estimator was found to be superior to all the other estimators considered in this paper. From the main findings of the numerical study, we may conclude that our estimator is efficient and useful for estimating the population mean using two-phase sampling for populations of different characteristics.