Judgment Post-stratified Assessment Combining Ranking Information from Multiple Sources, with a Field Phenotyping Example

This paper presents novel estimators for a judgment post-stratified (JPS) sample, which combine the ranking information from different methods or rankers. A JPS sample divides the units in the original simple random sample (SRS) into several ranking groups based on the relative positions (ranks) of the units in their individual small comparison sets. Ranks in the comparison sets may be assigned with several different ranking procedures. When considered separately, each ranking method leads to a different JPS sample estimator of the population mean or total. Here we introduce equally or unequally weighted estimators, which combine the ranking information from multiple sources. The unequal weights utilize the standard errors of the individual ranking methods estimators. The weighted estimators provide a substantial improvement over an SRS estimator and a JPS estimator based on a single ranking method. The new estimators are applied to crop establishment phenotypic data from an agricultural field experiment. Supplementary materials accompanying this paper appear online.


INTRODUCTION
A stratified sample combines probability samples (e.g., SRS) from each stratum in a population. The total variation in a stratified sample can thus be partitioned into withinand between-stratum variation. In certain settings, it may not be straightforward to stratify the population if the stratification variable is subjective, such as visual inspection, and not available for all population units at once. In such cases, the stratification principle can be used at the sample level. MacEachern et al. (2004) introduced the concept of the JPS sample to improve the information content of an SRS by post-stratifying based on a (subjective) ranking variable.
To construct a JPS sample in an infinite population, one first selects an SRS sample and measures each unit in the sample for the characteristic of interest. For each unit, H − 1 additional units are then selected, without measurement, at random from the population to form a comparison set of size H . Relative positions (ranks) of every unit in the original SRS are determined in their comparison sets using a ranking procedure on an auxiliary variable, such as visual ranking. Under a consistent ranking procedure, regardless of its quality, the JPS sampling yields unbiased estimators for the population mean and total (Ozturk 2016a).
The main difference between the stratified and JPS sampling procedures is that the former combines SRSs selected across all strata, while the latter partitions a single SRS from the population into ranking groups post-experimentally, based on the ranking information from additional comparison sets. The stratum sample sizes are constant in the stratified sampling, while the judgment ranking group sample sizes are random in the JPS sampling. The JPS sample is similar to a post-stratified (PS) sample, with a clear distinction. The PS sample is stratified based on the population strata membership information, while the JPS sample is segregated into the judgment (ranking) groups based on the position (ranking) information in small comparison sets. Hence, a JPS sample can be used for populations for which no strata membership is available.
In the context of modern digital plant research (Araus and Cairns 2014), JPS sampling presents a range of potential applications in agricultural and environmental research and practice. We provide here a motivating example of the field estimation of early crop establishment of faba bean in an agronomic experiment conducted in the University of Adelaide, Australia (Kasprzak 2021). The aim was to compare the crop performance, including faba bean emergence, under a range of planting densities, controlled by the rate of sowing and row spacing. It is believed better early crop establishment leads to better yield and profitability. The study measured several variables, including the seedling emergence observed over the first two weeks after sowing. An expert human observer would typically inspect five random transects of 1m length in an experimental plot. For automated observations, the entire research field was divided into 2640 grid cells (approximately 50 cm long and 25 cm wide) for aerial photograph observations with a drone (DJI's Phantom 4 Pro v2.0) in addition to the observations by the human observer. The drone had an on-board camera and recorded the image of each cell. These images were later processed using a machine learning algorithm to predict the actual number of seedlings emerged.
For the purposes of this paper, we consider the data collected in a single year of the trial over the entire field: (1) the manual counts by the expert observer of seedling emergence (Y ) on each grid cell approximately two weeks after sowing; (2) the automated counts X predicted from the image data. The ranking variable X is highly correlated with Y , ρ Y,X = 0.766. The lack of perfect correspondence is explained by the uncertainty in the manual and automated counts due to the presence of ground cover (stubble) concealing the dusty green faba beans seedlings.
In the seedling emergence data, the auxiliary variable X can be used as a ranking variable for JPS. Even though the X values would be different from the Y values, they provide reasonably accurate ranks for faba bean emergence on the grid cells selected for comparison sets.
In a JPS sample, rank classes for individual units are assigned after the simple random sample is collected. In certain settings, it is possible to have more than one set of ranks for the same measured unit in the SRS. Ranks can be constructed with different ranking mechanisms. To illustrate this, in our example of the faba bean emergence, the ranks of Y on each measured unit can be obtained K , K ≥ 1, times, with a minimal ranking cost, by constructing K different comparison sets (simply selecting grid cells at random from automated images) for each Y -value. In a different setting, units in each comparison set may be ranked by K rankers using different visual inspection or assessment protocols, or by K machine learning algorithms requiring, say, different amounts of training. In some other cases, for the n units selected and measured for a JPS, n(H − 1) units can be additionally selected in the population, and the same person may visually rank the units in K different comparison sets of size H for each unit in the JPS by permuting the n(H − 1) unmeasured units into n comparison sets K times.
The ranking information plays a significant role in the construction of a JPS sample. In a post-stratified sample (PS), the observations are binned based on their membership in stratum populations. However, knowledge of the strata membership may be too strong an assumption. When the membership information is not available for all population units, position information may be readily available in a small set of units in comparison sets. This position information is used in JPS sampling to create judgment groups of homogeneous observations. Each measured unit in a JPS sample possesses two pieces of information: the value of the characteristic of interest Y , and the relative position of the unit among the other H − 1 unmeasured units in its comparison set. The latter presents additional information through the ranks of the measured units. If the ranking quality is good and H is large, the information content of a JPS sample can be substantially higher than the information content of a PS sample of the same size.
The problem of reducing the impact of ranking error on the estimators in the JPS sampling design for an infinite population has generated extensive research. Wang et al. (2006) used the concomitants order statistics to estimate the population mean. Frey (2016) and Frey andFeeman (2012, 2013) constructed estimators for the population mean and variance by conditioning on the judgment group samples sizes. These new estimators improve the unconditional JPS estimators. Judgment ranks induce a stochastic ordering among judgment group means in a JPS sample. Chen et al. (2014), Frey and Ozturk (2011), Wang et al. (2008, and Stokes et al. (2007) constructed constrained estimators using isotonic regression under the restriction of stochastic ordering among judgment ranking group means. Judgment group sample sizes are random variables in a JPS sample. Hence, sample sizes for some judgment groups may be zero. Ozturk (2017) constructed conditional ranks with smaller set sizes by conditioning on the original ranks in a JPS sample. He showed that these conditional ranks in smaller comparison sets reduce the impact of empty judgment classes. The main approach in the aforementioned papers is either to average the ranking information in the sample or to smooth it through isotonic regression. Omidvar et al. (2018) estimated the prevalence of osteoporosis using a JPS sample in a finite mixture model. Zamanzade and Wang (2017) constructed several estimators for the population proportion using a JPS sample.
Recently, there has been increased research on JPS sampling in the finite population setting. Ozturk (2016a); Ozturk (2016b); Ozturk (2019a) used design-based inference to construct JPS estimators for the population mean and total. The JPS sample can be constructed under sampling with or without replacement. It has been shown that the estimator of the variance of the sample mean requires a finite population correction factor in sampling without replacement. Ozturk (2019b) constructed a JPS sample using probability proportional-to-size sampling to assign higher selection probabilities to important units in the population. In a model-based inference, Ozturk (2018a); Ozturk and Bayramoglu Kavlak (2018b) constructed predictors for the population mean and total using a super-population model.
Regardless of how the ranks are created, the ranking information from multiple consistent sources can be combined to draw statistical inference. If the ranking cost is negligible, combining sets of ranks improves the information content of a JPS sample without impacting its sampling cost. In a slightly different context, Ozturk (2013) and Ozturk and Demirel (2016) used the tie structures among K rankers to construct a ranked set sample and estimate the population mean and variance. Stokes et al. (2007) used the raking principle in contingency tables to incorporate ranking information into a JPS sample.
This paper adds to the JPS research for finite populations and presents new estimators of the population mean and total which combine the ranking information obtained from different sources in a JPS design. The design-based inference is presented in detail, and the model-based inference is also provided in the Supplementary Material. Section 2 details the construction of JPS sampling designs with and without replacement, D 1 and D 2 . In design D 1 , units in a comparison set are selected without replacement. The units are ranked, and the rank of the measured unit is identified. Once the rank is identified, all units, including the measured ones, are returned to the population before constructing the next comparison set, and the process is repeated till all the units in the JPS are assigned their judgment rank. In this design, the same unit can appear in the sample more than once, and all observations in the sample are independent. Hence, this sample is equivalent to a JPS sample constructed from an infinite population. The design D 2 is similar to design D 1 . The main difference is that all units in the comparison set are removed from the population, including both the measured and unmeasured ones, before selecting the next comparison set. The designs D 1 and D 2 induce different kinds of the correlation structure among the measured observations in a JPS sample. All observations in D 1 are independent, while observations in D 2 are negatively correlated. Section 2 also provides a brief review of the known results for the distributional properties of the sample mean of the JPS sampling design. Section 3 introduces equally and unequally weighted estimators, which combine the ranking information from K different sets of ranks. The unequal weights depend on the standard errors of the JPS estimators based on individual ranking sources. Section 4 provides the variance estimator for the weighted estimators of the population mean. Section 5 investigates the empirical properties of the proposed estimators. Section 6 applies the methodology to the faba bean seedling emergence data to estimate the average density of seedlings. Section 7 provides some concluding remark. Supplementary Material presents developments for model-based inference and agreement-weight estimators.

SAMPLING DESIGNS
In this paper, we assume the finite population setting unless specified otherwise. The finite population setting under sampling with replacement also covers the JPS sampling designs in the infinite population setting. A finite population of size N will be denoted by P. Let Y be the variable of interest. The values of Y on population units will be denoted by y 1 , . . . , y N . The population mean and variance are defined as follows Design D 1 : we first select an SRS with replacement and measure each of the n units in the sample, Y 1 , . . . , Y n . For each unit (Y i ), we then select additional H − 1 units under sampling without replacement from the population to form a comparison set We rank all units in the comparison set from smallest to largest based on the perceived value of the characteristic Y (or some other ranking information available) and identify the rank of Y i , R i,1 . All units in the comparison set, including the one we measured initially, are returned to the population before the construction of the next comparison set. A population size of at least H units is required to complete the design. This process creates the following JPS sample Design D 2 : the JPS sample is constructed in a fashion similar to D 1 . We first select an SRS of size n without replacement, and measure all the items, . . , H − 1, and identify the ranks (R i,2 ) for i = 1, . . . , n. In this design, all units in a comparison set are removed from the population before selecting a comparison set for another item. Hence, all comparison sets are disjoint, and a population size of at least Hn units is required. The JPS sample created is We note that samples D 1 and D 2 have different distributional properties. If the population size is sufficiently large, the pairs in sample D 2 become approximately independent, and samples D 1 and D 2 are approximately equivalent.
The conditional mean and variance of Y j given R j,r = h, and the conditional covariance of Y j , Y t given that R j,r = h, R t,r = h will be denoted by For design D 1 , σ 2 [h,h ],1 = 0 since the observations are independent. In the expressions above, the square brackets correspond to judgment class statistics. Under perfect ranking, the brackets can be replaced with round parentheses corresponding to order statistics.
Unbiased estimators of the population mean and total based on samples D 1 and D 2 are given byȲ where the subscript r denotes that the JPS sample is generated by design D r and The estimatorȲ JPS,r can be written in a slightly different form as a weighted average of the judgment class means where the weights w h,r make the estimator unbiased for any set size H and sample size n. In JPS sampling, ranks have a discrete uniform distribution with the support on integers 1, . . . , H . Hence, n h,r , d n,r , and w h,r are all random variables since they are functions of ranks R i,r , i = 1, . . . , n. The sample size vector of judgment class groups, n r = (n 1,r , . . . , n H,r ), has a multinomial distribution with the number of trials n and the success probability vector (1/H, . . . , 1/H ). The following theorem provides some useful results for these random variables.
Lemma 1. The following equalities hold for the distribution of I h,r d n,r , h = 1, · · · H, r = 1, 2 (Ozturk 2014; Dastbaravarde et al. 2016): We note that the expected values, variances and covariances in Lemma 1 do not depend on the values of Y on population units. They can be computed for any given sample and set sizes. We do not assume that the assigned rank R i,r is the same as the actual rank of Y i in its comparison set, but we require a consistency in the ranking procedure. The ranking procedure is called consistent if it satisfies the following equality The consistency of a ranking procedure in JPS sampling holds since P(R i,r = h) = 1/H for h = 1, . . . , H . For convenience, we will consider ranking in the ascending order throughout the paper. Ozturk (2016a) showed that the estimatorȲ JPS,r is unbiased for the population mean under a consistent ranking scheme and provided a closed form expression for its variance under designs D 1 and D 2 . The following theorem is stated for easy access in the remainder of the paper.
for design D 1 and We now consider an unbiased estimator for the variance ofȲ JPS,r . Let and I * h,r = I (n h,r > 1), d * n,r = H h=1 I * h,r . From Lemma 1, one can easily establish that E(I 1,r I 2,r /d 2 n,r ) = (1/H − E(I 1,r /d n,r ) 2 )/(H − 1). Hence, U 1,r is a statistic that depends only on the data. The following theorem (adapted to the notation of this paper) is stated from Ozturk (2016a).
. . , n, be JPS samples of set size H from designs D r , r = 1, 2. Assume that at least one of the judgment groups has at least two measured observations, d * n,r > 0. Unbiased estimators of σ 2 1 and σ 2 2 are given by, respectively, Theorem 2 provides two unbiased estimators:σ 2 2 andσ 2 2 for σ 2 2 . Ozturk (2016a) showed that for some values of n, H , and N , the coefficient C 2 (n, H, N ) could be negative. This may rarely lead to a negative value forσ 2 2 . When this happens, the following truncated variance estimator can be used Further development on this estimator can be found in Ozturk (2016a). Statistical inference can also be drawn using a model-based approach under a superpopulation model using a multi-ranker model. Detailed development for this approach is provided in the Supplementary Material.

COMBINING RANKING INFORMATION
In this section, we look at JPS sampling from a different perspective. We assume that each measured unit is assigned K conditionally independent ranks given the comparison set. The JPS sampling still has a single measurement (Y i ) for each unit, but it has K different ranks. The definition of the JPS sample in the previous section is now extended to cover the source of ranking information D r,k = {Y i , R i,k,r }, i = 1, . . . , n, r = 1, 2, k = 1, . . . , K , r = 1, 2, where R i,k,r is the rank assigned to Y i by ranking method k in design D r . The subscript r = 1, 2 denotes that the sample is generated from designs D 1 or D 2 , respectively. As we indicated in Sect. 1, ranks can be constructed using information from different sources. Our objective here is to combine the information contained in these K sets of ranks to construct a better estimator than a JPS sample estimator based on a single ranking method. We consider two different approaches for combining the ranking information. The first approach uses the standard errors of the estimators. The second approach combines the ranking information using equal weights for each of the K sets of ranks.
LetȲ k,r andσ 2 k,r be the JPS estimator and its variance estimate, respectively, based on ranking method k and design D r . For design D 1 , we combine the ranking information from different sources by constructing weighted estimators. We use either equal weight (E) or unequal weight of inverse standard error (S).
Similar estimators can be constructed for design D 2 . There are two unbiased estimators for the variance ofȲ 2 (σ 2 2 ). Our small scale simulation study showed that the variance estimatorσ 2 k,2 performs slightly better thanσ 2 k,2 . Therefore, we useσ 2 k,2 to construct the estimator that combines the ranking information from K ranking methods.
The combined estimators can be classified into two groups. The first group gives equal weight to each individual ranking estimator. The second group assigns greater weights to the ranking methods producing smaller variances. In the Supplementary Material, we demonstrate another approach to combining the ranking information at each observation level using the agreement scores.

VARIANCE ESTIMATOR OF THE COMBINED ESTIMATOR
In order to asses the uncertainty of the combined estimators, we need to estimate their variances. We first consider the jackknife variance estimate for the combined estimators. This estimate is appealing in settings where it may not be clear how to calculate a good estimator or the standard deviation of the estimator. Such situations occur when the delta method may not work properly due to the absence of a clear theoretical or functional relationship between the data and the estimator. In our study, the JPS sampling generates one set of response measurements and K different sets of ranks for each measured unit in its comparison set, and the ranking methods could vary substantially. For example, the same comparison set in a field research project may be ranked through visual inspection, an auxiliary field variable and computer assisted automated ranking methods. LetȲ S,r , for r = 1, 2, be the combined estimators after removing the i-th observation along with all its ranks from the sample. The jackknife variance estimate for each of the combined estimators is then given bŷ S,r /n. In the jackknife variance estimators, the coefficient (n − 1)/n is replaced with ((n − 1)/n) 2 .
The variance estimates allow us to construct an approximate 100(1 − α)% confidence interval for the population mean and total, Y E,r ± t n−1,1−α/2σE,r ,Ȳ S,r ± t n−1,1−α/2σS,r , Y JPS,r ± t n−1,1−α/2σJPS,r ,Ȳ SRS,r ± t n−1,1−α/2σSRS,r for r = 1, 2, where t n−1,a is the a-th upper quantile of the t-distribution with n − 1 degrees of freedom, andȲ SRS,r andσ SRS,r are the sample mean and sample variance of a simple random sample under design D r , r = 1, 2. An R-package RankedSetSampling is developed to compute the estimators and construct confidence intervals for Design D 1 and D 2 , . It is available to download at https://biometryhub.github.io/Ranked SetSampling.

EMPIRICAL COMPARISONS OF COMBINED ESTIMATORS
In this section, we performed a simulation study to empirically investigate the properties of the estimators. We considered four populations of various distribution shapes likely to occur in practice. The populations include uniform (also beta( p = 1, q = 1)), beta (4,4), beta(4,13) and the standard exponential distribution (exp(1)). These distributions correspond to populations with short tail (uniform, beta(1,1)), symmetric (beta(4,4)), slightly positively skewed (beta(4,13)) and positively skewed (exponential, exp (1)). For convenience, uniform, beta(4,4), beta (4,13), and exp(1) distributions are denoted as Group I, II, III, and IV populations, respectively. The values of the population units in simulations are generated by the quantiles of these four distributions where N = 400 is the size of the finite population and F G is the cumulative distribution function of the Group G population, G = I, II, III, IV. These four populations are also used in McIntyre (1952McIntyre ( , 2005 to illustrate the efficiency of ranked set sample designs in agricultural experiments. The JPS samples are generated using designs D 1 and D 2 with set sizes H = 3, 6 and sample sizes n = 30. The number of rankers and simulation size are selected to be K = 3, 12 and 2000, respectively. The quality of ranking information is modeled through the Dell and Clutter (1972) model. This model creates a ranking variable V in each comparison set.
be the comparison set for Y i generated from a population with variance σ 2 . The Dell and Clutter model generates another random vector, i = ( i , 1 , . . . , H −1 ), from a normal distribution with mean zero and variance σ 2 . We then add these two vectors to construct the ranking vector V i = (V i , V 1 , . . . , V H −1 ). Units in the comparison set are ranked based on the values of V , and the rank of V i is taken as the judgment rank of Y i . The correlation coefficient between Y i and V i is given by ρ = 1/ 1 + σ 2 /σ 2 . The magnitude of σ 2 controls the quality of ranking information. If σ 2 = 0, the rank of Y i is the same as the rank of V i , hence, no ranking error occurs. Large values of σ 2 correspond to random ranking. In our simulation study, we considered six different ranking models. for r = 1, 2. We note that the estimators in each design D r , r = 1, 2 are compared with the simple random sample mean estimator. Values of RE .,. greater than one indicate that the rank-based estimators have higher efficiencies than the SRS estimator of the same size.  Table 1 provides the relative efficiencies for Group I (uniform) and II (beta(4,4)) populations when K = 3 and K = 12. It is clear that the JPS estimators perform better than the SRS estimators for models M 2 , M 3 , M 4 , M 5 and M 6 . In ranking model M 1 , there is a weak correlation, ρ = 0.1, between the response and ranking variables. Hence, for this model, the JPS estimators have RE values less than one and appear to be slightly less efficient than the SRS estimator. These low efficiency values are related to the quality of ranking information. Since the judgment group sample sizes are random variables in a JPS sample, they bring an additional variation to the variance of the estimator. When there is no ranking information to separate the data into homogeneous ranking groups, the JPS sample estimators may be slightly less efficient. On the other hand, the multi-ranker combined JPS estimators with K = 12 and H = 6 are almost as efficient as the SRS estimator even for ranking model M 1 . This shows that the multi-ranker JPS estimators perform as well as the SRS estimator even if the ranking quality is poor. If there is a reasonable information gain to improve the ranking quality, the multi-ranker JPS estimators always outperform the SRS and a JPS estimator with K = 1.
The impact of ranking quality can be clearly seen by considering the efficiency achieved for models M 1 , M 2 and M 3 , where all the ranking methods have the same correlation ρ = 0.1, 0.5, and 0.8, respectively. The efficiency values are ordered for models M 1 , M 2 and M 3 in all columns in Table 1. The efficiencies of all the estimators are increasing functions of ρ, for a given design D r and the number of ranking methods K . The efficiencies for model M 1 (ρ = 0.1 ) are smaller than 1 when K = 3, but nearly equal one when K = 12. When ρ increases to 0.5 (M 2 ) and 0.8 (M 3 ) the efficiencies also increase for both designs (D 1 , D 2 ) with the number of ranking methods K = 3 and K = 12.
If the ranking correlation ρ is the same for all ranking methods (ranking models M 1 , M 2 , and M 3 ), the estimatorsȲ S,r andȲ E,r have similar efficiency results. On the other hand, if ranking methods have different ρ values (ranking models M 4 , M 5 and M 6 ), the estimator weighted with the inverse standard error outperforms the equal weight estimator. For example, the values of RE S,r and RE E,r are essentially equal for models M 1 , M 2 and M 3 , while RE S,r are larger than RE E,r for models M 4 , M 5 and M 6 .
It is also clear that the efficiencies are an increasing function of the set size when there is a reasonable ranking information to rank the units in the comparison sets. For example, the efficiencies of all the estimators for set size H = 6 are higher than the efficiencies of the estimators with H = 3 for ranking models M i , i = 1, 2, . . . , 6. For model M 1 , the efficiencies of the estimators of set size H = 6 (row 7, Table 1) are lower than the efficiencies of the estimators of set size H = 3 (row 1). The reason for this is that when H = 6 and n = 30, the judgment ranking group sample sizes are more uneven than when H = 3 and n = 30. This leads to some ranking groups being empty, and inflates the variance of the estimator. It is less likely to have empty ranking groups when H = 3.
Both Groups I and II populations are symmetric, but the Group I distribution has a short tail (or heavy tail) in comparison to that of Group II. The Group III population is slightly rightskewed, while the Group IV distribution has a strong skewness to the right. The efficiencies of the estimators for Groups III and IV are given in Table 2. Even though the shapes of these populations are different, the efficiency patterns are similar. The main difference is that the improvement in efficiencies decreases with the strength of skewness. The uniform distribution yields the maximum improvement in efficiency. The least improvement happens for the exponential distribution. This result is consistent with the efficiency results of ranked set samples in McIntyre (1952), McIntyre (2005. McIntyre reported that the efficiencies are the highest for the uniform distribution and decrease with skewness. A similar result is also reported in Takahasi and Wakimoto (1968). Under perfect ranking and infinite population settings, they proved that the upper bound of relative efficiency of a ranked set sample mean with respect to the simple random sample mean is (H + 1)/2. This upper bound is achieved for the uniform distribution.
We also computed the simulation standard errors of the multi-ranker estimators of population mean. The maximum value of the simulation standard error for simulation size 2000 was 0.002. Due to space considerations, no tables of the estimated simulation standard errors are reported here.
Let C E,r , C S,r and C J,r be the coverage probabilities of the confidence intervals of population mean based on the estimatorsȲ E,r ,Ȳ S,r andȲ J,r , respectively. Tables 3 and 4 present these coverage probabilities for the sample size of n = 30 and the population The population values are generated from y i = F −1 G (i/(N + 1)), where F G is the CDF of group G, G = I (uniform), I I (beta(4,4)), I I I (beta(4,13)), I V (exp(1)). The population size is N = 400, the sample sizes n = 30. The simulation size is 2000 distributions of Groups I, II, III and IV. It is clear that all the coverage probabilities are reasonably close to the nominal coverage probability of 0.95, for both symmetric and skewed distributions. We also performed another simulation study with smaller sample size n = 15. In that case, the coverage probabilities are reasonably close to 0.95 for Groups I, II and III. The coverage probabilities for Group IV (exponential distribution) were about 0.90, which is smaller then the nominal value 0.95. The results are presented in Supplementary Material. This additional simulation study shows that to achieve a reasonable coverage probability for skewed distributions, the multi-ranker JPS samples require larger sample sizes.

FABA BEAN CROP ESTABLISHMENT EXAMPLE
In this section, we performed another simulation study using the faba bean seedling emergence data introduced in Sect. 1. The population means and variances of the actual seedling counts (Y ) and automated seedling counts (X ) for the entire field of N = 2640 transects are μ Y = 1.051 seedling/50 cm, σ 2 Y = 0.926 (seedling/50 cm) 2 , and μ X = 0.769 Table 3. Coverage probabilities of jackknife confidence intervals of population mean Design D 2 Design D 1 Design D 2 C E,1 C S,1 C J,1 C E,2 C S,2 C J,2 C E,1 C S,1 C J,1 C E,2 C S,2 C J,2 Population values are generated from y i = F −1 G (i/ (N + 1), where F G is the CDF of group G, G = I(uniform), II(beta(4,4)), IIII(beta(4,13)), I V (exp (1)). The population size is N = 400, the sample size n = 30, the simulation size 2000 seedling/50 cm, σ 2 X = 0.649 (seedling/50 cm) 2 , respectively. The correlation coefficient between X and Y is relatively high, ρ Y,X = cor(X, Y ) = 0.766.
In this simulation, the JPS samples were generated with set sizes H = 3, 4, 6 and sample sizes n = 15, 30. For each JPS sample in the simulation, we constructed K = 3, 12 different comparison sets and determined the rank of Y using the auxiliary variable X . Hence, each Y value in a sample had either K = 3 or K = 12 judgment ranks R i,k , {Y i , R i,k , k = 1, . . . , K ; i = 1, . . . , n}, where R i,k is the rank of Y i in the comparison set k. The simulation size was taken to be 2000. We note that the actual values of seedling emergence Y and their predicted values are count data per a given area. Since the X -variable is a discrete random variable, ties among X -values in comparison sets may occur. When ties happen, we break them at random. Even though the correlation coefficient between Y and X is relatively high, ties among the X -values in the comparison sets may reduce the quality of ranking information.
In this population, values of the X variable are available for all N = 2640 population units. This provides an alternative estimator to estimate the population mean or total using a ratio estimator. For the ratio estimator, we select an SRS of size n using sampling with (D 1 ) and without (D 2 ) replacement and measure the pairs (Y i,r , X i,r ), i = 1, . . . , n for design D r , r = 1, 2. The ratio estimator of the population mean is then given bȳ The efficiency of the ratio estimator with respect to the SRS estimator is defined by , r = 1, 2. Table 5 presents the efficiencies of the estimators with respect to the estimatorȲ SRS,r . It is clear that all estimators, including the ratio estimators, provide an improvement over the SRS within each sampling design D r . Among these estimators, the least efficient estimators (RE J,r ) are the JPS estimators with a single ranking method. The JPS estimators are in general less efficient than the ratio estimators; see the efficiency values (RE J,1 , RE R,1 ) and (RE J,2 , RE R,2 ).
The efficiency improvements of multi-ranker JPS estimators, even when K = 3, are substantial. Both equal and unequal weight JPS estimators perform better than JPS estimators with K = 1 and the ratio estimators in both designs D 1 and D 2 . For example, the standard error-weighted estimators are 1.252 (1.906/1.523) and 1.206 (1.907/1.581) times more efficient than the ratio estimators when n = 30, K = 12 and H = 6 for designs D 1 and D 2 , respectively. For the JPS estimator with K = 1, the same efficiency values are 1.536 (1.906/1.241) and 1.537 (1.907/1.241). Table 5 demonstrates that the efficiency values of equally and standard error-weighted estimators are very close to each other. In this simulation, all K ranking methods use the same correlation coefficient ρ X,Y = 0.766. Hence, the standard error weights for all ranking groups are on average equal. This leads to equal weights and hence similar efficiency values to those of the equally weighted estimators. Table 5 shows very little difference between the efficiency values of the same estimators under design D 1 and design D 2 . For example, the efficiency values in columns 4 and 8, 5 and 9, and 6 and 10 are very close. This can be anticipated since the finite population correction factor 0.989 (1-30/2640) is very close to 1, and the impact of sampling without replacement is not visible.
We have also computed the empirical coverage probabilities of the confidence intervals for the population mean. All confidence intervals provided coverage probabilities reasonably close to the nominal coverage probability of 0.95. These results are not reported here. Design D 2 Design D 1 Design D 2 C E,1 C S,1 C J,1 C E,2 C S,2 C J,2 C E,1 C S,1 C J,1 C E,2 C S,2 C J,2

CONCLUDING REMARKS
Many survey and field sampling studies collect concurrently with the main variable or make it available in some other way; additional auxiliary variables correlated with the variable of interest. This situation is typical of modern field phenotyping in plant research. This additional information can induce a structure among the sample units, creating multiple sets of ranks for each measured observation. These ranks can be used to form judgment poststratified samples with multiple ranks for a given data set. In this paper, we introduce novel weighted estimators, which combine the ranking information from different sources in a JPS sample. We constructed two different weight functions. The first weight function uses the standard error of each single-ranker JPS estimator. The second weight function (presented in Supplementary Material) utilizes the agreement scores of all ranking methods. The variances of the estimators are computed using the jackknife variance estimate. Approximate (1 − α)100% confidence intervals are constructed.
The empirical studies show that the proposed estimators always perform better than the SRS and the JPS estimator of a single ranking method. The efficiency gain is substantial if the set size of the comparison set is large, H ≥ 3. Combining ranking information from multiple sources can also be applied to more complex designs, such as stratified JPS and two-stage JPS designs. The implementation of such novel estimators requires new computing tools to handle the computational complexity. An R-package has been developed to compute the estimators.