Weighting estimation under bipartite incidence graph sampling

Bipartite incidence graph sampling provides a unified representation of many sampling situations for the purpose of estimation, including existing unconventional sampling methods, such as indirect, network or adaptive cluster sampling, which are not originally described as graph problems. We develop a large class of design-based linear estimators, defined for the sample edges and subject to a general condition of design unbiasedness. The class contains as special cases the classic Horvitz-Thompson estimator, as well as the other unbiased estimators in the literature of unconventional sampling, which can be traced back to Birnbaum et al. (1965). Our generalisation allows one to devise other unbiased estimators in the future, thereby providing a potential for efficiency gains. Illustrations are given for adaptive cluster sampling, line-intercept sampling and simulated graphs.


Introduction
Birnbaum and Sirken (1965) study the situation where patients are sampled indirectly via the hospitals from which they receive treatment. Insofar as a patient may be treated at more than one hospital, the patients are not nested in the hospitals like elements in cluster sampling.
Birnbaum and Sirken consider three estimators for such indirect sampling. The first one is the classic Horvitz-Thompson (HT) estimator (Horvitz and Thompson 1952) based on all the sample patients, each of which is weighted by the inverse of the probability of being included in the sample. The second estimator is based on all the sample hospitals and a constructed value for each of them, and the third one is based only on a sub-sample of hospitals determined by a priority rule. In particular, the estimator using all the sample hospitals is often referred to as a Hansen-Hurwitz (HH) type estimator. The HH-type estimator and its variations are used for network sampling (Sirken 1970, 2005); it is recast as a "generalised weight share method" (Lavallée 2007); and a modified HH-type estimator is considered for adaptive cluster sampling (Thompson 1990, 1991).
All the sampling techniques mentioned above are considered somewhat unconventional, compared to the standard sampling methods using stratification or multistage selection. Unconventional sampling techniques are often characterised by the presence of some rules of observation, in addition to the probability design of an initial sample. For example, under network sampling (Sirken 1970), rules such as "siblings report each other" are needed to reach a "network" of siblings following an initial sample of households. Under adaptive cluster sampling (Thompson 1990), sample propagation depends on the "network" relationship among the units and the values of the surveyed units. Moreover, unconventional sampling requires that information on the "multiplicity" of sources is collected in addition to the sample. For instance, in the example of indirect sampling of patients via hospitals, one needs to identify all the relevant hospitals outside the initial sample, in order to compute the inclusion probability of a sample patient.
The same requirement exists as well for any other unconventional sampling, such as the "counting rules" of links between population elements and selection units under network sampling (Sirken 2005), or the relationship between edge units and their neighbouring networks under adaptive cluster sampling (Thompson 1990).
Zhang and Patone (2017) formally define sampling from finite graphs, in analogy to sampling from finite populations (Neyman 1934), extending the previous works by Frank (1971, 1980a, 1980b, 2011), which deal with different graph motifs separately. In particular, they show that each of the aforementioned unconventional sampling techniques can be given different graph sampling representations. Zhang and Oguz-Alper (2020) identify sufficient and necessary conditions for feasible representation of sampling from arbitrary graphs as bipartite incidence graph sampling (BIGS), including indirect, network and adaptive cluster sampling. For instance, the nodes can be the hospitals and the patients, and an edge exists between a hospital and any patient who receives treatment at the hospital. This is a bipartite graph since the nodes of the graph are bi-partitioned, where an edge can exist only between two nodes in different parts, but not between any two nodes in the same part.
Under graph sampling (Zhang and Patone 2017), one needs to specify an observation procedure, by which the edges of the sample graph are observed following an initial sample of nodes. As demonstrated by Zhang and Oguz-Alper (2020), BIGS can provide a unified representation of various situations of sampling, which are originally described in other terms, where one part of the nodes refers to the initial sampling units and the other part to the measurement units of interest, to be referred to as motifs, such that the edges represent the observational links between sampling units and motifs. More examples will be given later in this paper. Also, the observation procedure needs to be ancestral (Zhang and Patone 2017), so that one knows which other out-of-sample nodes could have led to the motifs in the sample graph, had they been selected in the initial sample of nodes. The information of multiplicity or ancestry is apparent under BIGS, which is simply the knowledge of the nodes (representing sampling units) that are adjacent to the node representing a sample motif in the BIG.
BIGS can provide a unified representation of many so-called unconventional sampling techniques in the literature, and the three estimators considered by Birnbaum and Sirken (1965) are applicable under any BIGS.
Our aim in this paper is to formulate a large class of unbiased incidence weighting estimators, which includes the three estimators of Birnbaum and Sirken (1965) as special cases but is not limited to them. This allows one to study design-based estimation under the general setting of ancestral BIGS (satisfying the requirement of ancestral observation), where the results are immediately applicable to all the relevant situations. Notice that we do not consider model-based estimation in this paper, which requires additional assumptions but would allow one to draw conclusions about the superpopulation from which the given population graph is taken.
We shall develop the class of unbiased incidence weighting estimators, based on the sample edges that link the sampling units to the observed motifs. As will be explained, all the three estimators used by Birnbaum and Sirken (1965) are special cases of this class of estimators, which is an insight hitherto unknown in the literature. Many other unbiased estimators can be devised as members of the proposed class, and one can apply the Rao-Blackwell method (Rao 1945; Blackwell 1947) to the non-HT estimators, to generate distinct unbiased estimators that can improve the estimation efficiency. Thus, the discovery of the class of incidence weighting estimators provides a potential for efficiency gains. Below, in Sect. 2, we formally introduce ancestral BIGS, and develop the incidence weighting estimators. The general condition of unbiased estimation is established. New understandings of the three aforementioned estimators are discussed. We also consider the application of the Rao-Blackwell method, which motivates a new subclass of the HH-type estimators. Illustrations are given in Sect. 3 of adaptive cluster sampling (Thompson 1990), line-intercept sampling (Becker 1991) and simulated graphs, which demonstrate the scope and flexibility of the proposed approach across a variety of situations. Some concluding remarks are given in Sect. 4.

Incidence weighting estimator under BIGS
Denote by $B = (F, \Omega, H)$ a bipartite simple directed graph, where $(F, \Omega)$ form a bipartition of the node set $F \cup \Omega$, and each edge in $H$ points from one node in $F$ to another in $\Omega$. No edge exists between any two nodes in $F$ or any two in $\Omega$. For BIGS from $B$, let $F$ be the set of initial sampling units, and $\Omega$ the population of motifs that are of interest, where a motif is a subgraph exhibiting a particular pattern, for example a pair of nodes with directed edges to each other, or three nodes forming a triangle in an undirected simple graph. An edge $(i\kappa)$ that is incident to $i \in F$ and $\kappa \in \Omega$ exists if and only if the selection of $i$ in a sample $s$ from $F$ leads to the observation of motif $\kappa$ in $\Omega$; hence the edges (and the graph $B$) are defined to be directed. The edge set $H$ is unknown to start with. Let the size of $F$ be $M = |F|$, and that of $\Omega$ be $N = |\Omega|$, where $N$ is generally unknown. The incidence relationships corresponding to the edges in $H$ thus represent the observational links between the sampling units and the motifs of interest.
Zhang and Oguz-Alper (2020, Theorem 1) establish the sufficient and necessary conditions, by which an arbitrary instance of graph sampling can be given a feasible BIGS representation. They examine and discuss the BIGS representation of indirect, network and adaptive cluster sampling. For instance, for indirect sampling of patients via hospitals, let $F$ consist of the hospitals and $\Omega$ the patients, where $(i\kappa) \in H$ iff patient $\kappa$ receives treatment at hospital $i$. For network sampling of siblings via households, one can let $F$ consist of the households and $\Omega$ the networks of siblings, i.e. each $\kappa$ represents a group of people who are siblings of each other, where $(i\kappa) \in H$ iff at least one of the siblings in $\kappa$ belongs to household $i$. Adaptive cluster sampling will be discussed in Sect. 3.
Let $\alpha_i = \{\kappa : \kappa \in \Omega, (i\kappa) \in H\}$ be the successors of $i$ in $B$. Given the initial sample $s$ from $F$, the observation procedure of BIGS is incident (Zhang and Patone 2017), such that all the nodes in $\alpha_i$ are included in the sample graph provided $i \in s$; hence the term BIGS. Let $\Omega_s = \bigcup_{i \in s} \alpha_i$, which consists of all the sample motifs. Following the general definition of sample graph (Zhang and Patone 2017), the sample BIG is given by $B_s = (s, \Omega_s, H_s)$, where $H_s = \bigcup_{i \in s} i \times \alpha_i$ is the sample of edges. To be able to calculate the inclusion probability of each $\kappa$ in $\Omega_s$, the observation procedure needs to be ancestral as well. Let $\beta_\kappa = \{i : i \in F, (i\kappa) \in H\}$ be the ancestors (or predecessors) of $\kappa$ in $B$. Let $\beta(\Omega_s) = \bigcup_{\kappa \in \Omega_s} \beta_\kappa$. The knowledge of ancestry (or multiplicity) amounts to the observation of $\beta(\Omega_s) \setminus s$, although these nodes are not part of the sample graph $B_s$, such as the out-of-sample hospitals of the sample patients.
Example 1 Consider ancestral BIGS from the population BIG below.
We have $F = \{i_1, i_2, i_3, i_4\}$, $\Omega = \{\kappa_1, \kappa_2, \kappa_3\}$ and $H = \{(i_1\kappa_1), (i_2\kappa_1), (i_2\kappa_2), (i_3\kappa_3)\}$. Suppose $s = \{i_1, i_3\} \subset F$. By the incident observation procedure, we have $\Omega_s = \{\kappa_1, \kappa_3\}$ and $H_s = \{(i_1\kappa_1), (i_3\kappa_3)\}$, and the sample graph $B_s = (s, \Omega_s, H_s)$ as defined above. In addition, we observe $\beta(\Omega_s) \setminus s = \{i_2\}$, where $i_2$ is not part of the sample BIG.
Notice that the ancestry knowledge requires one to obtain additionally the information identifying all the ancestors of all the observed sample motifs in $\Omega_s$. For instance, for each patient $\kappa$ sampled from the hospital $i$, all the hospitals (other than $i$) in which $\kappa$ receives treatment must be identified, whether or not they are among the actual sample of hospitals. This can e.g. be achieved by adding a survey question to each sample patient in $\Omega_s$, which enumerates all the relevant hospitals. Sometimes, it may be more natural to survey the units in $s$ instead. For instance, when sampling children via their parents, where the mother and father are used as separate sampling units in $F$, one can ask the in-sample parent about the out-of-sample parent(s). Finally, it may be possible or preferable to retrieve the ancestry knowledge from external sources, such as the Birth Register when sampling children via parents.
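As a minimal sketch (not part of the original paper), the incident and ancestral observation in Example 1 can be traced in Python. The labels `k1`, `k2`, `k3` stand in for the three motifs, and the helper names `successors`, `ancestors` and `sample_big` are hypothetical.

```python
# Population BIG of Example 1: edges point from sampling units (F) to motifs.
F = ["i1", "i2", "i3", "i4"]
MOTIFS = ["k1", "k2", "k3"]
H = {("i1", "k1"), ("i2", "k1"), ("i2", "k2"), ("i3", "k3")}


def successors(i, edges):
    """Motifs observed when sampling unit i is selected (incident observation)."""
    return {k for (u, k) in edges if u == i}


def ancestors(k, edges):
    """Sampling units whose selection would lead to motif k."""
    return {u for (u, m) in edges if m == k}


def sample_big(s, edges):
    """Return the sample motifs, the sample edges and the out-of-sample ancestors."""
    h_s = {(u, k) for (u, k) in edges if u in s}
    motifs_s = {k for (_, k) in h_s}
    # Ancestral observation: all ancestors of the sample motifs, minus the sample itself.
    out_of_sample = set().union(*(ancestors(k, edges) for k in motifs_s)) - set(s)
    return motifs_s, h_s, out_of_sample


motifs_s, h_s, extra = sample_big({"i1", "i3"}, H)
print(sorted(motifs_s), sorted(h_s), sorted(extra))
```

The unit `i2` is returned as ancestry knowledge only; it contributes no edge to the sample graph.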
Notice also that, in computer science (e.g. Leskovec and Faloutsos 2006; Hu and Lau 2013), one may be concerned with situations where the graph is in principle known but is too large or dynamic to be fully processed or stored for practical purposes. Taking a sub-graph according to some chosen probability scheme is then a possible approach. For example, the whole Twitter graph consisting of users and their following/follower relationships can be constructed by the company at any given time point. However, storing every instance of the graph might be infeasible due to the enormous amount of memory required and the fact that the graph is changing all the time. Taking a sample may suffice for the purpose of estimating e.g. the follower to following ratio. As another example, let $F$ be the products available in an online market place and $\Omega$ the buying customers, and let $(i\kappa) \in H$ iff the customer $\kappa$ has bought the product $i$. Again, the whole graph is available to the owner of the market, but sampling may be preferred for various market analytics. Of course, in these situations the ancestry knowledge of the sample $\Omega_s$ is guaranteed.
Sometimes, either the design or circumstances may prevent one from obtaining the complete ancestry knowledge, such that not all the ancestors $\beta_\kappa$ of an observed motif $\kappa$ are known. Without losing generality, suppose one only manages to obtain information about a subset of $\beta_\kappa$, denoted by $\beta_\kappa^*$, where $\beta_\kappa^*$ is non-empty now that $\kappa$ is already observed. It is then both necessary and possible to modify the sampling strategy (including the estimators described below), an example of which will be discussed in Sect. 3.1 later. Moreover, we refer to Zhang and Oguz-Alper (2020) for a treatment of incomplete ancestry knowledge, which can arise in a number of situations of graph sampling.

The incidence weighting estimator
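Let $y_\kappa$ be an unknown constant associated with motif $\kappa$, for $\kappa \in \Omega$, given the population graph $B$. The aim is to estimate the total $\theta = \sum_{\kappa \in \Omega} y_\kappa$, including e.g. the number of motifs $N$ when $y_\kappa \equiv 1$.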
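Given the sample graph $B_s$, let $\{W_{i\kappa} : (i\kappa) \in H_s\}$ be the incidence weights of the sample edges, and $W_{i\kappa} \equiv 0$ if $(i\kappa) \notin H_s$. The incidence weighting estimator (IWE) is given by
$$\hat{\theta} = \sum_{(i\kappa) \in H_s} \frac{W_{i\kappa}\, y_\kappa}{\pi_i}, \qquad (1)$$
where $\pi_i = \Pr(i \in s)$ is the sample inclusion probability of $i \in F$. Notice that the definition (1) allows for sample-dependent weights $W_{i\kappa}$.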
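Theorem 1 The IWE by (1) is unbiased for $\theta$, provided that, for each $\kappa \in \Omega$,
$$\sum_{i \in \beta_\kappa} E(W_{i\kappa} \mid \delta_i = 1) = 1, \qquad (2)$$
where $\delta_i = 1$ if $i \in s$ and 0 otherwise.
Proof The expectation of $\hat{\theta}$ with respect to the sampling distribution of $s$ is given by
$$E(\hat{\theta}) = E\Big( \sum_{\kappa \in \Omega} y_\kappa \sum_{i \in \beta_\kappa} \frac{\delta_i W_{i\kappa}}{\pi_i} \Big) = \sum_{\kappa \in \Omega} y_\kappa \sum_{i \in \beta_\kappa} E(W_{i\kappa} \mid \delta_i = 1),$$
since $E(\delta_i W_{i\kappa}) = \pi_i E(W_{i\kappa} \mid \delta_i = 1)$, which equals $\theta$ given (2).
The condition (2) ensures that the IWE is unbiased under repeated sampling. When the weights are constant of sampling, denoted by $\omega_{i\kappa}$ for distinction, it reduces to $\sum_{i \in \beta_\kappa} \omega_{i\kappa} = 1$ for any $\kappa \in \Omega$. Let $\pi_{ij}$ be the second-order sample inclusion probability of $i, j \in F$.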

Proposition 2
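The BIG sampling variance of an unbiased IWE is given by
$$V(\hat{\theta}) = \sum_{\kappa \in \Omega} \sum_{\ell \in \Omega} \Delta_{\kappa\ell}\, y_\kappa y_\ell - \theta^2, \qquad (3)$$
where
$$\Delta_{\kappa\ell} = \sum_{i \in \beta_\kappa} \sum_{j \in \beta_\ell} \frac{\pi_{ij}}{\pi_i \pi_j} E(W_{i\kappa} W_{j\ell} \mid \delta_i \delta_j = 1),$$
and $\delta_i = 1$ if $i \in s$ and 0 otherwise.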
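The unbiasedness condition (2) can be checked numerically on the BIG of Example 1. The following is a sketch under stated assumptions rather than the paper's own computation: the y-values are hypothetical, the initial sample is taken by SRS without replacement of size 2, and the weights are constant of sampling, so that (2) only requires them to sum to 1 over the ancestors of each motif.

```python
from fractions import Fraction
from itertools import combinations

F = ["i1", "i2", "i3", "i4"]
H = [("i1", "k1"), ("i2", "k1"), ("i2", "k2"), ("i3", "k3")]
y = {"k1": 3, "k2": 5, "k3": 7}            # hypothetical y-values
theta = sum(y.values())

m = 2                                       # SRS without replacement (assumed design)
pi = {i: Fraction(m, len(F)) for i in F}    # first-order inclusion probabilities

# Constant incidence weights that sum to 1 over the ancestors of each motif.
omega = {("i1", "k1"): Fraction(1, 4), ("i2", "k1"): Fraction(3, 4),
         ("i2", "k2"): Fraction(1), ("i3", "k3"): Fraction(1)}


def iwe(s, weights):
    """Estimator (1): sum of W * y / pi over the edges incident to the sample s."""
    return sum(weights[(i, k)] * y[k] / pi[i] for (i, k) in H if i in s)


def expectation(weights):
    """Exact expectation over all equally likely SRS samples."""
    samples = list(combinations(F, m))
    return sum(iwe(s, weights) for s in samples) / len(samples)


# Weights violating (2) for motif k1 give a biased estimator.
bad = dict(omega)
bad[("i1", "k1")] = Fraction(1, 2)

print(expectation(omega) == theta, expectation(bad) == theta)
```

Exact rational arithmetic is used so that unbiasedness shows up as an equality rather than as an approximation.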

HT-type estimator
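Let $\pi_{(\kappa)} = \Pr(\kappa \in \Omega_s)$ and $\pi_{(\kappa\ell)} = \Pr(\kappa \in \Omega_s, \ell \in \Omega_s)$ for $\kappa, \ell \in \Omega$, where parentheses are used in the subscript to distinguish these inclusion probabilities of the motifs from those of the sampling units. The HT-estimator is given by
$$\hat{\theta}_y = \sum_{\kappa \in \Omega_s} \frac{y_\kappa}{\pi_{(\kappa)}}, \qquad \pi_{(\kappa)} = 1 - \bar{\pi}_{\beta_\kappa},$$
where $\bar{\pi}_{\beta_\kappa}$ is the exclusion probability of $\beta_\kappa$ in $s$, which is the probability that none of the ancestors of $\kappa$ in $B$ is included in the initial sample $s$, and the knowledge of the out-of-sample ancestors $\beta_\kappa \setminus s$ is required to compute $\bar{\pi}_{\beta_\kappa}$. The HT-estimator is a special case of the IWE, where the weights $W_{i\kappa}$ for each $\kappa$ and $s$ satisfy
$$\sum_{i \in s_\kappa} \frac{W_{i\kappa}}{\pi_i} = \frac{1}{\pi_{(\kappa)}}, \qquad s_\kappa = s \cap \beta_\kappa. \qquad (4)$$
Notice that these weights $W_{i\kappa}$ are not constant of sampling if $|\beta_\kappa| > 1$, since they depend on how $s$ intersects $\beta_\kappa$. For Example 1 earlier, we have $W_{i_2\kappa_2} = W_{i_3\kappa_3} = 1$ by (4), since both $\kappa_2$ and $\kappa_3$ have only one ancestor in the BIG. Moreover, $W_{i_1\kappa_1} = \pi_{i_1}/\pi_{(\kappa_1)}$ if $s_{\kappa_1} = \{i_1\}$, $W_{i_2\kappa_1} = \pi_{i_2}/\pi_{(\kappa_1)}$ if $s_{\kappa_1} = \{i_2\}$, and $W_{i_1\kappa_1} = a\,\pi_{i_1}/\pi_{(\kappa_1)}$ and $W_{i_2\kappa_1} = (1-a)\,\pi_{i_2}/\pi_{(\kappa_1)}$ if $s_{\kappa_1} = \{i_1, i_2\}$. The value $a$ does not matter, since the coefficient of $y_{\kappa_1}$ in the IWE (1) is $W_{i_1\kappa_1}/\pi_{i_1} + W_{i_2\kappa_1}/\pi_{i_2} = 1/\pi_{(\kappa_1)}$ in any case.
To see that the weights given by (4) satisfy the condition (2) generally, let $\phi_{s_\kappa}$ be the probability that the sample intersection is $s_\kappa$, such that $\sum_{s_\kappa \neq \emptyset} \phi_{s_\kappa} = \pi_{(\kappa)}$. We have
$$\sum_{i \in \beta_\kappa} E(W_{i\kappa} \mid \delta_i = 1) = E\Big( \sum_{i \in s_\kappa} \frac{W_{i\kappa}}{\pi_i} \Big) = \sum_{s_\kappa \neq \emptyset} \frac{\phi_{s_\kappa}}{\pi_{(\kappa)}} = 1.$$
Arguing similarly in terms of the joint probability that the sample intersections for $\kappa$ and $\ell$ are $s_\kappa$ and $s_\ell$, it can be shown that $\Delta_{\kappa\ell}$ in (3) reduces to $\pi_{(\kappa\ell)}/\pi_{(\kappa)}\pi_{(\ell)}$ given (4) and (2).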
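More generally, let $\eta_{s_\kappa} = \pi_{(\kappa)} \sum_{i \in s_\kappa} W_{i\kappa}/\pi_i$ for any weights $W_{i\kappa}$ that are not constants of sampling. To satisfy the condition (2), for any $\kappa \in \Omega$, the weights must be such that
$$\sum_{s_\kappa} \phi_{s_\kappa}\, \eta_{s_\kappa} = \pi_{(\kappa)}, \qquad (5)$$
where the sum is over all the possible non-empty sample intersections $s_\kappa = s \cap \beta_\kappa$, each with probability $\phi_{s_\kappa}$. The HT-estimator is the special case where $\eta_{s_\kappa} \equiv 1$. It is possible to assign $\eta_{s_\kappa}$ that differs from 1 for different sample intersections $s_\kappa$, subject to the restriction (5). Any estimator satisfying (5) but not (4) may be referred to as a HT-type estimator.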
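To illustrate the HT-estimator as an IWE, the sketch below computes the motif inclusion probabilities from exclusion probabilities and uses one particular choice of weights satisfying (4), namely an equal split over the in-sample ancestors. The SRS design of size 2, the y-values and the function names are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

F = ["i1", "i2", "i3", "i4"]
H = [("i1", "k1"), ("i2", "k1"), ("i2", "k2"), ("i3", "k3")]
y = {"k1": 3, "k2": 5, "k3": 7}   # hypothetical y-values
M, m = len(F), 2                  # SRS without replacement of size m (assumed)
pi_unit = Fraction(m, M)
beta = {k: {i for (i, kk) in H if kk == k} for k in y}   # ancestors of each motif


def pi_motif(k):
    """Inclusion probability of motif k: 1 minus the exclusion probability of beta[k]."""
    return 1 - Fraction(comb(M - len(beta[k]), m), comb(M, m))


def ht_weights(s, k):
    """One choice of weights satisfying (4): share equally over the in-sample ancestors."""
    s_k = set(s) & beta[k]
    return {i: pi_unit / (len(s_k) * pi_motif(k)) for i in s_k}


def ht_as_iwe(s):
    """HT-estimator written as an IWE over the sample edges."""
    total = Fraction(0)
    for (i, k) in H:
        if i in s:
            total += ht_weights(s, k)[i] * y[k] / pi_unit
    return total


samples = list(combinations(F, m))
expectation = sum(ht_as_iwe(s) for s in samples) / len(samples)
print(expectation, sum(y.values()))
```

Any other split of the weights over the in-sample ancestors would give the same estimate, as long as (4) holds.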

HH-type estimator
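While a HT-type estimator uses sample dependent weights $W_{i\kappa}$, a HH-type estimator uses weights $\omega_{i\kappa}$ that are constant of sampling. The condition (2) is reduced to $\sum_{i \in \beta_\kappa} \omega_{i\kappa} = 1$, for any $\kappa \in \Omega$. Thus, for Example 1 earlier, we have now $\omega_{i_1\kappa_1} + \omega_{i_2\kappa_1} = 1$ and $\omega_{i_2\kappa_2} = \omega_{i_3\kappa_3} = 1$. Following Birnbaum and Sirken (1965), let $z_i = \sum_{\kappa \in \alpha_i} \omega_{i\kappa} y_\kappa$, such that $\sum_{i \in F} z_i = \sum_{\kappa \in \Omega} y_\kappa \sum_{i \in \beta_\kappa} \omega_{i\kappa} = \theta$. It follows that the HH-type estimator given by
$$\hat{\theta}_z = \sum_{i \in s} \frac{z_i}{\pi_i} \qquad (6)$$
is unbiased for $\theta$ under repeated sampling, where $z_i$ is a constructed constant for each initial sample unit $i$. The BIG sampling variance of $\hat{\theta}_z$ is given by
$$V(\hat{\theta}_z) = \sum_{i \in F} \sum_{j \in F} \Big( \frac{\pi_{ij}}{\pi_i \pi_j} - 1 \Big) z_i z_j.$$
Notice that one only needs $z_i$ for the initial sample units in order to apply $\hat{\theta}_z$, which is possible provided ancestral BIGS. Moreover, the HH-type estimator (6) actually defines a family of estimators, depending on the choice of $\omega_{i\kappa}$, although Birnbaum and Sirken (1965) use only the equal weights $\omega_{i\kappa} = 1/|\beta_\kappa|$. The corresponding $\hat{\theta}_z$ is referred to as the multiplicity estimator, denoted by $\hat{\theta}_{z\beta}$. Variations of the multiplicity estimator under other settings of indirect and network sampling are considered by Sirken (1970), Sirken and Levy (1974), Sirken (2004) and Lavallée (2007). Unlike the HT-estimator, it is in principle possible to apply the Rao-Blackwell method to improve the HH-type estimator, to which we return in Sect. 2.5. Some other HH-type estimators will be discussed then.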
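The multiplicity estimator can be sketched on the BIG of Example 1 as follows. This is an illustration under assumptions (hypothetical y-values, SRS without replacement of size 2), in which the variance obtained by enumerating all samples is compared with the usual double-sum variance of an HT-type sum of the constructed values z_i.

```python
from fractions import Fraction
from itertools import combinations

F = ["i1", "i2", "i3", "i4"]
H = [("i1", "k1"), ("i2", "k1"), ("i2", "k2"), ("i3", "k3")]
y = {"k1": 3, "k2": 5, "k3": 7}   # hypothetical y-values
M, m = len(F), 2                  # SRS without replacement of size m (assumed)

# Number of ancestors of each motif, and constructed values z_i with equal weights.
n_anc = {k: sum(1 for (_, kk) in H if kk == k) for k in y}
z = {i: sum(Fraction(y[k], n_anc[k]) for (u, k) in H if u == i) for i in F}

pi1 = Fraction(m, M)                          # first-order inclusion probability
pi2 = Fraction(m * (m - 1), M * (M - 1))      # second-order, for distinct units


def hh(s):
    """HH-type estimator (6) with equal weights, i.e. the multiplicity estimator."""
    return sum(z[i] for i in s) / pi1


samples = list(combinations(F, m))
mean = sum(hh(s) for s in samples) / len(samples)
var_enum = sum((hh(s) - mean) ** 2 for s in samples) / len(samples)
var_formula = sum(((pi1 if i == j else pi2) / (pi1 * pi1) - 1) * z[i] * z[j]
                  for i in F for j in F)
print(mean, var_enum, var_formula)
```

Only the z-values of the sampled units enter `hh`, which mirrors the remark that z_i is needed for the initial sample units only.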

Priority-rule estimator
Birnbaum and Sirken (1965) invent a third estimator based on a prioritised subset of $H_s$, where they let $I_{i\kappa} = 1$ if $i = \min(s \cap \beta_\kappa)$ and 0 otherwise, i.e. $I_{i\kappa} = 1$ if unit $i$ happens to be enumerated first in the frame $F$ among all the in-sample ancestors of $\kappa$, for each $\kappa \in \Omega_s$. For Example 1 earlier, we have $I_{i_2\kappa_2} = 1$ whenever $\kappa_2 \in \Omega_s$ and $I_{i_3\kappa_3} = 1$ whenever $\kappa_3 \in \Omega_s$, since both $\kappa_2$ and $\kappa_3$ have only one ancestor in the BIG. The priority rule only matters for $\kappa_1$ here. If $\{i_1, i_2, i_3, i_4\}$ is the frame arranged in the order of enumeration, then we would have $I_{i_1\kappa_1} = 1$ if $i_1 \in s$ whether or not $i_2 \in s$, and $I_{i_2\kappa_1} = 1$ only if $i_2 \in s$ and $i_1 \notin s$. Whereas if $\{i_4, i_3, i_2, i_1\}$ is the frame arranged in the order of enumeration, then we would have $I_{i_2\kappa_1} = 1$ if $i_2 \in s$ whether or not $i_1 \in s$, and $I_{i_1\kappa_1} = 1$ only if $i_1 \in s$ and $i_2 \notin s$. The priority-rule estimator based on $\{(i\kappa) : I_{i\kappa} = 1, (i\kappa) \in H_s\}$ is given by
$$\hat{\theta}_p = \sum_{(i\kappa) \in H_s} \frac{I_{i\kappa}\, \omega_{i\kappa}}{p_{i\kappa}} \frac{y_\kappa}{\pi_i}, \qquad (7)$$
where $p_{i\kappa} = \Pr(I_{i\kappa} = 1 \mid \delta_i = 1)$ is the conditional probability that $(i\kappa)$ is prioritised given $(i\kappa) \in H_s$, and $\omega_{i\kappa} = 1/|\beta_\kappa|$ are the equal weights for any $\kappa \in \Omega$. Clearly, other priority rules or choices of $\omega_{i\kappa}$ are possible.
One can easily recognise $\hat{\theta}_p$ as a special case of IWE with $W_{i\kappa} = I_{i\kappa}\omega_{i\kappa}/p_{i\kappa}$. It can satisfy the unbiasedness condition (2), provided $p_{i\kappa} > 0$ for all $(i\kappa) \in H_s$, in which case $E(W_{i\kappa} \mid \delta_i = 1) = \omega_{i\kappa}$. Birnbaum and Sirken (1965) did not provide an expression of $V(\hat{\theta}_p)$, but indicated that it is unwieldy. Now that $\hat{\theta}_p$ is a special case of IWE, its variance follows readily from Proposition 2. Let $p_{i\kappa,j\ell} = \Pr(I_{i\kappa} I_{j\ell} = 1 \mid \delta_i \delta_j = 1)$. We have
$$V(\hat{\theta}_p) = \sum_{\kappa \in \Omega} \sum_{\ell \in \Omega} y_\kappa y_\ell \sum_{i \in \beta_\kappa} \sum_{j \in \beta_\ell} \Big( \frac{\pi_{ij}\, p_{i\kappa,j\ell}}{\pi_i \pi_j\, p_{i\kappa} p_{j\ell}} - 1 \Big) \omega_{i\kappa} \omega_{j\ell},$$
because $\sum_{i \in \beta_\kappa} \omega_{i\kappa} = 1$ for any $\kappa \in \Omega$. An unbiased variance estimator can be given accordingly. The priority probabilities $p_{i\kappa}$ and $p_{i\kappa,j\ell}$ depend on the priority rule, as well as the sampling design. The details for the estimator of Birnbaum and Sirken (1965) under initial simple random sampling (SRS) without replacement of $s$ are given in Appendix A.
It should be noticed that the priority rule is not part of sampling; the sample graph $B_s$ includes all the edges incident to every sample unit in $s$. Had one applied subsampling by randomly selecting one of the edges incident to each $i$ in $s$ with some designed probabilities, the sample graph would have contained one and only one edge from each sample unit. Instead, the priority rule selects only one sample edge incident to each motif in $\Omega_s$ for the purpose of estimation.
There is a possibility that a unit $i$ can be sampled but never prioritised, in which case $\hat{\theta}_p$ would be biased. For an extreme example, suppose a motif $\kappa$ is incident to all the sampling units in $F$, then the last unit in $F$ can never be prioritised (for $\kappa$) according to the priority rule of Birnbaum and Sirken (1965), as long as $|s| > 1$. Generally, $\hat{\theta}_p$ is biased under this priority rule, provided there exists at least one motif $\kappa$ in $\Omega$, where $|\beta_\kappa| > 1$ and $\Pr(|s_\kappa| > 1 \mid \kappa \in \Omega_s) = 1$, such that the ancestor $i = \max(\beta_\kappa)$ has no chance of being prioritised when it is in $s$. The probability above depends on the ordering of sampling units in $F$, as well as the initial sample size. Given any ordering of the units in $F$, as the initial sample size increases, it is possible for $\hat{\theta}_p$ to behave more erratically and become biased eventually.
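The behaviour of the priority rule can be checked by exact enumeration. In the sketch below, the priority probabilities are obtained by brute force over all SRS samples rather than by the formulae of Appendix A, and the second graph (three units `a`, `b`, `c` all adjacent to one motif) is a hypothetical instance of the extreme case described above. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def priority_estimator_mean(F, H, y, m):
    """Exact expectation of the priority-rule estimator under SRS of size m.

    F is listed in the order of enumeration; equal weights 1/|beta_k| are used.
    """
    samples = list(combinations(F, m))
    pi = Fraction(m, len(F))
    beta = {k: [i for i in F if (i, k) in H] for k in y}

    def prioritised(i, k, s):
        # Unit i is prioritised if it is the first in-sample ancestor of k in the frame.
        return i == next(u for u in beta[k] if u in s)

    # Probability that edge (i, k) is prioritised given that i is sampled.
    p = {}
    for (i, k) in H:
        with_i = [s for s in samples if i in s]
        p[(i, k)] = Fraction(sum(prioritised(i, k, s) for s in with_i), len(with_i))

    def estimate(s):
        total = Fraction(0)
        for (i, k) in H:
            if i in s and prioritised(i, k, s):
                total += Fraction(y[k], len(beta[k])) / (p[(i, k)] * pi)
        return total

    return sum(estimate(s) for s in samples) / len(samples), p


# Example 1 with hypothetical y-values: every edge can be prioritised when m = 2.
H1 = {("i1", "k1"), ("i2", "k1"), ("i2", "k2"), ("i3", "k3")}
mean1, p1 = priority_estimator_mean(["i1", "i2", "i3", "i4"], H1,
                                    {"k1": 3, "k2": 5, "k3": 7}, 2)

# One motif adjacent to all three units: the last unit is never prioritised when m = 2.
H2 = {("a", "k"), ("b", "k"), ("c", "k")}
mean2, p2 = priority_estimator_mean(["a", "b", "c"], H2, {"k": 6}, 2)
print(mean1, mean2, p2[("c", "k")])
```

In the second case the expectation falls short of the total by exactly the share of the weight attached to the unit that can never be prioritised.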

Using Rao-Blackwell method
The minimal sufficient statistic under BIGS is $\{(\kappa, y_\kappa) : \kappa \in \Omega_s\}$, or simply $\Omega_s$ as long as one keeps in mind that the $y$-values are constants associated with the motifs. Let $\hat{\theta}$ be an unbiased IWE. Applying the Rao-Blackwell method to $\hat{\theta}$ yields $\hat{\theta}_{RB} = E(\hat{\theta} \mid \Omega_s)$ as an improved estimator, if the conditional variance $V(\hat{\theta} \mid \Omega_s)$ is positive. Since the HT-estimator $\hat{\theta}_y$ is fixed conditional on $\Omega_s$, we have $\hat{\theta}_{yRB} \equiv \hat{\theta}_y$. For a non-HT estimator, it is in principle possible that the RB method can improve its efficiency, as illustrated below.
Example 2 Consider the BIG in Example 1. Given $|s| = 1$, there are 4 distinct initial samples, leading to 4 distinct $\Omega_s$, such that $V(\hat{\theta} \mid \Omega_s) = 0$ and $\hat{\theta}_{RB} = \hat{\theta}$ for any unbiased IWE. Given $|s| = 2$, there are 6 different initial samples, leading to 5 distinct $\Omega_s$, where both $s = \{i_1, i_2\}$ and $s' = \{i_2, i_4\}$ lead to the same motifs $\{\kappa_1, \kappa_2\}$, so that $\hat{\theta}_{RB} \neq \hat{\theta}$ given the motif sample $\{\kappa_1, \kappa_2\}$, if $\hat{\theta}(s) \neq \hat{\theta}(s')$. Taking e.g. the HH-type estimator $\hat{\theta}_z$ by (6), we have $\hat{\theta}_{zRB} = \{p(s)\hat{\theta}_z(s) + p(s')\hat{\theta}_z(s')\}/\{p(s) + p(s')\}$ given $\Omega_s = \{\kappa_1, \kappa_2\}$.
The calculation required for the RB method may be intractable, if the conditional sample space of $s$ given $\Omega_s$ is large and the initial sampling design $p(s)$ is not fully specified, which is common in practice for designs with unequal inclusion probabilities over $F$. Moreover, the result of the RB method is generally not a unique minimum variance unbiased estimator under BIGS, because the minimal sufficient statistic is not complete. It is thus worth exploring other useful choices of the IWE. Due to the inherent shortcoming of the priority-rule estimator pointed out earlier, we concentrate on the HH-type estimator $\hat{\theta}_z$ below.
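Example 2 can be reproduced by brute force. The sketch below assumes SRS without replacement of size 2, so that all samples are equally likely and the conditional expectation given the observed motifs is a plain average; the y-values are hypothetical and the multiplicity weights are used for the HH-type estimator.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

F = ["i1", "i2", "i3", "i4"]
H = [("i1", "k1"), ("i2", "k1"), ("i2", "k2"), ("i3", "k3")]
y = {"k1": 3, "k2": 5, "k3": 7}   # hypothetical y-values
M, m = len(F), 2                  # SRS without replacement of size m (assumed)
pi1 = Fraction(m, M)

n_anc = {k: sum(1 for (_, kk) in H if kk == k) for k in y}
z = {i: sum(Fraction(y[k], n_anc[k]) for (u, k) in H if u == i) for i in F}


def hh(s):
    """Multiplicity estimator for the initial sample s."""
    return sum(z[i] for i in s) / pi1


def motifs(s):
    """Observed motif set for the initial sample s."""
    return frozenset(k for (i, k) in H if i in s)


samples = list(combinations(F, m))
groups = defaultdict(list)
for s in samples:
    groups[motifs(s)].append(s)

# Rao-Blackwell step: average the estimator over all samples sharing the same motifs.
rb = {s: sum(hh(t) for t in groups[motifs(s)]) / len(groups[motifs(s)]) for s in samples}


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


var_hh = variance([hh(s) for s in samples])
var_rb = variance([rb[s] for s in samples])
print(len(groups), var_hh, var_rb)
```

Only the two samples that share the motif set {k1, k2} are affected, which is where the variance reduction comes from.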
Consider the special case where $|\alpha_i| \equiv 1$ in the population BIG, such as when sampling households via persons. Suppose first with-replacement sampling of $s$, where the different draws generate an IID sample, and compare $\hat{\theta}_y$ and $\hat{\theta}_z$ based on a single draw. Let $p_i$ and $p_{(\kappa)} = \sum_{i \in \beta_\kappa} p_i$ be the respective selection probabilities. We have $p_{ij} = p_i$ if $i = j$ and 0 if $i \neq j$, and $p_{(\kappa\ell)} = p_{(\kappa)}$ if $\kappa = \ell$ and 0 otherwise, now that $|\alpha_i| \equiv 1$. We have $\hat{\theta}_z = \hat{\theta}_y$ if $\omega_{i\kappa} = p_i/p_{(\kappa)}$, given which we have $\hat{\theta}_z = \hat{\theta}_{zRB}$. The variance of any other $\hat{\theta}_z$ would be larger, as long as $\omega_{i\kappa}/p_i$ is not a constant over $\beta_\kappa$, because
$$V(\hat{\theta}_z) - V(\hat{\theta}_y) = \sum_{\kappa \in \Omega} y_\kappa^2 \Big( \sum_{i \in \beta_\kappa} \frac{\omega_{i\kappa}^2}{p_i} - \frac{1}{p_{(\kappa)}} \Big) \geq 0$$
by the Cauchy-Schwarz inequality, given $\sum_{i \in \beta_\kappa} \omega_{i\kappa} = 1$. A similar argument holds approximately for the choice $\omega_{i\kappa} \propto \pi_i$ under sampling without replacement of $s$, provided $\pi_{ij} \approx \pi_i \pi_j$ and $\pi_{(\kappa\ell)} \approx \pi_{(\kappa)} \pi_{(\ell)}$, as in the case of sampling households via persons with a small sampling fraction $|s|/|F|$. This can make $z_i/\pi_i$ more similar to each other over $F$, which is advantageous with respect to the anticipated mean squared error of $\hat{\theta}_z$ under the sampling design and a population model of $z_i$, according to Theorem 6.2 of Godambe and Joshi (1965). To make $z_i/\pi_i$ more similar to each other over $F$ without the restriction $|\alpha_i| = 1$, one may consider setting $\omega_{i\kappa} < \omega_{j\kappa}$ if $|\alpha_i| > |\alpha_j|$, despite $\pi_i = \pi_j$, because there are more motifs contributing to $z_i$ than $z_j$. Thus, under general unequal-probability sampling of $s$, it may be reasonable to consider the probability and inverse-degree adjusted (PIDA) weights
$$\omega_{i\kappa} \propto \pi_i\, |\alpha_i|^{-\gamma} \qquad (8)$$
subject to the condition (2), where $\gamma \geq 0$ is a tuning constant of choice. Denote by $\hat{\theta}_{z\alpha\gamma}$ the corresponding PIDA-IWE. The multiplicity estimator $\hat{\theta}_{z\beta}$ becomes a special case of $\hat{\theta}_{z\alpha\gamma}$ given $\gamma = 0$ and constant $\pi_i$ over $F$. Notice that to apply the weights (8) with $\gamma \neq 0$, one needs to know $|\alpha_i|$ for all $i \in \beta_\kappa$ and $\kappa \in \Omega_s$, in addition to the ancestral observation of $\beta_\kappa$. For instance, under indirect sampling of children via parents, one would need to collect the number of children for the out-of-sample parents in $\beta(\Omega_s) \setminus s$ as well. For network sampling of siblings via households, one would need to collect the number of other sibling networks in each household $i$ with at least one member from a sample sibling network $\kappa$.
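A possible reading of the PIDA weights is sketched below, assuming the form "inclusion probability divided by the degree raised to the power of the tuning constant", normalised over the ancestors of a motif so that the weights sum to 1. The function name `pida_weights` and the input values are hypothetical.

```python
def pida_weights(ancestors, pi, degree, gamma):
    """Probability and inverse-degree adjusted weights for one motif.

    ancestors: sampling units adjacent to the motif
    pi:        inclusion probability of each sampling unit
    degree:    number of motifs adjacent to each sampling unit
    gamma:     tuning constant; gamma = 0 ignores the degrees
    """
    raw = {i: pi[i] / degree[i] ** gamma for i in ancestors}
    total = sum(raw.values())
    # Normalise so that the weights sum to 1 over the ancestors of the motif.
    return {i: value / total for i, value in raw.items()}


# Motif k1 of Example 1 has ancestors i1 (degree 1) and i2 (degree 2).
pi = {"i1": 0.5, "i2": 0.5}
degree = {"i1": 1, "i2": 2}
w0 = pida_weights(["i1", "i2"], pi, degree, gamma=0.0)
w1 = pida_weights(["i1", "i2"], pi, degree, gamma=1.0)
print(w0, w1)
```

With equal inclusion probabilities, a zero tuning constant gives back the equal weights of the multiplicity estimator, while a positive one shifts weight towards the unit with fewer adjacent motifs.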

Adaptive cluster sampling
Consider the example of adaptive cluster sampling (ACS) discussed by Thompson (1990). The population $F$ consists of 5 grids, with $y$-values $\{1, 0, 2, 10, 1000\}$. Each grid has either one or two neighbours which are adjacent in the given sequence, as in the graph $G$ below, where, as in Thompson (1990), we simply denote each grid by its $y$-value. Given an initial sample of size 2 by SRS from $F$, one would survey all the neighbour grids (in both directions if possible) of a sample grid $i$ if $y_i$ exceeds the threshold value 5 but not otherwise. The observation procedure is repeated for all the neighbour grids, which may or may not generate further grids to be surveyed. The process is terminated when the last observed grids are all below the threshold. The interest is to estimate the total amount of species (or mean per grid) over the given area.
In particular, the grid 2 is a so-called edge unit, which can be observed from 10 or 1000, but would not lead to 10 or 1000 if only 2 is selected in s. The inclusion probability of grid 2 under ACS cannot be calculated correctly when it is selected in s but not 10 or 1000, in which case the knowledge of multiplicity (or ancestry) is lacking. Thompson (1990) proposes a modified HT-estimator which uses the grid 2 in estimation, only if it is selected on its own, the probability of which is known from the design of the initial sample.
Zhang and Oguz-Alper (2020) develop feasible BIGS representations of ACS from $G$ above. Here we use one of them to illustrate how the IWE can be applied to ACS. The population BIG is given by $B = (F, F, H)$, with $\Omega = F$ and $H$ as below.
[Figure: population BIG $B$, with the five grids 1, 0, 2, 10, 1000 as the nodes of both parts.]
Zhang and Oguz-Alper (2020) point out that it is possible to consider BIGS from $B$, where the observational links between (10, 2) and (1000, 2) under ACS are removed to ensure ancestral observation, and apply the classic HT-estimator under this BIGS representation of ACS from $G$. They show that the two strategies (ACS, modified HT) and (BIGS, HT) actually lead to the same estimator. The difference is that one cannot apply the RB method to the HT-estimator under BIGS, as one can with the modified HT-estimator under ACS. We refer to Zhang and Oguz-Alper (2020) for more details. Thompson (1990) also proposes a modified HH-type estimator, where an edge unit is used in estimation only if it is selected in $s$ directly. This modified HH-type estimator is simply the multiplicity estimator $\hat{\theta}_{z\beta}$ under BIGS from $B$, with equal weights $\omega_{i\kappa} = 1/|\beta_\kappa|$ in (6). The two strategies (ACS, modified HH-type) and (BIGS, $\hat{\theta}_{z\beta}$) lead to the same estimator. Moreover, application of the RB method to $\hat{\theta}_{z\beta}$ is the same as that for the modified HH-type estimator; we refer to Thompson (1990) for the details.
Finally, since the contiguous grids that form a network are all observed together under ACS if any of them is observed, ancestral BIGS from $B$ entails the observation of $|\alpha_i|$ needed for the PIDA weights given by (8). However, since $|\alpha_i|$ is the same for all the grids in the same network and the initial sampling is SRS, the weights by (8) are all equal in this case, so that the estimator $\hat{\theta}_{z\alpha\gamma}$ coincides with the multiplicity estimator $\hat{\theta}_{z\beta}$.
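The two strategies can be sketched for this 5-grid population. The edge set used below is an assumption: it is one reading of the BIGS representation described above, in which each grid leads to itself, the grids 10 and 1000 lead to each other, and the links from 10 and 1000 to the edge unit 2 are removed. Grids are labelled by their y-values, as in the text.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

grids = [1, 0, 2, 10, 1000]       # each grid is denoted by its y-value
# Assumed edge set: self-links, plus mutual links within the network {10, 1000}.
H = [(1, 1), (0, 0), (2, 2), (10, 10), (10, 1000), (1000, 10), (1000, 1000)]
M, m = len(grids), 2              # initial SRS of size 2
beta = {k: {i for (i, kk) in H if kk == k} for k in grids}


def pi_motif(k):
    """Inclusion probability of grid k as a motif under SRS without replacement."""
    return 1 - Fraction(comb(M - len(beta[k]), m), comb(M, m))


def ht(s):
    """HT-estimator of the total under BIGS."""
    observed = {k for (i, k) in H if i in s}
    return sum(Fraction(k) / pi_motif(k) for k in observed)


def multiplicity(s):
    """Multiplicity estimator of the total, with equal weights over the ancestors."""
    pi1 = Fraction(m, M)
    return sum(Fraction(k, len(beta[k])) for (i, k) in H if i in s) / pi1


samples = list(combinations(grids, m))
mean_ht = sum(ht(s) for s in samples) / len(samples)
mean_mult = sum(multiplicity(s) for s in samples) / len(samples)
print(mean_ht, mean_mult, sum(grids))
```

Both estimators average to the population total over the 10 possible initial samples, and dividing an estimate by 5 gives the corresponding estimate of the mean per grid.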

Line-intercept sampling
Line-intercept sampling (LIS) is a method of sampling habitats in a region, where a habitat is sampled if a chosen line segment transects it. The habitat may e.g. be animal tracks, roads, forestry, which are of irregular shapes. Kaiser (1983) considers the general situation, where a point is randomly selected on the map and an angle is randomly chosen, yielding a line segment of fixed length or transecting the whole area in the chosen direction. Repetition generates an IID sample of lines. In the simplest setting, each transect line is selected at random by selecting randomly a position along a fixed baseline that traverses the whole study area, in the direction perpendicular to the baseline. We apply IWE under BIGS to the following example of LIS (Becker 1991) under this simple setting.
The aim is to estimate the total number of wolverines in the mapped area, as sketched in Fig. 1. Four systematic samples A, B, C and D, each containing 3 positions, are drawn on the baseline that is equally divided into 3 segments of length 12 miles each. Following the 12 selected lines and any wolverine track that intercepts them yields 4 observed tracks, denoted by $\kappa = 1, \ldots, 4$ and heuristically indicated by the dashed lines in Fig. 1. Let $y_\kappa$ be the associated number of wolverines, and $L_\kappa$ the length of the projection of $\kappa$ on the baseline. From top to bottom and left to right, we observe $(y_1, L_1) = (1, 5.25)$, $(y_2, L_2) = (2, 7.5)$, $(y_3, L_3) = (2, 2.4)$ and $(y_4, L_4) = (1, 7.05)$.

Feasible BIGS representation of LIS
First we construct a feasible BIGS representation of LIS in this case. Given the observed tracks, partition the baseline into 7 projection segments, each with associated length $x_i$, for $i = 1, \ldots, 7$ from left to right, where $x_1$ refers to the overlapping projection of $\kappa = 1$ and 2, $x_2$ the projection of $\kappa = 2$ that does not overlap with $\kappa = 1$, $x_3$ the distance between the projections of $\kappa = 2$ and 3, $x_4$ the projection of $\kappa = 3$, $x_5$ the distance between the projections of $\kappa = 3$ and 4, $x_6$ the projection of $\kappa = 4$, and $x_7$ the distance between $\kappa = 4$ and the right-hand border. The probability that the $i$-th projection segment is selected by a systematic sample is $p_i = x_i/12$, given the sampling interval of 12 miles. The 4 systematic samples are IID.
The sample BIG on the $r$-th draw is given by $B_r = (s_r, \Omega_r, H_r)$, where $s_r$ contains the selected projection segments, and $\alpha_i$ the wolverine tracks that intercept the sampled line originating from $i \in s_r$, such that $\Omega_r = \bigcup_{i \in s_r} \alpha_i$ and $H_r = \bigcup_{i \in s_r} i \times \alpha_i$. In this example, we have $s_1 = s_2 = \{1, 5, 6\}$, yielding $\Omega_1 = \Omega_2 = \{1, 2, 4\}$ on the first two draws A and B, and $s_3 = s_4 = \{4, 6, 7\}$, yielding $\Omega_3 = \Omega_4 = \{3, 4\}$ on the last two draws C and D. The distinct projection segments selected over all the draws are $s = \bigcup_{r=1}^{4} s_r = \{1, 4, 5, 6, 7\}$, and the distinct tracks are $\Omega_s = \bigcup_{r=1}^{4} \Omega_r = \{1, 2, 3, 4\}$. Let $F^* = \{1, 2, \ldots, 7\}$ contain the 7 projection segments constructed from $(s, \Omega_s)$, and $H_s = \bigcup_{r=1}^{4} H_r$. Let $B^* = (F^*, \Omega_s, H_s)$ be given as below:
Let $\Omega = \{1, \ldots, \kappa, \ldots, N\}$ contain all the wolverine tracks in the area, where $N \geq 4$ given the sample $\Omega_s$. Let $F(\Omega) = \{1, \ldots, i, \ldots, M\}$ be the sampling frame, which consists of all the projection segments constructed from $\Omega$. Let $H = \{(i\kappa) : i \in F, \kappa \in \Omega\}$, where an edge exists from $i$ to $\kappa$ provided $\kappa$ intercepts a line originating from $i$. In practice, only $B^*$ can be constructed but not $B$. The two are not the same generally, in that one needs to further partition the projection segments of $F^*$ in $F$ based on $\Omega$, in order to accommodate the unobserved tracks in $\Omega \setminus \Omega_s$. For instance, suppose there is a track that can only be intercepted from the 7-th projection segment in $F^*$ and the track does not reach the right-hand border, then this projection segment would be partitioned into 3 segments in $F$, and $(F, H)$ would differ from $(F^*, H^*)$ accordingly.
Under LIS, field observation along a line has an actual width of detectability. Dividing the baseline accordingly thus yields a known sampling frame $F'$ of detectability partitions. Let $B' = (F', \Omega, H')$ be the corresponding BIG. By Theorem 1 of Zhang and Oguz-Alper (2020), LIS can be represented as BIGS from $B'$ where, in particular, the observation procedure of LIS ensures that BIGS from $B'$ is ancestral for $\Omega_s$. Now, as long as the unit of detectability is negligible in scale compared to the baseline, one can assume the elements of $F'$ to be nested in those of $F^*$ (or $F$), such that the selection probability of each observed track $\kappa$ with respect to BIGS from $B'$ can be correctly calculated using $B^*$ (or $B$). Thus, the strategy BIGS-IWE defined for $B'$ can be applied using the observed $B^*$, just as if $B$ were known.
Given the systematic sampling design of the transect lines, the tracks $\{1, 2, 4\}$ can only be observed if a position is selected in the left part of the 1st projection segment, which would only result in $\{1, 5, 6\}$ as the sampled projection segments. Similarly, the tracks $\{3, 4\}$ can only be observed if a position is selected in the 4th projection segment, which would only result in $\{4, 6, 7\}$ as the sampled projection segments. Thus, applying the RB method would not change any unbiased IWE based on the observed sample BIG in this case.
The estimator $\hat{\theta}_{HH}$ of Becker (1991) is the IWE $\hat{\theta}_{z\alpha 0}$. The HT-estimator $\hat{\theta}_y$ noted by Thompson (2012) can be given as the IWE with weights satisfying (4). Other unbiased IWEs can be used for LIS under BIGS from $B^*$, two of which are given in Table 1. Neither the HT-estimator $\hat{\theta}_y$ nor the multiplicity estimator $\hat{\theta}_{z\beta}$ is efficient here. Efficiency gains can be achieved using the PIDA weights (8). In this case, adjusting the equal weights by the selection probability while disregarding the degrees of the initial sample units performs well: $\hat{\theta}_{z\alpha 0}$ has the lowest estimated variance. Of course, the true variance of $\hat{\theta}_{z\alpha 0}$ may or may not be smaller than that of, say, $\hat{\theta}_{z\alpha 0.5}$. Meanwhile, setting $\gamma = 1.227$ would numerically reproduce the equal weights $\omega_{12} = \omega_{22} = 0.5$ based on the observed sample. It seems that the IWE by (8) has the potential to approximate the relatively more efficient estimators in different situations, if one is able to choose the coefficient $\gamma$ in (8) appropriately.

A simulation study
Two graphs $B = (F, \Omega, H)$ and $B' = (F, \Omega, H')$ are constructed for this simulation study. Both $B$ and $B'$ have the same node sets $F$ and $\Omega$, with $|F| = 54$ and $|\Omega| = 310$. The edge sets $H$ and $H'$ have the same size $|H| = |H'| = 1200$, but different distributions of the degree of the sampling units in $F$, as shown in Fig. 2. The distribution of the degree of the sampling units in $F$ is relatively uniform over a small range of values in $B$, but much more skewed and asymmetric in $B'$. Let $\theta = |\Omega|$, and $y_\kappa \equiv 1$ for $\kappa \in \Omega$. We consider the following 7 estimators of $\theta$ under BIGS from $B$ or $B'$ with SRS of $s$, where $m = |s|$ varies from 2 to 53:
• the IWE $\hat{\theta}_y$ with weights satisfying (4) (the HT estimator);
• the IWE $\hat{\theta}_{z\alpha\gamma}$ with weights satisfying (8) for $\gamma = 0, 1, 2$ (the multiplicity estimator);
• the IWE $\hat{\theta}_p$ by (7) (the priority-rule estimator of Birnbaum and Sirken (1965)). We explore different orderings of the sampling units in $F$: random, ascending or descending, yielding three estimators, denoted by $\hat{\theta}_{pR}$, $\hat{\theta}_{pA}$ and $\hat{\theta}_{pD}$, respectively.
Table 2 gives the relative efficiency of the 6 other estimators against the HT-estimator, for a selected set of initial sample sizes, each based on 10,000 simulations of BIGS from either $B$ or $B'$. All the results are significant with respect to the simulation error.
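To make the comparison concrete, the following Python sketch evaluates the HT estimator and HH-type estimators on a small hypothetical BIG under SRS, enumerating all samples instead of simulating. It is only an illustration under stated assumptions: the toy graph `alpha` is invented, $y_\kappa \equiv 1$ as in the study, and, since (8) is not reproduced in this section, the weights are assumed to take the form $\omega_{i\kappa} \propto |\alpha_i|^{-\gamma}$ normalised over the ancestors of $\kappa$, which is one way of using the degrees when the selection probabilities are equal.

```python
from itertools import combinations
from math import comb

# Hypothetical toy BIG: alpha[i] is the set of motifs adjacent to sampling unit i.
alpha = {
    1: {"a", "b"},
    2: {"b"},
    3: {"b", "c", "d"},
    4: {"d"},
    5: {"a", "d", "e"},
}
F = sorted(alpha)
Omega = sorted(set().union(*alpha.values()))
beta = {k: {i for i in F if k in alpha[i]} for k in Omega}  # ancestors of each motif
M, theta = len(F), len(Omega)  # y = 1 for every motif, so theta = |Omega|


def ht_estimate(s, m):
    """HT estimator: each observed motif is weighted by 1 / inclusion probability."""
    observed = set().union(*(alpha[i] for i in s))
    # Under SRS a motif is missed only if none of its ancestors is selected.
    return sum(1.0 / (1.0 - comb(M - len(beta[k]), m) / comb(M, m)) for k in observed)


def hh_estimate(s, m, gamma):
    """HH-type estimator with weights proportional to |alpha_i|^(-gamma) (assumed form)."""
    pi_i = m / M  # inclusion probability of a sampling unit under SRS
    total = 0.0
    for i in s:
        for k in alpha[i]:
            norm = sum(len(alpha[h]) ** (-gamma) for h in beta[k])
            total += (len(alpha[i]) ** (-gamma) / norm) / pi_i
    return total


def exact_moments(estimator, m):
    """Mean and variance over all equally likely SRS samples of size m."""
    values = [estimator(s, m) for s in combinations(F, m)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


m = 2
mean_ht, var_ht = exact_moments(ht_estimate, m)
print("HT", round(mean_ht, 6), var_ht)
for gamma in (0, 1, 2):
    mean_hh, var_hh = exact_moments(lambda s, m: hh_estimate(s, m, gamma), m)
    # Ratio of variances against the HT estimator, in the spirit of Table 2.
    print("gamma =", gamma, round(mean_hh, 6), var_hh / var_ht)
```

Because the assumed weights sum to one over the ancestors of each motif, every estimator in the sketch has expectation $\theta$ over the enumerated samples; only the variances differ with $\gamma$.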
We notice that all three priority-rule estimators $\hat{\theta}_{pR}$, $\hat{\theta}_{pA}$ and $\hat{\theta}_{pD}$ are biased when the sample size is large enough. This happens at $m = 45$ for $B$ and $m = 46$ for $B'$. Note that the maximum degree of the motifs is 10 in $B$ and 9 in $B'$. Moreover, the variance of each priority-rule estimator decreases as the sample size $m$ increases until a threshold value, after which the variance starts to increase. In these simulations the threshold is somewhere between 10 and 30.
The sampling variance of the priority-rule estimator is also affected by the ordering of the sampling units in $F$. The variance tends to be lowest when $F$ is arranged in descending order by $|\alpha_i|$, whereas ascending order tends to yield the largest variance. Without prioritisation, the value $z_i$ is a constant of sampling given $\omega_{i\kappa}$. Due to the randomness induced by the priority rule, $z_i$ varies over different samples. A sampling unit with large $|\alpha_i|$ has a large range of possible $z_i$ values, and placing such a unit towards the end of the ordering tends to increase the sample variance of $\{z_i : i \in s\}$ due to prioritisation. It then makes sense that descending ordering by $|\alpha_i|$ may work better than ascending ordering. However, one may not know $\{|\alpha_i| : i \in F\}$ in practice, in which case applying $\hat{\theta}_p$ given whichever ordering of $F$ can be a haphazard business.
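The effect of the ordering, and the bias at large sample sizes, can be explored on a small hypothetical BIG with the Python sketch below. It does not reproduce (7): as an assumption, each motif is counted only from its sampled ancestor with the highest priority, with weight $1 / (|\beta_\kappa| \, p_{i\kappa})$, where $p_{i\kappa}$ is the probability under SRS that no higher-priority ancestor of $\kappa$ is selected given that $i$ is. The graph and the names `make_rank` and `priority_estimate` are illustrative.

```python
from itertools import combinations
from math import comb

# Hypothetical toy BIG: alpha[i] is the set of motifs adjacent to sampling unit i.
alpha = {1: {"a", "b"}, 2: {"b"}, 3: {"b", "c", "d"}, 4: {"d"}, 5: {"a", "d", "e"}}
F = sorted(alpha)
Omega = sorted(set().union(*alpha.values()))
beta = {k: {i for i in F if k in alpha[i]} for k in Omega}  # ancestors of each motif
M, theta = len(F), len(Omega)  # y = 1 for every motif


def make_rank(descending):
    """Order F by degree |alpha_i|; a lower rank means a higher priority."""
    sign = -1 if descending else 1
    ordered = sorted(F, key=lambda i: (sign * len(alpha[i]), i))
    return {i: pos for pos, i in enumerate(ordered)}


def priority_estimate(s, m, rank):
    """Priority-rule style estimator under SRS (assumed weight form, see lead-in)."""
    pi_i = m / M
    total = 0.0
    for i in s:
        for k in alpha[i]:
            higher = [h for h in beta[k] if rank[h] < rank[i]]
            if any(h in s for h in higher):
                continue  # a sampled ancestor with higher priority claims motif k
            d = len(higher)
            # Pr(none of the d higher-priority ancestors is in s | i in s)
            p = comb(M - 1 - d, m - 1) / comb(M - 1, m - 1)
            total += 1.0 / (len(beta[k]) * p * pi_i)
    return total


def exact_moments(m, rank):
    """Mean and variance over all equally likely SRS samples of size m."""
    values = [priority_estimate(set(s), m, rank) for s in combinations(F, m)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


for label, descending in (("ascending", False), ("descending", True)):
    rank = make_rank(descending)
    for m in (2, 4):
        mean, var = exact_moments(m, rank)
        print(label, m, round(mean, 4), round(var, 4))
```

In this toy graph the enumerated mean equals $\theta = 5$ at $m = 2$ under both orderings, whereas at $m = 4$ it falls below $\theta$, because the lowest-priority ancestor of a motif with three ancestors can then never be the prioritised one. The variances printed for the two orderings can be compared directly; no general ranking is implied by such a small example.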
Given initial SRS, the different HH-type estimators here differ only with respect to the use of $|\alpha_i|$ in the PIDA weights (8) via the choice of $\gamma$. The equal-weights estimator $\hat{\theta}_{z\alpha 0}$ is the least efficient of the three HH-type estimators, especially for $B'$ where the distribution of $|\alpha_i|$ is more skewed. The differences between the other two estimators $\hat{\theta}_{z\alpha 1}$ and $\hat{\theta}_{z\alpha 2}$ are relatively small compared to their differences from $\hat{\theta}_{z\alpha 0}$, so that a non-optimal choice of $\gamma \neq 0$ is less critical than simply setting $\gamma = 0$. Taken together, these results suggest that the extra effort that may be required to obtain $|\alpha_i|$ is worth considering in practice, and a sensible choice of $\gamma$ depending on the distribution of $|\alpha_i|$ over $F$ if it is known, or $B_s$ if it is only observed in the sample BIG, is an interesting question to be studied.
Finally, both $\hat{\theta}_{z\alpha 1}$ and $\hat{\theta}_{z\alpha 2}$ are more efficient than the HT-estimator when $m$ is small, whereas the HT-estimator improves more quickly as $m$ becomes larger, especially for $B'$. The matter depends on the sampling fractions $|\Omega_s|/|\Omega|$ and $|s|/|F|$, as well as the respective inclusion probabilities of motifs and sampling units. The interplay between them is complex as it depends on the population BIG. Further research is needed in this respect.

Concluding remarks
In this paper we develop a large class of incidence weighting estimators (1) under BIGS. The IWE is applicable to all situations of unconventional sampling techniques that require a specific observation procedure in addition to an initial sample, which can be represented by ancestral BIGS, including indirect, network, adaptive cluster and line-intercept sampling. The condition (2) ensures that the IWE is exactly design-unbiased, and it synthesises and generalises the conditions underlying the other unbiased estimators known in the literature. The classic HT-estimator from finite-population sampling is shown to be a special case of IWE, with any sample-dependent weights satisfying the restriction (4), which provides a novel insight. A more general restriction (5) is given for sample-dependent weights. It will be intriguing to investigate other HT-type estimators satisfying this restriction.
The priority-rule estimator invented by Birnbaum and Sirken (1965) is another special case of IWE. However, it may become biased as the initial sample size increases and behave erratically long before that, such that its application may be a haphazard business if one is unable to control the interplay between the ordering of sampling units and the priority rule of Birnbaum and Sirken (1965). It remains to be seen whether one is able to overcome these shortcomings by future developments.
The HH-type estimators used in the literature are also members of the proposed class. While it is in principle possible to apply the Rao-Blackwell method to an HH-type estimator to improve its efficiency, the computation may be intractable if the conditional sample space of $s$ is large and/or if the initial sampling design $p(s)$ is not fully specified. However, consideration of the Rao-Blackwell method and the degrees (in the BIG) of the sampling units points to the PIDA weights (8) for IWE, as a general alternative to the commonly used equal weights and the corresponding multiplicity estimator. The numerical illustration of line-intercept sampling and the simulation results suggest that the PIDA weights can easily outperform the equal weights. Further study is warranted, in order to identify a sensible choice of the PIDA weights in applications.
Finally, other incidence weights can be explored subject to the condition (2), beyond those examined in this paper. This is clearly another direction of future research.
$$\cdots = \begin{cases} \quad \vdots \\ m - 2 & \text{if } \kappa \neq \ell,\ i \neq j \text{ and } |\beta_\kappa^i \cap \{j\}| + |\beta_\ell^j \cap \{i\}| = 0 \\ 0 & \text{if } \kappa \neq \ell,\ i \neq j \text{ and } |\beta_\kappa^i \cap \{j\}| + |\beta_\ell^j \cap \{i\}| > 0 \end{cases}$$
where $\beta_\kappa^i$ is the subset of ancestors of $\kappa$ with higher priority than $i$, $d_{i(\kappa,\ell)} = |\beta_\kappa^i \cup \beta_\ell^i|$ is the number of units in $\beta_\kappa \cup \beta_\ell$ with higher priority than $i$, and $d_{i(\kappa),j(\ell)} = |\beta \cdots$

Funding The authors did not receive support from any organization for the submitted work.

Data Availability
The datasets generated during the current study are available from the corresponding author on reasonable request.

Declarations
Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.