Murthy’s Estimator in Unequal Probability Inverse Adaptive Cluster Sampling

This paper derives Murthy's unbiased estimator of the population total under unequal probability inverse sampling. A general unequal probability inverse sampling design is combined with adaptive cluster sampling. An unbiased estimator of the population total and its variance estimator are given using Murthy's approach. General unequal probability inverse adaptive cluster sampling and general equal probability inverse adaptive cluster sampling are compared in a simulation study based on real data. The results indicate that general unequal probability inverse adaptive cluster sampling yields estimates with a smaller coefficient of variation than equal probability inverse adaptive cluster sampling. As the correlation coefficient between the study variable and the selection probabilities increases, the coefficient of variation decreases.


Introduction
An efficient sampling design for estimating parameters of interest in a rare population is one of the most challenging problems for statisticians. Haldane (1945) used inverse sampling to select units of interest from a population in which such units form only a small fraction. Thompson (1990) proposed adaptive cluster sampling, an efficient design for rare and clustered populations because survey effort can be targeted at units of interest. One problem encountered in adaptive cluster sampling from a rare population is that the initial sample may not contain any unit of interest. To obtain sampled units of interest, Christman and Lan (2001) applied inverse sampling to select the initial sample in adaptive cluster sampling. For any inverse sampling design, it may not be possible to obtain a fixed number m of units of interest with the available resources, either because m is too large or because the population contains too few units of interest. Salehi and Seber (2004) proposed general inverse sampling, which places an upper limit on the final sample size, and incorporated it into adaptive cluster sampling.
The sampling designs mentioned above use equal selection probabilities. If the probability of selecting a unit is highly correlated with the study variable, an unequal probability sampling design can be more efficient than an equal probability design. Greco and Naddeo (2007) proposed unequal probability inverse sampling with replacement and derived an unbiased estimator of the population total using the conditional expectation of the sample size. This paper derives an unbiased estimator of the population total using Murthy's approach. A general unequal probability inverse sampling design is combined with adaptive cluster sampling, and an unbiased estimator of the population total and an unbiased variance estimator are given. The design is compared with general equal probability inverse adaptive cluster sampling in a simulation study.

Murthy's Estimator
Murthy (1957) derived an unbiased estimator of the population total as
$$\hat\tau = \frac{1}{P(s)}\sum_{i=1}^{\nu} P(s|i)\,y_i, \tag{1}$$
where $\nu$ is the number of distinct units in the sample, $P(s)$ is the probability of obtaining the unordered sample s, $P(s|i)$ is the conditional probability of obtaining s given that the i-th unit was drawn first, and $z_i$ is the probability of selecting unit i on a single draw. The variance of $\hat\tau$ is
$$V(\hat\tau) = \sum_{i=1}^{N}\sum_{j>i}\left[1-\sum_{s\ni i,j}\frac{P(s|i)\,P(s|j)}{P(s)}\right] z_i z_j\left(\frac{y_i}{z_i}-\frac{y_j}{z_j}\right)^2, \tag{2}$$
and an unbiased estimator of this variance is
$$\hat V(\hat\tau) = \frac{1}{[P(s)]^2}\sum_{i=1}^{\nu}\sum_{j>i}\left[P(s)\,P(s|i,j)-P(s|i)\,P(s|j)\right] z_i z_j\left(\frac{y_i}{z_i}-\frac{y_j}{z_j}\right)^2, \tag{3}$$
where $P(s|i,j)$ denotes the probability of obtaining s given that the i-th and j-th units were drawn in the first two draws. Salehi and Seber (2004) proved that Murthy's estimator can be applied to sequential sampling. Using this approach, we derive unbiased estimators of the population total and of its variance under the sampling designs considered.
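As a sanity check of Murthy's estimator $\sum_i P(s|i)\,y_i / P(s)$, the sketch below enumerates every sample of a deliberately tiny design that is not part of this paper: two draws without replacement with probability proportional to $z$, for which $P(s|i) = z_j/(1-z_i)$ when $s = \{i, j\}$. The study values and draw probabilities are hypothetical. Averaging the estimator over all samples, weighted by $P(s)$, should recover the population total, which is the unbiasedness property the later theorems rely on.

```python
def murthy_estimate(y_s, p_s_given, p_s):
    """Murthy's estimator (1): sum of P(s|i) * y_i over distinct sample units, divided by P(s)."""
    return sum(pc * yi for pc, yi in zip(p_s_given, y_s)) / p_s


def expectation_two_draw_pps(y, z):
    """Enumerate every unordered sample {i, j} of a toy design (two draws,
    probability proportional to z, without replacement).
    Returns (E[tau_hat], sum of P(s) over all samples)."""
    mean_est, total_prob = 0.0, 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            p_s_i = z[j] / (1.0 - z[i])          # P(s|i): i drawn first, then j
            p_s_j = z[i] / (1.0 - z[j])          # P(s|j): j drawn first, then i
            p_s = z[i] * p_s_i + z[j] * p_s_j    # s arises from order (i, j) or (j, i)
            est = murthy_estimate([y[i], y[j]], [p_s_i, p_s_j], p_s)
            mean_est += p_s * est
            total_prob += p_s
    return mean_est, total_prob


y = [2.0, 0.0, 7.0, 11.0]    # hypothetical study values (total 20)
z = [0.1, 0.2, 0.3, 0.4]     # hypothetical draw probabilities
mean_est, total_prob = expectation_two_draw_pps(y, z)
print(mean_est, total_prob)
```

For this toy design the estimator simplifies to the classical two-draw form $[(1-z_j)\,y_i/z_i + (1-z_i)\,y_j/z_j]/(2-z_i-z_j)$.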

Unequal Probability Inverse Sampling
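Assume that a finite population consists of N units with associated study values $y_1, \dots, y_N$. Let k and g be the numbers of distinct units in $S_C$ and $S_{\bar C}$, respectively, label the $k+g$ distinct sample units $1, \dots, k+g$, and let $r_i$ be the number of times that unit i appears in the sample, so that $\sum_{i\in S_C} r_i = m$ and $\sum_{i\in S_{\bar C}} r_i = n_1 - m$. The probability of obtaining an ordered sample $s^*$ is
$$P(s^*) = \prod_{i\in s} z_i^{r_i},$$
where the last sampled unit belongs to $S_C$. For an unordered sample s, the last sampled unit must belong to $S_C$, so after one draw of the i-th of the k distinct units in $S_C$ is allocated to the last position, the remaining $n_1 - 1$ draws can be ordered in
$$\binom{n_1-1}{r_1,\dots,r_i-1,\dots,r_{k+g}} = \frac{(n_1-1)!\,r_i}{\prod_{j\in s} r_j!}$$
ways. Therefore, the sample s can be constructed in $\sum_{i\in S_C}(n_1-1)!\,r_i/\prod_{j\in s} r_j! = m\,(n_1-1)!/\prod_{j\in s} r_j!$ ways, and the probability of obtaining s is
$$P(s) = \frac{m\,(n_1-1)!}{\prod_{j\in s} r_j!}\prod_{j\in s} z_j^{r_j}.$$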
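The counting argument for $P(s)$ can be checked numerically. The sketch below uses a hypothetical four-unit population (two units in C), enumerates every ordered inverse sample up to a length cap, groups the sequences by their draw counts $r_i$, and compares the summed probabilities with the closed form $m\,(n_1-1)!/\prod_j r_j! \cdot \prod_j z_j^{r_j}$. Grouping is exact for every unordered sample within the cap, because all orderings of a sample have the same length $n_1$.

```python
from collections import defaultdict
from math import factorial, prod


def inverse_sequences(z, in_c, m, max_len):
    """Yield (ordered sample, probability) for every completed inverse sample of length <= max_len.
    Draws are with replacement with probabilities z; sampling stops at the m-th draw from C."""
    stack = [((), 1.0, 0)]  # (draws so far, probability, draws from C so far)
    while stack:
        seq, prob, n_c = stack.pop()
        if len(seq) == max_len:
            continue  # truncation: longer samples are not enumerated
        for unit, z_u in enumerate(z):
            new_seq, new_prob, new_c = seq + (unit,), prob * z_u, n_c + in_c[unit]
            if new_c == m:
                yield new_seq, new_prob  # m-th draw from C: sampling stops
            else:
                stack.append((new_seq, new_prob, new_c))


def p_s_formula(counts, z, m):
    """P(s) = m (n1 - 1)! / prod_j r_j! * prod_j z_j^{r_j}."""
    n1 = sum(counts)
    ways = m * factorial(n1 - 1) // prod(factorial(r) for r in counts)
    return ways * prod(z_j ** r for z_j, r in zip(z, counts))


z = [0.15, 0.25, 0.35, 0.25]  # hypothetical selection probabilities
in_c = [0, 0, 1, 1]           # hypothetical: units 2 and 3 belong to C
m, max_len = 2, 9

grouped = defaultdict(float)  # draw counts (r_0, ..., r_3) -> summed probability
for seq, prob in inverse_sequences(z, in_c, m, max_len):
    grouped[tuple(seq.count(u) for u in range(len(z)))] += prob

max_err = max(abs(prob - p_s_formula(c, z, m)) for c, prob in grouped.items())
print(len(grouped), max_err)
```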
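Theorem 1 Under the unequal probability inverse sampling design, an unbiased estimator of the population total is
$$\hat\tau = \frac{m-1}{m\,(n_1-1)}\sum_{i\in S_C}\frac{r_i\,y_i}{z_i} + \frac{1}{n_1-1}\sum_{i\in S_{\bar C}}\frac{r_i\,y_i}{z_i}. \tag{4}$$
Proof: Using Murthy's estimator, the conditional probability of obtaining s given that unit i was drawn first is
$$P(s|i) = \frac{(m-1)\,r_i}{m\,(n_1-1)\,z_i}\,P(s) \quad (i\in S_C), \qquad P(s|i) = \frac{r_i}{(n_1-1)\,z_i}\,P(s) \quad (i\in S_{\bar C}).$$
Substituting $P(s)$ and $P(s|i)$ into (1) and using some algebra, we obtain expression (4). Substituting $P(s|i,j)$ into (3) and using some algebra, we obtain the result as in expression (5).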
Note that Murthy's approach yields the same estimator as that given by Greco and Naddeo (2007).
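A minimal numerical check, assuming the estimator of Theorem 1 has the form $\hat\tau = \frac{m-1}{m(n_1-1)}\sum_{S_C} r_i y_i/z_i + \frac{1}{n_1-1}\sum_{S_{\bar C}} r_i y_i/z_i$. The sketch also evaluates a variance estimator that we obtained ourselves by substituting $P(s|i)$ and $P(s|i,j)$ into (3); it is not necessarily the form in which (5) is written. For distinct units i, j the substitution gives the term $r_i r_j\left[\frac{c_{ij}}{(n_1-1)(n_1-2)} - \frac{a_i a_j}{(n_1-1)^2}\right]\left(\frac{y_i}{z_i}-\frac{y_j}{z_j}\right)^2$, with $a_i = (m-1)/m$ for units in C and 1 otherwise, and $c_{ij} = 1 - h_{ij}/m$, where $h_{ij}$ is the number of units from C among i and j. The sample space is enumerated by draw counts (truncated at 100 draws from $\bar C$, a negligible tail here), so $E(\hat\tau)$ should match $\tau = 14$ and $E(\hat V)$ should match $V(\hat\tau)$. We take m = 3 so that $n_1 \ge 3$ and the closed form is defined; all population values are hypothetical.

```python
from math import factorial, prod


def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def p_s(counts, z, m):
    """P(s) = m (n1 - 1)! / prod_j r_j! * prod_j z_j^{r_j}."""
    n1 = sum(counts)
    ways = m * factorial(n1 - 1) // prod(factorial(r) for r in counts)
    return ways * prod(z_j ** r for z_j, r in zip(z, counts))


def estimates(counts, y, z, in_c, m):
    """(tau_hat, v_hat) for one unordered sample; the closed forms assume n1 >= 3."""
    n1 = sum(counts)
    a = [(m - 1) / m if c else 1.0 for c in in_c]  # P(s|i) / P(s) = a_i r_i / ((n1 - 1) z_i)
    tau_hat = sum(a_i * r * y_i / z_i for a_i, r, y_i, z_i in zip(a, counts, y, z)) / (n1 - 1)
    units = [i for i, r in enumerate(counts) if r > 0]   # distinct units in s
    v_hat = 0.0
    for pos, i in enumerate(units):
        for j in units[pos + 1:]:
            c_ij = 1.0 - (in_c[i] + in_c[j]) / m         # from P(s|i,j) / P(s)
            coef = c_ij / ((n1 - 1) * (n1 - 2)) - a[i] * a[j] / (n1 - 1) ** 2
            v_hat += counts[i] * counts[j] * coef * (y[i] / z[i] - y[j] / z[j]) ** 2
    return tau_hat, v_hat


y = [4.0, 9.0, 1.0, 0.0]        # hypothetical values, tau = 14
z = [0.35, 0.25, 0.15, 0.25]    # hypothetical selection probabilities
in_c = [1, 1, 0, 0]             # units 0 and 1 form class C
m, max_cbar = 3, 100            # draws from C-bar truncated at 100; the tail is negligible

total_prob = mean_tau = mean_tau_sq = mean_v = 0.0
for counts_c in compositions(m, 2):
    for t in range(max_cbar + 1):
        for counts_cbar in compositions(t, 2):
            counts = counts_c + counts_cbar
            prob = p_s(counts, z, m)
            tau_hat, v_hat = estimates(counts, y, z, in_c, m)
            total_prob += prob
            mean_tau += prob * tau_hat
            mean_tau_sq += prob * tau_hat ** 2
            mean_v += prob * v_hat

variance = mean_tau_sq - mean_tau ** 2
print(total_prob, mean_tau, variance, mean_v)
```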

General Unequal Probability Inverse Sampling
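To avoid an excessively large sample, an initial sample of size $n_0$ is drawn by unequal probability sampling with replacement, using the selection probabilities $z_i$.
Theorem 2 Under the general unequal probability inverse sampling design, an unbiased estimator of the population total is
$$\hat\tau = \begin{cases} \dfrac{1}{n_1}\displaystyle\sum_{i\in s}\frac{r_i\,y_i}{z_i}, & n_1 = n_0, \text{ or } n_1 = n_2 \text{ with fewer than } m \text{ draws from } C,\\[2ex] \dfrac{m-1}{m\,(n_1-1)}\displaystyle\sum_{i\in S_C}\frac{r_i\,y_i}{z_i} + \dfrac{1}{n_1-1}\displaystyle\sum_{i\in S_{\bar C}}\frac{r_i\,y_i}{z_i}, & n_0 < n_1 \le n_2 \text{ with } m \text{ draws from } C. \end{cases} \tag{6}$$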

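Proof: When $n_1 = n_0$, or $n_1 = n_2$ with fewer than m draws from C, every ordering of the draws is possible, so $P(s) = \frac{n_1!}{\prod_{j\in s} r_j!}\prod_{j\in s} z_j^{r_j}$ and $P(s|i) = \frac{r_i}{n_1 z_i}P(s)$. When $n_0 < n_1 \le n_2$ and the sample contains m draws from C, the last draw must come from C, so $P(s)$ and $P(s|i)$ are the same as under unequal probability inverse sampling. Substituting these into (1) gives the estimator of Theorem 2. Substituting the corresponding $P(s|i,j)$ into (3) and using some algebra, we obtain the result as in expression (7).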
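The piecewise estimator of Theorem 2 can be checked exactly, because the general scheme has a finite sample space. The sketch assumes the estimator is the Hansen-Hurwitz form $\frac{1}{n_1}\sum_{i\in s} r_i y_i/z_i$ when $n_1 = n_0$ or when $n_2$ draws yield fewer than m units from C, and the inverse sampling form of Theorem 1 when $n_0 < n_1 \le n_2$ and the sample contains m draws from C; this is our reading of the theorem, derived from (1). It lists every sample of a hypothetical four-unit population together with $P(s)$ and averages the estimator; the total probability should be 1 and the average should equal $\tau = 14$.

```python
from math import factorial, prod


def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def sample_space(z, in_c, m, n0, n2):
    """Yield (draw counts r_i, P(s)) for every possible sample of the general scheme."""
    k = len(z)

    def weight(counts):
        return prod(z_j ** r for z_j, r in zip(z, counts))

    def n_from_c(counts):
        return sum(r for r, c in zip(counts, in_c) if c)

    def orderings(counts):
        return factorial(sum(counts)) // prod(factorial(r) for r in counts)

    for counts in compositions(n0, k):           # stop after the initial sample
        if n_from_c(counts) >= m:
            yield counts, orderings(counts) * weight(counts)
    for n1 in range(n0 + 1, n2 + 1):             # stop at the m-th draw from C (last draw in C)
        for counts in compositions(n1, k):
            if n_from_c(counts) == m:
                ways = m * factorial(n1 - 1) // prod(factorial(r) for r in counts)
                yield counts, ways * weight(counts)
    for counts in compositions(n2, k):           # cap n2 reached with fewer than m from C
        if n_from_c(counts) < m:
            yield counts, orderings(counts) * weight(counts)


def tau_hat_general(counts, y, z, in_c, m, n0):
    n1 = sum(counts)
    terms = [r * y_i / z_i for r, y_i, z_i in zip(counts, y, z)]
    if n1 == n0 or sum(r for r, c in zip(counts, in_c) if c) < m:
        return sum(terms) / n1                    # Hansen-Hurwitz branch
    part_c = sum(t for t, c in zip(terms, in_c) if c)
    part_cbar = sum(t for t, c in zip(terms, in_c) if not c)
    return (m - 1) / (m * (n1 - 1)) * part_c + part_cbar / (n1 - 1)


y = [4.0, 9.0, 1.0, 0.0]          # hypothetical values, tau = 14
z = [0.35, 0.25, 0.15, 0.25]      # hypothetical selection probabilities
in_c = [1, 1, 0, 0]               # units 0 and 1 form class C
m, n0, n2 = 2, 3, 6

total_prob = mean_est = 0.0
for counts, prob in sample_space(z, in_c, m, n0, n2):
    total_prob += prob
    mean_est += prob * tau_hat_general(counts, y, z, in_c, m, n0)
print(total_prob, mean_est)
```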

General Unequal Probability Inverse Adaptive Cluster Sampling
We apply general unequal probability inverse sampling to adaptive cluster sampling. Assume that each unit in the population has a defined neighborhood, that is, a set of other units associated with it. The sampling scheme is as follows.
An initial sample is drawn by general unequal probability inverse sampling with replacement. For sample units in class C, their neighborhoods are added to the sample and observed. The procedure continues until no more units in class C are found. This sampling scheme is called general unequal probability inverse adaptive cluster sampling, because sampling begins with general unequal probability inverse sampling and incorporates adaptive cluster sampling. The final sample consists of the initial sample and all adaptively sampled units.
The set of units that are adaptively sampled as a result of the i-th unit being sampled and that are also members of class C, together with unit i itself, is called the network to which unit i belongs. Units that are adaptively sampled but belong to class $\bar C$ are called edge units. Thus, if any unit in a network is selected in the initial sample, all units in that network are sampled. A unit in $\bar C$ is regarded as a network of size one, so by the definition of a network the population can be divided into K mutually exclusive networks.
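The adaptive step and the distinction between network and edge units can be sketched on a grid. The function below assumes the four-adjacent-cell neighborhood used in the simulation study and takes the condition as a predicate; the grid, the initial cell and all names are hypothetical. Starting from the initial cells, it adds neighborhoods of units in C until no new unit in C appears, and reports the adaptively added units outside C as edge units.

```python
from collections import deque


def adaptive_expand(grid, initial, condition):
    """Return (final sample, edge units) for an initial set of (row, col) cells.
    Neighborhood = the four adjacent cells; neighborhoods of cells that satisfy
    `condition` are added repeatedly."""
    n_rows, n_cols = len(grid), len(grid[0])
    initial = set(initial)
    sampled = set(initial)
    queue = deque(cell for cell in initial if condition(grid[cell[0]][cell[1]]))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < n_rows and 0 <= nc < n_cols and (nr, nc) not in sampled:
                sampled.add((nr, nc))
                if condition(grid[nr][nc]):
                    queue.append((nr, nc))  # unit in C: its neighborhood is sampled too
    # edge units: adaptively added units that are not in C
    edge = {cell for cell in sampled - initial if not condition(grid[cell[0]][cell[1]])}
    return sampled, edge


grid = [
    [0, 0, 0, 0],
    [0, 3, 5, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
]
sampled, edge = adaptive_expand(grid, [(1, 1)], lambda y: y > 0)
print(len(sampled), len(edge))
```

For this toy grid the network of cell (1, 1) is the three positive cells, seven zero cells become edge units, and the isolated positive cell (3, 3) is never reached, because edge units do not trigger further sampling.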
Let n be the final sample size. The parameter to be estimated can be written as $\tau = \sum_{i=1}^{N} y_i = \sum_{k=1}^{K} y_k^*$, where $y_k^*$ is the total of the study variable in the k-th network. Since the probability that an edge unit is included in the final sample is not known, estimators that include edge units would be biased, so edge units in the final sample are excluded at the estimation stage. The unbiased estimator of the population total uses sample units in class C only when they are drawn in the initial sample. It is formed by modifying the unbiased estimator given in Theorem 2.
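Theorem 3 Under the general unequal probability inverse adaptive cluster sampling design, an unbiased estimator of the population total is
$$\hat\tau = \begin{cases} \dfrac{1}{n_1}\displaystyle\sum_{i\in s_0}\frac{r_i\,w_i}{z_i}, & n_1 = n_0, \text{ or } n_1 = n_2 \text{ with fewer than } m \text{ initial draws from } C,\\[2ex] \dfrac{m-1}{m\,(n_1-1)}\displaystyle\sum_{i\in S_C}\frac{r_i\,w_i}{z_i} + \dfrac{1}{n_1-1}\displaystyle\sum_{i\in S_{\bar C}}\frac{r_i\,w_i}{z_i}, & n_0 < n_1 \le n_2 \text{ with } m \text{ initial draws from } C, \end{cases}$$
where $s_0 = S_C \cup S_{\bar C}$ is the initial sample of $n_1$ draws and $w_i$ is defined in the proof.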
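Proof: Let $w_i$ represent the new value of the study variable for the i-th unit in the k-th network, given by $w_i = y_k^*/m_k = \frac{1}{m_k}\sum_{j\in\Psi_k} y_j$, where $\Psi_k$ is the set of units in the k-th network, $m_k$ is its number of units and $y_k^*$ is its total. Because $\sum_{i=1}^{N} w_i = \sum_{k=1}^{K} m_k\,(y_k^*/m_k) = \tau$, replacing the observed value $y_i$ by $w_i$ in the estimator of Theorem 2, applied to the initial sample drawn by general unequal probability inverse sampling, gives the results of this theorem.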
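A minimal sketch of the transformation in the proof, assuming $w_i = y_k^*/m_k$ (the network mean) and a four-adjacent-cell neighborhood; the grid, the equal probabilities $z_i = 1/16$ and the draw sequence are hypothetical. Cells outside C form networks of size one, so $w_i = y_i$ there, and the w-values sum to the population total. Only initial draws enter the estimate; edge units never do.

```python
from collections import deque


def network_means(grid, condition):
    """w for every cell: the mean of y over the cell's network (4-adjacent neighborhood).
    Cells failing `condition` are networks of size one, so w equals y there."""
    n_rows, n_cols = len(grid), len(grid[0])
    w = [[float(v) for v in row] for row in grid]
    seen = set()
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if (r0, c0) in seen or not condition(grid[r0][c0]):
                continue
            network, queue = [], deque([(r0, c0)])
            seen.add((r0, c0))
            while queue:  # breadth-first search over adjacent cells in C
                r, c = queue.popleft()
                network.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and (nr, nc) not in seen and condition(grid[nr][nc])):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            network_mean = sum(grid[r][c] for r, c in network) / len(network)
            for r, c in network:
                w[r][c] = network_mean
    return w


def tau_hat_acs(draws, w, z, in_c, m, n0):
    """Theorem 2 form with y replaced by w; `draws` lists the initial-sample cells in draw order.
    The Hansen-Hurwitz branch covers n1 == n0, or n1 == n2 with fewer than m draws from C."""
    n1 = len(draws)
    ratios = [w[r][c] / z[r][c] for r, c in draws]
    flags = [in_c[r][c] for r, c in draws]
    if n1 == n0 or sum(flags) < m:
        return sum(ratios) / n1
    part_c = sum(q for q, f in zip(ratios, flags) if f)
    part_cbar = sum(q for q, f in zip(ratios, flags) if not f)
    return (m - 1) / (m * (n1 - 1)) * part_c + part_cbar / (n1 - 1)


grid = [[0, 0, 0, 0], [0, 3, 5, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
w = network_means(grid, lambda y: y > 0)
z = [[1 / 16] * 4 for _ in range(4)]            # hypothetical equal probabilities
in_c = [[y > 0 for y in row] for row in grid]   # class membership uses y, not w
est = tau_hat_acs([(0, 0), (1, 1), (3, 3), (2, 0)], w, z, in_c, m=2, n0=4)
print(w[1][1], est)
```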

Simulation Study
The ring-necked duck data given by Smith et al. (1995) were used as the study population. The population consists of N = 200 units. In general unequal probability inverse adaptive cluster sampling, the population units are selected with probability proportional to an auxiliary variable. Auxiliary variables (x) correlated with the study variable were generated with correlation coefficients 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8.
The initial selection probability of the i-th unit is $z_i = x_i / \sum_{j=1}^{200} x_j$. For general inverse adaptive cluster sampling, the probability of selecting a unit is 1/200. Simulations of sampling from the population were carried out to study the properties of general unequal probability inverse adaptive cluster sampling (GUIACS) compared with general inverse adaptive cluster sampling (GIACS), with sampling carried out with replacement. We chose the condition $\{y : y > 0\}$ for dividing the units into classes $C$ and $\bar C$.
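The paper does not state how the auxiliary variables were generated. One possible construction, shown here purely as an assumption, builds x with an exact prescribed sample correlation $\rho$ with y (standardized y plus noise orthogonalized against it), shifts it to positive values (a shift does not change the correlation), and sets $z_i = x_i/\sum_j x_j$. The ducks data are not reproduced; y below is a hypothetical sparse count vector of length 200.

```python
import numpy as np


def auxiliary_with_correlation(y, rho, rng):
    """Hypothetical construction of a positive x whose sample correlation with y is exactly rho."""
    y = np.asarray(y, dtype=float)
    u = (y - y.mean()) / y.std()                 # standardized y (mean 0, variance 1)
    v = rng.standard_normal(y.size)
    v = v - v.mean()
    v = v - (v @ u) / (u @ u) * u                # remove the component along u
    v = v / v.std()
    x = rho * u + np.sqrt(1.0 - rho ** 2) * v    # corr(x, u) = rho by construction
    return x - x.min() + 1.0                     # a shift keeps the correlation and makes x > 0


rng = np.random.default_rng(2024)
y = rng.poisson(0.3, size=200) * rng.integers(1, 50, size=200)  # hypothetical sparse counts
x = auxiliary_with_correlation(y, 0.6, rng)
z = x / x.sum()                                  # z_i = x_i / sum_j x_j
print(np.corrcoef(x, y)[0, 1], z.min(), z.sum())
```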

Discussion
Adaptive cluster sampling is an efficient sampling design for rare and clustered populations. However, if the initial sample in adaptive cluster sampling is selected by a fixed sample size design, the sample may contain no units of interest. This paper considered general unequal probability inverse sampling and combined it with adaptive cluster sampling. An unbiased estimator of the population total and an unbiased estimator of its variance were derived using Murthy's method. The simulation study showed that the efficiency of the general unequal probability inverse adaptive cluster sampling design depends on the correlation coefficient between the study variable and the auxiliary variable, the number m of sampled units of interest, the initial sample size $n_0$ and the maximum sample size $n_2$.
When the auxiliary variable is highly correlated with the study variable, the general unequal probability inverse adaptive cluster sampling design is fairly efficient, and it is more efficient than the general inverse adaptive cluster sampling design. However, when the auxiliary variable is not appropriate for the study variable, an estimator based on the general unequal probability inverse adaptive cluster sampling design may lose its efficiency.
An initial selection probability of the i-th unit is denoted by $z_i$, with $\sum_{i=1}^{N} z_i = 1$. The parameter to be estimated is the population total $\tau = \sum_{i=1}^{N} y_i$. The population units are divided into two classes according to whether their study values satisfy a given condition. A common form of the condition is $\{y : y > c\}$, where c is a given constant. The class of units whose study values satisfy the condition is defined to be the class of interest, $C$, and $\bar C$ is the class of the remaining units. In unequal probability inverse sampling, units are selected one at a time with unequal probabilities and with replacement until a given number m of units from class C has been obtained. The sample size $n_1$ is a random variable. The sample s can be partitioned into two parts: $S_C$, the set of sample units from class C, and $S_{\bar C}$, the set of sample units from $\bar C$, with cardinalities m and $n_1 - m$, respectively (counting repeated draws).
We stop further sampling if the initial sample contains at least m units from class C. Otherwise, we continue sampling sequentially with unequal probabilities and with replacement until either the sample contains m units from class C or the total sample size reaches $n_2$ ($n_2 \ge n_0$), where $n_2$ is fixed in advance. This sampling scheme is called general unequal probability inverse sampling. Let $n_1$ denote the size of the final sample. When $n_0 \le m$ and no upper limit $n_2$ is fixed, the sampling design reduces to unequal probability inverse sampling. The sampling design reduces to ordinary unequal probability sampling with replacement when $n_0 = n_2$. If the selection probabilities are equal, the sampling design is general inverse sampling.
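The draw procedure of general unequal probability inverse sampling can be written as a short loop. This sketch uses Python's `random.Random.choices` for the with-replacement draws; the population and parameter values in the usage lines are hypothetical. Setting $n_0 = n_2$ means the loop body never runs, which matches the reduction to ordinary unequal probability sampling with replacement.

```python
import random


def general_inverse_sample(z, in_c, m, n0, n2, rng):
    """Unit indices, in draw order, of one general unequal probability inverse sample."""
    units = range(len(z))
    draws = rng.choices(units, weights=z, k=n0)   # initial sample, with replacement
    n_c = sum(in_c[u] for u in draws)             # draws from C so far
    # Continue one draw at a time until m draws from C or the cap n2 is reached;
    # if the initial sample already has at least m draws from C, the loop is skipped.
    while n_c < m and len(draws) < n2:
        u = rng.choices(units, weights=z, k=1)[0]
        draws.append(u)
        n_c += in_c[u]
    return draws


rng = random.Random(1)
z = [0.15, 0.25, 0.35, 0.25]   # hypothetical selection probabilities
in_c = [0, 0, 1, 1]            # hypothetical: units 2 and 3 belong to C
samples = [general_inverse_sample(z, in_c, m=2, n0=3, n2=6, rng=rng) for _ in range(1000)]
print(sorted({len(s) for s in samples}))
```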
Let $\Psi_k$ denote the set of units in the k-th network and $m_k$ the number of units in that network. The total value of the study variable in network k is $y_k^* = \sum_{i\in\Psi_k} y_i$.


With the classes $C$ and $\bar C$ defined by the condition $\{y : y > 0\}$, the numbers m of sample units from class C were 3, 4, 5 and 6. The initial sample sizes $n_0$ were 10, 15, 20 and 25, and the maximum initial sample sizes $n_2$ were 25, 35, 45 and 55. The neighborhood of a unit is defined as the set of its four adjacent units. The simulation consisted of 50,000 replications. The population total $\tau$ was estimated for each sample, and in each sampling design the estimates $\hat\tau$ were averaged. The averages were interpreted as expected values, i.e., $\hat E(\hat\tau) = \frac{1}{50\,000}\sum_{l=1}^{50\,000}\hat\tau_l$, where $\hat\tau_l$ is the estimate from the l-th replication.
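A scaled-down sketch of the simulation loop, under stated assumptions: a hypothetical 20-unit population instead of the 200-unit ducks data, no adaptive step, 10,000 replications instead of 50,000, the Theorem 2 estimator as reconstructed in its piecewise form, and the coefficient of variation taken as the standard deviation of the estimates divided by their average (the definition used in the paper is not stated in this section). It shows how the averaged estimate and the CV are obtained for unequal and equal selection probabilities; the printed numbers are illustrative only and are not the paper's results.

```python
import random
import statistics


def general_inverse_sample(z, in_c, m, n0, n2, rng):
    units = range(len(z))
    draws = rng.choices(units, weights=z, k=n0)
    n_c = sum(in_c[u] for u in draws)
    while n_c < m and len(draws) < n2:
        u = rng.choices(units, weights=z, k=1)[0]
        draws.append(u)
        n_c += in_c[u]
    return draws


def tau_hat(draws, y, z, in_c, m, n0):
    """Theorem 2 form (as reconstructed): Hansen-Hurwitz or inverse sampling branch."""
    n1 = len(draws)
    ratios = [y[u] / z[u] for u in draws]
    flags = [in_c[u] for u in draws]
    if n1 == n0 or sum(flags) < m:
        return sum(ratios) / n1
    part_c = sum(q for q, f in zip(ratios, flags) if f)
    part_cbar = sum(q for q, f in zip(ratios, flags) if not f)
    return (m - 1) / (m * (n1 - 1)) * part_c + part_cbar / (n1 - 1)


def summarize(y, z, m, n0, n2, reps, seed):
    """Average estimate (read as the expected value) and CV = sd / average (assumed definition)."""
    rng = random.Random(seed)
    in_c = [int(v > 0) for v in y]
    est = [tau_hat(general_inverse_sample(z, in_c, m, n0, n2, rng), y, z, in_c, m, n0)
           for _ in range(reps)]
    mean = statistics.fmean(est)
    return mean, statistics.pstdev(est) / mean


y = [0] * 14 + [3, 5, 8, 2, 6, 4]              # hypothetical population, tau = 28
z_unequal = [(v + 1) / 48 for v in y]          # hypothetical: proportional to y + 1
z_equal = [1 / len(y)] * len(y)
results = {name: summarize(y, z, m=2, n0=3, n2=8, reps=10000, seed=7)
           for name, z in (("unequal", z_unequal), ("equal", z_equal))}
print(results)
```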