Efficiency of Second Occasion over First Occasion on Successive Sampling for Regression Estimation

Received: December 14, 2010 Accepted: January 21, 2011 doi:10.5539/jmr.v3n2p235
Abstract
In this study, successive sampling for regression estimation was used to determine the current estimate of the mean, the minimum variance, the maximum precision, the estimate of change between the two successive occasions under consideration and the estimate of the average over the period of the two occasions. The data used were based on the number of persons per block in 50 randomly selected blocks out of 500 blocks in Uli, Ihiala Local Government Area, Anambra State, Nigeria. The data were obtained from the National Bureau of Statistics, Nigeria. The estimate of change and the estimate of average over time give their optimum variances when


Introduction
Successive sampling is used to survey a population repeatedly over time. A first sample is taken on the first occasion and a second sample is then taken on the second occasion. The scheme provides an opportunity to use the information obtained in the first sample to improve the precision of future estimates. From another perspective, successive sampling refers to the process of eliminating some of the old elements from the sample and adding new elements each time a new sample is drawn.
It has been extensively used in the applied sciences and in sociological and economic research to provide more efficient estimates of population characteristics, and as a means of measuring time trend as well as the current characteristics of a time series. There are several procedures for estimating the population parameters in successive sampling: the same sample may be used on each occasion, a new sample may be taken on each occasion, or a part of the sample may be retained while the remainder is drawn afresh.
The conditions to be considered in successive sampling are as follows. For estimating the change from one occasion to the next, it may be best to retain the same sample on each occasion. If the average over the occasions is the target, a new (entirely independent) sample is drawn on each occasion. If it is desired to estimate the mean on each occasion and also the change from one occasion to the next, it may be best to retain part of the sample and draw the remainder afresh.
In addition, it should be noted that a high positive correlation always exists between observations made on the same unit on two successive occasions.
On the first occasion, a sample of n units is selected from the N units in the population by simple random sampling without replacement (SRSWOR).
On the second occasion, a sub-sample of λn units (0 ≤ λ ≤ 1) is selected from the first-occasion sample by SRSWOR. This is supplemented by a fresh sample of μn units (μ + λ = 1), selected by SRSWOR from the N units or from the (N − n) units not in the first sample. Information on the characters X and Y is obtained on all the sample units on both occasions (Okafor, 2005).
The theory of successive sampling appears to have started with the work of Jessen (1942). He utilized the entire information collected on the previous occasions and obtained two estimates: one was the sample mean based on the new sample only, and the other was a regression estimate based on the sample units observed on both occasions. He then combined the two estimates.
Yates (1949) extended Jessen's scheme to the situation where the population mean of a variable is estimated on each of h ≥ 2 occasions from a rotational sample design. The results were generalised by Patterson (1950) and Tikkiwal (1951). Patterson assumed a specific form for the correlation between observations on the same units on occasion h and occasion h + k.
He arrived at this using partial replacement of units from occasion to occasion. A set of conditions was established in the work whereby a proposed estimate may be tested for efficiency. It was shown that in the case in which the correlation coefficient ρ is available and the number of units replicated is the same for all occasions, efficiency estimates are not easily calculated for the last but one occasion. Patterson (1950) aimed at providing the optimum estimate by combining (i) a double-sample regression estimate from the matched portion of the values of y on the first occasion and (ii) a sample mean based on a random sample from the unmatched portion on the second occasion. The theory was generalised to provide the optimum estimate using p auxiliary variables (p > 1).
Eckler (1955) introduced the idea of rotation sampling and obtained the minimum variance unbiased estimator (MVUE) of the population mean assuming an infinite population. Tikkiwal (1955) developed the estimation of the mean of several characters in multipurpose sampling on successive occasions. He assumed a specific form for the pattern of correlation between the ith and the jth occasions.
Rao (1957) considered the estimate of the population ratio based on sampling on two occasions. He used a ratio-type estimator of the form $\hat{R} = \dfrac{r_2}{r_1} R_1$, where $r_1$ and $r_2$ are the sample ratios of the two characters on the first and second occasions respectively, and $R_1$ is the known population ratio on the first occasion.
Kulldorff (1963) modified Jessen's sampling scheme by selecting the unmatched sample from the units not selected on the first occasion. He considered in detail the optimum choice of the matching fraction under the most general form of cost function apart from fixed costs. Rao and Graham (1964) developed a unified finite population theory for composite estimators and employed a fixed rotation design.
Raj (1965) pioneered the use of varying probability with replacement for sampling over two successive occasions by using probability proportional to size with replacement (PPSWR) and simple random sampling without replacement (SRSWOR). His estimator of the population total on the second occasion was based on a linear combination of two independent estimates of the population total from the matched and unmatched samples. Raj's estimator was modified by Pathak and Rao (1967), and Ravindra Singh (1972, 1980) modified it further. Tikkiwal (1967) again dealt with the study of k characters in a multipurpose survey for finite and infinite populations on each of h > 2 occasions under a specific correlation pattern.
Singh, D. (1968) was the first to extend the theory from one-stage sampling to a two-stage sampling scheme over two occasions. His sampling scheme is as follows. On the first occasion, a sample of n first-stage units (F.S.U.) is selected by SRSWOR from a population of N F.S.U. Each of the selected F.S.U. is then sub-sampled by SRSWOR.
On the second occasion, a simple random sample of np (0 < p < 1) of these F.S.U. is retained along with their second-stage units (S.S.U.) selected on the first occasion. A fresh sample of nq first-stage units (p + q = 1) is selected from the entire population, again by SRSWOR, to obtain the required number of second-stage units.
Singh, D. (1968) obtained a minimum variance linear unbiased estimator for the population mean assuming an infinite population. Abraham, Khosla and Kathuria (1969) considered the sampling scheme where partial matching of units is carried out at both stages. Their result was applied to a survey on the incidence of pests and diseases. Ghangurde and Rao (1969) extended Raj's strategy by introducing a varying probability without replacement scheme of sampling for the selection of the initial sample and the unmatched sample $S_2$. Sen (1971) developed the theory of sampling over two occasions using two auxiliary variables with unknown population means. Sen (1972) generalised Jessen's work by using a double sampling multivariate ratio estimate, with p auxiliary variables (p > 1), from the matched portion of the sample.
Expressions for the optimum matching fraction, the combined estimate and its error were also derived. He also considered the case where the mean of the matched sample on the second occasion is adjusted by the multivariate ratio estimate for equal sample sizes.
Kathuria and Singh (1971) examined empirically the relative efficiencies of three sampling schemes for sampling on successive occasions using a two-stage design. Shivtar and Srivastava (1973) improved on Singh's (1968) estimators by using several auxiliary variables in estimating the population mean on the most recent occasion, the difference in means between any two successive occasions and the overall population mean for all the occasions. Srivastava and Shivtar (1974) used a more general two-stage sampling scheme with partial retention of both primary and secondary units, in line with Abraham et al. (1969), using an auxiliary variable.
Tripathi and Sinha (1976) considered estimation of the population ratio over two occasions by taking a linear combination of the estimates of the population ratio based on the matched and unmatched samples. Arnab (1976) also generalised the strategies of Raj (1965) and Ghangurde and Rao (1969) to h > 2 occasions. Tewari (1981) estimated the domain mean using a successive sampling scheme. Lokesh Arora and Singh (1981) extended the theory of successive sampling from the estimation of the population mean to the estimation of the frequency distribution of the population. Arnab (1982) showed that the strategies due to Raj (1979) can be improved upon by selecting the current unmatched sample from the complement of the matched sub-sample for the current occasion.
Das (1982) extended Tripathi and Sinha's work using a single auxiliary variable for the estimation of the population ratio over two occasions. Chaturvedi and Tripathi (1983) extended the theory further using more than one auxiliary variable. Okafor (1987) compared some estimators of the population total in two-stage successive sampling using an auxiliary variable. Okafor (1988) also obtained ratio-to-size estimators of the mean per subunit using two-stage sampling over two occasions.
The estimation of the population ratio in two-stage sampling over two occasions was considered by Okafor and Arnab (1987). Iuchan and Jones (1987) looked into rotation sampling patterns, while Steel and Maclauren (2000) examined the effect of these rotations on trend estimates. Artes, Rueda and Arcos (1998), in examining the relationship between the auxiliary variable X and the study variable Y, proved that when X and Y are negatively correlated, the optimum estimate which combines a double-sample product estimate for the matched part of the sample and a simple sample mean for the unmatched portion has less variance than the usual estimator, provided that a stated condition holds. Artes and Garcia (2001) worked on estimating the current mean in successive sampling using a product estimate. Garcia Luengo (2004) considered the problem of estimating a finite population mean for the current occasion, based on samples selected over two occasions, for the case when several auxiliary variables are correlated with the main variable.
A double sampling multivariate product estimate from the matched portion of the sample was presented. Expressions for the optimum estimator and its error were derived.
The gain in efficiency of the combined estimate over the direct estimate using no information gathered on the first occasion was computed. Artes et al. (2005) also worked on successive sampling using a product estimate, but they considered the case when the auxiliary variables are negatively correlated, and a double sampling product estimate from the matched portion of the sample was presented.
Expressions for the optimum estimator and its variance were derived. Rueda et al. (2006) discussed estimating quantiles under sampling on two occasions with arbitrary sample designs. Rueda et al. (2007) further extended the work on successive sampling to estimating quantiles with p auxiliary variables.
They mainly discussed the estimation of quantiles for the current occasion based on sampling on two occasions and using p auxiliary variables obtained from the previous occasion. A multivariate ratio estimate from the matched portion was used to provide the optimum estimate of a quantile, with the estimates weighted inversely to derive the optimum weights. Housila et al. (2007) looked into the problem of estimating a finite population quantile in successive sampling on two occasions. They aimed at providing the optimum estimates by combining:
(i) three double sampling estimators, viz. ratio-type, product-type and regression-type estimators, from the matched portion of the sample, and
(ii) a sample quantile based on a random sample from the unmatched portion of the sample on the second occasion.
A simulation study was carried out to compare the three estimators, and it was found that the regression-type estimator performs best among the estimators discussed. Manish and Shukla (2008) looked into efficient estimation in successive sampling using post-stratification. They stated that it is often seen that a population having a large number of elements remains unchanged over several occasions while the values of the units change.
In their work, they introduced an estimator under successive surveys. The estimator is unbiased and more efficient than post-stratification estimators; the minimum variance of the optimum estimator was derived and a comparative study was incorporated. Housila et al. (2010) looked into the estimation of population variance in successive sampling, proposed a class of estimators of the finite population variance in successive sampling on two occasions and analysed its properties.
This class of estimators was motivated by Isaki (1983), who considered the problem of estimating the finite population variance in survey sampling. A general class of estimators for the finite population variance $\sigma_y^2$, based on the matched portion of the sample consisting of m units, is defined as $S_t^2 = t(S_{ym}^2, v_1, v_2)$. The bias and variance of the estimator $S_t^2$ exist since the number of possible samples is finite and the function is assumed to be bounded. The bias of the estimator $S_t^2$ is of the order of $m^{-1}$, and hence its contribution to the mean square error is of the order of $m^{-2}$. The newly proposed estimator $S^2$, which is more efficient than the usual unbiased estimator $S_{y2}^2$, was obtained by combining a biased estimator with an unbiased estimator based on the unmatched units.
The intention of this paper is therefore to find out whether the second occasion has a greater efficiency than the first occasion.

Data Used
The data employed for this study are secondary data from the record of the number of persons per block in 50 randomly selected blocks out of 500 blocks in Uli, Ihiala Local Government Area, Anambra State, Nigeria. The data were obtained from the National Bureau of Statistics, Nigeria.

Methodology
The linear regression estimate is designed to increase precision by the use of an auxiliary variable X which is correlated with Y. When the relation between Y and X is examined, it may be found that although the relation is approximately linear, the line does not go through the origin. This suggests an estimate based on the linear regression of Y on X rather than on the ratio of the variables. We suppose that Y and X are each obtained for every unit of the sample and that the population mean $\bar{X}$ of X $(x_1, x_2, \ldots, x_N)$ is known. The linear regression estimate of $\bar{Y}$ is
$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}),$$
where the subscript lr denotes linear regression and b is an estimate of the change in Y per unit change in X. The rationale of this estimate is that if $\bar{x}$ is below average, we should expect $\bar{y}$ also to be below average by the amount $b(\bar{X} - \bar{x})$ because of the regression of Y on X.
A population sampled over two occasions is considered for making current estimates of the population characteristics. On the first occasion, a sample of n units is selected from the N units in the population by SRSWOR. On the second occasion, a sub-sample of λn units (0 ≤ λ ≤ 1) is selected from the first-occasion sample by SRSWOR.
This is supplemented by a fresh sample of μn units (μ + λ = 1), selected by SRSWOR from the N units or from the (N − n) units not in the first sample. Information on the characters X and Y is obtained on all the sample units on both occasions.
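The selection scheme and the linear regression estimate described in this section can be illustrated with a minimal Python sketch. The function names, the least-squares choice of b, and the option of drawing the fresh sample from the (N − n) units outside the first sample are assumptions made for illustration; they are not taken from the paper.

```python
import random
from statistics import mean


def regression_estimate(y, x, pop_mean_x):
    """Linear regression estimate of the population mean of Y.

    b is taken as the least-squares slope of y on x (an assumption;
    the text only says b estimates the change in Y per unit of X).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar + b * (pop_mean_x - x_bar)


def two_occasion_sample(N, n, lam, rng):
    """Draw unit labels for the two occasions by SRSWOR.

    Returns (first, matched, fresh): the first-occasion sample, the
    lam*n units retained on the second occasion, and the fresh units
    drawn from the N - n units outside the first sample.
    """
    first = rng.sample(range(N), n)
    m = round(lam * n)                      # matched sample size
    matched = rng.sample(first, m)
    first_set = set(first)
    remaining = [i for i in range(N) if i not in first_set]
    fresh = rng.sample(remaining, n - m)    # unmatched sample size mu*n
    return first, matched, fresh


# Illustrative call with the sizes used in this study (N = 500, n = 50, lam = 0.5).
first, matched, fresh = two_occasion_sample(500, 50, 0.5, random.Random(1))
```

With λ = 0.5 the sketch returns 25 matched and 25 fresh unit labels, which mirrors the split of matched and unmatched blocks used in the analysis.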

Results
From the data used, a random sample of fifty blocks was selected from a population of 500 blocks on each occasion; these comprise 25 matched blocks and 25 unmatched blocks, that is, n = 50. The minimum variance unbiased estimator is used for the estimation because of the assumption of equal variability on both occasions. We used the variance obtained by pooling the variances from the matched samples on the first and second occasions as an estimate of the population variance.
The following quantities were then computed in turn: the pooled variance; the variance for the matched portion; the estimate of the unmatched fraction; the estimate of the unmatched portion; the optimum (minimum) variance; the mean at the second occasion; the optimum variance of the mean at the second occasion; the mean at the first occasion; the variance of the sample mean that would have been obtained if a single sample of 50 blocks had been used at the second occasion; the gain in efficiency, G, of using sampling on two successive occasions; and the estimate of the average over time.
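The expressions behind these quantities are not legible in the text, so the following Python sketch uses the standard textbook formulas for sampling on two occasions with equal variance S² on both occasions. These formulas, the function names and the definition of the gain as a relative gain over a single sample of n units are assumptions, not the authors' stated expressions.

```python
import math


def var_current_estimate(s2, n, mu, rho):
    """Variance of the combined current estimate with u = mu*n unmatched units:
    V = S^2 (n - u rho^2) / (n^2 - u^2 rho^2).
    Undefined (0/0) when mu = 1 and rho = 1.
    """
    u = mu * n
    return s2 * (n - u * rho ** 2) / (n ** 2 - u ** 2 * rho ** 2)


def optimum_unmatched_fraction(rho):
    """Unmatched fraction mu that minimises the variance above."""
    return 1.0 / (1.0 + math.sqrt(1.0 - rho ** 2))


def optimum_variance(s2, n, rho):
    """Minimum variance attained at the optimum unmatched fraction."""
    return s2 * (1.0 + math.sqrt(1.0 - rho ** 2)) / (2.0 * n)


def gain_in_efficiency(s2, n, mu, rho):
    """Relative gain over a single sample of n units, whose variance is S^2 / n."""
    return (s2 / n) / var_current_estimate(s2, n, mu, rho) - 1.0


# Illustrative values only (not the study data): S^2 = 100, n = 50, rho = 0.8.
mu_opt = optimum_unmatched_fraction(0.8)
v_opt = optimum_variance(100.0, 50, 0.8)
gain = gain_in_efficiency(100.0, 50, mu_opt, 0.8)
```

Under these assumed formulas the optimum variance falls as ρ rises and reaches S²/(2n) at ρ = 1, which is in line with the observation that the minimum variance is achieved when ρ = 1.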
From Table 1, minimum variance as well as maximum precision is achieved when ρ = 1. This implies that there is a perfect positive relationship between the first and the second sampling occasions. From Tables 2 and 3, the variance estimates of change give the same values at ρ = 0 and at μ = 1. Therefore, to enhance precision, μ should be considerably small and ρ considerably high. It should be noted that for a constant value of ρ and changing values of μ, the variance estimate of change changes without a particular pattern. But when μ is kept constant and ρ is varied, the variance estimate decreases with every increase in ρ. This implies that the variance estimate is dependent on the value of ρ.
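The behaviour of the variance of the estimate of change can be explored with a short Python sketch. It assumes the standard textbook expression V(Δ) = 2S²(1 − ρ) / (n(1 − μρ)) for equal variances on both occasions; this formula and the function name are assumptions, since the expression itself does not appear in the text.

```python
def var_change(s2, n, mu, rho):
    """Assumed variance of the estimate of change between two occasions:
    V = 2 S^2 (1 - rho) / (n (1 - mu rho)).
    Undefined (0/0) when mu = 1 and rho = 1.
    """
    return 2.0 * s2 * (1.0 - rho) / (n * (1.0 - mu * rho))


# Illustrative values only (not the study data): S^2 = 100, n = 50.
at_rho_zero = var_change(100.0, 50, 0.5, 0.0)   # rho = 0, any mu
at_mu_one = var_change(100.0, 50, 1.0, 0.6)     # mu = 1, any rho < 1
by_rho = [var_change(100.0, 50, 0.5, r) for r in (0.0, 0.2, 0.4, 0.6, 0.8)]
```

Under this assumed formula the variance equals 2S²/n both when ρ = 0 and when μ = 1, and for a fixed μ below 1 it decreases as ρ increases, which is consistent with the pattern described for a constant μ.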
From Tables 4 and 5, the variance estimates of the sum show that the same estimate was obtained when ρ = 0 and when μ = 1; this estimate is the minimum value when compared with the others at different values of ρ. To enhance precision, therefore, independent samples should be taken on each occasion.
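A similar Python sketch can be written for the variance of the estimated sum over the two occasions. It assumes the standard textbook expression V(Σ) = 2S²(1 + ρ) / (n(1 + μρ)); the formula, the function name and the choice of inputs below are assumptions. The inputs take μ = 0.5 (25 of the 50 blocks unmatched) and set 2S²/n to 5.0235, the first value listed in the V(Σ) column reported in this paper.

```python
def var_sum(s2, n, mu, rho):
    """Assumed variance of the estimated sum over two occasions:
    V = 2 S^2 (1 + rho) / (n (1 + mu rho)).
    The variance of the average over time would be V / 4.
    """
    return 2.0 * s2 * (1.0 + rho) / (n * (1.0 + mu * rho))


n = 50
s2 = 5.0235 * n / 2.0   # chosen so that 2 S^2 / n = 5.0235 (assumption)
mu = 0.5                # 25 unmatched blocks out of 50

# Values of V(sum) for rho = 0.0, 0.2, ..., 1.0 under the assumed formula.
column = [var_sum(s2, n, mu, r) for r in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)]
```

The values produced this way agree with the reported V(Σ) column (5.0235 up to 6.6979) to about three decimal places, which suggests, but does not establish, that the column is indexed by ρ at μ = 0.5. Under the same formula the variance is smallest, at 2S²/n, when ρ = 0 or μ = 1, matching the recommendation of independent samples for the sum and the average.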

Conclusion
Based on the data analysed in this study, we have largely achieved the aim of the work. The current estimate of the mean was found to be 25.3537 with a variance of 1.7023 and a standard error of 1.3047. The minimum variance as well as maximum precision was achieved in Table 1 when ρ = 1, because at this point the value was 1.2558, which is the minimum among the values for the different ρ.
The estimate of change was found to be 2.4296, with a variance of 0.1042 and a standard error of 0.3228. Furthermore, the estimate of the average over time was found to be 24.1401, with a variance of 1.6709 and a standard error of 1.2963. The estimate of change and the estimate of average over time both give their optimum variances when ρ = 0 and μ = 1 under varying values of ρ and μ in Tables 2 to 5.

ρ      V(Σ)
0.0    5.0235
0.2    5.4801
0.4    5.8607
0.6    6.1827
0.8    6.4587
1.0    6.6979

Table 1. Variance of current estimate with different values of ρ

Table 2. Estimate of change with different values of μ

Table 3. Estimate of change with different values of ρ

Table 4. Average over time with varying μ

Table 5. Average over time with varying ρ