Bootstrap Confidence Intervals for the Estimation of Average Treatment Effect on Propensity Score

Funded by college st. innovative projects
Received: January 21, 2011 Accepted: February 10, 2011 doi:10.5539/jmr.v3n3p52

Abstract
Causal inference on the average treatment effect in observational studies is always difficult, because the distributions of the samples in the two treatment groups cannot be observed at the same time and the estimate of the treatment effect is often biased. In this paper, the propensity score and propensity score subclassification, selected from several methods, are used to assess the treatment effect. Bootstrap confidence intervals are then given for the estimated average treatment effect. Simulation studies are conducted for continuous samples from a normal distribution and for mixed samples of discrete and continuous type.


Introduction
Causal inference on the average treatment effect in observational studies is difficult because the effect can be confounded with covariates whose distributions differ systematically between the two treatment groups, so a direct estimate of the treatment effect is often biased. The propensity score method has been shown to be an effective way to reduce this bias in the point estimate of the average treatment effect. However, inference procedures for this average treatment effect have not been well developed. A commonly used approach is to stratify the data on the estimated propensity scores and carry out the desired inference as in a stratified random sample.
However, the validity of such procedures is questionable. Because the subclassification is based on propensity scores estimated from a common logistic model, the responses within each subclass and between subclasses are unlikely to be independent. Moreover, estimating the unknown propensity scores introduces an additional source of variation, which affects the variance estimate used in the inference.
In this paper we introduce a Bootstrap confidence interval that takes into account the dependence structure of the propensity score stratified data, as well as the extra variation arising from the propensity score estimation, under the assumption that the measured covariates can be balanced within all subclasses based on the estimated propensity scores. Unlike current methods, this procedure does not require a variance estimate for the purpose of inference, nor does it assume any specific distribution for the pivotal statistic used in traditional confidence interval construction.

Proposition of the problem
Let P be a population from which we have a random sample of N units. For each unit i in the sample, $i = 1, \dots, N$, let $Z_i$ be a binary treatment assignment variable, so that unit i receives the control treatment if $Z_i = 0$ and the active treatment if $Z_i = 1$. For example, we may toss a coin and let unit i receive the treatment if a head appears. The coin may or may not be biased. For an unbiased coin we have $P(Z_i = 1) = P(Z_i = 0) = 1/2$. This is the case, for instance, when we randomly divide the N units into two groups, one for treatment and the other for control. For many problems in medicine and economics involving observational or evaluation studies, this randomization assumption is usually not realistic. In these cases the assignment probabilities typically vary across individuals, and it is useful to think of them as depending on some extraneous individual characteristics.
In this paper we consider the simple case where each unit i has a scalar outcome that depends on the assignment variable $Z_i$. Let $Y_i(1)$ denote the outcome if $Z_i = 1$, that is, if unit i is under treatment, and let $Y_i(0)$ denote the outcome if $Z_i = 0$, that is, if unit i is under control. For each unit i, the following quantities are assumed to be observable:
(1) a covariate vector $X_i$;
(2) an assignment variable $Z_i$, which is correlated with $X_i$;
(3) the outcome variable $Y_i = Z_i Y_i(1) + (1 - Z_i) Y_i(0)$.
The quantity $Y_i(1) - Y_i(0)$ is known as the causal effect of the treatment for unit i. The primary focus of this paper is the construction of good confidence intervals for θ, the average treatment effect over the population, defined by $\theta = E[Y_i(1) - Y_i(0)]$. We can also write $\theta = E[g(X_i)]$, using the conditional mean treatment effect $g(X_i) = E[Y_i(1) - Y_i(0) \mid X_i]$. In practice there is a difficulty: estimating the average treatment effect requires both $Y_i(1)$ and $Y_i(0)$ at the same time, but only one of them is observed for each unit. Throughout this paper we make the following important assumption on conditional independence (Rosenbaum and Rubin, 1983) to resolve this difficulty.
ASSUMPTION 1.1 The treatment assignment is said to be strongly ignorable if $Y_i(1)$ and $Y_i(0)$ are independent of $Z_i$ conditional on $X_i$.
Using this assumption we may express $g(X_i)$ as $g(X_i) = E[Y_i(1) \mid Z_i = 1, X_i] - E[Y_i(0) \mid Z_i = 0, X_i]$. This last expression suggests that $g(X_i)$ depends on the dimension of the covariate $X_i$. Serious practical problems occur when the dimension of $X_i$ is large, which is usually the case in applications involving comparisons of treatment effects. To reduce the dimension of $X_i$, we use the idea of the propensity score proposed by Rosenbaum and Rubin (1983).
DEFINITION 1.1 (PROPENSITY SCORE) The propensity score is the probability of receiving the treatment conditional on the covariates, that is, $e(X_i) = P(Z_i = 1 \mid X_i)$.
By Assumption 1.1 and Definition 1.1, $Y_i(1)$ and $Y_i(0)$ are independent of $Z_i$ conditional on $e(X_i)$, and hence $\theta = E\{E[Y_i(1) \mid Z_i = 1, e(X_i)] - E[Y_i(0) \mid Z_i = 0, e(X_i)]\}$. If the propensity score is known, unbiased estimates of $E[Y_i(1) \mid Z_i = 1, e(X_i)]$ and $E[Y_i(0) \mid Z_i = 0, e(X_i)]$ can be obtained directly, which naturally yields an estimate of the treatment effect overall, as well as for the treatment group. We will apply the Bootstrap method to construct confidence intervals for θ.
This paper is organized as follows. In Section 2 we briefly introduce the maximum likelihood method for estimating the propensity score. In Section 3 we describe two methods for estimating the average treatment effect using the propensity score. In Section 4 we present several methods for constructing Bootstrap confidence intervals. In Section 5 we illustrate the confidence intervals for the average treatment effect with simulations of the different methods, and briefly discuss the potential utility of the method in practice.

Maximum likelihood estimation of propensity score
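Since the propensity score e(X) is rarely known, we usually estimate it via a logistic model,
$$e(X) = P(Z = 1 \mid X) = \frac{\exp(\beta_0 + \beta_1 X_1 + \dots + \beta_q X_q)}{1 + \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_q X_q)}.$$
As (Z, X) follows the logistic model, we draw a random sample $(Z_i, X_i)$, $i = 1, \dots, n_1 + n_2$, of $n_1 + n_2$ vectors from the population (Z, X). The log-likelihood function is then
$$l(\beta) = \sum_{i=1}^{n_1 + n_2} \left[ Z_i \eta_i - \log(1 + e^{\eta_i}) \right], \qquad \eta_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_q X_{iq}.$$
We differentiate $l(\beta)$ with respect to each of the parameters $\beta_0, \beta_1, \dots, \beta_q$ and use the Newton-Raphson iterative method to find the maximum of $l(\beta)$. This gives the maximum likelihood estimate $\hat\beta_{mle}$ of β, and the propensity score e(X) is then estimated as
$$\hat e(X) = \frac{\exp(\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_q X_q)}{1 + \exp(\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_q X_q)}.$$

Different methods in estimating average treatment effect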
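Both methods in this section start from estimated propensity scores. The Newton-Raphson fit described in the preceding section can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names (`solve_linear`, `sigmoid`, `fit_logistic`, `propensity`), the iteration cap and the tolerance are assumptions made for the example.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def sigmoid(t):
    """Logistic function exp(t) / (1 + exp(t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, zs, iters=25, tol=1e-10):
    """Maximize the logistic log-likelihood by Newton-Raphson.

    xs: list of covariate vectors, zs: list of 0/1 assignments.
    Returns beta with the intercept beta_0 first.
    """
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # score vector dl/dbeta
        info = [[0.0] * p for _ in range(p)]  # negative Hessian of l(beta)
        for row, z in zip(rows, zs):
            e = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = e * (1.0 - e)
            for j in range(p):
                grad[j] += (z - e) * row[j]
                for k in range(p):
                    info[j][k] += w * row[j] * row[k]
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def propensity(beta, x):
    """Estimated propensity score e_hat(x) under the fitted logistic model."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
```

At the maximum the score equations hold, so the fitted scores sum to the number of treated units; this gives a quick sanity check on the fit. The sketch assumes the data are not perfectly separable, in which case the maximum likelihood estimate does not exist.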

Propensity Score Method
As described in the introduction, once the covariates are known we can use the estimated propensity score directly in the estimate of θ, where $\hat e(X_i)$ is an estimate of the propensity score $e(X_i)$.

Propensity Score Subclassification
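Using the estimated propensity scores, we stratify all subjects into K subclasses so that the estimated propensity scores have similar values within each subclass. Let $J_{im}$ denote the characteristic (indicator) function of the m-th subclass for unit i. One way to do this is to divide the units into M blocks, with the boundaries of the blocks defined on the estimated propensity score. The population average treatment effect is then estimated by combining the within-block differences between the mean responses of the treated and control units.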
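The subclassification estimate can be sketched as follows. The sketch makes two assumptions that are not stated in the text: the blocks are equal-frequency blocks of the sorted estimated propensity scores, and each block is weighted by its share of the units. The function name `subclass_ate` and the parameter `k` are hypothetical.

```python
def subclass_ate(y, z, e_hat, k=5):
    """Average treatment effect by propensity score subclassification.

    y: responses, z: 0/1 treatment indicators, e_hat: estimated propensity
    scores. Units are sorted by e_hat and cut into k equal-frequency blocks
    (an assumption of this sketch). The estimate is the block-size-weighted
    average of the within-block differences in mean response.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: e_hat[i])
    total, used = 0.0, 0
    for m in range(k):
        block = order[m * n // k:(m + 1) * n // k]
        treated = [y[i] for i in block if z[i] == 1]
        control = [y[i] for i in block if z[i] == 0]
        if not treated or not control:
            # No within-block comparison is possible; this sketch skips the block.
            continue
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += len(block) * diff
        used += len(block)
    if used == 0:
        raise ValueError("no block contains both treated and control units")
    return total / used
```

For example, with two blocks whose within-block differences are 2 and 3 and which contain four units each, the estimate is 2.5. Skipping blocks that lack one of the two groups is a convenience of the sketch; in practice such blocks usually signal poor overlap and deserve a closer look.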

Bootstrap method in estimating confidence intervals
The Bootstrap provides a good approximation for constructing confidence intervals. We introduce two common methods of Bootstrap interval estimation.

Percentile Method
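The percentile method is also called the Bootstrap-p method. Let $\theta = \theta(F)$ and $\hat\theta = \theta(F_n)$, where $\hat\theta$ is the estimate of θ. We want to find the $1 - 2\alpha$ confidence interval of θ.
Let G denote the distribution function of the Bootstrap estimate $\hat\theta^*$. If G is known, the confidence interval follows directly from the formula $[G^{-1}(\alpha), G^{-1}(1 - \alpha)]$. If G is unknown, the empirical distribution function $G_n$ of G can be substituted. The algorithm is as follows:
1. Compute $\hat\theta = \theta(F_n)$ from the original sample.
2. Draw a Bootstrap sample of size n with replacement from the original sample.
3. Compute the Bootstrap estimate $\hat\theta^*$ from the Bootstrap sample.
4. Repeat B times as in steps 2 and 3, and obtain $\hat\theta^*_1, \dots, \hat\theta^*_B$.
5. Take the α and $1 - \alpha$ quantiles of the empirical distribution $G_n$ of $\hat\theta^*_1, \dots, \hat\theta^*_B$ as the confidence limits.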
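The percentile method can be sketched in a few lines. This is an illustrative sketch for a one-sample statistic, using only the standard library; the function name `percentile_ci`, its parameters, and the particular order statistics used as limits are assumptions, since several index conventions are in use.

```python
import random


def percentile_ci(sample, stat, alpha=0.025, n_boot=2000, seed=0):
    """Bootstrap percentile interval with nominal coverage 1 - 2*alpha.

    sample: list of observations, stat: function mapping a list to a number.
    """
    rng = random.Random(seed)
    n = len(sample)
    # Re-sample with replacement and recompute the statistic n_boot times.
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    # Symmetric order-statistic indices approximating the alpha and
    # 1 - alpha quantiles of the empirical Bootstrap distribution.
    lo_idx = min(max(int(alpha * n_boot), 0), n_boot - 1)
    hi_idx = n_boot - 1 - lo_idx
    return reps[lo_idx], reps[hi_idx]
```

A call such as `percentile_ci(data, lambda s: sum(s) / len(s))` returns the limits of a nominal 95% interval for the mean of `data`.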

BCa Method
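The BCa method, short for the bias-corrected and accelerated method, has a more complicated definition than the percentile method.
1. Generate the Bootstrap replicates $\hat\theta^*_1, \dots, \hat\theta^*_B$ as described in the previous section.
2. The bias-correction constant $z_0$ is computed as $z_0 = \Phi^{-1}\left(\#\{\hat\theta^*_b < \hat\theta\} / B\right)$, where $\#\{\hat\theta^*_b < \hat\theta\}$ is the number of Bootstrap replicates smaller than the original estimate.
3. Compute the acceleration parameter a. There are various ways to compute it; the easiest to explain is given in terms of the jackknife values. Let $G_{(i)}$ denote the sample with the i-th observation removed from the original sample G, and let $\hat\theta_{(i)}$ be the estimate computed from $G_{(i)}$. Then
$$a = \frac{\sum_{i=1}^{n} (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^3}{6 \left[ \sum_{i=1}^{n} (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^2 \right]^{3/2}},$$
where $\hat\theta_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat\theta_{(i)}$.
4. The resulting $1 - 2\alpha$ confidence interval is defined as $(\hat\theta^{*(\alpha_1)}, \hat\theta^{*(\alpha_2)})$, where
$$\alpha_1 = \Phi\left(z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a (z_0 + z^{(\alpha)})}\right), \qquad \alpha_2 = \Phi\left(z_0 + \frac{z_0 + z^{(1-\alpha)}}{1 - a (z_0 + z^{(1-\alpha)})}\right).$$
Here $\Phi(\cdot)$ is the standard normal cumulative distribution function and $z^{(\alpha)}$ is the 100α-th percentile point of a standard normal distribution.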
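We calculate the $100(1 - 2\alpha)\%$ BCa confidence interval of the average treatment effect following Hall and Martin. The bias-correction constant is computed as $z_0 = \Phi^{-1}\left(\#\{\hat\theta_b < \hat\theta\} / B\right)$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. The two-sample acceleration parameter a is computed from $\hat\sigma^2 = \hat\sigma^2_{t,jack} n_t^{-1} + \hat\sigma^2_{c,jack} n_c^{-1}$ and from the sample skewnesses $r_t$ and $r_c$ of the respective groups. Here the jackknife variance estimates $\hat\sigma^2_{t,jack}$ and $\hat\sigma^2_{c,jack}$ from the original sample are used to estimate the unknown variances of the two treatment groups, according to the method of Efron and Tibshirani (1993).
Sorting the Bootstrap treatment effect estimates into increasing order, $\hat\theta_{(1)} \le \dots \le \hat\theta_{(B)}$, the BCa interval is $(\hat\theta_{([B\alpha_1])}, \hat\theta_{([B\alpha_2])})$, where $\alpha_1$ and $\alpha_2$ are the adjusted percentile levels of the BCa method and [x] is the largest integer less than or equal to x.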
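The one-sample BCa construction, with the jackknife form of the acceleration parameter, can be sketched as follows. This is not the two-sample variant of Hall and Martin used for the treatment effect; it only illustrates how $z_0$, a and the adjusted levels fit together. The function name `bca_ci` and the clipping of the proportion used for $z_0$ are assumptions of the sketch.

```python
import random
from statistics import NormalDist


def bca_ci(sample, stat, alpha=0.025, n_boot=2000, seed=0):
    """One-sample BCa interval with nominal coverage 1 - 2*alpha."""
    sample = list(sample)
    rng = random.Random(seed)
    n = len(sample)
    norm = NormalDist()
    theta_hat = stat(sample)
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))

    # Bias correction: proportion of replicates below the original estimate.
    # Clipped away from 0 and 1 (an assumption) so that inv_cdf is defined.
    prop = sum(r < theta_hat for r in reps) / n_boot
    prop = min(max(prop, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1.0))
    z0 = norm.inv_cdf(prop)

    # Acceleration from the jackknife (leave-one-out) values.
    jack = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    jbar = sum(jack) / n
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(level):
        # Adjusted percentile level for a nominal level.
        z = norm.inv_cdf(level)
        return norm.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    def pick(level):
        # Order statistic with index [B * level], kept inside the list.
        return reps[min(max(int(n_boot * level), 0), n_boot - 1)]

    return pick(adjusted(alpha)), pick(adjusted(1.0 - alpha))
```

When $z_0 = 0$ and $a = 0$ the adjusted levels reduce to α and $1 - \alpha$, so the interval coincides with the percentile interval up to the index convention.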

Simulation and discussion
To better describe the proposed Bootstrap procedure, we re-introduce the notation with the subscript reserved for the subject. Let $n = n_t + n_c$ be the total number of subjects, and let $(Y_i, X_i, Z_i)$ contain the response variable, the covariate vector for the true propensity model, and the treatment assignment for the i-th subject, $i = 1, \dots, n$. The procedure begins by fitting the propensity model to the original sample $(X_i, Z_i)$, $i = 1, \dots, n$. After fitting the model, we estimate the propensity score of each subject.
a) Obtain the average treatment effect using the propensity score: we stratify the subjects into K homogeneous subclasses based on these estimates. The post-stratification balance of each covariate is then examined, and a point estimate $\hat\theta$ of the average treatment effect is obtained from the stratified data.
For each Bootstrap iteration, we re-sample with replacement $n_t$ treated and $n_c$ control subjects separately from the treated and control subjects in the original sample. Let $(Y_i^{(b)}, X_i^{(b)}, Z_i^{(b)})$, $i = 1, \dots, n$, denote the b-th re-sample. Using the re-sampled data $(X_i^{(b)}, Z_i^{(b)})$, we re-fit the same logistic model and re-estimate the propensity score $\hat e_i^{(b)}$ for each re-sampled subject. We then stratify the bootstrapped responses $Y_i^{(b)}$, compute the mean treatment effect, and denote it by $\hat\theta_b$.
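The re-sampling loop of this procedure can be sketched as follows. The sketch only shows the separate re-sampling of treated and control subjects; the estimator is passed in as a function, and to follow the text it should re-fit the logistic model and re-stratify on every call. The names `bootstrap_replicates` and `difference_in_means` are hypothetical, and the latter is a deliberately simple placeholder rather than the estimator of this paper.

```python
import random


def bootstrap_replicates(records, estimate_ate, n_boot=2000, seed=0):
    """Sorted Bootstrap replicates of a treatment effect estimate.

    records: list of (y, x, z) tuples with z in {0, 1}; both groups must be
    non-empty. estimate_ate: function mapping such a list to a number.
    """
    rng = random.Random(seed)
    treated = [r for r in records if r[2] == 1]
    control = [r for r in records if r[2] == 0]
    reps = []
    for _ in range(n_boot):
        # Re-sample n_t treated and n_c control subjects separately,
        # so every re-sample keeps the original group sizes.
        resample = (rng.choices(treated, k=len(treated))
                    + rng.choices(control, k=len(control)))
        reps.append(estimate_ate(resample))
    return sorted(reps)


def difference_in_means(records):
    """Placeholder estimator (not the paper's): unadjusted mean difference."""
    y1 = [y for y, _, z in records if z == 1]
    y0 = [y for y, _, z in records if z == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)
```

The sorted replicates returned by `bootstrap_replicates` are the input to either interval method: the percentile limits are read off directly, and the BCa limits are read off at the adjusted levels.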
We conducted two simulation studies to assess the finite sample performance of the proposed procedure. First, we generate the covariates X. In our simulation we consider a logistic regression propensity model with three covariates:
1) Three continuous covariates $X_1$, $X_2$ and $X_3$. We compare the percentile and BCa confidence intervals obtained with the propensity score method.
2) A continuous covariate $X_1$ and two binary covariates $X_2$ and $X_3$. We estimate the average treatment effect using the percentile method and the propensity score subclassification method, and obtain the confidence interval by each of the two methods.
To simulate covariate distributions that differ systematically between the two treatment groups, we generate the covariate deviates for the treatment and control groups separately. For the control group (Z = 0), we assume that $X_1 \sim N(0, \sigma_1^2)$, $X_2 \sim \mathrm{Bernoulli}(p_{2c})$ and $X_3 \sim \mathrm{Bernoulli}(p_{3c})$; for the treatment group (Z = 1), we assume that $X_1 \sim N(d, \sigma_1^2)$, $X_2 \sim \mathrm{Bernoulli}(p_{2t})$ and $X_3 \sim \mathrm{Bernoulli}(p_{3t})$. By controlling d and the probabilities $p_{2c}$, $p_{2t}$, $p_{3c}$ and $p_{3t}$ of the Bernoulli distributions, we can simulate varying degrees of difference between the covariate distributions.
With the pseudo-random covariate deviates X and the treatment assignment Z, we obtain the responses Y from the linear relationship $Y = Z\delta + X^T\beta + \varepsilon$, for given values of δ and β and independently generated normal errors $\varepsilon \sim N(0, \sigma_\varepsilon^2)$. Here δ represents the true treatment effect. As the covariates X have different distributions in the two treatment groups, the treatment effect δ cannot be estimated directly from the response Y without first adjusting for the effects of X. For simplicity, throughout the simulation we set the coefficients of the covariates to 1) $(\beta_1, \beta_2, \beta_3) = (0.5, 0.4, 0.4)$; 2) $(\beta_1, \beta_2, \beta_3) = (0.5, 0.4, 1.5)$.
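The data generation for the second setting (one continuous and two binary covariates) can be sketched as follows. The function name `simulate_group` and its parameters are hypothetical, and the sketch assumes that the treatment group differs from the control group in $X_1$ only through a mean shift d. The values of d, $\sigma_1$, δ, $\sigma_\varepsilon$ and the Bernoulli probabilities in the usage lines are placeholders, not the entries of Tables 1 and 2.

```python
import random


def simulate_group(n, z, mean1, sigma1, p2, p3, delta, beta, sigma_eps, rng):
    """Generate n subjects (y, x, z) of one treatment group.

    x = (x1, x2, x3) with x1 ~ N(mean1, sigma1^2), x2 ~ Bernoulli(p2),
    x3 ~ Bernoulli(p3), and y = z*delta + x'beta + eps, eps ~ N(0, sigma_eps^2).
    """
    rows = []
    for _ in range(n):
        x = (rng.gauss(mean1, sigma1),
             float(rng.random() < p2),
             float(rng.random() < p3))
        eps = rng.gauss(0.0, sigma_eps)
        y = z * delta + sum(b * v for b, v in zip(beta, x)) + eps
        rows.append((y, x, z))
    return rows


if __name__ == "__main__":
    rng = random.Random(2011)
    beta = (0.5, 0.4, 0.4)  # coefficient setting 1) from the text
    # Placeholder parameter values, chosen only for illustration.
    control = simulate_group(500, 0, 0.0, 1.0, 0.4, 0.4, 1.0, beta, 1.0, rng)
    treated = simulate_group(500, 1, 0.5, 1.0, 0.6, 0.6, 1.0, beta, 1.0, rng)
    print(len(control) + len(treated), "subjects generated")
```

Generating the two groups separately mirrors the description above: the group sizes $n_c$ and $n_t$ are fixed by design, and the degree of confounding is controlled through the mean shift and the Bernoulli probabilities.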
The other parameter values used in the simulation are listed in Tables 1 and 2. It should be noted that several factors determine the extent of the confounding effects of X on Y, including: 1) the values of $\beta_1$, $\beta_2$, $\beta_3$, which directly affect the level of confounding; and 2) the differences between the distributions of X in the two treatment groups, which affect the level of confounding indirectly.
For each parameter structure, we run 1000 simulation iterations. Within each iteration we use 2000 Bootstrap re-samples to construct 95% and 90% confidence intervals. The empirical coverage probabilities under the different parameter configurations are reported to assess the performance of the proposed procedure. To understand how the coverage property changes with sample size, we consider sample sizes $n_c = n_t = 500$ for each of the parameter settings in Tables 1 and 2. The results of simulation 1 are reported in Tables 3-1 and 3-2, and the results of simulation 2 in Tables 4-1 and 4-2. $\bar L$ and $\bar U$ denote the mean lower and upper confidence limits over the 1000 simulations, and $E(\hat\theta)$ is the mean estimate of θ. B-p denotes the Bootstrap-p method and B-S denotes the Bootstrap stratified method.
Propensity score methodology has been applied successfully in many clinical and epidemiological studies since the early work of Rosenbaum and Rubin. It has become a widely used tool for reducing the potential bias in treatment effect estimation in the analysis of observational data. In this paper we have proposed a Bootstrap-based inference procedure for the treatment effect within the framework of the propensity score and propensity score subclassification.
Our study shows that the proposed method provides valid causal inferences in large observational studies. In summary, it has several advantages. First, it does not require a variance estimate. Our experience suggests that an analytical derivation of the variance of the treatment effect estimate is generally difficult, if not impossible. Secondly, it does not rely on any restrictive distributional assumption on the covariates. This is particularly important in practice, because explanatory variables are rarely all normally distributed. Thirdly, the Bootstrap intervals account for the variation that arises from the estimation of the propensity scores, and they accommodate the dependence among the responses both within and between subclasses due to the ordering structure introduced by the subclassification. Finally, the new Bootstrap procedure is relatively easy to implement on most computing platforms.
Our simulation suggests that the empirical coverage of the procedure is reasonable. Although the empirical coverage probabilities are below the nominal levels (95% and 90%), they are closer to the nominal level when the sample sizes are greater than 1000 per group. The simulation results also show that the BCa confidence interval is slightly better than the percentile method in terms of both the coverage probability and the accuracy of the estimated intervals. Propensity score subclassification is better than the percentile method in both respects. The treatment effect is an additive component in a linear model; when the responses are generated from this model, δ only represents a shift in central location between the two treatment groups. The simulation also shows that when all of the covariates are used in the logistic regression model to estimate the unknown propensity scores, the proposed method adjusts quite effectively for the effects of the systematically different covariates.
Although these first simulation results are promising, more extensive simulation studies are clearly needed to establish the operating characteristics of the proposed method in various practical data situations. In particular, the current simulation has several limitations. First, it only considers balanced designs, while few observational data sets have balanced group sizes. For example, the sample sizes $n_c = n_t = 500$ are used, which removes any imbalance between the groups. Second, the range of parameter values used in the current simulation is still limited. For example, only two values of β in the linear relationship $Y = Z\delta + X^T\beta + \varepsilon$ are used to generate the random responses. Since β has a direct effect on the level of confounding between the observed covariates X and the treatment assignment Z, it would be interesting to examine the performance of the proposed method under many different values of β.
A smaller value of β decreases the level of confounding when the other parameters are held constant (in the most extreme case of β = 0, we have $Y = Z\delta + \varepsilon$, indicating no confounding effects from X). In addition, the parameters $p_{2c}$, $p_{2t}$, $p_{3c}$ and $p_{3t}$ control the difference in distributions between the treatment groups for a set of pre-selected β values. The size of d and the difference between $p_{2c}$ and $p_{2t}$ (or between $p_{3c}$ and $p_{3t}$) reflect the separation of the covariate distributions between the two treatment groups. In the current simulation we only consider one value of d and a limited number of binomial probabilities. Given these observations, we feel that further investigation is necessary for a more comprehensive understanding of the new method's operating characteristics. Our current work focuses on a re-sampling based approach for constructing a simple confidence interval for an unknown treatment effect. Several related issues have yet to be explored. Treatment effects estimated by other methods, such as the Stratified Matching Method and the Stratifying Regression Method, also need to be discussed.