Bootstrap Based Confidence Interval Estimation of Quantiles for Current Status Data

In this paper, we propose a bootstrap approach for constructing confidence intervals of quantiles for current status data. The approach is computationally simple and efficient, and it does not require estimating nuisance parameters. The soundness of the proposed method is supported by its good performance in an extensive simulation study. We also analyze a real data set as an illustration.


Introduction
Current status data, also called "case 1" interval-censored data, arise extensively in epidemiological studies, clinical trials, and other areas where the time to occurrence of some event is of interest, but one only knows whether or not the event has occurred by the examination time. For example, Keiding et al. (1996) provided such a data set, which we also analyze to illustrate our approach. The data set concerns 230 Austrian males; the quantity of interest is the time to infection by Rubella, and each subject was tested once during the period 1-25 March 1988 for immunization against Rubella. The exact time to infection cannot be observed, and the only information available for each subject is whether or not immunization had been achieved by the examination time.
In the past three decades, there has been considerable research on the analysis of current status data. In particular, estimation of the failure time distribution and related topics remain active, and many meaningful results have been obtained. Let $X$ be the failure time with distribution function $F$, and $T$ the examination time with distribution function $G$. The observed data consist of $n$ independent and identically distributed copies $(T_i, \Delta_i)$, $i = 1, \ldots, n$, of $(T, \Delta)$, where $\Delta = I(X \le T)$ and $I(\cdot)$ denotes the indicator function. When $F$ and $G$ are continuously differentiable at $t_0 \in (0, +\infty)$ with derivatives $f(t_0) > 0$ and $g(t_0) > 0$, Groeneboom and Wellner (1992) showed that the nonparametric maximum likelihood estimator (NPMLE) $F_n$ of $F$ satisfies
$$n^{1/3}\{F_n(t_0) - F(t_0)\} \to \kappa C$$
in distribution, where $\kappa = \{4F(t_0)(1 - F(t_0))f(t_0)/g(t_0)\}^{1/3}$, $C = \arg\min_{t \in \mathbb{R}}\{W(t) + t^2\}$, and $W$ is a standard two-sided Brownian motion process with $W(0) = 0$. It then seems straightforward to construct a confidence interval for $F(t_0)$. To do this, however, one needs to estimate the nuisance parameter $\kappa$ in the limiting distribution. As noted by Groeneboom and Wellner (2005), the computation of $\kappa$ is very difficult, and the resulting performance is rather unstable. To avoid this difficulty, they suggested the likelihood ratio approach and conducted a small simulation study.
Although estimating the survival probability is very useful, the quantile of the survival time has a more flexible interpretation and plays an important role in various statistical applications, especially in data modeling, reliability, and medical studies.
Existing research on quantiles for survival data mainly focuses on right-censored data, whereas the same problem for current status data has attracted little attention. In the absence of covariates, Banerjee and Wellner (2001) briefly discussed constructing the confidence interval by inverting the likelihood ratio test. In the presence of covariates, Ou et al. (2016) proposed a novel estimation method for the linear quantile regression model; see also the references therein.
In this paper, we are interested in constructing confidence intervals of quantiles for current status data without covariates. Because the confidence region for a quantile in Banerjee and Wellner (2001) is obtained by inverting the acceptance region of the likelihood ratio test, it inevitably faces the same difficulty as the computation of the confidence interval for $F(t_0)$, as argued by Sen and Xu (2015). In light of the merits of the bootstrap and its superior performance in constructing confidence intervals for $F(t_0)$, we develop a bootstrap method based on the current status model to construct confidence intervals for quantiles of $F$. It is worth noting that our bootstrap method is based directly on the NPMLE $F_n$ and does not rely on a smoothed version of $F_n$, although the latter is adopted in the method of Sen and Xu (2015).
This paper is organized as follows. The bootstrap method based on the current status model is presented in Section 2, where a consistent estimator of the standard error of the NPMLE of a quantile is proposed. In Section 3, simulation studies are carried out to evaluate our proposal under different scenarios. In Section 4, we analyze a real data set as an illustration. Finally, Section 5 provides some concluding remarks.

Methods
For the observed data $(T_i, \Delta_i)$, $i = 1, \ldots, n$, the NPMLE $F_n$ of $F$ can be obtained by maximizing the log-likelihood function
$$\ell_n(F) = \sum_{i=1}^{n}\{\Delta_i \log F(T_i) + (1 - \Delta_i)\log(1 - F(T_i))\}$$
over all distribution functions. Then, for a given level $\tau \in (0, 1)$, we can estimate the $\tau$th quantile $F^{-1}(\tau) = \inf\{s : F(s) \ge \tau\}$ by its empirical version $F_n^{-1}(\tau) = \inf\{s : F_n(s) \ge \tau\}$. As $F_n$ is a step function, $F_n^{-1}(\tau)$ is also a step function of $\tau$. Banerjee and Wellner (2001) derived the asymptotic distribution of $F_n^{-1}(\tau)$ for $0 < \tau < 1$:
$$n^{1/3}\{F_n^{-1}(\tau) - F^{-1}(\tau)\} \to \gamma C$$
in distribution, where $\gamma = \{4\tau(1 - \tau)/(g(t)f(t)^2)\}^{1/3}$ and $t = F^{-1}(\tau)$. Based on this limiting distribution, a natural asymptotic confidence interval for $F^{-1}(\tau)$ with nominal level $1 - \alpha$, such as $\alpha = 0.05$, is
$$[F_n^{-1}(\tau) - n^{-1/3}\hat{\gamma}Q_{1-\alpha/2}(C),\ F_n^{-1}(\tau) + n^{-1/3}\hat{\gamma}Q_{1-\alpha/2}(C)],$$
where $\hat{\gamma}$ is a consistent estimate of $\gamma$ and $Q_{1-\alpha/2}(C)$ is the $100(1 - \alpha/2)$th percentile of $C$ (Groeneboom and Wellner, 2001). As $\gamma$ involves the density functions of both $F$ and $G$ at $t$, one has to resort to a smoothing technique such as the kernel method to obtain a consistent estimate of $\gamma$. It is well known that the kernel method works well only when the sample size is at least moderate, and it incurs a cumbersome and heavy computational cost. Therefore, a computationally simple and stable procedure for this problem is desirable.
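To make the estimator and its empirical quantile concrete, the following minimal Python sketch evaluates the NPMLE at the ordered examination times through the well-known max-min representation of isotonic regression, $F_n(T_{(i)}) = \max_{j \le i}\min_{k \ge i}\sum_{l=j}^{k}\Delta_{(l)}/(k - j + 1)$, and then applies the definition $F_n^{-1}(\tau) = \inf\{s : F_n(s) \ge \tau\}$. The function names `npmle_maxmin` and `quantile_from_npmle` are hypothetical, the examination times are assumed to have no ties, and the quadratic-time loop is chosen for transparency rather than speed.

```python
import numpy as np


def npmle_maxmin(t, delta):
    """Return the sorted times T_(i) and the NPMLE values F_n(T_(i)).

    Uses the max-min formula of isotonic regression. Assumes no ties in t.
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    order = np.argsort(t, kind="stable")
    t_sorted, d = t[order], delta[order]
    n = len(d)
    csum = np.concatenate([[0.0], np.cumsum(d)])  # csum[k] = d[0] + ... + d[k-1]
    fitted = np.empty(n)
    for i in range(n):
        k = np.arange(i, n)  # right end points k >= i
        best = -np.inf
        for j in range(i + 1):  # left end points j <= i
            block_means = (csum[k + 1] - csum[j]) / (k - j + 1)
            best = max(best, block_means.min())
        fitted[i] = best
    return t_sorted, fitted


def quantile_from_npmle(t_sorted, fitted, tau):
    """inf{s : F_n(s) >= tau}; +inf if F_n never reaches tau."""
    idx = np.nonzero(fitted >= tau)[0]
    return t_sorted[idx[0]] if idx.size else np.inf


# Toy data: six examination times with their current status indicators.
t_toy = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
delta_toy = [0, 1, 0, 1, 1, 1]
t_sorted, f_n = npmle_maxmin(t_toy, delta_toy)
median_hat = quantile_from_npmle(t_sorted, f_n, 0.5)
print(f_n, median_hat)
```

In the toy data the violating pair (1, 0) at times 1.0 and 1.5 is pooled to the common value 0.5, which illustrates why $F_n$, and hence $F_n^{-1}$, is a step function.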
In many situations, when the limiting distribution of an estimator has been established but its variance is very complicated, one usually seeks a consistent estimate of the variance instead. A popular strategy is resampling, and the bootstrap is the method typically adopted. However, under the current status model, the naive nonparametric bootstrap has been shown to be inconsistent for approximating the limit distribution $\kappa C$ (Sen and Xu, 2015). As a result, other consistent bootstrap methods have to be developed; for more details, see Cattaneo et al. (2020) and the references therein.
In this paper, we do not follow these approaches to approximate the limiting distribution. Instead, we develop a new bootstrap method that aims at a consistent estimate of the variance of the quantile estimator rather than of the limit distribution function. The variance of $F_n^{-1}(\tau)$ is approximately equal to $n^{-2/3}$ times the variance of $\gamma C$. Hence, if one can estimate the variance of $F_n^{-1}(\tau)$ by a bootstrap method, then solving this equation for $\gamma$ yields a consistent estimate, denoted by $\gamma_n$. The simulation study in the next section shows that the bootstrap method outlined below performs well and indeed provides a desirable estimator of the variance of $F_n^{-1}(\tau)$. The bootstrap method for constructing an asymptotic confidence interval for $F^{-1}(\tau)$ with nominal level $1 - \alpha$ proceeds as follows.
Step 1: For the original observations $(T_i, \Delta_i)$, $i = 1, \ldots, n$, and a fixed $\tau$, compute the NPMLE $F_n$ of $F$ as described above, and then compute $F_n^{-1}(\tau)$.
Step 2: For each $b = 1, \ldots, B$ and each $T_i$, generate the $b$-th bootstrap version $\Delta_{ib}^*$ of the indicator $\Delta_i$ as a Bernoulli variable with success probability $F_n(T_i)$. Based on the bootstrap sample $(T_i, \Delta_{ib}^*)$, $i = 1, \ldots, n$, compute the corresponding NPMLE $F_{nb}$ and obtain the bootstrap quantile $F_{nb}^{-1}(\tau)$. Repeat this process $B$ times.
Step 3: Using the bootstrap quantiles $F_{nb}^{-1}(\tau)$, $b = 1, \ldots, B$, estimate the variance of $F_n^{-1}(\tau)$ by the sample variance of the $F_{nb}^{-1}(\tau)$. An equation for $\gamma_n$ is then given by
$$\frac{1}{B - 1}\sum_{b=1}^{B}\{F_{nb}^{-1}(\tau) - \bar{F}_{n\cdot}^{-1}(\tau)\}^2 = n^{-2/3}\gamma_n^2 D, \qquad (4)$$
where $\bar{F}_{n\cdot}^{-1}(\tau) = B^{-1}\sum_{b=1}^{B}F_{nb}^{-1}(\tau)$ and $D$ is the variance of $C$ (Groeneboom and Wellner, 2001). The solution $\gamma_n$ is easily obtained.
Step 4: Finally, construct the asymptotic confidence interval for $F^{-1}(\tau)$ with level $1 - \alpha$ as
$$[F_n^{-1}(\tau) - n^{-1/3}\gamma_n Q_{1-\alpha/2}(C),\ F_n^{-1}(\tau) + n^{-1/3}\gamma_n Q_{1-\alpha/2}(C)].$$
Remark 1: Similar to the techniques used in Groeneboom and Hendrickx (2017), the consistency of the estimator on the left-hand side of (4) can be shown.
Remark 2: In the typical bootstrap method, the left-hand side of (4) takes the form
$$A = \frac{1}{B - 1}\sum_{b=1}^{B}\{F_{nb}^{-1}(\tau) - F_n^{-1}(\tau)\}^2,$$
which is centered at the estimate $F_n^{-1}(\tau)$. As explained above, because of the inconsistency of the nonparametric bootstrap (Sen and Xu, 2015), this expression should be adjusted. If one uses $A$ to estimate the unknown variance, our simulation study shows that it heavily exaggerates the true error, and the resulting confidence interval is longer and has a coverage probability higher than the nominal level, except at the level 0.99.

Remark 3: The computation of the NPMLE $F_n$ of $F$ can be implemented using R packages such as "Icens" or "curstatCI".
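Steps 1-4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation (which, per Remark 3, relies on R packages). Solving the equation in Step 3 gives $\gamma_n = n^{1/3}\sqrt{S^2/D}$, where $S^2$ is the sample variance of the bootstrap quantiles, so the half-width of the interval in Step 4 reduces to $Q_{1-\alpha/2}(C)\,S/\sqrt{D}$. The function names are hypothetical. The numerical values of $D$ and $Q_{1-\alpha/2}(C)$ below are assumed values quoted from memory of the published tables for Chernoff's distribution and should be checked against Groeneboom and Wellner (2001). The fallback to the largest examination time when a bootstrap NPMLE never reaches $\tau$ is a convention of this sketch only.

```python
import numpy as np

# Assumed constants for C (verify against Groeneboom and Wellner, 2001):
# D = Var(C), and Q_{1-alpha/2}(C) keyed by the confidence level 1 - alpha.
CHERNOFF_VAR = 0.26355964
CHERNOFF_Q = {0.80: 0.664235, 0.90: 0.845081, 0.95: 0.998181, 0.99: 1.286659}


def npmle_pava(t, delta):
    """Sorted times and NPMLE values F_n(T_(i)) by pool-adjacent-violators."""
    t, delta = np.asarray(t, dtype=float), np.asarray(delta, dtype=float)
    order = np.lexsort((-delta, t))  # sort by T; within ties, ones before zeros
    sums, counts = [], []
    for d in delta[order]:
        sums.append(d)
        counts.append(1)
        # Pool while the previous block mean exceeds the current block mean.
        while len(sums) > 1 and sums[-2] * counts[-1] > sums[-1] * counts[-2]:
            s_last, c_last = sums.pop(), counts.pop()
            sums[-1] += s_last
            counts[-1] += c_last
    return t[order], np.repeat(np.array(sums) / np.array(counts), counts)


def quantile(t_sorted, fitted, tau):
    """inf{s : F(s) >= tau}; largest time if tau is never reached (sketch convention)."""
    idx = np.searchsorted(fitted, tau, side="left")
    return t_sorted[min(idx, len(t_sorted) - 1)]


def bootstrap_quantile_ci(t, delta, tau, level=0.95, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: NPMLE and its tau-th quantile from the original data.
    t_sorted, f_n = npmle_pava(t, delta)
    q_hat = quantile(t_sorted, f_n, tau)
    n = len(t_sorted)
    # Step 2: keep the T_i fixed and redraw the indicators from Bernoulli(F_n(T_i)).
    boot = np.empty(n_boot)
    for b in range(n_boot):
        delta_star = rng.binomial(1, f_n)
        _, f_nb = npmle_pava(t_sorted, delta_star)
        boot[b] = quantile(t_sorted, f_nb, tau)
    # Step 3: sample variance of the bootstrap quantiles, then solve for gamma_n.
    s2 = boot.var(ddof=1)
    gamma_n = n ** (1 / 3) * np.sqrt(s2 / CHERNOFF_VAR)
    # Step 4: symmetric interval around the original quantile estimate.
    half = n ** (-1 / 3) * gamma_n * CHERNOFF_Q[level]
    return q_hat, gamma_n, (q_hat - half, q_hat + half)


# Usage on simulated data: X ~ exp(1), T ~ exp(1), median of F.
rng = np.random.default_rng(2021)
n = 200
x, t = rng.exponential(1.0, n), rng.exponential(1.0, n)
delta = (x <= t).astype(int)
q_hat, gamma_n, (lower, upper) = bootstrap_quantile_ci(t, delta, tau=0.5, n_boot=500)
print(q_hat, gamma_n, lower, upper)
```

Note that the variance in Step 3 is centered at the mean of the bootstrap quantiles (`ddof=1` sample variance), not at $F_n^{-1}(\tau)$, which is the distinction drawn in Remark 2.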

Numerical Studies
To evaluate the finite-sample performance of the proposed method, several simulation studies were conducted. We mainly considered three simulation settings. Specifically, the failure time $X$ was generated from exp(1), $|N(0, 1)|$, and exp(1) truncated to the interval [0, 2], respectively. Correspondingly, the examination time $T$ was generated independently from exp(1), Unif(0, 2), and Unif(0, 2). These simulation settings were used in Groeneboom and Wellner (2005), Groeneboom et al. (2010), Groeneboom (2012), Groeneboom and Hendrickx (2017), and Sen and Xu (2015). The sample sizes $n$ = 50, 75, 100, 200, 400, and 800 were chosen. For each value of $n$, we generated 10000 samples independently from the preceding distributions of the pair $(X, T)$. For each data set, we used the method described in Section 2: holding the values of $T_i$ fixed, we bootstrapped the $\Delta_i$ from the Bernoulli distribution with success probability $F_n(T_i)$, took 1000 bootstrap samples, and determined the $100(1 - \alpha/2)$th percentile of the limiting distribution. All simulations were implemented in R. For each setting and the NPMLE of the $\tau$th quantile, we recorded the empirical standard deviation (SD), the standard error estimate (SEE), and the observed average coverage probability (CP) and length (LEN) of the confidence intervals with levels 80%, 90%, 95%, and 99%, where the values of $\tau$ vary from 0.1 to 0.9.
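For readers who wish to reproduce the data-generating step, the three settings can be sketched in Python as below. This is a hedged illustration rather than the R code used for the reported results. The names `simulate_current_status` and `true_quantile` are hypothetical, the truncated exponential is drawn by inverting its distribution function $(1 - e^{-x})/(1 - e^{-2})$ on [0, 2], and the true quantiles are included because they are needed to compute coverage probabilities.

```python
from math import exp, log
from statistics import NormalDist

import numpy as np


def simulate_current_status(n, setting, rng):
    """Draw (T_i, Delta_i), i = 1..n, under simulation setting 1, 2, or 3."""
    if setting == 1:  # X ~ exp(1), T ~ exp(1)
        x = rng.exponential(1.0, n)
        t = rng.exponential(1.0, n)
    elif setting == 2:  # X ~ |N(0, 1)|, T ~ Unif(0, 2)
        x = np.abs(rng.standard_normal(n))
        t = rng.uniform(0.0, 2.0, n)
    elif setting == 3:  # X ~ exp(1) truncated to [0, 2], T ~ Unif(0, 2)
        u = rng.uniform(size=n)
        x = -np.log1p(-u * (1.0 - np.exp(-2.0)))  # inverse-CDF sampling
        t = rng.uniform(0.0, 2.0, n)
    else:
        raise ValueError("setting must be 1, 2, or 3")
    delta = (x <= t).astype(int)  # only the current status is observed
    return t, delta


def true_quantile(setting, tau):
    """True tau-th quantile of the failure time distribution F."""
    if setting == 1:
        return -log(1.0 - tau)
    if setting == 2:
        return NormalDist().inv_cdf((1.0 + tau) / 2.0)  # P(|Z| <= x) = 2*Phi(x) - 1
    if setting == 3:
        return -log(1.0 - tau * (1.0 - exp(-2.0)))
    raise ValueError("setting must be 1, 2, or 3")


rng = np.random.default_rng(10)
samples = {s: simulate_current_status(100, s, rng) for s in (1, 2, 3)}
medians = {s: true_quantile(s, 0.5) for s in (1, 2, 3)}
print(medians)
```

A coverage study would wrap this generator in a loop over replications and record how often the bootstrap interval from Section 2 contains `true_quantile(setting, tau)`.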
All simulation results are summarized in Tables 1-3. It can be seen that the SD is close to the SEE. As the sample size increases, for any given $\tau$th quantile, the average coverage probability of the confidence intervals with a fixed nominal level approaches the nominal level, and the corresponding average length becomes shorter. When $\tau \le 0.2$ or $\tau \ge 0.9$, the behavior is not satisfactory, and the coverage probability is slightly below the nominal level even when the sample size reaches 800. This may be because few observations are available below or above the true quantile. Such poor behavior at the tails also occurs in other problems. When $\tau$ lies between 0.3 and 0.8, the proposed method indeed behaves well, especially when the sample size is greater than 100.
Vol. 10, No. 5; 2021
Besides, we also found an interesting result. If one uses $A$ in Remark 2 to estimate the unknown variance, the resulting SEE tends to heavily exaggerate the true error, and the confidence interval has a higher coverage probability than the nominal level, except at the level 0.99. This seems to be in line with the fact that the bootstrap method is inconsistent when approximating the limiting distribution; see Theorem 2 in Sen and Xu (2015).

Application
In this section, we apply the proposed bootstrap method to the Rubella data set described in Section 1 and in Keiding et al. (1996). The data set contains 230 observations on the prevalence of Rubella in Austrian males. The goal here is to estimate selected quantiles of the distribution of the time to immunization. Several procedures for constructing confidence intervals for this data set have been proposed. Specifically, MLE-based (p) denotes the MLE-based confidence intervals estimated using the Weibull parametric fits for the density functions as obtained in Keiding et al. (1996), MLE-based (np) denotes the MLE-based confidence intervals in which the density functions are estimated nonparametrically by kernel estimates, and LR-based denotes the confidence intervals constructed by the likelihood ratio-based method. We compared them with our method for the quantiles with $\tau$ = 0.4, 0.5, 0.7, and 0.9. The results of our method are shown in column 6 of Table 4, and the entries in columns 3-5 are taken from Table 5 of Banerjee and Wellner (2005). The confidence level is 0.95, and the number of bootstrap replications is 5000. We found that the bootstrap-based confidence intervals behave similarly to those from the existing methods and overlap heavily with them. Notably, for $\tau < 0.9$ the differences between the methods are slight, but for $\tau = 0.9$ the differences are large. This may be caused by the sparsity of data around this quantile.

Discussions
In this paper, we proposed a bootstrap-based method for constructing confidence intervals of quantiles for current status data. The method is computationally simple and efficient, and it does not require estimating nuisance parameters. The soundness of the proposed method is supported by its good performance in the simulation study.
Note that the proposed method uses only the NPMLE of the distribution function of the failure time. As the NPMLE converges at the cube-root rate $n^{1/3}$, the method may not behave well when the sample size is small, such as $n < 200$ in our simulation study. One strategy to overcome this problem is to replace the NPMLE by a smoothed version.
The smoothed MLE has a faster convergence rate, as discussed in Groeneboom et al. (2010) and Sen and Xu (2015). However, applying this method requires selecting a proper bandwidth, which is critical and has a drastic influence on the resulting statistical inference. This is an interesting problem and will be investigated in the future.
Another interesting extension of the current research is the construction of confidence bands for the quantiles as $\tau$ varies. Besides, it can be conjectured that results similar to those presented in this paper hold for case 2 interval-censored data. These topics will be studied further but are beyond the scope of this paper.