Approximate Nonparametric Maximum Likelihood Estimation for Interval Censoring Model Case II (Running Head: NPMLE for Interval Censoring Case II)

Abstract

We study the nonparametric maximum likelihood estimate (NPMLE) of the distribution function in a type II interval censoring model. We propose an approximate solution of the problem under a technical assumption, and investigate some basic asymptotic properties of the estimator.


Introduction
Interval censoring models are commonly used in practice, especially in the biomedical sciences, for example in clinical trials and longitudinal studies, in acquired immune deficiency syndrome (AIDS) studies, or in studies of HIV infection times. There are two basic types of interval censoring models. In the interval censoring model case I, also termed the current status data model, the left (or right) censoring variable and the status of the original variable are observed; in the interval censoring model case II, two censoring variables and the statuses are observed. The type I interval censoring model is much more commonly used in practice, and there is a rich statistical research and application literature on it, for example Huang (1996), Bebchuk and Betensky (2000), Chen and Jewell (2001), Yu et al. (2000), and Pan and Chappell (2002). Fang and Sun (2001) studied the nonparametric maximum likelihood estimation (NPMLE) of the doubly interval censored model. The type II interval censoring model is less commonly used in practice and its mechanism is more complicated; some of the research literature on it includes the book of Groeneboom and Wellner (1992), Wellner (1995), Groeneboom (1996), and Jongbloed (1998). To our knowledge, investigation of this model is still incomplete: the NPMLE of the distribution function exists and can be computed by a nonparametric EM algorithm, and its strong consistency has been obtained, but its asymptotic distribution seems not yet available. A good review of the two types of censoring models can be found in Sun (2012). These models, along with other survival models, also provide practical background for the theory of estimation of Banach space valued parameters.
Unlike Euclidean parameters, Banach parameters are often not √n estimable, in that there exists no estimator which is consistent at rate √n, and when weak limits of estimators exist, they are often non-Gaussian. Also, unlike maximum likelihood estimation for Euclidean parameters, there is no standard method of estimation for Banach parameters. For some models, setting the normal equation of the Hadamard differential (along some particular selected direction) to zero will yield a version of the nonparametric maximum likelihood estimate (NPMLE), although it may not be unique. Isotonic regression is another commonly used method for this problem. Originally the method was used in the optimization of a class of regression problems, such as weighted least squares problems in which the response and covariate variables have some particular relationship (an isotonic relationship, as characterized by Theorem 1.4.1 in Robertson, Wright and Dykstra, 1988), such as a monotone relationship. It was then found that the method can be used, as an optimization procedure, to solve the NPMLE for some models. See Robertson, Wright and Dykstra (1988) for a systematic account of this method. Early uses of the method to compute an NPMLE can be found in Grenander (1956), Chernoff (1964), and Prakasa Rao (1969). The method works only for models whose log-likelihood satisfies a convexity condition. For some models, one of the two methods works; for some models, both work; for many other models, neither works, and often an approximate NPMLE can be found which maximizes the log-likelihood up to a small constant ε_n, with ε_n → 0.
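As a concrete illustration of the isotonic regression step mentioned above (not the paper's own algorithm), the weighted least-squares problem under a monotonicity constraint can be solved by the pool-adjacent-violators algorithm; the function name `pava` and the toy data below are ours.

```python
import numpy as np

def pava(y, w=None):
    """Pool-adjacent-violators: weighted least-squares isotonic fit.

    Returns the nondecreasing vector minimizing sum_i w_i (y_i - f_i)^2.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Each block holds [weighted mean, total weight, block length].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    return np.concatenate([[m] * n for m, _, n in blocks])

fit = pava([1.0, 3.0, 2.0, 4.0])  # -> [1.0, 2.5, 2.5, 4.0]
```

Each merge replaces a violating pair of blocks by their pooled weighted mean, which is exactly the projection characterization in Robertson, Wright and Dykstra (1988).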
Such estimates are also called ε_n-MLEs; they have been studied by a number of authors, such as Wald (1949), Bahadur (1967), and Wong and Severini (1991), and are shown to have some nice properties. For interval censoring model case II, Groeneboom and Wellner (1992), hereafter GW, studied the NPMLE of the distribution function, which requires iterative procedures for its computation. They showed a convergence rate of n^{1/3} log n for the NPMLE when the data have some mass distributed around the diagonal; otherwise the convergence rate is n^{1/3}, like that for censoring model case I. The formal identification of the asymptotic distribution is not easy, although there is a conjecture about it. Another NPMLE is the one-step iteration in the iterative convex minorant algorithm, with the true distribution as starting value. Groeneboom and Wellner justified this one-step iteration "estimator" and obtained its asymptotic distribution. Since the one-step iteration is not an actual estimator, the result can only be used as a technical tool in the further study of the NPMLE. The iterative convex minorant algorithm is one of the computational algorithms for isotonic regression; it turns the optimization problem into a sequence of weighted isotonic regression problems, with the weights obtained from the values at the preceding step. It is known that the parameter (a distribution function) in this model is not rate-√n estimable, and neither are many smooth functionals of it (for example, Yuan, Xu, & Zheng, 2012). Geskus and Groeneboom (1999) showed rate-√n efficient estimability for certain smooth functionals (mostly linear functionals) of this model. In this article we study a slightly different version of this model in which we observe the statuses of two right censoring variables. It is known that the (NP)MLE is not always efficient, consistent, or optimal (Wellner, 2005). Using the isotonic regression method, we find an approximate NPMLE of our model, in that it maximizes the averaged log-likelihood up to an o(n^{-1/α}) (a.s.) term (0 < α < 1/2). It can be computed in closed form; we show rate-n^{1/3} weak consistency and evaluate its asymptotic distribution, and we study its strong consistency, its convergence rate in probability, and rate-√n estimability for certain smooth functionals of it and for some of its submodels.
In Section 2 we describe the model, the approximate NPMLE of the underlying distribution function, and some basic results. Section 3 gives some illustrations of the uses of this model and its estimation. Relevant derivations are given in the Appendix.
To study the asymptotic distribution of F̂_n(·), they used the following working hypothesis: starting from the true underlying distribution function F, the iterative convex minorant algorithm gives at the first iteration step an estimator F^{(1)}_n(·) which is asymptotically equivalent to the maximum likelihood estimator F̂_n(·). Let →_D denote convergence in distribution. GW justified the working hypothesis as follows. Assuming that f(t) > 0, g(t, t) > 0 and that g(t, ·) is left continuous at t, they obtained the limit law of Theorem 5.3 (GW, p. 100), where B(h) is the two-sided Brownian motion originating from zero, i.e., a zero-mean Gaussian process on R whose increment B(r) − B(h) has variance |r − h|.
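The two-sided Brownian motion appearing in this limit law is easy to simulate by gluing two independent Wiener paths at the origin. The sketch below (our own illustration; the truncation to a finite window [-T, T] is a numerical approximation of the argmin over all of R) draws one path and locates the minimizer of B(h) + h².

```python
import numpy as np

rng = np.random.default_rng(0)

def two_sided_bm(T=2.0, n=4000, rng=rng):
    """Simulate B(h) on [-T, T] from two independent one-sided Wiener paths."""
    dt = T / n
    right = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), n))])
    left = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), n))])
    h = np.linspace(-T, T, 2 * n + 1)
    B = np.concatenate([left[::-1][:-1], right])  # B(0) = 0 by construction
    return h, B

# One draw of the location of the minimum of B(h) + h^2, the kind of
# argmin functional appearing in Theorem 5.3 of GW.
h, B = two_sided_bm()
h_star = h[np.argmin(B + h ** 2)]
```

Repeating the last two lines over many Monte Carlo replications gives an empirical approximation of the distribution of the argmin functional.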
The proposed model and method. Our model below is slightly different from (0), in that we define γ = 1_{[Z≤V]} (instead of 1_{[U<Z≤V]}). In Example 1.6 of GW, the density-mass function of (U, V, Δ) is given as in (0), but we find that (0) is actually the density-mass function of (U, V, δ). The distribution P_F of (U, V, δ, γ) has the density/mass function given in (1). Here our main interest is to estimate F in model (1); since g(u, v) is factored out of the model, its estimate is straightforward from the data (u_1, v_1), ..., (u_n, v_n) by many existing methods, such as the kernel estimator. So we assume g(·, ·) is known. The averaged log-likelihood for F is l_n(F). The approximate NPMLE F̂_n(·) of F(·) is the maximizer of l_n(F) with respect to F, up to an o(n^{-1/α}) (a.s.) term (0 < α < 1/2). Unlike MLE for Euclidean parameters, NPMLE for an infinite-dimensional parameter in a likelihood model (like F in the above model) is not straightforward: setting the corresponding Hadamard differential of the log-likelihood to zero often leads nowhere. Instead, we use the method of isotonic regression. For this, rearrange (u_1, ..., u_n; v_1, ..., v_n) in increasing order, denoted (x_1, ..., x_{2n}), and let {Δ_i : i = 1, ..., 2n} be the concomitants of the {u_i, v_i : i = 1, ..., n}, i.e., Δ_i = δ_j if x_i = u_j for some j, and Δ_i = γ_j if x_i = v_j for some j. In the proof of Theorem 1 we will see that the approximation holds for some 0 < α < 1/2. Let F̂_n(·) be the greatest convex minorant estimator (Robertson, Wright, & Dykstra, 1988) based on (Δ_1, ..., Δ_{2n}), (x_1, ..., x_{2n}) and l_n(F), i.e., F̂_n = arg max_F l_n(F).
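The greatest convex minorant step above can be sketched numerically: build the cumulative-sum diagram of the ordered responses and take the left slopes of its lower convex hull. The helper below is our own illustration (function name and toy data are ours), and its output coincides with the isotonic least-squares fit of the same data.

```python
import numpy as np

def gcm_slopes(x, y):
    """Slopes of the greatest convex minorant of the cumulative-sum
    diagram (x_i, sum_{j<=i} y_j), one slope per segment between points."""
    cx = np.concatenate([[0.0], np.asarray(x, float)])
    cy = np.concatenate([[0.0], np.cumsum(y)])
    hull = [0]  # indices of the lower convex hull (the GCM knots)
    for i in range(1, len(cx)):
        while len(hull) > 1:
            a, b = hull[-2], hull[-1]
            # Drop b if it lies on or above the chord from a to i
            # (cross-multiplied slope comparison, avoids division).
            if (cy[b] - cy[a]) * (cx[i] - cx[b]) >= (cy[i] - cy[b]) * (cx[b] - cx[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    # The minorant is linear between consecutive knots; record its slope.
    slopes = np.empty(len(cx) - 1)
    for a, b in zip(hull[:-1], hull[1:]):
        slopes[a:b] = (cy[b] - cy[a]) / (cx[b] - cx[a])
    return slopes
```

For example, `gcm_slopes([1, 2, 3, 4], [1, 3, 2, 4])` returns the nondecreasing vector `[1, 2.5, 2.5, 4]`, the isotonic fit of `[1, 3, 2, 4]` with unit weights.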
Let g_1(·) and g_2(·) be the marginals of g(·, ·). Below, we study the asymptotic distribution of F̂_n. We need the following conditions, in which (C2) ensures that the averaged log-likelihood l_n(F) satisfies a convexity condition, up to an o(n^{-1/α}) (a.s.) term (0 < α < 1/2), so that the isotonic regression method can be applied.
, where the expectation is taken with respect to (U, V).
Next we study the almost sure behavior of the estimator F̂_n, and of any general NPMLE F̄_n of F in model (1), with or without assumption (2); here we are not confined to the isotonic regression method, and only assume existence. We need the following two conditions. Let N_[](ε, B, ρ) be the bracketing covering number of the family B, of size ε with respect to some semi-metric ρ on B, and H_[](ε, B, ρ) = log N_[](ε, B, ρ) the bracketing entropy. It is known (LeCam, 1973; Birgé, 1983; as stated in Wellner, 2005) that the optimal rate r_n of convergence in probability for estimating b_0, in the sense r_n h(p_{b̂_n}, p_{b_0}) = O_P(1), is determined by the bracketing entropy. Typically H_[](ε, B, ρ) ≍ ε^{-d/α}, where d is the dimension of the argument of b and α is a smoothness measure of p_b; thus the optimal rate in this case is r_n = n^{α/(2α+d)}.
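The rate claim above comes from the standard entropy balance; a sketch of that well-known computation (our reconstruction, not the paper's display) is:

```latex
% If the bracketing entropy grows like a power of 1/\varepsilon,
%   H_{[\,]}(\varepsilon, \mathcal{B}, \rho) \asymp \varepsilon^{-d/\alpha},
% then the optimal rate r_n balances entropy against sample size:
\[
  H_{[\,]}\!\bigl(r_n^{-1},\mathcal{B},\rho\bigr) \asymp n\, r_n^{-2}
  \quad\Longrightarrow\quad
  r_n^{\,d/\alpha} \asymp n\, r_n^{-2}
  \quad\Longrightarrow\quad
  r_n = n^{\alpha/(2\alpha+d)} .
\]
```

Setting d = 1 and α = 1 recovers the familiar n^{1/3} rate for monotone (one-dimensional, Lipschitz-smooth) estimation problems such as the one studied here.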
For the NPMLE, and more generally for minimum contrast estimators, the best achievable rate r_n is characterized by Birgé and Massart (1993), for some 0 < c < ∞; so in this case the NPMLE can achieve the optimal rate of convergence only if the corresponding entropy condition holds. Then p_{1,F} and p_{2,F} are density-mass functions. We need the following assumption: (C5) E_{p_F}(log(p_F/p_F̄))^α < ∞ for some α > 2, for all F̄ with D(p_F || p_F̄) < δ, for all sufficiently small δ.
Theorem 3. (i) Let F̄_n be as in Theorem 2, and assume (C1) and (C5). In both (i) and (ii), the rate n^{1/3} is optimal.
Note that F in model (1) is not rate-√n estimable, and neither are many smooth functionals of it. There may still be certain smooth functionals of it which are rate-√n estimable. Geskus and Groeneboom (1999) showed this kind of result for model (0). Groeneboom and Wellner (1992, Chapter 5.4) showed such a result for the mean of F in the case I interval censoring model. Huang and Wellner (1995) showed such a result for smooth linear functionals of the form ν(F) = ∫ H dF, with fixed H, for the type I interval censoring model, and that the plug-in estimator is efficient; in fact, one of their purposes was to help treat estimation of smooth functionals in the case II interval censoring model. Yuan, Xu, and Zheng (2012) showed that the same functional is rate-√n estimable for the case II interval censoring model, without knowing the result of Huang and Wellner. Note that not all linear functionals of F are rate-√n estimable: for example, for fixed t_0, ν(F) = F(t_0) is not rate-√n estimable, but the moments of F are (see, for example, Theorem 2 in Yuan, Xu, & Zheng, 2012). Under some conditions, van der Laan (1993) obtained the following relationship between a smooth functional, its canonical gradient and the empirical process: ν(θ̂_n) − ν(θ) = (P_n − P_θ) l̃_ν(·; θ̂_n), where θ is a general parameter (Euclidean or Banach), θ̂_n an estimator of it, and l̃_ν the canonical gradient (also called the efficient influence function) of ν. This relationship allows one to investigate a class of rate-√n estimable smooth functionals. Below, we give a result for model (1) similar to that of Huang and Wellner (1995) for model (0). For fixed h(·) and M > 0, let ν(F) be the smooth (and linear) functional of F defined below. We give the asymptotic distribution of the plug-in estimator ν̂_n = ν(F̂_n), which has nothing to do with the asymptotic distribution of F̂_n. The following conditions are from Huang and Wellner (1995).
(C7) The support of F is a bounded interval [0, M]; both G and F are dominated by the Lebesgue measure.
Since models with Banach valued parameters are often not rate-√n estimable, and their weak limits, when they exist, are often non-Gaussian, we cannot in general talk about efficiency of their estimators. However, this may be possible for some submodels; below we explore such a scenario. For this we first review some basic facts about efficient estimation of Euclidean parameters. For the estimation of a Euclidean parameter θ, let θ_n = θ_0 + n^{-1/2}b for some constant b. A rate-n^{1/2} consistent estimator T_n is regular if n^{1/2}(T_n − θ_n) →_D W for some random variable W, and the limit does not depend on the sequence {θ_n}. Let Z ⊕ V denote the sum of two independent random variables Z and V, and let I(θ) be the Fisher information for f(·|θ) at θ. The convolution theorem (Hájek, 1970) states that for any regular estimator T_n with weak limit W, there is a random variable V such that W = Z ⊕ V with Z ∼ N(0, I^{-1}(θ_0)). The Cramér-Rao theorem gives the lower bound on the asymptotic variance of any asymptotically unbiased estimator; the convolution theorem further characterizes the weak limit of an asymptotically optimal estimator: it is a normal random variable with mean zero and variance I^{-1}(θ_0). An estimator is efficient iff V = 0. In many cases, the convergence rate for Euclidean or infinite-dimensional parameters can be different from √n. For example, for distributions with a singularity of order α, the optimal convergence rate of an estimator of a Euclidean parameter in the model is r_n = n^{1/(1+α)}, −1 < α < 1 (α ≠ 0). In this case, the local parameter is defined as θ_n = θ_0 + r_n^{-1}b, and the local likelihood ratio is often asymptotically non-normal; see Ibragimov and Has'minskii (1981). For a Euclidean parameter taking only a finite number of possible values, the optimal convergence rate r_n of an estimator is exponential (for example, Hammersley, 1950; Robson, 1958). For infinite-dimensional parameters, the convergence rates of estimators are often slower than √n and the weak limits are often non-Gaussian, although in a few cases rate-√n estimators exist with Gaussian weak limits. When the convergence rate is not √n, the problem is much harder, as the local asymptotic normality (LAN) property no longer holds with such rates for the full model. However, a number of papers have tackled this question, such as Millar (1985) and LeCam (1994). These authors considered very general parameter spaces and established convolution results for estimators regardless of their convergence rates or the forms of their weak limits. But these results are mostly of an existence type, not a specific type. Also, it is unclear whether one of the two components in their convolution representation is optimally achievable. For example, given an infinite-dimensional parameter and/or the corresponding likelihood model, although the optimal convergence rate for estimators of this parameter can be determined in principle (LeCam, 1973; Birgé, 1983), it is still unknown whether there is an optimal weak limit of its estimators, and what its specific form is if it exists. Pötzelberger, Schachermayer and Strasser (2000) gave examples in which the infinite-dimensional version of the convolution theorem does not hold in a general abstract space, but does hold under some regularity conditions; the results are of existence type. Janssen and Ostrovski (2005) gave a more detailed account of the optimal weak limit. Their Theorem 2.3 gives a convolution result for an arbitrary convergence rate r_n, for linear functionals of an infinite-dimensional parameter and their estimates, under the assumption that the two estimators involved are asymptotically jointly Gaussian. They gave the optimal weak limit as the minimal-variance random element defined in their condition (a), but how to find this random element is still not clear. Also, the joint asymptotic Gaussian assumption can be satisfied only for a few parameters in infinite-dimensional spaces, and in these cases often r_n = √n. Their Theorems 3.1 and 4.1 established convolution results for infinite-dimensional parameters in abstract spaces, but again the results are of existence type.
For a rate-√n estimable parameter θ, the asymptotic minimax theorem (Hájek, 1972) gives, for an arbitrary (not necessarily regular) estimator T_n of θ and any bowl-shaped function l(·), a lower bound in terms of Z as given before. For non-√n consistent estimators, such results are unclear. Call T_n rate-r_n regular if r_n(T_n − θ_n) →_D W for some random variable W, and the limit does not depend on the sequence {θ_n}. It is possible that on some submodels of the original one, both the convolution and the asymptotic minimax results hold with rate r_n ≠ √n, although they do not hold on the original model. Motivated by the exercises of Chapter 2 in Groeneboom and Wellner (1992), below we give such results for the type II censoring model (1), although it is not clear whether such an optimal weak limit Z is achievable by some estimator. Consider the following parametric submodels of (1): fix t > 0 with f(t) > 0, and for |θ| < 1, let F_n(t|θ) have the density (derivative) given below. Then the local log-likelihood of F_n(t, θ) satisfies the LAN condition with rate r_n, and (i) for any rate-r_n regular estimator T_n of F(t), with r_n(T_n − F(t)) →_D W, we have, for some random variable V, W = Z ⊕ V, Z ∼ N(0, 1).

Illustration for Applications
Interval censoring model case II is a generalization of case I. A common example of interval-censored survival data occurs in medical or health studies. In clinical trials, an individual due for scheduled observations may miss some of them and return with a changed status, thus contributing an interval-censored time for the occurrence of the change. As another example, in acquired immune deficiency syndrome (AIDS) studies, if a subject is HIV positive at the beginning of the study, the subject's HIV infection time is usually determined by a retrospective study of the subject's history. This is interval censoring case II, with censoring variables given by the first HIV positive test and the last HIV negative test. More practical background can be found in Sun (2012).
Although the asymptotics of F̂_n(·) are nonstandard, with a rate of n^{1/3} instead of the common n^{1/2} and with a non-Gaussian weak limit, which makes F̂_n harder to use, the application of F̂_n(·) is not hindered by this phenomenon. Like most standard estimators, it can be used to predict the probabilities of events related to this model, or to test hypotheses about a given null distribution F. When the corresponding densities f, g_1 and g_2 are given, Theorem 1 can be used to test F, and to construct confidence intervals for F and error bounds for F̂_n. The key is to evaluate the distribution of H := arg min_h {B(h) + h^2}, known as the Chernoff distribution, whose density function is given in Corollary 3.3 of Groeneboom (1989). For the NPMLE F̂_n(·), there is existing software to compute it, for example the R package "Decon" developed by Wang and Wang (2011). For given α, let H^{-1}(1 − α) be the (1 − α)-th upper quantile of the Chernoff distribution; by Theorem 1, the (1 − α) confidence interval for F(t) follows, where Â(t) = 4F̂_n(t)f̂_n(t); here an estimate f̂_n(t) of f(t) is also needed and may be obtained by differencing F̂_n(t), while g_1(t) and g_2(t) are either known or can easily be estimated, since the data (U_i, V_i) (i = 1, ..., n) are directly observed. Similarly, for testing H_0: F(t) = F_0(t), a test statistic T_n is given; under H_0, T_n is asymptotically distributed with the weak limit given in Theorem 1.
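A hedged sketch of the pointwise confidence interval just described: since the paper's Â(t) expression is truncated in our source, the half-width formula below (with an assumed denominator g_1(t) + g_2(t)) and the quantile value (the commonly quoted approximate 0.975-quantile of Chernoff's distribution) are illustrative assumptions, not the paper's exact constants.

```python
# Approximate 0.975-quantile of Chernoff's distribution (commonly quoted value).
Q975 = 0.998181

def chernoff_ci(Fn_t, fn_t, g1_t, g2_t, n, q=Q975):
    """Illustrative 95% pointwise CI for F(t) based on the n^{1/3} limit law.

    Fn_t : plug-in estimate of F(t); fn_t : estimate of f(t);
    g1_t, g2_t : (estimated) marginal censoring densities at t.
    The A(t) form used here is an assumption completing the truncated display.
    """
    A_t = 4.0 * Fn_t * fn_t / (g1_t + g2_t)   # assumed constant form
    half = n ** (-1.0 / 3.0) * A_t ** (1.0 / 3.0) * q
    return max(Fn_t - half, 0.0), min(Fn_t + half, 1.0)
```

The interval shrinks at rate n^{-1/3}, so, for instance, the width at n = 10000 is roughly a fifth of the width at n = 100 for the same plug-in inputs.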
If T_n exceeds the corresponding quantile, H_0 is rejected at significance level α; otherwise H_0 is accepted.
In fact, since the random variable H does not depend on t, and A(t) is deterministic, under some further conditions n^{1/3}(F̂_n(·) − F(·)) is a tight stochastic sequence on the support space, and the resulting weak convergence can be strengthened to weak convergence on the corresponding metric space equipped with the supremum norm. Thus any linear functional of n^{1/3}(F̂_n(·) − F(·)) converges weakly, the confidence interval described above can be strengthened to a confidence band, and the pointwise test of H_0 can be generalized to a test of H_0: F(·) = F_0(·) using these linear functionals.
The variable of interest X and the observed variables U and V can all be generated from Gamma densities, with different parameters chosen so that the underlying conditions are satisfied. The condition V > U (a.s.) can be ensured by specifying V = U + Z for some non-negative random variable Z, for example from another independent Gamma distribution. In some of the literature, the censoring variables are simply generated from uniform distributions. For the Gamma distribution, condition (C1) is automatic; (C3)-(C4) are commonly used conditions in proofs of consistency of the NPMLE, as in van de Geer (1993); (C5)-(C6) are easily satisfied for Gamma families; and condition (C2) can be satisfied with suitably chosen parameters in two different Gamma distributions for U and V. Conditions (C7)-(C9) are satisfied if we restrict the Gamma distributions to a finite interval.
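The simulation design just described can be sketched as follows; the Gamma shapes and scales below are illustrative choices of ours, not values from the paper, and δ = 1{X ≤ U}, γ = 1{X ≤ V} follow the model's definition of the two statuses.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, rng=rng):
    """Generate (U, V, delta, gamma) with V = U + Z so that V > U a.s."""
    X = rng.gamma(shape=2.0, scale=1.0, size=n)   # variable of interest
    U = rng.gamma(shape=1.5, scale=1.0, size=n)   # first censoring time
    Z = rng.gamma(shape=1.0, scale=1.0, size=n)   # positive gap
    V = U + Z                                     # second censoring time
    delta = (X <= U).astype(int)                  # status at U
    gamma = (X <= V).astype(int)                  # status at V
    return U, V, delta, gamma

U, V, delta, gamma = simulate(500)
```

Note that δ ≤ γ always, since X ≤ U implies X ≤ V; only (U, V, δ, γ) would be passed to the estimator, with X discarded as unobserved.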
Proof of Theorem 1. We are to find the limit distribution of n^{1/3}(F̂_n(t) − F(t)). (Why the convergence rate n^{1/3}? We will see this later, after we compute the asymptotic variance of m_{n,h}(Z, U, V), to be defined later.) We evaluate the limit of the probability of the event {n^{1/3}(F̂_n(t) − F(t)) ≤ x} = {F̂_n(t) ≤ F(t) + xn^{-1/3}} for each fixed x. Thus we need to evaluate the limit of the arg max_s event in (A.1) for a = F(t) + xn^{-1/3}. Make the change of variable s = t + n^{-1/3}h, and let ŝ_n = arg min_s {V_n(s) − aG_n(s)} and ĥ_n = arg min_h {V_n(t + n^{-1/3}h) − aG_n(t + n^{-1/3}h)}. Then ŝ_n = t + n^{-1/3}ĥ_n, and so we are to evaluate the limit of P(ĥ_n ≥ 0). Let P_{1,n} be the empirical measure of (z_1, u_1), ..., (z_n, u_n), P_{2,n} that of (z_1, v_1), ..., (z_n, v_n), P_n = (P_{1,n} + P_{2,n})/2, P the theoretical measure corresponding to P_n, A = {(z, u) : z ≤ u}, and let Ph denote E_P(h) for any measure P and function h. Below, without confusion, for h < 0 the notation [t, t + hn^{-1/3}] actually means [t + hn^{-1/3}, t]. Computing the relevant means and covariances (for s, h > 0; similarly for s, h < 0; and for s, h with sh < 0), it is easy to check that r(s, h) is the covariance function of {B(h): h ∈ R}. Here we can see why the convergence rate of F̂_n(t) is n^{1/3}: if a rate a(n) with a(n) → ∞ were used, then in the definition of m_{n,h}(u, v, z) we would replace n^{-1/3} and n^{2/3}/n^{1/2} = n^{1/6} by a^{-1}(n) and a^2(n)/n^{1/2}, and the asymptotic variance of m_{n,h}(U, V, Z) would change accordingly. Let ĥ be the minimizer of the right-hand side of (A.2) above. To simplify the expression for ĥ, the following Lemma will be used.
Lemma. Let B(h) be the two-sided Brownian motion originating from zero, and let a > 0, b > 0 and c be constants. Let h* and r* be the minimizers of the left-hand side and the right-hand side above; then h* = (a/b)^{2/3} r* − (1/2)(c/b), simply by the given relationship between h and r. Since a(a/b)^{1/3} > 0, the desired conclusion follows.
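A sketch of the scaling argument behind the Lemma (our reconstruction of the standard computation, consistent with the stated conclusion h* = (a/b)^{2/3} r* − (1/2)(c/b)):

```latex
% Minimize aB(h) + bh^2 + ch over h, with a, b > 0. Completing the square,
\[
  bh^{2}+ch = b\Bigl(h+\tfrac{c}{2b}\Bigr)^{2}-\tfrac{c^{2}}{4b},
  \qquad h = s-\tfrac{c}{2b},
\]
% and since adding a constant does not move the minimizer, while
% \{B(s-\tfrac{c}{2b})-B(-\tfrac{c}{2b})\}_{s} \stackrel{D}{=} \{B(s)\}_{s},
\[
  h^{*} \stackrel{D}{=}
  \operatorname*{arg\,min}_{s}\bigl\{aB(s)+bs^{2}\bigr\}-\tfrac{c}{2b}.
\]
% Substituting s=(a/b)^{2/3}r and using Brownian scaling
% B((a/b)^{2/3}r) \stackrel{D}{=} (a/b)^{1/3}B(r),
\[
  aB(s)+bs^{2} \stackrel{D}{=} a(a/b)^{1/3}\bigl\{B(r)+r^{2}\bigr\},
\]
% so, as a(a/b)^{1/3}>0 preserves the minimizer,
\[
  h^{*} \stackrel{D}{=} (a/b)^{2/3}\,r^{*}-\tfrac{c}{2b},
  \qquad r^{*}=\operatorname*{arg\,min}_{r}\{B(r)+r^{2}\}.
\]
```

The common positive factor a(a/b)^{1/3} is exactly the quantity invoked at the end of the Lemma's proof.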
Using this Lemma, we obtain the display above. In the last step, with D= denoting equality in distribution, we used the fact that −B(h), and hence B(h) + h^2, is symmetric about 0 in distribution. Below we will show that ĥ_n = O_P(1), so by (A.2) and the continuity of arg min{·}, we have, for all x, P(n^{1/3}(F̂_n(t) − F(t)) ≤ x) = P(ĥ_n ≥ 0) → P(ĥ ≥ 0), which gives the desired result. Now we show that ĥ_n = O_P(1). Recall ŝ_n = t + n^{-1/3}ĥ_n, i.e., ĥ_n = n^{1/3}(ŝ_n − t), where ŝ_n = arg min_s {V_n(s) − (F(t) + xn^{-1/3})G_n(s)}. So we only need to show n^{1/3}(ŝ_n − t) = O_P(1). For this, let G_1(·) and G_2(·) be the marginals of G(·, ·). Then ŝ_n = arg max_s M_n(s), and it is easy to check that arg max_s M(s) = t. Here V_1(s) and V_2(s) are the conditional distribution functions P(U ≤ s|Z ≤ U) and P(V ≤ s|Z ≤ V), and V_n(s) is the empirical version of V(s), thus ||V_n − V||_R → 0 a.s. Thus for fixed x and t, ||M_n − M||_R → 0 a.s., and so by Corollary 3.2.3 in VW (p. 287), ŝ_n → t in probability. Also, for s in a small neighborhood of t, and for |s − t| < δ with δ small, the required bounds hold. Let r_n be the largest sequence satisfying the stated condition.

Proof of Theorem 2. (i) Let μ = μ_1 × μ_2, where μ_1 is the Lebesgue measure on (R^+)^2 and μ_2 the counting measure on D := {(0, 0), (0, 1), (1, 1)}; let P_F be the distribution with density p_F, B the Borel field on (R^+)^2 × D, H(p_{F̄_n}, p_F) the Hellinger distance as given before Theorem 3, and ||p_{F̄_n} − p_F|| the variational distance between p_{F̄_n}(·) and p_F (Bickel, Klaassen, Ritov, & Wellner, 1993, p. 464). We will show that H(p_{F̄_n}, p_F) → 0 a.s., so that ||p_{F̄_n} − p_F|| → 0 (a.s.). Note that p_F can be re-written as above, so we will have that G is a Glivenko-Cantelli class with respect to P.
For this, given a probability measure Q and r > 0, we proceed as follows. Below we need to evaluate H_[](ε, G, ||·||_{P_n,1}). Since the displayed bound holds for all F_1, F_2 ∈ F, where Q is the probability measure corresponding to √p_F (after normalization) and μ_2 (by condition (C3) this measure is well defined), it follows by Theorem 24 in Pollard (1984, p. 23) that G is a Glivenko-Cantelli class with respect to P.
Proof of Theorem 3.
It is seen that F = arg max_F̄ l(F̄). Also, for F̄ ∈ F within a small neighborhood of F, i.e., with D(p_F||p_F̄) ≤ 1, we have the displayed bound.

Proof of Theorem 4. Let P_n, P and the Δ_i's be as in the proof of Theorem 1. Note that P_F is the distribution in model (1), not to be confused with P. Then, as in Huang and Wellner (1995), hereafter HW, for any function r(·), we have the displayed identities. The rest of the proof is the same as in HW. We only point out that, in the proof of the o_P(1) part, to show ∫[F̂_n(t) − F(t)]^2 dG(t) →_P 0 we can use our Theorem 2(ii) directly when its conditions are met, instead of the more complicated arguments there; similarly, we only need to apply our Theorem 3(ii) directly when its conditions are met, instead of the arguments there. It is easy to check that the asymptotic variance of √n ∫ ((Δ − F(t))/g(t)) h(t) d(P_n − P)(t, Δ) is σ^2. Now we compute the efficient influence function Ĩ_ν(·) and the information bound for estimating ν(F) via model (1) and constraint (C2). We first compute the efficient influence function of ν(F) without constraint (C2), the extended version Ĩ_{ν,e}(·). By Theorem 2(iii) in Yuan, Xu and Zheng (2011), with x = (u, v, δ, γ), we have Ĩ_{ν,e}(x) = (g_1(u)h(v)/g(u, v)) …. Let r(u, v) = (F(v) − F(u)) log(F(v) − F(u)) − (1 − F(u)) log(1 − F(u)) − F(v) log F(v). Then (C2) is η(F) := E_{P_F}[r(U, V)] = 0. Let Π(s|s_1) be the projection of s onto [s_1], the linear span of s_1, s_1^⊥ the orthogonal complement of [s_1] with respect to P_F, ⟨s_1, s_2⟩_{P_F} = E_{P_F}(s_1 s_2) and ||s||^2_{P_F} = ⟨s, s⟩_{P_F}. By Proposition A.5.2 in Bickel, Klaassen, Ritov and Wellner (1993), the information bound for estimating ν(F) in model (1) with constraint (C2) is E_{P_F}(Ĩ^2_ν(x)). Since generally E_{P_F}(Ĩ^2_ν(x)) ≠ σ^2, ν(F̂_n) is not efficient for ν(F), unlike the corresponding result for the case I interval censoring model.
Proof of Theorem 5. The log-likelihood for θ under this submodel is as displayed, with the local log-likelihood ratio as displayed, where L̇_n(θ) and L̈_n(θ) are the first two derivatives. Note that the ξ_{ni}'s are i.i.d. with E_θ(ξ_{ni}) = 0; H_n(t) = f(t)ca_n^{-1}, and F_n(t|θ) = F(t) + θf(t)ca_n^{-1} ∼ F(t). The displayed expansion then follows.
Below we investigate the convergence rate, in probability, of the approximate NPMLE F̂_n and of any NPMLE F̄_n of F based on model (1). For a parameter b ∈ B, a Banach space, and a probability model p_b, let b̂_n be an estimator of the true parameter b_0 based on n i.i.d. observations from p_{b_0}, and h(p_{b̂_n}, p_{b_0}) the Hellinger distance between p_{b̂_n} and p_{b_0}. By Theorem 2.2.4 in Csörgő and Révész (1981, p. 94), there is a Wiener process W(·) such that the displayed approximation holds. van de Geer (1993) showed H(p_{F̄_n}, p_F) → 0 a.s. For fixed F, let h_F̄ = (√(p_F̄/p_F) − 1)1(p_F > 0), G = {h_F̄ : F̄ ∈ F}, with P_n and P as given in the proof of Theorem 1. By Lemma 1.1 of van de Geer (1993), since F̄_n is the NPMLE of F in model (1), H^2(p_{F̄_n}, p_F) ≤ 2(P_n − P)(1(p_F > 0)[√(p_{F̄_n}/p_F) − 1]).