Parametric Estimation of the Cure Fraction Based on BCH Model Using Left-Censored Data with Covariates

Modern medical investigations allow cured individuals to be incorporated in the analysis, especially for chronic diseases such as cancer. Survival models that incorporate cured patients in the analysis are called cure rate models. In this paper, we propose an analytical approach for parametric estimation of the cure fraction in cancer clinical trials based on the bounded cumulative hazard (BCH) model with covariates involved in the data set. The analysis is constructed by means of the exponential distribution in the case of left censoring and within the framework of the expectation maximization (EM) algorithm. The analysis provides the analytical solution and a simulation study for the cure rate parameter.


Introduction
In cancer clinical trials, the population of patients is considered heterogeneous since it is eventually divided into two categories. One group consists of patients who will never experience the event of concern and who are hence considered cured, while the other group comprises patients who remain uncured. The main interest of cancer trials is estimating the proportion of cured patients, which is an important criterion for elucidating the trends in the survival of cancer patients. Survival models which incorporate the cure fraction in the analysis are called cure rate models.
The first cure model, which is still widely used in survival analysis, is the model constructed by Boag in 1949 and later developed by Berkson and Gage in 1952. This model is called the mixture cure model since it can estimate the proportion of patients cured as well as the survival function of the uncured patients. According to this model, the survival distribution function can be written in terms of the 'mixture' of the cured, plus the uncured, patients such that:
$$S(t) = \pi + (1 - \pi)\,S_u(t), \tag{1}$$
where $\pi$ is the cure fraction and $S_u(t)$ is the survival function of the uncured patients.
This model is described as a parametric or semi-parametric model, depending on whether standard probability distributions are, or are not, employed. If a standard probability distribution like the exponential, Weibull, Gompertz, negative binomial, or the generalized F distribution is used, then the model is parametric. If, on the other hand, the mixture model is used without any standard probability distribution, then the model is described as a semi-parametric model.
In model (1), $S_u(t) = 1 - F_u(t)$, where $F_u(t)$ is the cumulative distribution function of the uncured patients. Furthermore, $F_u(0) = 0$ and $F_u(\infty) = 1$, so that $S(0) = 1$ and $S(\infty) = \pi$, corresponding to the plateau value. Moreover, the hazard function concomitant to this model is $h(t) = (1 - \pi) f_u(t) / [\pi + (1 - \pi) S_u(t)]$, where $f_u(t)$ is the density function attendant to $F_u(t)$.
Even when mixture cure models fit cancer data well, they usually cannot be viewed literally as describing a mixture of cured and uncured individuals. However, it should be highlighted that the literal interpretation of the cure model is meaningful in some non-cancer applications, e.g., Hauek et al. (1997).
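The plateau behaviour of the mixture cure model can be checked numerically. The following is a minimal Python sketch, assuming an exponential survival function for the uncured patients with a rate parameter; the function names and the values of the cure fraction and rate are illustrative and are not taken from the paper.

```python
import math

def mixture_survival(t, pi, rate):
    """Population survival S(t) = pi + (1 - pi) * S_u(t), with exponential S_u."""
    s_u = math.exp(-rate * t)            # survival function of the uncured patients
    return pi + (1.0 - pi) * s_u

def mixture_hazard(t, pi, rate):
    """Population hazard h(t) = (1 - pi) * f_u(t) / S(t)."""
    f_u = rate * math.exp(-rate * t)     # density of the uncured patients
    return (1.0 - pi) * f_u / mixture_survival(t, pi, rate)

pi, rate = 0.3, 0.5                      # illustrative values, not from the paper
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(mixture_survival(t, pi, rate), 4), round(mixture_hazard(t, pi, rate), 4))
```

As t grows, the survival curve levels off at the cure fraction and the population hazard decays towards zero, which is the plateau described for model (1).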
Despite the vast volume of literature on the mixture cure rate model, it has some drawbacks, as illustrated by Chen et al. (1999). The main drawback is that when a set of covariates is included through $\pi$, the model lacks a proportional hazards structure. Therefore, Chen et al. proposed an alternative model which overcomes the drawbacks of the mixture cure model. This alternative model is the "Bounded Cumulative Hazard (BCH)" model, which was initially developed by Yakovlev and co-workers in 1993 (Yakovlev et al., 1993).

The BCH model
The cornerstone of the BCH model is the assumption that an individual in the population is left with $N$ cancer cells after the initial treatment. The cancer cells (often called clonogens) grow rapidly and produce a detectable cancer mass later on. The variable $N$ is not observed and may follow a Poisson, Bernoulli or negative binomial distribution (Rodrigues et al., 2009). Chen et al. (1999) considered the Poisson distribution for $N$ with a mean of $\theta$, and in this paper we adopt Chen's assumption.

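Given $N$, let $Z_i$, $i = 1, 2, \ldots$, be independent random variables with a common distribution function $F(t) = 1 - S(t)$ that is independent of $N$. The variable $Z_i$ denotes the time which the $i$th clonogen takes to produce a detectable cancer mass. Then, the time it takes cancer to relapse can be defined by the random variable $T = \min\{Z_i,\ 0 \le i \le N\}$, such that $P(Z_0 = \infty) = 1$, the $Z_i$ are independent and identically distributed (i.i.d.), and $N$ is independent of the sequence $Z_1, Z_2, \ldots$. Therefore, the survival function for $T$, and hence for the population, is given by (Aljawadi et al., 2011):
$$S_p(t) = P(\text{no cancer by the time } t) = P(N = 0) + P(Z_1 > t, \ldots, Z_N > t,\ N \ge 1) = \exp(-\theta) + \sum_{k=1}^{\infty} S(t)^k \frac{\theta^k}{k!} \exp(-\theta) = \exp(-\theta F(t)). \tag{2}$$
Since $S_p(\infty) = \exp(-\theta) > 0$, (2) is an improper survival function. The study defines the cure fraction ($\pi$) as follows:
$$\pi = S_p(\infty) = P(N = 0) = \exp(-\theta). \tag{3}$$
As $\theta \to \infty$, $\pi \to 0$, whereas as $\theta \to 0$, $\pi \to 1$, i.e., $0 < \pi < 1$. It should be underlined that the first derivative of $F_p(t) = 1 - S_p(t)$ with respect to $t$ is $f_p(t) = \theta f(t) \exp(-\theta F(t))$, where $f(t)$ is the density function of $F(t)$.
Since $F_p(t) = 1 - S_p(t)$, and accordingly $f_p(t) = -\mathrm{d}S_p(t)/\mathrm{d}t$, $S_p(t)$ is an improper survival function, and therefore $f_p(t)$ is an improper probability density function (p.d.f.) as well.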
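The construction of the relapse time as the minimum of a Poisson number of clonogen times can be checked by simulation. The sketch below is a hedged illustration in Python, assuming exponential clonogen times with a rate parameter; the helper names, the sample size and the values of the Poisson mean and the rate are illustrative choices, not values from the paper.

```python
import math
import random

def bch_survival(t, theta, rate):
    """Improper population survival S_p(t) = exp(-theta * F(t)), with exponential F."""
    cdf = 1.0 - math.exp(-rate * t)
    return math.exp(-theta * cdf)

def poisson_draw(theta, rng):
    """Draw N ~ Poisson(theta) by Knuth's multiplication method (fine for small theta)."""
    limit, k, prod = math.exp(-theta), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def relapse_time(theta, rate, rng):
    """T = min(Z_1, ..., Z_N); T is infinite (the patient is cured) when N = 0."""
    n = poisson_draw(theta, rng)
    if n == 0:
        return math.inf
    return min(rng.expovariate(rate) for _ in range(n))

def simulate(theta, rate, size, seed=1):
    rng = random.Random(seed)
    return [relapse_time(theta, rate, rng) for _ in range(size)]

theta, rate = 1.0, 0.5                       # illustrative values, not from the paper
times = simulate(theta, rate, size=20000)
cured_share = sum(math.isinf(t) for t in times) / len(times)
print(cured_share, math.exp(-theta))         # empirical vs. theoretical cure fraction
```

The share of infinite relapse times should be close to exp(-theta), and the empirical proportion of times exceeding any t should be close to exp(-theta F(t)).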

The likelihood function
In this analysis, the likelihood function, equation (5), is constructed for left-censored input data.
When covariates are involved in the analysis, the scale parameter of the exponential distribution ($\lambda$) given the covariates of the $j$th individual can be expressed as $\lambda_j = \exp(\beta' V_j)$, where $V_j$ and $\beta$ are the covariates and coefficients vectors, respectively. Substituting this expression into the log-likelihood function defined in equation (5) gives equation (6). Setting the partial derivatives of (6) with respect to $\theta$ and to each component of $\beta$ equal to zero yields equation (7) and the system of nonlinear equations (8), whose solutions are our desired estimates of $\theta$ and $\beta$.
As the cure status is not fully observed, the EM algorithm will be used. Before employment of the EM algorithm, $g_j$ is defined as the expected value of the $j$th patient being uncured, conditional on the current estimates of $\theta$ and the survival function of uncured patients, $S_u(t)$, and its value is drawn from equation (9) (Peng and Dear, 2000).
For censored individuals $\delta_j = 0$, and hence the equation giving $g_j$ can be simplified accordingly. For simplicity, the complementary quantity, the probability that an individual is cured, is defined in equation (10).
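As a hedged illustration of how left-censored observations typically enter a likelihood under the BCH model, the sketch below writes the textbook form in which an observed relapse contributes the improper density and a left-censored time contributes the improper distribution function. It is not a reconstruction of the authors' equation (5). The indicator $\delta_j$ (1 for an observed relapse, 0 for a left-censored time), the rate parameterization $F(t) = 1 - e^{-\lambda t}$, and the symbols $\lambda$ and $\ell$ are assumptions made for illustration.

```latex
% Sketch under stated assumptions; not the authors' equation (5).
\begin{align*}
S_p(t) &= \exp\{-\theta F(t)\}, \qquad
f_p(t) = \theta f(t)\exp\{-\theta F(t)\}, \\
L(\theta,\lambda) &= \prod_{j=1}^{n}
  \bigl[f_p(t_j)\bigr]^{\delta_j}\,\bigl[1 - S_p(t_j)\bigr]^{1-\delta_j}, \\
\ell(\theta,\lambda) &= \sum_{j=1}^{n}\Bigl\{
  \delta_j\bigl[\log\theta + \log\lambda - \lambda t_j
      - \theta\bigl(1 - e^{-\lambda t_j}\bigr)\bigr]
  + (1-\delta_j)\log\bigl[1 - \exp\{-\theta(1 - e^{-\lambda t_j})\}\bigr]
\Bigr\}.
\end{align*}
```

With covariates, $\lambda$ would be replaced by an individual-specific value for each $j$, so the score equations no longer have a closed form and must be solved numerically.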
The EM algorithm
Suppose that the data vector is in the form $(t_j, \delta_j, c_j, V_j)$ for $j = 1, \ldots, n$. If $m$ represents the number of uncensored individuals, then $n - m$ is the number of censored individuals. Accordingly, the observed data are (i) the lifetime ($t_j$); (ii) the censoring status ($\delta_j = 1$) for $j = 1, \ldots, m$; (iii) the cure status ($c_j = 1$) for $j = 1, \ldots, m$; and (iv) the covariates vector ($V_j$) for all $j$, while the only unobserved data are the cure status for $j = m + 1, \ldots, n$.
In the expectation step (E-step) we determine the expected value of the log-likelihood function (6), conditional on the observed data and the current parameter estimates; this gives the complete-data sufficient statistics in equations (11), (12) and (13). On grounds of the data partition defined above and the sufficient statistics, equation (7) and the system of nonlinear equations in (8) can be rewritten as equation (14) and the system of nonlinear equations (15), respectively.
For the maximization step (M-step), the complete-data maximum likelihood estimates of the parameters are given by equation (14) and the system of nonlinear equations (15). Thus, for an initial value $\theta^{\circ}$, the system of nonlinear equations (15) can be solved with respect to the parameters vector ($\beta$) using any appropriate numerical method, such as the Newton-Raphson method. The solution of the nonlinear equations in (15), together with the initial value $\theta^{\circ}$, can be used to evaluate the complete-data sufficient statistics given by equations (11), (12) and (13). Substitution of these sufficient statistics in equation (14) gives a new value for $\theta$, and the procedure is iterated until convergence; eventually the desired cure fraction is $\pi = \exp(-\theta)$.
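The text leaves the numerical method open, naming Newton-Raphson as one option. As a minimal sketch of that step, and not of the authors' system (15), the Python below applies Newton-Raphson to the score equations of a plain exponential regression in which the rate of individual j is exp(b0 + b1 * v_j), with no censoring and no cure fraction. The function names, the single binary covariate and the simulated data are all hypothetical.

```python
import math
import random

def score_and_hessian(beta, times, covs):
    """Score and Hessian of sum_j [eta_j - t_j * exp(eta_j)], eta_j = b0 + b1 * v_j."""
    u0 = u1 = h00 = h01 = h11 = 0.0
    for t, v in zip(times, covs):
        w = t * math.exp(beta[0] + beta[1] * v)
        u0 += 1.0 - w
        u1 += v * (1.0 - w)
        h00 -= w
        h01 -= v * w
        h11 -= v * v * w
    return (u0, u1), (h00, h01, h11)

def newton_raphson(times, covs, beta=(0.0, 0.0), tol=1e-10, max_iter=50):
    """Iterate beta <- beta - H^{-1} U until the step is smaller than tol."""
    b0, b1 = beta
    for _ in range(max_iter):
        (u0, u1), (h00, h01, h11) = score_and_hessian((b0, b1), times, covs)
        det = h00 * h11 - h01 * h01
        # Newton step with the 2x2 inverse of the Hessian written out explicitly
        d0 = (h11 * u0 - h01 * u1) / det
        d1 = (h00 * u1 - h01 * u0) / det
        b0, b1 = b0 - d0, b1 - d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

rng = random.Random(0)
covs = [j % 2 for j in range(400)]                       # hypothetical binary covariate
times = [rng.expovariate(math.exp(-0.5 + 0.8 * v)) for v in covs]
b0, b1 = newton_raphson(times, covs)
print(round(b0, 3), round(b1, 3))
```

With a single binary covariate the maximum likelihood estimates have a closed form (the log of the number of subjects divided by the total time in each group), which gives a direct check that the iteration has converged to the right point.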

Simulation and results
Simulation studies based on left-censored data involve steps similar to those used for right-censored data. In this study, exponential, binomial and uniform distributions were used to generate the data, which are composed of the lifetimes ($t_j$), censoring status ($\delta_j$), cure status ($c_j$) and covariates vector ($V_j$).
The steps used to generate a left-censored data set are shown below:
a) Generate a variable $T$ for the true survival time from an exponential distribution with various scale parameter values, since we were interested in varying the censoring rate ($P$), which enables identification of the pattern that the cure rate estimation will assume. For the sake of flexibility in finding out how the pattern of the cure rate would progress as the censoring rate increased, both slowly and rapidly, we were interested in the scale parameter $\lambda \in [0.5, 2]$.
b) Generate another variable for the censoring time from a uniform distribution on the interval $(\min T,\ (\min T + \max T)/2)$ to obtain left-censored survival times. This censoring variable is denoted as $C$.
c) Compare the true survival time $T$ with the censoring time $C$. Then the lifetimes and censoring indicator can be defined as follows: $t_j = \max(T_j, C_j)$, with $\delta_j = 1$ if $T_j \ge C_j$ and $\delta_j = 0$ otherwise.
In this simulation we generated 20 data sets, each with 100 individuals, where as a special case for each cohort we considered the left-censored individuals to be cured. Thus, the cure indicator can be defined in the same manner as the censoring indicator, with $c_j = 0$ if the $j$th individual is left censored and $c_j = 1$ otherwise.
Regarding the covariates, only two covariates were considered: gender, which was drawn from a binomial distribution; and type of treatment, i.e., chemotherapy or radiotherapy, which was also drawn from a binomial distribution.
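The generation steps above can be sketched in a few lines. The paper reports using R; the Python below is only an illustrative sketch under several assumptions: the exponential parameter is treated as a rate, the upper bound of the censoring interval is taken as the midpoint of the smallest and largest true times, an individual is left censored when the true time falls below the censoring time, and both covariates are Bernoulli with probability 0.5. The function name and dictionary keys are hypothetical.

```python
import random

def generate_left_censored(n, rate, seed=0):
    """Generate one left-censored data set following steps (a) to (c)."""
    rng = random.Random(seed)
    true_times = [rng.expovariate(rate) for _ in range(n)]       # step (a)
    lo = min(true_times)
    hi = (min(true_times) + max(true_times)) / 2.0               # assumed upper bound
    data = []
    for t_true in true_times:
        c_time = rng.uniform(lo, hi)                             # step (b)
        delta = 1 if t_true >= c_time else 0                     # step (c): 0 = left censored
        data.append({
            "time": max(t_true, c_time),      # censored subjects are recorded at c_time
            "delta": delta,
            "uncured": delta,                 # special case: left-censored subjects count as cured
            "gender": int(rng.random() < 0.5),       # Bernoulli(0.5), probability assumed
            "treatment": int(rng.random() < 0.5),    # 0/1 coding of the two treatments, assumed
        })
    return data

data = generate_left_censored(n=100, rate=0.5, seed=1)
censoring_rate = 1 - sum(row["delta"] for row in data) / len(data)
print(round(censoring_rate, 2))
```

Changing the rate argument changes how many true times fall below their censoring times, which is how the censoring rate can be varied across data sets.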
In this simulation we were interested in the bias and in the mean square error (MSE), where bias is commonly defined as the difference between the expected value of an estimator and the true value, as given by $\mathrm{bias}(\hat{\pi}) = E(\hat{\pi}) - \pi$, where $\hat{\pi}$ is the maximum likelihood estimate for $\pi$.
The MSE of an estimator is the expected squared deviation of the estimated parameter value from the true one. By using a standard notation for a scalar parameter, it can be expressed in the following form: $\mathrm{MSE}(\hat{\pi}) = E[(\hat{\pi} - \pi)^2]$.
The simulation was carried out with the built-in random generators in the R statistical software, and the final results are presented in Table 3.1 and Figure 3.1.
These results show the parametric estimation of the cure fraction when covariates were included in the analysis. The bias and MSE values for the various censoring rates indicate that the proposed method of cure rate estimation was more efficient when the censoring rate was low than when it was high, and that the estimation started to diverge under heavy censoring. Hence, increasing the proportion of censored data distorts the estimated parameters, whereas decreasing it improves them.
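The two summary measures used in the simulation are straightforward to compute from replicated estimates. The Python sketch below uses the Monte Carlo versions of the definitions (averages over data sets in place of expectations); the function names, the four estimates and the true value are made-up numbers for illustration and are not results from the paper.

```python
def bias(estimates, true_value):
    """Monte Carlo bias: mean of the estimates minus the true value."""
    return sum(estimates) / len(estimates) - true_value

def mse(estimates, true_value):
    """Monte Carlo MSE: mean squared deviation of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

pi_hats = [0.35, 0.40, 0.38, 0.42]     # hypothetical cure fraction estimates
pi_true = 0.37                         # hypothetical true cure fraction
print(bias(pi_hats, pi_true), mse(pi_hats, pi_true))
```

The MSE combines both sources of error: it equals the variance of the estimates across data sets plus the squared bias.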

Conclusion
By assuming that both the cumulative distribution function and the probability density function are known, the researchers investigated the parametric maximum likelihood estimation equations of the cure fraction. The analysis considered the left censoring case based on the bounded cumulative hazard model, with the exponential distribution used to represent the survival function of the uncured patients. Covariates were involved in the analysis via the scale parameter of the exponential distribution, and the parametric estimation equations of the parameters can be solved numerically by an appropriate numerical method, since the researchers could not find an explicit solution. The results demonstrate that cure fraction estimation based on the proposed procedure performed better when the censoring rate was low than when it was high.
Suppose that $T$ is a random variable with probability density function $f(t; \theta)$, with $\theta$ to be estimated, and that $t_1, t_2, \ldots, t_n$ is a random sample of size $n$; then the joint probability density function is given as $L(\theta; t_1, \ldots, t_n) = \prod_{i=1}^{n} f(t_i; \theta)$. In parametric maximum likelihood estimation, both the cumulative distribution function $F(\cdot)$ and the probability density function $f(\cdot)$ for the entire population are known. This study employs the exponential

Table 3.1 Results of the simulation based on the scale parameter $\lambda \in [0.5, 2]$ and 20 data sets.
Figure 3.1 Censoring rate versus bias based on the results in Table 3.1.