A Parametric Approach to Estimate Survival Time of Diabetic Nephropathy with Left Truncated and Right Censored Data

Diabetic nephropathy (DN) is the deleterious effect of diabetes mellitus on renal structure or function. The rate of rise in serum creatinine is used as the marker for the progression of DN. Retrospective data were collected from 132 patients, of whom 60 are uncensored cases and the remaining 72 are censored cases. In this paper we develop three models. The first model is based only on uncensored cases with left truncation. The second model also includes censored data. The method of maximum likelihood is used to estimate the unknown parameters in both of these models, and for the second model Fisher's information matrix is applied to find the variances and asymptotic confidence intervals of the estimated parameters. In the last model, onset times are arranged in ascending order to estimate the mean onset time of DN for all the patients by applying order statistics. We aim to predict the onset time of DN for new diabetic patients.


Introduction
Diabetic nephropathy (DN) develops in 20-40% of patients within 10 to 15 years after the onset of diabetes. About one third of those affected eventually have progressive deterioration of renal function (Remuzzi, Schieppati & Ruggenenti, 2002). Data from the UKPDS demonstrated that approximately 25% of patients with type-2 diabetes develop nephropathy or worse by 10 years (UKPDS, 1998a, 1998b). It is also estimated that almost 50% of patients develop DN within 19 years of the diagnosis of diabetes. According to the American Diabetes Association, about 20-30% of patients with type-1 or type-2 diabetes develop nephropathy (ADA, 2004). Thus, DN is a serious problem in terms of financial burden, morbidity and mortality in the developed world (Banerjee, Ghosh & Saha, 2005).
In many prospective and retrospective studies, survival data are subject to left truncation in addition to the usual right censoring. Left truncated and right censored (LTRC) data are widely used in survival analysis. Alioum and Commenges proposed proportional hazards models for arbitrarily censored and truncated data to estimate the distribution of the time from diabetes onset to the development of DN (Alioum & Commenges, 1996). Enzo, Vittorio and others applied conditional probability to find the probability that a diabetic patient will develop a second complication, given that the first complication has already developed (Enzo, Vittorio et al., 2003); they also apply Bayes' formula to the same problem. Chappell, Fine and Jiang considered a semi-parametric analysis of LTRC survival data to quantify the association between time to DN and diabetes-related death, and to estimate the probability of developing DN after diagnosis of insulin-dependent diabetes (Chappell, Fine & Jiang, 2005). Sparling, Younes and Lachin applied a parametric family of regression models for interval-censored event-time data that accommodates both fixed (e.g. baseline) and time-dependent covariates. Their model employs a three-parameter family of survival distributions that includes the Weibull, negative binomial and log-logistic distributions as special cases, and it can be applied to data with left-, right-, interval- or non-censored event times. They applied the model to data from diabetic patients to describe the effects of longitudinal measures of glycemia on the risk of progression of diabetic retinopathy (Sparling, Younes & Lachin, 2006).
Medical and epidemiological studies are mostly conducted to measure the occurrence of an outcome event. In this paper we estimate the time from the onset of diabetes to the development of DN, subject to censoring and truncation. A Weibull distribution is fitted to both the uncensored data and the LTRC data. For the uncensored cases, the Weibull fit is tested through (i) the chi-square goodness of fit test and (ii) the log-cumulative hazard plot. For the data including censored cases, the fit is tested by (i) the Hollander and Proschan test and (ii) modified Cox-Snell residuals. The unknown shape (γ) and scale (λ) parameters of the distribution are estimated by the method of maximum likelihood, and for the LTRC data the Fisher information matrix is used to construct the variances and asymptotic confidence intervals of these parameters. For the application of order statistics, the DN onset times of the $n_1$ uncensored cases are arranged in ascending order as $t_{(1)}, t_{(2)}, \dots, t_{(i)}, \dots, t_{(n_1)}$. The probability density functions and means of the minimum and maximum DN onset times are then estimated using order statistics. We also estimate the mean DN onset time for every patient in the advanced nephropathy group, and compare the observed DN onset times with those obtained from order statistics. The rest of the paper is organized as follows: Section 2 develops the models, Section 3 applies them to data on type-2 diabetic patients, and Section 4 contains the discussion.

Model for Uncensored but Left Truncated Data
Let $t_{(1)}, t_{(2)}, \dots, t_{(i)}, \dots, t_{(n_1)}$ be the exact DN onset times of the $n_1$ individuals, out of the $n$ patients, whose onset times are known in this study. Left truncation arises when a subject is not included in the study because its diabetic nephropathy onset time (the event of interest) originated prior to the starting time of the study. The DN onset times are assumed to follow a Weibull distribution. The probability density and survival functions of a W(λ, γ) distribution for the $i$th ($i = 1, 2, \dots, n_1$) patient are given by:
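As a minimal sketch, one standard way to write a left truncated Weibull model in Python is shown below. It assumes the Lee & Wang (2003) parameterization $S(t) = \exp[-(\lambda t)^\gamma]$ and a fixed truncation point $t_0 = 5$ years (the minimum diabetes duration used in this study). Under left truncation the density and survival function are conditioned on $T > t_0$, i.e. they become $f(t)/S(t_0)$ and $S(t)/S(t_0)$ for $t \ge t_0$. The function names and the parameter values in the example are hypothetical and need not match the authors' equation (1).

```python
import math

# Assumed parameterization (Lee & Wang form): S(t) = exp(-(lam*t)**gam).
# T0 = 5.0 years is the fixed left-truncation point assumed for this sketch.
T0 = 5.0

def weibull_survival(t, lam, gam):
    """Untruncated Weibull survival function."""
    return math.exp(-((lam * t) ** gam))

def weibull_pdf(t, lam, gam):
    """Untruncated Weibull density f(t) = lam*gam*(lam*t)**(gam-1) * S(t)."""
    return lam * gam * (lam * t) ** (gam - 1) * weibull_survival(t, lam, gam)

def lt_weibull_pdf(t, lam, gam, t0=T0):
    """Density of the onset time given that it exceeds t0; zero below t0."""
    if t < t0:
        return 0.0
    return weibull_pdf(t, lam, gam) / weibull_survival(t0, lam, gam)

def lt_weibull_survival(t, lam, gam, t0=T0):
    """P(onset > t | onset > t0); equal to 1 up to t0."""
    if t <= t0:
        return 1.0
    return weibull_survival(t, lam, gam) / weibull_survival(t0, lam, gam)

# Illustrative (hypothetical) parameter values, not estimates from this study
print(lt_weibull_survival(10.0, 0.08, 2.8))
```

Dividing by $S(t_0)$ is what makes the truncated density integrate to one over $[t_0, \infty)$; the same conditioning reappears in every likelihood term below.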

Estimation of Unknown Parameters Using Maximum Likelihood Estimation
The log-likelihood function is given as follows:

The maximum likelihood estimates of λ and γ are found by differentiating the above function with respect to λ and γ and equating the derivatives to zero. The resulting equations are:

The maximum likelihood estimates (MLEs) of λ and γ, denoted $\hat{\lambda}$ and $\hat{\gamma}$, are obtained by solving equations (4) and (5) simultaneously.
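Under the same assumed parameterization and fixed truncation point $t_0$, the log-likelihood of the $n_1$ uncensored, left truncated onset times simplifies to $\ell(\lambda,\gamma) = n_1\log\gamma + n_1\gamma\log\lambda + (\gamma-1)\sum_i \log t_i - \lambda^\gamma \sum_i (t_i^\gamma - t_0^\gamma)$. Setting $\partial\ell/\partial\lambda = 0$ gives $\lambda^\gamma = n_1 / \sum_i (t_i^\gamma - t_0^\gamma)$ for any fixed γ, so the two equations reduce to a one-dimensional search over γ. The sketch below does this with a golden-section search on the profile log-likelihood. This profile approach is an illustrative choice, not necessarily how equations (4) and (5) were solved, and it assumes the profile is unimodal on the search bracket; the data are simulated.

```python
import math
import random

def loglik_lt(lam, gam, times, t0=5.0):
    """Log-likelihood of uncensored onset times, each conditioned on exceeding t0."""
    n = len(times)
    return (n * math.log(gam) + n * gam * math.log(lam)
            + (gam - 1) * sum(math.log(t) for t in times)
            - lam ** gam * sum(t ** gam - t0 ** gam for t in times))

def lam_given_gam(gam, times, t0=5.0):
    """Root of d(loglik)/d(lambda) = 0 for fixed gamma: lam**gam = n / sum(t**gam - t0**gam)."""
    return (len(times) / sum(t ** gam - t0 ** gam for t in times)) ** (1.0 / gam)

def mle_lt_weibull(times, t0=5.0, lo=0.05, hi=20.0, tol=1e-8):
    """Maximize the profile log-likelihood in gamma by golden-section search."""
    def profile(g):
        return loglik_lt(lam_given_gam(g, times, t0), g, times, t0)
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = profile(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = profile(d)
    gam = (a + b) / 2
    return lam_given_gam(gam, times, t0), gam

# Simulated (not study) data: inverse-transform draws from a Weibull truncated at t0
random.seed(1)
true_lam, true_gam, t0 = 0.08, 2.8, 5.0
sample = [((true_lam * t0) ** true_gam - math.log(1.0 - random.random())) ** (1 / true_gam) / true_lam
          for _ in range(2000)]
print(mle_lt_weibull(sample, t0))   # should be close to (0.08, 2.8)
```

Profiling out λ keeps the search one-dimensional, which is why no general-purpose optimizer is needed here.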

Estimated Mean, Variance and 95% Confidence Interval of DN Onset Time from the Diagnosis of Diabetes
Using the MLEs $\hat{\lambda}$ and $\hat{\gamma}$, we obtain the estimated mean, variance and 95% confidence interval of the DN onset (survival) time, which are given by:
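The mean and variance of a left truncated onset time can be written as $E[T \mid T > t_0] = t_0 + \int_{t_0}^{\infty} S(t)/S(t_0)\,dt$ and $E[T^2 \mid T > t_0] = t_0^2 + \int_{t_0}^{\infty} 2t\,S(t)/S(t_0)\,dt$. The sketch below evaluates these integrals with Simpson's rule under the same assumed parameterization $S(t) = \exp[-(\lambda t)^\gamma]$. It is a numerical stand-in for closed-form expressions (which involve incomplete gamma functions); the function name and example values are hypothetical.

```python
import math

def lt_weibull_moments(lam, gam, t0=5.0, n_steps=4000):
    """Mean and variance of T given T > t0 for S(t) = exp(-(lam*t)**gam), by Simpson's rule.

    E[T | T > t0]   = t0   + integral_{t0}^inf S(t)/S(t0) dt
    E[T^2 | T > t0] = t0^2 + integral_{t0}^inf 2 t S(t)/S(t0) dt
    n_steps must be even.
    """
    h0 = (lam * t0) ** gam                      # cumulative hazard at t0
    upper = (h0 + 40.0) ** (1.0 / gam) / lam    # conditional survival is exp(-40) here
    def cond_surv(t):
        return math.exp(h0 - (lam * t) ** gam)
    h = (upper - t0) / n_steps
    s1 = s2 = 0.0
    for k in range(n_steps + 1):
        t = t0 + k * h
        w = 1 if k in (0, n_steps) else (4 if k % 2 else 2)   # Simpson weights
        s1 += w * cond_surv(t)
        s2 += w * 2 * t * cond_surv(t)
    mean = t0 + s1 * h / 3
    second = t0 ** 2 + s2 * h / 3
    return mean, second - mean ** 2

print(lt_weibull_moments(0.1, 1.0, 5.0))   # approximately (15.0, 100.0)
```

With γ = 1 the distribution is exponential and memoryless, so the truncated mean is $t_0 + 1/\lambda$ and the variance $1/\lambda^2$; the example above (λ = 0.1, $t_0$ = 5) is a convenient check of the integration.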

Model for Left Truncated and Right Censored Data
All subjects under study experienced an initial event E1 (diagnosis of diabetes as per ADA standards), but not all of them experienced the second event E2, i.e. the event of interest (onset of renal disease/diabetic nephropathy), before the study was terminated in November 2007. Right censoring occurs when the DN onset time (the second event) of a subject is not observed: the end time of the study is fixed, and patients enter the study at different times, i.e. the duration of diabetes differs between patients. This form of censoring is generalized Type I censoring (Lee & Wang, 2003). It can be explained as follows. Suppose that there are four patients with different durations of diabetes, the study is terminated at a fixed time, and each individual's minimum diabetes duration is 5 years (left truncation). The starting time is then backed up to 5 years for all four individuals. The DN onset time of the first patient is observed before the end of the study, so it is an uncensored case ($\delta_1 = 1$); the DN onset time of the second patient cannot be observed before the end of the study, so it is a censored case ($\delta_2 = 0$); the DN onset time of the third patient is known, an uncensored case ($\delta_3 = 1$); and the DN onset time of the fourth patient is unknown, a censored case ($\delta_4 = 0$). The DN onset times of the first and third patients need not be the same, and neither need the censoring times of the second and fourth patients, because the patients under study have different durations of diabetes.
There are $n$ patients under study; the diabetic nephropathy onset times of $n_1$ patients are uncensored, and those of the remaining $n - n_1$ patients are censored. The DN onset time from the diagnosis of diabetes is a non-negative random variable, assumed to follow a Weibull distribution. The probability density function of a W(λ, γ) distribution for the $i$th ($i = 1, 2, \dots, n$) patient is given by (Lee & Wang, 2003):

MLE for Unknown Parameters for Right Censored and Fixed Left Truncated Weibull Distribution
When the censoring times $T_i$ differ between patients, the likelihood function for the survival times is defined as:

where $\delta_i = 0$ if the DN onset time of the $i$th patient is censored and $\delta_i = 1$ if it is observed. The corresponding log-likelihood function is given by:

The maximum likelihood estimates of λ and γ are found by partially differentiating the above function with respect to λ and γ and equating the derivatives to zero. The resulting equations are:

The MLEs $\hat{\lambda}$ and $\hat{\gamma}$ are obtained by solving equations (11) and (12) by trial and error.
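With a fixed truncation point $t_0$ and censoring indicators $\delta_i$, the same assumed parameterization gives $\ell(\lambda,\gamma) = d\log\gamma + d\gamma\log\lambda + (\gamma-1)\sum_{\delta_i=1}\log t_i - \lambda^\gamma\sum_{i=1}^{n}(t_i^\gamma - t_0^\gamma)$, where $d = \sum_i \delta_i$. For fixed γ the λ-equation has the closed-form root $\lambda^\gamma = d/\sum_i (t_i^\gamma - t_0^\gamma)$. The sketch below turns the trial-and-error solution into a systematic, repeatedly refined grid search over γ. The simulated data, function names and refinement settings are all hypothetical.

```python
import math
import random

def ltrc_loglik(lam, gam, times, events, t0=5.0):
    """Weibull log-likelihood with fixed left truncation at t0 and right censoring.

    events[i] = 1 if the onset is observed (delta_i = 1), 0 if censored (delta_i = 0).
    Every term is conditioned on survival past t0; S(t) = exp(-(lam*t)**gam) is assumed.
    """
    d = sum(events)
    return (d * (math.log(gam) + gam * math.log(lam))
            + (gam - 1) * sum(math.log(t) for t, e in zip(times, events) if e)
            - lam ** gam * sum(t ** gam - t0 ** gam for t in times))

def lam_given_gam(gam, times, events, t0=5.0):
    """Root of the lambda score equation for fixed gamma: lam**gam = d / sum(t**gam - t0**gam)."""
    return (sum(events) / sum(t ** gam - t0 ** gam for t in times)) ** (1.0 / gam)

def mle_ltrc_weibull(times, events, t0=5.0, lo=0.05, hi=20.0, levels=6, points=41):
    """Systematic trial and error: refine a grid over gamma around the best profile value."""
    def profile(g):
        return ltrc_loglik(lam_given_gam(g, times, events, t0), g, times, events, t0)
    best = lo
    for _ in range(levels):
        step = (hi - lo) / (points - 1)
        best = max((lo + k * step for k in range(points)), key=profile)
        lo, hi = max(best - step, 1e-3), best + step   # zoom in around the current best
    return lam_given_gam(best, times, events, t0), best

# Simulated (not study) data: onsets from a Weibull truncated at t0,
# censored at a uniformly distributed end-of-follow-up duration.
random.seed(7)
lam0, gam0, t0 = 0.07, 1.8, 5.0
times, events = [], []
for _ in range(3000):
    onset = ((lam0 * t0) ** gam0 - math.log(1.0 - random.random())) ** (1 / gam0) / lam0
    censor = random.uniform(t0, 30.0)
    times.append(min(onset, censor))
    events.append(1 if onset <= censor else 0)
print(mle_ltrc_weibull(times, events, t0))   # should be close to (0.07, 1.8)
```

Censored cases enter only through the $-\lambda^\gamma(t_i^\gamma - t_0^\gamma)$ term, which is the log of their conditional survival probability.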

Approximate Fisher Information Matrix to Construct Asymptotic Confidence Intervals of the Unknown Parameters
In this subsection, the approximate Fisher information matrix of the maximum likelihood estimators of the Weibull parameters for left truncated and right censored data is obtained; it can be used to construct asymptotic variances and confidence intervals of the parameters (Gross & Clark, 1975). The Fisher information matrix can be written as:

The approximate 95% confidence intervals for λ and γ are $\hat{\lambda} \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\hat{\lambda})}$ and $\hat{\gamma} \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\hat{\gamma})}$, respectively, where $\widehat{\mathrm{Var}}(\hat{\lambda})$ and $\widehat{\mathrm{Var}}(\hat{\gamma})$ are the diagonal elements of the variance-covariance matrix, i.e. of the inverse of the Fisher information matrix.
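A common practical stand-in for the Fisher information matrix is the observed information, the negative Hessian of the log-likelihood evaluated at the MLEs; its inverse estimates the variance-covariance matrix. The sketch below approximates the Hessian by central finite differences for a generic two-parameter log-likelihood, inverts the 2×2 matrix, and forms the intervals $\hat\theta \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\hat\theta)}$. Using observed rather than expected information, and numerical rather than analytic derivatives, are assumptions of this sketch; the function names are hypothetical.

```python
import math

def observed_information(loglik, theta, h=1e-4):
    """Negative Hessian of a two-parameter log-likelihood at theta, by central differences."""
    x, y = theta
    hx, hy = h * max(abs(x), 1.0), h * max(abs(y), 1.0)
    f0 = loglik(x, y)
    fxx = (loglik(x + hx, y) - 2 * f0 + loglik(x - hx, y)) / hx ** 2
    fyy = (loglik(x, y + hy) - 2 * f0 + loglik(x, y - hy)) / hy ** 2
    fxy = (loglik(x + hx, y + hy) - loglik(x + hx, y - hy)
           - loglik(x - hx, y + hy) + loglik(x - hx, y - hy)) / (4 * hx * hy)
    return [[-fxx, -fxy], [-fxy, -fyy]]

def invert_2x2(m):
    """Variance-covariance matrix as the inverse of the 2x2 information matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def wald_ci(estimate, variance, z=1.96):
    """Asymptotic 95% interval: estimate +/- 1.96 * sqrt(variance)."""
    half = z * math.sqrt(variance)
    return estimate - half, estimate + half

# Check on a quadratic log-likelihood whose information matrix is exactly [[2, 1], [1, 3]]
quad = lambda a, b: -0.5 * (2 * a * a + 2 * a * b + 3 * b * b)
cov = invert_2x2(observed_information(quad, (0.0, 0.0)))
print(cov[0][0], cov[1][1])        # approximately 0.6 and 0.4
print(wald_ci(1.0, cov[0][0]))     # 1.0 +/- 1.96 * sqrt(0.6)
```

In practice `loglik` would be the LTRC log-likelihood as a function of λ and γ (with the data held fixed), evaluated at $(\hat\lambda, \hat\gamma)$; the quadratic example is used only because its inverse information matrix, [[0.6, −0.2], [−0.2, 0.4]], is known exactly.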

Estimated Mean, Variance and 95% Confidence Interval of DN Onset Time
Using the maximum likelihood estimates $\hat{\lambda}$ and $\hat{\gamma}$ for the LTRC data, we obtain the estimated mean, variance and 95% confidence interval of the onset (survival) time, which are given by:

where $T$ denotes the maximum duration of diabetes among the $n_1$ (uncensored) patients under study. These results are compared with those obtained for the uncensored cases and the observed cases.
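One possible reading of the role of $T$ in these expressions is that the onset time is treated as truncated to the interval $(t_0, T]$. Under that assumption (not confirmed by the text) and the same assumed parameterization, the mean and variance can be obtained by numerically integrating $t f(t)$ and $t^2 f(t)$ over $(t_0, T]$ and dividing by $S(t_0) - S(T)$, as sketched below. The function name and example values are hypothetical.

```python
import math

def interval_truncated_weibull_moments(lam, gam, t0, T, n_steps=2000):
    """Mean and variance of a Weibull onset time restricted to (t0, T].

    Assumes S(t) = exp(-(lam*t)**gam) and 0 < t0 < T; n_steps must be even (Simpson's rule).
    """
    def surv(t):
        return math.exp(-((lam * t) ** gam))
    def pdf(t):
        return lam * gam * (lam * t) ** (gam - 1) * surv(t)
    mass = surv(t0) - surv(T)                  # P(t0 < onset <= T)
    h = (T - t0) / n_steps
    m1 = m2 = 0.0
    for k in range(n_steps + 1):
        t = t0 + k * h
        w = 1 if k in (0, n_steps) else (4 if k % 2 else 2)   # Simpson weights
        m1 += w * t * pdf(t)
        m2 += w * t * t * pdf(t)
    mean = m1 * h / 3 / mass
    var = m2 * h / 3 / mass - mean ** 2
    return mean, var

# Hypothetical inputs: t0 = 5 years, T = 27 years, illustrative parameter values
print(interval_truncated_weibull_moments(0.07, 1.8, 5.0, 27.0))
```

For γ = 1 the result can be checked against the truncated exponential mean $t_0 + 1/\lambda - L e^{-\lambda L}/(1 - e^{-\lambda L})$ with $L = T - t_0$.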

The Probability Density Function and Mean Time of First, Last and rth Order Statistic for the Patients Who Proceed Towards Nephropathy
Let $\tau_1 \le \tau_2 \le \dots \le \tau_i \le \dots \le \tau_n$ be the ordered durations of diabetes of the $n$ patients, and let $n_1$ be the number of uncensored diabetic nephropathy cases, with ordered onset times $t_{(1)} \le t_{(2)} \le \dots \le t_{(i)} \le \dots \le t_{(n_1)}$, where $t_{(1)}$ and $t_{(n_1)}$ denote, respectively, the minimum and maximum times taken by a type-2 diabetic patient to proceed towards nephropathy among the $n_1$ patients. The DN onset times $t_{(1)}, t_{(2)}, \dots, t_{(i)}, \dots, t_{(n_1)}$ follow the left truncated Weibull distribution with parameters λ and γ given in equation (1).
The probability density function and mean of the first order statistic, i.e. for the patient who has taken the minimum time to proceed towards nephropathy, are given as:

Here we are dealing with right censored data, and $T$ denotes the maximum duration of diabetes among the $n_1$ patients under study.
The probability density function and mean of the $r$th order statistic are given as:

The probability density function and mean of the $n_1$th order statistic, i.e. for the patient who has taken the maximum time to proceed towards nephropathy, are given as:

Thus, the mean times of the first (minimum), last (maximum) and $r$th order statistics for the patients who proceed towards nephropathy are computed.
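For $n$ independent draws with density $f$ and distribution function $F$, the $r$th order statistic has density $f_{(r)}(t) = \frac{n!}{(r-1)!\,(n-r)!}F(t)^{r-1}[1-F(t)]^{n-r}f(t)$; $r = 1$ gives the minimum and $r = n$ the maximum. The sketch below evaluates this density and its mean by Simpson's rule for any supplied $f$ and $F$, and applies it to a left truncated Weibull under the assumed parameterization $S(t) = \exp[-(\lambda t)^\gamma]$, with $t_0 = 5$ and hypothetical parameter values. It treats the onset times as independent and identically distributed, which is the standard order-statistic assumption.

```python
import math

def order_stat_pdf(t, r, n, pdf, cdf):
    """Density of the r-th smallest of n i.i.d. draws; comb(n, r) * r = n! / ((r-1)! (n-r)!)."""
    F = cdf(t)
    return math.comb(n, r) * r * F ** (r - 1) * (1 - F) ** (n - r) * pdf(t)

def order_stat_mean(r, n, pdf, cdf, a, b, n_steps=2000):
    """E[t_(r)] = integral of t * f_(r)(t) over [a, b] by Simpson's rule (n_steps even)."""
    h = (b - a) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        t = a + k * h
        w = 1 if k in (0, n_steps) else (4 if k % 2 else 2)
        total += w * t * order_stat_pdf(t, r, n, pdf, cdf)
    return total * h / 3

# Left truncated Weibull (assumed form) with illustrative parameters and t0 = 5
lam, gam, t0 = 0.08, 2.8, 5.0
H0 = (lam * t0) ** gam
cdf = lambda t: 1.0 - math.exp(H0 - (lam * t) ** gam)
pdf = lambda t: lam * gam * (lam * t) ** (gam - 1) * math.exp(H0 - (lam * t) ** gam)
upper = (H0 + 40.0) ** (1 / gam) / lam   # conditional survival is exp(-40) here
n1 = 27
print(order_stat_mean(1, n1, pdf, cdf, t0, upper))    # mean of the minimum
print(order_stat_mean(n1, n1, pdf, cdf, t0, upper))   # mean of the maximum
```

As a check, for the uniform distribution on [0, 1] the mean of the $r$th of $n$ order statistics is $r/(n+1)$, so `order_stat_mean(1, 27, lambda t: 1.0, lambda t: t, 0.0, 1.0)` should be close to 1/28.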

Model Checking
There are a number of techniques for evaluating the fit of parametric survival models, including analogues of residuals and influence diagnostics (Collett, 2003). To assess the adequacy of the present model, two methods have been applied to test the distributional assumption for each of the uncensored and the censored data. First, for uncensored data, the log-cumulative hazard plot is used when a single sample of survival data is available. To obtain the log-cumulative hazard plot, first compute the Kaplan-Meier estimate of the survivor function, $\hat{S}(t)$, at each uncensored observation, and then find the survivor function $S(t)$ of the fitted distribution. If the plot of $\log(-\log S(t))$ against $\log t$ is approximately a straight line and close to the line obtained by plotting $\log(-\log \hat{S}(t))$ against $\log t$, the assumed distribution is tenable. Second, the fit has been tested by Karl Pearson's chi-square goodness of fit test. For censored data, first, the modified Cox-Snell residual plot proposed by Crowley and Hu is used to assess the fitted distribution.
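The coordinates of the log-cumulative hazard plot can be computed as below: a Kaplan-Meier estimate at each distinct event time, then $\log(-\log \hat S(t))$ and $\log(-\log S(t))$ against $\log t$. For a Weibull model with $S(t) = \exp[-(\lambda t)^\gamma]$ (the parameterization assumed in these sketches), $\log(-\log S(t)) = \gamma\log\lambda + \gamma\log t$, a straight line with slope γ. The sketch assumes all subjects are at risk from a common origin; with a fixed truncation point, `surv_fit` should be the truncated survival function $S(t)/S(t_0)$. Points where either survival value is 0 or 1 are dropped because the double logarithm is undefined there. Names and toy data are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """[(t, S_hat(t))] at each distinct event time; censored ties at t stay at risk at t."""
    data = sorted(zip(times, events))
    at_risk, s, out, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            s *= 1 - deaths / at_risk
            out.append((t, s))
        at_risk -= removed
    return out

def log_cum_hazard_points(km, surv_fit):
    """Triples (log t, log(-log S_km), log(-log S_fit)), skipping undefined double logs."""
    pts = []
    for t, s in km:
        sf = surv_fit(t)
        if 0 < s < 1 and 0 < sf < 1:
            pts.append((math.log(t), math.log(-math.log(s)), math.log(-math.log(sf))))
    return pts

# Hypothetical toy data (onset times in years, 1 = observed) and illustrative fitted curve
km = kaplan_meier([6.0, 8.5, 8.5, 11.0, 14.2], [1, 1, 0, 1, 1])
fitted = lambda t: math.exp((0.08 * 5.0) ** 2.8 - (0.08 * t) ** 2.8)
for point in log_cum_hazard_points(km, fitted):
    print(point)
```

If the second and third coordinates both lie close to a common straight line against $\log t$, the Weibull assumption is supported, as described in the text.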
To include censored observations, the Cox-Snell residual is modified by the addition of a positive constant Δ, the excess residual. Suppose that the $i$th survival time is a censored observation, $t_i^*$, and let $t_i$ be the actual, but unknown, survival time, so that $t_i > t_i^*$. Then the modified Cox-Snell residual for the $i$th censored observation is:

where $\hat{S}_i(t_i^*)$ and $\hat{H}_i(t_i^*)$ are the estimated KM survival and cumulative hazard functions, respectively, for the $i$th individual at the censored survival time. After computing $r'_{Ci}$, it is plotted against $-\log S(t)$, where $S(t)$ is the survival function of the fitted distribution. If this plot is close to a straight line passing through the origin, the model fitted to the data is satisfactory. Second, the Hollander and Proschan test has been applied to compare the survival function of the fitted distribution with the survival function obtained by the Kaplan-Meier method (Hollander & Proschan, 1979).
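The residuals can be sketched in the form described by Collett (2003): the Cox-Snell residual is the fitted cumulative hazard $r_{Ci} = \hat H(t_i) = -\log S(t_i)$, and for a censored observation the excess Δ is added. Because correctly specified Cox-Snell residuals behave like a censored sample from a unit exponential distribution, which is memoryless, Δ = 1 (its mean) or Δ = log 2 ≈ 0.693 (its median) are the usual choices. Using the fitted cumulative hazard in its left truncated form $(\lambda t)^\gamma - (\lambda t_0)^\gamma$ is an assumption of this sketch; the function names, toy data and parameter values are hypothetical.

```python
import math

def weibull_cum_hazard_lt(t, lam, gam, t0=5.0):
    """Cumulative hazard of the left truncated Weibull: H(t) - H(t0), with H(t) = (lam*t)**gam."""
    return (lam * t) ** gam - (lam * t0) ** gam

def modified_cox_snell(times, events, cum_hazard, excess=1.0):
    """r'_i = H_fit(t_i) for observed onsets, H_fit(t_i) + excess for censored cases."""
    return [cum_hazard(t) + (0.0 if e else excess) for t, e in zip(times, events)]

times, events = [6.0, 9.5, 12.0, 15.5], [1, 0, 1, 0]    # hypothetical toy data
H = lambda t: weibull_cum_hazard_lt(t, 0.07, 1.8)        # illustrative parameter values
print(modified_cox_snell(times, events, H))                       # excess = 1
print(modified_cox_snell(times, events, H, excess=math.log(2)))   # excess = log 2
```

If the model fits, plotting these residuals against $-\log$ of their own Kaplan-Meier estimate should give points near a straight line through the origin with unit slope.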

Applications
The methods discussed above are applied to the following data on nephropathy in type-2 diabetic patients. Up-to-date pathology reports/records of 132 diabetic patients, from a common pathology lab, were collected through a house-to-house survey, and a retrospective study was conducted on the collected data. Patients with less than 5 years of diabetic history are not included, so the data are left truncated; left truncation arises when a patient is not included in the study because its DN onset time (the event of interest) originated prior to the starting time of the study. Under truncation, these individuals are never considered for inclusion in the study.
The event of interest, nephropathy, is defined on the basis of serum creatinine (SrCr): the rate of rise in SrCr is a well-accepted marker for the progression of DN, and a creatinine value of 1.4 to 3.0 mg/dl is the indicator of impaired renal function (Adler, Stevens, Manley et al., 2003). The renal health/dysfunction of the 132 patients under study can thus be estimated from their serum creatinine levels. At the end of the study, out of 132 patients, there are 60 diabetic nephropathy cases, and the remaining 72 are treated as censored. The data are therefore simultaneously left truncated and right censored. Table 1 presents the duration distribution of the type-2 diabetic patients who developed diabetic nephropathy. Of the 60 DN cases, 16.67% have less than 10 years of diabetes, 25% have 10 to 15 years, 38.33% have 15 to 20 years, and 20% have 20 years or more. Overall, 45.45% of the patients have diabetic nephropathy.
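The percentages above can be cross-checked with a few lines of arithmetic. The band counts below are back-calculated from the quoted percentages; they are implied values, not figures taken from Table 1.

```python
dn_cases, total_patients = 60, 132
shares = {"<10 years": 16.67, "10-15 years": 25.0, "15-20 years": 38.33, ">=20 years": 20.0}

# Implied number of DN cases in each duration band (percentage of the 60 DN cases)
implied = {band: round(pct / 100 * dn_cases) for band, pct in shares.items()}
print(implied)                                    # {'<10 years': 10, '10-15 years': 15, '15-20 years': 23, '>=20 years': 12}
print(sum(implied.values()))                      # 60
print(round(100 * dn_cases / total_patients, 2))  # 45.45
```

The implied counts sum to the 60 DN cases, and 60 of 132 patients reproduces the overall figure of 45.45%.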
The minimum age at diagnosis of diabetes is 29 years and the maximum is 58 years, as available from the data. Table 2 presents descriptive statistics of the 132 patients, giving the minimum, maximum and mean ± S.D. of age at diagnosis, duration of disease and SrCr for two groups: censored or non-diabetic nephropathy (NDN), and uncensored or diabetic nephropathy (DN). <Table 1-2>

Estimated Mean, Variance and 95% Confidence Interval of DN Onset Time from the Diagnosis of Diabetes for Uncensored Cases
For modelling the survival time (from the diagnosis of diabetes to the onset of nephropathy) using only the uncensored cases, the minimum onset time is observed as 6 years and the maximum as 26.6 years. A left truncated Weibull distribution is assumed for this case. The fit is tested, first, by Karl Pearson's chi-square goodness of fit test: the calculated chi-square value is 2.732 with p = 0.435. Therefore, there is insufficient evidence to reject the hypothesis that the data come from a left truncated Weibull distribution with λ = 0.079515 and γ = 2.811274. Second, the graphical log-cumulative hazard plot is also used to test the fit. For this plot, the values $\log(-\log \hat{S}(t))$, where $\hat{S}(t)$ is the Kaplan-Meier estimate of survival, and $\log(-\log S(t))$, where $S(t)$ is the survival function of the fitted distribution, are computed. Plotting both against log(survival time) gives the log-cumulative hazard plot shown in Figure 1. In this figure, the line of $\log(-\log S(t))$ against log(survival time) is reasonably straight and close to the other line. This means that the assumption of a left truncated Weibull distribution for the uncensored DN onset times is quite plausible (Collett, 2003).
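Pearson's statistic is $\sum_j (O_j - E_j)^2/E_j$ over the bins, with expected counts from the fitted left truncated distribution, and it is referred to a chi-square distribution with (number of bins − 1 − number of estimated parameters) degrees of freedom. The sketch below computes expected counts under the assumed parameterization $S(t) = \exp[-(\lambda t)^\gamma]$, the statistic, and an upper-tail p-value from the series for the regularized incomplete gamma function, using only the standard library. The bin edges are an input chosen by the user, and the function names are hypothetical. A statistic of 2.732 gives p ≈ 0.435 with 3 degrees of freedom, which agrees with the reported p-value if 3 degrees of freedom were used.

```python
import math

def expected_counts(edges, n, lam, gam, t0=5.0):
    """n * P(edges[j] < T <= edges[j+1] | T > t0); the last edge may be math.inf."""
    def cond_surv(t):
        return math.exp((lam * t0) ** gam - (lam * t) ** gam)
    return [n * (cond_surv(a) - cond_surv(b)) for a, b in zip(edges, edges[1:])]

def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the series for the regularized lower gamma P(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

print(round(chi2_sf(2.732, 3), 3))   # about 0.435
```

The series form is adequate for moderate statistics like this one; for very small p-values, computing $1 - P$ loses precision and a continued-fraction form would be preferable.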
Using the MLEs $\hat{\lambda}$ = 0.079515 and $\hat{\gamma}$ = 2.811274, the estimated mean DN onset time is 15.19228 years, the variance is 17.09349, and the 95% confidence interval is (14.14615, 16.23842). These results are compared with the mean, variance and 95% confidence interval of the DN onset time obtained from the observed data in Table 4.
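The interval reported above agrees, to rounding, with the normal-approximation interval $\bar{t} \pm 1.96\sqrt{s^2/n_1}$ with $n_1 = 60$, which suggests (though the text does not state it) that it is an interval for the mean onset time. A minimal sketch of that arithmetic; the function name is hypothetical.

```python
import math

def mean_ci(mean, variance, n, z=1.96):
    """Normal-approximation interval for a mean: mean +/- z * sqrt(variance / n)."""
    half = z * math.sqrt(variance / n)
    return mean - half, mean + half

# Values reported for the uncensored model (n1 = 60 uncensored cases)
low, high = mean_ci(15.19228, 17.09349, 60)
print(round(low, 4), round(high, 4))   # 14.1461 16.2384
```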

Estimated Mean, Variance and 95% Confidence Interval of DN Onset Time from the Diagnosis of Diabetes by Including Censored Cases
For modelling the DN onset times including censored cases, the minimum duration of diabetes is 5.6 years and the maximum is 27 years, as available from the data. Only partial information is available about the 72 censored cases. For this model the data are simultaneously left truncated and right censored, and a Weibull distribution is again assumed. The fit is tested, first, by the Hollander and Proschan goodness of fit test for censored data. The observed value of the test statistic is 1.256186 with p = 0.182, which does not fall in the rejection region. Therefore, there is insufficient evidence to reject the hypothesis that the data come from a Weibull distribution with λ = 0.066834 and γ = 1.78211, estimated by the method of maximum likelihood. Second, the graphical modified Cox-Snell residual plot (Gross & Clark, 1975) is also used to test the fit. For this plot, $r = -\log S(t)$ is plotted against $-\log \hat{S}(t)$, where $S(t)$ and $\hat{S}(t)$ are as defined in Section 3.1. The resulting plot is close to a straight line, as shown in Figure 2. Therefore, both methods indicate that the Weibull distribution is an appropriate parametric model for DN onset times.

Constructing Asymptotic Confidence Intervals of the Estimated Parameters Using the Fisher Information Matrix
Again using the maximum likelihood estimates $\hat{\lambda}$ = 0.066834 and $\hat{\gamma}$ = 1.78211 for the censored case, the Fisher information matrix is used to obtain $\mathrm{Var}(\hat{\lambda})$, $\mathrm{Var}(\hat{\gamma})$ and the asymptotic confidence intervals; the results are presented in Table 3. The estimated mean DN onset time is 13.9058552 years, with variance 13.953207 and 95% confidence interval (13.4213197, 14.695785). These results are displayed in Table 4.

Estimated Mean Time of First, Last and rth Order Statistic for the Patients Who Are in the Advanced Nephropathy Group
To estimate the mean DN onset time (from the diagnosis of diabetes) for every patient, order statistics have been applied. We use this model to estimate the mean DN onset time for the 27 of the 60 uncensored cases who are in the advanced nephropathy group (mean SrCr = 1.91 mg/dl). The estimated minimum and maximum mean DN onset times are 5.646346 years and 24.58083 years, respectively. The estimated mean DN onset times for the 27 patients are compared with the observed onset times in Table 5, and the comparison is illustrated graphically in Figure 3. The figure shows that, up to the 10th order, the estimated mean DN onset times are close to the observed onset times, but as the order increases the difference between the observed and estimated mean values increases, since higher-order values depend on all the prior ordered values. The same pattern can be observed in Table 5. The mean and variance of the estimated mean DN onset times obtained through order statistics are 15.71540 and 37.949, respectively. SPSS for Windows, Version 15 and MATLAB, Version 6.5 were used for the calculations and analysis. <Table 5> <Figure 3>

Discussion
The aim of this paper is to estimate the nephropathy onset time in type-2 diabetic patients. Previous studies reveal that type-2 DM is the most common form of diabetes, constituting 90% of the diabetic population, and that nephropathy is a life-threatening complication of diabetes mellitus. The DN onset time calculations throughout this paper are based on serum creatinine values, since the rate of rise in serum creatinine was accepted as a marker for the progression of diabetic nephropathy by Bio-Stratum at the 64th Scientific Sessions of the ADA.
Left truncated and right censored (LTRC) data are widely used in survival analysis. The data used in this paper are simultaneously left truncated and right censored. Thus, no information was available for patients whose DN onset occurred before five years, and only partial information was available for patients whose DN onset times could not be observed by the end of the study.
The main objectives of this paper are to estimate the mean, variance and 95% confidence interval of DN onset time by fitting (i) a model for uncensored cases and (ii) a model including censored cases. A Weibull distribution is assumed in both cases because, owing to its flexible shape and its ability to model a wide range of failure rates, it has been used successfully in many applications (Collett, 2003; Lee & Wang, 2003). The fit is tested through graphical methods, to assess whether a particular theoretical distribution provides an adequate fit to the data (Lee & Wang, 2003). In addition, the common chi-square goodness of fit test is used for the uncensored data, and the Hollander and Proschan goodness of fit test is used for the data that include censored observations. The Weibull distribution is found to be appropriate in both cases. Previous studies suggest that when the survival time follows a specific statistical distribution, parametric models have higher statistical power than nonparametric and semiparametric models.
First, for uncensored cases, the Weibull model gives a mean DN onset time of 15.19228 years, whereas the observed mean DN onset time is 15.21333 years; the estimated and observed means are thus very close. Second, the mean DN onset time obtained by including censored cases is 13.905855 years. The mean DN onset time based on uncensored observations is higher than that obtained when censored observations are included; this matches the finding of Li and Lagakos that considering only uncensored cases increases the mean survival time (Li & Lagakos, 1997). The mean DN onset times obtained in both cases are broadly consistent with previous findings that diabetic nephropathy develops within 10 to 15 years after the onset of diabetes (ADA, 2004).
In the third model, order statistics are applied to estimate the minimum and maximum mean DN onset times, and the mean DN onset time for each of the 27 of the 60 patients who are in the advanced nephropathy group. Table 5 and the corresponding graph show that estimating mean DN onset times using order statistics is a useful exercise when the sample size is small. This is further verified by paired t-tests. The 27 patients were divided, in ascending order of DN onset time, into two groups of sizes 14 and 13. For the first group (the first 14 ordered values), the paired t-test comparing the observed DN onset times with those obtained from order statistics gives a test statistic of 1.659 with p = 0.121 (p > 0.05) on 13 degrees of freedom. Thus, there is not enough evidence to reject the hypothesis of no significant difference between the estimated and observed DN onset times. For the second group (ordered values 15 to 27), the test statistic is 14.645 with p < 0.001 on 12 degrees of freedom. From this we conclude that the difference between the observed and estimated mean DN onset times increases with the order of the observation. The minimum and maximum mean DN onset times are 5.646346 years and 24.58083 years, respectively.
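The paired t statistic is $\bar d/(s_d/\sqrt{m})$ on $m - 1$ degrees of freedom, where $d_j$ are the differences between observed and estimated onset times for the $m$ patients in a group (14 or 13 here). A minimal sketch follows; the sign of $t$ depends on the order of subtraction, and the p-value would be read from the $t$ distribution with $m - 1$ degrees of freedom (not computed here). The function name and example inputs are hypothetical.

```python
import math

def paired_t(observed, estimated):
    """Paired t statistic and degrees of freedom for differences observed - estimated."""
    d = [o - e for o, e in zip(observed, estimated)]
    m = len(d)
    mean = sum(d) / m
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (m - 1))   # sample standard deviation
    return mean / (sd / math.sqrt(m)), m - 1

print(paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))   # approximately (3.464, 2)
```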
The Weibull model can empirically fit a wide range of data. Thus, the Weibull distribution may be used to model the survival distribution of a population with increasing, decreasing or constant risk, and can be taken as a baseline model for future studies. The model can be applied to estimate the survival times of other complications of type-2 diabetes, such as retinopathy and cardiovascular disease (CVD). The application of order statistics can be further explored with other survival models and for other biomedical complications. The main use of estimating the nephropathy onset time of diabetic patients is that the current as well as future DN onset times of new diabetic patients can be predicted.

Table 1.
Duration distribution of diabetic patients, up to November 2007, who developed diabetic nephropathy

Table 2.
Descriptive statistics of 132 patients giving minimum, maximum and mean ± standard deviation of age at diagnosis, duration of diabetes, and serum creatinine for two groups, i.e. NDN (censored) and DN (uncensored)

Table 3.
Variance and asymptotic confidence intervals for the estimates of the parameters obtained by using the Fisher information matrix

Table 4.
Estimated mean, variance and 95% confidence interval of DN onset time obtained from observed data and from fitting (i) a left truncated Weibull distribution to uncensored cases and (ii) a left truncated Weibull distribution including censored cases

Table 5.
Estimated and observed DN onset times for 27 patients in the advanced diabetic nephropathy group