Survival Modelling of Tuberculosis Data: A Case Study of Federal Medical Centre, Bida, Nigeria

This article fits Cox proportional hazards models to tuberculosis (TB) data. Data on 259 TB patients, spanning 2010 through 2016, were collected from the Federal Medical Centre, Bida, Nigeria. The covariates were gender, age, type of TB and occupation. Fifteen Cox models, representing all possible combinations of these covariates, were fitted. Parameters were estimated by maximum partial likelihood, and model selection was based on the Akaike information criterion (AIC). Model (G+C), with gender and occupation as covariates, produced the least AIC (1618.597) and was hence adjudged the best; that is, gender and occupation constituted the best subset of covariates for explaining survival of TB patients. The model suggested that the recovery hazard of a male TB patient was 24.1% lower than that of a female patient with the same occupation, implying that a male patient had a longer survival time (time to recovery) than a female patient with the same occupational status. It further suggested that the recovery hazard of a patient in a technical occupation was 27.46% higher than that of a patient of the same gender in a non-technical job; hence, a patient in a technical occupation had a shorter survival time (time to recovery) than one of the same gender in a non-technical occupation. It was concluded that, based on AIC, gender and occupation best explained survival of TB patients.


Introduction
By survival analysis, we mean a collection of methods for analyzing data in which the outcome variable is the time until an event occurs. The event depends on the area of application. In economics, it may be employment, shutdown, economic recession, payment of health insurance, payment of gratuity and so on; in medicine, it may be death, disease incidence, relapse from remission, recovery and so on; in academia, it could be completion of a doctoral dissertation; and in society at large, it may be marriage, divorce, migration and so on. The event is technically referred to as a failure. This does not necessarily imply that the event has negative consequences: in the health sciences it typically does, but in other areas it may not. For instance, employment and marriage are positive events.
One major feature of survival data that distinguishes them from other types of numerical data is that the event will not necessarily have occurred in all individuals by the time the study ends (Bradburn, Clark, Love & Altman, 2003). Consequently, the time to event, which is of primary concern in survival analysis, may not have been observed for all subjects at the end of the study. This could be because a subject does not experience the event before the study ends, is lost to follow-up, or leaves the study for another reason, such as death when the event of interest is not death. This phenomenon is called censoring, and two types are commonly distinguished.
Observations are said to be censored when information about the survival time is incomplete: we have some information about it, but not its exact value. Survival data may be right-censored or left-censored. An observation is right-censored when the individual's true survival time is greater than the observed survival time, and left-censored when it is less than the observed time. The presence of censoring is one of the main reasons why the ordinary least squares method is unsuitable for analyzing survival data. This special feature of survival data therefore calls for special techniques, which together constitute the body of knowledge called survival analysis.
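To make right censoring concrete, the short Python sketch below encodes each subject as an observed time $\min(T, C)$ and an event indicator $\delta$, where $T$ is the true event time and $C$ a censoring time. All times are hypothetical. It also shows that treating censored times as if they were event times understates the mean survival time, which is one reason least squares on the observed times is unsuitable.

```python
def right_censor(true_times, censor_times):
    """Encode right-censored data as (observed_time, delta) pairs.

    observed_time = min(T, C); delta = 1 if the event is seen (T <= C), else 0.
    """
    observed = []
    for t, c in zip(true_times, censor_times):
        if t <= c:
            observed.append((t, 1))  # event observed at its true time
        else:
            observed.append((c, 0))  # censored: true time exceeds observed time
    return observed


# Hypothetical times in weeks; subjects 2 and 4 are still event-free
# when their follow-up ends.
true_times = [12.0, 30.0, 8.0, 26.0]
censor_times = [24.0, 24.0, 24.0, 20.0]
data = right_censor(true_times, censor_times)
print(data)  # [(12.0, 1), (24.0, 0), (8.0, 1), (20.0, 0)]

# Treating censored times as event times understates the mean survival time
# (the comparison is only possible here because the toy true times are known).
naive_mean = sum(t for t, _ in data) / len(data)
true_mean = sum(true_times) / len(true_times)
print(naive_mean, true_mean)  # 16.0 19.0
```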
Much research effort has gone into survival analysis, ranging from the development of methods to comparisons and applications. These applications are predominantly, but not exclusively, in the health sciences. Mayo, Korner-Bitensky and Becker (1991) used the Cox model to investigate factors influencing recovery time from stroke to full and independent sitting function, and concluded that patients with the least perceptual impairment recovered more quickly. Cheingsong-Popov et al. (1991) compared the time to progression to stage IV disease between HIV-infected patients whose gag antibody levels were 1600 or more and those with lower levels, and found that the disease progressed more quickly in patients with fewer gag antibodies. Lee and Go (1997) reviewed common statistical techniques for analyzing survival data in medical research. Korenman, Goldman and Fu (1997) examined the effect of widowhood on mortality using a time-varying covariate model. Cheung (2000) used the Cox model to investigate cancer mortality. Ravangard et al. (2001) compared the Cox model and parametric models in a study of length of stay in a tertiary teaching hospital in Tehran, Iran. Chow et al. (2002) studied the effect of tamoxifen in the treatment of inoperable hepatocellular carcinoma on the length of patient survival. Remuzzi et al. (2004) used the Cox model to compare two treatments, mycophenolate mofetil (MM) and azathioprine (AL), for the prevention of acute rejection in renal transplantation, and reported the risk on MM relative to AL as 13.7%. Faradmal, Talebi, Rezaianzadeh and Mahjub (2012) applied Cox and frailty models to breast cancer data.
Studies on tuberculosis include Gavrilenko (2001), Chang, Leung and Tam (2004), Ponnuraja and Venkatesan (2010), Maciel et al. (2013), and Nwumbeni, Luguterah and Adampah (2014). Kim (2012) and Gogtay and Thatte (2017) reviewed survival analysis methods from a medical science perspective. Baghestani et al. (2015) analyzed breast cancer data using a Weibull model and found that patients with lymphovascular invasion were at 2.13 times greater risk of death from breast cancer. Veisi, Rezaei and Nadarajah (2018) applied parametric and semiparametric models to growth failure of children in Iran and recommended the log-normal model. Ghorbani et al. (2019) applied the Cox model to study the rejection rate of kidney transplants in Iran.
Although survival models are applied mostly in the health sciences, their application is not limited to this field. Applications in the business environment relating to the survival of firms include Demirbag, Apaydin and Tatoglu (2011), Morikawa (2013), Kim and Lee (2016), Moniche and Morales (2016), and Matsuno and Ito (2018).
Previous studies have focused largely on various types of cancer, tuberculosis (TB), HIV, unemployment duration and so on. Most studies on TB have not incorporated occupation as a covariate; the present article does. The other covariates are gender, TB type and age. The aim of this article is to obtain the subset of these covariates that best explains survival (recovery) of TB patients.
The remaining part of the article is organized as follows: Section 2 presents the Theoretical Framework; Section 3 presents Methodology; Section 4 presents Results and Discussion while the last section concludes the article.

Theoretical Framework
Several models exist in the literature for handling survival data. They range from nonparametric methods such as the Kaplan-Meier estimator, life tables and the cumulative hazard estimator, through the semi-parametric Cox proportional hazards model due to Cox (1972), to fully parametric models. The nonparametric methods are univariate in nature: they describe survival with respect to the factor under investigation and take no account of the possible impact of other factors. The other models, however, account for the impact of covariates and hence fall under multivariate methods of survival analysis.
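As an illustration of the nonparametric approach, here is a minimal Python sketch of the Kaplan-Meier (product-limit) estimator, $\hat{s}(t) = \prod_{t_{(j)} \le t} (1 - d_j / n_j)$, where $d_j$ is the number of events and $n_j$ the number at risk at the $j$-th distinct event time. The data are hypothetical, and subjects censored at an event time are treated as still at risk at that time, a common convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survivor function s(t).

    times  : observed times (event or censoring)
    events : 1 if the event occurred at that time, 0 if censored
    Returns (t, s_hat) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s_hat = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject observed at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            s_hat *= 1.0 - deaths / n_at_risk
            curve.append((t, s_hat))
        # Everyone observed at t leaves the risk set for later times.
        n_at_risk -= removed
    return curve


# Hypothetical data: six subjects, two of them censored (event = 0).
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])
# [(3, 0.833), (5, 0.667), (8, 0.444), (12, 0.0)]
```

Censored subjects never reduce $\hat{s}(t)$ directly; they only shrink the risk set for later event times, which is how the estimator uses their partial information.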
Let $s(t)$ denote the survivor function, which gives the probability that a subject survives longer than a specified time $t$; that is, $s(t) = P(T > t)$, where $T$ is the time-to-event variable.
Theoretically, for a continuous time variable,
$$ s(t) = P(T > t) = \int_t^{\infty} f(u)\,du, $$
where $f(t)$ is the density assumed for the time-to-event variable $T$. Whatever its form, $s(t)$ has the following properties: it is non-increasing, $s(0) = 1$ and $s(\infty) = 0$ (Kleinbaum & Klein, 2012).
An important characterization of survival data is the hazard function. The hazard function, denoted $h(t)$, is also called the failure rate, conditional failure rate, age-specific failure rate or force of mortality. It is an instantaneous rate because it gives the instantaneous potential, per unit time, for the event of interest to occur, given that the subject has survived up to time $t$:
$$ h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}. $$
In words, it is the probability of failing (experiencing the event) in the next small interval, given survival to the beginning of that interval, divided by the length of the interval, in the limit as the interval shrinks to zero (Kleinbaum & Klein, 2012).
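For a continuous random variable, the three functions are related by
$$ h(t) = \frac{f(t)}{s(t)} = -\frac{d}{dt}\log s(t), \qquad s(t) = \exp\left(-\int_0^t h(u)\,du\right). $$
Hence, the probability density, hazard and survival functions are alternative ways of describing the distribution of survival times.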
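The identities $h(t) = f(t)/s(t) = -\frac{d}{dt}\log s(t)$ and $s(t) = \exp\left(-\int_0^t h(u)\,du\right)$ can be checked numerically. The Python sketch below does this for a Weibull distribution with hypothetical shape and scale parameters, chosen only because its density, survivor and hazard functions have closed forms; it is an illustration, not part of the article's analysis.

```python
import math

# Hypothetical Weibull distribution for T (shape k, scale lam), chosen only
# so that f, s and h have closed forms to compare against.
k, lam = 1.5, 10.0


def s(t):
    """Survivor function s(t) = P(T > t)."""
    return math.exp(-((t / lam) ** k))


def f(t):
    """Density f(t) = -ds/dt."""
    return (k / lam) * (t / lam) ** (k - 1) * s(t)


def h(t):
    """Hazard h(t) = f(t) / s(t)."""
    return f(t) / s(t)


def cumulative_hazard(t, steps=10_000):
    """Trapezoidal approximation of H(t), the integral of h(u) over [0, t]."""
    width = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for i in range(1, steps):
        total += h(i * width)
    return total * width


t = 7.0
# s(t) should match exp(-H(t)) up to the integration error.
print(s(t), math.exp(-cumulative_hazard(t)))

# h(t) should match -d/dt log s(t), approximated by a central difference.
eps = 1e-6
numeric_h = -(math.log(s(t + eps)) - math.log(s(t - eps))) / (2 * eps)
print(h(t), numeric_h)
```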

Data
The data were collected from the Federal Medical Centre, Bida, Niger State, Nigeria. Bida is located in the North Central zone of Nigeria and is the third largest community in the State. The natives of Bida are of the Nupe tribe. The data were extracted from the records of TB patients covering 2010 to 2016 at the Federal Medical Centre, Bida.
The covariates of interest in this study were age, gender, type of TB and occupation. Occupation was classified as technical or non-technical: occupations that typically expose the individual to chemicals or other toxic substances, such as farming, welding, agrochemical business and engine-oil business, were categorized as technical.

Cox Proportional Hazards (PH) Model
The PH model is a multivariate survival analysis tool that models the relationship between survival time and explanatory variables.
The mathematical form of the PH model is
$$ h(t, \mathbf{X}) = h_0(t)\exp\left(\sum_{j=1}^{p} \beta_j X_j\right), $$
where $\mathbf{X} = (X_1, \ldots, X_p)$ are the covariates and $\beta_1, \ldots, \beta_p$ their coefficients. The function $h_0(t)$ is called the baseline hazard function; it is the value of $h(t, \mathbf{X})$ when all covariates are zero. It is left unspecified and, where needed, estimated nonparametrically. The hazard $h(t, \mathbf{X})$ is therefore a multiple of the baseline hazard, the multiplier $\exp\left(\sum_{j} \beta_j X_j\right)$ depending on the covariates but not on time. The PH model is robust to modest departures from the PH assumption, which explains its popularity among multivariate survival analysis methods; it has often proved a safe choice when one is unsure whether the assumptions of a parametric alternative are fulfilled. A notable advantage is that the model parameters can be estimated without assuming a distributional form for $h_0(t)$. Taking logs,
$$ \log h(t, \mathbf{X}) = \log h_0(t) + \sum_{j=1}^{p} \beta_j X_j, $$
so the PH model is a multiple linear regression of $\log h(t, \mathbf{X})$ on the covariates with the log baseline hazard, $\log h_0(t)$, as the intercept. Proportionality in the PH model means that the $e^{\beta_j}$ are hazard ratios. A hazard ratio greater than 1 indicates that an increase in the value of the variable is accompanied by an increased event hazard and hence a shorter survival time (Bradburn, Clark, Love & Altman, 2003).
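The proportionality property can be seen numerically. The Python sketch below evaluates $h(t, \mathbf{X}) = h_0(t)\exp\left(\sum_j \beta_j X_j\right)$ for two subjects under a hypothetical baseline hazard and hypothetical coefficients (neither comes from this study) and shows that their hazard ratio equals $e^{\beta_1}$ at every time point, whatever the shape of $h_0(t)$.

```python
import math


def cox_hazard(t, x, beta, baseline):
    """Cox PH hazard h(t, x) = h0(t) * exp(sum_j beta_j * x_j)."""
    linear_predictor = sum(b * xj for b, xj in zip(beta, x))
    return baseline(t) * math.exp(linear_predictor)


def baseline(t):
    """Hypothetical baseline hazard h0(t), increasing in t."""
    return 0.05 + 0.01 * t


beta = [0.4, -0.2]  # hypothetical coefficients
x_a = [1, 0]        # subject A: first covariate = 1
x_b = [0, 0]        # subject B: reference group, so h(t, x_b) = h0(t)

for t in (1.0, 5.0, 20.0):
    ratio = cox_hazard(t, x_a, beta, baseline) / cox_hazard(t, x_b, beta, baseline)
    print(t, round(ratio, 4))  # the same ratio, exp(0.4) ~ 1.4918, at every t
```

The baseline hazard cancels in the ratio, which is exactly why the coefficients can later be estimated without specifying $h_0(t)$.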
Under the proportionality assumption, the log of the hazard ratio of the $i$-th subject relative to the reference group (all covariates zero) is linear in the covariates:
$$ \log \frac{h(t, \mathbf{x}_i)}{h_0(t)} = \sum_{j=1}^{p} \beta_j x_{ij}. $$
An all-possible-subsets approach, which fits every combination of the covariates, was employed. With four covariates, this produced $2^4 - 1 = 15$ models: 4 one-variable models, 6 two-variable models, 4 three-variable models and 1 four-variable model.
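The count of candidate models follows from $\binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 4 + 6 + 4 + 1 = 15$. A minimal Python sketch that enumerates the subsets is given here; the covariate labels are illustrative names, not the variable names used in the study's data.

```python
from itertools import combinations
from math import comb

# Illustrative labels for the four covariates (not the study's variable names).
covariates = ["gender", "age", "tb_type", "occupation"]

# Every non-empty subset of covariates defines one candidate Cox model.
models = [
    subset
    for size in range(1, len(covariates) + 1)
    for subset in combinations(covariates, size)
]

counts = {size: comb(len(covariates), size) for size in range(1, len(covariates) + 1)}
print(len(models))  # 15
print(counts)       # {1: 4, 2: 6, 3: 4, 4: 1}
```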

Model Estimation
The models were estimated by the method known in the literature as maximum partial likelihood. The likelihood is called partial because it is built only from the probabilities of the observed events: censored subjects contribute no factor of their own, although they remain in the risk sets up to their censoring times.
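mas.ccsenet.org Modern Applied Science Vol. 14, No. 4; 2020
We motivate the estimation procedure by noting the contribution of each subject to the likelihood. An individual censored at $t_i$ is known only to have survived beyond $t_i$ and contributes
$$ s(t_i, \mathbf{x}_i), \tag{10} $$
while an individual that experiences the event at $t_i$ contributes
$$ f(t_i, \mathbf{x}_i) = h(t_i, \mathbf{x}_i)\, s(t_i, \mathbf{x}_i). \tag{11} $$
Combining the two cases for every subject, that is, combining Equations 10 and 11, gives a total likelihood of the form
$$ L = \prod_{i=1}^{n} f(t_i, \mathbf{x}_i)^{\delta_i}\, s(t_i, \mathbf{x}_i)^{1-\delta_i} = \prod_{i=1}^{n} h(t_i, \mathbf{x}_i)^{\delta_i}\, s(t_i, \mathbf{x}_i), \tag{12} $$
where $\delta_i$ is an indicator that equals 1 when subject $i$ fails and 0 otherwise.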

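Multiplying and dividing the total likelihood (Equation 12) by
$$ \prod_{i=1}^{n} \left[\sum_{k \in R(t_i)} h(t_i, \mathbf{x}_k)\right]^{\delta_i}, $$
where $R(t_i)$ is the risk set at the failure time of individual $i$ (the subjects still under observation just before $t_i$), we have
$$ L = \prod_{i=1}^{n} \left[\frac{h(t_i, \mathbf{x}_i)}{\sum_{k \in R(t_i)} h(t_i, \mathbf{x}_k)}\right]^{\delta_i} \times \prod_{i=1}^{n} \left[\sum_{k \in R(t_i)} h(t_i, \mathbf{x}_k)\right]^{\delta_i} s(t_i, \mathbf{x}_i). \tag{13} $$
Under the PH model, the first term depends only on $\boldsymbol{\beta}$, while the second also involves the baseline hazard $h_0(t)$; Cox (1972) argued that the second term carries little information about $\boldsymbol{\beta}$, so inference about $\boldsymbol{\beta}$ can be based on the first term alone.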
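Focusing on the first term gives the partial likelihood
$$ L_p(\boldsymbol{\beta}) = \prod_{i=1}^{n} \left[\frac{h(t_i, \mathbf{x}_i)}{\sum_{k \in R(t_i)} h(t_i, \mathbf{x}_k)}\right]^{\delta_i}. \tag{14} $$
Then, under the Cox PH model assumption $h(t, \mathbf{x}) = h_0(t)\exp(\boldsymbol{\beta}^\top \mathbf{x})$, the baseline hazard cancels and, assuming no tied event times, the log partial likelihood is
$$ \log L_p(\boldsymbol{\beta}) = \sum_{j=1}^{r} l_j, \qquad l_j = \boldsymbol{\beta}^\top \mathbf{x}_{(j)} - \log \sum_{k \in R(t_{(j)})} \exp\left(\boldsymbol{\beta}^\top \mathbf{x}_k\right), \tag{15} $$
where $t_{(1)} < \cdots < t_{(r)}$ are the $r$ ordered death times, $\mathbf{x}_{(j)}$ is the covariate vector of the subject who fails at $t_{(j)}$, and $l_j$ is the log partial likelihood contribution at the $j$-th ordered death time. Maximum partial likelihood estimators are obtained by solving, using Equation 15,
$$ \frac{\partial \log L_p(\boldsymbol{\beta})}{\partial \beta_m} = 0, \qquad m = 1, \ldots, p. $$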
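To show what maximizing the partial likelihood involves, here is a minimal Python sketch for a single covariate, applying Newton-Raphson to the log partial likelihood $\sum_j \left[\beta x_{(j)} - \log \sum_{k \in R(t_{(j)})} e^{\beta x_k}\right]$. It assumes no tied event times and uses hypothetical toy data rather than the Bida records; it is not the software used in the article. In practice, a package such as R's `survival::coxph` would be used, since it also handles ties and multiple covariates.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood, score and observed information for one covariate.

    Assumes no tied event times. The risk set R(t_i) holds every subject whose
    observed time is >= t_i, so censored subjects still count until they leave.
    """
    loglik = score = info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects contribute no factor of their own
        risk = [k for k, t_k in enumerate(times) if t_k >= t_i]
        w = [math.exp(beta * x[k]) for k in risk]
        s0 = sum(w)
        s1 = sum(wk * x[k] for wk, k in zip(w, risk))
        s2 = sum(wk * x[k] ** 2 for wk, k in zip(w, risk))
        loglik += beta * x[i] - math.log(s0)  # contribution l_j
        score += x[i] - s1 / s0                # d l_j / d beta
        info += s2 / s0 - (s1 / s0) ** 2       # -d^2 l_j / d beta^2
    return loglik, score, info


def fit_cox_one_covariate(times, events, x, tol=1e-10, max_iter=50):
    """Maximum partial likelihood estimate of beta via Newton-Raphson."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = cox_partial_loglik(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data: times in weeks, events (1 = event, 0 = censored),
# and a binary covariate x. These are not the Bida TB records.
times = [2, 3, 5, 6, 8, 9, 11, 13, 15, 18]
events = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
x = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

beta_hat = fit_cox_one_covariate(times, events, x)
print(round(beta_hat, 3), round(math.exp(beta_hat), 3))  # estimate and hazard ratio
```

Because the log partial likelihood is concave in $\beta$, Newton-Raphson started from $\beta = 0$ typically converges in a few iterations; the exponentiated estimate is the hazard ratio for $x = 1$ versus $x = 0$.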

Model Selection
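The Akaike information criterion (AIC) was adopted as the basis for model selection in this article. Proposed by Akaike, it is defined as
$$ \mathrm{AIC} = -2 \log L_p(\hat{\boldsymbol{\beta}}) + 2k, $$
where $L_p(\hat{\boldsymbol{\beta}})$ is the maximized partial likelihood and $k$ is the number of estimated parameters; the model with the smallest AIC is preferred.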
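A minimal Python sketch of the AIC computation, $\mathrm{AIC} = -2\log L + 2k$ with $L$ the maximized partial likelihood and $k$ the number of coefficients, and of the smallest-AIC selection rule. The helper names are hypothetical; the two AIC values in the example are those reported for Models (G) and (G+C) in the Results.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 log L + 2k, with L the maximized (partial) likelihood."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_by_aic(aic_values):
    """Return the name of the model with the smallest AIC."""
    return min(aic_values, key=aic_values.get)


# A hypothetical model with log partial likelihood -800.0 and two coefficients.
print(aic(-800.0, 2))  # 1604.0

# AIC values reported in the Results for Models (G) and (G+C).
reported = {"G": 1618.705, "G+C": 1618.597}
print(select_by_aic(reported))  # G+C
```

The $2k$ term penalizes each extra coefficient, so a larger model is preferred only if it raises the log partial likelihood by more than one unit per added parameter.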

Results and Discussion
Parameter estimates, standard errors and AICs of the fitted models are presented in Tables 1 to 4. Table 1 presents parameter estimates, associated standard errors, hazard ratios and AIC for the four one-variable models. The Cox model (G), with gender as the only covariate, has the least AIC (1618.705) and is hence the best of the four. Table 2 presents results for the six two-variable models. Model (G+C), with gender and occupation as covariates, has the least AIC (1618.597) and hence represents the best two-variable model. Table 4 presents results for the single four-variable model; with only one model in this category, there is no selection to make. Considering all fifteen models, Model (G+C) has the least AIC (1618.597) and is hence the overall best. Gender and occupation therefore constitute the best subset of covariates for explaining survival of TB patients and, among the covariates considered, are the most informative about time to recovery. The model suggests that the recovery hazard of a male TB patient is 24.1% lower than that of a female patient with the same occupational classification (hazard ratio ≈ 0.759). Because the event is recovery, a male patient tends to have a longer survival time, that is, a longer time to recovery, than a female patient with the same occupational status. The model further suggests that the recovery hazard of a patient in a technical occupation is 27.46% higher than that of a patient of the same gender in a non-technical job (hazard ratio ≈ 1.2746). Hence, a patient in a technical occupation tends to have a shorter survival time (faster recovery) than one of the same gender in a non-technical occupation.
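The percentages quoted for Model (G+C) follow from its hazard ratios: a hazard ratio $\mathrm{HR} = e^{\beta}$ corresponds to a $(\mathrm{HR} - 1) \times 100\%$ change in the recovery hazard. The Python sketch below back-calculates the hazard ratios implied by the reported 24.1% and 27.46% figures and the corresponding coefficients $\beta = \log \mathrm{HR}$; because the reported percentages are rounded, these implied values may differ slightly from the fitted estimates in Table 2.

```python
import math


def percent_change_in_hazard(hazard_ratio):
    """Percentage change in hazard implied by a hazard ratio: (HR - 1) * 100."""
    return (hazard_ratio - 1.0) * 100.0


# Hazard ratios implied by the reported percentages (HR = 1 + change / 100).
hr_male = 1 - 0.241        # male vs female, same occupation
hr_technical = 1 + 0.2746  # technical vs non-technical, same gender

print(round(percent_change_in_hazard(hr_male), 2))       # -24.1
print(round(percent_change_in_hazard(hr_technical), 2))  # 27.46

# Implied Cox coefficients, beta = log(HR); a negative beta means a lower hazard.
beta_male = math.log(hr_male)
beta_technical = math.log(hr_technical)
print(round(beta_male, 4), round(beta_technical, 4))
```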

Conclusion
This article has applied Cox proportional hazards modelling to TB data with four covariates. A male TB patient tends to have a longer survival time (time to recovery) than a female patient with the same occupational status, and a patient in a technical occupation tends to have a shorter survival time than one of the same gender in a non-technical occupation. Based on AIC, gender and occupation are found to best explain survival of TB patients.