On the Gumbel-Burr XII Distribution: Regression and Application

In this article, additional properties of the Gumbel-Burr XII distribution, denoted by GBXII(L) and defined in Osatohanmwen et al. (2017), are studied. We consider some useful characterizations of the GBXII(L) distribution and some of its properties. A simulation study is conducted to assess the performance of the MLEs, and the usefulness of the GBXII(L) distribution is illustrated by means of three real data sets. The simulation study suggests that the maximum likelihood method can be used to estimate the distribution parameters, and the three examples show that the GBXII(L) is very flexible in fitting data of different shapes. A log-GBXII(L) regression model is proposed, and a survival data set is used in an application of the proposed regression model. The log-GBXII(L) regression model is adequate and can be used as an alternative to other models.


Introduction
Statistical distributions are used to model many real-life phenomena in areas such as reliability, actuarial science, survival analysis and lifetime data analysis (Tahir and Cordeiro, 2016). The search for flexibility in modeling such phenomena has been a strong motivation for developing new statistical distributions, many of which are obtained by adding shape, scale or location parameters to existing ones. Eugene et al. (2002) introduced the beta-generated class of distributions with cumulative distribution function (CDF)
$$G(x) = \frac{1}{B(\alpha,\beta)}\int_0^{F(x)} t^{\alpha-1}(1-t)^{\beta-1}\,dt, \quad \alpha>0,\ \beta>0,$$
where F(x) is the CDF of any continuous random variable X. Many distributions were developed using this technique, such as the beta-normal (Eugene et al., 2002), beta-Weibull (Famoye et al., 2005), beta-generalized exponential (Barreto-Souza et al., 2010), beta-Cauchy (Alshawarbeh et al., 2012) and beta-Pareto (Akinsete et al., 2008). Jones (2009) and Cordeiro and de Castro (2011) proposed the Kumaraswamy-generated (KwG) distributions with CDF
$$G(x) = \int_0^{F(x)} \alpha\beta\, t^{\alpha-1}\left(1-t^{\alpha}\right)^{\beta-1}dt, \quad \alpha>0,\ \beta>0.$$
Distributions developed using the KwG technique include the Kumaraswamy-Weibull (Cordeiro et al., 2010), the Kumaraswamy-generalized half-normal (Cordeiro et al., 2012) and the Kumaraswamy-geometric (Akinsete et al., 2014). Alzaatreh et al. (2013) proposed a transformed-transformer technique to generate a class of distributions T-X(W) by using a transformation W(F(x)) of the CDF F(x) of a random variable X that satisfies the conditions:
(i) W[F(x)] ∈ [a, b],
(ii) W[F(x)] is differentiable and monotonically non-decreasing, and
(iii) W[F(x)] → a as x → −∞ and W[F(x)] → b as x → ∞. (1)
The class of T-X(W) distributions is then defined by
$$G(x) = \int_a^{W(F(x))} r_T(t)\,dt, \tag{2}$$
where T is a random variable with probability density function (PDF) $r_T(t)$ on (a, b). Aljarrah et al. (2014) used $W(F(x)) = Q_Y(F(x))$, the quantile function of a random variable Y, as a transformation of a CDF F(x).
They defined the CDF of the T-X{Y} family as
$$G(x) = \int_a^{Q_Y(F(x))} r_T(t)\,dt. \tag{3}$$
Families of distributions developed using (2) and (3) include the Weibull-G (Bourguignon et al., 2014), the T-normal (Alzaatreh et al., 2014), the T-Weibull (Almheidat et al., 2015) and the T-X(Logit) (Al-Aqtash, 2013). For more on recent developments in distribution theory, we refer the interested reader to Lee et al. (2013) and Tahir and Cordeiro (2016).
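Taking T to be a Gumbel random variable and W(F(x)) = log{F(x)/[1 − F(x)]}, the logit of F(x), gives the Gumbel-X(Logit) family with CDF
$$G(x) = \exp\left\{-\beta\left[\frac{F(x)}{1-F(x)}\right]^{-1/\sigma}\right\}, \quad \beta>0,\ \sigma>0. \tag{4}$$
Osatohanmwen et al. (2017) used the CDF of the Burr XII distribution (Burr, 1942), F(x) = 1 − [1 + (x/λ)^c]^{−k}, in (4) to define the CDF of the five-parameter GBXII(L) distribution
$$F_{\mathrm{GBXII(L)}}(x) = \exp\left\{-\beta\left[\left(1+(x/\lambda)^{c}\right)^{k}-1\right]^{-1/\sigma}\right\}, \quad x>0,\ \beta,\sigma,c,k,\lambda>0. \tag{5}$$
The PDF of the GBXII(L) distribution is given by
$$f_{\mathrm{GBXII(L)}}(x) = \frac{\beta c k}{\sigma\lambda}\left(\frac{x}{\lambda}\right)^{c-1}\left[1+(x/\lambda)^{c}\right]^{k-1}\left[\left(1+(x/\lambda)^{c}\right)^{k}-1\right]^{-1/\sigma-1}\exp\left\{-\beta\left[\left(1+(x/\lambda)^{c}\right)^{k}-1\right]^{-1/\sigma}\right\}, \quad x>0. \tag{6}$$
Osatohanmwen et al. (2017) studied the shapes, hazard function, Shannon entropy, moments and parameter estimation of the GBXII distribution. Figure 1 displays some of the different shapes of the GBXII density. In this article, we discuss additional properties of the GBXII distribution.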
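To make the CDF (5) and PDF (6) concrete, here is a minimal Python sketch, assuming NumPy and SciPy are available. The helper names `gbxii_cdf`, `gbxii_pdf` and `gbxii_quantile` are our own (hypothetical), and the quantile is obtained by inverting (5). The sketch checks numerically that the density integrates to one.

```python
import numpy as np
from scipy import integrate


def _odds(x, c, k, lam):
    """Burr XII odds F/(1 - F) = (1 + (x/lam)^c)^k - 1, computed with log1p/expm1 for accuracy."""
    return np.expm1(k * np.log1p((np.asarray(x, dtype=float) / lam) ** c))


def gbxii_cdf(x, beta, sigma, c, k, lam):
    """GBXII(L) CDF (5): exp{-beta * odds^(-1/sigma)} for x > 0."""
    with np.errstate(divide="ignore"):
        return np.exp(-beta * _odds(x, c, k, lam) ** (-1.0 / sigma))


def gbxii_pdf(x, beta, sigma, c, k, lam):
    """GBXII(L) PDF (6); returns 0 where the odds underflow to 0 (x -> 0+)."""
    x = np.asarray(x, dtype=float)
    z = (x / lam) ** c
    h = _odds(x, c, k, lam)
    with np.errstate(divide="ignore", invalid="ignore"):
        dens = (beta * c * k / (sigma * lam) * (x / lam) ** (c - 1) * (1 + z) ** (k - 1)
                * h ** (-1.0 / sigma - 1) * np.exp(-beta * h ** (-1.0 / sigma)))
    dens = np.where(h > 0, dens, 0.0)
    return float(dens) if dens.ndim == 0 else dens


def gbxii_quantile(p, beta, sigma, c, k, lam):
    """Inverse of (5): lam * {[1 + (-beta / log p)^sigma]^(1/k) - 1}^(1/c), 0 < p < 1."""
    p = np.asarray(p, dtype=float)
    return lam * ((1 + (-beta / np.log(p)) ** sigma) ** (1 / k) - 1) ** (1 / c)


theta = (1.6, 0.8, 2.0, 2.0, 1.0)  # (beta, sigma, c, k, lam): illustrative values only
total, _ = integrate.quad(gbxii_pdf, 0, np.inf, args=theta)
print(total)  # should be very close to 1
```

Computing the Burr XII odds through `log1p`/`expm1` avoids cancellation in $(1+(x/\lambda)^c)^k - 1$ for small x.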
The rest of the article is organized as follows. In Section 2, we discuss some characterizations of the GBXII(L) model. In Section 3, we present some additional properties of the GBXII(L) distribution. In Section 4, the maximum likelihood method for parameter estimation is discussed, and simulation results are presented to assess the performance of the maximum likelihood estimators. In Section 5, three data sets are used to exhibit the flexibility of this distribution. In Section 6, a log-GBXII regression model is proposed, and a censored survival-time data set is used in an application of the proposed regression model. Finally, some concluding remarks are provided in Section 7.

Characterizations of GBXII(L) Distribution
This section deals with various characterizations of the GBXII(L) distribution. These characterizations are presented in three directions:
• based on the ratio of two truncated moments,
• in terms of the reverse hazard function, and
• based on the conditional expectation of a certain function of the random variable.
It should be noted that characterization based on truncated moments can be employed even when the CDF does not have a closed form.

Characterization Based on Truncated Moments
Our first characterization employs a theorem due to Glänzel (1987); see Theorem A1 of Appendix A. The result, however, also holds when the interval H is not closed, since the condition of Theorem A1 is on the interior of H. We note that this kind of characterization based on a truncated moment is stable in the sense of weak convergence (Glänzel, 1990).
Proposition 1. Let X : Ω → (0, ∞) be a continuous random variable and let for x > 0. The random variable X belongs to the family (6) if and only if the function η defined in Theorem A1 has the form
Proof: Assume X is a random variable with PDF (6), then Conversely, if η is given as above, then and hence Now, according to Theorem A1, the random variable X has density (6).
Corollary 1. Let X : Ω → (0, ∞) be a continuous random variable and let $q_2$ be as in Proposition 1. Then X has PDF (6) if and only if there exist functions $q_1$ and η defined in Theorem A1 satisfying the differential equation
The general solution of the differential equation in Corollary 1 is
where D is a constant. Note that a set of functions satisfying the above differential equation is given in Proposition 1 with
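The mechanism behind Theorem A1 can be checked numerically. The sketch below does not use the functions $q_1$, $q_2$ and η of Proposition 1; instead, as an illustrative assumption, it takes $q_1(x) = 1$ and $q_2(x) = F(x)$, the GBXII(L) CDF (5). A direct integration then gives η(x) = [1 + F(x)]/2, and η$q_1$ − $q_2$ = [1 − F(x)]/2 > 0, so the "no real solution" condition of Theorem A1 holds. The code confirms $E[q_2(X) \mid X \ge x] = \eta(x)\,E[q_1(X) \mid X \ge x]$ by quadrature; the parameter values are arbitrary.

```python
import numpy as np
from scipy import integrate

# Illustrative GBXII(L) parameter values (beta, sigma, c, k, lam); not taken from the paper.
BETA, SIGMA, C, K, LAM = 1.6, 0.8, 2.0, 2.0, 1.0


def cdf(x):
    """GBXII(L) CDF (5)."""
    h = (1 + (x / LAM) ** C) ** K - 1  # Burr XII odds F/(1 - F)
    return np.exp(-BETA * h ** (-1 / SIGMA))


def pdf(x):
    """GBXII(L) PDF (6)."""
    z = (x / LAM) ** C
    h = (1 + z) ** K - 1
    return (BETA * C * K / (SIGMA * LAM) * (x / LAM) ** (C - 1) * (1 + z) ** (K - 1)
            * h ** (-1 / SIGMA - 1) * np.exp(-BETA * h ** (-1 / SIGMA)))


def q1(x):
    return 1.0


def q2(x):
    return cdf(x)


def eta(x):
    """Closed form of E[q2(X) | X >= x] / E[q1(X) | X >= x] for this choice of q1, q2."""
    return 0.5 * (1 + cdf(x))


def truncated_ratio(x):
    """E[q2(X) | X >= x] / E[q1(X) | X >= x], computed by numerical integration."""
    num, _ = integrate.quad(lambda u: q2(u) * pdf(u), x, np.inf)
    den, _ = integrate.quad(lambda u: q1(u) * pdf(u), x, np.inf)
    return num / den


for x in (0.5, 1.0, 1.5):
    print(x, truncated_ratio(x), eta(x))  # the last two columns should agree
```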

Characterization in Terms of the Reverse Hazard Function
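The reverse hazard function, $r_F$, of a twice differentiable distribution function F is defined as
$$r_F(x) = \frac{f(x)}{F(x)}, \quad x \in \operatorname{supp} F,$$
where f is the PDF corresponding to F. In this subsection we present a characterization of the GBXII(L) distribution in terms of the reverse hazard function.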
Proposition 2. Let X : Ω → (0, ∞) be a continuous random variable. Then X has PDF (6) if and only if its reverse hazard function $r_F(x)$ satisfies the differential equation
Proof: If X has PDF (6), then clearly the differential equation holds. Now, if the differential equation holds, then , from which we arrive at the reverse hazard function of (6).
Remark 2. For k = 1, we have the following simple differential equation
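As a worked check of the k = 1 case, a direct calculation from (5) (our own derivation; the equation in the original Remark 2 may be written in a different but equivalent form) shows that the Burr XII odds reduce to (x/λ)^c. The GBXII(L) CDF then takes a Fréchet-type form, and its reverse hazard function satisfies a first-order Euler-type equation:

```latex
\begin{aligned}
F(x) &= \exp\{-\beta\,(x/\lambda)^{-c/\sigma}\}, \qquad x>0,\\
r_F(x) &= \frac{d}{dx}\log F(x) = \frac{\beta c}{\sigma}\,\lambda^{c/\sigma}\,x^{-c/\sigma-1},\\
x\,r_F'(x) + \Bigl(1+\frac{c}{\sigma}\Bigr) r_F(x) &= 0
\;\Longleftrightarrow\; \frac{d}{dx}\Bigl[x^{1+c/\sigma}\,r_F(x)\Bigr] = 0 .
\end{aligned}
```

The differential equation alone fixes $r_F$ only up to the constant $x^{1+c/\sigma} r_F(x) = \beta c \lambda^{c/\sigma}/\sigma$, which is pinned down by the value of $r_F$ at any single point.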

Characterization Based on the Conditional Expectation of Certain Function of the Random Variable
In this subsection we employ a single function ψ of X and characterize the distribution of X in terms of the conditional expectation of ψ. The following proposition has already appeared in Hamedani (2013), so we simply state it here; it can be used to characterize the GBXII(L) distribution.
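The proposition from Hamedani (2013) is not reproduced above. A relation of this type that is commonly used in such characterizations (stated here as our assumption, not as the exact proposition) is E[ψ(X) | X ≤ x] = δψ(x) with ψ(x) = F(x)^{1/δ − 1}, δ > 0. For the GBXII(L) CDF (5) this follows from $\int_0^x F^{1/\delta-1}\,dF = \delta F(x)^{1/\delta}$. The Python sketch below confirms the relation by quadrature. To avoid overflow when 1/δ − 1 < 0, it forms ψ·f in log space, using the fact that the PDF (6) equals the CDF (5) times the reverse hazard f/F. Parameter values are illustrative.

```python
import numpy as np
from scipy import integrate

# Illustrative GBXII(L) parameter values (beta, sigma, c, k, lam); not from the paper.
BETA, SIGMA, C, K, LAM = 1.6, 0.8, 2.0, 2.0, 1.0


def odds(u):
    """Burr XII odds (1 + (u/lam)^c)^k - 1."""
    return np.expm1(K * np.log1p((u / LAM) ** C))


def log_cdf(u):
    """log of the GBXII(L) CDF (5)."""
    return -BETA * odds(u) ** (-1 / SIGMA)


def reverse_hazard(u):
    """f(u)/F(u): the factor multiplying the CDF in the PDF (6)."""
    z = (u / LAM) ** C
    return (BETA * C * K / (SIGMA * LAM) * (u / LAM) ** (C - 1) * (1 + z) ** (K - 1)
            * odds(u) ** (-1 / SIGMA - 1))


def cond_expectation_psi(x, delta):
    """E[psi(X) | X <= x] with psi = F^(1/delta - 1), computed by quadrature."""
    # psi(u) * f(u) = [f(u)/F(u)] * F(u)^(1/delta), evaluated via exp(log F / delta)
    integrand = lambda u: reverse_hazard(u) * np.exp(log_cdf(u) / delta)
    num, _ = integrate.quad(integrand, 0, x)
    return num / np.exp(log_cdf(x))


def delta_psi(x, delta):
    """Right-hand side delta * psi(x)."""
    return delta * np.exp((1 / delta - 1) * log_cdf(x))


for delta in (0.25, 0.5, 2.0):
    print(delta, cond_expectation_psi(1.2, delta), delta_psi(1.2, delta))
```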
In the next section we present some structural properties of the GBXII(L) distribution.

Mean Deviations
Two measures of dispersion for GBXII(L) are the mean absolute deviation from the mean and the mean absolute deviation from the median. Let X follow the GBXII(L) distribution with mean µ and median M. The mean absolute deviation from the mean is given by
$$\delta_1(X) = E|X-\mu| = 2\mu F_{\mathrm{GBXII(L)}}(\mu) - 2 I_{1,(0,\mu)}. \tag{7}$$
The mean absolute deviation from the median is given by
$$\delta_2(X) = E|X-M| = \mu - 2 I_{1,(0,M)}. \tag{8}$$
The integral $I_{1,(0,\tau)} = \int_0^{\tau} x\, f_{\mathrm{GBXII(L)}}(x)\,dx$ in (7) and (8) can be calculated numerically.
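The numerical route for (7) and (8) can be sketched in Python with `scipy.integrate.quad`. One caveat follows from (6): as x → ∞ the PDF decays like x^{−1−ck/σ}, so µ is finite only when ck/σ > 1, and the sketch checks this first. Function names and parameter values are illustrative assumptions, not from the paper.

```python
import numpy as np
from scipy import integrate


def gbxii_cdf(x, beta, sigma, c, k, lam):
    """GBXII(L) CDF (5)."""
    return np.exp(-beta * np.expm1(k * np.log1p((x / lam) ** c)) ** (-1.0 / sigma))


def gbxii_pdf(x, beta, sigma, c, k, lam):
    """GBXII(L) PDF (6), for x > 0."""
    z = (x / lam) ** c
    h = np.expm1(k * np.log1p(z))
    return (beta * c * k / (sigma * lam) * (x / lam) ** (c - 1) * (1 + z) ** (k - 1)
            * h ** (-1.0 / sigma - 1) * np.exp(-beta * h ** (-1.0 / sigma)))


def gbxii_quantile(p, beta, sigma, c, k, lam):
    """Inverse of the CDF (5)."""
    return lam * ((1 + (-beta / np.log(p)) ** sigma) ** (1 / k) - 1) ** (1 / c)


def mean_deviations(theta):
    """Return (mu, M, delta1, delta2) from (7) and (8)."""
    beta, sigma, c, k, lam = theta
    if c * k / sigma <= 1:
        raise ValueError("E[X] is infinite unless c*k/sigma > 1")

    def xf(x):
        return x * gbxii_pdf(x, *theta)

    def I1(tau):
        """I_{1,(0,tau)}: integral of x f(x) over (0, tau)."""
        return integrate.quad(xf, 0, tau)[0]

    mu = integrate.quad(xf, 0, np.inf)[0]
    M = gbxii_quantile(0.5, *theta)
    delta1 = 2 * mu * gbxii_cdf(mu, *theta) - 2 * I1(mu)
    delta2 = mu - 2 * I1(M)
    return mu, M, delta1, delta2


mu, M, d1, d2 = mean_deviations((1.6, 0.8, 2.0, 2.0, 1.0))  # illustrative parameters
print(mu, M, d1, d2)
```

Since the median minimizes E|X − a| over a, δ₂ should never exceed δ₁, which is a convenient sanity check on the output.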

Transformation
A special case of the GBXII(L) distribution, when c = 1, is the Gumbel-Lomax distribution studied by Tahir et al. (2016). The connections between the GBXII(L) and the Gumbel, standard uniform, standard exponential and Weibull distributions were discussed in Osatohanmwen et al. (2017). The link between the GBXII(L) and the Weibull-Dagum distribution is given in the following lemma.
Lemma 1. If Y is a random variable following the Weibull-Dagum distribution (Tahir et al., 2016)
Proof: The results follow directly from the transformation technique.
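The statement of Lemma 1 is incomplete above, so the following is only an illustration under an assumed parameterization that may differ from that of Tahir et al. (2016). Take the Weibull-G generator 1 − exp{−a[F_D/(1 − F_D)]^b} (Bourguignon et al., 2014) applied to the Dagum CDF F_D(y) = [1 + (y/λ)^{−c}]^{−k}. Then P(1/Y ≤ x) = exp{−a[(1 + (λx)^c)^k − 1]^{−b}}, which is the GBXII(L) CDF (5) with β = a, σ = 1/b and scale 1/λ. The Python sketch checks this identity on a grid; `dagum_cdf` and `wd_cdf` are hypothetical helpers.

```python
import numpy as np


def dagum_cdf(y, c, k, lam):
    """Dagum CDF [1 + (y/lam)^(-c)]^(-k), y > 0 (assumed parameterization)."""
    return (1 + (y / lam) ** (-c)) ** (-k)


def wd_cdf(y, a, b, c, k, lam):
    """Assumed Weibull-Dagum CDF: Weibull-G generator applied to the Dagum baseline."""
    fd = dagum_cdf(y, c, k, lam)
    return 1 - np.exp(-a * (fd / (1 - fd)) ** b)


def gbxii_cdf(x, beta, sigma, c, k, lam):
    """GBXII(L) CDF (5)."""
    return np.exp(-beta * ((1 + (x / lam) ** c) ** k - 1) ** (-1 / sigma))


a, b, c, k, lam = 1.3, 0.7, 2.5, 1.5, 2.0     # illustrative values
x = np.linspace(0.05, 5, 50)
lhs = 1 - wd_cdf(1 / x, a, b, c, k, lam)      # P(X <= x) for X = 1/Y
rhs = gbxii_cdf(x, a, 1 / b, c, k, 1 / lam)   # GBXII(L) with beta=a, sigma=1/b, scale 1/lam
print(np.max(np.abs(lhs - rhs)))              # should be at rounding-error level
```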

Quantile Function
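Osatohanmwen et al. (2017) derived the formulas for the quantile function and the median of GBXII(L), respectively, as
$$Q(p) = \lambda\left\{\left[1+\left(-\frac{\beta}{\log p}\right)^{\sigma}\right]^{1/k}-1\right\}^{1/c}, \quad 0<p<1, \tag{9}$$
$$M = \lambda\left\{\left[1+\left(\frac{\beta}{\log 2}\right)^{\sigma}\right]^{1/k}-1\right\}^{1/c}.$$
Remark 4. It follows directly from (9) that the quantile function of GBXII(L) is
• an increasing function of λ, when all other parameters are held fixed,
• a decreasing function of k, when all other parameters are held fixed,
• an increasing function of β, when all other parameters are held fixed,
• a decreasing, increasing, or constant function of σ, when all other parameters are held fixed, if p < e^{−β}, p > e^{−β}, or p = e^{−β}, respectively,
• a decreasing, increasing, or constant function of c, when all other parameters are held fixed, if p > exp{−β(2^k − 1)^{−1/σ}}, p < exp{−β(2^k − 1)^{−1/σ}}, or p = exp{−β(2^k − 1)^{−1/σ}}, respectively.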
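The monotonicity statements can be checked with central finite differences of the quantile function (9). The following Python sketch does this. The threshold for c comes from requiring the braced term in (9) to equal 1, which gives Q(p) = λ for every c. Parameter values and helper names are illustrative.

```python
import numpy as np


def gbxii_quantile(p, beta, sigma, c, k, lam):
    """Quantile function (9)."""
    return lam * ((1 + (-beta / np.log(p)) ** sigma) ** (1 / k) - 1) ** (1 / c)


def c_threshold(beta, sigma, k):
    """p* = exp{-beta (2^k - 1)^(-1/sigma)}: Q(p*) = lam for every c."""
    return np.exp(-beta * (2.0 ** k - 1) ** (-1 / sigma))


def slope(param_index, p, theta, eps=1e-6):
    """Central finite-difference derivative of Q(p) w.r.t. one of (beta, sigma, c, k, lam)."""
    up, down = list(theta), list(theta)
    up[param_index] += eps
    down[param_index] -= eps
    return (gbxii_quantile(p, *up) - gbxii_quantile(p, *down)) / (2 * eps)


theta = [1.6, 0.8, 2.0, 2.0, 1.0]   # (beta, sigma, c, k, lam): illustrative values
p_star = c_threshold(theta[0], theta[1], theta[3])
for p in (0.3, p_star, 0.9):
    print(f"p={p:.4f}  dQ/dc={slope(2, p, theta):+.6f}  dQ/dsigma={slope(1, p, theta):+.6f}")
```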

Parameter Estimation and Simulation
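According to Nadarajah and Okorie (2018), the log likelihood function for the GBXII(L) distribution reported in Osatohanmwen et al. (2017) is incorrect, which might affect the accuracy of the simulation and estimation results. Nadarajah and Okorie (2018) provided the correct log likelihood function for the GBXII(L), which can be written as
$$\ell(\beta,\sigma,c,k,\lambda) = n\log\frac{\beta c k}{\sigma\lambda} + (c-1)\sum_{i=1}^{n}\log\frac{x_i}{\lambda} + (k-1)\sum_{i=1}^{n}\log u_i - \left(\frac{1}{\sigma}+1\right)\sum_{i=1}^{n}\log h_i - \beta\sum_{i=1}^{n} h_i^{-1/\sigma},$$
where $u_i = 1+(x_i/\lambda)^{c}$, $h_i = u_i^{k}-1$, and $x_1, \ldots, x_n$ are the observed values from GBXII(L) with parameters β, σ, c, k and λ.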
The correct log likelihood function for the GBXII(L), provided by Nadarajah and Okorie (2018), is used to estimate the unknown parameters of the GBXII(L) distribution by the method of maximum likelihood.
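A minimal Python sketch of this estimation step follows, assuming NumPy and SciPy. It codes the log-likelihood obtained by taking logs of the PDF (6) and maximizes it with SciPy's Nelder-Mead over log-parameters so that all five parameters stay positive. The optimizer choice and the helper names `gbxii_loglik` and `fit_gbxii` are our own; the paper itself uses SAS routines.

```python
import numpy as np
from scipy.optimize import minimize


def gbxii_loglik(theta, x):
    """GBXII(L) log-likelihood (sum of log PDF (6)); theta = (beta, sigma, c, k, lam)."""
    beta, sigma, c, k, lam = theta
    x = np.asarray(x, dtype=float)
    log_u = np.log1p((x / lam) ** c)          # log(1 + (x/lam)^c)
    h = np.expm1(k * log_u)                   # (1 + (x/lam)^c)^k - 1
    return (x.size * np.log(beta * c * k / (sigma * lam))
            + (c - 1) * np.sum(np.log(x / lam))
            + (k - 1) * np.sum(log_u)
            - (1 / sigma + 1) * np.sum(np.log(h))
            - beta * np.sum(h ** (-1 / sigma)))


def fit_gbxii(x, theta0):
    """Maximize the log-likelihood over log-parameters; returns (theta_hat, loglik_hat)."""
    def nll(log_theta):
        with np.errstate(all="ignore"):
            val = gbxii_loglik(np.exp(log_theta), x)
        return -val if np.isfinite(val) else np.inf   # reject invalid regions
    res = minimize(nll, np.log(theta0), method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-8, "fatol": 1e-8})
    return np.exp(res.x), -res.fun


# Usage on simulated data (inverse-CDF sampling from (5)); values are illustrative.
rng = np.random.default_rng(2024)
true = np.array([1.6, 0.8, 2.0, 2.0, 1.0])   # (beta, sigma, c, k, lam)
u = rng.uniform(size=1000)
x = true[4] * ((1 + (-true[0] / np.log(u)) ** true[1]) ** (1 / true[3]) - 1) ** (1 / true[2])
theta_hat, ll_hat = fit_gbxii(x, theta0=true)
print(np.round(theta_hat, 3), ll_hat)
```

Starting at the true values is only for the demonstration; with real data, the initial values described next would be used.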
The components of the score vector, U = (∂ℓ/∂β, ∂ℓ/∂σ, ∂ℓ/∂c, ∂ℓ/∂k, ∂ℓ/∂λ)^T, are obtained by differentiating ℓ with respect to each parameter. For example, ∂ℓ/∂β = n/β − Σ_{i=1}^{n} [(1 + (x_i/λ)^c)^k − 1]^{−1/σ}. The MLEs solve U = 0, which must be done numerically from suitable initial values. The initial values are calculated using the criteria reported in Osatohanmwen et al. (2017), as follows. Since c, k and λ come from the Burr XII distribution (BXII), and β and σ come from the Gumbel distribution (GD), we can assume that the random sample x_1, ..., x_n follows BXII and use the BXII MLEs ĉ, k̂ and λ̂ as initial values c_0, k_0 and λ_0. The next step is to transform the sample from GBXII(L) to GD using the transformation
$$y_i = \log\left[\left(1+(x_i/\lambda_0)^{c_0}\right)^{k_0}-1\right], \quad i = 1, \ldots, n.$$
Then the initial values for the last two parameters are σ_0 = s_y√6/π and β_0 = e^{ν_0/σ_0}, where ν_0 = ȳ − γσ_0 and σ_0 are the GD moment estimates, γ is Euler's constant, and ȳ and s_y are the mean and standard deviation of y_1, ..., y_n, respectively (Johnson et al., 1995, p. 12).
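The initial-value recipe can be sketched in a few lines of Python. Here c_0, k_0 and λ_0 are taken as given. In practice they would be Burr XII MLEs; for instance, `scipy.stats.burr12.fit`, whose shape parameters c and d and scale correspond to c, k and λ, could supply them, although that choice is our assumption rather than the paper's. The helper name `gumbel_start` is hypothetical. The code applies the logit transform and the Gumbel moment formulas mean = ν + γσ and sd = πσ/√6.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gumbel_start(x, c0, k0, lam0):
    """Initial (beta0, sigma0) from Gumbel moment estimates of the transformed sample."""
    x = np.asarray(x, dtype=float)
    # y_i = log[(1 + (x_i/lam0)^c0)^k0 - 1]: the log-odds under the Burr XII fit
    y = np.log(np.expm1(k0 * np.log1p((x / lam0) ** c0)))
    sigma0 = np.std(y, ddof=1) * np.sqrt(6) / np.pi   # Gumbel: sd = pi * sigma / sqrt(6)
    nu0 = np.mean(y) - EULER_GAMMA * sigma0            # Gumbel: mean = nu + gamma * sigma
    beta0 = np.exp(nu0 / sigma0)
    return beta0, sigma0


# Usage: a large simulated sample with known (illustrative) parameters, passing the
# true Burr XII parameters as c0, k0, lam0 to isolate the Gumbel step.
rng = np.random.default_rng(7)
b, s, c, k, lam = 1.6, 0.8, 2.0, 2.0, 1.0
u = rng.uniform(size=200_000)
x = lam * ((1 + (-b / np.log(u)) ** s) ** (1 / k) - 1) ** (1 / c)
b0, s0 = gumbel_start(x, c, k, lam)
print(b0, s0)   # should be close to 1.6 and 0.8
```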

Simulation Study
A simulation study is conducted to assess the performance of the maximum likelihood estimators based on the bias and the standard deviation. For this purpose, we fix the scale parameter λ at 1.0 and k at 2.0, and use a total of 10 parameter combinations and different sample sizes:
• β = 0.9, 1.6, 2.4, 3.0, 3.6
• n = 200, 400, 600, 1000.
For each parameter combination and each sample size, a random sample is simulated from the GBXII(L) distribution, and the SAS NLPTR subroutine is used to estimate the parameters by maximizing the log-likelihood function. This process is repeated 100 times, and the bias and the standard deviation are presented in Table 1. These results show that the maximum likelihood estimation method performs reasonably well. The shape parameters σ and c are negatively biased, while β is positively (or negatively) biased depending on whether σ is less (or greater) than 1. In general, the bias and the standard deviation of the estimates are reasonable and decrease as the sample size increases. The simulation study suggests that the maximum likelihood method can be used reliably to estimate the parameters of the GBXII(L) distribution.

Application
In this section, three examples are used for illustration. The MLEs are computed, and the fit is compared to other well-known distributions based on the p-value of the Kolmogorov-Smirnov (K-S) test statistic, the Akaike information criterion (AIC), and the log-likelihood value. These data sets were selected because they come from diverse disciplines, have different sample sizes and non-negative values, and differ structurally in terms of skewness and kurtosis.
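The comparison criteria can be computed as in the Python sketch below, which uses `scipy.stats.kstest` with the GBXII(L) CDF (5) passed as a callable. The K-S p-value assumes fully specified parameters; when the parameters are estimated from the same data, as in the applications, it is only approximate. The data here are simulated, and the log-likelihood value passed to `information_criteria` is a made-up number for illustration.

```python
import numpy as np
from scipy import stats


def gbxii_cdf(x, beta, sigma, c, k, lam):
    """GBXII(L) CDF (5)."""
    return np.exp(-beta * np.expm1(k * np.log1p((x / lam) ** c)) ** (-1.0 / sigma))


def information_criteria(loglik, n_params, n):
    """AIC = 2p - 2 logL and BIC = p log(n) - 2 logL (smaller is better)."""
    return 2 * n_params - 2 * loglik, n_params * np.log(n) - 2 * loglik


def ks_gbxii(x, theta):
    """One-sample K-S test of x against the GBXII(L) CDF at the given parameter values."""
    return stats.kstest(x, gbxii_cdf, args=tuple(theta))


rng = np.random.default_rng(3)
theta = (1.6, 0.8, 2.0, 2.0, 1.0)   # illustrative; in practice, the fitted MLEs
u = rng.uniform(size=500)
x = theta[4] * ((1 + (-theta[0] / np.log(u)) ** theta[1]) ** (1 / theta[3]) - 1) ** (1 / theta[2])
res = ks_gbxii(x, theta)
aic, bic = information_criteria(loglik=-100.0, n_params=5, n=50)  # made-up inputs
print(res.statistic, res.pvalue, aic, bic)
```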

Actuarial Science Data
These Actuarial Science data (skewness = 0.07, kurtosis = 0.08) are taken from Balakrishnan et al. (2009) and were later also used for model fitting by Tahir et al. (2016). To carry out long- and short-term financial estimation, such as assessing the reserve required to pay the 'minimum pensions', it is important for the Mexican Institute of Social Security (IMSS) to study the distributional behavior of the mortality of retired people on disability. The data set corresponds to the lifetimes (in years) of retired women with temporary disabilities who were incorporated in the Mexican public insurance system and who died during 2004. The data are:

Glass Fibers Data
The strengths of 1.5 cm glass fibers presented below were recorded at the National Physical Laboratories, England. These data are reported in Smith and Naylor (1987) and are left skewed, with skewness = -0.95 and kurtosis = 1.10. The data were used by Barreto-Souza et al. (2010) in an application of the beta-generalized exponential (BGE) distribution, and by Al-Aqtash et al. (2015) in an application of the Gumbel-Weibull (GW) distribution.
0.55, 0.74, 0.77, 0.81, 0.84, 0.93, 1.04, 1.11, 1.13, 1.24, 1.25, 1.27, 1.28, 1.29, 1.30, 1.36, 1.39, 1.42, 1.48(2)
Again, four distributions are used to fit the data, namely the beta-generalized exponential distribution, the beta-Birnbaum-Saunders (BBS) distribution (Cordeiro and Lemonte, 2011), the Gumbel-Weibull distribution, and the GBXII(L) distribution. The MLEs and goodness-of-fit statistics are presented in Table 4. Figure 4 contains the histogram of the data and the fitted PDFs, as well as the empirical and fitted CDFs. Comparing the goodness-of-fit statistics among the four distributions, we observe that the GBXII(L) outperforms the other distributions.

The Log-GBXII Regression Model With Application to Survival Data
Suppose that the survival time X follows the GBXII distribution in (5) with parameters β, σ, c, k, λ > 0. Then the survival function of X is
$$S(x) = 1 - \exp\left\{-\beta\left[\left(1+(x/\lambda)^{c}\right)^{k}-1\right]^{-1/\sigma}\right\}, \quad x>0.$$
If we take the log transform of X and reparameterize as c = 1/τ and λ = e^µ, then Y = log(X) can be written as a log-linear model, Y = µ + τW, where the random variable W = (Y − µ)/τ follows the standardized log-GBXII (SLGBXII) distribution with PDF
$$f_W(w) = \frac{\beta k}{\sigma}\, e^{w}\left(1+e^{w}\right)^{k-1}\left[\left(1+e^{w}\right)^{k}-1\right]^{-1/\sigma-1}\exp\left\{-\beta\left[\left(1+e^{w}\right)^{k}-1\right]^{-1/\sigma}\right\}, \quad w\in\mathbb{R}, \tag{10}$$
and Y follows the log-GBXII (LGBXII) distribution with PDF
$$f_Y(y) = \frac{1}{\tau}\, f_W\!\left(\frac{y-\mu}{\tau}\right), \quad y\in\mathbb{R},\ \mu\in\mathbb{R},\ \tau>0. \tag{11}$$
In the analysis of most survival data, the relationship between the covariates and the survival time X is of interest. This relationship can be written as a linear relationship between the log of the survival time X and the covariate values, as follows. Consider the survival time X_i of the i-th individual in the sample, for i = 1, ..., n, and suppose that we also have a set of p covariates such that Z_i = (1, z_{i1}, ..., z_{ip})^T, where the 1 corresponds to the intercept term. The log-linear (or location-scale) regression model that links the dependent variable Y_i = log(X_i) and the set of covariates is given by
$$Y_i = \gamma^{T} Z_i + \tau W_i, \quad i = 1, \ldots, n, \tag{12}$$
where γ^T = (γ_0, γ_1, ..., γ_p) are the unknown regression coefficients of the p covariates, τ is an unknown scale parameter, and W_i is the error variable.
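In order to incorporate covariates into the LGBXII model, we use the log-linear model (12) for the survival time X_i, where W_i has the SLGBXII distribution (10), so that µ_i = γ^T Z_i is the location parameter of Y_i and τ, β, σ, k > 0 are unknown parameters. With the model in (12), the survival function of Y_i is
$$S(y \mid Z_i) = 1 - \exp\left\{-\beta\left[\left(1+e^{(y-\gamma^{T}Z_i)/\tau}\right)^{k}-1\right]^{-1/\sigma}\right\}, \quad y\in\mathbb{R}.$$
Now, consider a sample of n independent observations, and let the random variables X_i and C_i denote the lifetime and censoring time for the i-th individual, whereas the response Y_i represents a log-lifetime or a log-censoring time for the i-th individual, such that Y_i = min(log(X_i), log(C_i)) for i = 1, 2, ..., n. If all the observations are uncensored, then based on the LGBXII distribution in (11), the log likelihood for the model parameters θ = (β, k, σ, τ, γ^T)^T can be written as
$$\ell(\theta) = n\log\frac{\beta k}{\sigma\tau} + \sum_{i=1}^{n} w_i + (k-1)\sum_{i=1}^{n}\log\left(1+e^{w_i}\right) - \left(\frac{1}{\sigma}+1\right)\sum_{i=1}^{n}\log h_i - \beta\sum_{i=1}^{n} h_i^{-1/\sigma}, \tag{13}$$
where $w_i = (y_i - \gamma^{T}Z_i)/\tau$ and $h_i = (1+e^{w_i})^{k}-1$. Assume some of the observations are right censored, and let C and F be the sets of censored and uncensored observations, respectively. If, in addition, we assume non-informative censoring, such that the censoring times are independent of the survival times, then the log-likelihood function for θ is given by
$$\ell(\theta) = m\log\frac{\beta k}{\sigma\tau} + \sum_{i\in F}\left[w_i + (k-1)\log\left(1+e^{w_i}\right) - \left(\frac{1}{\sigma}+1\right)\log h_i - \beta h_i^{-1/\sigma}\right] + \sum_{i\in C}\log\left[1-\exp\left(-\beta h_i^{-1/\sigma}\right)\right], \tag{14}$$
where m is the number of uncensored observations.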
The MLE $\hat{\theta}$ can be obtained by maximizing the log-likelihood function in (13) or (14). Numerical non-linear optimization methods are commonly used for this purpose; in this article, the NLMIXED procedure in SAS is used to obtain $\hat{\theta}$.
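As a non-SAS illustration, the censored log-likelihood (14) can be coded directly in Python and maximized with SciPy's Nelder-Mead; this optimizer is our substitution, not NLMIXED. The sketch orders the parameters as (log β, log σ, log k, log τ, γ), a choice of our own. It fits hypothetical simulated two-arm data, not the Efron (1988) data. `logaddexp` computes log(1 + e^w) without overflow.

```python
import numpy as np
from scipy import integrate
from scipy.optimize import minimize


def slgbxii_logpdf(w, beta, sigma, k):
    """Log of the SLGBXII density (10)."""
    log_u = np.logaddexp(0.0, w)        # log(1 + e^w)
    h = np.expm1(k * log_u)             # (1 + e^w)^k - 1
    return (np.log(beta * k / sigma) + w + (k - 1) * log_u
            - (1 / sigma + 1) * np.log(h) - beta * h ** (-1 / sigma))


def slgbxii_logsf(w, beta, sigma, k):
    """Log survival function: log{1 - exp[-beta h^(-1/sigma)]}."""
    h = np.expm1(k * np.logaddexp(0.0, w))
    return np.log(-np.expm1(-beta * h ** (-1 / sigma)))


def lgbxii_loglik(params, y, Z, delta):
    """Censored log-likelihood (14); params = (log beta, log sigma, log k, log tau, gamma...)."""
    beta, sigma, k, tau = np.exp(params[:4])
    w = (y - Z @ params[4:]) / tau
    obs = delta == 1                    # uncensored observations
    return (np.sum(slgbxii_logpdf(w[obs], beta, sigma, k) - np.log(tau))
            + np.sum(slgbxii_logsf(w[~obs], beta, sigma, k)))


def fit_lgbxii(y, Z, delta, start):
    def nll(params):
        with np.errstate(all="ignore"):
            val = lgbxii_loglik(params, y, Z, delta)
        return -val if np.isfinite(val) else np.inf
    return minimize(nll, start, method="Nelder-Mead", options={"maxiter": 20000, "maxfev": 20000})


# Hypothetical two-arm data simulated from the model (illustrative values only).
rng = np.random.default_rng(11)
n = 400
Z = np.column_stack([np.ones(n), rng.integers(0, 2, size=n)])   # intercept, arm indicator
beta, sigma, k, tau, gamma = 1.6, 0.8, 2.0, 0.5, np.array([6.0, 0.3])
u = rng.uniform(size=n)
w = np.log((1 + (-beta / np.log(u)) ** sigma) ** (1 / k) - 1)   # SLGBXII draws by inversion
t = Z @ gamma + tau * w                                          # log survival times
cens = rng.normal(7.0, 1.0, size=n)                              # log censoring times
y, delta = np.minimum(t, cens), (t <= cens).astype(int)
start = np.r_[np.log([beta, sigma, k, tau]), gamma]
res = fit_lgbxii(y, Z, delta, start)
mass = integrate.quad(lambda v: np.exp(slgbxii_logpdf(v, beta, sigma, k)), -40, 40,
                      points=[-2.0, 0.0, 2.0])[0]
print(res.x, -res.fun, mass)   # mass of density (10) should be close to 1
```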
In the remainder of this section, we apply the LGBXII regression model to data from a two-arm clinical trial with different treatments, previously analyzed by Efron (1988), comparing radiotherapy alone (Arm A) with radiotherapy plus chemotherapy (Arm B) for head and neck cancer. The response for each patient is the survival time in days. Nine of the 51 patients in Arm A and 14 of the 45 patients in Arm B were lost to follow-up (censored). Let Y_i be the log survival time for the i-th patient, and let z_{i1} be a binary covariate (two-arms) coded as 0 for Arm A and 1 for Arm B. We fit these data using the LGBXII regression model, with the log-linear model defined as Y_i = γ_0 + γ_1 z_{i1} + τW_i, for i = 1, 2, ..., 96, where the random variable W_i follows the SLGBXII distribution with PDF (10).
We also compare the fit of the LGBXII regression model with the fits of other competitive lifetime models, namely the log-Gumbel-Weibull (LGW) (Al-Aqtash et al., 2015), log-beta-Weibull (LBW) (Ortega et al., 2011), and log-Pareto-Weibull generalized lambda (LPWGL) (Aldeni et al., 2017) models. The LGW, LBW, and LPWGL survival functions (for y ∈ ℝ, −∞ < µ = γ_0 + γ_1 z_{i1} < ∞, τ > 0, and the remaining parameters all positive) are given below:
LGW survival function
Among the fitted models, the LGBXII model has the lowest AIC and BIC values, as indicated in Table 5, so the LGBXII model provides the best fit to the data, followed successively by the LBW and LGW models. In the fitted LGBXII regression model, the covariate two-arms is not significant at the 5% level; that is, the survival times in Arm A do not differ significantly from those in Arm B. The plots of the empirical (Kaplan-Meier) survival function and the estimated survival functions of the LGBXII, LBW, and LGW models are depicted in Figure 6. These plots suggest that the LGBXII model is appropriate for these data. The two shape parameters, k and σ, provide additional flexibility to this distribution, so it can be considered a very competitive alternative to other lifetime models.

Concluding Remarks
In this article, we discuss a member of the T-X(Logit) family of distributions, namely the GBXII(L) distribution. Some useful characterizations and additional properties of the GBXII(L) are discussed. A simulation study is conducted to assess the estimation of the parameters using the maximum likelihood method. From the results of the simulation study, the MLEs perform reasonably well and improve as the sample size increases. To illustrate the usefulness of the GBXII(L) model in applications, three real data sets arising from diverse disciplines and having different characteristics (different sample sizes and different shapes) are used. The fit is compared with other existing models based on the K-S, AIC and BIC statistics. The GBXII(L) performed very well in fitting right-skewed or left-skewed data that can be unimodal or bimodal. An LGBXII regression model is proposed to explain and predict a response variable. Survival-time data from a two-arm clinical trial are used in an application of the proposed LGBXII regression model, and the fit is compared to three other comparable models. The LGBXII regression model performed very well based on the AIC and BIC statistics. Thus, our results indicate that the GBXII(L) probability model and the LGBXII regression model may be used as alternatives to some other well-known models. Future research includes exploring efficient parameter estimation, extension to the multivariate case and associated inference.
Appendix A

Theorem A1. Let (Ω, $\mathcal{F}$, P) be a given probability space and let H = [a, b] be an interval for some a < b (a = −∞, b = ∞ might as well be allowed). Let X : Ω → H be a continuous random variable with distribution function F, and let $q_1$ and $q_2$ be two real functions defined on H such that
$$E[q_2(X) \mid X \ge x] = E[q_1(X) \mid X \ge x]\,\eta(x), \quad x\in H,$$
is defined with some real function η. Assume that $q_1, q_2 \in C^1(H)$, $\eta \in C^2(H)$, and F is a twice continuously differentiable and strictly monotone function on the set H. Finally, assume that the equation $\eta q_1 = q_2$ has no real solution in the interior of H. Then F is uniquely determined by the functions $q_1$, $q_2$ and η; in particular,
$$F(x) = \int_a^x C\left|\frac{\eta'(u)}{\eta(u)q_1(u) - q_2(u)}\right|\exp(-s(u))\,du,$$
where the function s is a solution of the differential equation $s' = \frac{\eta' q_1}{\eta q_1 - q_2}$ and C is the normalization constant such that $\int_H dF = 1$.
We note that this kind of characterization based on the ratio of truncated moments is stable in the sense of weak convergence (Glänzel, 1990). In particular, assume that there is a sequence {X_n} of random variables with distribution functions {F_n} such that the functions $q_{1n}$, $q_{2n}$ and $\eta_n$ (n ∈ ℕ) satisfy the conditions of Theorem A1, and let $q_{1n} \to q_1$, $q_{2n} \to q_2$ for some continuously differentiable real functions $q_1$ and $q_2$. Let, finally, X be a random variable with distribution F. Under the condition that $q_{1n}(X)$ and $q_{2n}(X)$ are uniformly integrable and the family {F_n} is relatively compact, the sequence X_n converges to X in distribution if and only if $\eta_n$ converges to η, where
$$\eta(x) = \frac{E[q_2(X) \mid X \ge x]}{E[q_1(X) \mid X \ge x]}.$$
This stability theorem ensures that the convergence of distribution functions is reflected by the corresponding convergence of the functions $q_1$, $q_2$ and η, respectively. It guarantees, for instance, the 'convergence' of the characterization of the Wald distribution to that of the Lévy-Smirnov distribution as α → ∞.
A further consequence of the stability property of Theorem A1 is the application of this theorem to special tasks in statistical practice, such as the estimation of the parameters of discrete distributions. For such purposes, the functions $q_1$, $q_2$ and, especially, η should be as simple as possible. Since the function triplet is not uniquely determined, it is often possible to choose η as a linear function. Therefore, it is worth analyzing some special cases, which help to find new characterizations reflecting the relationship between individual continuous univariate distributions and are appropriate in other areas of statistics.

Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
Figure 2. (A) Histogram and the estimated PDFs and (B) The empirical and the estimated CDFs for the Actuarial Science data

Figure 3. (A) Histogram and the estimated PDFs and (B) The empirical and the estimated CDFs for the skin folds data
Figure 4. (A) Histogram and the estimated PDFs and (B) The empirical and the estimated CDFs for the glass fibers data

Figure 6. Kaplan-Meier and the estimated survival functions for the two-arms data: (A) LGBXII model versus LBW model; (B) LGBXII model versus LGW model

Table 1. Bias and standard deviation of MLEs

Table 2. MLEs (standard errors in parentheses) and model selection criteria for the Actuarial Science data. * Parameter estimates are from Tahir et al. (2016).

Akaike information criterion (AIC) and Bayesian information criterion (BIC) and concluded that WD gives the overall best fit, followed by McD. We fit the Weibull-Dagum distribution, the beta-normal (BN) distribution, the Gumbel-Weibull (GW) distribution, and the GBXII(L) distribution to the data. The maximum likelihood estimates along with their standard errors and different goodness-of-fit statistics or model selection criteria are reported in Table 2. It is evident from this table that GBXII(L) has the lowest AIC, BIC and K-S test statistic values and consequently the highest p-value. Figure 2 presents the histogram of the Actuarial Science data along with the estimated densities, as well as the empirical and estimated CDFs of the four models. Thus, it can be concluded that the GBXII(L) model outperforms the other distributions under consideration.

Table 4. MLEs (standard errors in parentheses) and model selection criteria for the glass fibers data. * Parameter estimates are from Barreto-Souza et al. (2010); ** parameter estimates are from Al-Aqtash et al. (2015).

Table 5. Parameter estimates and fit statistics for the two-arms data (standard errors in parentheses and p-values in brackets)