A New Bivariate Distribution With Applications on Dependent Competing Risks Data

,


Introduction
The Weibull distribution is one of the most commonly used life-time distributions in reliability and lifetime data analysis. It is flexible for modeling failure time data, as the corresponding hazard rate function can be increasing, constant or decreasing. But in many applications in reliability and survival analysis, an appropriate hazard rate function should be of bathtub shape. The hazard rate function plays a central role to the work of reliability, see (Bebbington, Lai, & Zitikis, 2007b), (Bebbington, Lai, & Zitikis, 2007a) and the references therein. Models with a bathtub hazard rate function are often needed in reliability analysis and decision-making when the system's lifetime is to be modeled. Many parametric probability distributions have been introduced to analyze real data sets with bathtub shaped hazard functions. The bathtub shaped hazard function provides an appropriate conceptual model for some electronic and mechanical products as well as the lifetime of humans (Sarhan & Apaloo, 2013). An extensive amount of work has been done related to bathtub shaped hazard functions, see for example, (Sarhan, Tadj, & Hamilton, 2014), (Sarhan & Mustafa, 2022), (Sarhan & Zaindin, 2009), (Sarhan, 2009), (Sarhan & Kundu, 2009), (Sarhan, Abd EL-Baset, & Alasbahi, 2013), (Smith & Bain, 1975), (Leemis, 1986), (Gaver & Acar, 1979), (Hjorth, 1980), (Mudholkar & Srivastava, 1993), and (Lemonte, 2013). (Xie, Tang, & Goh, 2002) proposed a new modified extension of the Weibull distribution with a bathtub-shaped hazard rate function. We refer to this extension as the Modified Weibull Extension (MWE) distribution. The probability density function (pdf), the survival function (sf), and the hazard rate function (hrf) of the MWE(λ, α, γ) distribution are, respectively: where x ≥ 0 and α, γ, λ > 0. Here α and λ are scale parameters while γ is a shape parameter. The Chen distribution (Chen, 2000) with parameters λ and γ is a special case of the MWE with α = 1.
In many practical situations, multivariate lifetime data arise, and it is important to consider different distributions that could be used to model such data. (Sarhan & Balakrishnan, 2007), (Sarhan, El-Gohary, El-Bassiouny, & Balakrishnan, 2009), (Sarhan, Hamilton, Smith, & Kundu, 2011) , (Kundu, Sarhan, & Gupta, 2012) and (Sarhan, Apaloo, & Kundu, 2022) used independent shock models to propose bivariate/multivariate lifetime distributions. These studies used the bivariate distributions mainly to fit bivariate data sets. Such distributions can also be used to analyze dependent competing risks In this paper, we will first use the independent shock model concept to introduce a new bivariate distribution using the MWE distribution, named as the BMWE distribution. The BMWE distribution generalizes the BCD of  . We study the properties of the BMWE distribution. We discuss how to use this new proposed distribution as a dependent competing risks model. The maximum likelihood method is used to estimate the unknown parameters of the BMWE distribution using both bivariate data and competing risks data.
The rest of the paper is organized as follows. In Section 2, we introduce the BMWE distribution and discuss some of its basic properties. In Section 3, we discuss the dependent competing risks model. The maximum likelihood method using bivariate data and dependent competing risks data is discussed in Section 4. Data generation and some simulation results are presented in Section 5. In Section 6, we provide the analysis of two real data sets using the new proposed distribution and compare it with the BCD. The paper is concluded in Section 7.

The BMWE Distribution
The BMWE distribution can be formulated as follows. Let U 1 , U 2 , and U 3 be independent random variables such that U j ∼ MWE(λ j , α, γ), j = 1, 2, 3. Let X 1 = min(U 1 , U 3 ) and X 2 = min(U 2 , U 3 ). The distribution of the random vector (X 1 , X 2 ) follows the bivariate modified Weibull extension distribution, and it will be denoted by BMWE(λ 1 , λ 2 , λ 3 , α, γ). This distribution can be interpreted in a reliability context as follows. Consider a reliability system consisting of two units. The system units are subject to three independent fatal sources of shocks. A shock from the first source destroys unit 1, a shock from the second source destroys unit 2, while a shock from the third source destroys both units. Let X 1 and X 2 be the lifetime of the system units 1 and 2, respectively. Then, X 1 = min(U 1 , U 3 ) and X 2 = min(U 2 , U 3 ). Under the assumption that time at which the shocks occur (U 1 , U 2 and U 3 ) follow MWE distributions, as described above, the joint distribution of (X 1 , X 2 ) will follow the BMWE distribution. We can easily show that the marginal distributions of X 1 and X 2 follow MWE(λ 1 + λ 3 , α, γ) and MWE(λ 2 + λ 3 , α, γ), respectively.
Note that due to the fact that P(X 1 = X 2 ) = λ 1 λ 1 +λ 2 +λ 3 , while the Lebesgue measure of the set A = (x 1 , x 2 )|x 1 = x 2 > 0 is zero, the BMWE distribution will have both absolutely continuous and singular parts. This means that the BMWE distribution is continuous but not absolutely continuous with respect to ordinary Lebesgue measure on (0, ∞) × (0, ∞).
The following theorem gives the joint survival and joint probability density functions of the BMWE distribution, and its marginal distributions.
2. The absolutely continuous parts of the joint pdf of (X 1 , X 2 ), f 1 (x 1 , x 2 ) and f 2 (x 1 , x 2 ) can be easily obtained by using the relationship between the joint pdf and joint sf given by f X 1 ,X 2 (x 1 , x 2 ) = ∂ 2 ∂x 1 ∂x 2 S X 1 ,X 2 (x 1 , x 2 ). The singular part f 0 (x) can be obtained by using the following well know identity: 3. It is straightforward using the definition of X j = min(U j , U 3 ), j = 1, 2.
Special Case: The bivariate Chen distribution (BCD), that was proposed in , can be derived from the BMWE distribution as a special case from the BMWE distribution when α = 1. Figure 1 shows the absolutely continuous parts of the jpdf of (X 1 , X 2 ) and the corresponding contour plots at some values of the model parameters. While figure 2 displays the pdf and hrf of the marginal distribution of X 1 at a set of parameter values chosen to illustrate a bathtub shape for the hazard function. We used Matlab to draw these figures.

Dependent Competing Risks Model
In survival and reliability analysis one is often interested in the assessment of one risk in the presence of other risk factors. In statistics, this is known as the competing risks problem. The competing risks model assumes that the data consists of the failure time of the underlying object and the associated cause of failure. An extensive amount of study has been carried out under both parametric and non-parametric assumptions, see for example the monograph by (Crowder, 2001) in this respect. In this paper, we adopt the parametric setup along with the latent failure time assumption. Furthermore, it is assumed here that there are two possible causes of death (competing risks), although the method can be easily generalized to any fixed number of competing risks.
It is assumed in the competing risks model that the underlying experimental object (system, or a human, or an animal) is under fatal attack from two competing risks. The object will be destroyed if it receives at least one attack from the two risks. The attack from risk j ( j = 1, 2) occurs at time X j . When the object fails, we observe two quantities (T, δ), where T represents the lifetime of the object, and δ is an indicator that describes the cause of failure. That is, 0 if the cause of failure is unknown, 1 if the risk 1 causes the failure, X 1 < X 2 , 2 if the risk 2 causes the failure, X 2 < X 1 . (Crowder, 2001) includes an extensive literature dealing with the analysis of competing risks data based on the specific continuous parametric distribution assumption on X 1 and X 2 , assuming X 1 and X 2 to be independent random variables. In many applications, the independence assumption will not hold. Therefore, competing risks models with dependent causes of failure are needed in reliability/survival analysis.
Recently, to analyze the dependent competing risks data, some work adopted the assumption that (X 1 , X 2 ) has a specific bivariate distribution, see for example (Feizjavadian & Hashemi, 2015), (Shen & Xu, 2018), (Samanta & Kundu, 2021) and . In this paper, we assume that that the joint distribution of (X 1 , X 2 ) is the BMWE distribution. This assumption produces a dependent competing risks model with risks that follow modified Weibull distributions. Under this assumption, the risks will have either increasing, decreasing or bathtub shaped hazard rate functions. It follows that the BMWE distribution can be applied to the analyis of a variety different types of lifetime data of dependent competing risks type.

Maximum Likelihood Estimation
In this section, we estimate the model parameters θ = (λ 1 , λ 2 , λ 3 , α, γ) using two different types of data. Type I Data consists of simple random samples of the bivariate vector (X 1 , X 2 ) that follows the new bivariate distribution. Type II Data consists of dependent competing risks data (T, δ), with T = min{X 1 , X 2 }, where (X 1 , X 2 ) follows a bivariate lifetime distribution and δ represents the cause of failure. The maximum likelihood method is used to estimate model parameters.
Type I Data (Bivariate data): Let us assume that (X 11 , X 21 ) , (X 12 , X 22 ) , · · · , (X 1n , X 2n ) is an independent and identical random sample of (X 1 , X 2 ) that follows the BMWE(λ 1 , λ 2 , λ 3 , α, γ). For simplicity, let us introduce the indicator variables δ i , i = 1, 2, · · · , n, where: Using the above sample, the likelihood function can be expressed as Substituting from (5) into (6) , we get the log-likelihood function as where n j = n i=1 I(δ i = j), j = 0, 1, 2, with I(A) = 1 if A is true and 0 otherwise. In order to get the MLE of the model parameters, we need to maximize the log-likelihood function of the parameters, for the given data, with respect to those unknown parameters. Equivalently, we need to solve the likelihood equations derived by setting the first partial derivatives of the log-likelihood function with respect to the unknown parameters to zero. The MLE's are the solution of obtained likelihood equations, at which the Fisher information matrix should be positive definite. The Fisher information matrix consists of the second partial derivatives of the log-likelihood function with respect to the parameters. For the distribution in question, the likelihood equations cannot be solved analytically, and we have used numerical routines in R software to solve the system of five non-linear equations in five unknowns.
Type II Data (Dependent Competing Risks Data): We observe a pair of quantities: T , the system time to failure, and δ an indicator of the cause of failure. This means that the observations in this case are (T i , δ i ), i = 1, 2, · · · , n. The lifetime experiment that produces this type of data can be illustrated as follows: (1) we put n independent and identical devices (system/objects) on the life test, (2) each system is under the attack from two dependent competing risks which occur at times X 1 and X 2 , the system will fail once it receives one of the two attacks, (3) (X 1 , X 2 ) follows the BMWE distribution; (4) we observe (T i , δ i ), i = 1, 2, · · · , n, where T i = min(X 1i , X 2i ), and δ i is defined as δ i = 1, if risk 1 causes the failure (X 1i < X 2i ); δ i = 2, if risk 2 causes the failure (X 2i < X 1i ) and δ i = 0, if the cause of failure is unknown (X 1i = X 2i ). The likelihood function using such dependent competing risks data is where Substituting (9) and (10) into (8), we can get logarithm of the likelihood function L 2 (θ), say L 2 (θ), as L 2 (θ) = n 0 ln λ 3 + n 1 ln λ 1 + n 2 ln λ 2 + n ln γ + (γ − 1) To get the MLE of the model parameters, using dependent competing risks data, we set the first partial derivatives of L 2 with respect to the five parameters λ 1 , λ 2 , λ 3 , α, γ equal to zero, we get a system of five non-linear equations in five unknowns. Solving the first three equations, in λ j , j = 1, 2, 3, we get λ j as a function of (α, γ), as Substituting (12) into (11), we can express the log-likelihood function as a function of two parameters (α, γ), say L * 2 (α, γ). This will make the optimization of the log-likelihood function much easier, since we will deal only with two parameters instead of five. Once we get the MLE of α and γ, we can use (12) to get the MLE of λ j , j = 1, 2, 3.
Since the MLE of the parameters using either type of data are not obtained in closed form, we can not get the explicit sampling distributions of the MLEs of these parameters. Therefore, we cannot obtain exact confidence intervals for the model parameters. Alternatively, we can use the large sample distribution of the MLE and the corresponding observed Fisher information matrix to derive approximate confidence intervals for the model parameters.

Data Generation and Simulation Results
In this section, we present how to generate random samples from the proposed BMWE model. We then show how to use the bivariate sample to generate dependent competing risks data from the underlying distribution.
Given a random sample of the bivariate variables (X 1 , X 2 ), say (x 11 , x 21 ), (x 12 , x 22 ), · · · , (x 1n , x 2n ) from BMWE distribution, we can generate a dependent competing risks data set, "Type II Data", with two risks, by applying the following algorithm: Algorithm 2: 1. Compute the time to failure of the i th object as t i = min{x 1i , x 2i }, i = 1, 2, · · · , n, 2. Determine the cause of the i th failure, δ i , as 1 if x i1 < x i2 (this indicates that risk 1 causes the failure) 2 if x i1 > x i2 (this indicates that risk 2 causes the failure) 0 if x i1 = x i2 (this indicates that the cause of failure is unknown.) We used algorithms 1 and 2, to generate a random sample of 100 dependent competing risk observations from the BMWE distribution when λ 1 = 0.063, λ 2 = 0.072, λ 3 = 0.023, α = 2160000, γ = 1.556, as shown in Table 1. The reason for using these values of the model parameters is that we get similar values when we analyze a real data set, as will be discussed later in Section 6.
Using the generated sample, we computed the MLE of all model parameters, the corresponding standard error (SE), the 95% CI (lower and upper limits) and the percentage error (PE) as given in Table 2. Based on these results, we conclude that the random number generation algorithms and the estimation process used in this paper work very well.

Data Analysis
This section illustrates how the proposed BMWE distribution can be applied to real data sets. We analyze two real data sets. The first is a bivariate data set and the second is a dependent competing risks data set.

Bivariate Real Data Set
In this subsection we analyze the Union of European Football Associations (UEFA) data set. This data was introduced in (Meintanis, 2007). It consists of bivariate observations (X 1i , X 2i ), i = 1, 2 · · · , 37. Here X 1 measures the time in minutes of the first kick goal scored by any team, while X 2 is the time in minutes of the first goal of any type scored by the home team. For more information about this data, we refer to (Meintanis, 2007) . This data was analyzed by different researchers using different bivariate distributions. For example, (Meintanis, 2007) used the Marshall-Olkin bivariate exponential (MOBE), (Kundu & Gupta, 2009) used the bivariate generalized exponential (BVGE), (Sarhan et al., 2011) used the bivariate generalized linear failure rate (BGLFR) distribution, and recently  used the bivariate Chen distribution (BCD). In this paper, we use the BMWE distribution to fit this data set.
To compare the performance of different non-nested distributions (BVGE, BGLFR, BCD, and BMWE) to fit the underlying data set, we use Akaike Information Criterion (Akaike, 1969), AIC = −2L + 2k, where k is the number of the model parameters. Table 5 shows the MLE, the value of the log-likelihood function and AIC for all the models stated above. The results for BVGE, BGLFR and BCD are taken from  . (Sarhan et al., 2011) showed that BVGE should be rejected versus the BGLFR distribution at any significance level greater than or equal to 0.05.  showed that BCD fits the data better than BGLFR model. Since BCD is a special case from BMWE, we can use the likelihood ratio test statistic Λ that follows a chi square distribution with one degree of freedom χ 1 , to test the following hypothesis: H 0 : the data follow BCD, (α = 1) vs H 1 : the data follow BMWE, (α 1).
Using the results in Table 5, we get Λ = −2 L H 0 −L H 1 = 4.08, and the corresponding p-value is P(χ 1 ≥ 4.08) = 0.04496. Therefore, BCD is rejected to fit this data set, in favor of BMWE distribution at at any significance level of 0.04496 or larger.

Diabetic Retinopathy Data Analysis
Diabetic retinopathy is a major eye condition that can cause vision loss and blindness in diabetic people. It affects blood vessels in the retina (the light-sensitive layer of tissue in the back of the eye). The National Eye Institute in Bethesda, Maryland conducted an experiment on 71 patients, to study the effect of laser treatment in reducing the risk of blindness. For each patient, the laser treatment was given to a randomly selected eye. The time to blindness and the indicator whether treated or untreated or both eyes became blind were recorded. The main aim of the experiment is to test whether the laser treatment has any effect in delaying the onset of blindness in the diabetic retinopathy patients. The data set provided from this experiment can be treated as a dependent competing risks data with two causes of failure (blindness). The data are presented in Table 6, and the BMWE distribution may be a reasonable model to analyze this data set.  used BCD to analyze this data and reported, based on the Kolmogorov-Smirnov test statistic, that the BCD is a good fit for this data. In this paper, we use the BMWE distribution to fit this data set and compare it with the BCD. Table 7 shows the MLE, the corresponding log-likelihood function and AIC for BCD and BMWE distribution. Based on the AIC, we can conclude that BMWE distribution fits this data set better than BCD. Moreover, performing the Likelihood ratio test statistic, the value of the test statistic is 6.54 and the corresponding p-value is 0.0105. Therefore, the BCD is rejected in favor of the BMWE distribution at a level of significance larger than or equal to 0.0105. Table 7. MLE, log-likelihood values, and AIC for BCD and BMWE distributions, using the Diabetic Retinopathy Data Model MLEL AIC BCDα 1 = 0.00123,α 2 = 0.00145,α 3 = 0.00044,γ = 0.27221 −581.26 1170.52 BMWEλ 1 = 0.063,λ 2 = 0.072,λ 3 = 0.023,α = 2.16 × 10 6 , −577.99 1165.98 γ = 1.556 Finally, we can investigate if the laser treatment has a significant effect on delaying the blindness of the Diabetic Retinopathy patients. To do so, we can test the following hypothesis H 0 : λ 1 = λ 2 (laser is not effective) vs H 1 : λ 1 λ 2 (laser is effective) To use the likelihood ratio test, we need the the log-likelihood function, under H 0 , which can be obtained from (11), by assuming that λ 1 = λ 2 = λ, as L 02 (θ) = n 0 ln λ 3 + (n 1 + n 2 ) ln λ + n ln γ + (γ − 1) Under the null hypothesis, the MLE of the unknown parameters areλ = 0.0196,λ 3 = 0.0103,α = 227279.9,γ = 1.5552, the corresponding log-likelihood value is −579.43. Thus, the value of the test statistic is Λ = −2 (577.99 − 579.43) = 2.88, and the associated p-value is 0.09. Therefore, we cannot reject H 0 , in favor of H 1 , at any significance level less than or equal to 9%, which indicates that the laser treatment does not have a significant effect in delaying blindness.

Conclusion
In this paper, we used the fatel shock model concept to propose a new bivariate distribution of Marshall-Olkin type, using three independent univariate random variables that follow the modified Weibull extension distribution. This distribution is referred to as BMWE distribution. The BMWE distribution generalizes the bivariate Chen distribution that was recently proposed by . We illustrated how to use the BMWE distribution to fit dependent competing risks data as well as bivariate data. We estimated the unknown parameters included in the BMWE distribution using the maximum likelihood method based on two different types of data sets: a bivariate data set and a dependent competing risks data set. We performed a simulation study to investigate the consistency of the maximum likelihood method. Finally, we used the new proposed distribution to fit two real data sets (bivariate data and dependent competing risks data) and compared it with the bivariate Chen distribution and some other related distributions and reported that the BMWE fits those data sets better than the other distributions.
As a future plan, we can use either step-stress acceleration life test plan and/or progressively type-II censored data to estimate the parameters of BMWE in the presence of either bivariate data or dependent competing risks data.