A Marshall-Olkin Power Log-normal Distribution and Its Applications to Survival Data

In this paper, using the Marshall-Olkin transformation, a new class of Extended Power Log-normal distributions, which includes the Power Log-normal and Log-normal distributions as special cases, is introduced. Its characterization and statistical properties are studied. A real survival dataset is analyzed and the results show that the proposed model is flexible and appropriate.


Introduction
A Log-normal distribution is a well-known continuous probability distribution of a random variable whose logarithm is normally distributed. In survival analysis, the Log-normal distribution is used extensively in applications; see, for example, Gupta et al. (1997), Royston (2001), Rutqvist (1985) and Johnson et al. (1996). The density and cumulative distribution functions of a Log-normal random variable, denoted by X ∼ LN(μ, σ), are given by

$$f(x) = \frac{1}{\sigma x}\,\phi\!\left(\frac{\ln x - \mu}{\sigma}\right), \qquad F(x) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right),$$

for −∞ < μ < ∞, σ > 0, x > 0, where φ and Φ are the density and cumulative distribution functions of the standard normal distribution. Nelson and Doganaksoy (1992) extended the Log-normal distribution and introduced the Power Log-normal distribution, whose density and cumulative distribution functions are given by

$$f(x) = \frac{p}{\sigma x}\,\phi\!\left(\frac{\ln x - \mu}{\sigma}\right)\left[\Phi\!\left(-\frac{\ln x - \mu}{\sigma}\right)\right]^{p-1}, \qquad F(x) = 1 - \left[\Phi\!\left(-\frac{\ln x - \mu}{\sigma}\right)\right]^{p},$$

for −∞ < μ < ∞, σ > 0, p > 0, x > 0. We denote it as X ∼ PLN(μ, σ, p).
They fitted it to life or strength data from specimens of various sizes. They showed that such a model arises when any specimen can be regarded as a series system of smaller portions, where portions of a certain size have a normal life (or strength) distribution. The statistical analysis can also be found in Nelson and Doganaksoy (1995). Szyszkowicz and Yanikomeroglu (2009) and Liu et al. (2008) proposed the use of Power Log-normal distributions to approximate Log-normal sum distributions.
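The Power Log-normal functions and the series-system reading can be checked numerically. The following is a minimal sketch that assumes the standard Power Log-normal form, with survival function [Φ(−z)]^p and z = (ln x − μ)/σ; the function names are illustrative and not taken from the cited papers.

```python
import math

def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ln_cdf(x, mu, sigma):
    """Log-normal distribution function."""
    return std_normal_cdf((math.log(x) - mu) / sigma)

def pln_pdf(x, mu, sigma, p):
    """Power Log-normal density (assumed standard form)."""
    z = (math.log(x) - mu) / sigma
    return p / (sigma * x) * std_normal_pdf(z) * std_normal_cdf(-z) ** (p - 1.0)

def pln_cdf(x, mu, sigma, p):
    """Power Log-normal distribution function (assumed standard form)."""
    z = (math.log(x) - mu) / sigma
    return 1.0 - std_normal_cdf(-z) ** p

# For integer p, the minimum of p independent LN(mu, sigma) lifetimes
# (a series system) has the PLN(mu, sigma, p) distribution function.
x, mu, sigma, p = 2.0, 0.0, 1.0, 3
series_cdf = 1.0 - (1.0 - ln_cdf(x, mu, sigma)) ** p
print(pln_cdf(x, mu, sigma, p), series_cdf)
```

With p = 1 the two functions reduce to the Log-normal case, which gives a quick sanity check on any implementation.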
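On the other hand, Marshall and Olkin (1997) proposed a new family of survival functions by adding a new parameter α > 0 to an existing distribution. The new parameter adds flexibility to the distribution. Let F̄(x) = 1 − F(x) be the survival function of a random variable X. Then

$$\bar{G}(x) = \frac{\alpha\,\bar{F}(x)}{1 - (1-\alpha)\,\bar{F}(x)}, \qquad \alpha > 0, \tag{3}$$

is a proper survival function. The density function corresponding to (3) is given by

$$g(x) = \frac{\alpha\, f(x)}{\left[1 - (1-\alpha)\,\bar{F}(x)\right]^{2}},$$

and the hazard rate function is given by

$$h_G(x) = \frac{h_F(x)}{1 - (1-\alpha)\,\bar{F}(x)},$$

where f(x) is the density and h_F(x) is the hazard rate function of the original model with distribution F.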
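As a small illustration of the transformation, the sketch below wraps an arbitrary baseline survival function, assuming the usual Marshall-Olkin form Ḡ(x) = αF̄(x) / [1 − (1 − α)F̄(x)]. The helper names and the unit-exponential baseline are chosen only for the example.

```python
import math

def marshall_olkin_survival(base_survival, alpha):
    """Return the Marshall-Olkin survival function built on base_survival."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def survival(x):
        s = base_survival(x)
        return alpha * s / (1.0 - (1.0 - alpha) * s)

    return survival

def marshall_olkin_hazard(base_survival, base_hazard, alpha):
    """Return the hazard h_F(x) / [1 - (1 - alpha) * F_bar(x)]."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def hazard(x):
        return base_hazard(x) / (1.0 - (1.0 - alpha) * base_survival(x))

    return hazard

# Illustrative baseline: unit exponential, survival exp(-x), hazard 1.
def exp_survival(x):
    return math.exp(-x)

def exp_hazard(x):
    return 1.0

g_bar = marshall_olkin_survival(exp_survival, alpha=2.0)
h_g = marshall_olkin_hazard(exp_survival, exp_hazard, alpha=2.0)
print(g_bar(1.0), h_g(1.0))
```

Setting alpha = 1 returns the baseline unchanged, which is why the original distribution is always a member of the extended family.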
Using the Marshall-Olkin transformation (3), several researchers have studied various distribution extensions. Marshall and Olkin (1997) generalized the exponential and Weibull distributions. Alice and Jose (2003) introduced the Marshall-Olkin extended semi-Pareto model for Pareto type III and established its geometric extreme stability. The semi-Weibull and generalized Weibull distributions are discussed by Alice and Jose (2005). Ghitany et al. (2005) studied the Marshall-Olkin Weibull distribution, which can be obtained as a compound distribution with exponential mixing, and applied it to model censored data. The Marshall-Olkin Extended Lomax distribution was introduced by Ghitany et al. (2007). Jose et al. (2010) investigated the Marshall-Olkin q-Weibull distribution and its max-min processes. García et al. (2011) generalized the standard Log-normal distribution.
In this paper, we use the Marshall-Olkin transformation to define a new model, called the Marshall-Olkin Power Log-normal (MPLN) distribution, which generalizes the Power Log-normal and Log-normal models. We aim to reveal some statistical properties of the proposed model and apply it to survival analysis.
The rest of this article is organized as follows. In Section 2, we introduce the newly defined distribution and investigate its basic properties, including the shape of its density and hazard rate functions, stochastic orderings and representations, moments, and measures based on the moments. Section 3 discusses the estimation of parameters by the method of maximum likelihood. An application of the MPLN model to real survival data is illustrated in Section 4. Our work is concluded in Section 5.

Density and Hazard Function
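Let X follow the Power Log-normal distribution PLN(μ, σ, p). Then its survival function is given by

$$\bar{F}(x) = \left[\Phi\!\left(-\frac{\ln x - \mu}{\sigma}\right)\right]^{p}.$$

Substituting it in (3), we obtain a Marshall-Olkin Power Log-normal distribution, denoted by MPLN(μ, σ, p, α), with the following survival function:

$$\bar{G}(x) = \frac{\alpha\left[\Phi(-z)\right]^{p}}{1 - (1-\alpha)\left[\Phi(-z)\right]^{p}}, \qquad z = \frac{\ln x - \mu}{\sigma}, \quad x > 0.$$

The corresponding density function is given by

$$g(x) = \frac{\alpha\, p\, \phi(z)\left[\Phi(-z)\right]^{p-1}}{\sigma x\left\{1 - (1-\alpha)\left[\Phi(-z)\right]^{p}\right\}^{2}}.$$

If α = 1, we obtain the Power Log-normal distribution with parameters μ, σ, p. Furthermore, if p = 1, it reduces to the Log-normal distribution. The MPLN distribution therefore contains the Power Log-normal and Log-normal distributions as particular cases. The hazard rate function of the MPLN(μ, σ, p, α) distribution is given by

$$h(x) = \frac{p\,\phi(z)}{\sigma x\,\Phi(-z)\left\{1 - (1-\alpha)\left[\Phi(-z)\right]^{p}\right\}}.$$

Figure 1(b) shows some shapes of the MPLN(μ, σ, p, α) hazard function with various parameters.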
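The survival, density and hazard functions can be evaluated directly. The sketch below assumes that the MPLN survival function is the Marshall-Olkin transform of the Power Log-normal survival function [Φ(−z)]^p with z = (ln x − μ)/σ; the function names are illustrative.

```python
import math

def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mpln_survival(x, mu, sigma, p, alpha):
    """Survival function: alpha*u / (1 - (1 - alpha)*u), u = Phi(-z)**p."""
    z = (math.log(x) - mu) / sigma
    u = std_normal_cdf(-z) ** p
    return alpha * u / (1.0 - (1.0 - alpha) * u)

def mpln_pdf(x, mu, sigma, p, alpha):
    """Density obtained by differentiating the survival function."""
    z = (math.log(x) - mu) / sigma
    tail = std_normal_cdf(-z)
    denom = 1.0 - (1.0 - alpha) * tail ** p
    return alpha * p * std_normal_pdf(z) * tail ** (p - 1.0) / (sigma * x * denom ** 2)

def mpln_hazard(x, mu, sigma, p, alpha):
    """Hazard rate: density divided by survival."""
    z = (math.log(x) - mu) / sigma
    tail = std_normal_cdf(-z)
    denom = 1.0 - (1.0 - alpha) * tail ** p
    return p * std_normal_pdf(z) / (sigma * x * tail * denom)

# alpha = 1 and p = 1 should give back the Log-normal density.
x = 1.7
lognormal_pdf = std_normal_pdf(math.log(x)) / x
print(mpln_pdf(x, 0.0, 1.0, 1.0, 1.0), lognormal_pdf)
```

Evaluating these functions over a grid of x values for several (p, α) pairs is one way to explore the density and hazard shapes numerically.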

Stochastic Orderings
In statistics, a stochastic order formalizes the concept of one random variable being "larger" than another. It is an important tool for judging comparative behavior. Here are some basic definitions.
A random variable X is less than Y in the usual stochastic order (denoted by X ≤_st Y) if F_X(x) ≥ F_Y(x) for all x. … Ramesh and Kirmani (1987).
Proof. The density ratio is given by … . Taking the derivative with respect to x shows that the density ratio is a decreasing function of x. The results follow.

Stochastic Representation
Let Ḡ0(x|λ), −∞ < x < ∞, −∞ < λ < ∞, be the conditional survival function of a continuous random variable X given a continuous random variable Λ = λ. Let Λ be a random variable with probability density function m(λ). Then the distribution with survival function

$$\bar{G}(x) = \int_{-\infty}^{\infty} \bar{G}_0(x \mid \lambda)\, m(\lambda)\, d\lambda$$

is called a compound distribution with mixing density m(λ). Compounding provides a useful way to obtain new classes of distributions in terms of existing ones. The following result shows that the MPLN(μ, σ, p, α) distribution can be expressed as a compound distribution.
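Proposition 2 Suppose that the conditional survival function of a continuous random variable X given Λ = λ is given by

$$\bar{G}_0(x \mid \lambda) = \exp\!\left\{-\lambda\left(\left[\Phi\!\left(-\frac{\ln x - \mu}{\sigma}\right)\right]^{-p} - 1\right)\right\}, \qquad x > 0,\ \lambda > 0.$$

Let Λ have an exponential distribution with density function

$$m(\lambda) = \alpha e^{-\alpha\lambda}, \qquad \lambda > 0,\ \alpha > 0.$$

Then the random variable X has the MPLN(μ, σ, p, α) distribution.
Proof. For x > 0, write z = (ln x − μ)/σ. The survival function of X is given by

$$\bar{G}(x) = \int_0^{\infty} \bar{G}_0(x \mid \lambda)\, m(\lambda)\, d\lambda = \alpha\int_0^{\infty} \exp\!\left\{-\lambda\left(\left[\Phi(-z)\right]^{-p} - 1 + \alpha\right)\right\} d\lambda = \frac{\alpha\left[\Phi(-z)\right]^{p}}{1 - (1-\alpha)\left[\Phi(-z)\right]^{p}},$$

which is the survival function of the MPLN(μ, σ, p, α) distribution.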
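The compounding identity can be checked numerically. The sketch below assumes a conditional survival function of the form exp{−λ([Φ(−z)]^(−p) − 1)} and an exponential mixing density α·exp(−αλ). This is the usual Marshall-Olkin construction and is treated here as an assumption. The mixture integral is evaluated with a composite Simpson rule and compared with the closed-form MPLN survival function.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mpln_survival(x, mu, sigma, p, alpha):
    """Closed-form MPLN survival function (assumed Marshall-Olkin form)."""
    u = std_normal_cdf(-(math.log(x) - mu) / sigma) ** p
    return alpha * u / (1.0 - (1.0 - alpha) * u)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def mixture_survival(x, mu, sigma, p, alpha):
    """Integrate the conditional survival against the exponential mixing density."""
    u = std_normal_cdf(-(math.log(x) - mu) / sigma) ** p
    c = 1.0 / u - 1.0  # exponent coefficient in the conditional survival

    def integrand(lam):
        return math.exp(-lam * c) * alpha * math.exp(-alpha * lam)

    # The integrand decays like exp(-(c + alpha) * lam); truncate far in the tail.
    upper = 50.0 / (c + alpha)
    return simpson(integrand, 0.0, upper)

print(mixture_survival(1.5, 0.0, 1.0, 2.0, 0.5), mpln_survival(1.5, 0.0, 1.0, 2.0, 0.5))
```

The two printed values should agree up to the quadrature and truncation error of the Simpson rule.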
For λ > 0, Ḡ0(x|λ) defines a class of non-standard distributions. Compounding a distribution belonging to this class with an exponential distribution for λ leads to a certain MPLN(μ, σ, p, α) distribution. Next we will present another stochastic representation of the MPLN(μ, σ, p, α) distribution.
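Proof. By definition of the moment,

$$E(X^{r}) = \int_0^{\infty} x^{r} g(x)\, dx,$$

where g is the MPLN(μ, σ, p, α) density. The above expression seems to have no compact form, so it has to be evaluated numerically. The standardized skewness and kurtosis coefficients are

$$\sqrt{\beta_1} = \frac{\mu_3 - 3\mu_1\mu_2 + 2\mu_1^{3}}{\left(\mu_2 - \mu_1^{2}\right)^{3/2}}, \qquad \beta_2 = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_1^{2}\mu_2 - 3\mu_1^{4}}{\left(\mu_2 - \mu_1^{2}\right)^{2}},$$

where μ_1, μ_2, μ_3, μ_4 are the moments given in (8). Figure 2 shows the skewness and kurtosis coefficients for the Marshall-Olkin Power Log-normal MPLN(μ = 0, σ = 1, p, α) model. The qth quantile x_q = G^{−1}(q) of the MPLN(μ, σ, p, α) distribution is given by

$$x_q = \exp\!\left\{\mu - \sigma\,\Phi^{-1}\!\left(\left[\frac{1-q}{1-(1-\alpha)q}\right]^{1/p}\right)\right\}, \qquad 0 < q < 1,$$

where G^{−1}(·) is the inverse of the distribution function and Φ^{−1}(·) is the standard normal quantile function. In particular, the median of the MPLN(μ, σ, p, α) distribution is given by

$$\text{median} = \exp\!\left\{\mu - \sigma\,\Phi^{-1}\!\left((1+\alpha)^{-1/p}\right)\right\}.$$

Figure 3 displays the measures of central tendency (mean, median) of the MPLN(μ = 0, σ = 1, p, α) distribution.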
From the figure, it is found that, as expected, the mean is larger than the median, which is consistent with a distribution that has a long right tail.
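Since the moments have no compact form, they can be approximated by quadrature. The sketch below assumes the MPLN survival function α[Φ(−z)]^p / {1 − (1 − α)[Φ(−z)]^p}, inverts it for the quantile function, and integrates over t = (ln x − μ)/σ with a Simpson rule. The function names and the integration limits of ±12 are illustrative choices.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the inverse cdf

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mpln_cdf(x, mu, sigma, p, alpha):
    """Distribution function G(x) = 1 - alpha*u / (1 - (1 - alpha)*u)."""
    u = std_normal_cdf(-(math.log(x) - mu) / sigma) ** p
    return 1.0 - alpha * u / (1.0 - (1.0 - alpha) * u)

def mpln_quantile(q, mu, sigma, p, alpha):
    """Solve G(x) = q for x, with 0 < q < 1."""
    u = (1.0 - q) / (1.0 - (1.0 - alpha) * q)
    return math.exp(mu - sigma * _STD.inv_cdf(u ** (1.0 / p)))

def mpln_raw_moment(r, mu, sigma, p, alpha, lo=-12.0, hi=12.0, n=4000):
    """E(X^r) by Simpson's rule in t = (ln x - mu)/sigma; n must be even.

    The limits may need widening when r * sigma is large.
    """
    def integrand(t):
        tail = std_normal_cdf(-t)
        dens = (alpha * p * std_normal_pdf(t) * tail ** (p - 1.0)
                / (1.0 - (1.0 - alpha) * tail ** p) ** 2)
        return math.exp(r * (mu + sigma * t)) * dens

    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0

def mpln_skewness(mu, sigma, p, alpha):
    """Standardized skewness from the first three raw moments."""
    m1, m2, m3 = (mpln_raw_moment(r, mu, sigma, p, alpha) for r in (1, 2, 3))
    return (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / (m2 - m1 ** 2) ** 1.5

mean = mpln_raw_moment(1, 0.0, 1.0, 2.0, 0.5)
median = mpln_quantile(0.5, 0.0, 1.0, 2.0, 0.5)
print(mean, median, mpln_skewness(0.0, 1.0, 2.0, 0.5))
```

For α = 1 and p = 1 the results can be compared with the known Log-normal mean exp(μ + σ²/2) and median exp(μ).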

Maximum Likelihood Estimation
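In this section, we consider maximum likelihood estimation of the parameters (μ, σ, p, α) of the Marshall-Olkin Power Log-normal model. Suppose X_1, X_2, ..., X_n is a random sample of size n from the Marshall-Olkin Power Log-normal distribution MPLN(μ, σ, p, α), with observed values x_1, ..., x_n, and write z_i = (ln x_i − μ)/σ. Then the likelihood function is given by

$$L(\mu, \sigma, p, \alpha) = \prod_{i=1}^{n} \frac{\alpha\, p\, \phi(z_i)\left[\Phi(-z_i)\right]^{p-1}}{\sigma x_i\left\{1 - (1-\alpha)\left[\Phi(-z_i)\right]^{p}\right\}^{2}},$$

and the log-likelihood function is given by

$$l(\mu, \sigma, p, \alpha) = n\ln\alpha + n\ln p - n\ln\sigma - \sum_{i=1}^{n}\ln x_i + \sum_{i=1}^{n}\ln\phi(z_i) + (p-1)\sum_{i=1}^{n}\ln\Phi(-z_i) - 2\sum_{i=1}^{n}\ln\left\{1 - (1-\alpha)\left[\Phi(-z_i)\right]^{p}\right\}. \tag{11}$$

The maximum likelihood estimates of the parameters maximize the likelihood function. Taking the partial derivatives of the log-likelihood function with respect to μ, σ, p and α, and setting the resulting expressions equal to zero, yields the likelihood equations.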
However, the equations do not lead to explicit analytical solutions for the parameters. Thus, the estimates must be obtained by numerical procedures such as the Newton-Raphson method. The R environment provides the nonlinear optimization function optim for solving such problems.
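The objective passed to a numerical optimizer is the log-likelihood itself. The following is a minimal sketch of that function, assuming the MPLN density αpφ(z)[Φ(−z)]^(p−1) / (σx{1 − (1 − α)[Φ(−z)]^p}²). The sample values are invented for illustration and are not the bus motor data.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mpln_loglik(data, mu, sigma, p, alpha):
    """Log-likelihood of an MPLN(mu, sigma, p, alpha) sample."""
    if sigma <= 0 or p <= 0 or alpha <= 0:
        return float("-inf")  # outside the parameter space
    total = 0.0
    for x in data:
        z = (math.log(x) - mu) / sigma
        tail = std_normal_cdf(-z)
        total += (
            math.log(alpha) + math.log(p) - math.log(sigma) - math.log(x)
            - 0.5 * z * z - 0.5 * math.log(2.0 * math.pi)   # ln phi(z)
            + (p - 1.0) * math.log(tail)
            - 2.0 * math.log(1.0 - (1.0 - alpha) * tail ** p)
        )
    return total

# Hypothetical positive observations, for illustration only.
sample = [12.0, 35.5, 48.0, 71.2, 90.3]
print(mpln_loglik(sample, mu=4.0, sigma=0.8, p=1.5, alpha=2.0))
```

Returning minus infinity outside the parameter space is one simple way to keep an unconstrained optimizer inside the admissible region. Reparametrizing with logarithms of σ, p and α is a common alternative.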
It is known that, under some regularity conditions, as the sample size increases, the distribution of the MLE tends to a multivariate normal distribution with mean θ = (μ, σ, p, α)^T and covariance matrix equal to the inverse of the Fisher information matrix, I^{−1}(θ); see Cox and Hinkley (1979). The score vector and Hessian matrix are given in the Appendix. The multivariate normal distribution can be used to construct approximate confidence intervals for the parameters.
The likelihood ratio test can be used to test whether the fit using the MPLN model is statistically better than a fit using the PLN model. That is, we can test the null hypothesis H_0: α = 1 against H_1: α ≠ 1. When H_0 is true, the likelihood ratio statistic d = 2[l(μ̂, σ̂, p̂, α̂) − l(μ̃, σ̃, p̃, 1)], where hats denote the unrestricted maximum likelihood estimates and tildes denote the estimates under H_0, has approximately a chi-square distribution with 1 degree of freedom; see Neyman and Pearson (1928) and Wilks (1938).
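As a small illustration, the statistic and its approximate p-value can be computed from the two maximized log-likelihoods. The sketch uses the identity P(χ²₁ > d) = erfc(√(d/2)) for one degree of freedom; the log-likelihood values in the example are invented.

```python
import math

def likelihood_ratio_test(loglik_full, loglik_null):
    """Return (d, p_value) for a one-parameter restriction (chi-square, 1 df)."""
    d = 2.0 * (loglik_full - loglik_null)
    d = max(d, 0.0)  # guard against tiny negative values from optimizer error
    p_value = math.erfc(math.sqrt(d / 2.0))
    return d, p_value

# Hypothetical maximized log-likelihoods for the MPLN and PLN fits.
d, p_value = likelihood_ratio_test(loglik_full=-100.0, loglik_null=-102.0)
print(d, p_value)
```

A small p-value indicates that fixing α = 1 significantly worsens the fit, that is, the extra parameter is supported by the data.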

Application
In this section, we consider a real data set to illustrate the proposed model. The data, taken from Davis (1952), are the numbers of miles to first and succeeding major motor failures of 191 buses operated by a large city bus company. The data are shown in Table 1. We fit the data set with the Log-normal (LN), the Power Log-normal (PLN) and the Marshall-Olkin Power Log-normal (MPLN) distributions, respectively, using the maximum likelihood method. The results are reported in Table 2. The Akaike information criterion (AIC), introduced by Akaike (1973), and the Bayesian information criterion (BIC), proposed by Schwarz (1978), are also computed as measures of goodness of fit: AIC = 2k − 2 ln(L) and BIC = k ln(n) − 2 ln(L), where k is the number of parameters in the distribution, n is the sample size and L is the maximized value of the likelihood function.
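Both criteria are simple functions of the maximized log-likelihood. A minimal sketch follows; the log-likelihood values are placeholders rather than the fitted values from Table 2, and only the sample size n = 191 comes from the data description.

```python
import math

def aic(k, loglik):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * k - 2.0 * loglik

def bic(k, n, loglik):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2.0 * loglik

# Placeholder maximized log-likelihoods; k = 2, 3, 4 for LN, PLN, MPLN.
fits = {"LN": (2, -105.0), "PLN": (3, -103.0), "MPLN": (4, -100.0)}
for name, (k, loglik) in fits.items():
    print(name, aic(k, loglik), bic(k, 191, loglik))
```

Smaller values indicate a better trade-off between fit and complexity. BIC penalizes the additional parameters of the PLN and MPLN models more heavily than AIC whenever ln(n) > 2.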
The results show that the MPLN model fits best. Figure 4 displays the histogram and the fitted models using the maximum likelihood estimates.

Conclusion
In this paper, the Marshall-Olkin Power Log-normal distribution is introduced as a generalization of the Power Log-normal and Log-normal distributions. Its detailed characterization and statistical properties, such as stochastic orderings, stochastic representations, the moments and measures based on the moments, are presented. The parameters are estimated by the method of maximum likelihood, and the Hessian matrix is derived. A real survival dataset is analyzed and the results show that the proposed model is flexible and appropriate.

Appendix: Score Vector and Hessian Matrix
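Suppose x_1, x_2, ..., x_n is a random sample from the MPLN(μ, σ, p, α) distribution; then the log-likelihood function l is given by (11). Write z_i = (ln x_i − μ)/σ, φ_i = φ(z_i), Φ_i = Φ(−z_i) and D_i = 1 − (1 − α)Φ_i^p. The elements of the score vector are obtained by differentiation:

$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma}\sum_{i=1}^{n} z_i + \frac{p-1}{\sigma}\sum_{i=1}^{n}\frac{\phi_i}{\Phi_i} + \frac{2(1-\alpha)p}{\sigma}\sum_{i=1}^{n}\frac{\Phi_i^{p-1}\phi_i}{D_i},$$

$$\frac{\partial l}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma}\sum_{i=1}^{n} z_i^{2} + \frac{p-1}{\sigma}\sum_{i=1}^{n}\frac{z_i\phi_i}{\Phi_i} + \frac{2(1-\alpha)p}{\sigma}\sum_{i=1}^{n}\frac{z_i\Phi_i^{p-1}\phi_i}{D_i},$$

$$\frac{\partial l}{\partial p} = \frac{n}{p} + \sum_{i=1}^{n}\ln\Phi_i + 2(1-\alpha)\sum_{i=1}^{n}\frac{\Phi_i^{p}\ln\Phi_i}{D_i},$$

$$\frac{\partial l}{\partial \alpha} = \frac{n}{\alpha} - 2\sum_{i=1}^{n}\frac{\Phi_i^{p}}{D_i}.$$

The Hessian matrix H(θ), the matrix of second partial derivatives of the log-likelihood, is obtained by differentiating these expressions once more. The Fisher information matrix is I(θ) = −E(H(θ)).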

Figure 1. Plots of the Marshall-Olkin Power Log-normal density and hazard functions for some parameter values

Table 2. Maximum likelihood parameter estimates (with standard deviation) of the LN, PLN and MPLN models for the initial bus motor failure data