A New Family of Distributions : Libby-Novick Beta

We define a family of distributions, named the Libby-Novick beta family of distributions, which includes the classical beta generalized and exponentiated generators. The new family offers much more flexibility for modeling real data than these two generators and the Kumaraswamy family of distributions (Cordeiro & de Castro, 2011). The extended family provides reasonable parametric fits to real data in several areas because the additional shape parameters can control the skewness and kurtosis simultaneously, vary tail weights and provide more entropy for the generated distribution. For any given distribution, we can construct a wider distribution with three additional shape parameters which has much more flexibility than the original one. The family density function is a linear combination of exponentiated densities defined from the same baseline distribution. The proposed family also has tractable mathematical properties including moments, generating function, mean deviations and order statistics. The parameters are estimated by maximum likelihood and the observed information matrix is determined. The importance of the family is very well illustrated. By means of two real data sets we demonstrate that this family can give better fits than those ones using the McDonald, beta generalized and Kumaraswamy generalized classes of distributions.


Introduction
It is important to have extended forms of classic distributions in many applied areas such as lifetime modelling.Beta generalized (BG) distributions pioneered by Eugene, Lee, and Famoye (2002) are very versatile to analyze different types of data and these authors extended some special models based on this generator.Eugene et al. (2002) defined the beta normal (BN), Nadarajah and Kotz (2004) introduced the beta Gumbel (BGu) and Nadarajah and Kotz (2006) proposed the beta exponential (BE) distributions.More recently, Pescim, Demério, Cordeiro, Ortega, and Urbano (2010), Paranaía, Ortega, Cordeiro, and Pescim (2011) and Cordeiro and Lemonte (2011) studied important structural properties of the beta generalized half-normal (BGHN), beta Burr XII (BBXII) and beta Birnbaum-Saunders (BBS) distributions, respectively.However, the beta, Kumaraswamy and exponentiated generators do not provide flexibility to the extremes (right and left) of the probability density functions (pdf's) and for this reason, they are not suitable for analyzing real data with high levels of asymmetry.
For an arbitrary baseline distribution G(x; ξ) with parameter vector ξ, the exponentiated-G ("exp-G" for short) model or Lehmann type I class with power parameter d > 0, say exp-G(d), is defined by the cumulative distribution function (cdf) and pdf given by respectively.Hereafter, we consider the random variable Y ∼exp-G(d).For some special cases of Y taking for G the exponential, Weibull, inverse Gaussian and gamma distributions, see Nadarajah and Kotz (2006).A very known exponentiated model is the exponentiated Weibull (EW) (Mudholkar et al., 1995;Mudholkar et al., 1996) distribution.Libby and Novick (1982) pioneered the Libby-Novick beta (LNB) distribution defined on the unit interval (0, 1).
Its cdf and pdf are given by where a > 0, b > 0 and c > 0 are positive parameters, K = c a /B(a, b) and B(•, •) is the beta function.For c = 1, the LNB distribution becomes the standard beta with parameters a and b.
Let Z be a random variable having the LNB density function with positive parameters a, b and c and consider that a = b = k (for k > 0).The dependence of the density function (1) on the shape parameter c > 0 is studied in the following two cases: (i) 0 < c < 1 and (ii) c > 1.For case (i), the parameter c gives tail weights of the pdf to extreme right and the skewness and kurtosis increase when c approaches to zero.On the other hand, the pdf (1) becomes more symmetric when c approaches one.For case (ii), the parameter c gives tail weights of the pdf to the extreme left when c tends to ∞. Similarly as (i), the skewness and kurtosis increase when c tends to infinity.Table 1 provides the values of the variance, skewness and kurtosis of the LNB distribution for a = b = 3.5 for the cases (i) and (ii).Plots of the LNB density function (1) are displayed in Figure 1 for selected parameter values.
where a > 0, b > 0 and c > 0 are three extra shape parameters to those of the G distribution, in order to control the skewness and kurtosis, the variation of the extremes of the generated distribution and add more flexibility.
The LNB-G density function corresponding to (2) can be expressed as Hereafter, we denote by X ∼ LNB-G(a, b, c, ξ) a random variable with pdf (3).For each continuous baseline distribution G, we can associate the LNB-G model with these positive parameters a, b and c defined by Equation (3).
The family (3) extends two well-known families of distributions (BG and exponentiated distributions) and offers more flexibility for modeling data than these two families and the Kumaraswamy family (Cordeiro & de Castro, 2011).The new family usually provides a reasonable parametric fit to real data in engineering, biology, psychology, agriculture, lifetime and reliability studies.Equation (3) would also be of great interest to researchers interested in regression models with presence or absence of censored data.
Several LNB-G models can be easily generated from Equation (3).The LNB-normal (LNBN) distribution follows by taking as baseline the normal cdf.Analogously, the LNB-Weibull (LNBW), LNB-gamma (LNBGa), LNB-Gumbel (LNBGu) and LNB-inverse Gaussian (LNBIG) distributions are obtained from the Weibull, gamma, Gumbel and inverse Gaussian cumulative distributions, respectively.Hence, each new LNB-G model can be generated from a special G distribution.The LNB distribution is clearly a basic exemplar of the LNB-G family when G is the uniform distribution on [0, 1].Equation (3) reduces to the basic exemplar G when a = b = c = 1, which is a positive point for the proposed generator.For c = 1, the LNB-G generator reduces to the beta-G generator (Eugene et al., 2002).Further, for c = 1 and b = 1, we obtain the classical exp-G family given by F LNBG (x; a, 1, 1, ξ) = G(x; ξ) a , which is also called the Lehmann type I class.However, we emphasize that some exponentiated distributions belong in fact to the Lehmann type II class defined by where α > 0. These are the cases of the exponentiated Fréhet and exponentiated Gumbel distributions (Nadarajah & Kotz, 2006).
In this paper, we investigate some structural properties of the LNB-G family.Section 2 defines some special models.In Section 3, the LNB-G pdf is given as a linear combination of exp-G densities.This expansion is used to obtain some of its structural properties.In Sections 4 and 5, we obtain the moments, generating function and Rénvy entropy.The mean deviations and order statistics are investigated in Sections 6 and 7.In Section 8, we address maximum likelihood estimation.In Section 9, we fit special distributions to two real data sets to illustrate their performances.We conclude that they can yield better fits than some other widely-known distributions generated from other families.Section 10 ends with some concluding remarks.

Special Libby-Novick Beta Distributions
Equation ( 3) is most tractable when the baseline cdf and pdf have simple forms.We now provide a few generated cases which can arise of the LNB-G family.

The Libby-Novick Beta-Normal (LNBN) Distribution
The LNBN pdf is obtained from (3) by taking the normal N(μ, σ) as the parent distribution, where ξ = (μ, σ), so that where x ∈ R, μ ∈ R is a location parameter, σ > 0 is a scale parameter, and φ(•) and Φ(•) are the pdf and cdf of the standard normal distribution, respectively.If the extra shape parameters are integers, the moments of (4) are given in terms of the Lauricella functions of type A (Exton, 1978) as shown by Nadarajah (2008).The random variable X ∼ LNBN(a, b, c, μ, σ) represents the density above.The standard LNBN density comes when μ = 0 and σ = 1.For a = b = c = 1, it reduces to the normal density.For c = 1, we obtain the beta-standard normal (Eugene et al., 2002).Finally, if c = 1 in addition to b = 1, it reduces to the exponentiated-normal (EN) density.Plots of (4) for some parameter values are displayed in Figure 2.
The cdf and hazard rate function (hrf) corresponding to (5) are respectively.here, the hrf can be monotonically increasing or decreasing, bathtub shaped and upside-down bathtub.Some possible shapes of the LNBW pdf are given in Figure 3.
For any real b and | t |< 1, we use throughout the power series (Nadarajah & Kotz, 2004, Equation (1.7)) where the binomial coefficient is defined by From Equations ( 2) and ( 8), we can write the LNB-G cdf as Using Equation (3.194.1) in Gradshteyn and Ryzhik (2000), where |arg(1 + βu)| < π and Re(μ) > 0, we obtain where Alternatively, using (7) in Equation ( 9), we can write where If a is a positive integer, Equation (10) shows that the LNB-G cdf is an infinite power series of the G cdf. Otherwise, in the general case, when a is positive real, G(x; ξ) a+i+ j can be expanded as Further, we can express (2) as where and w i, j is given in (10).Changing ∞ k=0 k r=0 by ∞ r=0 ∞ k=r in Equation ( 11), we obtain where the coefficient ∞ k=r t i, j,k,r is a sum of constants and H r (x; ξ) = G(x; ξ) r is the exp-G cdf with power parameter r.Expansion (12) holds for any positive real a.
We note from ( 13) that the LNB-G pdf is a mixture of exp-G densities.This result is important to obtain some measures of X from those of exponentiated distributions.This result plays an important role in the paper.Based on this equation, we can obtain, for example, the moments, generating function and mean deviations of X.The structural measures derived here for the new family even those of formidable complexity can be easily handled in most software platforms such as Maple, Mathematica and Matlab.Established explicit expressions to compute these mathematical measures can be simpler than using numerical integration.The infinity limit in the sums can be substituted by a large positive integer such as 20 or 30 for most practical purpose.The computations for fitting the proposed family to real data in practical problems can be easily performed using the script AdequacyModel in software R.

Moments
Consider that the random variable Y follows the baseline G distribution.The sth moment of X depends on the (s, q)th probability weighted moments (PWMs) of Y, say τ s,q = E[Y s G(Y) q ] (for q = 0, 1, . ..) defined in Greenwood et al. (1979).The quantities τ s,q are usually calculated numerically since they are available in closed-form for a few distributions.We can write from ( 13) Then, the moments of any LNB-G distribution can be determined from the PWMs of the baseline distribution.
Next, we provide an alternative expression for μ s from Equation ( 13) as functions of exp-G moments.From now on, let Y r+1 has the exp-G distribution with power parameter r + 1, i.e.Y r+1 ∼ exp-G(r + 1).We also can write from ( 13) Equations ( 14) and ( 15) are the basic results to be used to derive several LNB-G moments.As a first example, consider the Weibull baseline distribution with scale parameter α > 0 and shape parameter β > 0. If Y r+1 has the EW distribution with power parameter r + 1, its sth moment is where (−r) p was defined before.From (13) and the last equation, the sth moment of the LNBW distribution reduces to (r + 1) e r (−r) p p! (p + 1) (s+β)/β .
For a second example, we take the standard logistic cdf G(x) = (1 + e −x ) −1 .We can easily obtain the sth moment of the LNB-logistic (LNBL) distribution (for t < 1) as The central moments (μ s ) and cumulants (κ s ) of X can be obtained from ( 14) or (15) as respectively, where κ 1 = μ 1 .The skewness γ 1 = κ 3 /κ 3/2 2 and kurtosis γ 2 = κ 4 /κ 2 2 can be calculated from the third and fourth standardized cumulants.Some plots of the skewness and kurtosis for the LNBN distribution (c fixed) as functions of a are displayed in Figure 5.Some plots of the skewness and kurtosis for the LNBW distribution (c fixed) as functions of a are displayed in Figure 6.

Generating Function and Entropy
We provide two representations for the moment generating function (mgf) M LNBG (t) = E e tX of X.The first representation for M(t) is obtained from (13) where M r+1 (t) is the mgf of Y r+1 .Hence, M LNBG (t) can be immediately determined from the exp-G generating function.
A second representation for M LNBG (t) is derived from (13) as where the function ρ(t, r) is a function of the baseline quantile function (qf) Q G (u) as We can obtain the mgf's of several LNB-G distributions from Equations ( 16) and (17).For example, the mgf's of the LNB-exponencial (LNBE) (with parameter λ and for t > λ) and LNBL (for t < 1) distributions are easily determined from these equations as respectively.
The characteristic function (chf) has many useful and important properties which gives it a central role in statistical theory.Its approach is particularly useful in analysis of linear combination of independent random variables.

Mean Deviations
The mean deviations about the mean (δ 1 ) and about the median (δ 2 ) of the random variable X are given by respectively, where μ 1 = E(X), F(μ 1 ) comes from ( 2), M = Median(X) denotes the median computed from the nonlinear equation F(M) = 1/2, and dx is the basic quantity to determine δ 1 (X) and δ 2 (X) in Equations ( 18).Setting u = G(x; ξ) in (13) gives where The mean deviations of any LNB-G distribution can be computed from Equations ( 18)-(20).For example, using the generalized binomial expansion, the mean deviations of the LNBE (with parameter λ), LNBL and LNBPa (with parameter ν > 0) distributions are calculated using the generalized binomial expansion from the functions respectively.
An alternative representation for T (z) can be derived from (13) as where Equation ( 22) is the basic quantity to compute the mean deviations of the exp-G distributions.The LNB-G mean deviations depend only on the quantity J r+1 (z).So, alternative representations for δ 1 and δ 2 are given by A simple application refers to the LNBW distribution.The EW pdf with parameter r + 1 is given by (for x > 0) and then We can calculate the last integral using the incomplete gamma function γ(α, x) = x 0 w α−1 e −w dw (for α > 0) and then Equations ( 19) and ( 21) are useful to obtain the Bonferroni and Lorenz curves defined (for a given probability π) by B(π) = T (q)/(πμ 1 ) and L(π) = T (q)/μ 1 , respectively, where μ 1 = E(X) and q = F −1 (π).

Order Statistics
Consider a random sample X 1 , . . ., X n from a continuous distribution and let X 1:n < • • • < X i:n denote the corresponding order statistics.It is well-known that where B(•, •) denotes the beta function.
Replacing (12) in Equation ( 23), we have Now, using the identity (see Gradshteyn & Ryzhik, 2000), we can write where Combining ( 13) and ( 25), we can rewrite f i:n (x) as where h r+1 (x) was defined in Section 1 and The density function of the LNB-G order statistics can be obtained from ( 26) as a simple linear combinations of exp-G densities.So, the ordinary and incomplete moments, mean deviations, mgf and other measures of these order statistics can be obtained immediately from those exp-G quantities.

Inference and Estimation
A random sample of size n from the random variable X ∼ LNG(a, b, c, ξ), where ξ is a p×1 vector of unknown parameters of the baseline distribution G(x; ξ) is denoted by x 1 , . . ., x n .The log-likelihood function for θ = (a, b, c, ξ) can be expressed as Equation ( 27) can be maximized using the Ox sub-routine MaxBFGS (Doornik, 2007) or SAS Proc NLMixed or from the nonlinear likelihood equations for the score vector.Initial estimates for the extra shape parameters may be inferred from the estimates in ξ.The components of the score vector U(θ) are given by where ψ(•) is the digamma function and ġ(x i ) ξ = ∂g(x i ; ξ)/∂ξ and Ġ(x i ) ξ = ∂G(x i ; ξ)/∂ξ are p × 1 vectors.
We easily evaluate the (p + 3) × (p + 3) observed information matrix J = J(θ), whose elements can be requested from the authors.The multivariate normal N p+3 (0, J( θ) −1 ) distribution, where J( θ) is J evaluated at θ, can be used to construct approximate regions for the model parameters.Further, we can compute the maximum values of the unrestricted and restricted log-likelihoods to compute likelihood ratio (LR) statistics for testing some LNB-G special models.

Applications
Here, we illustrate the importance of the LNB-G family in two applications to real data.We compare the fits of the

Application for the LNBN Distribution
First, consider the data discussed by Weisberg (2005, Section 6.4) which represent 102 male and 100 female athletes collected at the Australian Institute of Sport.The variable evaluated in this study is the Plasma ferritin concentration (Plasma).These data were analyzed recently by Cordeiro et al.(2012) using the McN density function given by where σ > 0 is a scale parameter, −∞ < μ < ∞ is a location parameter, and a, b and c are positive shape parameters.Let W ∼ McN(a, b, c, μ, σ 2 ) be a random variable with the pdf (28).For μ = 0 and σ = 1, we obtain the standard McN distribution.Equation ( 28) includes as sub-models the beta normal (BN) (Eugene et al., 2002) (for c = 1), Kumaraswamy normal (KwN) (Cordeiro & de Castro, 2011) (for c = 1) and the exponentiated normal (EN) (for b = c = 1) distributions.
A natural distribution for modeling these data is the normal distribution which belongs to the class of symmetric best known distributions due to its various theoretical properties and applicability achieved over the years.Further, the skew-normal (SN) (Azzalini, 1985) distribution can also be used to model the behavior of these data since it is a distribution with tails heavier than normal, thus reducing the influence of aberrant observations.The SN distribution is given by where λ ∈ R is the skewness .The density ( 29) is symmetric if λ = 0 (Azzalini, 1985).Formal tests for the extra shape parameters in the LNBN distribution can be performed based on LR statistics.The results for comparing the models to the current data are displayed in Table 3.The rejection of the null models is extremely highly significant for the three LR tests.So, we have a clear evidence of the potential need for the three parameters of the LNBN distribution when modeling real data of this type.

Application for the LNBW Distribution
Here, we compare the fit of the LNBW distribution to the data studied by Meeker and Escobar (1998, p. 383).They give the times of failure and running for a sample of devices from a field-tracking study of a larger system.We fit this distribution to these data.Recently, Alexander et al. (2012) analyzed them using the generalized beta Weibull (GBW) distribution since it extends various distributions previously considered in the lifetime literature.Its cdf and hrf are given by Several special models follow from Equation (30).The McDonald exponential (McE) distribution corresponds to the choice α = 1.Other sub-models, apart from the Weibull distribution itself, are described before: BW (Lee et al., 2007) (for c = 1), KwW (Cordeiro et al., 2010) (for a = 1), exponentiated Weibull (EW) (Mudholkar et al., 1995(Mudholkar et al., , 1996) ) (for b = c = 1), exponentiated exponential (EE) (Gupta and Kundu (2001) (for b = c = β = 1) and beta exponential (BE) (Nadarajah & Kotz, 2006) (for b = c = α = 1) distributions.
A characteristic of the GBG distribution is that its hrf can be monotonically increasing or decreasing, bathtub shaped and upside-down bathtub depending on the parameters.This distribution was also studied by Cordeiro et al. (2014), which derived several of their structural properties.Now, we compare the LNBW and McW distributions and some of their sub-models.
Table 4 lists the MLEs and their standard errors (in parentheses) of the parameters from the fitted LNBW, McW, BW, EW and Weibull models and the AIC, CAIC and BIC values.The computations were performed using the subroutine NLMixed in SAS.These results indicate that the LNBW model has the lowest values for these statistics among the fitted models, and therefore it could be chosen as the best model.Formal tests for other sub-models of the LNBW distribution are conducted using LR statistics as described before.
Applying these statistics to the voltage data, the results are given in Table 5.Clearly, we reject the null hypotheses for the three LR tests in favor of the LNBW model

Figure 1 .
Figure 1.Some plots of the LNB density function

Figure 5 .
Figure 5.For the LNBN distribution.(a) Skewness of X as function of a for some values of c, by taking μ = 100.0,σ = 50.0and b = 0.5.(b) Kurtosis of X as function of a for some values of c, by taking μ = 10.0, σ = 5.0 and b = 2.5

Figure 6 .
Figure 6.For the LNBW distribution.(a) Skewness of X as function of a for some values of c, by taking α = 0.5, β = 1.5 and b = 2.5.(b) Kurtosis of X as function of a for some values of c, by taking α = 1.5, β = 0.5 and b = 0.5

Figure 8 .
Figure 8. Fitted densities for the plasma data.(a) Fitted LNBN vs McN models.(b) Fitted LNBN vs BN models.(c) Fitted LNBN vs KwN models.(d) Fitted LNBN vs SN modelsThe plots of the fitted LNBN pdf and of pdfs of the four distributions discussed before are displayed in Figures

Table 1 .
The variance, skewness and kurtosis values for the LNB model when a = b = 3.5

Table 2 .
MLEs and some statisticsTable2lists the MLEs and their standard errors (in parentheses) for some fitted models to the current data and the Akaike Information Criterion (AIC), Consistent Akaike Information Criterion (CAIC) and Bayesian Information Criterion (BIC).The lower the values of these criteria, the better the fit.The computations were performed using the subroutine NLMIXED in SAS.Since the values of these statistics are smaller for the LNBN distribution compared to those values of the other models, this distribution is a better model to explain these data.

Table 4 .
MLEs and statistics