The Kendall-Ressel Exponential Dispersion Model: Some Statistical Aspects and Estimation

In this paper we revisit the Natural Exponential Family (NEF) of distributions generated by the Kendall-Ressel density. We study some of the statistical properties of this class (KR-NEF), as well as those of an associated class of Exponential Dispersion Models (KR-EDM). In particular, we discuss some of the immediate properties of these distributions, especially under the so-called mean reparameterization. The moments and other cumulants of these distributions are thoroughly presented, along with expressions for the measures of their skewness and kurtosis. Estimation procedures under the mean parameterization are also discussed, where maximum likelihood, second order minimax and Bayes estimators are presented and illustrated.


Introduction
Let λ > 0 and consider the probability density function (pdf) f_λ on the positive real line defined by (1). The pdf in (1) is called the Kendall-Ressel density (hereafter the KR density or distribution) with parameter λ. This density appears in various areas. For an M/G/1 queueing system with arrival rate λ, it is the limiting distribution, as λ → ∞, of the length of the busy period T(λ) − λ initiated by the virtual time quantity λ > 0 (cf. Prabhu, 1965, pages 73 and 237). In characterizing the regression of the sample variance on the sample mean, Fosam and Shanbhag (1997) showed that such a regression is cubic in the sample mean for only six distributions, of which one is the KR distribution. Kokonendji (2001) also revealed this distribution in his investigation of first passage times at 0 and 1 of some Lévy processes for natural exponential families (NEF's). Additional references could be provided regarding the KR distribution, but one of the most detailed references on the probabilistic nature of the KR density is Letac and Mora (1990), who characterized all NEF's having cubic variance functions (VF's), of which one is the KR-NEF. One can consult Proposition 5.5 of Letac and Mora (1990) for checking a puzzling integral formula. It should be noted that the Laplace transform (LT) of (1) cannot be expressed explicitly (see the next section), but rather can be given as the solution of a functional equation. This fact implies that the corresponding KR-NEF and KR-EDM do not have explicit representations in terms of their canonical parameter θ (see Eq. (3)). Fortunately, they can be explicitly expressed in terms of their mean value parameter m (a more important parameter than θ), making them applicable for statistical use.
The aim of the present study is to further expose the KR-NEF and KR-EDM through some of their statistical properties, as we believe they could naturally be used for statistical modeling. However, the work is primarily focused on estimation aspects, such as second order minimax estimation of the mean and related Bayes estimates. The paper is organized as follows. In Section 2 we first present some known basic preliminaries on NEF's and EDM's, and then describe the KR-NEF and KR-EDM in terms of their mean value parameterization. Cumulants, moments and other related measures are presented in Section 2.4. Section 3 is devoted to estimation. Section 3.1 deals with the maximum likelihood equations (where the related dispersion parameter ζ is either known or not). Section 3.2 is devoted to second order minimax estimation of the mean (with ζ either known or unknown). We prove that these estimates behave better than the MLE of the mean (i.e., the sample mean) in terms of mean squared error. Some simulation results are provided. Section 3.3 focuses on Bayesian estimation of the mean. We show that under the implied conjugate prior or the flat, non-informative prior distribution, useful expressions for the Bayes estimates of the mean are readily available. We close in Section 4 with some concluding remarks.
2. Some Preliminaries on NEF's and the KR-NEF and KR-EDM
The KR-NEF distribution we consider here is generated from (1) with λ = 1 and (2), and is given in the form of (4), following Letac and Mora (1990). Accordingly, we define the KR-EDM class of distributions in a similar fashion to (4) by the pdf (5), with θ ∈ Θ and λ > 0.

The Mean Value Reparameterization
Note that f_1(·; θ), θ ∈ Θ, defines a one-parameter family of distributions, whereas f_λ(·; θ), θ ∈ Θ, λ > 0, defines a two-parameter one. Further, it is well known that the cumulant transform k_1 in (4) is strictly convex and real analytic on Θ, that m = k′_1(θ) ∈ M is the mean function of f_1, and that M is the mean domain of f_1. Similarly, V(m) = k′′_1(θ), with m ∈ M, is the variance function (VF) of f_1. We remind the reader that the VF (V, M) uniquely determines an NEF within the class of NEF's. Classes of various VF's have been thoroughly classified in the literature. Along these lines, Letac and Mora (1990) have shown that despite the intractable form of the cumulant transform k_1 in (4), as defined by the implicit function (3), the VF of the KR-NEF defined in (4) has a simple tractable form, V(m) = m^2(m + 1). As was already indicated earlier, the LT (2), and hence k_1(θ), cannot be expressed explicitly in terms of θ. Practically, however, the parameter θ by itself is not of much interest; rather, the mean parameter m is. In that case, both θ and the cumulant transform k_1(θ) can be expressed explicitly in terms of the mean parameter m, m ∈ M.
For this, we use the well-known fact that, given the VF (V, M) of an NEF, the respective natural parameter θ and cumulant transform k(θ) can be expressed in terms of its mean m via θ(m) = ∫ dm/V(m) + c_1 and k(θ(m)) = ∫ m dm/V(m) + c_2 (cf. Morris, 1982, and Letac and Mora, 1990). Since, by (6), the VF of the KR-NEF is V(m) = m^2(m + 1), these integrals yield (8) and (9) up to the constants c_1 and c_2. Now, using the fact that f_1 is a density and that θ → 0− iff m → ∞, we obtain that c_1 = c_2 = 0 in (8) and (9). Consequently, the reparameterization of f_1 by the mean m (m > 0) is given by the resulting probability density function. Similarly, one obtains the probability density function of the corresponding EDM f_λ in (5) under this mean value reparameterization, with m > 0. Further, when the transformation x → x/λ is applied to these EDM densities and we set ζ = 1/λ, one obtains a family of probability densities of the form (12), with (ζ, m) ∈ R+ × R+. Note that (12) may be presented in the more compact form (13). The form of the density in (13) was derived in eq. (27) of Vinogradov (2011) (while correcting a typo there, in which Γ(x + λ + 1) should have been Γ(λ(x + 1) + 1)). However, it is not straightforward to see from Vinogradov's derivations how one obtains (13). Here we have chosen to present (8) and (9), which provide the details on how to compute the constants c_1 and c_2 and thus allow one to readily obtain the mean value representation of the densities in (12) and (13). We note that such computations could be utilized for other NEF's and their corresponding EDM's.
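The computation of c_1 and c_2 can be verified symbolically. The sketch below assumes the closed forms θ(m) = −1/m + log((m + 1)/m) and k(θ(m)) = log(m/(m + 1)), which are our own integration of 1/V and m/V for V(m) = m^2(m + 1) (they illustrate (8) and (9) with c_1 = c_2 = 0 rather than quote them):

```python
import sympy as sp

m = sp.symbols("m", positive=True)
V = m**2 * (m + 1)  # the cubic variance function of the KR-NEF

# Candidate closed forms with c1 = c2 = 0 (our computation, not quoted):
theta = -1/m + sp.log((m + 1) / m)   # theta(m) = integral of 1/V(m)
k = sp.log(m / (m + 1))              # k(theta(m)) = integral of m/V(m)

# theta'(m) must equal 1/V(m) ...
assert sp.simplify(sp.diff(theta, m) - 1/V) == 0
# ... and k'(theta) = m, i.e. (dk/dm) / (dtheta/dm) = m.
assert sp.simplify(sp.diff(k, m) / sp.diff(theta, m) - m) == 0
# With c1 = c2 = 0 both vanish as m -> infinity, matching theta -> 0-.
assert sp.limit(theta, m, sp.oo) == 0
assert sp.limit(k, m, sp.oo) == 0
```

The same two-integral check applies verbatim to any NEF whose VF admits an elementary partial-fraction decomposition.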
The behavior of (13) near 0 and in the tail can be deduced from Vinogradov (2011, eq. 34), which gives expansions for fixed ζ as x ↓ 0 and as x → ∞. These expressions imply that as x ↓ 0, f_ζ(x; m) behaves as a gamma density with shape parameter smaller than 1 for ζ > 1, and as the negative exponential density for ζ = 1. As x → ∞ and ζ > 0, it behaves as the inverse Gaussian density. The above figures suggest the potential versatility of the Ressel EDM in statistical applications and modeling. We further illustrate the applicability of the KR-EDM model in the following figure, which depicts the histogram of data which are, unbeknownst to the fitter, a mixture of two negative exponential distributions: 75% Exp(2) and 25% Exp(0.5). As can be seen, the KR-EDM model produces a much better fit to these (mixture) data than a single negative exponential distribution (with the same estimated mean for both).

Cumulants, Moments, Skewness and Kurtosis
In general, the r-th cumulant of an EDM with densities as in (13) is given in terms of k^(r), the r-th derivative of k (see Jorgensen (1987, Eq. 2.6)). The first two cumulants, given in terms of the mean m, are κ_1 = m and κ_2 = ζV(m), where V(m) is given by (3) (i.e., for ζ = 1). As we consider m to be the parameter of interest, we write κ_r(m) = κ_r(θ(m), ζ) as a function of m (if no ambiguity is caused). Hence, by (14) and (15), we have recursively that κ_r(m) = ζV(m) κ′_{r−1}(m), where κ′_{r−1} denotes differentiation with respect to m. Accordingly, upon using (3), we obtain that the r-th cumulant can be given in a general form involving a polynomial p_{r−2} of degree r − 2, which apparently does not have a closed form and should be computed specifically for each r.
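The recursion just described can be run symbolically to generate the polynomial factors for any desired r. A minimal sketch for the NEF case ζ = 1 (the explicit forms of κ_3 and κ_4, and the value γ_2(1) = 41/2, are our own computations from V(m) = m^2(m + 1), not quoted from the text):

```python
import sympy as sp

m = sp.symbols("m", positive=True)
V = m**2 * (m + 1)  # unit (zeta = 1) variance function of the KR family

# kappa_1 = m and kappa_2 = V(m); thereafter kappa_r = V(m) * kappa_{r-1}'(m).
kappas = [m, V]
for _ in range(2):
    kappas.append(sp.expand(V * sp.diff(kappas[-1], m)))
k2, k3, k4 = kappas[1], kappas[2], kappas[3]

# kappa_3 = m^3 (m + 1)(3m + 2): the degree-1 polynomial factor is 3m + 2.
assert sp.simplify(k3 - m**3 * (m + 1) * (3*m + 2)) == 0

gamma2 = sp.simplify(k4 / k2**2)  # kurtosis coefficient for zeta = 1
# gamma2 is positive for every m > 0, consistent with leptokurtosis;
# for instance, at m = 1 it equals 41/2.
assert gamma2.subs(m, 1) == sp.Rational(41, 2)
```

For ζ ≠ 1 each κ_r picks up the corresponding power of ζ from the recursion, so the same loop applies after scaling.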
If we denote by μ_r the r-th central moment, then μ_r = κ_r for r = 2, 3, and otherwise it is given by the standard expression in terms of the cumulants. The coefficients of skewness, γ_1 = κ_3/κ_2^{3/2}, and kurtosis, γ_2 = κ_4/κ_2^2, have been computed by Vinogradov (2011, eqs. 32-33); in particular γ_2 > 0, so these distributions are leptokurtic.

Estimation
Based on a sample (X_1, ..., X_n), n ≥ 2, of i.i.d. observations from (12), we consider in this section three estimation procedures for the mean m when ζ is either known or unknown: maximum likelihood, second order minimax, and Bayes. Section 3.1 briefly deals with the maximum likelihood equations (where the related dispersion parameter ζ is either known or not). Section 3.2 is devoted to second order minimax estimation of the mean (with ζ either known or unknown). We prove that these minimax estimates behave better than the MLE of the mean (i.e., the sample mean) in terms of mean squared error. Section 3.3 focuses on Bayesian estimation of the mean. We show that under a naturally implied conjugate prior or the non-informative, flat prior distribution, useful expressions for the Bayes estimates of the mean are readily available. These estimation procedures are augmented and illustrated with some simulation results.

Maximum Likelihood Estimation
Whether ζ is known or not, it is readily seen from (13) that the MLE for m is the sample average X̄n. Since the corresponding KR-NEF and KR-EDM are steep, the MLE m̂ = X̄n exists with probability 1.
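That the MLE is the sample mean can be checked directly from the mean-value parameterization: the per-observation score in m is (x − m)/V(m), so the likelihood equation forces m̂ = X̄n regardless of ζ. A symbolic sketch, using the closed forms of θ(m) and k(θ(m)) with c_1 = c_2 = 0 (our own computation from V(m) = m^2(m + 1)):

```python
import sympy as sp

m, x = sp.symbols("m x", positive=True)
V = m**2 * (m + 1)
theta = -1/m + sp.log((m + 1) / m)   # theta(m) with c1 = 0 (our computation)
k = sp.log(m / (m + 1))              # k(theta(m)) with c2 = 0

# Score of the NEF log-density x*theta(m) - k(theta(m)) with respect to m:
score = sp.diff(x * theta - k, m)
# It reduces to (x - m)/V(m); summed over the sample, the score is
# n*(Xbar - m)/V(m), which vanishes exactly at m = Xbar.
assert sp.simplify(score - (x - m) / V) == 0
```

The same cancellation happens for any EDM in mean parameterization, since the dispersion only multiplies the score by a positive constant.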
The MLE ζ̂n for ζ cannot, however, be derived explicitly, but it can be obtained numerically as the implicit solution of the likelihood equation for ζ, in which θ(m) and k_1(θ(m)) are as given in (8) and (9) (with c_1 = c_2 = 0).

Second Order Minimax Estimation of the Mean
Second order minimax theory for estimation of the mean parameter of EDM's was worked out in Landsman (2001), Bar-Lev and Landsman (2006), and Bar-Lev, Bshouty and Landsman (2010) (hereafter L(2001), BL(2006) and BBL(2010)). As the KR-NEF is the special case of the KR-EDM with dispersion parameter ζ = 1, we consider the case of the KR-EDM (with ζ either known or unknown).
Recall that the MLE m̂ = X̄n of m is a first order optimal estimator of m, as its quadratic risk is given by R(X̄n, m) = ζV(m)/n, where, as in (3), V(m) = m^2(m + 1) is the unit (ζ = 1) VF of the KR-EDM.
To begin, we first assume that the dispersion parameter ζ is known. Recall that an estimator m̂n of m minimaxly improves on the MLE m̂ = X̄n in second order with respect to a given weight function q(m) > 0, where R(X̄n, m) = ζV(m)/n and α_1 > 0. Such second order improvements were established in L(2001). Theorem 4 in BBL(2010) introduced conditions under which q(x) and the value α_1 can be determined directly, in a manner such that a second order minimax estimator m̂n can be derived explicitly. Indeed, this theorem asserts that with V(x) = x^2(1 + x), t = 2, p = 3, s = (p − 1)/2 = 1 and r = 0 (in the notation of that theorem; we skip the details for brevity), together with the polynomial P(x) = 1 + x and the steepness of F_ζ for any given ζ > 0, we obtain the weight function q(x) explicitly. Accordingly, the differential equation in the Dirichlet problem (21) takes an explicit form, and this equation has a positive solution with α = α_1 = 4. Hence, the second order minimax estimator of m has an explicit form (see Remark 2 in L(2001) and eq. (4.12) in BBL(2010)), whose quadratic risk has an asymptotic expansion (see (20)). We now consider the situation when the dispersion parameter ζ is unknown. It was shown in BL(2006) that for an EDM, the mean parameter m and the dispersion parameter ζ are orthogonal, in the sense that the 2 × 2 Fisher information matrix is diagonal. This implies the following (see Lemma 1 in BL(2006)). In similarity to (25), define m̂*n, where ζ̂n is the MLE of ζ obtained from (22). Then the risks of m̂*n and m̂n coincide up to a term of order o(1/n^2). The estimator m̂*n is called the modified second order minimax estimator of the mean parameter m (when ζ is estimated by ζ̂n in (19)). Note that the estimator in (23) is a consistent estimator of m.

Some Simulation Analysis
Numerical investigation shows that equation (19) has a unique positive root ζ̂n > 0. Accordingly, for numerical illustration we generated N = 1000 samples of size n = 10 from the KR-EDM density in (13) with ζ = 0.2, using the Metropolis-Hastings algorithm. This experiment was repeated 5 times, with m varying from 0.5 to 2.9 in uniform increments. For each experiment we calculated the following three estimators: m̂ = X̄n, the second order minimax estimator m̂n (calculated from (22) with ζ = 0.2), and the modified second order minimax estimator m̂*n (calculated from (23)). For each estimator we calculated its sample relative mean squared error (SRMSE), where m̂n(j) denotes the corresponding estimate in the j-th sample, j = 1, ..., N. Figure 4 displays the graphs of SRMSE(m) for the three estimators. We find that the graphs of the SRMSE(m)'s for m̂*n (solid line) and m̂n (dashed line) are relatively close, while both are essentially better than the SRMSE of X̄n (dotted line).
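The simulation design above can be sketched as follows. Since the KR-EDM density in (13) is not reproduced in this excerpt, the sketch substitutes a gamma log-density with mean m as a stand-in target (purely an assumption to keep the example runnable), and takes the SRMSE of an estimator to mean the average of (m̂_j − m)^2/m^2 over the N samples, which is one plausible reading of the elided formula:

```python
import numpy as np

rng = np.random.default_rng(0)

def target_logpdf(x, m):
    # Stand-in for log f_zeta(x; m): an unnormalized gamma(shape=2,
    # scale=m/2) log-density, which has mean m. Replace with the
    # KR-EDM log-density (13) to reproduce the actual experiment.
    return np.log(x) - 2.0 * x / m

def mh_sample(m, size, step=1.0, burn=300, thin=10):
    # Random-walk Metropolis-Hastings on (0, inf), thinned so that the
    # retained draws are approximately independent.
    x, out = m, []
    for i in range(burn + size * thin):
        prop = x + step * rng.normal()
        if prop > 0 and np.log(rng.uniform()) < target_logpdf(prop, m) - target_logpdf(x, m):
            x = prop
        if i >= burn and (i - burn) % thin == 0:
            out.append(x)
    return np.array(out)

def srmse(estimates, m):
    # Sample relative MSE: average of (m_hat_j - m)^2 / m^2 over samples.
    return float(np.mean((estimates - m) ** 2) / m**2)

# N samples of size n; estimate m by the sample mean in each sample.
m_true, n, N = 1.0, 10, 200
means = np.array([mh_sample(m_true, n).mean() for _ in range(N)])
print(round(srmse(means, m_true), 3))
```

Plugging in the minimax corrections from (22) and (23) in place of the sample mean yields the other two SRMSE curves of Figure 4.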

Bayesian Estimation
Let X_1, ..., X_n be i.i.d. observations from a KR-EDM as given in (13), and denote T_n = ∑_{i=1}^n X_i. As was pointed out earlier, the MLE of m is m̂ = X̄n ≡ T_n/n. For any m > 0, let β = (1 + m)/m. We will compare the properties of the MLE β̂_MLE = (1 + m̂)/m̂ to those of the Bayesian estimators for β obtained under a conjugate and a non-informative prior distribution. To that end, we consider this reparameterization of the mean m, with which f_ζ(·; ·) in (13) can be rewritten in terms of β, β ∈ (1, ∞). We denote the sample data by X_n = (X_1, ..., X_n) and their observed values by x_n = (x_1, ..., x_n); then, for a given ζ > 0, the likelihood function for β based on x_n is given by (25), where t_n = ∑_{i=1}^n x_i and C_n(x_n; ζ) is some factor that is independent of β. The form of the likelihood function (25) suggests the modified gamma distribution as the conjugate prior distribution for β (over (1, ∞)). That is, let ξ ≥ 0, α > 0 and b > 0; then the modified gamma distribution over (ξ, ∞), denoted by G_ξ(α, b), has the following pdf.
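A posterior-mean sketch under such a prior: we take the kernel of G_ξ(α, b) to be β^(α−1) e^(−bβ) restricted to (ξ, ∞), and the likelihood kernel in β to be of the same gamma form (both assumptions, since the pdf and (25) are not reproduced in this excerpt). The posterior is then again a truncated gamma, and its mean reduces to a ratio of upper incomplete gamma functions:

```python
from scipy.special import gammaincc

def trunc_gamma_mean(alpha, b, xi):
    # Mean of the gamma kernel beta^(alpha-1) e^(-b*beta) restricted
    # to (xi, inf):
    #   E[beta] = (alpha/b) * Gamma(alpha+1, b*xi) / Gamma(alpha, b*xi),
    # written via the regularized upper incomplete gamma Q = gammaincc.
    return (alpha / b) * gammaincc(alpha + 1, b * xi) / gammaincc(alpha, b * xi)

# Hypothetical conjugate update: prior G_1(alpha0, b0) on (1, inf) and a
# gamma-form likelihood kernel beta^A e^(-B*beta); A and B are stand-in
# sufficient statistics (in the text they would be built from t_n, n, zeta).
alpha0, b0, xi = 2.0, 1.0, 1.0
A, B = 12.0, 8.0
post_mean = trunc_gamma_mean(alpha0 + A, b0 + B, xi)
assert post_mean > xi  # the posterior lives on (xi, inf)
print(round(post_mean, 3))
```

With ξ = 0 the formula collapses to the usual gamma mean α/b, so the truncation point only enters through the incomplete-gamma ratio.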

Figure 3. Illustrating the fit of the Ressel distribution to mixed exponential data.

Figure 4. Graphs of the sample RMSE for the estimators m̂*n (ζ unknown), m̂n (ζ known) and the MLE X̄n.