Bayes Statistics for Mixed Erlang Models

Bayesian techniques are not among the standard techniques of Mathematical Statistics. Nevertheless, they are applied in various fields of research and practice, e.g. in insurance premium rating. Many theoretical results have been derived on Bayesian-statistical methods. Applied to more special model assumptions, they often yield handy techniques for practical use. The present paper gives, under specialized conditions, such theory-based techniques, which have not been presented in this form in the literature up to now.


Introduction
Bayesian ideas are comparatively old. Already in the 18th century, Thomas Bayes and Pierre-Simon Laplace drew first conclusions on the Bayes calculus. Nowadays Bayes theory, or Bayes statistics, is a highly developed part of Mathematical Statistics. The significant difference from classical Mathematical Statistics is the assumption that the parameter of the statistical model is a realization of a random variable. The distribution of that random variable is called the a-priori distribution. Usually one interprets the a-priori distribution as a kind of prior information on the value of the parameter of the model. But note that sometimes one has a more concrete explanation for such an a-priori distribution. For example, in the actuarial field it is the concretely given distribution of the so-called risk parameter in a collective of insurance risks. The classical approach of Bayes statistics consists in calculating a so-called a-posteriori distribution from the a-priori distribution by incorporating the data of a statistical sample. With that a-posteriori distribution one then derives optimal statistical decision rules. These decision rules are called Bayes rules.
A lot of research on Bayes rules has already been done. For surveys of the most important results see the books of Berger (1985), Ghosh et al. (2003) and Robert (1994). An elegant introduction (in German) was given by the author (see Kremer (2005a)).
Recently the author noticed a certain gap in this field of Bayes research. Closing that gap is the content of the following paper, which is quite elegant and self-contained in its results.

The context
Let be given the sample X = (X_1, ..., X_n) of random variables on a probability space (Ω, A, P), and the random variable θ (also on (Ω, A, P), with values in a set Θ), whose realization ϑ describes the parameter of the underlying stochastic model. Denote by P_ϑ^X := P^{X | θ = ϑ} the conditional probability (measure) of X given the realization ϑ of θ. With this notation the probability law P^θ is the a-priori distribution (of θ), and the conditional distribution of θ given X = x is the so-called a-posteriori distribution (of θ), in symbols: P^{θ | X = x}. The latter can (usually) be computed with the so-called Bayes theorem (see on this e.g. Kremer (2005a), Theorem 2.1, page 30).
For all that follows, assume that X_1, ..., X_n are i.i.d. given θ. Usually one assumes in addition that the P^{X_i | θ = ϑ} (ϑ ∈ Θ) are dominated by a σ-finite measure μ, giving the μ-density f_ϑ^{X_i} of the conditional distribution of X_i given ϑ. So far for the general notations. The new parts of the paper are specialized to the additional model assumptions:

(C) θ has an Erlang(a, b)-distribution with (given) fixed a ∈ N and parameter b ∈ (0, ∞), meaning that P^θ has the Lebesgue density

   f^θ(ϑ) = (b^a / (a − 1)!) · ϑ^(a−1) · exp(−b·ϑ)   on ϑ ∈ (0, ∞);

(D) given θ = ϑ, each X_i has an Erlang(α, ϑ)-distribution with (given) fixed α ∈ (0, ∞), α ≠ 1, i.e. the Lebesgue density

   f_ϑ^{X_i}(x) = (ϑ^α / Γ(α)) · x^(α−1) · exp(−ϑ·x)   on x ∈ (0, ∞).
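The two-stage structure of the model can be sketched in a few lines of Python. This is a hypothetical simulation, assuming (as the later remarks suggest) that the conditional law of each X_i given θ = ϑ is a Gamma/Erlang law with shape α and rate ϑ; the function name `simulate_sample` is ours:

```python
import random

# Hypothetical simulation of the two-stage model: the risk parameter theta
# is drawn from an Erlang(a, b) law (Gamma with integer shape a and rate b),
# and, given theta = v, the observations X_1, ..., X_n are i.i.d. with a
# Gamma(alpha, v) law (rate v).  Function name is ours.
def simulate_sample(a, b, alpha, n, seed=0):
    rng = random.Random(seed)
    # gammavariate takes (shape, scale), so rate b enters as scale 1/b.
    v = rng.gammavariate(a, 1.0 / b)
    xs = [rng.gammavariate(alpha, 1.0 / v) for _ in range(n)]
    return v, xs

# One simulated portfolio: risk parameter and n = 5 claim amounts.
v, xs = simulate_sample(a=3, b=2.0, alpha=2.0, n=5)
```

Note that Python's `gammavariate` is parameterized by shape and scale, so the rate parameters b and v enter as 1/b and 1/v.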

Basic results
For giving Bayes rules under the model defined through (A)-(C), one can take certain general theorems in Kremer (2005a). These shall be listed adequately in the following.

Result 1: Take the assumptions (A), (B) with Θ = (a_1, b_1) ⊂ R. Suppose that one has:

(a) P^θ is dominated by the Lebesgue measure with a density of the type

   f^θ(ϑ) = D(m, x_0) · C(ϑ)^m · exp(−ϑ·x_0),

with constants m, x_0 and norming constant D(m, x_0);

(b) each P^{X_i | θ = ϑ} is dominated by the Lebesgue measure and has the density

   f_ϑ^{X_i}(x) = C(ϑ) · exp(−ϑ·x) · h(x),

with the function C(·) of (a) and where h(·) is an adequate, nonnegative, measurable function.
Then the a-posteriori distribution P^{θ | X = x} has the Lebesgue density according to the formula

   f^{θ | X = x}(ϑ) = D(n + m, x_0 + Σ_{i=1}^n x_i) · C(ϑ)^(n+m) · exp(−ϑ·(x_0 + Σ_{i=1}^n x_i)),   (1)

where x = (x_1, ..., x_n) and D(n + m, x_0 + Σ_{i=1}^n x_i) is again the norming constant as in (a). This result is Theorem 2.14 in Kremer (2005a). It will be applied later on to models satisfying assumptions (A)-(D).
For the next step, let the function γ on Θ be defined according to

   γ(ϑ) := C′(ϑ)/C(ϑ)   ( = E(X_1 | θ = ϑ) in the setting of Result 1),

and consider the problem of estimating γ(ϑ), based on (the data) X.
The corresponding optimal estimator (in the context of the above Bayes model (see all before (C) in part 2)) is the so-called Bayes estimator. For its general definition see part 4.1 in Kremer (2005a). Here one gets more specifically:

Result 2: Take the complete context of Result 1, with in addition the regularity conditions of Theorem 4.6 in Kremer (2005a). The Bayes estimator of γ(ϑ) is given according to

   γ̂(X) = (x_0 + Σ_{i=1}^n X_i) / (n + m).

This result is just Theorem 4.6 in Kremer (2005a). It too will be applied later on.
Finally let Θ be split up into two disjoint sets H and K:

   Θ = H ∪ K,   H ∩ K = ∅.   (3)

A decision (based on X) shall be made between the hypothesis ϑ ∈ H and the alternative ϑ ∈ K. The Bayes rule for this (so-called) testing problem is called the Bayes test. For details on it see again Kremer (2005a), part 3.1.
Result 3: Take the assumptions (A), (B) of section 2 and assume the existence of the μ-densities f_ϑ^{X_i}. Then one has as Bayes test:

   φ(x) = 1, if c_1 · P^{θ | X = x}(H) < c_2 · P^{θ | X = x}(K),
   φ(x) = 0, else.

Here c_1, c_2 are certain weighting constants in the so-called Bayes risk (see Theorem 3.2 in Kremer (2005a)). They serve to define a certain loss function (see Kremer (2005a), page 53). Most simple is to take c_1 = c_2 = 1. The proof of a more general version of Result 3 can be found in Kremer (2005a), pages 55-57 (note, the above φ is given in the remark on page 57).

Remark 1:
Note that the excluded case α = 1 is just the (classical) exponential distribution, which (in usual applications, like non-life insurance rating) is not of such great importance.
But also note that this exponential case is already treated in Kremer (2005a), Example 2.11 and Example 4.5, case 4.

Remark 2:
From the properties of the Erlang(α, ϑ)-distribution one knows that

   E(X_i | θ = ϑ) = α/ϑ,   (4)

what means: γ(ϑ) = α/ϑ.

Proof of Theorem 1: The f_ϑ^{X_i} of (D) is of type (b) of section 3. One has:

   C(ϑ) = ϑ^α/Γ(α),   h(x) = x^(α−1).

Also f^θ of (C) is of the type (a) of section 3. Here one has:

   m = (a − 1)/α,   x_0 = b.   (5)

Obviously the conditions of Result 1 are fulfilled. Since γ(ϑ) = C′(ϑ)/C(ϑ) = α/ϑ, all conditions of Result 2 are fulfilled as well. Inserting the special choices (5) into the formula for γ̂(x), one arrives at the result of Theorem 1, namely

   γ̂(X) = α · (b + Σ_{i=1}^n X_i) / (a + n·α − 1).
Of certain interest are:

Remark 3: From formulas (1) and (5) one concludes easily that the a-posteriori distribution P^{θ | X = x} is the Erlang(a + n·α, b + Σ_{i=1}^n x_i)-distribution, what is for n = 1 in agreement with one result in Table 3.2 in Robert (1994).
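The conjugate update and the resulting estimator can be written out in a few lines of Python. This sketch assumes the Gamma(α, ϑ)-likelihood / Erlang(a, b)-prior setup with posterior parameters a + nα and b + Σx_i; the function names are ours:

```python
# Conjugate update for the assumed model: Erlang(a, b) prior on theta,
# Gamma(alpha, theta) observations.  The a-posteriori law is then
# Gamma(a + n*alpha, b + sum(x)), and the Bayes estimator of
# gamma(theta) = alpha/theta is its posterior mean.  Function names are ours.
def posterior_params(a, b, alpha, xs):
    n = len(xs)
    # Shapes add (a + n*alpha), rates add (b + sum of the observations).
    return a + n * alpha, b + sum(xs)

def bayes_estimator(a, b, alpha, xs):
    shape, rate = posterior_params(a, b, alpha, xs)
    # E(1/theta | x) = rate/(shape - 1) for a Gamma posterior with shape > 1.
    return alpha * rate / (shape - 1.0)

# Example: prior Erlang(3, 2), alpha = 2, two observations.
est = bayes_estimator(3, 2.0, 2.0, [1.0, 2.0])
```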
Remark 4: Example 4.9 in Robert (1994) is related to the above theorem. In the special situation n = 1 one has, for the Bayes estimator δ_1^π(X) in Robert (1994) and the γ̂(X) of Theorem 1:

   γ̂(X) = α · δ_1^π(X).

The factor α can be easily explained: Robert (1994) takes a loss function for estimating 1/ϑ, whereas the above is based on the quadratic loss for estimating γ(ϑ) = α/ϑ.

Bayes-Test
Suppose again that the Bayes context of section 2 with (A), (B) and in addition (C), (D) is given. An adequate testing problem shall now be considered, more concretely the hypothesis

   H: ϑ ≤ ϑ_0

against the alternative

   K: ϑ > ϑ_0,

where ϑ_0 is fixed and given. That this type is the most natural, one concludes from (4).

One has:
Theorem 2: The Bayes test for H against K is given by

   φ(x) = 1, if b + Σ_{i=1}^n x_i < k(a, α, ϑ_0) := F_{ϑ_0}^{−1}( c_2/(c_1 + c_2) ),
   φ(x) = 0, else,

where F_{ϑ_0}^{−1} is the inverse of the (strictly monotone increasing) function

   F_{ϑ_0}(y) := ∫_0^{ϑ_0} (y^{a*}/Γ(a*)) · t^(a*−1) · exp(−y·t) dt,   a* := a + n·α.

Proof: According to Remark 3 one has

   P^{θ | X = x}(H) = F_{ϑ_0}(b + Σ_{i=1}^n x_i),

since the Erlang(a*, b*)-distribution, with b* := b + Σ_{i=1}^n x_i, has the distribution function value F_y(b*) at the point y. As a consequence the condition of Result 3,

   P^{θ | X = x}(H) < c_2/(c_1 + c_2),

is equivalent to: b + Σ_{i=1}^n x_i < k(a, α, ϑ_0).

Remark 5: Note that F_{ϑ_0} is the distribution function of the Erlang(a*, ϑ_0)-distribution. This means that one can compute k(a, α, ϑ_0) just as the γ-fractile (with γ = c_2/(c_1 + c_2)) of the Erlang(a*, ϑ_0)-distribution. Note that the above Bayes test of Theorem 2 cannot be found in Berger (1985) (not even for the special case n = 1!) or anywhere else.
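Since the threshold k(a, α, ϑ_0) is, by Remark 5, a fractile of an Erlang distribution, it can be computed numerically. The Python sketch below assumes an integer shape (for a non-integer a + nα one would use a regularized incomplete gamma function instead); the helper names `erlang_cdf` and `erlang_quantile` are ours:

```python
import math

# Fractile (quantile) of an Erlang law with integer shape m and given rate,
# via the closed-form CDF and bisection.
def erlang_cdf(y, m, rate):
    # F(y) = 1 - exp(-rate*y) * sum_{j=0}^{m-1} (rate*y)^j / j!
    if y <= 0.0:
        return 0.0
    z = rate * y
    s, term = 0.0, 1.0
    for j in range(m):
        s += term
        term *= z / (j + 1)
    return 1.0 - math.exp(-z) * s

def erlang_quantile(p, m, rate, tol=1e-10):
    # Invert the strictly increasing CDF by bisection.
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, m, rate) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, m, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Test threshold with the simple weights c1 = c2: the median of Erlang(m, v0),
# here illustrated with shape m = 4 and rate v0 = 2.
c1, c2 = 1.0, 1.0
k = erlang_quantile(c2 / (c1 + c2), m=4, rate=2.0)
```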

Finally, a fine application: Also in Bayes statistics one thinks about certain confidence regions for the parameter ϑ. A quick introduction into that area can again be found in Kremer (2005a) (see part 4.4 there). There one derives confidence regions from the Bayes tests. Denote by C(x) a confidence region for the parameter ϑ based on (the sample) x = (x_1, ..., x_n). According to formula (3.7) in Kremer (2005a) it makes most sense to take

   C(x) := {ϑ ∈ Θ : φ_ϑ(x) = 0},

where φ_ϑ is the test of Theorem 2 with the choice ϑ instead of ϑ_0 (and with c := c_2/(c_1 + c_2)). Obviously:

   C(x) = {ϑ : b + Σ_{i=1}^n x_i ≥ k(a, α, ϑ)},

what can be rewritten as:

   C(x) = {ϑ : F_ϑ(b + Σ_{i=1}^n x_i) ≥ c}

(compare the proof of Theorem 2).
For practical applications one needs an adequate value for c. Certainly one wants the confidence region to hold a certain confidence level, say (1 − δ) with a fixed, chosen δ ∈ (0, 1) (small, e.g. 0.05). Consequently one has the condition for choosing c:

   P^{θ | X = x}(C(x)) ≥ 1 − δ,   (6)

where one can replace "≥" by "=". According to Remark 3, P^{θ | X = x} is the Erlang(a + n·α, b + Σ_{i=1}^n x_i)-distribution. But first rewrite C(x) into a nicer form.
Take the notation:

   G_{b*}(ϑ) := F_ϑ(b*),   b* := b + Σ_{i=1}^n x_i.

Since G_{b*}(·) is also strictly increasing, its inverse G_{b*}^{−1} exists, what gives:

   C(x) = {ϑ : ϑ ≥ G_{b*}^{−1}(c)}.

The condition (6) (with "≥" replaced by "=") means as a consequence that G_{b*}^{−1}(c) must be the δ-fractile u_δ(x) of P^{θ | X = x}. Altogether one has as Bayesian confidence region for ϑ:

   C(x) = [u_δ(x), ∞),   (7)

where u_δ(x) is the δ-fractile of the Erlang(a + n·α, b + Σ_{i=1}^n x_i)-distribution. C(x) is an HPD δ-credible region in the sense of Robert (1994) (see Definition 5.7 there). But note that the above results (especially (7)) are not given by Robert (1994) (and others).

Parameter-estimation
For the application of the results of sections 4 and 5 one needs to know the parameter b of the a-priori distribution (remember that a ∈ N, α ∈ R were assumed to be given (and known)).
Since b is not given, one needs an estimator for b. For deriving such an estimator suppose that one has k replications of X, say:

   X_j = (X_{j1}, ..., X_{jn}),   j = 1, ..., k.
Obviously one has the context of sections 6.2 and 6.3 in Kremer (2005a).
In addition assume that θ_j is distributed like the θ in (C) (for all j = 1, ..., k) and that P^{X_{ji} | θ_j = ϑ_j} is the P_ϑ^{X_i} of (D) (for all j = 1, ..., k and i = 1, ..., n). According to section 6.2 in Kremer (2005a) one gets the so-called moment estimator for b as the solution b̂ of

   X̄ = E_b( E(X_{ji} | θ_j) )

with:

   X̄ := (1/(k·n)) · Σ_{j=1}^k Σ_{i=1}^n X_{ji},

and where the outer expectation E_b(·) stands for the integration over θ_j = ϑ_j according to the Erlang(a, b)-law. According to (4) one has E(X_{ji} | θ_j) = α/θ_j, and since

   E_b(α/θ_j) = α · b/(a − 1)   (for a > 1),

one has as moment estimator b̂_ME for b simply:

   b̂_ME = (a − 1) · X̄ / α.

Finally one would also like to know how to calculate the maximum-likelihood estimator for b in the above Bayes context. According to section 6.3 the b̂ is a solution of the equation

   (d/db) Σ_{j=1}^k ln( ∫ Π_{i=1}^n f_ϑ^{X_i}(X_{ji}) dP_b(ϑ) ) = 0,   (9)

where P_b is the a-priori distribution with parameter b (a is known!).
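The moment estimator is immediate to implement. This is a sketch assuming, under the model, that E(X_{ji}) = α·b/(a − 1) for a > 1, so that matching the overall sample mean gives b̂_ME = (a − 1)·X̄/α; the function name is ours:

```python
# Moment estimator for b: match the overall sample mean X-bar to its
# model expectation alpha * b / (a - 1), valid for a > 1.
def moment_estimator_b(a, alpha, samples):
    # samples: k replication vectors (x_j1, ..., x_jn), j = 1, ..., k.
    flat = [x for xs in samples for x in xs]
    xbar = sum(flat) / len(flat)
    return (a - 1) * xbar / alpha

# Example with k = 2 replications of n = 2 observations each.
b_me = moment_estimator_b(3, 2.0, [[1.0, 3.0], [2.0, 2.0]])
```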
Inserting all densities (according to (C) and (D)), one gets after routine calculations, up to constants not depending on b, the marginal log-likelihood

   Σ_{j=1}^k [ a·ln(b) − (a + n·α)·ln(b + S_j) ],   S_j := Σ_{i=1}^n X_{ji}.

As a consequence, (9) gives as equation for b:

   k·a/b = (a + n·α) · Σ_{j=1}^k 1/(b + S_j).
With further modifications one arrives at the final result that the maximum-likelihood estimator b̂_ML of b is given as

   b̂_ML = B,

where B is (the) solution of the equation

   a/B = ((a + n·α)/k) · Σ_{j=1}^k 1/(B + S_j)   (10)

with the S_j according to:

   S_j := Σ_{i=1}^n X_{ji},   j = 1, ..., k.

Clearly the solution B of (10) has to be calculated in practical applications with an adequate method of numerical mathematics.
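As an adequate numerical method, a simple bisection suffices. The sketch below assumes (our derivation, not necessarily the paper's exact form) the marginal log-likelihood Σ_j [a·ln b − (a + nα)·ln(b + S_j)]; since b times its derivative is strictly decreasing in b, the score changes sign exactly once and bisection finds the root. The function name is ours:

```python
# Numerical ML estimator for b, assuming the marginal log-likelihood
#   l(b) = sum_j [ a*log(b) - (a + n*alpha)*log(b + S_j) ] + const,
# with S_j = sum_i x_ji.  b * l'(b) is strictly decreasing, so l'(b)
# has exactly one sign change and bisection applies.
def ml_estimator_b(a, alpha, samples, tol=1e-10):
    k = len(samples)
    n = len(samples[0])
    S = [sum(xs) for xs in samples]
    c = a + n * alpha

    def score(b):
        # Derivative of the marginal log-likelihood in b.
        return k * a / b - c * sum(1.0 / (b + s) for s in S)

    lo, hi = 1e-12, 1.0
    while score(hi) > 0.0:     # bracket the root from above
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For k = n = 1 the score equation can be solved by hand (a·S = α·B), which gives a quick sanity check of the routine.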
Certainly the practitioner might prefer b̂_ME to b̂_ML. But note that, according to certain general theoretical investigations of Asymptotic Statistics, the more unhandy b̂_ML also has its merits.

Final remarks
Note that the author's roots go back to non-parametric statistics. He did elegant research on Bahadur efficiencies of rank tests (see e.g. Kremer (1979), (1981)). At the beginning of the 1980s he moved more towards mathematical risk theory. In that field he contributed a lot of new research on Bayes techniques, e.g. in premium rating (see e.g. Kremer (1982), (2005b)).