Bayesian Simultaneous Intervals for Small Areas: an Application to Variation in Maps

Bayesian inference about small areas is of considerable current interest, and simultaneous intervals for the parameters for the areas are needed because these parameters are correlated. This is not usually pursued because with many areas the problem becomes difficult. We describe a method for finding simultaneous credible intervals for a relatively large number of parameters, each corresponding to a single area. Our method is model based, it uses a hierarchical Bayesian model, and it starts with either the 100(1 − α)% (e.g., α = .05 for 95%) credible interval or highest posterior density (HPD) interval for each area. As in the construction of the HPD interval, our method is the result of the solution of two simultaneous equations, an equation that accounts for the probability content, 100(1 − α)% of all the intervals combined, and an equation that contains an optimality condition like the " equal ordinates " condition in the HPD interval. We compare our method with one based on a nonparametric method, which as expected under a parametric model, does not perform as well as ours, but is a good competitor. We illustrate our method and compare it with the nonparametric method using an example on disease mapping which utilizes a standard Poisson regression model. 1.Introduction Simultaneous inference on parameters is an important problem; see Miller (1981). In Bayesian analysis the parameters are generally correlated, and so simultaneous interval estimation is of interest. For example, in Bayesian small area estimation in which a parameter indexes each area, the parameters are usually correlated, and simultaneous intervals should be reported. This is not usually done because the problem is generally difficult. We propose a method to construct 100(1 − α)% (e.g., α = .05 for 95%) simultaneous credible intervals for many small areas, and we apply our method to study variation in a problem on disease mapping. This is not a new problem. brought the general principles of multiple comparisons into their current structure; see Miller (1981, p. 2). The basic technique of multiple comparisons can be separated into two groups: those which can provide confidence intervals (confidence regions), and those which provide tests of hypotheses. Our study contributes to the first of these groups. Simultaneous intervals based on standard normal theory form an optimal ellipsoid (Box & Tiao, 1973). Why do we need simultaneous intervals? Consider two parameters μ 1 and μ 2. Let a 95% credible interval (CI) …


1.Introduction
Simultaneous inference on parameters is an important problem; see Miller (1981).In Bayesian analysis the parameters are generally correlated, and so simultaneous interval estimation is of interest.For example, in Bayesian small area estimation in which a parameter indexes each area, the parameters are usually correlated, and simultaneous intervals should be reported.This is not usually done because the problem is generally difficult.We propose a method to construct 100(1 − α)% (e.g., α = .05for 95%) simultaneous credible intervals for many small areas, and we apply our method to study variation in a problem on disease mapping.This is not a new problem.By 1955 three principal investigators (Duncan, 1952;Scheffé, 1953;Tukey, 1953) brought the general principles of multiple comparisons into their current structure; see Miller (1981, p. 2).The basic technique of multiple comparisons can be separated into two groups: those which can provide confidence intervals (confidence regions), and those which provide tests of hypotheses.Our study contributes to the first of these groups.Simultaneous intervals based on standard normal theory form an optimal ellipsoid (Box & Tiao, 1973).
Why do we need simultaneous intervals?Consider two parameters μ 1 and μ 2 .Let a 95% credible interval (CI) for μ 1 be (a 1 , b 1 ) and a 95% CI for μ 2 be (a 2 , b 2 ).Then the intersection (a 1 , b 1 ) ∩ (a 2 , b 2 ) does not form a set giving a 95% credible interval; in fact, it is at most 95%.Thus, we need to lengthen these individual intervals in an optimal manner.For n events A i , i, . . ., n, Bonferroni's inequality, Pr ∩ n i=1 A i ≥ n i=1 Pr(A i ) − (n − 1), allows us to provide a lower bound to the probability of a simultaneous event (the intersection) in terms of the probabilities of the individual events (Miller, 1981, p. 8).For example, for just two 97.5% credible intervals, the simultaneous coverage has a lower bound of .95,and with 798 credible intervals, each with probability content 95%, the simultaneous www.ccsenet.org/ijspInternational Journal of Statistics and Probability Vol. 1, No. 2; 2012 coverage of all intervals has a meaningless lower bound of −38.9! Thus, this method gives a meaningful result when the number of events is small and the probabilities of the individual events are sufficiently large.While the well known methods of Bonferroni, Tukey, Scheffé and others are reasonable for simultaneous intervals of a moderate number of parameters, they become overly conservative for a large number of parameters.Also, these methods are limited by their specific distributional assumptions.
For a unimodal posterior density the 100(1 − α)% HPD interval is obtained by solving two equations.Letting , where f (θ| d ) is a unimodal posterior density with its mode not on the boundary and both tails approaching zero monotonically, then the equations, provide the HPD interval (a, b).Conditions ( 1) and ( 2) can be expressed using a single equation, for example, by solving for a, Nandram (1993) describes a method for constructing simultaneous cuboid intervals for predicting k new observations.There a Bayesian normal-normal model is used to provide intervals based on the multivariate Student's t distribution.These intervals form a simple cuboid which engineers use, instead of the relatively more complex optimal ellipse (Box & Tiao, 1973).The bounds on each interval are obtained by solving a pair of simultaneous equations.The first equation satisfies the simultaneous probability content by forcing the difference of the values of the cumulative density function evaluated at the lesser interval bound from the greater interval bound to be 1 − α.The second equation satisfies optimality criterion of equal ordinates by forcing the difference of the values of the probability density function evaluated at the interval bounds to be zero.Thus the cuboids are optimized by constructing the smallest such k-dimensional cuboid by using HPD intervals in each dimension.However, this method considers only a small number of future observations.Also note that in this method the future observations have the same distribution, and they are correlated.Similarly, Nandram and Choi (2004) construct simultaneous concentration bands for quantile-quantile probability plots, accounting for the correlation of the order statistics and providing exact coverage probability.Comparisons of pointwise and Bonferroni concentration bands are given.Besag et al. (1995) presents a method, henceforth BESAG, to calculate simultaneous credible regions based on order statistics (see Appendix A).The idea is to use samples drawn from the empirical distribution of each parameter of interest.The procedure is analogous to ordering each sample, counting from the minimum and maximum of each ordered sample a fixed number of ranks and use these order statistics to form the simultaneous interval.Because the method is nonparametric, it ignores the properties of the underlying distribution the sample was drawn from, but uses an assumption of symmetry.Therefore, the method conservatively makes the intervals wider than necessary.One would expect poor performance if there is a specific underlying parametric model.
There are a few attempts at simultaneous interval estimation in the context of small area estimation.Lui and Cumberland (1989) use simultaneous interval estimates in small domain estimation under the Bayesian paradigm.They use the Bonferroni method, the multivariate Student's t method and Scheffé's method and make comparisons.Andrews and Birdsall (1988) compare three simultaneous confidence interval procedures: ordinary-χ 2 , full-design, and Bayesian.They found that for their study, the Bayesian procedure has the best properties in terms of correct coverage with small average interval width having small variation over replications.
We use these 100(1 − α)% simultaneous intervals in an application on the variation in maps.Nandram, Sedransk and Pickle (2000) described a method to study variation in maps (see Section 4 of their paper) for the continental U.S. with 798 health service areas (HSAs).They use the 1,000 iterates of the 798 mortality rates from the output of a Markov chain Monte Carlo method to construct 1,000 maps.They study variation by finding the identity of the quantiles of a HSA in the mean map and over each of the 1,000 alternative maps.This method addresses directly the issue of how the apparent map patterns change, but it is difficult to present all the available information from 1,000 maps.Our idea is to construct three maps, the mean map (the one that is usually presented), and the end points of 95% simultaneous intervals (an upper map and a lower map) for all HSAs.We assess variation in the mean map estimator by the degree of similarity among the three maps.
The rest of the paper is as follows.In Section 2 we describe how to construct the 100(1−α)% simultaneous credible intervals.We describe two methods, one uses an equation on the probability content only, and the other has an additional optimality condition (i.e., an "equal ordinates" condition).In Section 3 we describe an application in disease mapping of 798 health service areas (a relatively large number of parameters for simultaneous inference).
In Section 4 we describe a numerical example on chronic obstructive pulmonary disease of the 798 HSAs.We also compare our method with the nonparametric method (BESAG, see Besag et al., 1995); see Appendix A. Section 5 has concluding remarks, and we suggest future research.

Construction of Simultaneous Intervals
We propose to construct 100(1 − α)% simultaneous intervals by "stretching" individual HPD intervals until the desired content is obtained; this includes an optimality criterion.Unlike ellipsoidal regions, because the parameters are not identically distributed, the simultaneous intervals form a hyper-rectangle, not a cuboid.
Our methods require (a) the support of the individual posterior densities to be on the nonnegative part of the real line, (b) the posterior densities not to be flat in any interval, (c) the tails of the posterior densities to asymptote to zero monotonically, (d) the posterior density to be unimodal, and (e) the marginal posterior densities must be similar, especially in the tails.Although we do not consider a posterior density with its mode on the boundary of the parameter space, our method can be adjusted to meet this requirement.We consider a simple example.Let x be data, θ be the parameter of interest, and θ | x ∼ Gamma(δ, τ), where δ and τ are functions of the data x .Then, this posterior density f (θ|x ) is unimodal, provided δ > 1, and the both tails satisfy (a)-(d), and when there are several posterior densities we expect them to be similar to this posterior density.In problems on mortality data analysis in a Bayesian model, the posterior densities of the mortalities typically satisfy our requirements.In our methods for simultaneous intervals, we assume that these conditions are met.
Generalizing the method for individual HPD intervals, we want to solve the system of equations, The combined probability content is ensured to be 100(1 − α)% by (3), and the equal ordinates conditions are ensured by (4).However, because we have nearly twice as many unknown quantities as the number of equations, a unique solution does not exist.Even if there were a unique solution, optimizing over such a large set of parameters (e.g., 798 in the disease mapping problem) is undoubtedly computationally demanding.
First, we relax the condition in (4) to obtain our first method, Method I. Here, our simplification is to provide one parameter to operate on the individual HPD intervals.Let 0 < γ ≤ 1 be a "stretching" factor on the HPD interval (a i , b i ), giving the stretched interval higher content than the initial interval because In this way we can begin with the 100(1 − α)% credible or HPD intervals (a i , b i ) and stretch them until the desired content is obtained.Thus, our objective is to solve the equation This can be obtained by determining the value for γ such that min γ Δ(γ) is zero, where and using the Nelder-Mead algorithm (Nelder & Mead, 1965) to force Δ(γ) in ( 5) down to zero.Clearly, the minimum of this function over γ is zero.We will call the method that starts with the individual credible intervals Method I-CRE, and the method that starts with the individual HPD intervals Method I-HPD.
The following argument shows that there is a unique solution to (5).Let − α be the probability of the intersection.Observe that as γ decreases to 0, each A i (γ) is a sequence of increasing sets in γ so that ∩ i=1 A i (γ) is a sequence of increasing sets in γ.So, letting λ = (λ 1 , . . ., λ ) , by the continuity theorem, That is, as γ goes to 0, the sequence of increasing sets ∩ i=1 A i (γ) encompasses the entire space + , the positive -dimensional Euclidean space.Also, at γ = 1, Pr In Method II we simplify the problem in equations ( 3) and ( 4) to provide two parameters to operate on the individual HPD intervals.Let 0 < γ 1 ≤ 1 and 0 < γ 2 ≤ 1 be "stretching" factors on the HPD intervals (a i , b i ). Then, In this way we can begin with the HPD intervals (or credible intervals) (a i , b i ), and stretch them until the desired content is obtained together with an ordinate optimality criterion.Thus, our objective is to solve the set of two equations Note that we have two parameters and at least two equations, depending on how we wish to formulate our ordinate optimization criterion in equation ( 7).
The conditions in (7) are very tight because the distributions of λ i | d can be very different, although we expect them not to be different.If given d the λ i are exchangeable, the equations in ( 6) and ( 7) reduce to This is similar to the situation for prediction in Nandram (1993), and it is easy to show that there is a unique solution (γ 1 , γ 2 ) to this system of equations.However, in this case the assumption of exchangeability is unreasonable.Thus, there may be no (γ 1 , γ 2 ) which satisfy (7).Therefore, although there are several values of (γ 1 , γ 2 ) which satisfy (6), there may be no simultaneous solution to ( 6) and (7).
www.ccsenet.org/ijspInternational Journal of Statistics and Probability Vol. 1, No. 2; 2012 Thus, because we need the condition in (6), we will relax the condition in (7) to require where the e i are small perturbations.However, we replace the conditions in (9) by Furthermore, for a reasonable small value of ē we have In many applications we may have ē ≈ 0, where given d , the λ i are nearly exchangeable.In fact, in our applications the λ i are nearly exchangeable.Thus, our problem reduces to finding the solution ( Thus, we need to obtain the min γ 1 ,γ 2 Δ(γ 1 , γ 2 ), where Under our general conditions, min γ 1 ,γ 2 Δ(γ 1 , γ 2 ) ≈ 0. Again, computation can be done by using the Nelder-Mead algorithm (Nelder & Mead, 1965) to force Δ(γ 1 , γ 2 ) in ( 12) down to zero.Also, we will call the method that starts with the individual credible intervals Method II-CRE, and the method that starts with the individual HPD intervals Method II-HPD.

An Application on Disease Mapping
In this section we describe the disease mapping application in which there are 798 health service areas.We need the mean map, a lower bound map and an upper bound map so that the overall coverage is 95%.We use a hierarchical Bayesian model, a Poisson regression model, to study the mortalities over the 798 health service areas.
We briefly describe the model used by Nandram, Liu and Choi (2005), introduced by Christiansen and Morris (1997).Let n i denote the population size, d i denote the number of deaths and λ i denote the mortality rate for the i th HSA, i = 1, . . ., , where = 798.Then, the hierarchical Bayesian (Poisson regression) model is where x i , a p-dimensional vector, are covariates with x i1 ≡ 1.Finally, the hyper-parameters α and β have proper prior densities where a 0 , μ β and Λ β are to be specified.We have used the same noninformative specification of μ β and Λ β as in Nandram, Liu, and Choi (2005) and we have taken a 0 = 1; see Albert (1988).
In ( 16), conditional on α, β , and d , the λ i are independent gamma random variables with λ i ≥ 0, and the corresponding posterior distribution function is The conditional posterior density in (17) permits us to obtain Rao-Blackwellized estimators of the λ i , thereby facilitating the construction of the maps.Also note that g i (λ i |α, β , d ) depend on i (i.e., the λ i are not conditionally identically distributed as in Nandram (1993)).
Thus, we construct the posterior mean map using Rao-Blackwellized estimators for the λ i .Letting r i = d i /n i , i = 1, . . ., , denote the observed mortality rate and letting x i β .This is a weighted average of the observed mortality rate r i and the prior mortality rate e x i β .It follows that the posterior mean (unconditional) of and the Rao-Blackwellized estimator of λ i is , where and h) , h = 1, . . ., M, are the M iterates obtained from the Metropolis-Hastings sampler.A map of these E(λ i |d ) is an estimator of the mean map.
The individual credible intervals or HPD intervals are easy to obtain.To construct the HPD intervals (a i , b i ), for the i th HSA, we minimize the function Δ(a i , b i ) over (a i , b i ), where and g i (•) and G i (•) are given by ( 17).These individual intervals are to be used to construct the simultaneous intervals.To construct the credible intervals one does not need any optimization.For each (α (h) , β (h) ), we generate one iterate λ (h) i , i = 1, . . ., , h = 1, . . ., M = 1000 from (17).Then, the 95% credible interval for the i th area is (λ (25)  i , λ (976) i ), i = 1, . . ., .Thus, if the two methods lead to the same 95% simultaneous intervals, one can start with the credible intervals, thereby avoiding some computational effort.
We obtain our simple Method I for simultaneous intervals by finding the value of γ minimizing Δ(γ) over parameter γ, where We obtain the Method II with the optimality condition by minimizing the function Δ(γ 1 , γ 2 ) over (γ 1 , γ 2 ), where Instead of (20) we have used Note that by the triangle inequality ( 21) is a stronger condition.But, our results show very minor differences.
The minimizations in ( 18), ( 19) and ( 20) or ( 21) are obtained using the Nelder-Mead algorithm (Nelder & Mead, 1965).In Section 4 we apply these methods to study variation in maps using data on chronic obstructive pulmonary diseases.

A Numerical Example
We use data in Nandram, Liu and Choi (2005) for white males with chronic obstructive pulmonary diseases (COPD) in age classes 8, 9, and 10 (i.e., white males older than 65 years).There are 798 HSAs over the continental U.S., and we have the data on the population sizes and the number of deaths in these age classes combined.So our 95% simultaneous credible intervals consist of 798 intervals for the mortality rates, which form a hyper-rectangle.Four potential risk factors were selected in Nandram et al. (2000) for COPD, and used by Nandram, Liu and Choi (2005).These risk factors, used as covariates, help to explain the spatial patterns of COPD and its constituent diseases (asthma, chronic bronchitis, and emphysema).Here x 0 ≡ 1 is used for an intercept and the covariates are: x 1 is the white male lung cancer rate per 1,000 population; x 2 is the square root of (population density/10 4 ); x 3 is the square root of (elevation/10 4 ); and x 4 is (annual rainfall/100).
Using the model in Section 3, Nandram, Liu and Choi (2005) showed that the 95% credible intervals for each of the regression coefficients do not contain zero (i.e., all the covariates are significant).Nandram, Liu and Choi (2005) showed that inference about the mortality rates is not much sensitive to the specifications of a 0 , μ β and Λ β .Using a cross-validation and a deviance measure, they also showed that the Poisson regression model fits reasonably well.Similar diagnostics are discussed in Nandram, Sedransk andPickle (1999, 2000).Also, the 95% credible interval for α when all 798 HSAs are used is (19.5, 36.8);showing that the conditional posterior distributions of the λ i are all unimodal.
Again, we fit the model described in Section 3 using the Metropolis-Hastings algorithm exactly as described by Nandram, Liu and Choi (2005) to obtain M iterates of (α, β ) from the posterior density p(α, β |d ).In most situations we have taken M = 1000 to get the sample Ω (h) = (α (h) , β (h) ), h = 1, . . ., M. From these Ω (h) we have obtained the posterior mean map and the 95% simultaneous credible intervals for the mortality rates λ i , i = 1, . . ., 798, as described in Section 3.All maps are colored with the same five-color monochromatic color scheme with light representing a low rate and dark representing a high rate.All intervals have 95% coverage.In the three maps made up by the simultaneous intervals, the mean map is always presented between the lower and upper maps as a basis of reference, and to study variation.

Comparisons
To compare our methods, we have studied the simultaneous probability contents of all methods, the values of γ in Method I and the values of (γ 1 , γ 2 ) in Method II, and the maps based on the 95% simultaneous credible intervals.We note that the probability content is calculated under the Poisson regression model in Section 3 as follows.Let (a i , b i ), i = 1, . . ., , be a set of intervals for the mortalities of the = 798 HSAs.Then, the combined probability content of the hyper-rectangle formed by these intervals is where G i (•|d i , Ω (h) ) is given near (17) and p(α, β |d ) is the posterior density of (α, β ) under the Poisson regression model.Thus, a Monte Carlo consistent estimator of C is For all our methods M = 1000 suffices.
The simultaneous probability content by region for the BESAG method (Besag et al., 1995) using 10,000, 15,000, 20,000, and 25,000 iterates is given in Table 1.To highlight the behavior of BESAG method, we have separated out the U.S. into its 12 regions (Note that the 95% credible intervals for α by region are all to the right of 15.3).Under the Poisson regression model we observe that the contents of the intervals do not converge to the nominal value of 95% as the number of iterates increases from 10,000 to 25,000.For some of the regions BESAG method converges to the wrong value, and for others perhaps 25,000 iterates are too small.For example, for W. N. Central-North and Mountain-North the probability contents are 93.3% and 93.4% respectively, lower than the nominal value of 95%.It appears that BESAG method is not appropriate for very high dimensions.Of course, we are operating in a parametric situation, not favorable to BESAG method.Also, convergence for extreme order statistics or quantiles is generally slow; see Serfling (1980, Ch. 2) for a discussion of the asymptotic behavior of order statistics and quantiles.
To assess BESAG method further we have performed a small simulation study.We have drawn samples from the independent conditional posterior densities We set α and β at their posterior means, obtained when the full Poisson regression model in Section 3 is fit to all the COPD data.We use the (n i , d i ), i = 1, . . ., = 798, as in the COPD data.The importance of this exercise is that we know precisely that the data come from ( 22), so that this simple exercise shows the defect of BESAG method.Letting λ (h) i , h = 1, . . ., M, i = 1, . . ., , denote the samples obtained, we use BESAG method to compute the 95% simultaneous credible intervals for λ 1 , . . ., λ , denoted by (a i , b i ), i = 1, . . ., .Then, the simultaneous probability content of these intervals is where G i (•| d i , α, β ) are the incomplete gamma functions obtained from ( 22).We apply this simulation to the 798 HSAs and each region separately.For each design point we repeat the experiment 100 times, and Figure 1, in which the means and standard deviations (reasonably small) are reported, shows very interesting results, much in concordance with Table 1.We obtain the simultaneous coverage for M = 1000, 5000, 10000, 15000, 20000, and 25000 for all 798 HSAs and by region.Generally, BESAG method has very poor simultaneous coverage for small sample sizes (e.g., M = 1000).There is convergence for small regions with small values of HSAs by about M = 10000, but for larger regions convergence has not been realized by M = 25000 (e.g., regions 4, 5, 6, 8).Also, for all HSAs the combined coverage at M = 1000 is .217,and it increases to .941 at M = 25000.Table 1.An assessment of the nonparametric method (Besag et al., 1995)  The probability contents for various methods are computed under the Poisson regression model.To assess convergence, in a Monte Carlo integration procedure we have used sample sizes of 10,000, 15,000, 20,000 and 25,000 from the importance function.
The simultaneous probability content of five types of intervals is given in Table 2.The intervals are the individual equal-tailed credible interval (CI), individual highest posterior density interval (HPD), and Method I-HPD simultaneous interval each using 10,000 iterates, and the nonparametric interval described by Besag et al. (1995) from Appendix A using 10,000 iterates (BESAG10) and 25,000 iterates (BESAG25).Obvious details to note are the approximately zero simultaneous contents for the CI and HPD intervals for all regions, and the less than 0.15 simultaneous content for each region.Method I-HPD is the only method that consistently obtains the 95% simultaneous coverage for all cases.Our other three methods also have the nominal coverage value of 95%.BESAG10 underestimates and BESAG25 overestimates for all regions showing the importance of a large number of iterates; 10,000 was too few for the correct content and 25,000 shows the conservative nature of the method.For most individual regions, BESAG25 is comparable to Method I-HPD.
The values of γ that optimize (5) for Method I and the (γ 1 , γ 2 ) pair that optimize (12) for Method II are given in Table 3.As expected, the values for γ and (γ 1 , γ 2 ) are smaller when more HSAs are considered simultaneously.To compare the sensitivity of the values of γ and (γ 1 , γ 2 ) on the condition of starting with individual HPD intervals, simultaneous intervals based on the credible intervals were obtained.There are differences between values of γ and (γ 1 , γ 2 ) when starting with credible intervals versus using HPD intervals.The difference is not large for Method I where no ordinate optimality criterion is specified.However, there is a small difference for Method II which compensates for asymmetric densities to obtain equal ordinates when starting with the credible intervals.This small difference is attributed to the rough symmetry of our individual distributions.If the individual distributions are highly skewed, this difference will be even greater.
Figure 1.A simulation study to assess the simultaneous probability content of BESAG method by region.Mean and (sd) of the 100 simultaneous probability contents are reported NOTE: The (n i , d i ), i = 1, . . ., = 798, with α and β specified in the COPD data are used to generate 100 datasets of the 798 mortalities from their conditional posterior gamma densities.BESAG method then computes 95% simultaneous intervals for all HSAs and by regions for samples of sizes 1,000, 5,000, 10,000, 15,000, 20,000, and 25,000.Finally, the simultaneous probability content is computed using the conditional posterior gamma densities.This is done for each of the 100 datasets, and in the plot we report the mean±sd of the 100 simultaneous probability contents.

Variation
In practice, the mean map (more appropriately posterior mean map) is presented without any measure of variability.
In fact, the mean map is one of several maps that can be presented; the Markov chain Monte Carlo procedure gives M = 1000 maps because there are M values of the vectors λ 1 , . . ., λ M .A natural summary is the posterior mean map estimator, a realization we call the mean map.How much confidence can we put on the mean map as an estimator of the true map?One way to answer this question is to use three maps; 1000 maps are too many.There are two possibilities.First, one can obtain the 95% HPD intervals for each HSA, and use the end points of these intervals to construct two maps, the lower map and an upper map.Then the degree of similarity among the lower map, mean map and upper map is a measure of the variation of the mean map estimator.This method has the problem that the overall coverage is much smaller than 95% (close to 0%).Thus, the second possibility is to use the 95% simultaneous credible intervals as a measure of the variation.
We have drawn the maps for 95% simultaneous credible intervals corresponding to the four methods: Method I-CRE, Method I-HPD, Method II-CRE, and Method II-HPD.As these maps are very similar we present the maps corresponding to Method I-HPD in Figure 3.For comparison we also study the map based on the individual HPD intervals in Figure 2. The mean map, the map of the parameter estimates, is the middle map of the three.In practice, the mean map is NOTE: This is a numerical study of the variation in the estimator of the mean map.

Concluding Remarks
We have described four methods to construct the 100(1 − α)% simultaneous credible intervals for a relatively large number of small areas.One type of method arises from a single equation that holds the coverage probability, and the other type of method is the solution of a pair of simultaneous equations, one with the coverage probability and the other an optimality condition.Each of these types either starts with the individual 100(1 − α)% credible intervals or HPD intervals, and uses the Nelder-Mead algorithm to obtain the 100(1 − α)% simultaneous intervals.We prefer the method based on credible intervals with two equations because (a) it uses less computational effort and (b) it incorporates a sort of optimality condition.Both (a) and (b) are desirable properties for 100(1 − α)% simultaneous credible intervals.
Our method performs better than BESAG, the nonparametric method of Besag et al. (1995).We have demonstrated this fact using a small simulation study and an application on disease mapping with a standard Poisson regression model.When simultaneous intervals are required for a large number of areas, the nonparametric method needs large samples to converge.It is not within our interest to investigate the behavior of the nonparametric method for a small number of areas; but we believe that it will perform well.We must admit that the nonparametric method is simple, and in principle needs moderate amount of computation, but our method is worth the extra effort.The application is a nonstandard one because it needs 95% credible intervals for 798 parameters.In this application, our method assesses variation in the posterior mean map by constructing a lower bound map and an upper bound map so that the probability that the mean is between these two is 95%.We believe that this construction is novel.
Our work can find further application in bioinformatics in which simultaneous interval estimation will be useful.Problems in bioinformatics are typically of high dimensions, much higher dimensions than the one we have considered.Thus, standard methods like Bonferroni method are not appropriate, and further research is needed.
We mention two problems similar to the one in this paper.First, one can use spatial models to construct 100(1−α)% simultaneous credible intervals as in our proposed method.For example, as in Besag et al. (1991), a pair-wise difference prior (PDP) can be assumed on the λ i .The additional difficulty is that the λ i are not independent given the other parameters and the data.Second, a problem of much interest in small area estimation is to construct the 100(1 − α)% credible intervals for small area proportions in a three stage hierarchical model, as in the Poisson regression model; see Nandram (1998) and Nandram and Erhardt (2005).Both of these problems are worth future investigations.

A. Nonparametric Simultaneous Credible Intervals
For convenience, we reproduce the nonparametric method of Besag et al. (1995).
Denoting the stored sample by {x (t)  i : i = 1, . . ., ; t = 1, . . ., M}, order {x (t) i : t = 1, . . ., M} separately for each component i, to obtain order statistics x [t]  i and ranks r (t) i , t = 1, . . ., M. For fixed k ∈ {1, . . ., M}, let t * be the smallest integer such that x [M+1−t * ] i ≤ x (t)  i ≤ x [t * ] i , for all i, for at least k values of t; t * is equal to the kth order statistic from the set a (t) = max max i r (t)  i , M + 1 − min i r (t)   i , t = 1, . . ., M, that is, t * = a [k] . Then i : i = 1, . . ., are a set of simultaneous credible regions containing at least 100k/M% of the empirical distribution.

Figure 2 .
Figure 2. Choropleth maps corresponding to the 95% individual HPD intervals for COPD mortalities of white males at least 65 years old; the legend is based on the quintiles of the mean map (middle), and the lower (bottom) and upper (top) maps are the end points of the 798 credible intervals (hyper-rectangle) using the simultaneous probability content by region based on the Poisson regression model using 10,000, 15,000, 20,000, and 25,000 iterates

Table 2
CI indicates the individual 95% credible intervals and the table shows their simultaneous content, HPD indicates the individual 95% HPD intervals and the table shows their simultaneous content, Method I-HPD simultaneous intervals contain 95% probability, BESAG10 and BESAG25 indicate the BESAG method using 10,000 and 25,000 iterates and the table shows their simultaneous content based under the Poisson regression model.

Table 4 .
Number of HSAs in the quintiles of each pair of three maps, Lower to Upper, Mean to Upper, and Lower to Mean, for the 95% individual HPD interval (a-c) and the Method I-HPD corresponding to the 95% simultaneous credible intervals(d-f)