Asymptotically Optimal Regression Prediction Intervals and Prediction Regions for Multivariate Data

This paper presents asymptotically optimal prediction intervals and prediction regions. The prediction intervals are for a future response $Y_f$ given a $p \times 1$ vector $x_f$ of predictors when the regression model has the form $Y_i = m(x_i) + e_i$ where $m$ is a function of $x_i$ and the errors $e_i$ are iid from a continuous unimodal distribution. The prediction intervals have coverage near or higher than the nominal coverage for many techniques even for moderate sample size $n$, say $n > 10$(model degrees of freedom). The prediction regions are for a future vector of measurements $x_f$ from a multivariate distribution. The nonparametric prediction region developed in this paper has correct asymptotic coverage if the data $x_1, ..., x_n$ are iid from a distribution with a nonsingular covariance matrix. For many distributions, this prediction region appears to have good coverage for $n > 20p$, and this region is asymptotically optimal on a large class of elliptically contoured distributions. Hence the prediction intervals and regions perform well for moderate sample sizes as well as asymptotically.


Introduction
This paper presents asymptotically optimal prediction intervals and prediction regions. The prediction regions are for a future vector of measurements $x_f$ from a multivariate distribution, and are asymptotically optimal on a large class of elliptically contoured distributions. Regression is the study of the conditional distribution $Y|x$ of the response $Y$ given the $p \times 1$ vector of predictors $x$. The prediction intervals are for a future response $Y_f$ given a vector $x_f$ of predictors when the regression model has the form

$$Y_i = m(x_i) + e_i \tag{1}$$

for $i = 1, ..., n$ where $m$ is a function of $x_i$ and the errors $e_i$ are iid from a continuous unimodal distribution. Many of the most important regression models have this form, including the multiple linear regression model and many time series, nonlinear, nonparametric and semiparametric models. If $\hat{m}$ is an estimator of $m$, then the $i$th residual is $r_i = Y_i - \hat{m}(x_i)$. Olive (2007) showed how to form asymptotically optimal prediction intervals for model (1), but for many regression models and estimators, large $n$ is needed for the intervals to perform well. Prediction intervals derived for multiple linear regression did perform well. This paper derives asymptotically optimal prediction intervals that perform well for many models for moderate $n$.
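A large sample $100(1-\delta)\%$ prediction interval (PI) has the form $(\hat{L}_n, \hat{U}_n)$ where $P(\hat{L}_n < Y_f < \hat{U}_n) \stackrel{P}{\to} 1-\delta$ as the sample size $n \to \infty$. Following Olive (2007), let $\xi_\delta$ be the $\delta$ percentile of the error $e$, i.e., $P(e \le \xi_\delta) = \delta$. Let $\hat{\xi}_\delta$ be the sample $\delta$ percentile of the residuals. Consider predicting a future observation $Y_f$ given a vector of predictors $x_f$ where $(Y_f, x_f)$ comes from the same population as the past data $(Y_i, x_i)$ for $i = 1, ..., n$. Let $1 - \delta_2 - \delta_1 = 1 - \delta$ with $0 < \delta < 1$ and $\delta_1 < 1 - \delta_2$ where $0 < \delta_i < 1$. Assume that $\hat{m}$ is consistent: $\hat{m}(x) \stackrel{P}{\to} m(x)$ as $n \to \infty$. Then the residuals approximate the errors, and if $\hat{\xi}_\delta \stackrel{P}{\to} \xi_\delta$, $a_n \stackrel{P}{\to} 1$ and $b_n \stackrel{P}{\to} 1$, then

$$(\hat{L}_n, \hat{U}_n) = (\hat{m}(x_f) + a_n \hat{\xi}_{\delta_1},\ \hat{m}(x_f) + b_n \hat{\xi}_{1-\delta_2}) \tag{2}$$

is a large sample $100(1-\delta)\%$ PI for $Y_f$. According to regression folklore, the percentiles of the residuals are consistent estimators, $\hat{\xi}_\delta \stackrel{P}{\to} \xi_\delta$, under "mild" regularity conditions, and this consistency is the basis for using QQ plots. The folklore is true for linear models: sufficient conditions are $\hat{\beta} \stackrel{P}{\to} \beta$ and the $x_i$ are bounded in probability. See Olive and Hawkins (2003), Welsh (1986) and Rousseeuw and Leroy (1987, p. 128).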
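Consider the multiple linear regression model $Y = X\beta + e$ where $Y$ is an $n \times 1$ vector of dependent variables, $X$ is an $n \times p$ matrix of predictors, $\beta$ is a $p \times 1$ vector of unknown coefficients, and $e$ is an $n \times 1$ vector of unknown iid zero mean errors $e_i$ with variance $\sigma^2$. Let the hat matrix $H = X(X^TX)^{-1}X^T$. Let $h_i = h_{ii}$ be the $i$th diagonal element of $H$ for $i = 1, ..., n$. Then $h_i$ is called the $i$th leverage and $h_i = x_i^T(X^TX)^{-1}x_i$, where $x_i^T$ is the $i$th row of $X$. Suppose new data is to be collected with predictor vector $x_f$. Then the leverage of $x_f$ is $h_f = x_f^T(X^TX)^{-1}x_f$. For the multiple linear regression model, let $\hat{\xi}_\delta$ be the sample quantile of the residuals. Following Olive (2007), let

$$a_n = \left(1 + \frac{15}{n}\right)\sqrt{\frac{n}{n-p}}\,\sqrt{1 + h_f}. \tag{3}$$

Then a large sample semiparametric $100(1-\delta)\%$ PI for $Y_f$ is

$$(\hat{Y}_f + a_n \hat{\xi}_{\delta/2},\ \hat{Y}_f + a_n \hat{\xi}_{1-\delta/2}). \tag{4}$$

A PI is asymptotically optimal if it has the shortest asymptotic length that gives the desired asymptotic coverage.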
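The PI (4) is asymptotically optimal on a large class of unimodal continuous symmetric error distributions. For more general distributions, an asymptotically optimal PI can be created by applying the shorth($c$) estimator to the residuals where $c = \lceil n(1-\delta) \rceil$ and $\lceil x \rceil$ is the smallest integer $\ge x$, e.g., $\lceil 7.7 \rceil = 8$. See Grübel (1988). That is, let $r_{(1)}, ..., r_{(n)}$ be the order statistics of the residuals. Compute $r_{(c)} - r_{(1)},\ r_{(c+1)} - r_{(2)},\ ...,\ r_{(n)} - r_{(n-c+1)}$. Let $(r_{(d)}, r_{(d+c-1)}) = (\tilde{\xi}_{\delta_1}, \tilde{\xi}_{1-\delta_2})$ correspond to the interval with the smallest length. Following Olive (2007), a $100(1-\delta)\%$ PI for $Y_f$ is

$$(\hat{Y}_f + a_n \tilde{\xi}_{\delta_1},\ \hat{Y}_f + a_n \tilde{\xi}_{1-\delta_2}) \tag{5}$$

where $a_n$ is given by (3). This prediction interval performs well for moderate $n$ for multiple linear regression and several estimators, including least squares.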
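The shorth(c) computation described above can be sketched in a few lines of Python. This is only an illustration: the function name `shorth` and the toy residuals are hypothetical and are not taken from the rpack collection.

```python
import math

def shorth(values, c):
    """Return the shortest interval (lo, hi) that contains c of the sorted values."""
    r = sorted(values)
    n = len(r)
    if not 1 <= c <= n:
        raise ValueError("c must satisfy 1 <= c <= n")
    # Candidate intervals are (r[d], r[d + c - 1]) for d = 0, ..., n - c
    # (zero-based indexing of the order statistics).
    best = min(range(n - c + 1), key=lambda d: r[d + c - 1] - r[d])
    return r[best], r[best + c - 1]

# Toy residuals (illustrative values only).
residuals = [-3.0, -1.2, -0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 1.0, 1.5, 5.0]
delta = 0.25
c = math.ceil(len(residuals) * (1 - delta))   # c = ceil(n(1 - delta))
lo, hi = shorth(residuals, c)
```

With these twelve residuals the four candidate intervals have lengths 3.6, 2.2, 2.3 and 5.5, so the shorth drops the extreme residuals on both sides rather than cutting equal tail fractions.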
A problem with prediction intervals is choosing $a_n$ and $b_n$ so that the intervals have short length and coverage close to or higher than the nominal coverage for a wide variety of regression models when $n$ is moderate. Section 2.1 shows how to modify (4) and (5) to achieve these goals while Section 2.2 covers prediction regions for a future vector of measurements $x_f$. Examples and simulations are in Section 3.

Method
The idea for finding the asymptotically optimal prediction intervals and regions is simple. Find the target population $100(1-\delta)\%$ covering region. For small $n$, the coverage of the training data will be higher than that for the future case to be predicted. In simulations for a large group of models and distributions, the undercoverage could be as high as $\min(0.05, \delta/2)$. Let

$$q_n = \min(1-\delta+0.05,\ 1-\delta+p/n) \ \text{ for } \delta > 0.1 \quad \text{and} \quad q_n = \min(1-\delta/2,\ 1-\delta+10\delta p/n) \ \text{ otherwise}. \tag{6}$$

If $1-\delta < 0.999$ and $q_n < 1-\delta+0.001$, set $q_n = 1-\delta$. Then use the prediction interval or region that covers $100q_n\%$ of the training data. The coverage of the training data is $100q_n\%$ and converges to $100(1-\delta)\%$ as $n \to \infty$, even if the model assumptions fail to hold.
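A small Python sketch of the training-data coverage $q_n$ may help. The branch for $\delta \le 0.1$ is an assumption here: it is taken as $\min(1-\delta/2,\ 1-\delta+10\delta p/n)$, which reproduces the values $q_n = 0.975$ and $q_n = 0.55$ quoted in this paper for $n \le 20p$. The function name `coverage_target` is hypothetical.

```python
def coverage_target(delta, n, p):
    """Training-data coverage q_n for a nominal 100(1 - delta)% interval or region."""
    if delta > 0.1:
        q = min(1 - delta + 0.05, 1 - delta + p / n)
    else:
        # Assumed form of the second branch (see the lead-in above).
        q = min(1 - delta / 2, 1 - delta + 10 * delta * p / n)
    # For large n the correction is negligible, so use the nominal coverage.
    if 1 - delta < 0.999 and q < 1 - delta + 0.001:
        q = 1 - delta
    return q

q_95_small_n = coverage_target(0.05, n=100, p=5)     # n = 20p, about 0.975
q_50_small_n = coverage_target(0.50, n=100, p=5)     # n = 20p, about 0.55
q_95_large_n = coverage_target(0.05, n=100000, p=5)  # nominal coverage 0.95
```

The correction shrinks like $p/n$, so the inflated coverage matters only when $n$ is small relative to $p$.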

Asymptotically Optimal Prediction Intervals
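The technique used to produce asymptotically optimal PIs that perform well for moderate samples is simple. Find $\hat{Y}_f$ and the residuals from the regression model. Since the leverage of $x_i$ is closely related to the Mahalanobis distance of $x_i$ from the sample mean $\bar{x}$ of the $n$ predictor vectors, leverage and extrapolation are useful for a wide range of regression models. For a wide range of regression models, extrapolation occurs if $h_f > 2p/n$: if $x_f$ is too far from the data $x_1, ..., x_n$, then the model may not hold and prediction can be arbitrarily bad. This result suggests replacing (3) by

$$b_n = \left(1 + \frac{15}{n}\right)\sqrt{\frac{n+2p}{n-p}}. \tag{7}$$

Let $\delta_n = 1 - q_n$ where $q_n$ is given by (6). Then

$$(\hat{Y}_f + b_n \hat{\xi}_{\delta_n/2},\ \hat{Y}_f + b_n \hat{\xi}_{1-\delta_n/2}) \tag{8}$$

is a large sample $100(1-\delta)\%$ PI for $Y_f$ that is similar to (2) and (4). Let $c = \lceil nq_n \rceil$, and among the intervals $(r_{(k)}, r_{(k+c-1)})$ of ordered residuals let $(r_{(d)}, r_{(d+c-1)}) = (\tilde{\xi}_{\delta_1}, \tilde{\xi}_{1-\delta_2})$ correspond to the interval with the smallest length. Then the asymptotically optimal $100(1-\delta)\%$ large sample PI for $Y_f$ is

$$(\hat{Y}_f + b_n \tilde{\xi}_{\delta_1},\ \hat{Y}_f + b_n \tilde{\xi}_{1-\delta_2}) \tag{9}$$

and is similar to (5).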
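The shorth-based interval of this subsection can be sketched as follows. The sketch rests on two assumptions that should be checked against the original equations: the correction factor is taken as $b_n = (1 + 15/n)\sqrt{(n+2p)/(n-p)}$, and the shorth covers $c = \lceil nq_n \rceil$ residuals. All function names are hypothetical.

```python
import math

def shorth(values, c):
    """Shortest interval (lo, hi) containing c of the sorted values."""
    r = sorted(values)
    n = len(r)
    best = min(range(n - c + 1), key=lambda d: r[d + c - 1] - r[d])
    return r[best], r[best + c - 1]

def correction_factor(n, p):
    """Assumed b_n = (1 + 15/n) * sqrt((n + 2p) / (n - p)); tends to 1 as n grows."""
    return (1 + 15 / n) * math.sqrt((n + 2 * p) / (n - p))

def shorth_prediction_interval(y_hat_f, residuals, p, q_n):
    """Interval y_hat_f + b_n * (shorth of the residuals covering ceil(n q_n) cases)."""
    n = len(residuals)
    c = min(n, math.ceil(n * q_n))
    lo, hi = shorth(residuals, c)
    b_n = correction_factor(n, p)
    return y_hat_f + b_n * lo, y_hat_f + b_n * hi

# Toy residuals and a fitted value of 10.0 (illustrative values only).
residuals = [-3.0, -1.2, -0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 1.0, 1.5, 5.0]
pi_lo, pi_hi = shorth_prediction_interval(10.0, residuals, p=2, q_n=0.75)
```

For $n = 12$ and $p = 2$ the factor $b_n$ is about 2.85, which shows how strongly the interval is widened for tiny samples; for $n = 10^6$ the factor is within 0.001 of 1.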
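To see that the PI (9) is asymptotically optimal, assume that the sample percentiles of the residuals converge to the population percentiles of the iid unimodal errors: $\hat{\xi}_\delta \stackrel{P}{\to} \xi_\delta$. Also assume that the population shorth $(\xi_{\delta_1}, \xi_{1-\delta_2})$ is unique and has length $L$. Here $(\tilde{\xi}_{\delta_1}, \tilde{\xi}_{1-\delta_2})$ denotes the shorth of the residuals and $(\hat{\xi}_{\delta_1}, \hat{\xi}_{1-\delta_2})$ the corresponding sample percentiles. Since $b_n \to 1$, $\hat{m}(x_f) \stackrel{P}{\to} m(x_f)$, and $q_n = 1-\delta$ for large enough $n$, it is enough to show that the shorth of the residuals converges to the population shorth of the $e_i$: $(\tilde{\xi}_{\delta_1}, \tilde{\xi}_{1-\delta_2}) \stackrel{P}{\to} (\xi_{\delta_1}, \xi_{1-\delta_2})$. Let $L_n$ be the length of $(\tilde{\xi}_{\delta_1}, \tilde{\xi}_{1-\delta_2})$. Let $0 < \tau < 1$ and $0 < \epsilon < L$ be arbitrary. Assume $n$ is large enough so that $q_n = 1-\delta$. Then $P(L_n > L + \epsilon) \to 0$ since $(\hat{\xi}_{\delta_1}, \hat{\xi}_{1-\delta_2})$ covers $100(1-\delta)\%$ of the data and $L_n = \tilde{\xi}_{1-\delta_2} - \tilde{\xi}_{\delta_1} \le \hat{\xi}_{1-\delta_2} - \hat{\xi}_{\delta_1} \stackrel{P}{\to} L$ as $n \to \infty$, since the sample percentiles are consistent and the shorth is the smallest interval covering $100(1-\delta)\%$ of the data. If $P(L_n < L - \epsilon) > \tau$ eventually, then the shorth is an interval covering $100(1-\delta)\%$ of the cases that is shorter than the population shorth with positive probability $\tau$. Hence at least one of $\hat{\xi}_{1-\delta_2}$ or $\hat{\xi}_{\delta_1}$ would not converge, a contradiction. Since $\epsilon$ and $\tau$ were arbitrary, $L_n \stackrel{P}{\to} L$. If $P(\tilde{\xi}_{\delta_1} < \xi_{\delta_1} - \epsilon) > \tau$ eventually, then, since $L_n \stackrel{P}{\to} L$, the right endpoint satisfies $\tilde{\xi}_{1-\delta_2} < \xi_{1-\delta_2} - \epsilon/2$ with positive probability eventually. But such an interval (of length going to $L$ in probability with left endpoint less than $\xi_{\delta_1} - \epsilon$ and right endpoint less than $\xi_{1-\delta_2} - \epsilon/2$) contains more than $100(1-\delta)\%$ of the cases with probability going to one since the population shorth is the unique shortest interval covering $100(1-\delta)\%$ of the mass. Hence there is an interval covering $100(1-\delta)\%$ of the cases that is shorter than the shorth, with probability going to one, a contradiction. The case $P(\tilde{\xi}_{\delta_1} > \xi_{\delta_1} + \epsilon) > \tau$ can be handled similarly.
Since $\epsilon$ and $\tau$ were arbitrary, $\tilde{\xi}_{\delta_1} \stackrel{P}{\to} \xi_{\delta_1}$. The proof that $\tilde{\xi}_{1-\delta_2} \stackrel{P}{\to} \xi_{1-\delta_2}$ is similar. The above results show that PI (9) and the shorth of the residuals behave well when the sample percentiles are consistent. Even if these assumptions do not hold, the PI covers $100q_n\%$ of the training data, and often the coverage of the future case will be close to $100(1-\delta)\%$ if the future case $Y_f$ is similar to the training data.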
For asymptotic optimality, there cannot be extrapolation. Also, even if the coverage converges to the nominal coverage, the length of the PI need not be asymptotically shortest unless the highest $1-\delta$ density region of the probability density function of the iid errors is an interval. The highest density region is an interval for unimodal distributions, but need not be an interval for multimodal distributions for all $\delta$. Also see Cai, Tian, Solomon and Wei (2008).
Notice that the technique computes a PI for coverage $q_n \ge 1-\delta$ which converges to the nominal coverage $1-\delta$ as $n \to \infty$. Suppose $n \le 20p$. Then the nominal 95% PI uses $q_n = 0.975$ while the nominal 50% PI uses $q_n = 0.55$. Prediction distributions depend both on the error distribution and on the variability of the estimator $\hat{m}$. This variability is typically unknown but converges to 0 as $n \to \infty$. Also, residuals tend to underestimate the errors for small $n$. For small $n$, ignoring estimator variability and using $q_n = 1-\delta$ resulted in undercoverage as high as $\min(0.05, \delta/2)$. Letting the "coverage" $q_n$ decrease to the nominal coverage $1-\delta$ inflates the length of the PI for small $n$, compensating for the unknown variability of $\hat{m}$.
The geometry of the "asymptotically optimal prediction region" is simple. The region is the area between two parallel lines with unit slope. Consider a plot of $m(x_i)$ versus $Y_i$ on the vertical axis. The identity line with zero intercept and unit slope is $E(Y_i) = m(x_i)$. Let $(L_i, U_i)$ be the asymptotically optimal population 95% prediction interval containing $m(x_i)$. For example, if the errors are iid $N(0, \sigma^2)$, then $(L_i, U_i) = (m(x_i) - 1.96\sigma,\ m(x_i) + 1.96\sigma)$. Then the upper line has unit slope and passes through $(m(x_i), U_i)$ while the lower line has unit slope and passes through $(m(x_i), L_i)$.
The geometry of the "prediction region" for PI (9) is a natural sample analog of the population "asymptotically optimal prediction region." A response plot of $\hat{Y}_i = \hat{m}(x_i)$ versus $Y_i$ has identity line $\hat{E}(Y_i) = \hat{m}(x_i)$. The region corresponding to pointwise prediction intervals is between two lines with unit slope passing through the points $(\hat{m}(x_i), \hat{U}_i)$ and $(\hat{m}(x_i), \hat{L}_i)$, respectively, where $(\hat{L}_i, \hat{U}_i)$ is the asymptotically optimal prediction interval (9) for $Y_f$ if $x_f = x_i$. For the multiple linear regression model, expect the points in the response plot to scatter in an evenly populated band for $n > 5p$. Other regression models, such as additive models, may need a much larger sample size $n$. See Section 3.1 for an example and simulations.

Prediction Regions
Asymptotically optimal prediction regions use ideas similar to those in the previous subsection. Some notation is needed. Let the $i$th case $x_i$ be a $p \times 1$ random vector, and suppose the $n$ cases are collected in an $n \times p$ matrix $X$ with rows $x_1^T, ..., x_n^T$. The classical estimator $(\bar{x}, S)$ of multivariate location and dispersion is the sample mean and sample covariance matrix where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \quad \text{and} \quad S = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T. \tag{10}$$

Some important joint distributions for $x$ are completely specified by a $p \times 1$ population location vector $\mu$ and a $p \times p$ symmetric positive definite population dispersion matrix $\Sigma$. An important model is the elliptically contoured $EC_p(\mu, \Sigma, g)$ distribution with probability density function

$$f(z) = k_p |\Sigma|^{-1/2}\, g[(z-\mu)^T \Sigma^{-1} (z-\mu)] \tag{11}$$

where $k_p > 0$ is some constant and $g$ is some known function. The multivariate normal (MVN) $N_p(\mu, \Sigma)$ distribution is a special case.
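Let the $p \times 1$ column vector $T(X)$ be a multivariate location estimator, and let the $p \times p$ symmetric positive definite matrix $C(X)$ be a dispersion estimator. Then the $i$th squared sample Mahalanobis distance is the scalar

$$D_i^2 = D_i^2(T(X), C(X)) = (x_i - T(X))^T C^{-1}(X) (x_i - T(X)) \tag{12}$$

for each observation $x_i$. Notice that the Euclidean distance of $x_i$ from the estimate of center $T(X)$ is $D_i(T(X), I_p)$ where $I_p$ is the $p \times p$ identity matrix. Often the data $X$ will be suppressed. Then the classical Mahalanobis distance uses $(T, C) = (\bar{x}, S)$. Following Johnson (1987, pp. 107-108), the population squared Mahalanobis distance is $U \equiv D^2(\mu, \Sigma) = (x-\mu)^T \Sigma^{-1} (x-\mu)$, and for elliptically contoured distributions, $U$ has probability density function (pdf)

$$h(u) = \frac{\pi^{p/2}}{\Gamma(p/2)}\, k_p\, u^{p/2-1} g(u). \tag{13}$$

The volume of the hyperellipsoid $\{z : (z-\bar{x})^T S^{-1} (z-\bar{x}) \le h^2\}$ is

$$\frac{2\pi^{p/2}}{p\,\Gamma(p/2)}\, h^p \sqrt{\det(S)}; \tag{14}$$

see Johnson and Wichern (1988, pp. 103-104).
Let $D_z$ denote the distance (12) of a point $z$ from $T$. The large sample $100(1-\delta)\%$ prediction region is

$$\{z : (z-T)^T C^{-1} (z-T) \le h^2\} = \{z : D_z \le h\} \tag{15}$$

with $h = D_{(up)}$, where $D_{(up)}$ is the $q_n$th sample quantile of the $D_i$. If $x_1, ..., x_n$ and $x_f$ are iid, then region (15) is asymptotically optimal on a large class of elliptically contoured distributions in that its volume converges in probability to the volume of the minimum volume covering region $\{z : (z-\mu)^T \Sigma^{-1} (z-\mu) \le u_{1-\delta}\}$ where $P(U \le u_{1-\delta}) = 1-\delta$ and $U$ has pdf given by (13). The classical parametric multivariate normal large sample prediction region uses $(T, C) = (\bar{x}, S)$ and $h^2 = \chi^2_{p,1-\delta}$. Notice that for the data $x_1, ..., x_n$, if $C^{-1}$ exists, then $100q_n\%$ of the $n$ cases are in the prediction region (15), and $q_n \to 1-\delta$ even if $(T, C)$ is not a good estimator. Hence the coverage $q_n$ of the data is robust to model assumptions.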
Of course the volume of the prediction region could be large if a poor estimator $(T, C)$ is used or if the $x_i$ do not come from an elliptically contoured distribution. Also notice that $q_n = 1-\delta/2$ or $q_n = 1-\delta+0.05$ for $n \le 20p$ and $q_n \to 1-\delta$ as $n \to \infty$. If $q_n \equiv 1-\delta$, then (15) is a large sample prediction region, but taking $q_n$ given by (6) improves the finite sample performance of the region. Taking $q_n \equiv 1-\delta$ does not take into account variability of $(T, C)$, and for small $n$ the resulting prediction region tended to have undercoverage as high as $\min(0.05, \delta/2)$. Using (6) helped reduce undercoverage for small $n$ due to the unknown variability of $(T, C)$.
The Olive and Hawkins (2010) RMVN estimator $(T_{RMVN}, C_{RMVN})$ is an easily computed $\sqrt{n}$ consistent estimator of $(\mu, c\Sigma)$ under regularity conditions (E1) that include a large class of elliptically contoured distributions, and $c = 1$ for the $N_p(\mu, \Sigma)$ distribution. Also see Zhang, Olive and Ye (2012). The RMVN estimator also gives a useful estimate of $(\mu, \Sigma)$ for $N_p(\mu, \Sigma)$ data even when certain types of outliers are present.
Three new prediction regions will be considered. The nonparametric region uses the classical estimator $(T, C) = (\bar{x}, S)$ and $h = D_{(up)}$. The semiparametric region uses $(T, C) = (T_{RMVN}, C_{RMVN})$ and $h = D_{(up)}$. The parametric MVN region uses $(T, C) = (T_{RMVN}, C_{RMVN})$ and $h^2 = \chi^2_{p,q_n}$ where $P(W \le \chi^2_{p,q_n}) = q_n$ if $W \sim \chi^2_p$. All three regions are asymptotically optimal for $N_p(\mu, \Sigma)$ distributions with nonsingular $\Sigma$. The first two regions are asymptotically optimal for a large class of elliptically contoured distributions. For distributions with nonsingular covariance matrix $c_X \Sigma$, the nonparametric region is a large sample $100(1-\delta)\%$ prediction region, but regions with smaller volume may exist. See Section 3.2 for examples and simulations.
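To make the nonparametric region concrete, here is a minimal Python sketch restricted to $p = 2$ so that the covariance matrix can be inverted by hand. It uses the classical estimator $(\bar{x}, S)$ and takes the cutoff $h$ to be the $\lceil nq_n \rceil$th ordered distance, which is an assumption about how the $q_n$th sample quantile $D_{(up)}$ is computed. The function names and the toy data are hypothetical.

```python
import math

def mean_and_cov_2d(points):
    """Sample mean and sample covariance (divisor n - 1) of bivariate data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_2d(z, center, cov):
    """Mahalanobis distance of z from center, using the explicit 2 x 2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be nonsingular")
    dx, dy = z[0] - center[0], z[1] - center[1]
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

def nonparametric_region(points, q_n):
    """Return (center, cov, h) with h the ceil(n * q_n)-th ordered distance."""
    center, cov = mean_and_cov_2d(points)
    dists = sorted(mahalanobis_2d(z, center, cov) for z in points)
    up = min(len(points), math.ceil(len(points) * q_n))
    return center, cov, dists[up - 1]

def in_region(z, center, cov, h):
    """True if z lies in the hyperellipsoid {z : D_z <= h}."""
    return mahalanobis_2d(z, center, cov) <= h

# Deterministic toy data (illustrative values only).
points = [(i % 7 + 0.1 * i, (i * 3) % 5 - 0.05 * i) for i in range(40)]
center, cov, h = nonparametric_region(points, q_n=0.9)
```

By construction at least $\lceil nq_n \rceil$ of the training cases satisfy `in_region`, which mirrors the remark that the coverage of the training data does not depend on the quality of $(T, C)$. Swapping in a robust estimator for `mean_and_cov_2d` would give the semiparametric analog.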

Regression
Example 1. Chambers and Hastie (1993, pp. 251, 516) examine an environmental study that measured the four variables $Y$ = ozone concentration, $x_1$ = solar radiation, $x_2$ = temperature, and $x_3$ = wind speed for $n = 111$ consecutive days. Figure 1 shows the response plot made in Splus with the pointwise large sample 95% PI bands for the additive model $Y = m(x) + e$ where the additive predictor $m(x) = \alpha + \sum_{j=1}^{3} S_j(x_j)$ for some functions $S_j$ to be estimated. Here $\hat{m}(x)$ = estimated additive predictor (EAP). Note that the plotted points scatter about the identity line in a roughly evenly populated band, and that 3 of the 111 PIs (9) corresponding to the observed data do not contain $Y$. A small simulation study compares the PI lengths and coverages for sample sizes $n$ = 50, 100 and 1000 for PIs (8) and (9). Values for PI (8) were denoted by scov and slen while values for PI (9) were denoted by ocov and olen. The five error distributions in the simulation were 1) $N(0,1)$, 2) $t_3$, 3) exponential(1) $-$ 1, 4) uniform($-1$, 1) and 5) $0.9N(0, 1) + 0.1N(0, 100)$. The value $n = \infty$ gives the asymptotic coverages and lengths and does not depend on the model. So these values are the same for multiple linear and nonlinear regression as well as additive models.
Software for the simulations is described in Section 4. The multiple linear regression model with $E(Y_i) = 1 + x_{i1} + \cdots + x_{i7}$ was used. The vectors $(x_1, ..., x_7)^T$ were iid $N_7(0, I_7)$ where $I_p$ is the $p \times p$ identity matrix. Another regression model was $Y_i = m(x_i) + e_i$ with $m(x_i) = x_{i1} + x_{i1}^2$. This model was fit as an additive model in $x_1$, $x_2$, and $x_3$. The model was also fit with nonlinear regression where the mean function is known up to the six parameters, although then the second order multiple linear regression model is appropriate. For the additive model, the additive predictor $m(x_i) = \alpha + \sum_{j=1}^{3} S_j(x_{ij})$. Both the nonlinear regression and additive model had the same mean function $m(x_i) = x_{i1} + x_{i1}^2$. Thus $\beta = (1, 1, 0, 0, 0, 0)^T$, $\alpha = 0$, $S_1(x_{i1}) = x_{i1} + x_{i1}^2$, $S_2(x_{i2}) = 0$ and $S_3(x_{i3}) = 0$. For these two models, the vectors $(x_1, x_2, x_3)^T$ were iid $N_3(0, I_3)$.
The Olive (2007) PIs (4) and (5) are tailored for multiple linear regression but are liberal (too short) for moderate $n$ for many other techniques. The new PIs (8) and (9) are meant to have coverage near or higher than the nominal coverage for moderate $n$ and for a wide variety of techniques and are longer than PIs (4) and (5). For multiple linear regression, the new PIs (8) and (9) were conservative (too long with roughly 98% coverage for the 95% PI and 70% or 60% coverage for the 50% PI) for $n$ = 50 and 100 compared to (4) and (5) for least squares, least absolute deviations $L_1$ and an M-estimator using the Splus functions l1fit and rreg. See MathSoft (1999, pp. 293-295). The PIs (8) and (9) for nonlinear regression and additive models appear to have coverage near the nominal values in the simulations. For $n$ = 50 and 100, the PIs for nonlinear regression were usually roughly 10% longer than those for additive models. The PIs for the additive model were computed using the R function gam. See Hastie and Tibshirani (1990) and Wood (2006). The PI (8) is not asymptotically optimal with error type 3. It is not known whether $\hat{m}$ is a consistent estimator of $m$, but the prediction intervals appear to have the correct asymptotic coverage and length. Some consistency results for the additive model and models of the form $Y = m(x) + e$ where $m$ is smooth are given in Müller, Schick and Wefelmeyer (2012) and Wang, Liu, Liang, and Carroll (2011).
The simulation used 5000 runs and gave the proportion $\hat{p}$ of runs where $Y_f$ fell within the nominal $100(1-\delta)\%$ PI. The count $m\hat{p}$ has a binomial($m = 5000$, $p = 1-\tau_n$) distribution where $1-\tau_n$ converges to the asymptotic coverage $(1-\tau)$. The standard error for the proportion is $\sqrt{\hat{p}(1-\hat{p})/5000}$ = 0.0031 and 0.0071 for $\hat{p}$ = 0.05 (equivalently 0.95) and 0.5, respectively. Hence an observed coverage $\hat{p} \in (.941, .959)$ for 95% and $\hat{p} \in (.479, .521)$ for 50% PIs suggests that there is no reason to doubt that the PI has the nominal coverage.
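The standard errors and acceptance bands quoted above can be checked with a short Python sketch. The bands appear to correspond to the nominal coverage plus or minus about three standard errors; that multiplier is an inference from the quoted numbers, not something stated in the text, and the function name `coverage_band` is hypothetical.

```python
import math

def coverage_band(nominal, runs, width=3.0):
    """Binomial standard error of a coverage proportion and a +/- width*SE band."""
    se = math.sqrt(nominal * (1 - nominal) / runs)
    return se, (nominal - width * se, nominal + width * se)

se95, band95 = coverage_band(0.95, 5000)   # 95% PIs
se50, band50 = coverage_band(0.50, 5000)   # 50% PIs
print(round(se95, 4), [round(b, 3) for b in band95])
print(round(se50, 4), [round(b, 3) for b in band50])
```

The same function can be used to judge coverages from simulations with fewer runs, where the band is wider.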
Table 1 shows that for $n = 1000$, the coverages and lengths are near the asymptotic $n = \infty$ values. For the 95% PI (9), the coverages were in or near (.94, .96) while the 50% PI (9) was sometimes slightly conservative. The coverage for the 50% PI (8) was near 60% for $n = 50$. PI (9) is recommended since its asymptotic optimality does not depend on the symmetry of the error distribution.

Prediction Regions
Rousseeuw and Van Driessen (1999) introduce the DD plot of the classical Mahalanobis distances MD versus the robust distances RD. Olive (2002) shows that if consistent estimators are used and $n$ is large, then the plotted points will follow the identity line with unit slope and zero intercept if the data distribution is multivariate normal, and the plotted points will follow some other line through the origin if the data distribution is from a large class of elliptically contoured distributions but not multivariate normal.
Example 2. Buxton (1920) gives five measurements on 87 men: height, head length, nasal height, bigonal breadth and cephalic index. The 5 outliers have heights that were recorded to be about 19mm and head lengths recorded as [...] The horizontal line at RD = 3.33 corresponding to the parametric MVN 90% region is obscured by the identity line. This region contains 78 of the cases. Since $n = 87$, the nonparametric and semiparametric regions used the 95th quantile. Since there were 5 outliers, this quantile was a linear combination of the largest clean distance and the smallest outlier distance. The semiparametric 90% region blows up unless the outlier proportion is small.
Figure 3 shows the DD plot and 3 prediction regions after the 5 outliers were removed. The classical and robust distances cluster about the identity line and the three regions are similar, with the parametric MVN region cutoff again at 3.33, slightly below the semiparametric region cutoff of 3.44.
Example 3. Cook and Weisberg (1999, pp. 351, 433, 447) give a data set on 82 mussels sampled off the coast of New Zealand. The variables are $X_1 = \log(S)$, $X_2 = \log(M)$, $X_3 = L$, $X_4 = \log(W)$, and $X_5 = H$ where $S$ is the shell mass, $M$ is the muscle mass in grams, $L$ is the shell length, $W$ is the shell width and $H$ is the height of the shell in mm. Figure 4 shows a DD plot of the data with multivariate prediction regions added. This plot suggests that the data may come from an elliptically contoured distribution that is not multivariate normal. The semiparametric and nonparametric 90% prediction regions consist of the cases below the RD = 5.86 line and to the left of the MD = 4.41 line. These two lines intersect on a line through the origin that is followed by the plotted points. The parametric MVN prediction region is given by the points below the RD = 3.33 line and does not contain enough cases. Points to the left of a vertical line MD = 3.33 would give a modified classical MVN prediction region. Parametric prediction regions for multivariate normal data tend to have severe undercoverage if the data is not multivariate normal. This undercoverage problem becomes worse as $p$ increases, since if the cutoff $h$ is too small, then the volume of the prediction region depends on $h^p$ by (14).
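The effect of the cutoff on the volume can be illustrated with the mussel data cutoffs. The parametric MVN and semiparametric regions use the same estimator $(T, C)$, so by the $h^p$ dependence their volume ratio reduces to the ratio of cutoffs raised to the power $p$. The following Python sketch is only arithmetic on the cutoffs quoted above; the function name `volume_ratio` is hypothetical.

```python
def volume_ratio(h_small, h_large, p):
    """Volume ratio of two hyperellipsoids that share (T, C) but use different cutoffs."""
    return (h_small / h_large) ** p

# Mussel data: p = 5 variables, MVN cutoff RD = 3.33, semiparametric cutoff RD = 5.86.
ratio = volume_ratio(3.33, 5.86, 5)
print(f"{ratio:.3f}")
```

Here the parametric MVN region has only about 6% of the volume of the semiparametric region, which is consistent with its failure to contain enough cases.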
For large $n$, the semiparametric and nonparametric regions are likely to have coverage near 0.90 because the coverage on the training sample is slightly larger than 0.9 and $x_f$ comes from the same distribution as the $x_i$. For [...] $\le p \le 40$, the semiparametric region had coverage near 0.9. The ratio of the volumes

$$\frac{h_i^p \sqrt{\det(C_i)}}{h_2^p \sqrt{\det(C_2)}}$$

was recorded where $i = 1$ was the nonparametric region, $i = 2$ was the semiparametric region, and $i = 3$ was the parametric MVN region. The volume ratio converges in probability to 1 for $N_p(\mu, \Sigma)$ data, and the ratio converges to 1 for $i = 1$ on a large class of elliptically contoured distributions. The parametric MVN region often had coverage much lower than 0.9 with a volume ratio near 0, recorded as 0+. The volume ratio tends to be tiny when the coverage is much less than the nominal value 0.9. For $10p \le n \le 20p$, the nonparametric region often had good coverage and volume ratio near 0.5.
Simulations and Table 2 suggest that for $N_p(\mu, \Sigma)$ data, the coverages (ncov, scov and mcov) for the 3 regions are near 90% for $n = 20p$ and that the volume ratios voln and volm are near 1 for $n = 50p$. With fewer than 5000 runs, this result held for $2 \le p \le 80$. For the non-elliptically contoured LN data, the nonparametric region had voln well under 1, but the volume ratio blew up for $w \sim MVT_p(1)$.

General Comments
There are not many practical competitors for the new prediction intervals and regions. Parametric prediction intervals and regions usually assume normality and tend to have severe undercoverage when the normality assumption does not hold. For confidence intervals and testing, misspecification of normality is sometimes not too important if the estimators are asymptotically normal, but for parametric prediction intervals and regions, correct specification of the parametric model is important. For example, do not use a parametric prediction region based on the multivariate normal distribution if the plotted points in the DD plot fail to cover the identity line.
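Another competitor for regression is bootstrap prediction intervals. These PIs take hundreds of times longer to compute than PI (9), and convergence problems are greatly multiplied for models such as nonlinear regression models. Also bootstrap PIs may not be valid if a fixed number $B$ of bootstrap samples are used. Di Bucchianico, Einmahl and Mushkudiani (2001) use the minimum volume ellipsoid (MVE) estimator to cover $m$ out of $n$ cases to produce MVE tolerance regions, but the technique can only be used on tiny data sets. The location model is a special case of both the regression model (1) and of the multivariate location and dispersion model. Let $a_n = (1 + 15/n)\sqrt{(n+1)/(n-1)}$. Let $c = \lceil n(1-\delta) \rceil$. Let shorth($c$) = $(Y_{(d)}, Y_{(d+c-1)})$. Let MED($n$) be the sample median. If $Y_1, ..., Y_n$ are iid, then the recommended large sample $100(1-\delta)\%$ PI for $Y_f$ is the closed interval $[L_n, U_n] = [(1-a_n)\mathrm{MED}(n) + a_n Y_{(d)},\ (1-a_n)\mathrm{MED}(n) + a_n Y_{(d+c-1)}]$. This PI is (5) using the least absolute deviations estimator, but with a closed interval.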
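For the location model, the closed interval $[(1-a_n)\mathrm{MED}(n) + a_n Y_{(d)},\ (1-a_n)\mathrm{MED}(n) + a_n Y_{(d+c-1)}]$ can be sketched in Python as follows. The sketch assumes $a_n = (1 + 15/n)\sqrt{(n+1)/(n-1)}$ and $c = \lceil n(1-\delta) \rceil$, which is how the garbled expressions are read here. The function name `location_pi` is hypothetical and is not the lpisim function from rpack.

```python
import math
import statistics

def location_pi(y, delta):
    """Shorth-based large sample PI for a future Y_f from iid data y."""
    ys = sorted(y)
    n = len(ys)
    c = math.ceil(n * (1 - delta))
    # Zero-based index d of the shortest interval (ys[d], ys[d + c - 1]).
    d = min(range(n - c + 1), key=lambda k: ys[k + c - 1] - ys[k])
    a_n = (1 + 15 / n) * math.sqrt((n + 1) / (n - 1))   # assumed form of a_n
    med = statistics.median(ys)
    lower = (1 - a_n) * med + a_n * ys[d]
    upper = (1 - a_n) * med + a_n * ys[d + c - 1]
    return lower, upper

# Toy data (illustrative values only).
y = [7.0, 8.8, 9.2, 9.5, 9.8, 10.0, 10.1, 10.3, 10.6, 11.0, 11.5, 15.0]
lower, upper = location_pi(y, delta=0.25)
```

Since $(1-a_n)\mathrm{MED}(n) + a_n Y_{(d)} = \mathrm{MED}(n) + a_n (Y_{(d)} - \mathrm{MED}(n))$, the interval is the shorth stretched away from the median by the factor $a_n > 1$.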
Simulations were done in Splus and R. See R Development Core Team (2008). The Buxton data and programs in the collection of functions rpack.txt are available at (www.math.siu.edu/olive/ol-bookp.htm). For multiple linear regression, the function pisim simulates PIs (4) and (5) while the Splus function pisim4 simulates PIs (8) and (9) using OLS, $L_1$ and M-estimators. The function pisim3 was used to create Table 1 while pisim5 uses nls to simulate PIs for nonlinear regression. Care is needed when using pisim5 since for some versions of R/Splus, the nls function will fail to converge for some runs. Using nruns = 500 is less likely to cause an error than nruns = 5000. The function predsim was used for Table 2. The function ddplot4 was used to produce Figures 2, 3 and 4. The function lpisim simulates the PI for the location model while covrmvn computes the RMVN estimator.

Conclusions
Parametric prediction intervals and regions are notorious for severe undercoverage. The new techniques are designed to have good coverage at the training data, even if the model assumptions fail to hold. The Olive (2007) PIs (4) and (5) are tailored for multiple linear regression but are too short for many other techniques for moderate $n$. PIs (8) and (9) are generally longer than PIs (4) and (5) and have coverage near or higher than the nominal value for many techniques even for moderate $n$, say $n > 10$(model degrees of freedom). PIs (8) and (9) are quite conservative for multiple linear regression for moderate $n$. These PIs are useful since the error distribution does not need to be known.
The new nonparametric and semiparametric prediction regions appear to have good coverage for $n > 20p$ and may be the first easily computed prediction regions that are effective when the underlying multivariate distribution is unknown.
For the prediction regions, use the DD plot to check the multivariate normality assumption and to check for the presence of outliers. If $n > 20p$ and the plotted points cluster tightly about a line through the origin, then the nonparametric and semiparametric prediction regions may have good coverage. For regression with additive errors, if $n$ is large and the plotted points cluster about the identity line in the response plot, then the new prediction intervals may have good coverage.

Figure 1: Pointwise Prediction Interval Bands for Ozone Data

Figure 2: Prediction Regions for Buxton Data

Figure 3: Prediction Regions for Buxton Data without Outliers

Figure 4:

Table 1: PIs for Additive Models

Table 2: Coverages for 90% Prediction Regions