Analyzing and Forecasting Output Gap and Inflation Using the Bayesian Vector Autoregression (BVAR) Method: A Case of Pakistan

We attempt to forecast inflation and the output gap of Pakistan using Bayesian VARs, implementing three different priors for this purpose. The analysis uses monetary aggregates and credit macro variables to forecast the output gap and a CPI-index-based measure of inflation. The output gap used in our analysis is estimated in a state-space framework using the Kalman filter. The literature suggests that Bayesian shrinkage is an appropriate tool for forecasting with a large number of macroeconomic variables. In addition, appropriate prior selection is fundamental to robust forecasting in Bayesian VARs; in this backdrop, the three types of priors implemented in our analysis are (1) Minnesota priors, (2) independent Normal-Wishart priors and (3) independent Minnesota-Wishart priors. Estimation and forecasting are conducted in conformity with Koop and Korobilis (2009). Diagnostics of the Bayesian VAR models and the robustness of the forecast estimates show that Bayesian VARs provide robust forecasts and admit suitable structural interpretation. This conclusion is especially relevant considering that Bayesian methods provide an inherent solution to the problems of multicollinearity and overparameterization.


Introduction
Linear stochastic difference equations, commonly known as Vector Autoregressions (VARs), are standard tools at the disposal of the macroeconometric researcher for forecasting and analysis. VARs provide both flexibility of model elicitation and the ability to capture intricate data linkages and relationships. This ease of model representation, however, brings the probable presence of model misspecification, multicollinearity, overparameterization and loss of degrees of freedom. To circumvent these issues, VARs are generally implemented with a parsimonious set of variables.
Recently, the use of Bayesian methods in time series has heralded a new era of robust estimation, structural analysis and modeling of complex inter-linkages. Litterman (1979) presented a new and revolutionary method of VAR specification: he proposed biased estimation of VARs in order to avoid the problems of large sampling errors and multicollinearity (see also Doan, Litterman, & Sims, 1984; Litterman, 1986a). He introduced a method similar to ridge regression (Hoerl & Kennard, 1970) and Stein-rule estimators (Stein, 1974), whereby prior information is incorporated in the model estimation. He argues that this may be interpreted as a form of Bayesian estimation, which entails incorporating a prior distribution for the parameters, for which biased estimators (like the Minnesota priors) represent the posterior mean. Colloquially, this VAR representation can be understood as one that uses prior information along with the data for robust estimation. See the next section for a detailed presentation.
Litterman's implementation of Bayesian methods in the VAR representation has been interpreted in the literature as quasi-Bayesian, as he assumes the variance of the parameters to be known. This assumption greatly simplifies estimation, since full Bayesian estimation is notoriously hard and time intensive. Consider, for example, a situation where the analytical joint posterior of the parameters is not available; in this case inference is carried out by first deriving the conditional posteriors of the parameters and then using a sampling technique, such as an importance sampler, a particle sampler, the Metropolis-Hastings algorithm or a Gibbs sampler, for drawing inference.
With the advent of fast computing hardware and software, Bayesian methods have gained popular use in all fields of study, particularly in economics; see Hamilton (2006). A vast range of priors can now be implemented for estimation and structural analysis based on the researcher's prior knowledge and research needs.
In our paper we implement three different priors for forecasting inflation and the output gap in a Bayesian VAR methodology. Of the three priors, the latter two have complex posterior forms; i.e., analytical posteriors for the independent Normal-Wishart priors and the independent Minnesota-Wishart priors are not available. We therefore utilize a Gibbs sampler consistent with Koop and Korobilis (2009) for obtaining inference.
The discussion of the forecast results shows that our methodology and in-sample forecasts are robust. Finally, we discuss model diagnostics for comparison of the three models and conclude therefrom.
The paper is structured as follows: the next section discusses the relevant literature on Bayesian methods in VARs; section 3 discusses the model and prior elicitation, along with information about the data sets used in the study; section 4 concludes by stating the model findings, results and interpretation.

Literature Review
Linear stochastic difference equations, commonly abbreviated as Vector Autoregression (VAR) specifications, mainly suffer from the caveat of probable multicollinearity and overparameterization (Litterman, 1979). This problem, which impacts model specification as well as forecast accuracy, is exacerbated when the data set covers a short sample. The earliest Bayesian VAR specifications are based on theory compliant with ridge regressions (Hoerl & Kennard, 1970), Stein-rule estimators (Stein, 1974) and ridge-type estimators in a univariate autoregressive context (Swamy & Rappaport, 1975). These studies demonstrate that such methods generate biased estimators which possess statistically smaller Mean Squared Errors (MSEs) than OLS estimates.
Ridge estimators are argued to overcome the large sampling error problem generally associated with multicollinearity by adding to the existing data the prior information that larger coefficients are more unreasonable; inherently, ridge regression aims to penalize unreasonably large estimated coefficients by "shrinking" them towards zero. This class of estimators is therefore also called shrinkage estimators, and the respective priors are termed "shrinkage priors". Litterman (1979, 1980) argues that each of the above-stated estimators has a Bayesian interpretation, whereby the researcher specifies the prior distribution for the parameters of interest and the resulting posterior mean can be interpreted as biased towards the prior. Essentially these procedures imply estimates based on the data as well as the researcher's prior beliefs on the subject matter.
Collectively, ridge estimators and Stein-rule estimators are analogous in their treatment of multicollinearity and large sampling errors, in that both incorporate prior information in the regression equation; they differ, however, in the metric through which the shrinkage is applied. Below we present the various ridge regression methods along with the one implemented by Litterman (1979) in his novel study.
The normal linear model can be stated as y = Xβ + ε, with ε ~ N(0, σ²I). Ridge estimators can be expressed as β* = (X′X + kI)⁻¹X′y, where k ≥ 0. These estimators may be classified as posterior means corresponding to the prior β ~ N(0, τ²I), with k = σ²/τ², where σ² and τ² are assumed to be known.
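To make the ridge-posterior correspondence concrete, here is a small numerical check (our own illustration, not from the paper; the data and variance values are arbitrary): the ridge estimator with k = σ²/τ² coincides exactly with the posterior mean under the N(0, τ²I) prior.

```python
import numpy as np

# Ridge estimator beta* = (X'X + k I)^{-1} X'y with k = sigma^2 / tau^2
# equals the posterior mean under the prior beta ~ N(0, tau^2 I)
# with known error variance sigma^2.
rng = np.random.default_rng(0)
T, K = 50, 3
X = rng.normal(size=(T, K))
beta_true = np.array([0.5, -1.0, 0.2])
sigma2, tau2 = 1.0, 4.0
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=T)

k = sigma2 / tau2
ridge = np.linalg.solve(X.T @ X + k * np.eye(K), X.T @ y)

# Bayesian posterior mean: V_post = (X'X / sigma2 + I / tau2)^{-1},
# mean = V_post @ X'y / sigma2 -- algebraically identical to ridge.
V_post = np.linalg.inv(X.T @ X / sigma2 + np.eye(K) / tau2)
post_mean = V_post @ (X.T @ y) / sigma2
```

Factoring 1/σ² out of the posterior precision and cancelling it against the 1/σ² in the mean shows the two expressions are the same matrix product, which the check confirms numerically.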
The Stein class of estimators is similar to ridge estimators and can be stated as β* = γβ̂, with 0 ≤ γ ≤ 1, where the prior is assumed to be β ~ N(0, τ²(X′X)⁻¹). Initial applications of Bayesian VARs implementing the Minnesota / Litterman priors are in essence similar to a related type of Stein estimator given by Leamer (1972), who considers a geometrically decaying response pattern, so that coefficients at more distant lags are shrunk progressively more heavily towards zero. Litterman (1979) simplified the task of computation and of specifying the priors by deriving the matrix of coefficients through Bayesian methods while assuming a constant variance of the coefficients, which is also assumed to be known, thereby drastically decreasing the computational burden. This treatment of shrinkage priors proposed by Litterman (1979), in the spirit of Leamer (1972), implies that as lag length increases the coefficients of these lags are scaled to have tight marginal distributions around zero; in effect, the farther the lag, the less its explanatory weight in the VAR regression equation. Leeper, Sims and Zha (1996), Sims and Zha (1998) and Robertson and Tallman (1999) implement Bayesian VARs similar to Litterman (1986a, 1986b); they conclude that such priors can be used effectively to circumvent typical problems with VARs, and that Bayesian VARs exhibit robust forecast performance.
The assumptions of Litterman have since been developed and expanded in many directions. Banbura, Giannone, and Reichlin (2008) and Koop and Korobilis (2009) extend the initial Litterman priors by relaxing the assumption of a known diagonal error variance-covariance matrix. Kadiyala and Karlsson (1993, 1997) are of particular importance as they consider four different classes of distributions used to parameterize the prior beliefs. As argued earlier, in case both the coefficients and their variances are assumed to be unknown, the researcher has to specify probability distributions for both the coefficients and their variances, and the same holds for the posterior distribution and for forecasting. With the introduction of fast computing hardware and software at the researcher's disposal, a lot of headway has been achieved in economic structural analysis and forecasting.

Methodology, Data and Estimation
The relevant literature on Bayesian VARs identifies a wide set of prior methodologies available for implementation in various scenarios. Assumptions about the probability distributions, the hyper-parameters and the posterior sampling methods differentiate one BVAR method from another. In this spirit we perform a structural exercise on the impact of a monetary policy shock from a monetary aggregates channel on the output gap and inflation, in line with Banbura, Giannone and Reichlin (2008). In all, we utilize three types of priors for estimation and forecasting of inflation and the output gap.
Output gap estimates are obtained by applying a state-space framework to the quarterly total output series and then using a cubic-spline interpolation method to obtain the output gap for the Pakistan economy on a monthly basis (see Appendix 1); cf. Koop and Korobilis (2009) and Lutkepohl (2007).
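As a hedged illustration of the filtering step behind the output gap estimate (the paper's actual state-space specification is in its Appendix 1 and is not reproduced here; the local-level model and the variance values below are our own assumptions), a minimal Kalman filter for a random-walk trend can be sketched as follows, with the "gap" taken as the observed series minus the filtered trend:

```python
import numpy as np

# Local-level model:  y_t = mu_t + e_t,  mu_t = mu_{t-1} + w_t,
# with Var(e_t) = sig_e and Var(w_t) = sig_w (illustrative values).
def kalman_local_level(y, sig_e=1.0, sig_w=0.1):
    mu, P = y[0], 1.0                  # initial state and state variance
    trend = np.empty_like(y, dtype=float)
    for t, obs in enumerate(y):
        P = P + sig_w                  # prediction: random-walk state
        K = P / (P + sig_e)            # Kalman gain
        mu = mu + K * (obs - mu)       # update with the new observation
        P = (1 - K) * P
        trend[t] = mu
    return trend

# Toy log-output series with a small cyclical component
y = np.log(np.linspace(100, 150, 40)) + 0.01 * np.sin(np.arange(40))
gap = y - kalman_local_level(y)        # "output gap" = observed minus trend
```

A smaller sig_w/sig_e ratio makes the trend stiffer and attributes more of the variation to the gap, which is the usual tuning decision in trend-cycle decompositions.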
The main rationale for implementing this methodology using monetary aggregates is to assess the dynamic relationship of discretionary monetary policy in controlling inflationary tendencies in Pakistan. The discount rate is assumed to curb inflationary pressures through monetary aggregates, which are assumed to impact the credit appetite of government borrowing (PSB), leading to an increase or decrease in credit off-take by the private sector (PSC), which ultimately translates into changes in aggregate demand (the output gap), leading to changes in CPI inflation.
We first elaborate the VAR notation to simplify the BVAR elicitation. A VAR with p lags can be denoted as

y_t = a₀ + Σ_{j=1}^{p} A_j y_{t−j} + ε_t,  t = 1, …, T,

where y_t is an M × 1 vector of observations on M variables, a₀ is an M × 1 vector of intercepts, each A_j is an M × M matrix of coefficients and ε_t is an M × 1 vector of errors assumed to be ~ N(0, Σ). We can write the VAR in matrix notation following the convention in Canova (2007): let Y be the T × M matrix which stacks the observations on the M variables in columns next to each other, and let x_t = (1, y′_{t−1}, …, y′_{t−p}). If we let K = 1 + Mp be the number of coefficients in each equation of the VAR, then X, which stacks the x_t in rows, is a T × K matrix. Also let A = (a₀, A₁, …, A_p)′, so that A is K × M and α = vec(A) is the KM × 1 vector which stacks all the VAR coefficients and intercepts; we can therefore write the VAR as

Y = XA + E,

where the rows of E are ~ N(0, Σ). The likelihood function can be derived from the sampling density p(y | α, Σ), which decomposes into a Normal distribution for α given Σ and a Wishart distribution for Σ⁻¹.
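The stacking convention above can be sketched in code (our own minimal implementation with toy data; the function name and dimensions are illustrative):

```python
import numpy as np

# Build Y (T x M) and X (T x K) from raw data, with K = 1 + M*p:
# row t of X is [1, y_{t-1}', ..., y_{t-p}'], so Y = X @ A + E.
def stack_var(data, p):
    T_full, M = data.shape
    Y = data[p:]                                   # T = T_full - p rows
    X = np.hstack([np.ones((T_full - p, 1))] +
                  [data[p - j:T_full - j] for j in range(1, p + 1)])
    return Y, X

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 3))                   # M = 3 toy series
Y, X = stack_var(data, p=2)                        # K = 1 + 3*2 = 7
A_ols = np.linalg.solve(X.T @ X, X.T @ Y)          # K x M OLS coefficients
```

Equation-by-equation OLS on this stacked form is the unrestricted estimate that the Bayesian priors below shrink.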

Minnesota Priors
Economists at the University of Minnesota and the Federal Reserve Bank of Minneapolis laid the foundations in the 1980s for a great simplification in prior elicitation and evaluation (for reference in great detail see Doan, Litterman, & Sims (1984); Litterman, 1986b). The simplification cited is that replacing Σ with an estimate Σ̂ greatly reduces the computational burden, as in this case analytical posterior and predictive densities are available for further analysis and forecasting. In the context of Bayes' theorem, the task of estimation is much simplified as there only remains prior elicitation for α. For more information and examples of implementation of BVAR analysis in a Minnesota prior setup, refer to Banbura et al. (2010) and Koop and Korobilis (2009) for excellent discussion. The original Minnesota priors (see Doan, Litterman, & Sims (1984)) simplify the selection of an appropriate Σ by choosing Σ̂ from an OLS estimate. Banbura et al. (2010) conclude that implementation of Minnesota priors in BVARs leads to robust forecasts, even with a large number of variables in the VAR. In essence, the Minnesota priors (Litterman, 1986) assume all VAR equations to be "centered" on a random walk with drift,

y_t = a₀ + y_{t−1} + ε_t, (11)

meaning that, in VAR notation, the diagonal elements of the first own-lag coefficient matrix are shrunk towards 1 (Note 1), while the off-diagonal elements and higher-lag coefficients are shrunk towards zero. Utilizing a suitable Σ̂, this shrinkage is advantageous in two major aspects:

1: Shrinking the diagonal elements towards 1 leads own lags to exhibit more explanatory power in the VAR.

2: The impact of recent lags gets more weight in the explanation of the dependent variable of the VAR.
The above objective is achieved by the following prior elicitation for α. The Minnesota prior takes α ~ N(α̲_Mn, V̲_Mn), where the prior mean α̲_Mn centers the first own-lag coefficients on 1 and sets the remaining coefficients to zero, and V̲_Mn is a diagonal matrix whose elements are, for equation i,

a₁ / r² for the coefficient on the own lag r,
a₂ σ_ii / (r² σ_jj) for the coefficient on lag r of variable j ≠ i,
a₃ σ_ii for the intercept,

where σ_ii is the OLS residual variance of equation i. The Minnesota priors therefore simplify the specification of V̲_Mn to the choice of three scalars a₁, a₂, a₃. This specification allows for coefficient shrinkage towards zero and hence circumvents the over-specification or over-fitting problem of VARs with more than 3 or 4 variables. In this manuscript we use the values a₁ = 0.9, a₂ = 0.5, a₃ = 100. The next step is the posterior formulation; one of the major advantages of the Minnesota priors is that posterior inference involves only the Normal distribution, thus we can state

α | y ~ N(ᾱ, V̄), (16)

where

V̄ = [V̲_Mn⁻¹ + Σ̂⁻¹ ⊗ (X′X)]⁻¹ and ᾱ = V̄[V̲_Mn⁻¹ α̲_Mn + (Σ̂⁻¹ ⊗ X)′ y].

Using the monthly data at hand on the 7 variables described earlier, we evaluate the BVAR with Minnesota priors. As with most VAR analyses, the coefficients are not of primary concern, so we present here only the one-period-ahead forecast performance; see Table 1 and Figures 1a and 1b for details, and the Results section below for discussion.
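The Minnesota prior and its analytical posterior can be sketched as follows (a hedged implementation in the spirit of Koop and Korobilis (2009); the function name, the toy data and the default scalar values are ours, and Σ̂ is taken diagonal from equation-by-equation OLS residuals):

```python
import numpy as np

def minnesota_posterior(Y, X, p, a1=0.9, a2=0.5, a3=100.0):
    T, M = Y.shape
    K = X.shape[1]                          # assumed K = 1 + M*p
    # diagonal Sigma-hat from equation-by-equation OLS residual variances
    A_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    s2 = (Y - X @ A_ols).var(axis=0)
    Sigma_hat = np.diag(s2)

    # prior mean: first own lag centered on 1 (random walk), rest 0
    A_prior = np.zeros((K, M))
    A_prior[1:1 + M, :] = np.eye(M)
    a_prior = A_prior.flatten(order="F")    # vec(A), equation by equation

    # diagonal prior variance, element by element
    V = np.empty((K, M))
    V[0, :] = a3 * s2                       # intercepts
    for r in range(1, p + 1):
        for j in range(M):                  # lagged variable j at lag r
            row = 1 + (r - 1) * M + j
            for i in range(M):              # equation i
                V[row, i] = a1 / r**2 if i == j else a2 * s2[i] / (r**2 * s2[j])
    V_prior = np.diag(V.flatten(order="F"))

    # analytical posterior: alpha | y ~ N(a_bar, V_bar)
    Vp_inv = np.linalg.inv(V_prior)
    V_bar = np.linalg.inv(Vp_inv + np.kron(np.linalg.inv(Sigma_hat), X.T @ X))
    a_bar = V_bar @ (Vp_inv @ a_prior +
                     np.kron(np.linalg.inv(Sigma_hat), X).T @ Y.flatten(order="F"))
    return a_bar.reshape(K, M, order="F"), V_bar

rng = np.random.default_rng(2)
data = rng.normal(size=(100, 3))            # toy M = 3 monthly series
Y = data[2:]
X = np.hstack([np.ones((98, 1)), data[1:99], data[0:98]])
a_bar, V_bar = minnesota_posterior(Y, X, p=2)
```

Since Σ̂ is fixed, no simulation is needed here: the posterior mean and variance come out in one linear-algebra step, which is exactly the computational advantage the text describes.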

Independent Normal-Wishart Priors and Independent Minnesota-Wishart Priors
The independent Normal-Wishart prior first appeared in the work of Zellner (1971) in the context of seemingly unrelated regression models, where he showed that the marginal posterior for the matrix of coefficients can be expressed in terms of Normal and Student's t distributions.
Let us first state the advantages of an independent prior setup compared with a conjugate prior. Natural conjugate priors impose an important restriction of dependence between the probability distributions: they restrict every equation of the VAR to contain the exact same variables and lags, and the prior covariance of the coefficients in any two equations is proportional to one another. The independent Normal-Wishart priors, by contrast, assume the VAR coefficients and the error covariance matrix to be independent of one another. This also allows the researcher to implement another variation of priors, i.e. the independent Minnesota-Wishart priors. The theoretical underpinnings of the independent Normal-Wishart priors can be stated as follows. As we allow the VAR equations to contain different coefficients, we must change the VAR notation; following Koop and Korobilis (2009) and Karlsson (2012), let β = (β′₁, …, β′_M)′ stack the VAR coefficients, where:

y_{m,t}: is the observation on the m-th variable at time t;
z_{m,t}: is the k_m × 1 vector containing the observations on the explanatory variables in the equation for the m-th variable;
β_m: is the k_m × 1 vector of coefficients conformable to z_{m,t}.
Allowing VARs with a variable number of explanatory variables per equation lets the researcher impose automatic restrictions while retaining the desirable property of shrinkage of coefficients towards zero; this is the major advantage of the independent Normal-Wishart prior method over the natural conjugate priors. We can thus write the model as y_t = Z_t β + ε_t, where Z_t stacks the z_{m,t}. Note that we implement in our analysis two versions of the prior variance-covariance matrix V̲: for the independent Normal-Wishart priors we utilize the diffuse convention V̲ = 10 · I.
For the independent Minnesota-Wishart priors we implement the Minnesota prior for V̲ (see equations 11-18).
Inference under the independent Normal-Wishart priors and the independent Minnesota-Wishart priors is computationally demanding, as the joint posterior density is not available analytically; inference must be carried out by first deriving the conditional posterior distributions p(β | y, Σ) and p(Σ | y, β) and then using an MCMC scheme. In this manuscript we implement a Gibbs sampler to draw inference on the joint posterior density, following Koop and Korobilis (2009). The conditional distributions from which the Gibbs sampler draws are

β | y, Σ ~ N(β̄, V̄), with V̄ = (V̲⁻¹ + Σ_t Z′_t Σ⁻¹ Z_t)⁻¹ and β̄ = V̄(V̲⁻¹ β̲ + Σ_t Z′_t Σ⁻¹ y_t), (31)

Σ⁻¹ | y, β ~ Wishart(S̄⁻¹, ν̄), with ν̄ = T + ν̲ and S̄ = S̲ + Σ_t (y_t − Z_t β)(y_t − Z_t β)′. (32)

From here on, the Gibbs sampler (Note 2) draws sequentially from the Normal p(β | y, Σ) and the Wishart p(Σ⁻¹ | y, β). The Gibbs sampler is used to draw 10000 samples from the posterior distribution; the first 2000 are discarded as burn-in draws in conformity with the relevant econometrics literature. To arrive at a point forecast, the predictive mean is used. For the predictive mean of the one-period-ahead forecast of the output gap and inflation, and their standard deviations, please refer to Figures 2a, 2b, 3a and 3b and Tables 1 and 2.
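The two conditional draws above can be sketched as a Gibbs sampler in the spirit of Koop and Korobilis (2009). The sketch below is our own simplification, not the paper's code: it assumes a common regressor vector across equations (so Z_t = I_M ⊗ x_t′, which collapses Σ_t Z′_t Σ⁻¹ Z_t to Σ⁻¹ ⊗ X′X), uses the diffuse V̲ = 10·I, and uses illustrative Wishart hyper-parameters and draw counts.

```python
import numpy as np

def rwishart(rng, nu, S):
    # Wishart(nu, S) draw: sum of nu outer products of N(0, S) vectors
    Z = rng.multivariate_normal(np.zeros(S.shape[0]), S, size=nu)
    return Z.T @ Z

def gibbs_bvar(Y, X, n_draws=1000, burn=200, seed=0):
    rng = np.random.default_rng(seed)
    T, M = Y.shape
    K = X.shape[1]
    beta_prior = np.zeros(K * M)
    V_prior_inv = np.eye(K * M) / 10.0       # diffuse prior: V = 10 * I
    nu_prior, S_prior = M + 2, np.eye(M)     # illustrative Wishart prior
    yvec = Y.flatten(order="F")              # vec(Y)

    Sigma = np.eye(M)
    draws = []
    for it in range(n_draws):
        # beta | y, Sigma  ~  Normal (eq. 31 in the common-regressor case)
        Sinv = np.linalg.inv(Sigma)
        V_bar = np.linalg.inv(V_prior_inv + np.kron(Sinv, X.T @ X))
        V_bar = 0.5 * (V_bar + V_bar.T)      # enforce symmetry numerically
        b_bar = V_bar @ (V_prior_inv @ beta_prior + np.kron(Sinv, X.T) @ yvec)
        beta = rng.multivariate_normal(b_bar, V_bar)
        # Sigma | y, beta: draw Sigma^{-1} from the Wishart (eq. 32)
        E = Y - X @ beta.reshape(K, M, order="F")
        S_bar = S_prior + E.T @ E
        Sigma = np.linalg.inv(rwishart(rng, nu_prior + T, np.linalg.inv(S_bar)))
        if it >= burn:
            draws.append(beta)
    return np.array(draws)

# Toy bivariate VAR(1) data and a short chain
rng = np.random.default_rng(1)
data = np.zeros((81, 2))
for t in range(1, 81):
    data[t] = 0.5 * data[t - 1] + rng.normal(size=2)
Y, X = data[1:], np.hstack([np.ones((80, 1)), data[:-1]])
draws = gibbs_bvar(Y, X, n_draws=300, burn=100)
```

Retained draws approximate the joint posterior; predictive densities like those in Figures 2 and 3 are obtained by simulating one step ahead from each retained (β, Σ) pair.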

Results and Discussion
The results are described in Tables 1 and 2 below. As discussed above, implementation of the Minnesota prior simplifies the computational process, whereby effectively the researcher only specifies the priors on the matrix of coefficients. The output gap forecasts from the three Bayesian VARs also yield robust predictive densities, signified by the relatively tight distribution of draws and the low magnitude of the standard deviations (see Figures 1, 2, 3). Although all three VAR methods exhibit good overall performance, the MSFE of the independent Minnesota-Wishart priors appears to be the least, which marks this prior as a better candidate for forecasting over the horizon ahead.
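For readers reproducing the comparison, the two metrics reported in Tables 1 and 2 can be computed from predictive draws along these lines (our own minimal sketch with toy numbers; in particular, the Gaussian form of the log predictive likelihood is an assumption about how the metric is evaluated):

```python
import numpy as np

def msfe(draws, realized):
    # mean squared forecast error over one-step-ahead predictive draws
    return np.mean((draws - realized) ** 2)

def log_pred_likelihood(draws, realized):
    # Gaussian approximation using the predictive mean and variance
    m, v = draws.mean(), draws.var()
    return -0.5 * (np.log(2 * np.pi * v) + (realized - m) ** 2 / v)

rng = np.random.default_rng(0)
draws = rng.normal(7.5, 0.4, size=8000)   # toy predictive density
m = msfe(draws, 7.8)                      # approx. variance + squared bias
lpl = log_pred_likelihood(draws, 7.8)
```

A lower MSFE and a higher (less negative) log predictive likelihood both indicate a better-calibrated forecast, which is the sense in which the tables rank the three priors.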
In conclusion, we observe from Tables 1 and 2 that the Bayesian VARs based on the independent Normal-Wishart priors and the independent Minnesota-Wishart priors perform better than the Minnesota-prior-based VAR. This is evidenced by the log predictive likelihood: the value for the former two models is significantly better than the corresponding value for the Minnesota-prior-based Bayesian VAR.

Figure 1a. Predictive density for output gap, Nov-2013, Minnesota priors
Figure 1b. Predictive density for inflation, Nov-2013, Minnesota priors

Figure 3a. Predictive density for output gap, Nov-2013, independent Minnesota-Wishart priors
Figure 3b. Predictive density for inflation, Nov-2013, independent Minnesota-Wishart priors

The figures are based on predictive draws estimated by the Gibbs sampler implemented for the three different priors. The forecasting exercise and its results are conducted in sample, i.e. we use the monthly sample 1991M01-2013M11 for the seven variables in order to forecast the values of inflation and the output gap for November 2013.
Koop and Korobilis (2009) show that BVARs based on the Normal-Wishart priors and the Minnesota priors exhibit robust forecasting performance. Karlsson (2012) and Koop and Korobilis (2009) provide extensive and concise treatments of a host of prior selection, computation, forecasting and model diagnostic methods.
Examples of further prior elicitation include Giannone, Lenza and Primiceri (2012) and George, Sun and Ni (2008); these papers illustrate hierarchical priors, i.e. a prior placed on a prior. This is a prior structure which is influenced by the data, thereby providing an "objective" procedure for prior selection.

Table 1. CPI inflation forecast (YOY) for Nov-2013, using data range Jan-1991 to Sep-2013

The Minnesota prior representation of the Bayesian VAR therefore has analytical posterior and predictive densities. The posterior density in this prior setup effectively comprises 500000 draws, whose mean represents the Bayesian equivalent of a point estimate; see Tables 1 and 2. The last two Bayesian VARs, based on the independent Normal-Wishart priors and the independent Minnesota-Wishart priors, are computationally demanding as described earlier; we therefore use a Gibbs sampler to obtain posterior densities for these two priors. The Gibbs sampler is programmed to make 10000 draws from the predictive density, and in accordance with the literature the first 2000 draws are discarded as burn-in. Figures 1, 2 and 3 present the predictive mean, or Bayesian forecast, of inflation and the output gap for November 2013. All three methods yield a reasonable and acceptable magnitude of forecast error (see the Tables below), with the Bayesian VAR incorporating the independent Normal-Wishart priors yielding the least MSFE among the three methods proposed here.