A Comparative Study of the Classical and Bayesian Methods of Estimating a Just-Identified Simultaneous Equations Econometric Model

A just-identified two-equation econometric model is simulated using both classical and Bayesian procedures. The parameter estimates from the two methods are compared under a wide range of scenarios: sample size, residual variance, and the variance of the data on the predetermined variable. The Monte Carlo experiment was performed using the EViews and WinBUGS computer packages. The median, being a robust estimator of location, was used as the posterior point estimate. As in earlier research in which the posterior mode was used as the estimate, the Bayesian procedure performed better in most cases, while some scenarios showed similar behaviour for the two procedures.


Introduction
The simultaneous equations model (SEM) is a very important field of econometrics. Some important statistical implications of a linear simultaneous equations model were presented by Haavelmo (1943), such as that estimation of the stochastic equations should not be done separately: the restrictions imposed upon the same variables by other equations ought to be taken into consideration. A simultaneous equations model may be under-identified, just-identified or over-identified, depending on how each parameter of the model uniquely contributes to the endogenous variables. The just-identified model, in which the equations are exactly identified, is considered in this research work. The indirect least squares method, the two-stage least squares method, k-class estimators, the three-stage least squares method, the full information maximum likelihood method, and the jackknife instrumental variables method due to Angrist, Imbens and Krueger (1999) and Blomquist and Dahlberg (1999) are the well-known classical inferential approaches that have been in use. They are largely extensions of the two basic single-equation techniques, ordinary least squares and maximum likelihood. The 'true' model structure is assumed unknown and is estimated. However, Dreze (1962) argues that such classical inference has a shortcoming in that available prior information on the parameters is ignored; for instance, it is known that the marginal propensity to consume lies in the unit interval, information that could be made use of. Bayesian inference, by contrast, combines prior information on the parameter of interest with the likelihood function to give the posterior distribution, which thus provides updated information on the parameter(s) under study. A comparative study of the classical and Bayesian approaches is therefore worthwhile, so as to take advantage of their strengths and to research possible ways of improving on their weaknesses.
The need to carry out valid, generally acceptable, appropriate and convenient estimation of simultaneous equations econometric models has given rise to a number of studies of the classical and Bayesian procedures. Zellner (1971) compared the two procedures on the model considered here; his results showed that the Bayesian method performed better than the classical method, mostly in the small-sample case.
A study was also carried out by Gao and Lahiri (2001), focusing on weak instruments. In cases with very weak instruments, no estimator was superior to another, while in the case of weak endogeneity, Zellner's MELO (minimum expected loss) estimator, a Bayesian procedure, was the best. Their results also showed that under certain scenarios (see Gao and Lahiri, 2001), of all the estimators, the BMOM (Bayesian method of moments) performed best. However, the jackknife instrumental variables estimator, a classical procedure due to Angrist, Imbens and Krueger (1999) and Blomquist and Dahlberg (1999), performed poorly throughout.
These studies reflected some Bayesian estimation methods for simultaneous equations econometric models, such as the Bayesian method of moments proposed by Zellner and the methods used by Chao and Phillips (1998), Geweke (1996), and Kleibergen and Van Dijk (1998).
In this paper, we compare the properties of the Bayesian and classical estimators in repeated trials, as carried out by Zellner (1971), but using the median as the Bayesian estimate and using additional comparison criteria: the mean of the estimates, the bias, and the mean squared error.
Generally, econometric models are expressed in terms of an unknown vector of parameters θ ∈ R^k which fully specifies the joint probability distribution of the observations X = (x_1, …, x_T). Given the probability density function f(X | θ), classical estimation proceeds by making use of the likelihood function L(θ) = f(X | θ), while Bayesian estimation combines the likelihood function with prior information, usually expressed as a probability density function π(θ) on the parameters. This gives the posterior distribution, which is proportional to the product of the two: p(θ | X) ∝ L(θ)π(θ). Most Bayesian inference problems, according to Geweke (1989), can be seen as the evaluation of the expectation of a function of interest u(θ) under the posterior,

E[u(θ) | X] = ∫ u(θ) p(θ | X) dθ.    (1.1)

Methods of solving problem (1.1) are not as systematic, methodical or general as those for classical inference problems, because for likelihood functions used routinely in classical inference the evaluation of (1.1) is difficult and for most practical purposes impossible. Analytical integration of (1.1) is possible only for a small range of likelihood functions, and the class of priors and functions of interest that can be handled is severely restricted. Also, many numerical approaches, like quadrature methods, require special adaptation for each u, π or L, and become unworkable if k exceeds, say, three.
However, with the advent of powerful and cheap computing, numerically intensive methods for solving (1.1) have become more attractive. This is where Monte Carlo integration comes in (among others) as a way out, particularly Markov chain Monte Carlo (MCMC), which includes Gibbs sampling. It provides a systematic approach that, in principle, can be applied in any situation in which E[u(θ)] exists, and it is practical for large values of k. In this regard, the works of Gao and Lahiri (2001) are of note, as are Geweke (1989), Gilks, Richardson and Spiegelhalter (1996), and others. The analysis was carried out electronically with EViews as well as WinBUGS (Bayesian inference Using Gibbs Sampling), computer software developed by the MRC (Medical Research Council) Biostatistics Unit, Cambridge, and the Imperial College School of Medicine, London.
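Where posterior draws are available, the expectation (1.1) is approximated by averaging u over them. A minimal sketch with a toy normal posterior (the posterior, the function u, and all numbers here are illustrative assumptions, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy posterior: theta ~ N(1.0, 0.5^2) stands in for an analytically
# awkward posterior; the function of interest is u(theta) = theta**2.
draws = rng.normal(loc=1.0, scale=0.5, size=200_000)

# Monte Carlo estimate of E[u(theta)]; the exact value is mu^2 + sigma^2 = 1.25.
estimate = np.mean(draws**2)
```

The same averaging applies unchanged to MCMC output, which is how WinBUGS turns Gibbs-sampler draws into posterior summaries.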

The model
The model analysed here is

Y_1t = γ Y_2t + U_1t,
Y_2t = β X_t + U_2t,    (2.1)

which is a just-identified model, where Y_1t and Y_2t are observations on two endogenous variables, X_t is an observation on an exogenous variable, U_1t and U_2t are disturbance terms, and γ and β are scalar parameters. Zellner (1971) analysed the model (2.1) using a diffuse prior on the parameters and the posterior modal value as the Bayesian estimate. Here, we use the median of the posterior distribution as the Bayesian estimate and also make use of other comparison criteria: the mean of the estimates, the bias, and the mean squared error of the estimator.
The identification status of the model (2.1) is evident from the order condition, which specifies that, for each endogenous variable in an equation having a coefficient to be estimated, at least one exogenous variable must be excluded from that equation. It also follows from the rank condition, which states that, in a system of m equations, any particular equation is identified if it is possible to construct at least one non-zero determinant of order (m − 1) from the coefficients excluded from that particular equation but contained in the other equations of the system. This is obvious in our model (2.1). The matrix form of our model (2.1) is

Y Γ = X B + U,    (3.1)

where Y = (Y_1, Y_2) is an n×2 matrix of observations on the two endogenous variables, Γ is the 2×2 matrix of coefficients of the endogenous variables, X is an n×1 vector of observations on the predetermined variable, B is a 1×2 vector of coefficients of the predetermined variable, and U = (U_1, U_2) is an n×2 matrix of random disturbance terms. For ease of analysis we need the reduced form of (3.1) (and later carry out transformations as appropriate), given as

Y = X Π + V,  with Π = B Γ^{-1} and V = U Γ^{-1}.
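Just identification can be seen numerically: substituting the second structural equation into the first gives reduced-form coefficients π_1 = γβ and π_2 = β, from which the structural parameters are recovered uniquely. A minimal sketch (the structural form Y_1t = γY_2t + U_1t, Y_2t = βX_t + U_2t is assumed from the paper's description):

```python
# Reduced form of the assumed structural model
#   Y1t = gamma*Y2t + U1t,   Y2t = beta*Xt + U2t
# Substitution gives:
#   Y2t = beta*Xt + V2t          -> pi2 = beta
#   Y1t = gamma*beta*Xt + V1t    -> pi1 = gamma*beta
# Just identification: (gamma, beta) is recovered uniquely from (pi1, pi2).
gamma_true, beta_true = 2.0, 0.5          # the paper's run values

pi1, pi2 = gamma_true * beta_true, beta_true
gamma_back, beta_back = pi1 / pi2, pi2    # gamma = pi1/pi2, beta = pi2
```

Each reduced-form coefficient maps back to exactly one structural parameter, which is why every classical method gives the same estimate for this model.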

RUN III
γ = 2.0, β = 0.5, X_t ~ NID(0, 9), (U_1t, U_2t) ~ NID(0, Σ) with σ_11 = 1.0, σ_12 = 1.0, σ_22 = 4.0. (Runs I and II used the same parameter values but with Var(X_t) = 1 and 2 respectively.)

In each of these runs, 1000 samples of sizes 20, 40, 60 and 100 were generated, making a total of 4000 samples in one run and 12,000 samples altogether. To obtain random disturbance terms that behave as stated in the three runs, we made use of the method presented in Nagar (1969). Using the data generated, the parameter estimates were obtained with the aid of EViews for the classical method and WinBUGS for the Bayesian method (the same data sets were used for the two methods; the data were generated in EViews). The following were used in comparing the two estimation methods: performance in repeated sampling (the frequency distribution of the estimates), the mean estimate, the bias, and the mean squared error (MSE). For a parameter θ with estimates θ̂_i, i = 1, …, N_r, where N_r is the number of replications and therefore the number of estimates,

MSE(θ̂) = (1/N_r) Σ_i (θ̂_i − θ)², which is the same as Var(θ̂) + (estimated bias)².
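Nagar's construction of correlated disturbances amounts to a linear transformation of independent standard normals; a Cholesky factor of Σ achieves the same effect. A minimal sketch of one Run III sample (the structural form and all variable names are assumptions based on the paper's description, not its EViews program):

```python
import numpy as np

rng = np.random.default_rng(7)

n = 20
sigma = np.array([[1.0, 1.0],
                  [1.0, 4.0]])            # Run III: s11 = 1, s12 = 1, s22 = 4

# Correlated disturbances as a linear transform of independent N(0, 1)
# draws; the Cholesky factor plays the role of Nagar's transformation.
chol = np.linalg.cholesky(sigma)
u = rng.standard_normal((n, 2)) @ chol.T  # rows ~ N(0, sigma)

x = rng.normal(0.0, 3.0, size=n)          # X_t ~ NID(0, 9)

gamma_true, beta_true = 2.0, 0.5          # structural parameters (all runs)
y2 = beta_true * x + u[:, 1]              # assumed: Y2t = beta*Xt + U2t
y1 = gamma_true * y2 + u[:, 0]            # assumed: Y1t = gamma*Y2t + U1t
```

Repeating this draw 1000 times per sample size reproduces the design of one run.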
The estimated bias is the mean of the estimates minus the true value of the parameter (in this case, the value used to generate the sample values). The estimator of γ given by most of these principles of classical (sampling-theory) estimation is the same for this just-identified model, namely

γ̂ = Σ_t x_t Y_1t / Σ_t x_t Y_2t.    (5.1)
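The repeated-sampling behaviour of this common classical estimator, γ̂ = Σ x_t Y_1t / Σ x_t Y_2t, can be sketched directly; the data-generating settings below are assumed from Run III, and the replication count mirrors the paper's design:

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma_hat_one_sample(n, gamma=2.0, beta=0.5, x_sd=3.0):
    """One draw from the model (Run III settings) and its classical estimate."""
    chol = np.linalg.cholesky(np.array([[1.0, 1.0], [1.0, 4.0]]))
    u = rng.standard_normal((n, 2)) @ chol.T
    x = rng.normal(0.0, x_sd, size=n)
    y2 = beta * x + u[:, 1]
    y1 = gamma * y2 + u[:, 0]
    # All the classical principles coincide here:
    #   gamma_hat = sum(x * y1) / sum(x * y2)
    return np.sum(x * y1) / np.sum(x * y2)

estimates = np.array([gamma_hat_one_sample(100) for _ in range(1000)])
bias = estimates.mean() - 2.0
mse = np.mean((estimates - 2.0) ** 2)     # equals Var + bias^2
```

The decomposition MSE = Var + bias² holds identically for the simulated estimates, which is the identity used in the comparison tables.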

Bayesian Estimation method
The prior probability density function may be informative or diffuse (non-informative).
(1) Informative prior. This is the situation where information about the prior pdf is available. The informative prior applied here is a bivariate normal pdf, where π̄, a 2×1 vector, is the mean of the prior pdf, Σ^{-1} is the inverse of the variance-covariance matrix, and C = (c_αl) is the 2×2 matrix of prior covariances.
(2) Diffuse prior. The idea behind the use of a diffuse (otherwise known as non-informative or vague) prior distribution is to make inferences that are not greatly affected by external information, or to cope when external information is not available. Here we assume that little is known, a priori, about the parameters π and the three distinct elements of Σ. As our diffuse prior pdf, we assume that the elements of π and those of Σ are independently distributed, that is, p(π, Σ) = p(π) p(Σ). Using Jeffreys' invariance theory, we take

p(π) ∝ constant,  p(Σ) ∝ |Σ|^{-3/2},    (4.3)

so that the prior pdf implies a corresponding prior pdf on the three distinct elements of Σ^{-1}. This is the result of taking an informative prior pdf on Σ^{-1} in the Wishart form and letting the "degrees of freedom" in the prior pdf go to zero. With zero degrees of freedom there is a "spread out" Wishart pdf, which then serves as a diffuse prior pdf, since it is diffuse enough to be substantially modified by a small number of observations.
Based on the assumption from our model that the rows of V are normally and independently distributed, each with zero mean vector and 2×2 covariance matrix Σ, the likelihood function for π and Σ is

L(π, Σ | Y) ∝ |Σ|^{-n/2} exp{ -½ tr Σ^{-1} (Y − Xπ)'(Y − Xπ) },    (4.8)

where (Y − Xπ)'(Y − Xπ) = S + (π − π̂)'X'X(π − π̂), with S = (Y − Xπ̂)'(Y − Xπ̂) and π̂ the least-squares estimate of π. Thus, the likelihood function for the parameters is as given in (4.8).
Combining the diffuse prior pdfs (4.3) and (4.6) with the likelihood function (4.8), we obtain a posterior distribution that is in the bivariate Student-t form,

p(π | Y) ∝ |S + (π − π̂)'X'X(π − π̂)|^{-n/2},    (4.9)

with ŵ_ij the (i, j)th element of S, i, j = 1, 2. To obtain the posterior distribution in terms of γ and β, we carry out the transformation γ = π_1/π_2, β = π_2, with Jacobian of transformation |β|. This gives

p(γ, β | Y) ∝ |β| p(π_1, π_2 | Y) evaluated at π_1 = γβ, π_2 = β.    (4.10)

The Bayes estimate is the mean of the posterior distribution, if it can be identified; this amounts to solving for the posterior distribution analytically. If the posterior cannot be obtained analytically, numerical methods are employed to obtain the normalizing constant, as is the situation in this study. WinBUGS, mentioned earlier, is used to obtain the Bayesian estimate by drawing samples from the posterior distribution and taking the mean after ensuring convergence.
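As a stand-in for the WinBUGS run, the Student-t posterior of the reduced-form coefficients can be sampled directly and the median of γ = π_1/π_2 taken. The sketch below assumes the diffuse prior above and takes n − 2 degrees of freedom for this one-regressor, two-equation case (a derivation detail not stated in the paper); the simulated data repeat the assumed Run III settings:

```python
import numpy as np

rng = np.random.default_rng(3)

# --- one simulated Run III sample (structural form assumed as before)
n = 20
gamma_true, beta_true = 2.0, 0.5
chol = np.linalg.cholesky(np.array([[1.0, 1.0], [1.0, 4.0]]))
u = rng.standard_normal((n, 2)) @ chol.T
x = rng.normal(0.0, 3.0, size=n)
y2 = beta_true * x + u[:, 1]
y1 = gamma_true * y2 + u[:, 0]
Y = np.column_stack([y1, y2])

# --- diffuse-prior posterior of the reduced-form coefficients:
#     bivariate Student-t centred at the least-squares fit
xx = np.sum(x**2)
pi_hat = (x @ Y) / xx                           # (pi1_hat, pi2_hat)
resid = Y - np.outer(x, pi_hat)
S = resid.T @ resid                             # 2x2 residual cross-product

df = n - 2                                      # assumed degrees of freedom
scale = np.linalg.cholesky(S / (df * xx))       # posterior scale matrix factor

z = rng.standard_normal((10_000, 2)) @ scale.T  # N(0, S/(df*xx)) draws
g = rng.chisquare(df, size=10_000)
pis = pi_hat + z * np.sqrt(df / g)[:, None]     # Student-t draws for pi

gammas = pis[:, 0] / pis[:, 1]                  # gamma = pi1 / pi2
gamma_median = np.median(gammas)                # the paper's point estimate
```

The median is read off the draws directly, so no normalizing constant is ever needed; the same draws also give the posterior mean or mode if desired.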
Other measures of central tendency (like the median and the mode) could also be used as the Bayesian estimate (see Zellner, 1971). Hence, in this case, the median is used.
Routes to deriving point estimates via Bayesian analysis are the posterior mean, the posterior median and the posterior mode, each optimal under a suitable loss function; routes to deriving point estimates through the classical method include maximum likelihood (ML), the method of moments and the generalized method of moments, minimum mean square error (MMSE), the minimum variance unbiased estimator (MVUE), and the best linear unbiased estimator (BLUE).

Results and discussion

The point-estimate summaries presented in Tables 1-3 reflect some properties of the two estimation methods under discussion. For the classical method, the various estimators (least squares, maximum likelihood) were not considered separately, because they give the same estimate for this just-identified model. For all three runs, the Bayesian estimates falling in the class containing the true value (2.0) outnumber the classical estimates, mostly in the small-sample case. This performance was the same on all the comparison criteria considered (i.e., the frequency distribution of estimates, the mean, the mean squared error and the bias). The distribution of the estimates was closer to normal in the large samples (see Fig. 1 and Fig. 2) for the two estimation methods.
As expected, the mean squared error of the estimators falls as the sample size increases. In run III, where the variance of the exogenous variable x_t was raised to 9, the estimates from the two methods were more concentrated around the class containing the true value than in the first and second runs, where the variances were 1 and 2 respectively (Fig. 3). This is an indication that the distribution of the exogenous variable also affects the properties of the estimates. We noticed that in runs I and II the mean squared error was questionably large for the classical method when N = 20; this is a result of outliers that are uncharacteristic of the Bayesian method.
An explanation for the outliers was given by Zellner (1971): "the distribution of the estimator γ̂, given in (5.1), is such that under the conditions underlying runs I and II, extreme values can be encountered with a non-negligible probability".
These results suggest that the features of the underlying model also influence the bias and consistency of the estimator.

Conclusion
Estimation of simultaneous equations models in econometric research should be approached with care. The choice of estimation method, as observed in this research work, affects the estimates in terms of bias and consistency, especially when dealing with small samples. The Bayesian estimation method has gained a lot of attention recently, which makes practical statistical inference more interesting. Our study has shown, as expected, that the Bayesian estimation method performs better than the classical one for small samples, at least for the just-identified model, since in this case all the classical estimation methods give the same estimate of the parameter γ. The results also suggest that the median, as a measure of central tendency, gives the same result as the mode as the point estimate of the posterior distribution. However, it is important to take the loss function into consideration, an issue that should be given more research. The classical estimation method, being more easily applied, might be a better choice when handling large samples, since it appears to give the same results as the Bayesian approach there.

Fig 1: The Distribution of the Estimates from the Classical method

Fig 2: The Distribution of the Estimates from the Bayesian method

The result of this Monte Carlo experiment also further emphasizes the importance of large samples in statistical inference.

Table 1 .
Summary of Monte Carlo Experiment, RUN I

Table 2 .
Summary of Monte Carlo Experiment, RUN II

Table 3 .
Summary of Monte Carlo Experiment, RUN III