Combining Forecasts from Linear and Nonlinear Models Using Sophisticated Approaches

This paper aims to improve prediction accuracy by combining forecasts. In forecast combination, the crucial issue is the selection of the weights to be assigned to each model. In addition to traditional methods, we propose two more sophisticated approaches: modified Bayesian Model Averaging (BMA) and the Extended Time-Varying Coefficient (ETVC) method. The first technique merges the traditional BMA with frequentist combination schemes to avoid the subjective priors of the traditional Bayesian technique. The suggested ETVC approach provides consistent time-varying parameters even in the presence of measurement errors or omitted-variable bias, and even if the true functional form is unknown. Concerning the included models, we consider both linear and nonlinear models in order to calculate forecasts of quarterly Egyptian CPI inflation. We find that our proposed ETVC scheme is superior to the best individual model, to all static combination schemes, and to the time-varying scheme based on random walk updated coefficients (TVR). Additionally, the suggested modified Bayesian approach improves on the traditional BMA and overcomes the problem of depending on an arbitrary choice of initial priors.


Introduction
Economic forecasting is an essential tool for economic policy-making. It is commonly believed that the best forecast can be obtained by estimating a parametric model based on a particular dataset and then generating predictions from the fitted model. Since different model specifications contain heterogeneous information, they yield different forecasts. Usually, we use some criteria to select the best model and to eliminate the other ones. However, these rejected projections may have some marginal information which is not contained in the best predictor. Therefore, as confirmed empirically, the inclusion of these predictions to form one combined forecast can improve the accuracy of predictions (Clemen, 1989; Armstrong, 1989). Furthermore, if there is structural instability in the data, it is recommended to average the forecasts of different models to deal with this variability (Ravazzolo, Van Dijk, & Verbeek, 2007).
In combining forecasts, the primary concern is to calculate the optimal weights that correspond to each model in order to minimize a particular loss function. Methods of obtaining the optimal weights are divided into two groups: Bayesian Model Averaging (BMA) and Frequentist Model Averaging (FMA). Granger and Ramanathan (1984) introduced the FMA methodology by developing Bates and Granger's (1969) method. This technique is based on averaging different predictions in order to minimise a defined loss function. They proposed employing the coefficients of restricted Ordinary Least Squares (OLS) as weights for the competing models. Terui and Van Dijk (2002) extended the OLS combination approach to a dynamic forecast combination in which the weights are estimated as a random walk process. Additionally, Hansen (2007) suggested using the Mallows Model Averaging (MMA) technique based on Mallows' (1973) criterion. Furthermore, Hansen (2008) applied MMA to estimate the weights associated with different forecasts. However, there is widespread criticism of the latter approach because it assumes that all considered models are nested, so that the results are sensitive to changing the order of regressors inside the model.
On the other hand, the BMA combines different forecasts by computing the conditional posterior probability for each model and then generating weights by calculating the mean of these probabilities. Initially, the BMA used the Bayesian Information Criterion (BIC), developed by Schwarz (1978), for model selection; the BIC could be considered to be a simple form of the BMA approach. There is a very extensive literature on BMA, where the core of this approach is to account for future uncertainty in terms of probabilities (Lahiri & Martin, 2010) (Note 1). Recent studies of forecast combination in economics showed significant interest in the application of the BMA method. However, there is extensive criticism of the BMA because its results depend on subjective priors. Those priors should be predetermined for each model and its parameters. In other words, BMA priors are based on the researchers' arbitrary choices.
This paper employs different techniques to choose the optimum weights. It utilises not only the aforementioned traditional approaches but also proposes two more methodologies: modified BMA and the Extended Time-Varying Coefficient (ETVC) method. Our aim is to demonstrate that, in comparison with traditional methods, these sophisticated techniques improve the accuracy of combined forecasts. The suggested modified BMA avoids the subjective priors inside the traditional Bayesian approach by combining traditional BMA with other frequentist combination schemes. Precisely, we propose the use of the weights of both Granger and Ramanathan (1984) and Inverse Mean Square Forecast Error (IMSFE) as priors for the modified BMA.
On the other hand, the proposed ETVC technique is based on Swamy, Tavlas, Hall, and Hondroyiannis (2010) and Hall, Swamy, and George (2014). The first authors developed the Time-Varying Coefficient (TVC) method for estimating consistent parameters even when there is uncertainty about the exact functional form. Additionally, the estimated coefficients remain consistent when some relevant variables are omitted or when there are measurement errors in the included variables. Therefore, due to these advantages, we believe that the ETVC method is well suited to generating the optimum weights. Furthermore, it can improve the combination of forecasts since it allows us to imitate the unknown functional form for the included variable. Additionally, it can add more information inside the combination scheme since it is based on some transformations of the predictors in the state variables and not only on the linear form. Finally, ETVC provides more flexibility in the prediction process; this implies that many different types of available forecasts can be considered in order to obtain the best identification.
Concerning the employed models, which we use to calculate the individual predictions, we apply both linear and nonlinear specifications. That is because it is sometimes difficult to argue whether the underlying time series is linear or nonlinear. Also, nonlinear models are very useful when the relationships are subject to regime changes. Furthermore, the specific data generating process may change its features from linear to nonlinear (Terui & Kariya, 1997). Therefore, by incorporating those different models, we can compare the forecasting accuracy of each type. The included nonlinear models are Generalized Autoregressive Conditional Heteroscedasticity (GARCH) and its threshold extension (TARCH). Also, we consider the Autoregressive conditional variance, skewness and kurtosis (GARCHSK-M), the Neural Network (NN), and Markov-switching (MS) models. The latter model is a piece-wise linear model since the data generating process is linear within each regime. Then, we utilise two structural linear models, namely, Bayesian Vector Autoregressive (BVAR) and the modern semi-structural "DSGE-VAR" models. Finally, we estimate a Time-Varying Coefficients Autoregressive model. We apply those models to predict Egypt's quarterly CPI inflation. Then, by using both the traditional techniques and those proposed in this paper, we use the predictions resulting from these different specifications to compute the combined forecast.
Consequently, this research contributes to the literature by suggesting two more sophisticated combination schemes and comparing them with the traditional methods. Additionally, we employ a range of both linear and nonlinear models to forecast future inflation, and we use these forecasts as inputs in our combination methods. Finally, we apply our methodology to quarterly Egyptian CPI inflation; this was not investigated in previous studies. Our motivation for choosing Egypt as a case study is the Central Bank of Egypt's announcement of its plan to move to a full-fledged inflation targeting framework when its prerequisites are satisfied. Consequently, it must have accurate models to predict future inflation since this is the core of the regime.
The results indicate that the Semi-Structural model is the best model according to all employed prediction criteria. Furthermore, both the time-varying Autoregressive and TARCH models provide good forecasts whereas the linear BVAR model has the lowest prediction accuracy. Regarding the combination techniques, our proposed ETVC technique dominates the best model and all other static combination approaches. Also, we compared the ETVC's forecasting ability with the time-varying scheme with random walk updated coefficients (TVR) and found that the TVR was inferior to the ETVC approach. Additionally, the suggested modified Bayesian methodology improves on the traditional BMA and overcomes the problem of the subjective choice of initial priors.
The paper is arranged as follows. Section 2 discusses the employed combination methods. Section 3 presents the applied models. Section 4 displays the empirical results of the eight different models. Section 5 assesses the performance of the alternative forecast combination approaches. Finally, Section 6 concludes and suggests some policy recommendations.

Combination Methodology
This paper applies different procedures for choosing the optimal weights in combining forecasts. In addition to the two proposed methods, namely modified BMA and ETVC averaging with time-varying weights, these procedures include simple, frequentist and Bayesian schemes. In each case, the weights are chosen to minimise the expected loss of the combined forecast:
$$\hat{w} = \arg\min_{w} E\left[L\left(e_{t+h}(w)\right)\right] \qquad (1)$$
This loss function is supposed to be a function only of the forecast error $e_{t+h}(w) = y_{t+h} - \hat{y}_{t+h}(w)$, where $\hat{y}_{t+h}(w)$ is the combined forecast. Additionally, the vector $\hat{w}$ contains the optimal weights which satisfy equation (1). According to Ravazzolo et al. (2007), equation (1) allows for the inclusion of both nonlinear and time-varying approaches to combination. In order to achieve a closed form solution of equation (1), the loss function is assumed to be the Mean Squared Forecast Error (MSFE) as expressed in equation (2).
$$L\left(e_{t+h}(w)\right) = \theta \left(y_{t+h} - \hat{y}_{t+h}(w)\right)^2, \quad \theta > 0 \qquad (2)$$
Thus, in this study, we assume that the loss function is the MSE, with $\theta$ fixed at 1. Now, our discussion focuses on the different approaches applied to compute the weights assigned to each model. These approaches are divided into four categories: simple, frequentist, Bayesian and the proposed approaches.

Simple Forecasts Combination Schemes
As indicated by Timmermann (2006), there is no need to estimate any parameters or to compute the variance-covariance matrix in simple forecast combination schemes. In this research, we use two simple approaches, equal weights (EQ) and Inverse Mean Square Forecast Error (IMSFE); these are explained briefly below:

Equal Weights (EQ)
This approach is considered to be the simplest method for calculating the combination weights. Despite its simplicity, many studies found that it worked better than many more complicated techniques. It computes the combined forecast as the arithmetic average of all $k$ available individual forecasts, so that each model receives the same weight:
$$w_i = \frac{1}{k}, \quad i = 1, \dots, k$$

Inverse Mean Square Forecast Error (IMSFE)
This combination scheme assumes that the models with smaller forecasting errors should be associated with higher weights. Hence, the combination weights depend on the inverse of the forecasting error of each available model, and they can be obtained as follows:
$$w_i = \frac{1/\mathrm{MSFE}_i}{\sum_{j=1}^{k} 1/\mathrm{MSFE}_j}$$
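The two simple schemes can be sketched in a few lines. This is only an illustration: the function names and the forecast-error values are hypothetical and are not taken from the paper's data.

```python
def equal_weights(k):
    """Equal-weights scheme: each of the k models receives weight 1/k."""
    return [1.0 / k] * k


def imsfe_weights(errors_by_model):
    """Inverse-MSFE weights computed from each model's past forecast errors."""
    msfe = [sum(e * e for e in errs) / len(errs) for errs in errors_by_model]
    inverse = [1.0 / m for m in msfe]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(forecasts, weights):
    """Weighted average of the individual point forecasts for one period."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical past forecast errors for three models (illustrative only).
errors = [[0.01, -0.02, 0.015], [0.03, -0.025, 0.02], [0.005, 0.01, -0.01]]
w_eq = equal_weights(3)
w_imsfe = imsfe_weights(errors)
print(w_eq)
print(w_imsfe)
print(combine([0.021, 0.025, 0.019], w_imsfe))
```

In this example the third model has the smallest mean squared error and therefore receives the largest IMSFE weight.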
The frequentist schemes used are the Granger-Ramanathan (GR) combination and Mallows Model Averaging (MMA). These are covered below:

Granger-Ramanathan Combination (GR)
This technique depends on estimating an OLS regression to compute the weights of each single model by imposing some constraints. These restrictions are no intercept, nonnegative coefficients and, finally, that the estimates sum to unity. Therefore, the constrained regression model can be represented by the following:
$$y_t = w_1 f_{1t} + w_2 f_{2t} + \cdots + w_k f_{kt} + \varepsilon_t, \quad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1$$
where $f_{1t}$, $f_{2t}$ and $f_{kt}$ are the forecasts resulting from the first, second and $k$th model (Note 2).
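A minimal sketch of the constrained regression follows. It assumes that the three restrictions (no intercept, nonnegative weights, weights summing to one) are imposed by projected gradient descent on the least-squares objective. This is a numerical stand-in chosen for illustration, not necessarily the estimator used in the paper, and the data are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def gr_weights(y, forecasts, iterations=5000):
    """Constrained least squares; forecasts[t][i] is model i's forecast of y[t]."""
    n_obs, k = len(y), len(forecasts[0])
    # Safe step size: the gradient's Lipschitz constant is at most 2*trace(F'F)/n_obs.
    trace = sum(f * f for row in forecasts for f in row)
    step = n_obs / (2.0 * trace)
    w = [1.0 / k] * k
    for _ in range(iterations):
        resid = [sum(wi * fi for wi, fi in zip(w, row)) - yt
                 for row, yt in zip(forecasts, y)]
        grad = [2.0 / n_obs * sum(r * row[i] for r, row in zip(resid, forecasts))
                for i in range(k)]
        w = project_to_simplex([wi - step * gi for wi, gi in zip(w, grad)])
    return w


# Hypothetical forecasts from two models; the target mixes them 70/30.
forecasts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
y = [0.7 * f1 + 0.3 * f2 for f1, f2 in forecasts]
weights = gr_weights(y, forecasts)
print([round(w, 4) for w in weights])
```

Because the target series is an exact 70/30 mix of the two forecasts, the estimated weights should recover that mix while satisfying the constraints.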

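Mallows Model Averaging (MMA)
Hansen (2007; 2008) introduced MMA as a method for model combination. The weights minimise the Mallows criterion, which could be expressed as:
$$C_n(W) = \left(Y - P(W)Y\right)'\left(Y - P(W)Y\right) + 2\hat{\sigma}^2 k(W) \qquad (6)$$
where:
$$P(W) = \sum_{m=1}^{M} w_m P_m$$
Here $P(W)$ is the averaged projection matrix over the $M$ candidate models, $P_m$ is the projection matrix of model $m$, the sample variance $\hat{\sigma}^2$ is used in place of the unknown population variance, and $k(W)$ refers to the effective number of coefficients, with $k_m$ the number of coefficients in model $m$:
$$k(W) = \sum_{m=1}^{M} w_m k_m \qquad (8)$$
For some fixed integer $N$, the weights are derived by minimizing equation (6) subject to the constraint $W \in H_k^*(N)$. Hansen (2008) adopted the following discrete set $H_k^*(N)$ for the weights:
$$H_k^*(N) = \left\{ W : w_m \in \left\{0, \tfrac{1}{N}, \tfrac{2}{N}, \dots, 1\right\}, \ \sum_{m=1}^{M} w_m = 1 \right\}$$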

Bayesian Combination
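The logic behind the BMA is that, if there are $k$ potential models, only one of them is the true one. Then, we estimate the posterior distribution as the weighted average of the conditional predictive densities of the included models. The predictive density of $y_{t+1}$, given the observed data $D$ available up to time $t$, is estimated using the posterior probabilities as follows (Note 3):
$$p(y_{t+1} \mid D) = \sum_{i=1}^{k} p(y_{t+1} \mid M_i, D)\, p(M_i \mid D)$$
The weight assigned to model $M_i$ is its posterior probability:
$$w_i = p(M_i \mid D) = \frac{p(D \mid M_i)\, p(M_i)}{\sum_{j=1}^{k} p(D \mid M_j)\, p(M_j)}$$
where $D$ is the specified dataset and $p(M_i)$ is the prior probability associated with model $M_i$.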
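As a rough illustration of how posterior model probabilities turn into weights, the sketch below uses the standard BIC approximation to the marginal likelihood, $p(D \mid M_i) \approx \exp(-\mathrm{BIC}_i/2)$. The approximation, the function name and the BIC values are assumptions made for illustration; the paper does not state that its weights were computed this way.

```python
import math


def bma_weights(bics, priors=None):
    """Approximate posterior model probabilities from BIC values and priors."""
    k = len(bics)
    if priors is None:
        priors = [1.0 / k] * k  # uniform prior over the k models
    best = min(bics)
    # Subtract the smallest BIC before exponentiating for numerical stability.
    unnormalised = [p * math.exp(-0.5 * (b - best)) for b, p in zip(bics, priors)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical BIC values for three candidate models.
posterior = bma_weights([100.0, 102.0, 105.0])
print([round(w, 4) for w in posterior])

# The same likelihood information under a non-uniform prior.
posterior_alt = bma_weights([100.0, 102.0, 105.0], priors=[0.2, 0.5, 0.3])
print([round(w, 4) for w in posterior_alt])
```

The `priors` argument makes the dependence on the prior explicit: the same BIC values give different weights under different prior model probabilities.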

The Proposed Combination Approaches
This subsection presents the two proposed combination schemes: namely, the modified BMA; and ETVC averaging methods.

The modified Bayesian Approach
We propose a combination of both frequentist and Bayesian schemes by using the results of the former combination methods as priors inside the latter technique. This suggestion is advantageous since it should improve forecasting accuracy and it also overcomes the subjective choice of priors inside the Bayesian approach. Therefore, substituting the weights $w_i^{0}$ resulting from either the OLS or the IMSFE scheme for the prior model probabilities in the same Bayesian identification as in the previous procedure gives:
$$w_i = \frac{p(D \mid M_i)\, w_i^{0}}{\sum_{j=1}^{k} p(D \mid M_j)\, w_j^{0}}$$

The Extended Time-Varying Coefficient (ETVC) Approach
As indicated in the introduction, Swamy et al. (2010) used the theorem of Swamy and Mehta (1975) as the basis for deriving the TVC technique as a method for estimating consistent parameters. According to this theory, any nonlinear functional form can be specified precisely as a linear model whose parameters are allowed to vary over time. This indicates that, by using the TVC model, we can always estimate any relationship even though we do not have enough knowledge about the specification of the true function. Therefore, we can obtain consistent coefficients while taking into consideration any nonlinearity that may exist in the actual dataset and the possibility that some relevant variables are missing from the estimated models.
Forecast combination can face the misspecification problem exactly like individual models. Specifically, combined predictions can suffer from the same omission bias as other least squares regression models. Therefore, the estimation of optimum weights may be exposed to the omission problem since it is not expected to include all possible specifications. Additionally, the relationship between the forecasted variable and the regressors (i.e. the alternative predictors) inside the combination scheme may be nonlinear (Landram, Shah, & Landram, 2011; Dong, 2002). Consequently, traditional estimation methods such as OLS are invalid, while ETVC is appropriate under these circumstances.
Based on the TVC approach, we can express any nonlinear relation used to average different forecasts as follows:
$$y_t = \beta_{0t} + \beta_{1t} f_{1t} + \cdots + \beta_{kt} f_{kt}$$
Thus, $f_{it}$ represents the predictions resulting from the different models and $\beta_{it}$ are the time-varying coefficients (or weights) for each model. As indicated in the introduction, the TVC approach depends on the selection of some relevant variables to serve as coefficient drivers. Initially, the selection of those drivers depends on a set of "arbitrary" assumptions which usually make this approach challenging to conduct. Hall et al. (2014) decomposed the estimated TVCs into two components: biased and unbiased. The biased part contains the coefficients associated with the misspecification of the model. Therefore, this component should be eliminated in order to obtain consistent estimates of the correct functional form. We follow Hall et al.'s (2014) technique to obtain consistent estimates of the weights assigned to individual models in our forecast combination exercise. The following assumptions should be imposed to make the above mentioned technique operational (Note 5).
Assumption 1: each coefficient can be expressed as a linear function of a group of variables (coefficient drivers) plus a random error. On the assumption that we have fixed parameters $\pi_{jd}$ and $p$ coefficient drivers $z_{dt}$, the time-varying coefficients are written as:
$$\beta_{jt} = \pi_{j0} + \sum_{d=1}^{p} \pi_{jd} z_{dt} + u_{jt} \qquad (15)$$
Assumption 2: we can categorise the group of coefficient drivers and the constant in equation (15) into three different subgroups, namely $S_1$, $S_2$ and $S_3$. Here $S_1$ is related to the variability in the true parameter because of the nonlinearity of the relationship, while $S_2$ and $S_3$ are associated with the omitted variable bias and the measurement error bias respectively. Moreover, the selected driver set should satisfy some prerequisites. As a rule, the full set of drivers should produce a well-specified relationship and it should have high predictive power.
In this paper, we base our estimate of the ETVC on the full coefficients obtained in the TVC approach with some driver sets inside each time-varying coefficient. In addition to some nonlinear transformations of each alternative forecast, we include some polynomial components to be able to account for any bias and for any possible nonlinearity inside the actual data of the forecasted variable (Note 6).
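The two assumptions can be illustrated with a small sketch. It assumes each weight is an intercept plus a linear function of the drivers (with the random error set to its zero mean), and that the bias-free component keeps only the intercept and the drivers attributed to nonlinearity. All names, driver values and coefficients below are hypothetical, and the estimation of the coefficients is not shown.

```python
def tvc_weight(pi0, pis, drivers_t):
    """Full time-varying weight: intercept plus a linear function of the drivers."""
    return pi0 + sum(p * z for p, z in zip(pis, drivers_t))


def bias_free_weight(pi0, pis, drivers_t, keep):
    """Bias-free component: intercept plus only the drivers whose indices are in
    `keep` (those attributed to nonlinearity rather than to omitted variables
    or measurement error)."""
    return pi0 + sum(pis[d] * drivers_t[d] for d in keep)


def combined_forecast(weights_t, forecasts_t):
    """Combined forecast for one period from period-specific weights."""
    return sum(w * f for w, f in zip(weights_t, forecasts_t))


# Hypothetical setup: two models, two drivers observed at time t.
drivers_t = [1.0, 2.0]
params = [(0.5, [0.2, -0.1]), (0.4, [-0.1, 0.05])]  # (intercept, driver coefficients)
full = [tvc_weight(p0, ps, drivers_t) for p0, ps in params]
clean = [bias_free_weight(p0, ps, drivers_t, keep=[0]) for p0, ps in params]
print(full, clean, combined_forecast(clean, [0.021, 0.025]))
```

Here the second driver is treated as belonging to the bias-related subgroups, so it is dropped when the bias-free weights are formed.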

Time-Varying with Random Walk Updating Coefficients (TVR)
Finally, it is useful to compare our proposed ETVC combination scheme with other time-varying coefficient methods. One of the most common alternatives is the Time-Varying with Random Walk Coefficients (TVR) method. According to this methodology, the combination of different point forecasts can be written in the form (Note 7):
$$y_t = \beta_{1t} f_{1t} + \cdots + \beta_{kt} f_{kt} + \varepsilon_t \qquad (16)$$
$$\beta_{it} = \beta_{i,t-1} + v_{it} \qquad (17)$$
where $y_t$ and $f_{1t}, \dots, f_{kt}$ are the actual series and the predictions of the $k$ models. Additionally, the time-varying coefficients $\beta_{it}$ are modelled as random walk processes, and the error terms $\varepsilon_t$ and $v_{it}$ are independent and normally distributed with zero means and constant variances.
Both non-constant combination schemes can be written in state space form, in which the first equation is assigned to the variable of interest and there is one state equation for each time-varying coefficient. The state space model can be estimated by utilizing the predictive Kalman Filter algorithm as a set of recursive relationships with respect to the predictions (Harvey, 1989). However, compared to TVR, the ETVC is a more sophisticated approach for fitting the non-constant coefficients since it incorporates extensive information to overcome many misspecifications in the underlying relationship, as mentioned earlier. This means that ETVC is expected to give a better imitation of the real data generating process; this is the ultimate aim of all econometricians.
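The state space recursion for random walk weights can be sketched as a plain Kalman filter. This is an illustrative sketch only: the variances are treated as known rather than estimated, the equal-weight starting point is an assumption, and the data are hypothetical.

```python
def tvr_filter(y, forecasts, obs_var=1.0, state_var=0.01, init_var=10.0):
    """Kalman filter for y_t = x_t' b_t + e_t with random walk weights b_t.

    forecasts[t] holds the k model forecasts for period t. Returns the filtered
    weight vectors and the one-step-ahead combined forecasts.
    """
    k = len(forecasts[0])
    b = [1.0 / k] * k  # start from equal weights (an assumption)
    P = [[init_var if i == j else 0.0 for j in range(k)] for i in range(k)]
    weights, preds = [], []
    for yt, x in zip(y, forecasts):
        # Prediction step: the random walk keeps b and inflates its covariance.
        for i in range(k):
            P[i][i] += state_var
        pred = sum(bi * xi for bi, xi in zip(b, x))
        Px = [sum(P[i][j] * x[j] for j in range(k)) for i in range(k)]
        S = sum(xi * pxi for xi, pxi in zip(x, Px)) + obs_var
        gain = [pxi / S for pxi in Px]
        # Update step driven by the one-step-ahead prediction error.
        err = yt - pred
        b = [bi + g * err for bi, g in zip(b, gain)]
        P = [[P[i][j] - gain[i] * Px[j] for j in range(k)] for i in range(k)]
        weights.append(list(b))
        preds.append(pred)
    return weights, preds


# Hypothetical forecasts from two models and a target that mixes them 60/40.
fc = [[1.0 + 0.1 * t, 2.0 - 0.05 * t] for t in range(30)]
target = [0.6 * a + 0.4 * c for a, c in fc]
w_path, p_path = tvr_filter(target, fc, obs_var=1e-4, state_var=0.0)
print([round(w, 4) for w in w_path[-1]])
```

With `state_var=0.0` the filter reduces to recursive least squares, which is why the weights settle near the constant mix used to build the target. A positive `state_var` lets the weights drift over time.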

Models
This section presents the employed linear and nonlinear specifications. The nonlinear models include the time-varying conditional volatility models, GARCH-M and TARCH-M. Then, we present the GARCHSK-M model, which assumes that the conditional distribution is time-varying in the first four moments. Additionally, we employ the Neural Network (NN) and Markov Switching (MS) models. This is followed by introducing two structural linear models, namely, BVAR and the modern Dynamic Stochastic General Equilibrium (DSGE) model augmented with the Vector Autoregressive model, which yields the DSGE-VAR model. Finally, we briefly present the Time-Varying Coefficients Autoregressive model.

GARCH Model with Generalised Error Distribution
We employ the GARCH model, developed by Bollerslev (1986) as an extension of the ARCH model introduced by Engle in 1982. The basic GARCH model estimates the conditional volatility as a function of its past lags as well as lagged squared errors. This model is estimated using the Maximum Likelihood (ML) approach. In the GARCH(1,1) model, the conditional variance equation can be written as:
$$h_t = \beta_0 + \beta_1 \varepsilon_{t-1}^2 + \beta_2 h_{t-1} \qquad (19)$$
where $\varepsilon_{t-1}^2$ are the past squared errors. Given that the variance must be positive, the parameters of equation (19) must be non-negative (i.e., $\beta_0 \ge 0$, $\beta_1 \ge 0$, $\beta_2 \ge 0$). Furthermore, to guarantee that $h_t$ is stationary, the sum of the ARCH and GARCH parameters must be less than unity (i.e., $\beta_1 + \beta_2 < 1$).
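The variance recursion and its parameter restrictions can be illustrated as follows. The parameter names (`beta0` for the constant, `beta1` for the ARCH term, `beta2` for the GARCH term), the starting value and the residuals are assumptions for illustration. Estimation by maximum likelihood is not shown.

```python
def garch11_variance(residuals, beta0, beta1, beta2):
    """Conditional variances h_t = beta0 + beta1 * e_{t-1}^2 + beta2 * h_{t-1}."""
    if min(beta0, beta1, beta2) < 0:
        raise ValueError("parameters must be non-negative")
    if beta1 + beta2 >= 1:
        raise ValueError("beta1 + beta2 must be below one for stationarity")
    # Start the recursion at the unconditional variance (a common convention).
    h = [beta0 / (1.0 - beta1 - beta2)]
    for e in residuals[:-1]:
        h.append(beta0 + beta1 * e * e + beta2 * h[-1])
    return h


# Hypothetical mean-equation residuals.
h_path = garch11_variance([0.1, -0.2, 0.05], beta0=0.01, beta1=0.1, beta2=0.8)
print([round(h, 4) for h in h_path])
```

Each variance depends only on the previous squared residual and the previous variance, so a large shock raises volatility in the next period and then decays at a rate governed by `beta2`.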

TARCH-M Model with T-Distribution
There is criticism of the GARCH(p,q) models because they assume that the variance responds similarly to positive and negative shocks. Glosten, Jagannathan, and Runkle (1993) developed the TARCH specification to capture the asymmetric response of financial time series to different signs of the shock. According to this specification, the conditional volatility can be written as:
$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \gamma\, \varepsilon_{t-1}^2 d_{t-1} + \beta h_{t-1}$$
where $d_{t-1}$ equals one if $\varepsilon_{t-1} < 0$ and zero otherwise. The following conditions are required to ensure that the conditional volatility is strictly positive: $\alpha_0 > 0$, $\alpha_1 > 0$, $\alpha_1 + \gamma > 0$, $\beta \ge 0$. The asymmetry parameter $\gamma$ can be positive or negative depending on the shock. Also, this coefficient is used to measure the contributions of shocks to persistence over both the short run and the long run: $(\alpha_1 + \gamma/2)$ and $(\alpha_1 + \beta + \gamma/2)$ respectively.
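The asymmetric response can be made concrete with a short sketch of a threshold variance recursion. The parameter names (`alpha0`, `alpha1`, `gamma`, `beta`), the indicator for negative shocks and the numerical values are assumptions chosen for illustration.

```python
def tarch_variance(residuals, alpha0, alpha1, gamma, beta, h0):
    """Threshold GARCH recursion: negative shocks add gamma to the ARCH effect."""
    h = [h0]
    for e in residuals[:-1]:
        negative = 1.0 if e < 0 else 0.0
        h.append(alpha0 + (alpha1 + gamma * negative) * e * e + beta * h[-1])
    return h


# The same shock size with opposite signs (hypothetical values).
after_positive = tarch_variance([0.1, 0.0], 0.01, 0.05, 0.10, 0.8, h0=0.1)[1]
after_negative = tarch_variance([-0.1, 0.0], 0.01, 0.05, 0.10, 0.8, h0=0.1)[1]
print(after_positive, after_negative)
```

With a positive `gamma`, the negative shock raises next-period variance by more than a positive shock of the same size, which is the asymmetry that a symmetric GARCH specification cannot capture.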

Modelling Conditional Variance, Skewness and Kurtosis
Based on the Gram-Charlier series expansion of the normal density, Leon, Rubio, and Serna (2005) proposed a new methodology to estimate jointly the time-varying conditional second, third and fourth moments. Specifically, they assumed that each of these moments is generated by a GARCH-type process. Let GARCHSK-M indicate the time-varying higher order moments model when the conditional variance, skewness and kurtosis follow a GARCH(1,1) specification. The GARCHSK-M model is estimated in steps, starting with the GARCH(1,1) model of inflation. Then, the estimated parameters are used as starting values for the equations of the mean and variance in the GARCHSK-M model. Thus, the variance equation takes the form described in equation (19).

Neural Network (NN) (Note 8)
Artificial NN models allow complex nonlinear relationships and generate forecasts based on simple mathematical models of the brain. They can be seen as a network of "neurons" organised in layers. The inputs form the bottom layer, the outputs or forecasts form the top layer, and the "hidden neurons" form the intermediate layers. Thus, by using lagged values as inputs inside the neural network system, the NN model can be utilized to estimate nonlinear autoregressive models for a particular variable. In this case, the relationship can be represented in the form NNT(p,k) where p is the number of lags and k is the number of hidden nodes inside the layers. We employ a Feed-Forward network with one hidden layer and three nodes, which depends on the training sample to fit the data. Then, we obtain the out-of-sample forecasts using a learning algorithm that minimizes a particular loss function (Hyndman & Athanasopoulos, 2013).

Markov Switching Regression
This model is based on decomposing a series into a finite sequence of distinct stochastic processes or regimes. The process within each regime is linear, but the combination of the processes generates a nonlinear process. The autoregressive model, which is subject to changes in the autoregressive parameter, can be expressed as in the following system. Equations (24) and (25) assume that we have two regimes (Brooks, 2002; Laurini & Portugal, 2002):
$$y_t = \mu_1 + \beta_1 y_{t-1} + \varepsilon_t \quad \text{if } s_t = 1 \qquad (24)$$
$$y_t = \mu_2 + \beta_2 y_{t-1} + \varepsilon_t \quad \text{if } s_t = 2 \qquad (25)$$
The parameters $\mu_1$ and $\beta_1$ capture the behaviour of the series when the current regime is the first one, while $\mu_2$ and $\beta_2$ describe the behaviour of the series in the second regime. In this paper, we employ the Markov Chain method and assume that the probability that the state variable $s_t$ takes a particular value $j$ depends only on its previous value $s_{t-1}$. This is represented by the following equation:
$$P(s_t = j \mid s_{t-1} = i, s_{t-2}, \dots) = P(s_t = j \mid s_{t-1} = i) = p_{ij} \qquad (26)$$
where $p_{ij}$ gives the probability that state $j$ follows state $i$. The key feature of this first-order Markov transition matrix is that the probability of transition to the next regime depends only on the current state (Laurini & Portugal, 2002).
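The first-order Markov property can be illustrated with a two-regime chain. The sketch below propagates regime probabilities one step ahead, computes the long-run regime probabilities, and forms a probability-weighted AR(1) forecast. The transition probabilities, intercepts and slopes are hypothetical, and estimation of the model is not shown.

```python
def predict_regime_probs(current_probs, transition):
    """One-step-ahead regime probabilities; transition[i][j] = P(next=j | now=i)."""
    n = len(current_probs)
    return [sum(current_probs[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def ergodic_probs_two_regimes(p11, p22):
    """Long-run regime probabilities of a two-state first-order chain."""
    denom = 2.0 - p11 - p22
    return [(1.0 - p22) / denom, (1.0 - p11) / denom]


def ms_ar1_forecast(y_t, regime_probs_next, intercepts, slopes):
    """Probability-weighted one-step forecast from regime-specific AR(1) equations."""
    return sum(p * (c + b * y_t)
               for p, c, b in zip(regime_probs_next, intercepts, slopes))


# Hypothetical transition matrix and regime-specific parameters.
P = [[0.9, 0.1], [0.2, 0.8]]
next_probs = predict_regime_probs([1.0, 0.0], P)
long_run = ergodic_probs_two_regimes(0.9, 0.8)
forecast = ms_ar1_forecast(0.02, next_probs, intercepts=[0.01, 0.03], slopes=[0.5, 0.8])
print(next_probs, long_run, forecast)
```

Only the current regime probabilities enter `predict_regime_probs`, which is exactly the restriction that the next regime depends on the current state alone.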

Bayesian Vector Autoregressive (BVAR)
The Vector Autoregressive (VAR) model is a common workhorse for forecasting purposes. In this study, our VAR model includes four variables which are usually incorporated when modelling inflation in a small open economy. Moreover, we employ the Bayesian approach, which combines the initial priors with the fit to the data to produce the final posteriors. This type of model is advantageous in comparison with the classical VAR, especially in small samples, since it allows for more degrees of freedom by including priors in the initial fitting values. The included variables are CPI inflation, the GDP growth rate, changes in the real exchange rate and oil prices.

Semi-Structural Model
Dynamic Stochastic General Equilibrium (DSGE) models are employed regularly as they provide analytical tools to better understand the equilibrium relationships inside the economy. However, DSGE models have received much criticism in terms of forecasting accuracy. Therefore, many efforts have been made to improve their predictions (Ingram & Whiteman, 1994; Schorfheide, 2000; Del Negro & Schorfheide, 2004; Del Negro, Schorfheide, Smets, & Wouters, 2007; Gupta & Steinbach, 2013). These studies recommended that the structural DSGE and the VAR models be merged. According to Del Negro and Schorfheide (2004), the DSGE estimated parameters might provide useful information for the VAR parameters. Therefore, subject to the relative weight ($\lambda$) assigned to each type of data, the VAR model should be estimated based on both the actual data and the DSGE priors. This study uses that methodology to compute the forecasts from the DSGE-VAR model based on the optimization assumptions for micro-agents (Woodford, 2003). The corresponding model encompasses a representative household, a sequence of monopolistically competitive firms and the central bank (for more details, see Appendix A). The BVAR posterior estimate is conditional on the value of the relative weight associated with the DSGE model and the unrestricted VAR. The optimal value of $\lambda$ can be estimated such that it maximizes $p(Y \mid \lambda)$ as follows:
$$\hat{\lambda} = \arg\max_{\lambda} p(Y \mid \lambda)$$

Time-Varying Coefficients Autoregressive
We analyse a linear autoregressive model with time-varying coefficients that can be presented in the following form:
$$y_t = \beta_{0t} + \beta_{1t} y_{t-1} + \cdots + \beta_{pt} y_{t-p} + \varepsilon_t \qquad (30)$$
The underlying coefficients follow a random walk process. The model can be characterised in a state space form which can be solved by a predictive Kalman Filter algorithm (Note 9).
Here, $Y$ is CPI inflation, $\hat{\Sigma}$ is the estimated covariance matrix, and the optimal lag length is determined based on the AIC.

Data and Preliminary Check
We employ quarterly data sourced from International Financial Statistics (IFS) for the period 1957:1 to 2015:1. We include the CPI, the nominal exchange rate, Gross Domestic Product (GDP) and the nominal interest rate (r) for Egypt, as well as world oil prices (oil) and the CPI of the USA. Inflation is computed as the quarterly change in the logarithm of the CPI. We chose the sample to include the largest number of available observations in order to provide more accurate results. Table 1 displays the basic descriptive statistics for the data which, according to the Jarque-Bera (JB) test statistic, are unlikely to be drawn from a normal distribution.
We follow the Box-Jenkins approach in selecting the best specification of the mean equation for the models which allow for volatility modelling. For both the GARCH and GARCHSK-M models, the selected specification includes the first and fourth lags of inflation, while the TARCH specification is represented as an ARMA(1,2) process. These specifications are selected according to both the AIC and SIC criteria and they are free of serial correlation in the errors. However, serial autocorrelation amongst the residuals of these different models exists in the sequences of $\varepsilon_t^2$, $\varepsilon_t^3$ and $\varepsilon_t^4$. Furthermore, the ARCH LM test indicates ARCH effects in the residuals. Therefore, the models which assume time-varying conditional variance and higher order moments are more suitable for modelling inflation. Also, we add two dummies in the volatility equations to account for the shift to the open door policy in 1974 and the start of the Economic Reform and Structural Adjustment Programme (ERSAP) in May 1991.
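The inflation transformation and the normality check can be sketched as follows. The CPI values are hypothetical, and the Jarque-Bera statistic is written in its standard textbook form using population moments, which is assumed here rather than confirmed as the exact variant reported in Table 1.

```python
import math


def log_inflation(cpi):
    """Quarterly inflation as the change in the logarithm of the CPI."""
    return [math.log(curr / prev) for prev, curr in zip(cpi, cpi[1:])]


def jarque_bera(x):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Hypothetical CPI index values for five consecutive quarters.
cpi = [100.0, 102.0, 103.5, 107.0, 108.0]
inflation = log_inflation(cpi)
print([round(v, 4) for v in inflation])
print(round(jarque_bera(inflation), 4))
```

Under normality the statistic is asymptotically chi-squared with two degrees of freedom, so large values point away from a normal distribution.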

Results
We estimated the models over the period 1957:1-2000:2. We then used the rest of the observations to predict out-of-sample inflation in order to allow a suitable number of observations to be employed in the combination of forecasts. Table 2 presents the results of the first three models: GARCH-M, TARCH-M and GARCHSK-M. As shown in the Table, the volatility persistence parameter is positive and significant in all models, with the lowest magnitude being in the GARCHSK-M model. Concerning the volatility effect in the mean equation for the GARCHSK-M model, the estimated parameter is both positive and significant. In addition, the effects of shocks to variance are significant in all models, with the lowest magnitude being in the GARCHSK-M model. With reference to the conditional skewness, the shocks parameter is significant with a negative sign while the persistence parameter is both positive and significant. In the same way, the shocks to conditional kurtosis and the persistence parameters are both positive and significant. Furthermore, the lagged kurtosis coefficient is greater than that of the lagged volatility, whereas the shock effects on kurtosis are the smallest when compared to the effects of shocks to volatility and skewness.
Table 2. GARCH-M (GED), TARCH-M (t-dist) and GARCHSK-M models
Regarding the Artificial feed-forward Network model, it is fitted with a single hidden layer with three nodes and four lags of quarterly inflation. This allows us to obtain the in-sample filtered forecasts over the training period and then the out-of-sample predictions. Figure 2 shows the in-sample filter of the model while Figure 3 displays the out-of-sample forecasts, where the red lines represent the estimated series and the blue lines represent the actual data.
In the MS model, the suitable number of regimes is selected according to the AIC. Figure 4 displays the prediction, filtered, and smoothed probabilities of the model. The prediction probabilities are computed using the information up to period t-1 to infer the probabilities at time t, whereas the filtered probabilities use the information up to period t.
The structural models, BVAR and DSGE-VAR, are estimated over the period 1982:1-2000:2 using the Bayesian approach. The BVAR model includes an exogenous dummy variable to account for the ERSAP structural break. The lag length is chosen according to three criteria: the Likelihood Ratio (LR), the Final Prediction Error (FPE), and the AIC. Diagnostic tests show that the model is well-specified.
On the other hand, the DSGE-VAR model consists of some linear equations that describe the dynamics of the endogenous variables. The model is solved by using the standard Blanchard-Kahn condition, and the Metropolis-Hastings algorithm is used to derive the posterior distribution. Table 3 presents the estimation results of the DSGE-VAR model; it shows that $\hat{\lambda}$ is 0.6749 (Note 10).
In order to estimate the Time-Varying Autoregressive model, we utilise the predictive Kalman Filter approach with random walk coefficients as explained previously. Also, the AIC indicates that the best lags are the first and the fourth, with a constant term included in each equation. Figure 6 shows the time variation of the coefficients.

Forecasting Performance of Estimated Models
Table 4 shows the different measures used to assess the accuracy of predictions. The first two criteria, namely RMSE and MSE, depend on the scale of the dependent variable, which implies that they should be used as relative measures to compare forecasts of the same series across different models. According to these criteria, the smaller the error, the better is the related model's forecasting ability. The Theil inequality coefficient (TIC) must lie between zero and one, where zero is a sign of perfect fit.
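For reference, the three accuracy measures can be computed as in the sketch below. The Theil inequality coefficient is written in its common form, the RMSE divided by the sum of the root mean squares of the forecast and the actual series. This is assumed to match the definition in Table 4, and the data are hypothetical.

```python
import math


def mse(actual, forecast):
    """Mean squared forecast error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared forecast error."""
    return math.sqrt(mse(actual, forecast))


def theil_inequality(actual, forecast):
    """Theil inequality coefficient, bounded between zero and one."""
    n = len(actual)
    scale = (math.sqrt(sum(f * f for f in forecast) / n)
             + math.sqrt(sum(a * a for a in actual) / n))
    return rmse(actual, forecast) / scale


# Hypothetical actual values and forecasts.
actual = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 4.0]
print(mse(actual, forecast), rmse(actual, forecast), theil_inequality(actual, forecast))
```

Unlike MSE and RMSE, the Theil coefficient is scale-free, which is why a model can rank well on one measure and poorly on the other.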
Table 5 presents the results of the different measures for the one-step-ahead out-of-sample forecasts. According to the three employed criteria, the DSGE-VAR model is the best model whereas the BVAR model is the worst one.
Additionally, based on all the measures, the TVAR model provides good forecasts. Also, although the TARCH model is good in terms of both the RMSE and MSE measures, its performance is poor in terms of the TIC measure. The numbers in square brackets indicate the rankings of the models, where [1] indicates the best model according to the corresponding measure.

Combination Results
The main aim of any forecast combination procedure is to improve on the prediction accuracy of the individual forecasts. Therefore, a good combination scheme should be superior to all individual forecasts and should perform well compared to the other competing combination methods. In our analysis, we compare the forecasting performance of the different combination schemes and the best model in terms of MSE and RMSE. Table 6 reports the results of comparing the predictive power of all combination methods, whereas Table B1 presents the weights associated with the individual models according to the different static combination schemes. Additionally, the time-varying weights, based on both random walk updated coefficients and ETVC, are shown in Figures B1 and B2 respectively. In general, we observe that the dynamic combination technique using ETVC dominates the best model, all other static combination schemes and, also, the traditional time-varying scheme with random walk updated coefficients in our exercises.
Specifically, the dynamic combination scheme using ETVC is the most accurate combination procedure since it outperforms the best individual model and all other combination methods in terms of all forecasting measures. Additionally, as shown in Table 6, the best individual forecast is superior to both the simple and the frequentist schemes, since the GR, MMA, MFE and EQ combination methods have higher values of MSE, RMSE and TIC than the best individual model. Concerning the Bayesian combination schemes, substituting initial priors based on the other combination schemes proved to be very successful: the traditional Bayesian approach had values of 0.000185 and 0.0136 for MSE and RMSE respectively, whereas the Bayesian combination augmented with the IRMSFE scheme achieved 0.000180 and 0.0134, and the Bayesian combination augmented with the OLS scheme achieved 0.00017964 and 0.0134. It is worth noting that we did not face the puzzle of the forecast combination literature, in which the EQ approach outperforms sophisticated combination methods. There are two reasons for this. Firstly, our initial models are heterogeneous, which implies that each model has its specific information and some specific features. Secondly, we used more sophisticated combination methods, such as ETVC and the modified Bayesian combination technique, which proved to work very well in averaging different forecasts.
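For readers who wish to see how static weights of this kind can be computed, the following Python sketch builds three simple schemes: equal weights (EQ), weights proportional to the inverse RMSE of each model (our reading of the IRMSFE scheme, which is an assumption), and unrestricted OLS weights from a regression of the realised values on the forecasts without a constant. The function names and the simulated forecasts are hypothetical and do not reproduce the figures in Table 6.

```python
import numpy as np

def equal_weights(forecasts):
    """EQ scheme: every model receives the same weight."""
    k = forecasts.shape[1]
    return np.full(k, 1.0 / k)

def inverse_rmse_weights(actual, forecasts):
    """Weights proportional to the inverse RMSE of each model (assumed IRMSFE form)."""
    rmse = np.sqrt(np.mean((forecasts - actual[:, None]) ** 2, axis=0))
    inv = 1.0 / rmse
    return inv / inv.sum()

def ols_weights(actual, forecasts):
    """Regress the realised values on the forecasts (no constant, unrestricted weights)."""
    weights, *_ = np.linalg.lstsq(forecasts, actual, rcond=None)
    return weights

def combination_mse(actual, forecasts, weights):
    """MSE of the weighted combination of the individual forecasts."""
    return np.mean((forecasts @ weights - actual) ** 2)

# Simulated data (hypothetical): three models with different error scales.
rng = np.random.default_rng(1)
actual = 0.02 + 0.01 * rng.standard_normal(40)
noise_scales = np.array([0.002, 0.006, 0.004])
forecasts = actual[:, None] + noise_scales * rng.standard_normal((40, 3))

w_eq = equal_weights(forecasts)
w_inv = inverse_rmse_weights(actual, forecasts)
w_ols = ols_weights(actual, forecasts)
```

In this sketch the weights are computed and evaluated on the same sample, so the OLS combination cannot have a larger MSE than the equally weighted one; in an out-of-sample exercise such as ours, that ordering is not guaranteed.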

Conclusion and Policy Implications
Using Egyptian quarterly inflation data, the paper aimed at improving inflation prediction by combining the forecasts of different linear and nonlinear models. In choosing the optimal weights associated with the alternative models, we used not only the traditional approaches but also proposed two advanced approaches, namely the modified BMA and ETVC. In order to avoid an arbitrary choice of priors, we based the proposed modified BMA on using the weights of some frequentist combination methods as priors inside the traditional Bayesian technique. The ETVC technique allowed us to compute consistent optimal weights even if there were measurement errors, misspecification or if the correct functional form was not identified. Consequently, it added more information to the combination scheme since it depended on some transformations of the predictors in the state variables and not only on their linear form.
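The idea of feeding frequentist weights into the Bayesian scheme as priors can be sketched as follows. This is a simplified illustration under stated assumptions: posterior model weights are taken as proportional to the prior weight times a likelihood, and a Gaussian likelihood of past forecast errors stands in for the marginal likelihood of each model. The function names, the prior values and the data are hypothetical, and the sketch is not the exact procedure used in the paper.

```python
import numpy as np

def gaussian_log_likelihood(actual, forecast):
    """Concentrated Gaussian log-likelihood of the forecast errors (assumed stand-in)."""
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    sigma2 = np.mean(errors ** 2)
    n = errors.size
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)

def posterior_model_weights(prior_weights, log_likelihoods):
    """Posterior weights proportional to prior times likelihood.

    The priors must be strictly positive, so schemes that can yield negative
    weights would need to be adjusted before being used here.
    """
    prior = np.asarray(prior_weights, dtype=float)
    loglik = np.asarray(log_likelihoods, dtype=float)
    log_post = np.log(prior) + loglik
    log_post -= log_post.max()       # stabilise the exponentials
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical data: two models, the first with smaller past errors.
actual = np.array([0.020, 0.025, 0.030, 0.022, 0.027])
model_a = np.array([0.021, 0.024, 0.031, 0.021, 0.028])
model_b = np.array([0.024, 0.021, 0.034, 0.018, 0.031])

# Hypothetical prior weights, e.g. taken from a frequentist combination scheme.
prior = [0.6, 0.4]
loglik = [gaussian_log_likelihood(actual, m) for m in (model_a, model_b)]
posterior = posterior_model_weights(prior, loglik)
```

When the likelihoods of all models are equal, the posterior weights reduce to the prior weights, which shows how the frequentist scheme anchors the Bayesian combination in place of a subjective prior.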
The results indicate that, according to the MSE and RMSE criteria, the Semi-Structural model is the best model while the BVAR gives the worst forecasts. Concerning the combination techniques, the proposed ETVC approach dominates the best model, the time-varying scheme with random walk coefficients (TVR) and all other static combination schemes. Furthermore, the suggested modified Bayesian approach improves on the traditional BMA and overcomes the problem of the subjective choice of initial priors.
Based upon the above conclusions, when generating inflation predictions, the Central Bank of Egypt should consider a range of different models, including linear and nonlinear models. Also, we recommend that the Central Bank of Egypt does not depend on a single model for prediction purposes. Instead, we recommend that the sophisticated combination schemes, namely ETVC and the modified BMA, be used to improve the published forecasts for inflation.
This research could be extended in different ways. Firstly, we based our forecasts on some linear and nonlinear models and recommend that more models be incorporated, such as the bilinear or the Dynamic Factor models. Secondly, we recommend that the ETVC combination scheme be extended by incorporating Bayesian probability techniques, since we consider that this would be a worthwhile starting point for future studies.
Note. The estimation of ETVC is based on the full coefficients method in the TVC approach, with two driver sets inside each time-varying coefficient, which include the first lag of the associated point forecast variable and its quadratic form to account for the nonlinearity inside the combination scheme. We tried to add more nonlinear terms, but the quadratic forms were the most significant ones.

Appendix B Results of forecast combinations
the number of considered models, and $p(M_i \mid D)$ is the posterior probability of model $M_i$. Additionally, $p(y \mid M_i, D)$ is the predictive density conditional on the model $M_i$ and the data $D$. Based on the parameters $\theta_i$ and the other information set in model $M_i$, we can calculate the conditional predictive density as:
$p(y \mid M_i, D) = \int p(y \mid \theta_i, M_i, D)\, p(\theta_i \mid M_i, D)\, d\theta_i$ (11)
where $p(y \mid \theta_i, M_i, D)$ can be defined as the conditional predictive density of $y$ given $\theta_i$, $M_i$ and $D$. Then, the posterior probability of model $M_i$ can be estimated by: (Note 4).

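Figure B2. Weights assigned to different models according to ETVC
where $\varepsilon_t$ is the error term, $\eta_t$ is the standardized residual, and $h_t$, $s_t$ and $k_t$ are the conditional volatility, skewness and kurtosis corresponding to $\eta_t$, respectively. They determined that $E_{t-1}(\eta_t) = 0$, $E_{t-1}(\eta_t^2) = 1$, $E_{t-1}(\eta_t^3) = s_t$ and $E_{t-1}(\eta_t^4) = k_t$.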
) while the mean, skewness and kurtosis equations are represented by the following set of equations: Mean equation:

[Table 2 header: estimate and p-value columns for each of the GARCH-M (GED), TARCH-M (t-dist) and GARCHSK-M models; first panel: Mean Equation]
All models are estimated by maximum likelihood (ML) using the Marquardt algorithm. Significant p-values are indicated in bold.

Table 4. Different criteria of predictive power

Table 5. Out-of-sample forecasting power of different models for one step ahead

Table 6. Out-of-sample forecasting power of different combination methods for one step ahead

Table B1. Different weights of static models