Financial Volatility Forecasting by Least Square Support Vector Machine Based on GARCH, EGARCH and GJR Models: Evidence from ASEAN Stock Markets

This work is supported by Shanghai Leading Academic Discipline Project, Project Number: S30504.

Abstract
In this paper, we compare a semi-parametric method, the least squares support vector machine (LSSVM), with the classical GARCH(1,1), EGARCH(1,1) and GJR(1,1) models in forecasting the financial volatilities of three major ASEAN stock markets. The experimental results suggest that the hybrid models GARCH-LSSVM, EGARCH-LSSVM and GJR-LSSVM provide improved performance in forecasting leverage-effect volatilities, especially during the global financial market crash of 2008.


Introduction
Time series methods play a vital role in finance, particularly in volatility modeling and forecasting. Most financial researchers and practitioners are mainly concerned with modeling volatility in asset returns. In this context, volatility is the variability of asset prices over a particular period of time. It refers to the standard deviation of the continuously compounded returns of a financial instrument over a specific time horizon, and it is often used to quantify the risk of the instrument over that period. Investors want a premium for investing in risky assets. A risk manager must know today the likelihood that a portfolio will decline in the future and may want to sell it before it becomes too volatile. Therefore, the ability to forecast financial market volatility is important for portfolio selection and asset management as well as for the pricing of primary and derivative assets. Research on time-varying volatility using time series models has been active ever since Engle introduced the ARCH (autoregressive conditional heteroscedasticity) model in 1982. Since its introduction, the GARCH model generalized by Bollerslev (1986) has been extended in various directions. Several extensions of the GARCH model aim at capturing the asymmetry in the response of the variance to a shock. These extensions recognize that there may be important nonlinearity, asymmetry, and long-memory properties in the volatility process, as suggested by various researchers on the basis of empirical evidence. Popular approaches include the Exponential GARCH (EGARCH) model of Nelson (1991) and the GJR model of Glosten, Jagannathan, and Runkle (1993), both of which account for the asymmetric relation between stock returns and changes in variance; see Black (1976) for the initial study of the asymmetric effect and Engle and Ng (1993) for further discussion. Other models such as APARCH, AGARCH, TGARCH and QGARCH have also been developed (by Ding, Granger and Engle (1993); Engle (1990); Zakoian (1994) and Sentana (1995)) to make the models more flexible.
However, all of these models require a specified distribution of innovations in order to estimate the model and to forecast future values appropriately. The most classic choice is the Gaussian distribution, which is widely used in the literature; other innovation distributions have also attracted attention since empirical studies of returns have shown violations of normality. Examples include the Student's t distribution of Bollerslev (1987), the GED of Nelson (1991), the Laplace distribution of Granger and Ding (1995), and both the Student's t and GED of Hsieh (1989) as alternative distributional models for innovations. Researchers have found that returns usually exhibit empirical regularities including thick tails, volatility clustering and leverage effects (Bollerslev et al, 1994).
Semi-parametric approaches do not require any assumptions on data properties (i.e. the return distribution). These models have been applied successfully to modeling and forecasting time series, including volatility. One of them is the neural network (NN), a powerful tool for prediction problems owing to its ability to approximate arbitrary functions with no a priori assumption on data properties (Haykin, 1999). Donaldson and Kamstra (1997) proposed a neural network to model volatility based on GJR-GARCH; their hybrid approach captured the asymmetric effects of news impact as well as the parametric model did and also produced better forecasting accuracy. Bildirici & Ersin (2009) fitted neural networks based on nine different GARCH-family models, namely NN-GARCH, NN-EGARCH, NN-TGARCH, NN-GJR, NN-SAGARCH, NN-PGARCH, NN-NGARCH, NN-APGARCH, and NN-NPGARCH, to forecast Istanbul stock volatility, and most of the hybrid models improved forecasting performance. This indicates that the hybrid model is also able to capture the stylized characteristics of returns. Another efficient semi-parametric model is the SVM (support vector machine), originally introduced by Vapnik (1995). The SVM, a novel neural network algorithm, is guaranteed to obtain a globally optimal solution (Cristianini & Shawe-Taylor, 2000), and hence it avoids the multiple local optima in which neural networks usually get trapped. Perez-Cruz et al (2003) predicted GARCH(1,1)-based volatility by SVM, and the proposed model yielded better predictive capability than the parametric GARCH(1,1) model in all situations. Chen et al (2008) developed a recurrent SVM as a dynamic process to model GARCH(1,1)-based volatility. The experimental results with simulated and real data also showed that the model performed better than the MLE (maximum likelihood estimation) based GARCH model. For more applications of SVM in GARCH prediction based on different kernels, wavelets and spline wavelets, see Tang et al (2008, 2009).
Another version of SVM is the LSSVM (least squares support vector machine), introduced by Suykens et al (1999). The SVM algorithm requires the epsilon-insensitive loss function and leads to a convex quadratic programming problem in feature space, while LSSVM uses a least squares loss function and leads to a set of linear equations (Suykens, 2000) in the dual space, so that learning is faster and the computational complexity of the convex programming in SVM is reduced. In addition, the LSSVM avoids a drawback of SVM, the selection of the trade-off parameters $(C, \varepsilon, \sigma^2)$; instead it requires only two hyper-parameters $(\gamma, \sigma^2)$ while training the model. According to Suykens et al (2001), the equality constraints of LSSVM can act as a recurrent neural network and as nonlinear optimal control. Owing to these properties, LSSVM has been applied successfully to classification and regression problems, including time series forecasting. See Van Gestel et al (2004) for a detailed discussion of the classification performance of LSSVM and Ye et al (2004) for the predictive capability of LSSVM in chaotic time series prediction. Van Gestel et al (2001) proposed to predict the time-varying volatility of the DAX 30 index by applying the Bayesian evidence framework to LSSVM. The volatility model is constructed from the inferred hyperparameters of the LSSVM formulation within the evidence framework. The proposed model provided better predictive performance than GARCH(1,1) and other AR(10) models in terms of MSE and MAE.
In this paper, we compare the LSSVM method with the classical GARCH(1,1), EGARCH(1,1) and GJR(1,1) models in forecasting the financial volatilities of ASEAN stock markets, a setting that has not yet been investigated. The hybrid models, denoted GARCH-LSSVM, EGARCH-LSSVM, and GJR-LSSVM, are constructed by using lagged terms as input and the present term as output, in correspondence with the parametric models. The hybrid models differ from the volatility model proposed by Van Gestel et al (2001), but they are built in the same spirit as the neural network approaches of Donaldson & Kamstra (1997) and Bildirici & Ersin (2009) and the SVM method of Perez-Cruz et al (2003). In our experiment, we consider two forecasting stages: the whole year 2007 as the first stage and 2008 as the second stage, which covers the global financial crisis period. Several metrics, MAD, NMSE, HR, and linear regression R squared, are employed to measure model performance. The paper is organized as follows. The next section briefly reviews the LSSVM formulation. Section 3 discusses volatility modeling with hybrid models based on GARCH, EGARCH and GJR. Section 4 presents the experimental results and the final section concludes.

Least squares support vector machines
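In the LSSVM formulation, the data are assumed to be generated by a nonlinear function $y_i = f(x_i) + e_i$, $i = 1, \dots, N$, which may be approximated by another nonlinear function
$$y_i = w^{T}\varphi(x_i) + b + e_i .$$
The model parameters are obtained from the optimization problem
$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2} \quad \text{subject to} \quad y_i = w^{T}\varphi(x_i) + b + e_i, \quad i = 1, \dots, N. \qquad (3)$$
Here the equality constraint is used in LSSVM instead of the inequality constraint in SVM. A Lagrangian can be defined to solve the above minimization problem as
$$L(w,b,e;\alpha) = J(w,e) - \sum_{i=1}^{N} \alpha_i \left[ w^{T}\varphi(x_i) + b + e_i - y_i \right],$$
where $\alpha_i$ denotes the Lagrange multipliers (also called support values). From the Karush-Kuhn-Tucker (KKT) theory, the following system of equations is obtained:
$$w = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \qquad \sum_{i=1}^{N} \alpha_i = 0, \qquad \alpha_i = \gamma e_i, \qquad w^{T}\varphi(x_i) + b + e_i - y_i = 0 .$$
Note that sparseness is lost because of the condition $\alpha_i = \gamma e_i$. By eliminating $w$ and $e_i$, the following linear system is obtained:
$$\begin{bmatrix} 0 & 1_N^{T} \\ 1_N & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix},$$
where $\Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)$ satisfies Mercer's condition, and the LS-SVM model for function estimation is obtained as
$$\hat{y}(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b . \qquad (6)$$
$K(\cdot,\cdot)$ is the Mercer kernel function, representing the high-dimensional feature space that is nonlinearly mapped from the input space. In this work, the Gaussian kernel or RBF (radial basis function) is used, as it tends to give good performance under general smoothness assumptions. The kernel is defined as
$$K(x, x_i) = \exp\left( -\frac{\lVert x - x_i \rVert^{2}}{\sigma^{2}} \right).$$
The kernel and regularization parameters $(\gamma, \sigma^2)$ are tuned by a grid-search technique to avoid overfitting. A Matlab toolbox is used throughout the experiment.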
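The dual linear system and the estimator (6) can be illustrated with a minimal sketch. It is not the Matlab toolbox used in the experiments: the helper names (`rbf_kernel`, `solve_linear_system`, `lssvm_fit`, `lssvm_predict`), the toy data and the fixed values of gamma and sigma2 are all assumptions made for illustration, and the RBF scaling (division by sigma2 rather than 2*sigma2) is likewise assumed. Only the Python standard library is used so that the sketch stays self-contained; a numerical library would normally replace the hand-written solver.

```python
import math

def rbf_kernel(x, z, sigma2):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / sigma2); the scaling is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sigma2)

def solve_linear_system(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - tail) / M[r][r]
    return u

def lssvm_fit(X, y, gamma, sigma2):
    """Build and solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the diagonal of Omega
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[1:], solution[0]  # (alpha, b)

def lssvm_predict(X_train, alpha, b, sigma2, x_new):
    """Evaluate sum_i alpha_i K(x_new, x_i) + b."""
    return sum(a * rbf_kernel(x_new, xi, sigma2) for a, xi in zip(alpha, X_train)) + b

# Toy regression problem (hypothetical data): y = x^2 on five points.
X_toy = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y_toy = [x[0] ** 2 for x in X_toy]
alpha, b = lssvm_fit(X_toy, y_toy, gamma=10.0, sigma2=1.0)
fitted = [lssvm_predict(X_toy, alpha, b, 1.0, x) for x in X_toy]
```

In this sketch the support values sum to zero and each training residual equals alpha_i / gamma, which mirrors the KKT conditions and shows why every training point contributes to the model (the loss of sparseness noted above).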

Model building
Let $P_t$ be the stock price at time $t$. Then
$$y_t = 100 \cdot \log(P_t / P_{t-1}) \qquad (7)$$
denotes the continuously compounded daily return of the particular stock at time $t$. Let $F_{t-1}$ be the past information set available up to time $t-1$; this information set contains the realized values of all previous relevant variables. The expected return at time $t$ after observing the past information up to $t-1$ is defined as
$$\mu_t = E(y_t \mid F_{t-1}) = f(F_{t-1}). \qquad (8)$$
The volatility to investors investing in the particular stock at time $t-1$ is denoted as
$$\sigma_t^{2} = \mathrm{Var}(y_t \mid F_{t-1}) = h(F_{t-1}), \qquad (9)$$
where $f(\cdot)$ and $h(\cdot)$ are well-defined functions with $h(\cdot) > 0$.
Then the stock return $y_t$ can be modelled as $y_t = \mu_t + \varepsilon_t$, where $\varepsilon_t = \sigma_t z_t$ and $z_t$ is a zero-mean, unit-variance innovation. Here, we aim at estimating the volatility (the conditional variance of the return) in (9) by kernel regression (a semi-parametric method) based on the parametric GARCH, EGARCH and GJR models. One particular kernel regression approach is the LSSVM (least squares support vector machine) presented in the previous section.
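For reference, the three parametric specifications on which the hybrids are based are commonly written as follows. This is a sketch in standard textbook notation: the symbols $\omega$, $\alpha$, $\beta$, $\gamma$, the innovation $\varepsilon_t = y_t - \mu_t$ and the standardized innovation $z_t = \varepsilon_t / \sigma_t$ are assumptions, and the parameterization may differ from the paper's own equations (12), (14) and (16).

```latex
\begin{align*}
\text{GARCH(1,1):}\quad
  \sigma_t^{2} &= \omega + \alpha\,\varepsilon_{t-1}^{2} + \beta\,\sigma_{t-1}^{2}, \\
\text{EGARCH(1,1):}\quad
  \log \sigma_t^{2} &= \omega + \beta \log \sigma_{t-1}^{2}
    + \alpha\left( |z_{t-1}| - E|z_{t-1}| \right) + \gamma\, z_{t-1}, \\
\text{GJR(1,1):}\quad
  \sigma_t^{2} &= \omega + \alpha\,\varepsilon_{t-1}^{2}
    + \gamma\,\varepsilon_{t-1}^{2}\,\mathbf{1}\{\varepsilon_{t-1} < 0\}
    + \beta\,\sigma_{t-1}^{2}.
\end{align*}
```

Under this convention a leverage effect (negative shocks raising volatility more than positive ones) corresponds to $\gamma < 0$ in EGARCH and $\gamma > 0$ in GJR.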

Experimental Analysis
We examine three stock price indexes from three major ASEAN stock markets: the Straits Times Index (STI) of the Singapore stock market, the PSEI of the Philippines and the KLCI of the Kuala Lumpur stock market. Each index price series is collected from the Yahoo Finance website and transformed into log returns as in (7) before analysis. Figure 1 plots the price and log return of each index series for the entire sample. Although the index prices of the three markets move in broadly similar directions, the returns behave differently. The plots show high volatility in the log return series after the 1997 ASEAN financial crisis and during the recent global market crash; the latter is evident from the sharp fall of each stock price index during 2008.
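The transformation in (7) can be sketched in a few lines. The function name `log_returns` and the price values are hypothetical and serve only to show the computation.

```python
import math

def log_returns(prices):
    """Percentage continuously compounded returns: 100 * log(P_t / P_{t-1})."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 101.0, 99.5, 102.0]  # hypothetical closing prices
returns = log_returns(prices)
```

Because log returns add up over time, the sum of the daily returns equals the log return over the whole window, which is one reason this definition is convenient for volatility work.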

Estimation results
Three parametric models, GARCH, EGARCH and GJR, are fitted to all return series by (12), (14) and (16) respectively. Each model is estimated twice for each market return, as first-stage and second-stage estimations with an updated in-sample period. In the cost column of Table 3.A for the STI index at the first stage, the smallest cost belongs to GARCH-LSSVM and the largest to EGARCH-LSSVM. For the KLCI series, GJR-LSSVM generates the smallest cost and GARCH-LSSVM the largest, but these errors are close to one another. Finally, for the PSEI series EGARCH-LSSVM produces the smallest error, which lies somewhat apart from the errors of GARCH-LSSVM and GJR-LSSVM.
In the second stage, the training mean square errors for STI and PSEI are analogous to those in the first stage; that is, STI favours GARCH-LSSVM, and PSEI produces the smallest cost when trained by EGARCH-LSSVM. For the Kuala Lumpur stock market, GARCH-LSSVM gives the smallest cost, but EGARCH-LSSVM still produces the highest cost, as before. In the next section, these hybrid models are used to forecast the volatility of the three markets and are compared with the parametric approaches estimated in the previous section.

Forecasting results
The following evaluation metrics are used to measure the performance of the proposed models in forecasting the volatilities of the three stock markets: MAD (mean absolute deviation), NMSE (normalized mean squared error) and HR (hit rate). We also use linear regression to evaluate the forecasting performance of the volatility model. We simply regress the squared returns on a constant and the forecasted volatility for the out-of-sample time points $t = 1, 2, \dots, n$:
$$y_t^{2} = c_0 + c_1 \hat{\sigma}_t^{2} + e_t .$$
The t-statistic of the coefficients is a measure of the bias and the squared correlation $R^2$ is a measure of forecasting performance. In this regression, the constant term $c_0$ should be close to zero and the slope $c_1$ should be close to 1. Tables 4.A and 4.B report the forecasting performance of the different models for each market. The MAD, NMSE, HR and R squared with c0 and c1 are shown in the second to seventh columns.
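The metric formulas themselves are not recoverable from the text, so the sketch below uses common definitions as assumptions: MAD as the mean absolute difference, NMSE as squared error normalized by the variation of the actual series, and HR as the share of periods in which the forecast moves in the same direction as the actual value relative to the previous actual value. The function names and the toy numbers are hypothetical.

```python
def mad(actual, forecast):
    """Mean absolute deviation between actual and forecasted volatility."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def nmse(actual, forecast):
    """Squared error normalized by the total variation of the actual series (assumed form)."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return sse / sst

def hit_rate(actual, forecast):
    """Share of steps where the forecast and the actual value move the same way
    relative to the previous actual value (assumed definition of HR)."""
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        forecast_move = forecast[t] - actual[t - 1]
        if actual_move * forecast_move > 0:
            hits += 1
    return hits / (len(actual) - 1)

def evaluation_regression(actual, forecast):
    """OLS of actual on a constant and the forecast; returns (c0, c1, r_squared)."""
    n = len(actual)
    mean_f = sum(forecast) / n
    mean_a = sum(actual) / n
    sxx = sum((f - mean_f) ** 2 for f in forecast)
    sxy = sum((f - mean_f) * (a - mean_a) for f, a in zip(forecast, actual))
    syy = sum((a - mean_a) ** 2 for a in actual)
    c1 = sxy / sxx
    c0 = mean_a - c1 * mean_f
    return c0, c1, (sxy ** 2) / (sxx * syy)

# Hypothetical squared returns (actual) and volatility forecasts.
actual = [1.0, 4.0, 2.0, 5.0, 3.0]
forecast = [1.5, 3.5, 2.5, 4.5, 3.5]
scores = (mad(actual, forecast), nmse(actual, forecast), hit_rate(actual, forecast))
```

A forecast that reproduces the actual series exactly yields c0 = 0, c1 = 1 and R squared = 1 in `evaluation_regression`, which is the benchmark against which the c0 and c1 columns are read.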

First stage for 2007:
Beginning with the STI series, the hybrid approaches perform better than the parametric models on almost all metrics: MAD, NMSE, HR, and R squared. Only the R squared criterion favours the EGARCH model, which generates the highest value. Among all the models, EGARCH-LSSVM has the best predictive performance because it provides the highest HR (0.8273) and the smallest values of MAD and NMSE, and its (c0, c1) values are not far from (0, 1). Turning to the Kuala Lumpur market, the hybrid models are much better in terms of MAD and NMSE, but in terms of HR and R squared some semi-parametric models, especially EGARCH-LSSVM, are unable to beat their parametric counterparts. As in the STI case, the KLCI return is modelled well by EGARCH, since it generates the smallest NMSE and the highest R squared and HR among the models. On the c0 and c1 criteria, the hybrid models are more satisfactory than the parametric approaches. For PSEI, the semi-parametric models are superior to the parametric models in all cases.

Second stage for 2008:
The STI return series is again forecasted well by the EGARCH model, as in the first stage, owing to the highest values of HR and R squared, as can be seen in Table 4.B. For the GARCH and GJR forms, LSSVM is better than the parametric models. In Table 4.B, the c0 and c1 values of the EGARCH model (..., 2.25), although it generated the best performance, deviate far from the ideal values (0, 1) respectively, owing to the global financial market crash. However, EGARCH-LSSVM and the other LSSVM models are more resistant to the crash in forecasting performance, since their c0 and c1 are not far from 0 and 1 respectively. For KLCI and PSEI, the hybrid approaches beat all parametric models on all criteria, and EGARCH-LSSVM is the best among them. This evidence suggests that LSSVM is more robust than the parametric models in forecasting volatility despite the highly volatile conditions during the global financial market crash. Figures 2, 3, & 4 plot the out-of-sample forecasts by the parametric GARCH, EGARCH and GJR models and the corresponding hybrid models for STI, KLCI and PSEI respectively. In the plots, the forecast lines of the hybrid models capture more extreme points than those of the parametric models, and therefore they improve forecasting performance. Notably, the sparsity and robustness conditions proposed by Suykens et al (2002) have not been imposed on the LSSVM algorithm here.

Conclusion
In this paper, we combine the least squares support vector machine (LSSVM) with the GARCH(1,1), EGARCH(1,1) and GJR(1,1) models as a hybrid approach to forecasting the leverage-effect volatility of ASEAN stock markets. To check the performance of the proposed models, we compare the hybrid models with the corresponding parametric models. The forecasts are conducted twice: the whole year 2007 is treated as the first stage, and the second stage covers 2008, including the recent global financial crisis period. The experimental results show that the hybrid models are resistant and robust to the highly volatile conditions of the financial market crash and hence generate improved forecasting performance. This supports the general idea that LSSVM is a promising machine learning system that is good at estimating nonlinear functions without assumptions on data properties in time series applications.

Figure 2. Volatility Forecasts of Singapore Stock Market (STI). Note: Plots on the left refer to the first stage forecast for 2007 (before the crisis) and plots on the right refer to the second stage forecast for the whole of 2008 (during the financial crisis). The small dotted line is the forecast by the parametric models (GARCH, EGARCH and GJR), while the dashed line is obtained by the hybrid approaches.

Figure 3. Volatility Forecasts of Kuala Lumpur Stock Market (KLCI). Note: Plots on the left refer to the first stage forecast for 2007 (before the crisis) and plots on the right refer to the second stage forecast for the whole of 2008 (during the financial crisis). The small dotted line is the forecast by the parametric models (GARCH, EGARCH and GJR), while the dashed line is obtained by the hybrid approaches.
Table 1 reports the in-sample and out-of-sample periods of each market for the two stages, basic statistics of the data and diagnostics. From Table 1, we see that the mean of all returns is close to zero. Two indexes, KLCI and PSEI, have positively skewed returns, while STI has a negative skewness coefficient. Excess kurtosis appears in all series, and the largest is that of KLCI (54.152). The Jarque-Bera statistics strongly suggest that all returns are non-normal. The Ljung-Box test for squared returns at lag 20 and the Engle LM test significantly indicate that all return series exhibit ARCH effects; that is, the homoscedasticity hypothesis is strongly rejected. This shows the presence of volatility clustering and of leverage effects that could be associated with the excess kurtosis.
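The skewness, excess kurtosis and Jarque-Bera figures of the kind reported in Table 1 can be computed as in the sketch below. It uses the usual moment-based formulas with population (divide-by-n) moments, which is an assumption about how the table was produced; the function name and the sample are hypothetical.

```python
def jarque_bera_summary(x):
    """Return (skewness, excess kurtosis, Jarque-Bera statistic) from sample moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    return skewness, excess_kurtosis, jb

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical, symmetric toy sample
skewness, excess_kurtosis, jb = jarque_bera_summary(sample)
```

Under normality the statistic is asymptotically chi-squared with two degrees of freedom, so large values such as those implied by an excess kurtosis of 54.152 lead to rejection.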

Tables 2.A, 2.B, and 2.C present the model parameters with their standard errors in brackets. The stationarity conditions of the models hold for all series. Furthermore, the significance of the negative leverage coefficients in EGARCH and of the positive leverage coefficients in the corresponding GJR models indicates the presence of asymmetric effects in the returns for both stages, which may be caused by the global financial crisis. By the log likelihood, AIC and BIC criteria in Tables 2.A and 2.C, the GJR model is more adequate for both stage estimations of the STI and PSEI returns. For the KLCI return in Table 2.B, the GJR model fits the in-sample data well at the first stage, but the second-stage estimation favours the EGARCH model according to the log likelihood, AIC and BIC. We now proceed to the estimation results obtained from training the least squares support vector machine. First, the return series of all indexes are transformed into input and output format and then trained by the LSSVM algorithm in (6) to obtain the estimated nonlinear function in (13) for the GARCH hybrid, (15) for the EGARCH case and (17) for the GJR model. The training results are summarized in Tables 3.A and 3.B for the first and second stages respectively. The second column of Tables 3.A and 3.B shows the training cost measured by the mean square error. The third and fourth columns display the optimal regularization parameters and optimal kernel parameters obtained by the grid-search technique during training. The last column gives the bias term of the resulting function obtained by the LSSVM.
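The transformation of a return series into input and output format can be sketched as follows for a GARCH-type hybrid. The exact inputs of the hybrid equations are not recoverable from the text, so this sketch assumes that the inputs are the lagged squared return and a lagged volatility proxy, and that the proxy is a rolling mean of squared returns. The names `volatility_proxy` and `make_lagged_pairs`, the window length and the data are hypothetical.

```python
def volatility_proxy(returns, window=5):
    """Rolling mean of squared returns; a placeholder proxy, not the paper's definition."""
    squared = [r * r for r in returns]
    return [
        sum(squared[t - window + 1 : t + 1]) / window
        for t in range(window - 1, len(squared))
    ]

def make_lagged_pairs(returns, window=5):
    """Build (inputs, targets): inputs are [r_{t-1}^2, proxy_{t-1}], target is proxy_t."""
    proxy = volatility_proxy(returns, window)
    offset = window - 1  # proxy[k] is aligned with returns[k + offset]
    inputs, targets = [], []
    for k in range(1, len(proxy)):
        lagged_squared_return = returns[k - 1 + offset] ** 2
        inputs.append([lagged_squared_return, proxy[k - 1]])
        targets.append(proxy[k])
    return inputs, targets

toy_returns = [1.0, -2.0, 0.5, 1.5, -1.0, 2.0, -0.5]  # hypothetical daily returns
inputs, targets = make_lagged_pairs(toy_returns, window=3)
```

The resulting pairs are what a kernel regressor would be trained on; an asymmetric hybrid would presumably add a signed or sign-dependent lagged term to the inputs, which is again an assumption rather than the paper's stated design.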

Table 1. Descriptive statistics of each return series
a JB is the Jarque-Bera test for normality; b Q²(20) is the Ljung-Box test for squared returns; c ARCH-LM is Engle's Lagrange Multiplier test for conditional heteroskedasticity with 12 lags.

Table 2.A. MLE estimation of the parametric models for the Straits Times Index
* significant at the 1% level, ** significant at the 5% level.

Table 2.B. MLE estimation of the parametric models
* significant at the 1% level, ** significant at the 5% level.

Table 2.C. MLE estimation of the parametric models

Table 3.A. Training results by LSSVM for the 1st stage

Table 3.B. Training results by LSSVM for the 2nd stage

Table 4.A. Forecast performances of ASEAN stock volatilities by different models for 2007
Note: higher R squared and HR are preferred, while smaller values of MAD and NMSE indicate that the forecasted volatility is closer to the actual values. The coefficients c0 and c1 should be close to (0, 1) respectively, indicating small forecasting errors.

Table 4.B. Forecast performances of ASEAN stock volatilities by different models for 2008
Note: higher R squared and HR are preferred, while smaller values of MAD and NMSE indicate that the forecasted volatility is closer to the actual values. The coefficients c0 and c1 should be close to (0, 1) respectively, indicating small forecasting errors.