Predict GARCH Based Volatility of Shanghai Composite Index by Recurrent Relevant Vector Machines and Recurrent Least Square Support Vector Machines

The Relevance Vector Machine (RVM) is a recent machine learning method that is efficient for classification and regression problems, including financial time series forecasting. One of its main advantages is that the model is treated within a Bayesian framework while its functional form is identical to that of the Support Vector Machine (SVM), a powerful prediction tool. In this paper, we propose a new recurrent algorithm for the relevance vector machine to predict GARCH(1,1) based volatility of the Shanghai composite index. The recurrent support vector machine, the recurrent least square support vector machine and the normal GARCH(1,1) model are also employed for comparison with the proposed model. Our empirical results show that the proposed approach generates superior forecasting performance.


Introduction
Volatility is important for pricing derivatives, calculating measures of risk, and hedging. A large number of time series based volatility models have been developed since the introduction of the ARCH model of Engle (1982); see Poon and Granger (2003) for a review and references. The ability to predict volatility accurately is crucial for stock market researchers and practitioners. Recently, machine learning approaches have been introduced to predict volatility based on various models of the GARCH family, since they are expected to yield high prediction accuracy. Empirical results also show that combining machine learning approaches with GARCH models yields better results. For instance, improved forecasting performance from machine learning techniques is reported in Donaldson and Kamstra (1997) for a Neural Network based GJR model, Perez-Cruz et al (2003) for SVM based GARCH, Tang et al (2008, 2009) for SVM based GARCH with wavelet and spline wavelet kernels, and Bildirici and Ersin (2009) for Neural Networks based on nine different models of the GARCH family. Chen et al (2008b) applied SVM to model and forecast GARCH(1,1) volatility based on the concept of recurrent SVM in Chen et al (2008a), following the recurrent algorithms for neural networks and the least square support vector machine of Suykens and Vandewalle (2000). The model was shown to be a dynamic process that captures longer memory of past information than the feed-forward SVM, which is static.
Based on the recurrent SVM results of Chen et al (2008a, 2008b), in this paper we propose a recurrent algorithm for the relevance vector machine (RRVM). The RVM, an alternative to the SVM, is a probabilistic model introduced by Tipping in 2000. The RVM has recently become a powerful tool for prediction problems. One of its main advantages is that the RVM has a functional form identical to that of the SVM and hence enjoys the benefits of SVM based techniques: generalization and sparsity. On the other hand, the RVM avoids some disadvantages of the SVM, such as the need to find optimal values of the regularization parameter C and the epsilon tube, the need for kernel functions that satisfy Mercer's condition, and the restriction to point predictions, whereas the RVM produces distributional predictions (Tipping, 2001). Our goal here is to compare the proposed recurrent RVM model with other competitive approaches, including the recurrent SVM, the recurrent LSSVM and the normal GARCH(1,1), in forecasting the volatility of the Shanghai composite index. Forecasting the volatility of the China stock market more accurately is important: the potential growth of this market has recently attracted foreign and local investors. The annual rate of return of the Shanghai composite index was 81.7% during 2006, and the rapid growth of the rate of return has led to increasing volatility in this emerging stock market.
The remainder of the paper is organized as follows. The next section summarizes the LSSVM and RVM formulations as well as their recurrent algorithms. Section 3 deals with the empirical analysis. The last section concludes.

Least Square Support Vector Machines
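LSSVM approximates the data $\{x_i, y_i\}$ of the form $y_i = f(x_i) + e_i$ for $i = 1, \dots, n$ by a nonlinear function defined as

$$f(x) = w^{T}\varphi(x) + b.$$

The model parameter $w$ is called the weight, $b$ is a constant term, and $e_i$ is random noise. The output $y_i \in R$ can be regarded as volatility, while the input vector $x_i \in R^n$ may consist of lagged volatility. The mapping $\varphi(\cdot) : R^n \to F$ is a nonlinear function that maps the input vector $x$ into a higher dimensional feature space. Estimating the function by the LSSVM leads to the optimization problem formulated as (Suykens, 2000)

$$\min_{w,b,e} J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{n} e_i^2 \quad \text{subject to} \quad y_i = w^{T}\varphi(x_i) + b + e_i, \; i = 1, \dots, n, \qquad (1)$$

where $\gamma$ is the regularization parameter. Here an equality constraint is used in the LSSVM instead of the inequality constraint in the SVM. The Lagrangian for the above minimization problem is defined as

$$L(w, b, e; \alpha) = J(w, e) - \sum_{i=1}^{n} \alpha_i \left\{ w^{T}\varphi(x_i) + b + e_i - y_i \right\},$$

where $\alpha_i$ denotes the Lagrange multipliers (also called support values). From the Karush-Kuhn-Tucker theory, the following system of equations is obtained:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n} \alpha_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} \alpha_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow w^{T}\varphi(x_i) + b + e_i - y_i = 0. \qquad (2)$$

By eliminating $w$ and $e_i$, the linear system is written as

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \qquad (3)$$

where $y = [y_1, \dots, y_n]^{T}$, $\mathbf{1} = [1, \dots, 1]^{T}$, $\alpha = [\alpha_1, \dots, \alpha_n]^{T}$ and $\Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)$. By solving (3), the LS-SVM model for the estimated function is shown to be

$$f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b. \qquad (4)$$

Because only kernel evaluations $K(x, x_i)$ are needed, the complexity of computing the nonlinear mapping $\varphi$ is avoided. The Gaussian kernel, or RBF (radial basis function), which decays exponentially with the squared distance $\|x - x_i\|^2$, is used in our experiment, as it tends to give good performance under general smoothness assumptions.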
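As a minimal sketch of how the LSSVM linear system can be solved in practice, the following pure-Python example builds the system obtained after eliminating $w$ and $e_i$ and solves it by Gaussian elimination. The function names (`rbf_kernel`, `solve_linear`, `lssvm_fit`, `lssvm_predict`) are illustrative and not from the paper, and the kernel scaling $\exp(-\|x - z\|^2 / (2\sigma^2))$ is an assumption, since the exact parameterization used in the experiment is not recoverable from the text.

```python
import math

def rbf_kernel(x, z, sigma2):
    """Gaussian RBF kernel; the 1 / (2 * sigma2) scaling is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, Omega + I / gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the regularization parameter
        A.append(row)
    solution = solve_linear(A, [0.0] + list(y))
    return solution[0], solution[1:]  # constant term b, support values alpha

def lssvm_predict(x, X, alpha, b, sigma2):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * rbf_kernel(x, xi, sigma2) for a, xi in zip(alpha, X)) + b
```

In a volatility setting, each row of `X` would hold lagged volatility values and `y` the next value. Every training point receives a nonzero support value here, which is the usual price of replacing the SVM inequality constraints with equalities.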

Relevance Vector Machines
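For given training data $\{x_i, t_i\}_{i=1}^{n}$, the goal is to seek a function indexed by the parameter $w$:

$$y(x; w) = \sum_{i=1}^{n} w_i K(x, x_i) + w_0 = w^{T}\varphi(x), \qquad (5)$$

where $\varphi(x) = [1, K(x, x_1), \dots, K(x, x_n)]^{T}$ and $w = [w_0, w_1, \dots, w_n]^{T}$. Note that the function in (5) is identical to the SVM based function and describes the mapping between the input vector $x$ and the target $t$ with $t_i = y(x_i; w) + \varepsilon_i$, where $\varepsilon_1, \dots, \varepsilon_n$ are assumed to be independent Gaussian with mean zero and variance $\sigma^2$.
In notation, $p(\varepsilon_i) = N(0, \sigma^2)$, so that $p(t_i \mid x_i) = N(t_i \mid y(x_i; w), \sigma^2)$. Thus the likelihood of the complete dataset can be written as

$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \| t - \Phi w \|^2 \right\}, \qquad (6)$$

where $t = [t_1, \dots, t_n]^{T}$ and $\Phi$ is the $n \times (n+1)$ design matrix whose $i$-th row is $\varphi(x_i)^{T}$. Maximum likelihood estimation of $w$ and $\sigma^2$ from (6) will generally lead to overfitting. To avoid this problem, a zero mean Gaussian prior over the weights is introduced,

$$p(w \mid \alpha) = \prod_{i=0}^{n} N(w_i \mid 0, \alpha_i^{-1}), \qquad (7)$$

where $\alpha_i$ is the $i$-th element of the vector hyperparameter $\alpha$, assigned to each model parameter $w_i$.
By Bayes' rule, the posterior over the weights is

$$p(w \mid t, \alpha, \sigma^2) = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)}, \qquad (8)$$

which is Gaussian, $N(\mu, \Sigma)$, with covariance

$$\Sigma = (\sigma^{-2} \Phi^{T} \Phi + A)^{-1} \qquad (9)$$

and mean

$$\mu = \sigma^{-2} \Sigma \Phi^{T} t, \qquad (10)$$

where $A = \mathrm{diag}(\alpha_0, \alpha_1, \dots, \alpha_n)$. To evaluate $\mu$ and $\Sigma$, we need to obtain the $\alpha$ and $\sigma^2$ which maximize $p(\alpha, \sigma^2 \mid t) \propto p(t \mid \alpha, \sigma^2)\, p(\alpha)\, p(\sigma^2)$. Using uniform priors, the problem is to maximize the term $p(t \mid \alpha, \sigma^2)$ with respect to $\alpha$ and $\sigma^2$:

$$p(t \mid \alpha, \sigma^2) = \int p(t \mid w, \sigma^2)\, p(w \mid \alpha)\, dw = (2\pi)^{-n/2}\, |\sigma^2 I + \Phi A^{-1} \Phi^{T}|^{-1/2} \exp\left\{ -\frac{1}{2} t^{T} (\sigma^2 I + \Phi A^{-1} \Phi^{T})^{-1} t \right\}.$$

The hyperparameters are estimated by an iterative algorithm and can be obtained as

$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{new} = \frac{\| t - \Phi\mu \|^2}{n - \sum_i \gamma_i},$$

where $\mu_i$ is the $i$-th posterior mean weight from (10) and $\gamma_i \equiv 1 - \alpha_i \Sigma_{ii} \in [0, 1]$ can be interpreted as a measure of how well determined each parameter $w_i$ is. Here $\Sigma_{ii}$ is the $i$-th diagonal element of the posterior weight covariance in (9).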
During the re-estimation, many $\alpha_i$ tend to infinity, so that $w$ retains only a few nonzero weights. The corresponding training vectors are called relevance vectors and are analogous to the support vectors of the SVM. Thus the resulting model enjoys the properties of the SVM, such as sparsity and generalization.
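To make the re-estimation step concrete, here is a minimal sketch of a single hyperparameter update, assuming the posterior mean and the diagonal of the posterior covariance have already been computed for the current hyperparameters. The function name `rvm_update` and its argument names are hypothetical, and no pruning of very large $\alpha_i$ is included.

```python
def rvm_update(alpha, mu, sigma_diag, residual_sq_norm, n):
    """One re-estimation step for the hyperparameters and the noise variance.

    alpha            : current hyperparameters alpha_i
    mu               : posterior mean weights mu_i (must be nonzero here)
    sigma_diag       : diagonal entries of the posterior weight covariance
    residual_sq_norm : squared norm of the training residual, ||t - Phi mu||^2
    n                : number of training targets
    """
    # gamma_i = 1 - alpha_i * Sigma_ii measures how well determined w_i is.
    gammas = [1.0 - a * s for a, s in zip(alpha, sigma_diag)]
    # alpha_i is replaced by gamma_i / mu_i^2.
    new_alpha = [g / (m * m) for g, m in zip(gammas, mu)]
    # The noise variance is the residual divided by the remaining degrees of freedom.
    new_sigma2 = residual_sq_norm / (n - sum(gammas))
    return new_alpha, new_sigma2, gammas
```

In a full training loop this step would alternate with recomputing the posterior mean and covariance, and weights whose $\alpha_i$ grows beyond a large threshold would be removed; both parts are omitted from this sketch.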

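The predictive distribution for a new input $x_*$ is

$$p(t_* \mid t, \alpha_{MP}, \sigma^2_{MP}) = \int p(t_* \mid w, \sigma^2_{MP})\, p(w \mid t, \alpha_{MP}, \sigma^2_{MP})\, dw,$$

which is easily computed because both integrated terms are Gaussian, implying a Gaussian form with mean $y_* = \mu^{T}\varphi(x_*)$ and variance $\sigma^2_* = \sigma^2_{MP} + \varphi(x_*)^{T} \Sigma\, \varphi(x_*)$. So the predictive mean is $y(x_*; \mu)$, and the predictive variance consists of two components: the estimated noise in the data and the uncertainty in the estimated weights.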
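The predictive mean and variance can be evaluated directly from the posterior quantities. The sketch below assumes the posterior mean `mu`, the posterior covariance `Sigma` (a list of rows) and the noise variance `sigma2_mp` are already available; the function name `rvm_predict` is illustrative.

```python
def rvm_predict(phi_star, mu, Sigma, sigma2_mp):
    """Return the predictive mean mu^T phi and variance sigma2_mp + phi^T Sigma phi."""
    mean = sum(m * p for m, p in zip(mu, phi_star))
    # Matrix-vector product Sigma * phi_star.
    sigma_phi = [sum(row[j] * phi_star[j] for j in range(len(phi_star))) for row in Sigma]
    # Noise variance plus the uncertainty contributed by the weights.
    variance = sigma2_mp + sum(p * sp for p, sp in zip(phi_star, sigma_phi))
    return mean, variance
```

The second variance term grows when the new input excites basis functions whose weights are poorly determined, which is what gives the RVM a distributional rather than a point forecast.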

Recurrent Relevance Vector Machines
The recurrent input/output model, which is a nonlinear output error model, is defined as

$$\tilde{y}_t = f(\tilde{y}_{t-1}, \dots, \tilde{y}_{t-p}, u_{t-1}, \dots, u_{t-p}), \qquad (11)$$

where $\tilde{y}_t$ denotes the estimated output, $f$ is a smooth nonlinear mapping and $p$ is the lag order. Here $u_t \in R$ is the input of a deterministic nonlinear dynamic system and $y_t \in R$ is its output.
The corresponding feed-forward input/output model is represented as

$$\tilde{y}_t = f(y_{t-1}, \dots, y_{t-p}, u_{t-1}, \dots, u_{t-p}). \qquad (12)$$

The models in (11) and (12) can be trained by the algorithms of SVM and LSSVM (Suykens and Vandewalle, 2000), and hence they can also be trained by the algorithm of RVM, since the RVM has a form identical to that of the SVM (Tipping, 2001). Thus we denote by RRVM, RSVM and RLSSVM the recurrent vector machines obtained by fitting the model in (11) with the algorithms of RVM, SVM and LSSVM respectively.
The parameterization of $f$ in (12) by the RVM (or LSSVM) is static because there is no recursion in the variable $\tilde{y}_t$. The recurrent models in (11), by contrast, act as nonlinear dynamic processes and capture longer memory of past information than the feed-forward models and the parametric models. See Suykens and Vandewalle (2000) and Chen et al (2008a) for detailed discussion of the dynamic systems realized by the recurrent LSSVM and the recurrent SVM respectively. For simplicity, the ARMA model is used as an illustration. The linear ARMA(1,1) model estimated by MLE (maximum likelihood estimation) is described as

$$y_t = \phi_0 + \phi_1 y_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t. \qquad (13)$$

The nonlinear ARMA(1,1) model estimated by the RRVM (or RLSSVM) can be expressed as

$$y_t = f(y_{t-1}, \varepsilon_{t-1}) + \varepsilon_t. \qquad (14)$$

Then the feed-forward RVM (or LSSVM) corresponding to the nonlinear AR(1) model is written as

$$y_t = h(y_{t-1}) + \varepsilon_t. \qquad (15)$$

We now turn to the GARCH model, which describes the volatility of asset returns. GARCH(1,1) is the most popular form for modeling and forecasting the conditional variance of returns, or volatility (Hansen & Lunde, 2005). Therefore, we consider the GARCH(1,1) model throughout the paper.
Let $P_t$ be the stock price at time $t$. Then $y_t = 100 \cdot (\ln P_t - \ln P_{t-1})$ denotes the continuously compounded daily return of the underlying asset at time $t$.
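The return transformation above is simple to compute. A minimal sketch, with the hypothetical function name `log_returns` and an ordinary list of closing prices as input:

```python
import math

def log_returns(prices):
    """Continuously compounded returns in percent: 100 * (ln P_t - ln P_{t-1})."""
    return [100.0 * (math.log(p1) - math.log(p0))
            for p0, p1 in zip(prices, prices[1:])]
```

A series of N prices yields N - 1 returns, because the first price has no predecessor.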
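AR(1)-GARCH(1,1) is defined as

$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t, \qquad (16)$$

$$\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2. \qquad (17)$$

Note that the conditional variance of $\varepsilon_t$ is given by $\sigma_t^2$. Following Bollerslev (1986), the squared residual $\varepsilon_t^2$ follows the ARMA process

$$\varepsilon_t^2 = \omega + (\alpha_1 + \beta_1)\, \varepsilon_{t-1}^2 + w_t - \beta_1 w_{t-1}, \qquad (18)$$

where $w_t = \varepsilon_t^2 - \sigma_t^2$. Here $w_t$ can be shown to be white noise (or error). The parameters are assumed to satisfy $\omega > 0$, $\alpha_1 \geq 0$ and $\beta_1 \geq 0$ to guarantee a positive conditional variance, and covariance stationarity requires $\alpha_1 + \beta_1 < 1$. $\{z_t\}$ is a sequence of independent identically distributed (iid) random variables with mean 0 and variance 1. The one-step-ahead forecast is

$$\hat{\sigma}_{t+1}^2 = \omega + \alpha_1 \varepsilon_t^2 + \beta_1 \sigma_t^2.$$

From (16) and (18), the corresponding nonlinear GARCH model can be formulated as

$$y_t = h(y_{t-1}) + \varepsilon_t,$$

$$\varepsilon_t^2 = f(\varepsilon_{t-1}^2, w_{t-1}) + w_t,$$

where the functions $h(\cdot)$ and $f(\cdot)$ are estimated by the feed-forward RVM and by the recurrent RVM respectively. The recurrent algorithm of the RVM (or LSSVM) for modeling and forecasting the GARCH model is illustrated below.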
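The parametric GARCH(1,1) variance recursion and its one-step-ahead forecast can be sketched as follows. The function name `garch_variances` is illustrative, and starting the recursion from the unconditional variance $\omega / (1 - \alpha_1 - \beta_1)$ is an assumption made here for the example, not something stated in the paper.

```python
def garch_variances(residuals, omega, alpha1, beta1):
    """Run sigma2_t = omega + alpha1 * eps_{t-1}^2 + beta1 * sigma2_{t-1}.

    Returns a list with one more entry than `residuals`: entry t is the
    conditional variance of residuals[t], and the final entry is the
    one-step-ahead forecast beyond the last residual.
    """
    if not (omega > 0 and alpha1 >= 0 and beta1 >= 0 and alpha1 + beta1 < 1):
        raise ValueError("need omega > 0, alpha1 >= 0, beta1 >= 0, alpha1 + beta1 < 1")
    # Assumed initialisation: the unconditional variance of the process.
    sigma2 = [omega / (1.0 - alpha1 - beta1)]
    for eps in residuals:
        sigma2.append(omega + alpha1 * eps ** 2 + beta1 * sigma2[-1])
    return sigma2
```

Defining `w[t] = residuals[t] ** 2 - sigma2[t]`, the squared residuals produced alongside this recursion satisfy the ARMA(1,1) representation of the squared residuals exactly, which is the property the nonlinear GARCH model generalizes.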
Step 1: Fit the RVM (or LSSVM) to the return $y_t$ in AR(1) form over the full sample period $N$, $y_t = h(y_{t-1}) + \varepsilon_t$, to obtain the residuals $\varepsilon_t$.
Step 2: Recursively run the recurrent RVM (or LSSVM) on the squared residuals, $\varepsilon_t^2 = f(\varepsilon_{t-1}^2, w_{t-1}) + w_t$, to obtain $n$ one-step-ahead forecasted volatilities $\hat{\sigma}^2_{N_1+1}, \dots, \hat{\sigma}^2_{N_1+n}$, where $N_1$ denotes the size of the in-sample period. For each of the $n$ estimations, set the residuals $w_{t-1}$ to zero on the first pass of Step 2, and then run the feed-forward RVM (or LSSVM) to obtain estimated residuals. Using the estimated residuals as the new $w_{t-1}$ inputs, repeat this process until the stopping criterion is satisfied.
Unlike the parametric case, the proposed approach does not require any assumption on the model parameters for the stationarity condition.
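The residual feedback loop of Step 2 can be sketched as below. To keep the example short and self-contained, a ridge-regularised linear least-squares fit stands in for the RVM (or LSSVM) learner; this substitution, the function names, the stopping rule (largest change in the residuals below `tol`) and the default settings are all assumptions made for illustration and are not taken from the paper.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ridge_fit(rows, targets, lam):
    """Stand-in learner: solve (X^T X + lam * I) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)

def recurrent_fit(eps2, max_iter=50, tol=1e-8, lam=1e-6):
    """Iterate the fit of eps2_t on (1, eps2_{t-1}, w_{t-1}), feeding residuals back."""
    n = len(eps2)
    w = [0.0] * n  # first pass: all lagged residuals w_{t-1} are set to zero
    for _ in range(max_iter):
        rows = [[1.0, eps2[t - 1], w[t - 1]] for t in range(1, n)]
        targets = eps2[1:]
        beta = ridge_fit(rows, targets, lam)
        fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
        # Estimated residuals become the new w inputs for the next pass.
        new_w = [0.0] + [y - f for y, f in zip(targets, fitted)]
        change = max(abs(a - b) for a, b in zip(new_w, w))
        w = new_w
        if change < tol:
            break
    # One-step-ahead volatility forecast from the last observation and residual.
    forecast = beta[0] + beta[1] * eps2[-1] + beta[2] * w[-1]
    return beta, w, forecast
```

Replacing `ridge_fit` with a kernel learner turns the same loop into the nonlinear version; the point of the sketch is only the feedback of estimated residuals into the next fit.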

Data description
We examine the Shanghai Composite Index (SSECI) of the China stock market in the experiment. The stock index price is collected from Yahoo Finance and is transformed into log returns before analysis. The whole sample of size 1564, spanning 01 Jan. 2001 to 29 Dec. 2006, is used in the experiment to check the predictive capability and reliability of the proposed models. The sub-sample of size 1305, from 04 Jan. 2001 to 31 Dec. 2005, is taken for the in-sample estimation, and a full year of 260 points, from 02 Jan. to 29 Dec. 2006, is reserved for out-of-sample forecasting. Table 1 displays the descriptive statistics of the return series of SSECI. The mean of the return is close to zero. The series is positively skewed, though the skewness coefficient is not large in magnitude. The kurtosis value (5.6896) exceeds the value of 3 for the normal distribution, so the return has excess kurtosis. The large value of the Jarque-Bera statistic also indicates that the return is not normally distributed. Finally, the Ljung-Box test on the squared return strongly rejects the hypothesis of no ARCH effect. Based on these diagnostics, we conclude that the return series exhibits volatility clustering and a leptokurtic pattern. It is therefore suitable to model and forecast the return series with the GARCH(1,1) model. In the next subsection we fit this return series with the normal GARCH and the nonlinear GARCH models.
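The moment-based diagnostics mentioned above can be reproduced with a few lines of code. This sketch uses population (divide by n) moments and the common form of the Jarque-Bera statistic, n/6 * (S^2 + (K - 3)^2 / 4); whether the paper used exactly these conventions is an assumption, and the function name `describe_returns` is hypothetical.

```python
def describe_returns(returns):
    """Mean, skewness, kurtosis and Jarque-Bera statistic of a return series."""
    n = len(returns)
    mean = sum(returns) / n
    # Central moments of order 2, 3 and 4 (population versions).
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jarque_bera = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return {"mean": mean, "skewness": skewness,
            "kurtosis": kurtosis, "jarque_bera": jarque_bera}
```

A kurtosis well above 3 together with a large Jarque-Bera value is the pattern described in the text for the SSECI returns.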

In sample estimation or training results
We first fit the return series to equations (16) and (17) by maximum likelihood estimation with normal innovations to obtain the GARCH(1,1) model. The stationarity condition holds, and the MLE estimates, with corresponding standard errors (0.028, 0.025, 0.015), are all significant. This implies that the model is appropriate and can be applied to out-of-sample forecasting.
We now turn to the proposed recurrent relevance vector machine, together with the recurrent support vector machine and the recurrent least square support vector machine. These models are trained using the algorithm stated in Step 1 and Step 2 above. For the recurrent SVM, over the search range $[2^{-5}, 2^{5}]$, the optimal parameters are found to be $(C, \gamma) = (2^{5}, 2^{-4})$, which correspond to the smallest training error, 1.425. Here the epsilon tube is taken to be 0.005.
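A grid search over powers of two for the cost and the kernel parameter can be sketched as follows. The function `grid_search` takes any callable that returns a training error for a given pair; `toy_error` below is a purely hypothetical stand-in whose minimum is placed at the reported optimum for illustration, and it is not the SVM training error used in the paper.

```python
import math

def grid_search(training_error, exponents=range(-5, 6)):
    """Return the (C, gamma) pair on the 2^k grid with the smallest error."""
    grid = [(2.0 ** i, 2.0 ** j) for i in exponents for j in exponents]
    return min(grid, key=lambda pair: training_error(*pair))

def toy_error(C, gamma):
    """Hypothetical error surface, smallest at C = 2^5 and gamma = 2^-4."""
    return (math.log2(C) - 5.0) ** 2 + (math.log2(gamma) + 4.0) ** 2
```

With exponents from -5 to 5 the grid has 121 candidate pairs, so each candidate can be evaluated exhaustively; in practice `training_error` would train the recurrent SVM and return its error.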

Out of sample forecasting
The following evaluation metrics are used to measure the performance and reliability of the proposed models when they are applied to forecast Shanghai composite index volatility: Mean Absolute Deviation (MAD), Normalized Mean Square Error (NMSE), and Hit Rate. Each is computed over the $n$ out-of-sample points from the actual values $a_t = y_t^2$ and the forecasted volatilities $p_t = \hat{\sigma}_t^2$. A linear regression technique is also employed to evaluate the forecasting performance of the volatility models. We simply regress the squared return on a constant and the forecasted volatility for each out-of-sample time point,

$$y_t^2 = c_0 + c_1 \hat{\sigma}_t^2 + \eta_t.$$

The squared correlation $R^2$ of this regression is a measure of forecasting performance. Table 5 summarizes the forecasting performance based on the four measures defined above: MAD, NMSE, R square and Hit Rate. From Table 5, we can see that the recurrent RVM generates the smallest values of MAD (1.3422) and NMSE (0.7179) and the largest values of R square (0.6696) and Hit Rate (0.8416), and hence outperforms the other models. The recurrent LSSVM and the recurrent SVM both perform better than GARCH(1,1) on all measures, and the two are closely matched with each other: the recurrent SVM is better than the recurrent LSSVM on MAD and R square, whereas the recurrent LSSVM is better on NMSE and Hit Rate. Figure 2 plots the one-step-ahead forecasts by the proposed model and the normal GARCH(1,1) against the actual values (upper plot) and the forecasts by all forecasting models (lower plot). From the lower plot we can see that, although the RVM approach generates better forecasting performance, the difference among the machine learning techniques is not large: the forecast lines of the three recurrent approaches almost overlap.
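The four evaluation measures can be computed as in the sketch below. The exact formulas used in the paper are not recoverable from the text, so commonly used definitions are assumed: NMSE is normalised by the sample variance of the actual values, and a hit is counted when the forecast and the actual series move in the same direction from one step to the next. All function names are illustrative.

```python
def mad(actual, predicted):
    """Mean absolute deviation between actual values and forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def nmse(actual, predicted):
    """Mean squared error divided by the sample variance of the actual values."""
    n = len(actual)
    mean_a = sum(actual) / n
    var_a = sum((a - mean_a) ** 2 for a in actual) / (n - 1)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / (n * var_a)

def hit_rate(actual, predicted):
    """Share of steps where forecast and actual change in the same direction."""
    hits = [(actual[t] - actual[t - 1]) * (predicted[t] - predicted[t - 1]) >= 0
            for t in range(1, len(actual))]
    return sum(hits) / len(hits)

def r_squared(actual, predicted):
    """Squared correlation, the R^2 of regressing actual on a constant and forecast."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (ss_a * ss_p)
```

Here `actual` would hold the squared out-of-sample returns and `predicted` the forecasted volatilities; smaller MAD and NMSE and larger R square and Hit Rate indicate better forecasts.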

Conclusion
In this paper, we propose a recurrent relevance vector machine based on GARCH to forecast the volatility of the Shanghai composite index. Other machine learning approaches, including the recurrent LSSVM and the recurrent SVM, as well as the normal GARCH(1,1), are employed for comparison with the proposed model. The experimental results suggest that the recurrent RVM yields better predictive capability than the other models, since it is a dynamic process and can capture longer memory of past information compactly. Furthermore, the RVM offers advantages over the SVM and LSSVM, such as distributional predictions and no need to tune the regularization parameter C or the epsilon tube.

Figure 1. Plot of Alpha (left) and Relevance Vectors (right) obtained from Training RRVM. Note: The horizontal axis shows the number of alphas (left figure) and the number of relevance vectors (right figure), while the vertical axis indicates the values of the alphas and relevance vectors.

Figure 2. Plots of Volatility Forecasts by GARCH and Recurrent RVM against Actual Values. Note: The dotted line shows the actual values, the dashed line the forecasts by the GARCH model, and the thick line the forecasts by the recurrent relevance vector machine.
Tables 2, 3 and 4 show the training results for RRVM, RLSSVM and RSVM respectively. From Table 2, the RRVM attains a smallest training error of 0.46203, with a variance of 0.50961 and an optimal RBF kernel parameter of 3.7291. The RVM requires 136 relevance vectors, with the same number of alphas, during training. Figure 1 plots the values of the 136 alphas (left) and the values of the 136 relevance vectors (right). From Table 3, the RLSSVM requires 108.0387 as the optimal value of the regularization parameter and 6.55708 as the RBF kernel parameter, and attains a smallest training error of 0.2744. The value 8.1306 is the constant term of the function estimated by the LSSVM. Finally, Table 4 shows the training process of the RSVM. A grid search is used to select the optimal values of the cost, C, and the RBF kernel parameter, γ, both searched over the same range $[2^{-5}, 2^{5}]$.

Note on the evaluation metrics: $a_t = y_t^2$ denotes the actual values, $p_t = \hat{\sigma}_t^2$ the forecasted volatility, and $n$ the out-of-sample size.

Table 1. Descriptive statistics of return series

Table 4. Training result from Recurrent SVM

Table 5. Forecasting performance based on evaluation metrics by different models