Financial Volatility Forecasting by Nonlinear Support Vector Machine Heterogeneous Autoregressive Model: Evidence from Nikkei 225 Stock Index

Support vector machines (SVMs) are a new semi-parametric tool for regression estimation. This paper introduces a new class of hybrid models, the nonlinear support vector machine heterogeneous autoregressive (SVM-HAR) models, and compares their forecasting performance with that of the classical heterogeneous autoregressive (HAR) models in forecasting financial volatility. The empirical experiments show that the proposed hybrid SVM-HAR models have higher predictive ability than the classical HAR model.


Introduction
Volatility, the standard deviation of the continuously compounded returns of a financial instrument over a specific time horizon, is both the boon and bane of all traders: you can't live with it and you can't really trade without it. Most financial researchers are concerned with modeling and forecasting volatility in asset returns in order to quantify the risk of financial instruments over a particular time period, so that risk managers and practitioners can judge whether their portfolio is likely to decline in the future and whether they may want to sell it before it becomes too volatile. Volatility therefore plays a key role in the theory and applications of asset pricing, optimal portfolio allocation, and risk management.
Research on time-varying volatility using time series models has been active ever since Engle (1982) introduced the ARCH model. The GARCH model, generalized by Bollerslev (1986), has been extended in various directions, and these extensions recognize, on the basis of empirical evidence from various researchers, that there may be important nonlinearity, asymmetry, and long memory properties in the volatility process. Popular extensions include Nelson's (1991) EGARCH model and Glosten, Jagannathan, and Runkle's (1993) GJR-GARCH model, both of which account for the asymmetric relationship between stock returns and changes in variance (see, e.g., Black 1976 for the earliest study of the asymmetric effect and Engle and Ng 1993 for further discussion). Engle's (1990) AGARCH, Ding, Granger and Engle's (1993) APARCH, Zakoian's (1994) TGARCH, and Sentana's (1995) QGARCH models have also been developed to make the models more flexible. Stochastic volatility (SV) modeling capitalized on, and often contributed in turn to, the concurrent development of Bayesian statistical analysis using Markov chain Monte Carlo procedures (see, e.g., Shephard 2005).
When GARCH-type and SV latent volatility models are used, a well-established result in the financial time series literature is that the standardized returns do not have a Gaussian distribution. The excess kurtosis of the series motivates the use of heavy-tailed distributions. For example, Student's t distribution was used by Bollerslev (1987), the GED by Nelson (1991), and both Student's t and the GED by Hsieh (1989) as alternative distributional models for the innovations. Researchers have found that returns usually exhibit empirical regularities including thick tails, volatility clustering, and leverage effects (see, e.g., Bollerslev et al. 1994). Andersen et al. (2000a, b, 2001, 2003) showed that the distribution of the standardized exchange rate series was almost Gaussian when realized volatility (RV) was used. Furthermore, the logarithm of the realized volatilities was also nearly Gaussian. This was also corroborated for stock returns in Andersen et al. (2001a). Other studies of realized volatility include Aït-Sahalia and Mancini (2006), Ghysels and Sinko (2006), Corradi et al. (2006), and, more recently, Corsi et al. (2008).
In addition, there is significant evidence of long memory in the time series, which has conventionally been modeled as an ARFIMA(p,d,q) process (see, e.g., Andersen et al. 2000a, b, 2001a, 2003). A large number of papers in the RV literature employ the ARFIMA model without a conditionally heteroskedastic error specification to fit daily RV series (see, e.g., Oomen 2001, Giot and Laurent 2004). Corsi et al. (2001) and Corsi (2009) proposed the Heterogeneous Autoregressive Realized Volatility (HAR-RV) model as an alternative to the ARFIMA model, and it has quickly become popular for modeling the dynamics of RV and related volatility measures because it is easy to estimate and the baseline model is easy to extend. The HAR-RV model employs a few predictor terms, namely past daily RVs averaged over different horizons (typically a day, a week, and a month), and is capable of producing the slow-decay patterns in autocorrelations exhibited by many RV series. Another recent development in the RV literature is the approach due to Barndorff-Nielsen and Shephard (2004, 2006) and Andersen et al. (2003, 2007) of decomposing the RV into the contribution of continuous sample path variation and that of jumps. Extending the theory of quadratic variation of semimartingales, Barndorff-Nielsen et al. (2006) provided an asymptotic statistical foundation for this decomposition procedure under very general conditions.
However, all of these models require a specified distribution of the innovations in order to estimate the model and to forecast future values appropriately. Semi-parametric approaches do not require any assumption on the properties of the data (the return distribution). These models have been used successfully for modeling and forecasting time series, including volatility. One such model is the Support Vector Machine (SVM), introduced by Vapnik (1995), which is guaranteed to reach a globally optimal solution (see, e.g., Cristianini and Shawe-Taylor, 2000) and thus avoids the problem of multiple local optima in which neural networks usually get trapped. Pérez-Cruz et al. (2003) predicted GARCH(1,1)-based volatility by SVM and showed that the SVM-GARCH(1,1) model yielded better predictive ability than the parametric GARCH(1,1) model. Chen et al. (2008) proposed a recurrent SVM as a dynamic process to model GARCH(1,1)-based volatility and showed, using simulated and real data, that the model performed better than the MLE-based GARCH(1,1) model. More recently, Ou and Wang (2010) proposed the GARCH-LSSVM, EGARCH-LSSVM and GJR-LSSVM hybrid models, based on a modification of Suykens and Vandewalle (1999), to forecast the leverage effect volatilities of ASEAN stock markets. They showed that these models improved the forecasts of leverage effect volatilities, especially during the recent global financial market crashes in 2008. This paper, in the spirit of Andersen et al. (2003, 2007), aims to apply the SVM approach to HAR-RV models to forecast the daily RV of the Nikkei 225 index empirically. Watanabe and Yamaguchi (2007) and Ishida and Watanabe (2009), among other researchers, studied the RV of the Nikkei 225 index and reported empirical findings. To the author's knowledge, however, this paper is the first to apply the SVM-HAR-RV model in the RV literature.
The plan for the rest of the paper is as follows. In Section 2, we briefly discuss the realized volatility, realized bipower variation, and jump component extraction. Section 3 describes the data and summary statistics. Section 4 describes the SVM volatility model. Section 5 describes different HAR-RV models. Section 6 reports the forecasting results of the RV, and Section 7 concludes with suggestions for further research.

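Consider a simple diffusion process

$$dp(t) = \sigma(t)\,dW(t), \tag{1}$$

where $p(t)$ is the instantaneous log-price, $W(t)$ is a standard Brownian process, and $\sigma(t)$ is the standard deviation of $dp(t)$, which may be time-varying but is assumed to be independent of $dW(t)$. The volatility for the day spanning the interval $(t, t+1)$ is then defined as the integral of $\sigma^2(s)$ over that interval, i.e., $\int_t^{t+1}\sigma^2(s)\,ds$, which is known as the integrated volatility and is unobserved. Let the discretely sampled $\Delta$-period returns be denoted by $r_{t,\Delta} \equiv p(t) - p(t-\Delta)$. If the process (in our case the log of the Nikkei 225 index level) is a continuous semimartingale, then under mild regularity conditions

$$RV_{t+1}(\Delta) \equiv \sum_{j=1}^{1/\Delta} r_{t+j\Delta,\Delta}^{2} \;\to\; \int_t^{t+1}\sigma^2(s)\,ds \quad \text{as } \Delta \to 0,$$

where $RV_{t+1}(\Delta)$ is the realized variance of that day, since $t$ is measured in days and $1/\Delta$ is an integer. We will hereafter use the terms realized volatility and realized variance interchangeably, or their common abbreviation RV.

If the process is instead a semimartingale with finite-activity jumps, i.e., with only a finite number of jumps occurring in any finite time interval (such as Poisson jumps), then the realized variance converges to the quadratic variation, which can be decomposed as

$$RV_{t+1}(\Delta) \;\to\; \int_t^{t+1}\sigma^2(s)\,ds + \sum_{t < s \le t+1} \kappa^2(s),$$

where $\kappa(s)$ refers to the size of the jump occurring at time $s$. Barndorff-Nielsen and Shephard (2004, 2006) showed that, even in the presence of jumps, the bipower variation satisfies

$$BV_{t+1}(\Delta) \equiv \mu_1^{-2}\sum_{j=2}^{1/\Delta} |r_{t+j\Delta,\Delta}|\,|r_{t+(j-1)\Delta,\Delta}| \;\to\; \int_t^{t+1}\sigma^2(s)\,ds,$$

where $\mu_1 = \sqrt{2/\pi}$, under mild conditions, and proposed to use

$$J_{t+1}(\Delta) \equiv \max\left[RV_{t+1}(\Delta) - BV_{t+1}(\Delta),\, 0\right]$$

as an estimator for $\sum_{t<s\le t+1}\kappa^2(s)$. $J_{t+1}(\Delta)$ is known to take small non-zero values very frequently, due to measurement error and possibly due to the presence of jumps of infinite-activity type. Andersen et al. (2007) introduced a shrinkage estimator for the jump contribution, based on the asymptotic distribution theory developed by Barndorff-Nielsen and Shephard (2004, 2006) and Barndorff-Nielsen et al. (2006), as

$$J_{t+1,\alpha}(\Delta) = I\left[Z_{t+1}(\Delta) > \Phi_\alpha\right]\cdot\left[RV_{t+1}(\Delta) - BV_{t+1}(\Delta)\right], \tag{7}$$

where $I$ is an indicator function,

$$Z_{t+1}(\Delta) \equiv \Delta^{-1/2}\,\frac{\left[RV_{t+1}(\Delta) - BV_{t+1}(\Delta)\right] RV_{t+1}(\Delta)^{-1}}{\left[\left(\mu_1^{-4} + 2\mu_1^{-2} - 5\right)\max\left\{1,\; TQ_{t+1}(\Delta)\,BV_{t+1}(\Delta)^{-2}\right\}\right]^{1/2}}$$

is asymptotically standard normally distributed under the null hypothesis of no jumps, and $\Phi_\alpha$ is the $\alpha$-quantile of the standard normal distribution function $\Phi$. Here $\alpha$ is usually set to a value such as 0.999 so that $J_{t+1,\alpha}(\Delta)$ picks up only "significant jumps", and the realized tripower quarticity is

$$TQ_{t+1}(\Delta) \equiv \Delta^{-1}\mu_{4/3}^{-3}\sum_{j=3}^{1/\Delta} |r_{t+j\Delta,\Delta}|^{4/3}\,|r_{t+(j-1)\Delta,\Delta}|^{4/3}\,|r_{t+(j-2)\Delta,\Delta}|^{4/3} \;\to\; \int_t^{t+1}\sigma^4(s)\,ds,$$

where $\mu_{4/3} = 2^{2/3}\,\Gamma(7/6)\,\Gamma(1/2)^{-1}$. This convergence result holds even in the presence of jumps. Andersen et al. (2007) introduced the shrinkage estimator for the continuous sample path variation as

$$C_{t+1,\alpha}(\Delta) = I\left[Z_{t+1}(\Delta) \le \Phi_\alpha\right]\cdot RV_{t+1}(\Delta) + I\left[Z_{t+1}(\Delta) > \Phi_\alpha\right]\cdot BV_{t+1}(\Delta). \tag{9}$$

Andersen et al. (2007) also proposed microstructure-noise-robust versions of $BV_{t+1}(\Delta)$ and $TQ_{t+1}(\Delta)$ as

$$BV_{1,t+1}(\Delta) \equiv \mu_1^{-2}(1-2\Delta)^{-1}\sum_{j=3}^{1/\Delta} |r_{t+j\Delta,\Delta}|\,|r_{t+(j-2)\Delta,\Delta}|,$$

$$TQ_{1,t+1}(\Delta) \equiv \Delta^{-1}\mu_{4/3}^{-3}(1-4\Delta)^{-1}\sum_{j=5}^{1/\Delta} |r_{t+j\Delta,\Delta}|^{4/3}\,|r_{t+(j-2)\Delta,\Delta}|^{4/3}\,|r_{t+(j-4)\Delta,\Delta}|^{4/3}.$$

The definitions of the test statistic and of the jump and continuous components are modified accordingly.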
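As a rough illustration of the quantities in this section, the following Python sketch computes the realized variance, bipower variation, tripower quarticity, the ratio-type jump test statistic, and the resulting split into a significant-jump and a continuous component for a single day of intraday returns. The function name `realized_measures`, the dictionary keys, and the hard-coded critical value 3.090232 (the 0.999 quantile of the standard normal distribution) are assumptions made for this example. The formulas follow the standard definitions of Andersen et al. (2007) and are not taken from the author's own code.

```python
import math

MU_1 = math.sqrt(2.0 / math.pi)  # E|Z| for Z ~ N(0, 1)
MU_43 = 2.0 ** (2.0 / 3.0) * math.gamma(7.0 / 6.0) / math.gamma(0.5)  # E|Z|^(4/3)


def realized_measures(returns, z_crit=3.090232):
    """Return RV, BV, TQ, Z and the jump/continuous split for one day.

    `returns` holds the intraday returns of a single day (length 1/Delta).
    `z_crit` is the standard normal quantile for alpha = 0.999 (assumed).
    """
    m = len(returns)          # number of intraday returns, i.e. 1/Delta
    delta = 1.0 / m
    a = [abs(r) for r in returns]

    rv = sum(r * r for r in returns)
    bv = sum(a[j] * a[j - 1] for j in range(1, m)) / MU_1 ** 2
    tq = sum((a[j] * a[j - 1] * a[j - 2]) ** (4.0 / 3.0)
             for j in range(2, m)) / (delta * MU_43 ** 3)

    # Ratio-type jump statistic, asymptotically N(0, 1) when there are no jumps.
    scale = (MU_1 ** -4 + 2.0 * MU_1 ** -2 - 5.0) * max(1.0, tq / bv ** 2)
    z = (rv - bv) / rv / math.sqrt(delta * scale)

    is_jump = z > z_crit
    jump = rv - bv if is_jump else 0.0   # significant jump component
    cont = bv if is_jump else rv         # continuous sample path component
    return {"rv": rv, "bv": bv, "tq": tq, "z": z, "jump": jump, "cont": cont}
```

By construction the two components add up to the realized variance. The sketch divides by `bv` and `rv`, so a day whose returns are all zero would need separate handling.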

Calculation of intraday returns and related realized volatility measures from minute-by-minute Nikkei 225 data
This paper measures the realized volatility of the Nikkei 225 index over the sample period 11 March 1996 to 30 September 2009. First, a series of "five-minute (percentage) returns" is constructed by taking five-minute log differences, multiplied by one hundred, from the minute-by-minute data. This choice is made to mitigate the effect of microstructure-related noise and to increase the precision of the volatility measures (see, e.g., Ishida and Watanabe, 2009; Watanabe and Yamaguchi, 2007).
Our database includes minute-by-minute prices of the Nikkei 225 stock index for both sessions. This paper first extracts the prices at 9:01, 9:05, 9:10, ..., 11:00 in the morning session and at 12:31, 12:35, 12:40, ..., 15:00 in the afternoon session. Sometimes the last transaction price of the morning (and/or afternoon) session is observed slightly after 11:00 (and/or 15:00). In such cases, the last prices are used instead of the prices at 11:00 (and/or 15:00). Next, the five-minute returns described in Section 2 are calculated from these prices. In total there are 54 five-minute returns for a typical trading day, 24 from the morning session and 30 from the afternoon session.
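The construction of the five-minute percentage returns can be sketched in Python as follows. The function names and the toy price lists in the usage note are hypothetical. The sketch only assumes that each session is given as a list of prices already sampled at the times listed above.

```python
import math


def pct_log_returns(prices):
    """Return 100 * log differences of consecutive prices within one session."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def daily_five_minute_returns(morning_prices, afternoon_prices):
    """Concatenate the session returns; no return spans the lunch break."""
    return pct_log_returns(morning_prices) + pct_log_returns(afternoon_prices)
```

With 25 sampled prices in the morning session (9:01, 9:05, ..., 11:00) and 31 in the afternoon session (12:31, 12:35, ..., 15:00), the function returns 24 + 30 = 54 returns, matching the count given in the text.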
Given the recent literature on the effect of market microstructure noise on realized volatility estimation, the optimal choice of sampling frequency studied by Bandi and Russell (2003, 2008) is also considered here. The optimal sampling frequency $M_{opt}$ (the number of observations per day) is calculated following their approach (see also Zhang et al., 2005 and Clements et al.).

We cannot calculate the 5-minute, 15-minute and optimally sampled returns for the non-trading hours, i.e., the lunch break and the overnight period. Lunch-time and overnight returns could be computed from the last price of the morning session and the first price of the afternoon session, and from the last price of the afternoon session and the first price of the next morning session. Following Hansen and Lunde (2005), however, we do not use them and instead scale the realized volatility as

$$RV_t = c\,RV_t^{(o)}, \qquad c = \frac{\sum_{t=1}^{T}(R_t - \bar{R})^2}{\sum_{t=1}^{T} RV_t^{(o)}}, \qquad \bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t,$$

where $RV_t^{(o)}$ is the realized volatility computed from the returns during trading hours only, $R_t$ is the daily return, and $T$ is the number of complete trading days.

In our sample period, the first trade of the afternoon session was observed at 13:01 from January 1, 2006 to April 21, 2006. We therefore remove these trading days, along with half trading days, including the first and the last trading days of each year. The remaining number of complete trading days, $T$, is 3279. We calculate $c$ and $\bar{R}$ from these 3279 days for each of the four series.
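The scaling step attributed to Hansen and Lunde (2005) can be sketched as below. This is a minimal sketch under the assumption that the scaling constant is the ratio of the sample variance of daily returns to the average open-hours realized variance. The function name `scale_realized_variance` and its argument names are hypothetical.

```python
def scale_realized_variance(rv_open, daily_returns):
    """Rescale open-hours realized variances so they match the daily return variance.

    rv_open       : realized variances computed from trading-hours returns only
    daily_returns : daily returns for the same T complete trading days
    Returns the scaling constant c and the list of scaled realized variances.
    """
    t = len(daily_returns)
    if t != len(rv_open):
        raise ValueError("both series must cover the same trading days")
    mean_r = sum(daily_returns) / t
    c = sum((r - mean_r) ** 2 for r in daily_returns) / sum(rv_open)
    return c, [c * rv for rv in rv_open]
```

After scaling, the sum of the realized variances equals the sum of squared deviations of the daily returns from their mean, so the average level of RV reflects whole-day rather than trading-hours variation.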

Properties of the realized volatility and related measures
Summary statistics of the daily returns, the daily RV, its standard deviation form, i.e., $RV^{1/2}$, its logarithmic form, i.e., ln(RV), and the daily jump and microstructure-noise-robust daily jump (MSNR-Jump, calculated according to Andersen et al., 2007) series, together with their standard deviation and logarithmic forms, are presented in Table 1a. The summary statistics of the continuous path component, the significant jump series, and the microstructure-noise-robust versions of the continuous path component (MSNR-C) and significant jump (MSNR-SJ) series due to Andersen et al. (2007), together with their standard deviation and logarithmic forms, are presented in Table 1b. In addition to the sample skewness and kurtosis, the tables report the Jarque-Bera (JB) statistic for testing normality and the Ljung-Box (LB) statistics of order 5, 10 and 22 (corresponding roughly to one week, two weeks and one month) for testing serial correlation up to the respective order.
From Table 1a, we observe that the unconditional distribution of the daily return series is negatively skewed and highly significantly nonnormal, with high positive kurtosis. The LB statistics also indicate that the series is significantly serially correlated. The daily RV, Jump and MSNR-Jump series are also highly nonnormal, with large positive skewness and kurtosis, and highly significantly serially correlated. The average values of Jump and MSNR-Jump are 1.471 and 1.504 respectively, with positive minimum values, which implies that more than one jump occurred on every single day.
The square-root transformation reduces the deviation from normality, but the deviation remains very large. The log transformation brings down the sample skewness and kurtosis of the RV series, but the series is still significantly nonnormal. All the transformed series remain highly significantly serially correlated.
Turning to the summary statistics in Table 1b, where α = 0.999 in (7) and (9), we observe that the averages of the significant jump and MSNR-SJ series are slightly lower but still greater than one, while the minimum values fall to zero. The Jarque-Bera statistic shows strong evidence of nonnormality for all series, and the LB statistic shows strong evidence of serial correlation.
Figure 1 shows the daily RV, Jump, MSNR-Jump, Significant Jump and MSNR-SJ series. We visually observe a few big jumps in the initial part of the sample and the biggest jumps in its final part, the period of the global financial market crashes.
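For reference, the sample skewness, kurtosis, Jarque-Bera statistic and Ljung-Box statistic mentioned in this section can be computed as in the following Python sketch. It uses the textbook definitions (central moments divided by the sample size). The function names are hypothetical, and the statistical software behind the reported tables may apply small-sample corrections that this sketch omits.

```python
def jarque_bera(x):
    """Return (skewness, kurtosis, JB statistic) using moments divided by n."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb


def ljung_box(x, lags):
    """Return the Ljung-Box Q statistic for autocorrelations up to `lags`."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, lags + 1):
        rho = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        q += rho ** 2 / (n - k)
    return n * (n + 2) * q
```

Under the null hypothesis of normality the JB statistic is asymptotically chi-squared with 2 degrees of freedom, and under the null of no serial correlation the Ljung-Box statistic is asymptotically chi-squared with `lags` degrees of freedom.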

The Support Vector Machines (SVMs)
The Support Vector Machines (SVMs) were introduced by Vapnik (1995) on the basis of statistical learning theory, which had been developed over the previous three decades by Vapnik, Chervonenkis and others (see, e.g., Vapnik 1982, 1995) from a nonlinear generalization of the Generalized Portrait algorithm. SVMs were developed to solve the classification problem, but they have since been extended to regression problems (e.g., Vapnik et al. 1997). SVMs map the data to a high-dimensional feature space, nonlinearly related to the input space, and apply a simple linear method to the data in that space. Moreover, even though SVMs can be thought of as a linear algorithm in a high-dimensional space, in practice they do not involve any computations in that space (see, e.g., Karatzoglou and Meyer 2006). The terminology for SVMs can be slightly confusing. In some of the literature, SVM refers to both classification and regression with support vector methods. In this paper, the term SVM is used for Nonlinear Support Vector Regression (NL-SVR).

The mathematical formulation of SVM is as follows. In the $\varepsilon$-insensitive support vector regression of Vapnik (1995), the goal is to find a function $f(x)$ that has at most $\varepsilon$ deviation from the actually obtained targets $y_i$ for all training data $(x_1, y_1), \ldots, (x_n, y_n)$ and, at the same time, is as flat as possible. Suppose $f$ takes the following form

$$f(x) = \langle w, \phi(x) \rangle + b, \qquad x \in X,$$

where $X$ is the space of the input patterns, $\phi$ maps $X$ into the feature space, and the inner product $\langle \cdot, \cdot \rangle$ in that space is evaluated through the kernel function $k(x, x') = \langle \phi(x), \phi(x') \rangle$. Flatness of the above model means that one seeks a small $w$. One way to ensure this is to minimize the Euclidean norm, i.e., $\|w\|^2$ (see, e.g., Smola 1998). By applying the soft margin formulation of Cortes and Vapnik (1995) and the Karush-Kuhn-Tucker (KKT) conditions (Karush 1939, Kuhn and Tucker 1951), one can estimate the above model as

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\,k(x_i, x) + b,$$

where $b$ can be computed as

$$b = y_i - \sum_{j=1}^{n} (\alpha_j - \alpha_j^*)\,k(x_j, x_i) - \varepsilon \quad \text{for } \alpha_i \in (0, C), \qquad b = y_i - \sum_{j=1}^{n} (\alpha_j - \alpha_j^*)\,k(x_j, x_i) + \varepsilon \quad \text{for } \alpha_i^* \in (0, C),$$

where $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated, and $\alpha_i, \alpha_i^* \ge 0$ are the Lagrange multipliers.
See, e.g., Smola and Schölkopf (1998) for further discussion. Several statistical software packages (e.g., kernlab in R, MATLAB) are available for fitting SVMs.
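To make the regression form concrete, the Python sketch below evaluates the ε-insensitive loss and the kernel expansion of a fitted support vector regression. It assumes the standard dual form, in which the prediction is a weighted sum of kernel evaluations against the training points plus an offset b, with each weight equal to the difference of two Lagrange multipliers. Training itself (solving the quadratic program) is not shown. The function and argument names are hypothetical.

```python
def eps_insensitive_loss(y, y_hat, eps):
    """Return the loss: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y - y_hat) - eps)


def svr_predict(x, support_x, coefs, b, kernel):
    """Evaluate f(x) = sum_i coefs[i] * kernel(support_x[i], x) + b.

    coefs[i] stands for the difference of the two Lagrange multipliers
    attached to training point i; points with coefs[i] == 0 do not matter.
    """
    return sum(c * kernel(xi, x) for xi, c in zip(support_x, coefs)) + b
```

For example, with a linear kernel `lambda u, v: sum(p * q for p, q in zip(u, v))`, support points `[[1.0], [2.0]]`, coefficients `[0.5, -0.25]` and `b = 0.1`, the prediction at `[2.0]` is 0.5 * 2 - 0.25 * 4 + 0.1 = 0.1.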
According to Cortes and Vapnik (1995), any symmetric positive semi-definite function that satisfies Mercer's conditions can be used as a kernel function in the SVM context. Mercer's condition requires that, for every square-integrable function $g$,

$$\int\!\!\int k(x, x')\,g(x)\,g(x')\,dx\,dx' \ge 0.$$

HAR-RV models
The HAR-RV class of volatility models was proposed by Corsi (2003) as a straightforward extension of the so-called Heterogeneous ARCH (HARCH) class of models analyzed by Müller et al. (1997).
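To sketch the HAR-RV model, define the multi-period realized volatilities by the normalized sum of the one-period volatilities,

$$RV_{t,t+h} = h^{-1}\left[RV_{t+1} + RV_{t+2} + \cdots + RV_{t+h}\right].$$

Note that, by definition of the daily volatilities, $RV_{t,t+1} \equiv RV_{t+1}$. Also, provided the expectations exist, $E(RV_{t,t+h}) \equiv E(RV_{t+1})$ for all $h$ (see, e.g., Andersen et al. 2003, 2007). Setting $h=5$ and $h=22$ produces the weekly and monthly volatilities, respectively. The daily HAR-RV model of Corsi (2003) may then be expressed as

$$RV_{t+1} = \beta_0 + \beta_D RV_t + \beta_W RV_{t-5,t} + \beta_M RV_{t-22,t} + \varepsilon_{t+1}, \tag{17}$$

where $t = 1, 2, \ldots, T$. Andersen et al. (2003, 2007) included the jump component $J_t$, explained in Section 2, as an additional explanatory variable and introduced the new model as

$$RV_{t,t+h} = \beta_0 + \beta_D RV_t + \beta_W RV_{t-5,t} + \beta_M RV_{t-22,t} + \beta_J J_t + \varepsilon_{t,t+h}. \tag{18}$$

The standard deviation and logarithmic forms of the above model are, respectively,

$$(RV_{t,t+h})^{1/2} = \beta_0 + \beta_D RV_t^{1/2} + \beta_W (RV_{t-5,t})^{1/2} + \beta_M (RV_{t-22,t})^{1/2} + \beta_J J_t^{1/2} + \varepsilon_{t,t+h}, \tag{19}$$

$$\log(RV_{t,t+h}) = \beta_0 + \beta_D \log(RV_t) + \beta_W \log(RV_{t-5,t}) + \beta_M \log(RV_{t-22,t}) + \beta_J \log(J_t + 1) + \varepsilon_{t,t+h}. \tag{20}$$

After introducing the shrinkage and microstructure-noise-robust estimators for the significant jump and continuous sample path variation, discussed in Section 2, Andersen et al. (2007) represented the HAR-RV-CJ model as

$$RV_{t,t+h} = \beta_0 + \beta_{CD} C_t + \beta_{CW} C_{t-5,t} + \beta_{CM} C_{t-22,t} + \beta_{JD} J_t + \beta_{JW} J_{t-5,t} + \beta_{JM} J_{t-22,t} + \varepsilon_{t,t+h}, \tag{21}$$

where $C_{t,t+h} = h^{-1}\left[C_{t+1} + C_{t+2} + \cdots + C_{t+h}\right]$, $J_{t,t+h} = h^{-1}\left[J_{t+1} + J_{t+2} + \cdots + J_{t+h}\right]$, and $J_t$ and $C_t$ now denote the significant jump and continuous components. The standard deviation and logarithmic versions of this model are, respectively,

$$(RV_{t,t+h})^{1/2} = \beta_0 + \beta_{CD} C_t^{1/2} + \beta_{CW} (C_{t-5,t})^{1/2} + \beta_{CM} (C_{t-22,t})^{1/2} + \beta_{JD} J_t^{1/2} + \beta_{JW} (J_{t-5,t})^{1/2} + \beta_{JM} (J_{t-22,t})^{1/2} + \varepsilon_{t,t+h}, \tag{22}$$

$$\log(RV_{t,t+h}) = \beta_0 + \beta_{CD} \log(C_t) + \beta_{CW} \log(C_{t-5,t}) + \beta_{CM} \log(C_{t-22,t}) + \beta_{JD} \log(J_t + 1) + \beta_{JW} \log(J_{t-5,t} + 1) + \beta_{JM} \log(J_{t-22,t} + 1) + \varepsilon_{t,t+h}. \tag{23}$$

See Andersen et al. (2003, 2007) for further discussion.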
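A minimal Python sketch of the baseline HAR-RV regression is given below. It builds the daily, weekly (5-day average) and monthly (22-day average) regressors from a list of daily RV values, forms the h-day-ahead average RV as the target, and fits the coefficients by ordinary least squares through the normal equations. The function names `har_design` and `ols` are hypothetical, and solving the normal equations by Gauss-Jordan elimination is a convenience for a dependency-free example rather than the author's estimation routine.

```python
def har_design(rv, h=1):
    """Build rows [1, RV_t, weekly mean, monthly mean] and h-day-ahead targets."""
    rows, targets = [], []
    for t in range(21, len(rv) - h):       # 22 past values and h future values needed
        day = rv[t]
        week = sum(rv[t - 4:t + 1]) / 5.0      # mean of RV over days t-4 .. t
        month = sum(rv[t - 21:t + 1]) / 22.0   # mean of RV over days t-21 .. t
        rows.append([1.0, day, week, month])
        targets.append(sum(rv[t + 1:t + 1 + h]) / h)
    return rows, targets


def ols(rows, targets):
    """Solve the normal equations X'X beta = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]
```

Calling `ols(*har_design(rv, h=5))` on a sufficiently long and non-degenerate series would return the intercept and the daily, weekly and monthly coefficients for the weekly horizon. The jump and continuous-component variants add further columns to each row in the same way.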

Modeling and RV with HAR-RV and SVM-HAR-RV models
This paper compares the forecasting performance of the SVM-HAR-RV class of models with that of the classical HAR-RV class. For this comparison, the in-sample period runs from March 11, 1996 to December 29, 2004 and the out-of-sample period from January 5, 2005 to September 30, 2009, a period that includes the global financial market crashes. First, models (17) (using the RV series, its standard deviation form, and its logarithm), (18), (19), (20), (21), (22) and (23) are estimated by ordinary least squares (OLS). Next, the same models are estimated by SVM with $C = 1$ and $\varepsilon = 0.1$; these are called SVM-HAR-RV models. R 2.12.0 (win32) and its kernlab package were used for both the HAR-RV and SVM-HAR-RV classes of models.
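Both classes of models were estimated for horizons h = 1, 5, and 22 days. To compare forecasting performance, the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Root Mean Square Percentage Error (RMSPE) and Mean Absolute Percentage Error (MAPE) were computed, defined as follows:

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(RV_{t+1} - \widehat{RV}_{t+1|t}\right)^2}, \qquad MAE = \frac{1}{n}\sum_{t=1}^{n}\left|RV_{t+1} - \widehat{RV}_{t+1|t}\right|,$$

$$RMSPE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\frac{RV_{t+1} - \widehat{RV}_{t+1|t}}{RV_{t+1}}\right)^2}, \qquad MAPE = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{RV_{t+1} - \widehat{RV}_{t+1|t}}{RV_{t+1}}\right|,$$

where $\widehat{RV}_{t+1|t}$ denotes the one-day-ahead realized volatility forecast and $n$ is the number of forecasts. We evaluate these errors for the 5-day-ahead and 22-day-ahead volatility forecasts as well.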
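The four error measures can be computed as in the following Python sketch. It uses the common textbook definitions, with percentage errors expressed as fractions of the realized value rather than multiplied by 100; whether the reported tables use that scaling is not stated, so this is an assumption. The function name `forecast_errors` is hypothetical.

```python
import math


def forecast_errors(actual, forecast):
    """Return RMSE, MAE, RMSPE and MAPE for paired actual and forecast values."""
    n = len(actual)
    err = [a - f for a, f in zip(actual, forecast)]
    rel = [e / a for e, a in zip(err, actual)]   # relative errors; actual must be non-zero
    return {
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "MAE": sum(abs(e) for e in err) / n,
        "RMSPE": math.sqrt(sum(r * r for r in rel) / n),
        "MAPE": sum(abs(r) for r in rel) / n,
    }
```

The percentage measures divide by the realized value, so they become unstable when that value is close to zero, which can happen for log-transformed series.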
To save space, this paper does not report the estimation results of all models. The values of $R^2$ for the different models are presented in Table 2a and Table 2b, while the forecasting errors are presented in Table 3a and Table 3b.
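The paper does not spell out how the goodness-of-fit values in Tables 2a and 2b are computed. One common choice, which can turn negative out of sample when the forecasts do worse than the sample mean of the realized values, is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements that definition as an assumption; the function name `r_squared` is hypothetical.

```python
def r_squared(actual, forecast):
    """Return 1 - SSE/SST; negative when forecasts do worse than the mean of `actual`."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - sse / sst
```

A perfect forecast gives 1, a forecast equal to the mean of the realized values gives 0, and a forecast that moves in the wrong direction gives a negative value.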

Empirical Results
Let us first compare the $R^2$ results. Table 2a, which presents the values of $R^2$ for the different models, shows that $R^2$ is higher for the standard deviation form than for the RV series, and higher again for the log RV series, for all models and all horizons, but that it decreases as the horizon lengthens in every series. For every series and horizon, the in-sample performance of the SVM-HAR-RV models is markedly better than that of the HAR-RV models. The out-of-sample performance of the SVM-HAR-RV models is also better than that of the HAR-RV models for the standard deviation and logarithmic series. Only for the RV series are the out-of-sample $R^2$ values of the HAR-RV model slightly higher than those of the SVM-HAR-RV model. Broadly similar results (with different values) are observed when comparing the HAR-RV-J (and MSNR-J) and SVM-HAR-J (and MSNR-J) models. For both classes of models, performance improves after adding the jump (and/or MSNR-J) component as an explanatory variable. The MSNR-Jump markedly improves the predictive ability of the SVM-HAR-RV-MSNR-J class of models but not that of the HAR-RV-J class.

Table 2b presents the values of $R^2$ for the HAR-RV-CJ, HAR-RV-MSNR-CJ, SVM-HAR-RV-CJ and SVM-HAR-RV-MSNR-CJ models. This table yields results similar to those of Table 2a. The value of $R^2$ is higher for the standard deviation form than for the RV series, and higher again for the log RV series, for all models and all horizons, but it decreases as the horizon lengthens in every series. In sample, the SVM-HAR-RV-CJ (and/or SVM-HAR-RV-MSNR-CJ) class of models performs better than the HAR-RV-CJ (and/or HAR-RV-MSNR-CJ) class. The out-of-sample performance of the SVM-HAR-RV class of models is also satisfactory.
The log-transformed series produce better performance than the RV series and its standard deviation form for both classes of models. For both classes, the best performance is observed when 5-minute intraday returns are used to estimate the realized volatility.
Next, the different errors are calculated for the log-transformed series for both classes of models.
Let us now compare the results based on the error measures defined above. Table 3a presents the forecasting errors for the HAR-RV, SVM-HAR-RV, HAR-RV-J, SVM-HAR-RV-J, HAR-RV-MSNR-J and SVM-HAR-RV-MSNR-J models, while Table 3b presents the forecasting errors for the HAR-RV-CJ, SVM-HAR-RV-CJ, HAR-RV-MSNR-CJ and SVM-HAR-RV-MSNR-CJ models. In sample, the SVM-HAR class of models outperforms the HAR-RV class for every series, horizon and intraday return series. Out of sample, the performance of the SVM-HAR class of models is also satisfactory compared with that of the HAR-RV class. Figure 2 presents the out-of-sample forecasting performance of the above models when 5-minute intraday returns are used.

Concluding remarks
This paper combined Support Vector Machine (SVM) regression with the Heterogeneous Autoregressive (HAR) model as a hybrid model (the SVM-HAR model) to improve volatility forecasting ability. It examined the realized volatility forecasting ability of the models for Nikkei 225 stock returns. The empirical results presented here suggest several interesting extensions. First, the values $C = 1$ and $\varepsilon = 0.1$ were set for the SVM-HAR-RV class of models, and better forecasting ability was observed. An appropriate choice of $C$ and $\varepsilon$ could help to improve the forecasting ability further.
Second, the Polynomial and Laplacian kernels were considered for the SVMs, and better performance was observed. An appropriate choice among the other kernels in the SVM literature, or an appropriate new kernel, could improve the forecasting ability.
Third, optimally sampled frequencies were considered in order to mitigate market microstructure noise. This choice failed to improve the forecasting performance. It would be interesting to consider other techniques for mitigating market microstructure noise.
These topics are left for further research.

Table 3a. Forecasting Errors of HAR, SVM-HAR, HAR-J, SVM-HAR-J, HAR-MSNR-J and SVM-HAR-MSNR-J models for the logarithmic series
This paper used the Polynomial kernel function (for the out-of-sample forecasts) and the Laplacian kernel function (for the in-sample forecasts) for the SVMs. The general form of the Polynomial kernel function is $k(x, x') = (\text{scale} \cdot \langle x, x' \rangle + \text{offset})^{\text{degree}}$ and the Laplacian kernel function is $k(x, x') = \exp(-\sigma \|x - x'\|)$. See, e.g., Smola and Schölkopf (1998) for further discussion.
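The two kernel functions named above can be written in Python as follows. The parameter names `degree`, `scale`, `offset` and `sigma` mirror the terms used in the text, while the default values in the signatures are illustrative assumptions and not the settings used for the reported results.

```python
import math


def polynomial_kernel(x, y, degree=1, scale=1.0, offset=1.0):
    """Return (scale * <x, y> + offset) ** degree."""
    inner = sum(a * b for a, b in zip(x, y))
    return (scale * inner + offset) ** degree


def laplacian_kernel(x, y, sigma=1.0):
    """Return exp(-sigma * ||x - y||) with the Euclidean norm."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)
```

Both functions are symmetric in their arguments, and the Laplacian kernel equals 1 when the two inputs coincide and decays towards 0 as they move apart.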
Panel-a, Panel-b, Panel-c, and Panel-d show the daily RV, log-RV, Jump and Significant Jump series for the 5-minute, 10-minute, 15-minute and optimally sampled intraday return data. The significant jumps have been calculated using a cutoff value α = 0.999.

Figure 2. Daily, Weekly and Monthly out-of-sample realized volatility forecasts from HAR-RV, SVM-HAR-RV, HAR-RV-J and SVM-HAR-RV-J models

Table 2a. The R²-values for HAR, SVM-HAR, HAR-J, SVM-HAR-J, HAR-MSNR-J and SVM-HAR-MSNR-J models
The sample period is 11 March 1996 to 30 September 2009, with a total of 3279 daily observations. The table reports the R² values calculated for the daily (h=1), weekly (h=5) and monthly (h=22) horizons. The out-of-sample R² value of the HAR-RV-J model at the 22-day-ahead horizon for the standard deviation of the RV series is negative. This implies that the HAR-RV-J model is not an appropriate model for those data.

Table 2b. The R²-values for HAR-CJ, SVM-HAR-CJ, HAR-MSNR-CJ and SVM-HAR-MSNR-CJ models for different horizons
The sample period is 11 March 1996 to 30 September 2009, with a total of 3279 daily observations. The table reports the R² values calculated for the daily (h=1), weekly (h=5) and monthly (h=22) horizons.

Table 3b. Forecasting Errors of HAR-CJ, SVM-HAR-CJ, HAR-MSNR-CJ and SVM-HAR-MSNR-CJ models for different horizons for the logarithmic series
The sample period is 11 March 1996 to 30 September 2009, with a total of 3279 daily observations. The table reports the forecasting errors calculated for the daily (h=1), weekly (h=5) and monthly (h=22) horizons.
Key: