Theory Survey of Stock Yield Prediction Models

According to their modeling theories, stock yield prediction models can be divided into two categories: traditional volatility prediction models based on statistical theory, and innovational prediction models based on theories such as neural networks (NN), grey theory and support vector machines (SVM). In this article, we introduce these two categories of models and the state of research on them, compare and analyze their characteristics, examine the problems that arise when stock yield prediction models are applied in China, and offer suggestions for future development.


Introduction
How to accurately describe and predict stock yield fluctuations has long been a hot topic in finance. Understanding the characteristics and direction of stock yield fluctuations is of great theoretical and practical importance for investors who need to measure, avoid and manage stock market risk. For this reason, many scholars have used various prediction models to analyze and predict stock yield fluctuations empirically, hoping to find useful insights and regularities. Although the domestic and foreign literature contains many models for predicting stock yield fluctuations, these models can be divided into two categories according to their modeling theories. One is the traditional volatility prediction model based on statistical theory, whose popular and representative examples are ARCH-type models and SV-type models; the other is the innovational prediction model based on theories such as NN, grey theory and SVM. The two categories behave differently when used to predict stock yield fluctuations, but so far no study has systematically compared them and identified the conditions and situations in which each applies. In this article, we compare and analyze the characteristics of the two categories, study the problems in applying stock yield prediction models in China, and try to offer useful references for research in this domain.

The stock yield prediction model based on statistical principle
The most widely applied and representative stock yield prediction models based on statistical principles are GARCH-type models and SV-type models.

Engle's ARCH model
In 1982, Engle first proposed the autoregressive conditional heteroskedasticity (ARCH) model, which can better capture the clustering and time-varying behavior of stock market volatility.
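According to Engle's definition, the ARCH(p) model can be expressed as
$$y_t = x_t'\beta + \varepsilon_t, \qquad \varepsilon_t = \sqrt{h_t}\,v_t, \qquad v_t \sim \text{i.i.d. } N(0,1)$$
$$h_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2, \qquad \alpha_0 > 0,\ \alpha_i \ge 0$$
The time series is determined by two equations, the conditional mean equation and the conditional variance equation. Here $x_t$ is a vector of exogenous variables or lagged endogenous variables, $\beta$ is the parameter vector of the mean equation, and $h_t$ is the conditional variance of $\varepsilon_t$.
For comparison, the basic stochastic volatility (SV) model can be written as
$$y_t = \sigma \exp(h_t/2)\,\varepsilon_t, \qquad h_t = \mu + \phi h_{t-1} + \eta_t$$
where $\varepsilon_t \sim \text{i.i.d. } N(0,1)$, $\eta_t \sim \text{i.i.d. } N(0,\sigma_\eta^2)$ and $\sigma_\eta^2$ is unknown; in this model $h_t$ denotes log-volatility. $\sigma$ and $\mu$ are constants, and $\phi$ is the persistence parameter, which reflects the influence of current volatility on future volatility, with $|\phi| < 1$. The process $h_t$ can also be extended to an ARMA process.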
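The way the ARCH variance equation $h_t = \alpha_0 + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2$ produces volatility clustering can be seen in a short simulation. This is a minimal sketch assuming a zero conditional mean and Gaussian $v_t$; the function name `simulate_arch` and the choice to start the pre-sample squared shocks at the unconditional variance are illustrative assumptions, not part of Engle's formulation.

```python
import numpy as np


def simulate_arch(alpha0, alphas, n, seed=0):
    """Simulate an ARCH(p) process with zero conditional mean.

    h_t   = alpha0 + sum_{i=1..p} alphas[i-1] * eps_{t-i}**2
    eps_t = sqrt(h_t) * v_t,  v_t ~ i.i.d. N(0, 1)
    Pre-sample squared shocks are set to the unconditional variance
    alpha0 / (1 - sum(alphas)) (an illustrative start-up assumption).
    """
    alphas = np.asarray(alphas, dtype=float)
    if alpha0 <= 0 or np.any(alphas < 0) or alphas.sum() >= 1:
        raise ValueError("need alpha0 > 0, alpha_i >= 0 and sum(alpha_i) < 1")
    rng = np.random.default_rng(seed)
    uncond_var = alpha0 / (1.0 - alphas.sum())
    recent_sq = np.full(len(alphas), uncond_var)  # eps_{t-1}^2, ..., eps_{t-p}^2
    eps = np.empty(n)
    h = np.empty(n)
    for t in range(n):
        h[t] = alpha0 + alphas @ recent_sq        # conditional variance equation
        eps[t] = np.sqrt(h[t]) * rng.standard_normal()
        recent_sq = np.roll(recent_sq, 1)          # age every lag by one period
        recent_sq[0] = eps[t] ** 2                 # newest squared shock goes first
    return eps, h


eps, h = simulate_arch(alpha0=0.1, alphas=[0.5], n=1000)
```

A large shock `eps[t]` raises `h[t + 1]`, so big moves tend to be followed by big moves, which is the clustering the ARCH model was designed to capture.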
As with the GARCH model, the SV model has many extensions. Kim, Shephard and Chib assumed that the error term $\varepsilon_t$ follows a t distribution rather than a normal distribution, which extends the basic SV model to the SV-t model. To capture the relationship between expected return and volatility in financial markets, Koopman and Hol Uspensky proposed the SV-M (SV-in-mean) model.
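The sketch below simulates the basic SV model, taken here as $y_t = \sigma\exp(h_t/2)\varepsilon_t$ with $h_t = \mu + \phi h_{t-1} + \eta_t$ (an assumed parameterization), together with the SV-t variant in which $\varepsilon_t$ is drawn from a Student t distribution. It makes visible the two independent random inputs, one driving returns and one driving log-volatility; that second stochastic process is what separates SV from GARCH-type models. The function and parameter names are hypothetical.

```python
import numpy as np


def simulate_sv(sigma, mu, phi, sigma_eta, n, df=None, seed=0):
    """Simulate y_t = sigma * exp(h_t / 2) * eps_t,  h_t = mu + phi * h_{t-1} + eta_t.

    eta_t ~ i.i.d. N(0, sigma_eta**2). eps_t is N(0, 1) for the basic SV
    model, or Student t with `df` degrees of freedom for the SV-t variant
    (not rescaled to unit variance in this sketch).
    """
    if abs(phi) >= 1:
        raise ValueError("stationarity of h_t requires |phi| < 1")
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    h_prev = mu / (1.0 - phi)            # start at the stationary mean of h_t
    for t in range(n):
        # volatility equation, driven by its own shock eta_t
        h_prev = mu + phi * h_prev + sigma_eta * rng.standard_normal()
        h[t] = h_prev
    if df is None:
        eps = rng.standard_normal(n)     # return shock, independent of eta_t
    else:
        eps = rng.standard_t(df, size=n)
    y = sigma * np.exp(h / 2.0) * eps
    return y, h


y_norm, h_norm = simulate_sv(sigma=1.0, mu=-0.05, phi=0.95, sigma_eta=0.2, n=1000)
y_svt, h_svt = simulate_sv(sigma=1.0, mu=-0.05, phi=0.95, sigma_eta=0.2, n=1000, df=5)
```

With the same seed both calls generate the same volatility path, so any difference between `y_norm` and `y_svt` comes only from the heavier-tailed return shocks.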
The main parameter estimation methods for the SV model are maximum likelihood, Markov chain Monte Carlo (MCMC) and the generalized method of moments. Because the SV model contains two stochastic processes, it is theoretically better than GARCH-type models. However, parameter estimation for the SV model is relatively difficult, so in practice the SV model is not as widely applied as GARCH-type models.

The grey prediction model (GM) based on grey theory
In 1982, the Chinese scholar Professor Deng Julong first proposed grey system theory, which has attracted wide attention worldwide. Grey prediction methods are now widely applied to short-term prediction in numerous industries.
The GM(1,1) model is a commonly used grey model consisting of a single first-order differential equation in one variable; it is a special case of the GM(1,n) model. To establish the GM(1,1) model, only one data sequence is needed:
$$x^{(0)} = \left[x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right]$$
Applying the first-order accumulated generating operation (1-AGO) gives
$$x^{(1)} = \left[x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right], \qquad x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$$
Because the sequence $x^{(1)}(k)$ follows an approximately exponential growth law, and the solution of a first-order linear differential equation has exactly this exponential form, we can assume that $x^{(1)}$ satisfies the first-order linear differential equation
$$\frac{dx^{(1)}}{dt} + a\,x^{(1)} = u$$
where $a$ is the development coefficient and $u$ is the grey endogenous control parameter.
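A compact GM(1,1) sketch in Python with NumPy, following the standard textbook steps: accumulate the data ($x^{(1)}$ is the running sum of $x^{(0)}$), form background values $z^{(1)}(k) = \tfrac{1}{2}[x^{(1)}(k) + x^{(1)}(k-1)]$, estimate $a$ and $u$ by least squares from $x^{(0)}(k) + a z^{(1)}(k) = u$, evaluate the time response function $\hat{x}^{(1)}(k+1) = (x^{(0)}(1) - u/a)e^{-ak} + u/a$, and difference the result to return to the original scale. The function name `gm11` and the four-observation minimum are assumptions of this sketch.

```python
import numpy as np


def gm11(x0, horizon=1):
    """Fit a GM(1,1) model to a short positive series and extrapolate.

    Returns (a, u, x0_hat): the development coefficient a, the grey control
    parameter u, and len(x0) + horizon fitted/forecast values on the
    original (non-accumulated) scale.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    if n < 4:  # assumed practical minimum for this sketch
        raise ValueError("GM(1,1) needs at least 4 observations here")
    x1 = np.cumsum(x0)                               # 1-AGO accumulated sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # background values z1(2..n)
    B = np.column_stack([-z1, np.ones(n - 1)])
    Y = x0[1:]
    (a, u), *_ = np.linalg.lstsq(B, Y, rcond=None)   # least squares: x0(k) = -a*z1(k) + u
    if a == 0:
        raise ValueError("a = 0: the time response function is undefined")
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a      # time response function
    x0_hat = np.concatenate([[x0[0]], np.diff(x1_hat)])     # inverse AGO restores the scale
    return a, u, x0_hat


a, u, x0_hat = gm11([1.0, 2.0, 4.0, 8.0, 16.0], horizon=1)
```

For this doubling series the least-squares step gives $a = -2/3$ and $u = 2/3$ (up to rounding), and the one-step forecast is $\hat{x}^{(0)}(6) = 2e^{8/3}(e^{2/3} - 1) \approx 27.3$, somewhat below the geometric value 32, because the trapezoidal background value only approximates the integral of $x^{(1)}$.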

The NN prediction model based on bionics
ANN theory is an emerging interdisciplinary field. Because ANNs can represent arbitrary nonlinear relationships and can learn, they offer new ideas and methods for solving many complex practical problems involving uncertainty and time variation. In the domain of prediction, Lapedes and Farber first applied NNs to prediction in 1987, which started the use of ANNs for prediction.
The learning ability of an ANN can be used to train the network on a large number of samples, adjusting its connection weights and thresholds; the trained model can then be used for prediction. An NN learns automatically from data samples without an elaborate specification process and approximates the function that best describes the regularities in the data, whatever the form of that function; the more complex the function, the more evident the advantage of the NN.
The basic idea of the back propagation (BP) training algorithm is to propagate the network error backward and adjust the connection weights and thresholds so that the error is minimized; its learning process consists of forward computation and error back propagation. With the BP algorithm, even a simple three-layer ANN can in principle approximate any complex nonlinear mapping from input to output.
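The two phases named above, forward computation and error back propagation, can be sketched for a three-layer network in NumPy. This is a minimal illustration under assumed choices (sigmoid hidden layer, linear output unit, mean-squared-error loss, full-batch gradient descent, and XOR as a toy nonlinear mapping); it is not a stock prediction model, and all names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, y, hidden=8, lr=0.2, epochs=10000, seed=0):
    """Train an input-hidden-output network with back propagation.

    Returns the trained parameters and the mean-squared error per epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))  # input-to-hidden weights
    b1 = np.zeros(hidden)                             # hidden thresholds (biases)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))            # hidden-to-output weights
    b2 = np.zeros(1)                                  # output threshold
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward computation
        H = sigmoid(X @ W1 + b1)
        out = H @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Error back propagation: gradients of the MSE layer by layer
        d_out = 2.0 * err / len(X)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)        # uses W2 before it is updated
        # Adjust weights and thresholds against the gradient
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(X @ W1 + b1) @ W2 + b2).ravel()


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)   # XOR: not linearly separable
params, losses = train_bp(X, y)
```

XOR is used because no single-layer linear model can fit it, so it exercises the hidden layer; the learning rate and hidden size are arbitrary settings for this toy case.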

The prediction model based on SVM
Based on statistical learning theory, Vapnik et al. (1995, 1998) proposed the support vector machine (SVM). By minimizing the structural risk, the method minimizes both the empirical risk and the confidence interval, and it can obtain good statistical rules even when the sample is small. Training an SVM is a convex optimization problem, so any local optimal solution is also the global optimal solution. The SVM maps data that are not linearly separable in the original space into a high-dimensional feature space where they become linearly separable, and it evaluates the required inner products in the original space through a kernel function. This neatly avoids the curse of dimensionality, and the complexity of the algorithm does not depend on the dimension of the feature space. However, selecting the kernel function is a difficult problem in SVM methods.
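The idea of working in a high-dimensional feature space through a kernel function can be checked on a toy case. The sketch assumes the homogeneous polynomial kernel $K(x,z) = (x \cdot z)^2$, chosen only because its feature map is small and explicit (common SVM kernels such as the Gaussian RBF correspond to infinite-dimensional feature spaces). It verifies that the kernel equals an inner product of mapped vectors, and that four XOR-style points, which no straight line separates in the plane, become linearly separable after mapping.

```python
import numpy as np


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map phi with phi(x) . phi(z) == (x . z)^2.

    For x in R^d it returns all d*d products x_i * x_j.
    """
    x = np.asarray(x, dtype=float)
    return np.outer(x, x).ravel()


# XOR-style data: label +1 on one diagonal, -1 on the other.
points = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
labels = np.array([1, 1, -1, -1])

# 1) The kernel gives the feature-space inner product without building phi.
for p in points:
    for q in points:
        assert np.isclose(poly2_kernel(p, q), poly2_features(p) @ poly2_features(q))

# 2) In feature space, a hyperplane that weights only the cross terms x1*x2
#    separates the classes, although no line does so in the original plane.
w = np.array([0.0, 0.5, 0.5, 0.0])   # weights on (x1^2, x1*x2, x2*x1, x2^2)
scores = np.array([w @ poly2_features(p) for p in points])
print(np.sign(scores))
```

An SVM never forms `poly2_features` explicitly; it only calls the kernel, which is why the cost does not grow with the feature-space dimension.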

GARCH type models
Foreign researchers have used GARCH-type models in a large number of studies, which indicate that the GARCH model and its extensions describe the volatility of financial time series very well. French, Schwert and Stambaugh (1987) used the GARCH model to estimate the relationship between expected returns and volatility in the US stock market, and found that expected returns were positively correlated with predicted volatility. In the same year, Engle et al. used the GARCH-M model and found that the conditional variance could better explain the variation in expected returns on the S&P 500 index. In 1991, Bollerslev and Engle also found that the risk premium was positively correlated with volatility. The leverage effect in stock price volatility was confirmed repeatedly by Nelson (1991), Glosten, Jagannathan and Runkle (1993), Engle and Ng (1993), and Fornari and Mele (1997). In addition, many empirical applications, such as the studies of Campbell and Hentschel (1991), Engle and Ng (1991) and Pagan and Schwert (1990), also showed that GARCH models provide good data fitting and prediction.
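The leverage effect cited above is often modelled with the GJR specification of Glosten, Jagannathan and Runkle, in which negative shocks carry an extra coefficient $\gamma$: $h_t = \omega + (\alpha + \gamma I_{t-1})\varepsilon_{t-1}^2 + \beta h_{t-1}$, with $I_{t-1} = 1$ when $\varepsilon_{t-1} < 0$ and 0 otherwise. The sketch evaluates this one-step update; the function name is hypothetical and the parameter values are made up for illustration, not estimates from any study cited here.

```python
def gjr_next_variance(eps_prev, h_prev, omega, alpha, gamma, beta):
    """One-step GJR-GARCH(1,1) conditional variance.

    h_t = omega + (alpha + gamma * I[eps_{t-1} < 0]) * eps_{t-1}**2 + beta * h_{t-1}
    A positive gamma makes bad news raise volatility more than good news.
    """
    bad_news = 1.0 if eps_prev < 0 else 0.0
    return omega + (alpha + gamma * bad_news) * eps_prev ** 2 + beta * h_prev


# Illustrative (not estimated) parameters; equal-sized shocks of opposite sign.
params = dict(omega=0.05, alpha=0.05, gamma=0.10, beta=0.85)
h_after_good = gjr_next_variance(+1.0, 1.0, **params)
h_after_bad = gjr_next_variance(-1.0, 1.0, **params)
```

With these values a unit positive shock gives $h_t = 0.95$ and a unit negative shock gives $h_t = 1.05$; setting $\gamma = 0$ recovers the symmetric GARCH(1,1) response.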

SV type models
Foreign scholars have produced many studies of the SV model; a representative example is the survey by Ghysels, Harvey and Renault (1995). Overall, comparing the results of foreign and domestic scholars, we find that foreign scholars usually obtain better results when fitting or predicting with GARCH-type or SV-type models, because they use data from the US or other Western markets, whereas domestic scholars usually apply the same methods to data from the domestic stock market.

Grey model
The grey prediction model proposed by the Chinese scholar Professor Deng Julong has been applied to short-term prediction in many domains, such as grain yield prediction and power load prediction, but few foreign studies have used the grey model for short-term prediction of stock market fluctuations. In China, many researchers have applied the grey model to stock prediction.
Chen Haiming and Duan Jindong (2002) combined the GM(1,1) model with a Markov model to build a Grey-Markov prediction model for the Shanghai Composite Index, and found that the Grey-Markov model was more precise than GM(1,1). Shi Jiuyu and Hu Jingpeng (2004) used the GM(1,1) model to build a model predicting the peaks of the 65-day moving-average trajectory of the Shanghai Composite Index, and the predicted values were very consistent with market values. Tan Siqian (2006) applied the GM(1,1) model to short-term stock price prediction and compared it with the ARIMA model; the results indicated that the grey model was more precise than the ARIMA model.

NN prediction model
Matsuba (1991) first introduced NNs into stock price prediction, after which quite a few researchers used NN models to predict the stock market. Hill et al. (1996) compared NNs with six traditional statistical prediction methods on 111 time series; the results indicated that with short-term (monthly or quarterly) data, NNs performed significantly better than traditional statistical models, whereas with long-term (yearly) data the prediction results were almost the same.
Many Chinese scholars have used NNs to predict the stock market. Li Minqiang and Meng Xiangze (1997) used a genetic algorithm based on NNs to study stock market investment strategies. Taking into account the characteristics of the domestic stock market, Wu Wei et al. (2001) exploited the strong classification ability of the multilayer feedforward BP network to predict stock fluctuations. Wu Chengdong and Wang Changtao (2002) used an ANN NP network to predict the stock market. Liu Yongfu, Wu Haihua et al. (2003) established a BPNN model to predict the Shanghai Composite Index and found that the model converged quickly, learned well, predicted with high precision and a low error rate, and was very effective for short-term prediction of the stock index. Shang Junsong (2004), Long Jiancheng (2005), Hujing (2007) and other scholars also used NN models to study stock market prediction. A large body of empirical research by Chinese scholars indicates that applying ANNs to predict the Chinese stock market is feasible and effective.

SVM prediction model
H. Nakayama et al. (2003) introduced incremental learning and data-discarding methods into SVM and applied them to stock price prediction. W. Huang et al. (2005) used SVM to predict the direction of the stock market. P. Pai et al. (2005) combined the ARIMA (autoregressive integrated moving average) model with the SVM model into a hybrid model for stock price prediction, and the results showed that the hybrid model outperformed either ARIMA or SVM alone.
Among domestic studies, Yang Yiwen and Yang Chaojun (2005) used SVM to make accurate multiple-step predictions of the trend of the Shanghai Composite Index. Li Lihui et al. (2005) applied SVM to predict the Shanghai 180 Index. Zhou Wanlong, Ma Fayao, Peng Lifang et al. (2006) used SVM for short-term stock prediction, and the experimental results of Zhao Jinjing (2007) showed that the method was more precise than NN and time series methods.
Overall, Chinese scholars have published more studies that predict the stock market with innovational prediction models than with traditional prediction models. In terms of prediction effect, innovational prediction models tend to be more precise than traditional statistical prediction models. Among the three innovational models, the NN model is the most widely applied, SVM gives the most precise predictions, and short-term prediction with the grey model is the simplest method.

Different theoretical bases of modeling
Traditional stock yield fluctuation prediction models are based on statistical analysis theory. However, handling stochastic processes with probability and statistics requires large samples and more complete and reliable original data. In practice, even with a large sample we may not find a regularity, and even when a statistical regularity is found it may not be representative. Innovational prediction models depart from traditional statistical theory and build prediction models on different modeling ideas. For example, the grey model is based on grey theory: it accumulates the historical data so that, in line with the law of generalized energy change, they follow an exponential pattern, and then builds the model. The ANN model is based on ANN theory and predicts by simulating the structure, information processing and functions of the human brain's neural system. SVM is based on statistical learning theory; through convex optimization it ensures that any local optimal solution is the global optimal solution, which overcomes NN shortcomings such as slow convergence and local minima.

Different data requirements and processing
Prediction models based on statistical principles require large samples with a good distribution; for both GARCH-type and SV-type models, as long as the sample is large enough and well distributed, the prediction effect will be ideal. For example, the GARCH model predicts the US stock index well. By contrast, the Chinese stock market has a relatively short history, and macro-control and the stock reform changed the stock index radically, so the distributional regularity of the data is not obvious, which limits the use of this type of model. Innovational prediction models have lower requirements on sample size and data distribution; for example, the grey model can predict the next value from only 7 or 8 data points. In terms of processing, the grey model accumulates the original data so that disorderly, unsystematic data show an obvious exponential pattern, and after modeling and computation it restores the results to the original scale by the inverse accumulation. The NN model is data-driven and treats the system as a black box; it requires no prior information and, by adjusting its own structure, can extract data features and predict the future effectively even when the available information is incomplete and imprecise.

Different stabilities and adaptabilities of model structure
Once a prediction model based on statistical principles is built, its structure is highly stable, and a stable internal relationship exists among the model variables. Both the GARCH model and the SV model have relatively stable and simple structures, and both are single-factor models. In practice, however, the prediction environment is complex and changeable, and when new relationships arise among system variables, such a model cannot adjust to the change. In innovational prediction models, the single factor, the multiple factors or the structure itself can change; the computation is relatively complex, but the adaptability is better than that of models based on statistical principles. For example, besides the basic GM(1,1) model, grey theory can handle high-order systems through groups of GM(1,1) models and can take the influence of multiple factors into account. NN and SVM are variable-structure models: they adjust their internal structures to adapt to changes in the system variables by learning from new samples. For nonlinear, high-dimensional and high-order problems, NN and SVM perform better.

Different prediction precision and extrapolation
Comparatively speaking, prediction models based on statistical principles have larger errors and poorer extrapolation, because this type of model lacks a process for reprocessing or relearning from the data samples, and the poor fit to the sample leads to poor extrapolation. Innovational prediction models have higher precision and stronger extrapolation because they do reprocess or relearn from the data: the grey model accumulates the data, while NN and SVM models first learn from the data and then perform inference and optimization. Therefore, the fitting degree and extrapolation ability of innovational prediction models are higher than those of statistical models.

Different prediction difficulties and prediction time length
The techniques of prediction models based on statistical principles are mature, and the prediction process is relatively simple. For both GARCH-type and SV-type models, the theoretical basis is well established, the model structure is relatively simple and the computation is relatively easy. Because this type of model uses long historical data, it can be used for long-term prediction. By contrast, the techniques of innovational prediction models still need improvement, and prediction with them is more difficult. For example, predicting stock yield fluctuations with an NN is relatively difficult because the hidden layers and the weights must be set, and these directly determine the soundness and accuracy of the prediction. Prediction with SVM comes down to choosing the kernel function, which is very difficult in practice. Because innovational prediction models need only small samples, they are suited to short-term prediction.

Research results on applying stock yield prediction models in China are very limited, the research is difficult, and prediction precision needs further improvement
According to a CJFD search of journal articles published from 1998 to 2007, four articles used GARCH-type models to predict the stock market and seven used SV-type models, 11 in total. Five articles used the grey model, 67 used NNs and nine used SVMs, 81 in total.
Adding the two sums gives 92, so fewer than 100 relevant articles appeared in ten years. This figure points to at least two problems. First, research in this domain in China has not yet been developed intensively: researchers and results are very limited, and existing research is only at an early stage. Second, predicting stock yield fluctuations is very difficult and prediction precision rarely reaches the level researchers expect, so a considerable share of research results may never have been published because the predictions failed.
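The tallies above can be reproduced directly; the snippet only re-adds the CJFD figures reported in the text and introduces no new data.

```python
# Article counts per model family, CJFD journals 1998-2007 (figures from the text).
statistical = {"GARCH-type": 4, "SV-type": 7}
innovational = {"grey model": 5, "NN": 67, "SVM": 9}

n_statistical = sum(statistical.values())
n_innovational = sum(innovational.values())
total = n_statistical + n_innovational
nn_share = innovational["NN"] / total     # share of all articles using NNs

print(n_statistical, n_innovational, total, f"{nn_share:.0%}")
```

This gives 11 statistical-model articles, 81 innovational-model articles and 92 in total, with NN studies alone accounting for about 73% of the total.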

The prediction effect of innovational prediction models for the Chinese stock market is better than that of models based on statistical principles
According to the articles published in China, innovational models predict better than traditional statistical prediction models. The main reason is that prediction models based on statistical principles place strict requirements on the original data: only when the data are well distributed, complete and plentiful will the fitting and prediction be ideal. However, the Shanghai Stock Exchange was established only in 1990, giving a history of just 18 years. In these 18 years, the Chinese stock market has undergone drastic changes several times because of policy, regulation and the stock reform, and the number of listed companies is limited and keeps changing, so the original data are not ideal in terms of distribution, sample size and completeness. Consequently, in China, models based on statistical principles do not predict stock yield fluctuations well. Innovational prediction models do not place the same demands on the data as statistical models, and most of them need only small samples for short-term prediction, so at present they predict better than the statistical models.

Most Chinese stock yield fluctuation prediction models adopt a single prediction model
In the existing research literature, models predicting Chinese stock yield fluctuations all adopt a single prediction model, whether they are based on statistical principles or are innovational models based on non-statistical principles. A single model has the advantages of a relatively simple structure, few influencing factors and relatively low prediction difficulty, but its disadvantages are also obvious. A single model cannot include all influencing factors: any single model uses only part of the useful information and discards the rest. Since the factors influencing the stock market are numerous, a single model cannot capture enough information and therefore cannot predict future trends effectively.

Technologies and methods for effectively handling non-quantitative factors are lacking
The factors influencing stock yield fluctuations are very complex. Besides quantitative factors such as stock prices, turnover and stock indices, fluctuations are also influenced by non-quantitative factors such as policy, investor psychology and international emergencies. At present, most Chinese stock yield prediction models are purely quantitative and compute predictions from quantitative information only. Non-quantitative information, which strongly influences stock yield fluctuations, cannot be added to these models because of the limitations of the models and processing techniques, so the predictions often fail to match the actual situation.

The innovational intelligent prediction model will be one development direction of Chinese stock yield prediction
First, innovational prediction models can overcome many problems of the Chinese stock market, such as incomplete data, large fluctuations and irregular distributions; they use small samples for short-term prediction, and their prediction precision is relatively higher than that of traditional statistical prediction models. Second, intelligent models among them, such as NN and SVM, can simulate or partly simulate human intelligence and apply complex, nonlinear, variable-structure processing to the various factors influencing the stock market. They overcome both the disadvantage that a single-factor model cannot contain sufficient information and the disadvantage that a fixed-structure model cannot handle emergencies, so they can fully reflect the various kinds of information and changes influencing the stock market and increase prediction accuracy.

The integrated prediction model will be another development direction of Chinese stock yield prediction
Each prediction model has its own advantages and limitations in acquiring predictive information. How to exploit the advantages of different prediction models, overcome their limitations and improve prediction accuracy is an urgent problem for prediction theory and technology. Integrated prediction assigns weights to the predictions of different models according to certain principles and takes their weighted average as the final prediction. This method overcomes the insufficient information of a single prediction model, fully exploits the advantages of different models, acquires information from different aspects to the greatest extent, and improves the level of stock yield prediction.
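The weighted-average combination described above can be sketched as follows. The text only says the weights follow "certain principles"; this sketch assumes one common rule, weights inversely proportional to each model's historical mean squared error, and the function name and inputs are hypothetical.

```python
import numpy as np


def combine_forecasts(forecasts, past_errors):
    """Combine several models' forecasts by a weighted average.

    forecasts:   one forecast per model, shape (m,)
    past_errors: each model's historical forecast errors, shape (m, T)
    Weights are proportional to 1 / MSE of past errors (an assumed rule)
    and sum to 1, so historically more accurate models get more weight.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    mse = np.mean(np.asarray(past_errors, dtype=float) ** 2, axis=1)
    if np.any(mse <= 0):
        raise ValueError("each model needs a positive historical MSE")
    inv = 1.0 / mse
    weights = inv / inv.sum()
    return float(weights @ forecasts), weights


# Hypothetical example: model A (past errors +/-1) and model B (past errors +/-2).
combined, weights = combine_forecasts([1.0, 2.0], [[1.0, -1.0], [2.0, -2.0]])
```

Here model A's MSE is 1 and model B's is 4, so the weights are 0.8 and 0.2 and the combined forecast is 1.2. Equal weights (a simple average) are another common choice when error histories are short.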

The prediction model with various kinds of non-quantitative information will be one important development direction of Chinese stock yield prediction
At present, stock yield prediction models are quantitative prediction models into which non-quantitative factors cannot be added, so the predictions lose a large amount of non-quantitative information and their accuracy suffers greatly. An important development direction for stock yield prediction models is therefore to translate the various kinds of non-quantitative information influencing the stock market into quantitative information, add it to the prediction model, fully reflect the influence of non-quantitative factors such as policy, psychology and emergencies, and thereby improve prediction accuracy.

The continual absorption and application of new prediction theories and technologies will be an inevitable trend for Chinese stock yield prediction models
The development of stock yield prediction models has been a process of continually introducing, absorbing, assimilating, applying and perfecting new prediction theories and technologies. GARCH models, SV models, grey models and NN models, together with their underlying theories and techniques, have all gone through this process. New prediction theories and technologies keep offering new perspectives and more complete techniques for predicting stock yield fluctuations, making the predictions of these models increasingly accurate. At present, chaos theory and wavelet techniques have been preliminarily applied to predicting stock yield fluctuations, and as related research develops, more new ideas will be introduced into this field.
Many domestic scholars have studied and applied GARCH-type models. Wei Weixian and Zhou Xiaoming (1999) used the GARCH model to predict volatility in the Chinese stock market. Dinghua (1999), Zhou Zhefang and Li Zinai (2000), Tang Qiming and Chenjian (2001) and others analyzed the ARCH effect in the Chinese stock market and fitted the volatility of the stock index. Yue Chaolong (2001), Chen Qianli (2002) and Zhou Shaofu (2002) used GARCH-type models to study empirically the volatility clustering and asymmetry of stock returns in China. Zhang Shiying and Keke (2002) systematically reviewed the ARCH model family. Li Shengli (2002) and Lou Yingjun (2003) discussed the leverage effect in the Chinese stock market. Zhang Yongdong and Bi Qiuxiang (2003) empirically compared volatility prediction models for the Shanghai stock market.
Ghysels, Harvey and Renault (1995) summarized in detail the literature on the origins, estimation methods, model extensions and volatility persistence of stochastic volatility models of capital markets. In the work of So, Lam, Li and Smith, a Markov regime-switching mechanism was introduced into the SV model, yielding the Markov regime-switching SV (MRS-SV) model. Kalimipalli and Susmel proposed a different MRS-SV model to describe the level and volatility of short-term interest rates. Jun Yu (2002) used the basic SV model to predict the New Zealand stock market and found that it had good predictive ability. G.B. Durham (2007) used the SV-mix model to predict the S&P 500 index, with good results. In China, Baihuang and Zhang Shiying (2001) used an extended SV model to study volatility in the Shenzhen stock market empirically and pointed out that the extended SV model described financial volatility better than the basic SV model. Yang Kelei, Mao Minglai et al. (2004) used the SV model to analyze the volatility of the Shanghai and Shenzhen indices, and the results indicated that the Shanghai market was riskier than the Shenzhen market. Wang Chunfeng (2005) compared an SV model with a normal conditional distribution and an SV model with a heavy-tailed conditional distribution, and the empirical results indicated that the heavy-tailed version better described volatility in the Chinese stock market. Qian Haoyun (2006) fitted the SV and A-SV models to the daily returns of the Shanghai Composite Index, the Shenzhen Composite Index and the Hong Kong Hang Seng Index, and the results showed that A-SV outperformed SV. Wu Qiquan et al. (2006) used the SV model to study policy effects in the Chinese stock market. Mao Minglai et al. (2006) comprehensively discussed SV-type models. Zhouyan and Zhang Shiying (2007) used Markov chain Monte Carlo simulation to estimate a continuous-time SV model, applied it to the daily composite index of the Shanghai stock market, and confirmed the validity of the model and the method.
In 1986, Bollerslev proposed the generalized ARCH (GARCH) model. GARCH includes not only lagged values of the squared disturbance but also lagged values of its conditional variance. Because the conditional variance depends on its own past values, the GARCH model can describe highly persistent volatility in financial time series with only a few parameters. In fact, a GARCH process is equivalent to an ARCH process of infinite order, so the GARCH model describes high-order ARCH processes more conveniently and remedies this deficiency of the ARCH model. Since then the GARCH model has been continually extended and refined, and the ARCH-type model family now includes the GARCH-M model and other extensions. Taylor proposed the stochastic volatility (SV) model. The SV model arose directly from the diffusion processes used in pricing financial assets: it holds that both the price of the underlying asset and its volatility can be described by Wiener processes, and it is a financial volatility model with very good application prospects.
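The statement that a GARCH process is an ARCH process of infinite order can be checked numerically for GARCH(1,1), $h_t = \omega + \alpha\varepsilon_{t-1}^2 + \beta h_{t-1}$. Unrolling the recursion from a starting value $h_0$ gives $h_t = \omega\sum_{j=0}^{t-1}\beta^j + \alpha\sum_{j=1}^{t}\beta^{j-1}\varepsilon_{t-j}^2 + \beta^t h_0$, an ARCH-type equation in all past squared shocks. The sketch computes both forms on simulated shocks; the function names and parameter values are illustrative assumptions.

```python
import numpy as np


def garch11_recursive(eps, omega, alpha, beta, h0):
    """GARCH(1,1) recursion: returns h_1..h_n from shocks eps_0..eps_{n-1}."""
    h = np.empty(len(eps))
    h_prev = h0
    for t, e in enumerate(eps):
        h_prev = omega + alpha * e ** 2 + beta * h_prev
        h[t] = h_prev
    return h


def garch11_as_arch(eps, omega, alpha, beta, h0):
    """Same variances written as an ARCH equation in every past squared shock."""
    eps = np.asarray(eps, dtype=float)
    h = np.empty(len(eps))
    for t in range(1, len(eps) + 1):
        past_sq = eps[:t][::-1] ** 2                # eps_{t-1}^2, ..., eps_0^2
        weights = alpha * beta ** np.arange(t)      # alpha * beta^(j-1), j = 1..t
        h[t - 1] = (omega * np.sum(beta ** np.arange(t))
                    + weights @ past_sq
                    + beta ** t * h0)
    return h


rng = np.random.default_rng(1)
shocks = rng.standard_normal(200)
h_rec = garch11_recursive(shocks, omega=0.05, alpha=0.08, beta=0.9, h0=1.0)
h_arch = garch11_as_arch(shocks, omega=0.05, alpha=0.08, beta=0.9, h0=1.0)
print(np.max(np.abs(h_rec - h_arch)))
```

The two arrays agree to floating-point precision. The weight on a shock $j$ periods back is $\alpha\beta^{j-1}$, so old shocks fade geometrically rather than being cut off at a fixed lag $p$, which is what lets two GARCH parameters stand in for a long ARCH lag structure.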