Design of a Nonlinear System Predictor Driven by a Bayesian-Gaussian Neural Network with Sliding Window Data

Model identification of nonlinear systems has long been a concern of the industrial community. The input-output relationship of a nonlinear dynamic system is contained in the data accumulated on site. To make better use of such data from industrial objects, this article proposes a nonlinear system predictor driven by a Bayesian-Gaussian neural network (NN) model, which uses the trained threshold matrix and sliding window data to predict the output of the nonlinear dynamic system online. The simulation experiment indicates that the Bayesian-Gaussian NN based on sliding window data can meet the demands of adaptive online identification and prediction of nonlinear systems.


Introduction
Most industrial control objects are nonlinear, with time-varying behavior, time lag and saturation, so an exact input-output model of the controlled system cannot be established. Classical control methods, however, are designed on the basis of an exact system model; because such a model is difficult to establish, many control designs run into contradictions. On the other hand, a running system produces a large amount of input and output data. These data are external representations of the nonlinear structural characteristics of the system and can help us establish its structural model.
Determining the input-output structure model of a nonlinear dynamic system is essentially an identification problem. It consists of an identification model with suitable parameters and a performance function that adjusts those parameters by minimizing the error between the output of the unknown system and the model output (Zhang, 2000, P.566-568). The NN model is an effective function approximation tool and has been applied to nonlinear system identification (Li, 2001, P.499-502, Zeng, 2009, P.2293-2300, Yan, 2007, P.232-236). In theory, a three-layer feedforward NN with enough hidden nodes can approximate any continuous nonlinear function, but the size of the hidden layer has to be chosen mainly by experiment and experience, and when the network has too many weights, the weight adjustment process tends to fall into a local minimum. Moreover, when the structure of the nonlinear dynamic system changes, the trained NN model generally no longer fits the system, so the NN must be retrained; a pure feedforward NN is therefore not suited to time-varying identification and prediction of a dynamic system. However, during the response of the nonlinear dynamic system, the large amount of input and output data describes the structural characteristics of the system from the outside. From the viewpoint of probability theory, the structural characteristics of the dynamic system should be contained in the relationships among these data. Based on Bayesian inference and the Gaussian hypothesis, this article proposes a Bayesian-Gaussian NN reasoning model based on sliding window data, which integrates the sliding window data into the structure of the reasoning model. Once the threshold matrix parameters matching the nonlinear system have been determined, the historical data in the sliding window can be used to predict the present system output, and when the structure of the system changes, the model can track and identify the system output online.

Description of nonlinear dynamic system
Consider the discrete-time nonlinear dynamic system shown in Figure 1. Suppose the system is stable and its nonlinear input-output relationship is

$$y(k) = f\big(y(k-1), \ldots, y(k-n), u(k), u(k-1), \ldots, u(k-m)\big) \qquad (1)$$

where $y(k)$ denotes the system output at the k'th step, $y(k-i)$ (i = 1, 2, …, n) denotes the system outputs of the previous n steps, $u(k)$ denotes the system input at the k'th step, $u(k-i)$ (i = 1, 2, …, m) denotes the control inputs of the previous m steps, and $f$ denotes the dynamic relationship between the input and output of the system. This nonlinear function can be approximated by an identification method. The target of this article is to use the Bayesian-Gaussian NN model based on the sliding data window to approximate the structure of the nonlinear function $f$ and to realize online identification and prediction of the dynamic system.
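As an illustration of how the arguments of f can be collected at step k, the following minimal sketch builds the regression vector from recorded output and input sequences. The function name `build_regressor` and the ordering (past outputs first, then the present and past inputs) are assumptions made here for illustration.

```python
import numpy as np

def build_regressor(y, u, k, n, m):
    """Collect [y(k-1), ..., y(k-n), u(k), u(k-1), ..., u(k-m)] at step k.

    The ordering of the entries is an assumption of this sketch.
    """
    if k < max(n, m):
        raise ValueError("k must be at least max(n, m)")
    past_outputs = [y[k - i] for i in range(1, n + 1)]   # n previous outputs
    inputs = [u[k - i] for i in range(0, m + 1)]         # present input and m previous inputs
    return np.array(past_outputs + inputs, dtype=float)

# Hypothetical recorded sequences, indexed by the step number.
y = [0.0, 0.1, 0.2, 0.3, 0.4]
u = [1.0, 2.0, 3.0, 4.0, 5.0]
X = build_regressor(y, u, k=4, n=2, m=1)
print(X)  # vector with n + m + 1 = 4 entries
```

The resulting vector has n + m + 1 entries, one for each argument of f.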

Bayesian-Gaussian NN based on sliding window data
Suppose the input vector of the nonlinear system identification model is denoted as

$$X_k = [y(k-1), \ldots, y(k-n), u(k), u(k-1), \ldots, u(k-m)]^T \qquad (2)$$

where $X_k$ denotes the system input at the k'th sampling instant; it is a column vector with n+m+1 rows. The output is a real number, and the input-output relationship of the system can be denoted as

$$y(k) = f(X_k) \qquad (3)$$

Based on historical input and output data, and using Bayesian inference and the Gaussian hypothesis, the Bayesian-Gaussian model produces the prediction $\hat{y}(k)$ of the system output $y(k)$, where the hat "^" denotes the identification output of the model.

Suppose $(X_i, y_i)$ (i = 1, 2, …, N) is the training sample set, where $X_i$ is the sampled input of the i'th step, a column vector with m rows, $X_i = (X_{i1}, X_{i2}, \ldots, X_{im})^T$, and $y_i$ denotes the corresponding system output. Based on Bayesian inference and the Gaussian hypothesis, the output Y for a new input X can be generated probabilistically.
Bayes' theorem is given by (6). Substituting (4) and (5) into (6) and simplifying, we obtain (7), where $c_1$ is a normalization parameter and the mean and variance parameters are given by (8) and (9).
When N historical data samples $(X_i, y_i)$ are known, what is the probability that X produces Y?
Suppose the conditional densities $p(y_i \mid Y)$ (i = 1, 2, …, N) are mutually independent for a given Y. The conditional probability that the N data samples generate the output Y for the new input X is then given by (10), where K is a normalization constant independent of Y. Ye's article (Haiwen Ye, 1999, P.21-36) gives the proof in detail.
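For orientation, the single-sample step can be read as the standard product of two Gaussian densities. The sketch assumes a normal prior on Y with mean $\mu_0$ and variance $\sigma_0^2$ (the symbol $\mu_0$ is introduced here only for illustration) and a sample $y_i$ that is normal around Y with variance $\sigma_i^2$. It is a textbook identity under these assumptions and is not meant to reproduce the article's numbered formulas term by term.

```latex
\begin{aligned}
p(Y \mid y_i) &\propto p(y_i \mid Y)\, p(Y)
  \propto \exp\!\left(-\frac{(y_i - Y)^2}{2\sigma_i^2}
                      -\frac{(Y - \mu_0)^2}{2\sigma_0^2}\right)
  \propto \exp\!\left(-\frac{(Y - \mu)^2}{2\sigma^2}\right), \\
\frac{1}{\sigma^2} &= \frac{1}{\sigma_0^2} + \frac{1}{\sigma_i^2},
\qquad
\mu = \sigma^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{y_i}{\sigma_i^2}\right).
\end{aligned}
```

With N conditionally independent samples the same argument gives $1/\sigma_N^2 = 1/\sigma_0^2 + \sum_{i=1}^{N} 1/\sigma_i^2$ and $\mu_N = \sigma_N^2\big(\mu_0/\sigma_0^2 + \sum_{i=1}^{N} y_i/\sigma_i^2\big)$. When $\sigma_0^2$ is very large, the prior terms vanish and the mean becomes a precision-weighted average of the sample outputs.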

Bayesian-Gaussian reasoning model
Substituting (7) into (10), we obtain the posterior density, in which $c_2$ is a normalization constant independent of Y. Because the prior distribution is approximately constant, the prior variance $\sigma_0^2$ is large, and (8) and (9) can each be approximated accordingly. In the resulting formula, $c_4$ is a normalization constant independent of Y, and the estimated mean is obtained. Suppose the variance fulfills (15), in which D is called the threshold matrix. Formulas (13), (14) and (15) therefore compose the Bayesian-Gaussian reasoning model. The parameters of the whole model are mainly the threshold matrix D and the initial estimation variance $\sigma_0^2$, and the dimension of the threshold matrix equals the number of inputs of the nonlinear dynamic system. Few parameters therefore have to be determined for the network, which greatly reduces the computation time of the reasoning model.
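A minimal sketch of such a reasoning model is given here under stated assumptions. The variance of each stored sample is assumed to grow as $\sigma_0^2 \exp((X - X_i)^T D (X - X_i))$, and the prediction is assumed to be the precision-weighted mean of the stored outputs. These forms and the name `bg_predict` are illustrative choices; the exact expressions should be taken from formulas (13) to (15).

```python
import numpy as np

def bg_predict(X, samples_X, samples_y, D, sigma0_sq=1.0):
    """Predict (mean, variance) at X from N stored samples.

    Assumed forms (illustrative, not confirmed by the text):
      sigma_i^2 = sigma0_sq * exp((X - X_i)^T D (X - X_i))
      y_hat     = sum_i(y_i / sigma_i^2) / sum_i(1 / sigma_i^2)
      sigma_N^2 = 1 / sum_i(1 / sigma_i^2)
    """
    X = np.asarray(X, dtype=float)
    samples_X = np.asarray(samples_X, dtype=float)   # shape (N, m)
    samples_y = np.asarray(samples_y, dtype=float)   # shape (N,)
    D = np.asarray(D, dtype=float)                   # shape (m, m)
    diff = samples_X - X
    # Quadratic form (X - X_i)^T D (X - X_i) for every sample i.
    dist = np.einsum("ij,jk,ik->i", diff, D, diff)
    # Shift by the smallest distance so the exponentials cannot all underflow.
    weights = np.exp(-(dist - dist.min()))
    y_hat = float(weights @ samples_y / weights.sum())
    var_hat = float(sigma0_sq * np.exp(dist.min()) / weights.sum())
    return y_hat, var_hat

# Hypothetical one-dimensional samples lying on the line y = x.
samples_X = [[0.0], [1.0], [2.0]]
samples_y = [0.0, 1.0, 2.0]
D = [[4.0]]
y_hat, var_hat = bg_predict([1.0], samples_X, samples_y, D)
print(y_hat, var_hat)
```

Under these assumptions $\sigma_0^2$ cancels in the predicted mean and only scales the predicted variance, so the threshold matrix D is the parameter that shapes the prediction.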

Bayesian-Gaussian NN
Based on the above Bayesian-Gaussian reasoning model, we obtain the Bayesian-Gaussian NN shown in Figure 2. It uses the same kind of neuron nodes (shown in Figure 3) as a general NN, and the network has five layers.
The first layer:
The second layer: Stores N groups of historical input data samples, each group containing m input variables. An input-output relationship is defined for the j'th node in the i'th group. The superscript "[2]" denotes the second layer of the Bayesian-Gaussian NN, and the third and fourth layers are correspondingly denoted by "[3]" and "[4]". The threshold matrix parameter is contained in the activation function of the second layer. The experiments show that the N groups of historical input data samples are very important to the prediction of the system. To reduce the computation of the Bayesian-Gaussian NN and to follow the dynamic response online, we adopt the sliding window data method to select the N groups of historical input data.
The third layer: Has N nodes. The i'th node corresponds to the i'th input sample in the second layer and has its own input-output relationship.
The fourth layer: Includes two nodes, with an input-output relationship given for the first node and for the second.
The fifth layer: Includes two nodes, each with its own input-output relationship.

Working procedure of Bayesian-Gaussian NN based on sliding window data
The working procedure is divided into off-line network training and online prediction. Training the Bayesian-Gaussian NN mainly means determining the threshold matrix parameter D. Online prediction means predicting the present system output from N groups of historical input samples, where the N groups of prediction samples are selected by the sliding window method. The two procedures are described as follows.

Off-line training of Bayesian-Gaussian NN
First, for the $N_1$ training samples $(X_i, y_i)$, use the performance evaluation function (23), where $y_i$ denotes the actual system output and $\hat{y}_i$ denotes the prediction value obtained from (13) and (15) using the other $N_1 - 1$ training samples, i.e. all samples except $X_i$.
The target of training is to find a proper threshold matrix D that makes the prediction value fit the actual system output well, so that (23) is minimized or meets the precision demanded by the engineering application.
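The leave-one-out idea behind this training criterion can be sketched as follows. The sketch assumes that the performance function is a sum of squared prediction errors, that D is diagonal, and that each sample is predicted as an exponentially weighted mean of the other samples. All three are assumptions of this illustration, and `loo_cost` is a hypothetical name.

```python
import numpy as np

def loo_cost(samples_X, samples_y, d_diag):
    """Sum of squared leave-one-out prediction errors for a diagonal D (assumed form)."""
    Xs = np.asarray(samples_X, dtype=float)   # shape (N1, m)
    ys = np.asarray(samples_y, dtype=float)   # shape (N1,)
    d = np.asarray(d_diag, dtype=float)       # diagonal entries of D, shape (m,)
    # Pairwise quadratic forms (X_i - X_j)^T D (X_i - X_j).
    diff = Xs[:, None, :] - Xs[None, :, :]
    dist = np.sum(diff * diff * d, axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude sample i from its own prediction
    # Row-wise shift keeps at least one weight equal to 1 in every row.
    w = np.exp(-(dist - dist.min(axis=1, keepdims=True)))
    y_hat = (w @ ys) / w.sum(axis=1)          # prediction of y_i from the other samples
    return float(np.sum((ys - y_hat) ** 2))

# Hypothetical training data: one period of a sine curve.
x = np.linspace(0.0, 1.0, 11)
Xs = x[:, None]
ys = np.sin(2 * np.pi * x)
cost_flat = loo_cost(Xs, ys, [1.0])     # small threshold entry: nearly uniform weighting
cost_local = loo_cost(Xs, ys, [200.0])  # large threshold entry: mostly nearest neighbours
print(cost_flat, cost_local)
```

In this toy case the larger diagonal entry gives the smaller cost, which shows how the choice of D controls the fit between prediction and measured output.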
The above process minimizes formula (23). We can adopt classical optimization algorithms such as gradient-based least squares methods or the simplex method (Yin, 2003, P.135-137, 145), or the genetic algorithm, the ant colony algorithm, particle swarm optimization and other stochastic evolutionary optimization algorithms that have been intensively studied and applied in recent years (Guo, 2003, P.70-73, Aaron, 2005, P.175-191, Susuki, 2008, P.249-253). Following the foraging process of Escherichia coli (Liu, 2007, P.991-994), we put forward the improved foraging optimization algorithm (see (24) and (25)) and validate by experiment that it can be used to optimize these parameters.
The details and symbol definitions of the improved foraging optimization algorithm are given in Liu's article (Liu, 2007, P.991-994). In this article, we use the improved foraging optimization algorithm to optimize the threshold matrix parameter in (23), and the optimization includes the following six approaches (steps).
Approach 1: Initialize the relevant parameters, including the search range of the parameter $\theta$, the number of chemotactic steps $N_c$, the number of swim steps $N_s$, the step length $C(i)$, the population size $S$, the initial positions, and the weighting coefficient.
Through the above optimization process, we obtain the threshold matrix D and complete the training and learning process for the nonlinear dynamic system.
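To make the roles of $N_c$, $N_s$, $C(i)$ and $S$ concrete, the following sketch implements only a plain tumble-and-swim chemotaxis loop in the spirit of bacterial foraging. It is not the improved foraging algorithm of Liu (2007), whose formulas (24) and (25) are not used here. The function name, the constant step length and the quadratic test cost are assumptions of this illustration.

```python
import numpy as np

def chemotaxis_minimize(cost, lower, upper, S=10, Nc=30, Ns=4, C=0.1, seed=0):
    """Simplified chemotaxis-only search (illustrative, not the cited algorithm)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Initial positions: S bacteria spread uniformly over the search range.
    pos = rng.uniform(lower, upper, size=(S, lower.size))
    val = np.array([cost(p) for p in pos])
    for _ in range(Nc):                       # chemotactic steps
        for i in range(S):
            direction = rng.normal(size=lower.size)   # tumble: pick a random direction
            direction /= np.linalg.norm(direction)
            for _ in range(Ns):               # swim while the cost keeps improving
                trial = np.clip(pos[i] + C * direction, lower, upper)
                trial_val = cost(trial)
                if trial_val >= val[i]:
                    break
                pos[i], val[i] = trial, trial_val
    best = int(np.argmin(val))
    return pos[best].copy(), float(val[best])

# Hypothetical smooth cost with its minimum at (1, 2).
target = np.array([1.0, 2.0])
cost = lambda p: float(np.sum((p - target) ** 2))
best_pos, best_val = chemotaxis_minimize(cost, lower=[0.0, 0.0], upper=[5.0, 5.0], Nc=50)
print(best_pos, best_val)
```

In an application to the threshold matrix, `cost` would be the performance function (23) evaluated for a candidate set of threshold parameters.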

Bayesian-Gaussian NN based on sliding window data
After the threshold matrix D has been obtained by the above network training, online prediction has to be realized during the dynamic response of the system. Yin (Yin, 2003, P.135-137, 145) adopted a self-adjusting Bayesian-Gaussian NN model to keep the number of samples N constant. Suppose there are N historical input data samples $(X_i, y_i)$ and a new sample is added, giving N+1 samples. If one sample can be predicted by the other data samples, its MSPE computed by (26) should be small, i.e. the sample can be recovered by prediction from the other samples. So we can eliminate this sample from the N+1 samples and keep the number of input data samples for online prediction unchanged.
This self-adjusting process of the Bayesian-Gaussian NN brings extra computation time, especially when the number of input data samples N is large. The self-adjusting method is therefore deficient for online prediction of nonlinear dynamic systems.
In this article, we use a sliding data window to select the input samples of the Bayesian-Gaussian NN for online prediction. The data nearest the present time contribute most to the present system output, i.e. the most recent data samples predict the present output with higher precision.
The aim of adopting the sliding data window is to keep the number N of prediction data samples unchanged while the Bayesian-Gaussian NN predicts the output y. The method is illustrated in Figure 4.
Figure 4 shows three windows. The number of data samples in every sliding window, i.e. the width of the sliding window, is N. In moving from window 1 to window 2, only the sample of window 1 that is farthest from the present time is eliminated, and window 2 is completed with the sample nearest to the present time.
We use the threshold matrix D obtained by the improved foraging optimization algorithm and select 20 as the width N of the sliding data window to implement online prediction of the system, and we obtain the broken-line output in Figure 6.
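The sliding window mechanism can be sketched with a fixed-length queue: every new sample is appended and the oldest one leaves automatically, so the predictor always works on the N most recent samples. The class name, the diagonal threshold parameters, the exponentially weighted mean and the toy system with a sign change in its gain are all assumptions of this illustration, not the simulated system of the experiment.

```python
from collections import deque
import numpy as np

class SlidingWindowPredictor:
    """Predict from the most recent `width` samples (illustrative sketch)."""

    def __init__(self, d_diag, width):
        self.d = np.asarray(d_diag, dtype=float)   # assumed diagonal of D
        self.window = deque(maxlen=width)          # oldest sample drops out automatically

    def update(self, X, y):
        self.window.append((np.asarray(X, dtype=float), float(y)))

    def predict(self, X):
        if not self.window:
            raise ValueError("the window holds no samples yet")
        Xs = np.array([x for x, _ in self.window])
        ys = np.array([y for _, y in self.window])
        diff = Xs - np.asarray(X, dtype=float)
        dist = np.sum(diff * diff * self.d, axis=1)
        w = np.exp(-(dist - dist.min()))           # shifted for numerical stability
        return float(w @ ys / w.sum())

# Hypothetical system whose gain changes sign at step 30.
pred = SlidingWindowPredictor(d_diag=[50.0], width=20)
for k in range(60):
    x = (k % 10) / 10.0
    gain = 2.0 if k < 30 else -2.0
    pred.update([x], gain * x)
y_hat = pred.predict([0.5])
print(len(pred.window), y_hat)
```

After the change, the samples generated by the old structure leave the window within N steps, so the prediction follows the new structure without retraining the threshold parameters.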

Analysis and conclusions
The above simulation experiment on the nonlinear dynamic system shows that the Bayesian-Gaussian NN based on the sliding data window can make full use of the window data to predict the system online and achieves good online prediction and tracking. The experiment also shows that the Bayesian-Gaussian NN adapts well to structural changes of the nonlinear dynamic system: it integrates the window data into its structure, continually updates the network structure as the window slides, and quickly captures changes in the structure of the nonlinear system. This property is attractive for dynamic systems whose characteristics change often, and it makes the method suitable for online prediction of nonlinear dynamic systems.

3.1.1 When a single historical sample $(X_i, y_i)$ is known, what is the probability that X produces Y?
Under the Gaussian hypothesis, Y has the probability density function $p(Y)$ and follows a normal distribution whose variance is the prior variance $\sigma_0^2$. Suppose Y is given; then $y_i$ follows the normal distribution $y_i \sim N(Y, \sigma_i^2)$ and has the probability density function $p(y_i \mid Y = y)$.
… position of E. coli by formula (25).
Approach 5: If the end condition is fulfilled, stop the computation; otherwise continue.
Approach 6: Go to the next chemotactic step.
… the i'th sample by the other N samples, and compute the mean square prediction error (MSPE) of the i'th sample.