A NN Image Classification Method Driven by the Mixed Fitness Function

This article proposes a mixed fitness function that combines the sum of squared errors with a linear transformation. The function improves how individual fitness is evaluated in a genetic algorithm (GA), and, combined with a neural network (NN), it can be used in a high-speed paper money image analysis system. Paper money images of different denominations are highly similar, so the between-class distance is small, while wear from circulation makes the within-class scatter large. To address this, the method first encodes the weights and thresholds of the NN as real values, mapping the problem from the representation type (phenotype) to the genotype, and then applies the genetic operations of selection, crossover and mutation. The weights and thresholds trained by the GA, with individual fitness given by the mixed fitness function, serve as the initial weights and thresholds of the NN in the next stage, where they are trained further by the NN to build the classifier. The method was tested on a resource-constrained embedded system (TI TMS320C6713 DSP). 20000 RMB images were acquired as samples, 12000 of which were used for testing, and the results indicate that combining the improved GA with the NN clearly raises the recognition rate.


Introduction
Financial institutions settle large amounts of paper money every day, which requires a paper money sorting system with high processing speed and high identification reliability. The paper currency sorter is an automatic settlement tool used to sort paper money by denomination, face and degree of damage. The designs of different denominations are similar, and soiling and wear make samples of the same denomination and face vary widely. In addition, the running instability of high-speed equipment changes the apparent geometric shape of the paper money, so its geometric size cannot serve as a reliable classification reference. Image analysis is the core technology of the paper money sorting system, and identifying the denomination from the image has become a new identification approach in financial equipment in recent years.
The design of the classifier is an important stage in paper money identification. The minimum distance classifier has been applied in paper money identification systems: it computes the distances between an unknown sample and each training sample vector in the models and selects the minimum distance to make a decision, but its efficiency is low. Takeda and Omatu (Fumiaki Takeda, 1995, P.73-77) therefore applied NN to the classifier design for paper money identification in 1995 and obtained better results. Ahmadi et al (Ahmadi A, 2003, P.2550-2554) applied learning vector quantization (LVQ) (Ahmadi, 2004, P.1313-1316) to the classifier design. As a classifier for paper money identification, the NN offers advantages such as parallel processing, generalization, self-organization and exact optimization. However, the BP algorithm searches for a solution by descending along the gradient, and the learning rate determines how much the weights change in each training cycle. A large learning rate may make the system unstable, so that the squared error fluctuates, and the network is prone to local minima and slow convergence.
The basic idea of the genetic algorithm (GA) is to adopt a coding scheme that maps the solution space to a coding space, in which each code corresponds to one chromosome, or individual. An initial group of individuals, called the population, is generated at random. Individuals in the population are selected according to their fitness or some competition mechanism, and genetic operators such as selection, crossover and mutation are used to generate the next generation. Evolution continues in this way until the termination condition is met. Barrios et al improved the coding method (Barrios, 2000, P.844-847), and Miller improved the genetic operators of the GA (Miller, 1993, P.1340-1351). In this article, the standard GA is improved through the design of the fitness function, and a mixed fitness function combining the squared error and a linear transformation is proposed. The mixed fitness function addresses the problem of individuals with excessively large fitness values that arises when the squared error is used alone, and it effectively extends the range of the fitness values. The GA has strong global search ability, finds the global optimum with high probability, and is robust. However, its local search ability is weak, its execution efficiency is low, and it tends to converge prematurely.
In view of the respective advantages and disadvantages of BP (back propagation) and GA, the mixed GA-BP algorithm has been successfully applied in domains such as earthquake prediction (Qiuwen, 2008, P.128-131) and sound identification (Min-Lun, 2006, P.527-530). In this article, GA-BP is applied to paper money identification. Because the images of different denominations are very similar, the between-class distance is small and the within-class scatter is large. The article therefore first encodes the weights and thresholds of the BP network as real values, then uses the GA to train them (Jiansheng, 2005, P.288-291 & Chien-Yu, 2008, P.1459-1465), then uses the BP algorithm (Fariborz, 2008, P.389-404 & Ming, 2008, P.115-119 & Qiang, 2005, P.357-360) to further train the weights and thresholds optimized by the GA, and finally predicts the unknown samples. The test results show that the mixed algorithm combining the improved GA and the BP network achieves a higher identification rate and reliability.

Basic principles of BP algorithm and genetic algorithm
NN is a method for solving nonlinear problems. It can approximate complex nonlinear functions and has strong fault tolerance, so an NN model can be built from existing sample information and, as samples accumulate, can learn from the new information to form a more complete and accurate evaluation system. The BP network is a multi-layer feed-forward NN: if input and output pairs are presented repeatedly, the network forms an internal structure that reflects this relation during learning. Each neuron in the BP network has several outputs and connects to many other neurons, and each connection carries one connection weight coefficient. Each node in the network has a state variable $x_i$ and a threshold variable $\theta_j$. The connection weight coefficient from node $i$ to node $j$ is $w_{ji}$. For each node a transformation function $f(x)$ is defined, generally the S-type function $f(x) = \frac{1}{1 + e^{-x}}$. Once the input vector and the target output vector are given, the network starts from the initial connection weight coefficients and thresholds and performs nonlinear inference through the transformation function. The connection weight coefficients and thresholds are then adjusted according to the error between the actual output vector and the target output vector. This process is repeated, and training ends when the error falls to the preset level. The adjusted connection weight coefficients and thresholds can then be used to predict unknown samples. The computation of the BP network is divided into forward propagation of the input pattern and backward propagation of the output error. The forward propagation can be described as follows.
Here $s_j$ denotes the activation value of each neuron, the activation function is the S-type function above, $b_j$ denotes the output of the $j$-th unit in the hidden layer, and $p$ is the number of neurons in the hidden layer. The activation values and output values of the output layer can be obtained in the same way.
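As a minimal sketch of the forward propagation just described, the following Python code passes an input vector through one hidden layer and one output layer. It assumes that the threshold is subtracted from the weighted sum and that the S-type function is the logistic function; the function names and the argument name `theta_out` are illustrative choices, not taken from the article.

```python
import math


def sigmoid(x):
    """S-type (logistic) activation, assumed to be f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, thresholds):
    """Return the outputs of one layer.

    weights[j][i] is the connection weight from node i to node j and
    thresholds[j] is the threshold of node j (assumed to be subtracted).
    """
    outputs = []
    for w_j, theta_j in zip(weights, thresholds):
        s_j = sum(w * x for w, x in zip(w_j, inputs)) - theta_j  # activation value
        outputs.append(sigmoid(s_j))                              # unit output
    return outputs


def forward(x, w, theta, v, theta_out):
    """Input layer -> hidden layer (w, theta) -> output layer (v, theta_out)."""
    b = layer_forward(x, w, theta)       # hidden-layer outputs b_j
    y = layer_forward(b, v, theta_out)   # actual network outputs y_k
    return b, y
```

Calling `forward(x, w, theta, v, theta_out)` returns both the hidden outputs and the network outputs, since the hidden outputs are needed again when the error is propagated backward.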
Forward propagation of the pattern yields the actual output of the network. When the error between the actual output and the expected output is large, the connection weights and thresholds of the network should be modified. The error back-propagation process of the BP network can be described as follows. Here $d_k$ denotes the correction error of the output layer, $o_k$ denotes the expected output, and $e_j$ denotes the correction errors of the units in the hidden layer. Formulas (5) and (6) adjust the connection weights and thresholds layer by layer, from the output layer to the hidden layer and from the hidden layer to the input layer.
GA is a probabilistic search algorithm. It uses a coding technique to operate on strings of numbers called chromosomes, and its basic idea is to simulate the evolution of a population made up of these strings. In essence it is an efficient, parallel, global search algorithm that automatically acquires and accumulates knowledge about the search space during the search and adaptively controls the search process to seek the optimal solution. Following the principle of survival of the fittest, the GA gradually produces an approximately optimal solution from the population of candidate solutions. The GA first performs coding, that is, it maps the problem from the representation type (phenotype) to the genotype. It then evaluates the fitness function, whose value reflects the quality of each individual. Finally, the genetic operators of selection, crossover and mutation are applied, and an approximately optimal solution of the problem is sought.
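For the error back-propagation step of the BP network described above, a minimal Python sketch is given below. It assumes a squared-error objective, logistic activations and subtracted thresholds, under which the correction errors take the standard delta-rule form. The learning rate `rate` and the function names are illustrative and are not taken from the article.

```python
def correction_errors(y, o, b, v):
    """Compute d_k (output layer) and e_j (hidden layer) for logistic units.

    y: actual outputs y_k, o: expected outputs o_k,
    b: hidden-layer outputs b_j, v[k][j]: weight from hidden unit j to output k.
    """
    d = [(o_k - y_k) * y_k * (1.0 - y_k) for y_k, o_k in zip(y, o)]
    e = [
        sum(d[k] * v[k][j] for k in range(len(d))) * b_j * (1.0 - b_j)
        for j, b_j in enumerate(b)
    ]
    return d, e


def apply_corrections(weights, thresholds, errors, inputs, rate):
    """Update one layer in place.

    Each weight moves by rate * error * input; each threshold moves in the
    opposite direction because it is assumed to be subtracted in the forward pass.
    """
    for j, err in enumerate(errors):
        for i, x_i in enumerate(inputs):
            weights[j][i] += rate * err * x_i
        thresholds[j] -= rate * err
```

The output layer would be updated with `d` and the hidden outputs as inputs, and the hidden layer with `e` and the network inputs, which mirrors the layer-by-layer order described in the text.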
Because the BP algorithm is self-organizing and self-learning, it can learn directly from data and adaptively find the regularities in the samples, and it extends well when new types of paper money are introduced into the identification system. The BP algorithm is therefore well suited to processing paper money images. However, it easily falls into local optima, converges slowly, and depends on uncertain initial weights and thresholds. GA is a global optimization search technique and can effectively compensate for these disadvantages. In paper money identification, the GA is used to seek good initial weights and thresholds for the network, and, given the similarity of paper money images, the BP algorithm is then used to train the weights and thresholds.

Improved genetic algorithm
The fitness value in the GA measures how closely each individual approaches the optimal solution in the optimization. An individual with high fitness has a larger probability of being passed on to the next generation, and an individual with low fitness has a relatively lower probability. The function that measures individual fitness is the fitness function; it drives the evolution of the GA and is the sole basis for natural selection. The fitness function most widely used in GA is the squared error. However, if the initial population contains an individual with excessive fitness, this function cannot prevent that individual from dominating the population. The individual then misleads the direction of optimization and makes the algorithm converge to a local optimum. Moreover, as the GA gradually converges, the individual fitness values in the population become close to one another, further optimization becomes difficult, and the search tends to oscillate near the optimal solution. For these reasons, a linear transformation of the fitness function is introduced in the article. Its purpose is to amplify the fitness values appropriately and increase the selective ability of the GA. In the linear transformation formula (11), $f$ is the original fitness value, $f_{min}$ is the lower limit of the fitness function value, $f_{max}$ is the upper limit of the fitness function value, and $f'$ is the fitness value after transformation. According to formula (11), if the difference between $f_{max}$ and $f_{min}$ is large, the transformed fitness value is correspondingly reduced, which effectively avoids misleading the direction of optimization when an individual with excessively large fitness exists. However, the linear transformation does not describe the differences among similar individuals as well as the squared error does, so a mixed fitness function, formula (12), is proposed.
In formula (12), $E$ is the sum of squared errors and $a$ is the harmonic coefficient. Formula (12) takes into account knowledge of the specific problem domain of paper money and adds information about the rate of change of the fitness function value to the fitness function. This effectively overcomes the limitation that a chromosome selected by the standard GA may not be a good one, avoids premature convergence, and gives a higher convergence speed. The harmonic coefficient $a$ in formula (12) is finally determined by testing.
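As an illustration of how a linear transformation and an error-based fitness can be blended, the Python sketch below assumes a min-max rescaling for the transformed fitness, a raw fitness of 1 / (1 + E), and a convex combination weighted by the harmonic coefficient a. These specific forms, and the function names, are assumptions made for the example and should not be read as the article's formulas (11) and (12).

```python
def linear_scale(fitness):
    """Min-max rescaling of raw fitness values to [0, 1] (assumed form)."""
    f_min, f_max = min(fitness), max(fitness)
    if f_max == f_min:                     # all individuals equally fit
        return [1.0 for _ in fitness]
    return [(f - f_min) / (f_max - f_min) for f in fitness]


def mixed_fitness(errors, a):
    """Blend an error-based fitness with its rescaled version.

    errors: sum of squared errors E for each individual.
    a: harmonic coefficient in [0, 1]; the weighting scheme is an assumption.
    """
    raw = [1.0 / (1.0 + e) for e in errors]   # smaller error -> larger fitness
    scaled = linear_scale(raw)
    return [a * r + (1.0 - a) * s for r, s in zip(raw, scaled)]
```

Under these assumptions, a = 1 reduces the sketch to the error-based fitness alone and a = 0 to the rescaled fitness alone, so intermediate values trade off the two behaviours discussed in the text.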

Optimizing NN by genetic algorithm
The advantages of the BP algorithm are that it is easy to implement and its optimization is exact. It has two disadvantages. First, it easily falls into local minima, because the error surface generally has several extreme points. Second, it converges slowly. With gradient descent the step length is difficult to choose: if it is too large, the required precision cannot be achieved and the result may even diverge; if it is too small, the number of iterations increases and convergence slows. The advantage of GA is that it does not easily fall into local optima during the search. Even if the fitness function is discontinuous, irregular or noisy, it can find the global optimum with high probability, and it is robust. At the same time, GA has disadvantages such as low efficiency, premature convergence and weak local search ability.
Therefore, a mixed algorithm (GA-BP) combining the two algorithms is formed in the article to optimize the NN. The idea is first to train the weights and thresholds of the BP network with the GA, which replaces the random initialization of connection weights and thresholds in the standard BP network and effectively narrows the search range, then to train the weights and thresholds optimized by the GA with the BP network, and finally to use the generalization ability of the network to predict unknown input samples. The steps of the GA-BP training algorithm are as follows. Initialize the population, the crossover probability $P_c$, the mutation probability $P_m$, the weights $w_{ji}$ and $v_{kj}$, and the thresholds $\theta_j$ and $\theta_k$, and encode them as real numbers; the coding length is given by formula (13). In formula (13), $S_{in}$ denotes the number of neurons in the input layer, $S_{out}$ is the number of neurons in the output layer, $S_i$ is the number of neurons in hidden layer $i$, and $p$ is the number of hidden layers.
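The real-valued coding can be sketched in Python as follows. The sketch assumes a fully connected feed-forward network in which every connection contributes one weight gene and every non-input neuron contributes one threshold gene, and it assumes a particular gene order (the weights of a layer followed by its thresholds). Both the ordering and the function names are illustrative, not the article's.

```python
def chromosome_length(layer_sizes):
    """Number of real-valued genes for a fully connected feed-forward network.

    layer_sizes: [S_in, S_1, ..., S_p, S_out].
    Counts one weight per connection plus one threshold per non-input neuron.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    thresholds = sum(layer_sizes[1:])
    return weights + thresholds


def decode(chromosome, layer_sizes):
    """Split a flat gene list into per-layer (weights, thresholds) pairs."""
    layers, pos = [], 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # Row j holds the n_in weights feeding neuron j of this layer.
        w = [chromosome[pos + j * n_in: pos + (j + 1) * n_in] for j in range(n_out)]
        pos += n_in * n_out
        theta = chromosome[pos: pos + n_out]
        pos += n_out
        layers.append((w, theta))
    return layers
```

For example, with a hypothetical layout of 96 inputs, one hidden layer of 30 neurons and 20 outputs, `chromosome_length([96, 30, 20])` counts 96·30 + 30·20 weights and 30 + 20 thresholds, giving 3530 genes.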
Formula (13) realizes the transformation of the paper money image problem from the representation type (phenotype) to the genotype. Next, compute the fitness values of the individuals, rank them, and select individuals from the initial population according to the probability given by formula (14).
Here $f_j$ is the fitness value of individual $j$, which is measured by formula (15).
In formula (15), the remaining quantities are the number of chromosomes, the number of samples $p$, the number of neurons in the input layer $k$, the actual output of the network $y_k$, and the expected output of the network $o_k$. Apply the crossover operation with crossover probability $P_c$ to individuals $c_j$ and $c_{j+1}$ to generate new individuals $c'_j$ and $c'_{j+1}$; individuals that are not crossed are copied directly. Apply mutation with mutation probability $P_m$ to generate a new individual $c'_i$, insert it into the population, and compute the new fitness function values. If the new individuals satisfy the termination condition, the optimization ends; otherwise, continue applying the genetic operators to the population. Finally, decode the individuals in the final population to obtain the optimized connection weights and thresholds of the NN. Given the characteristics of paper money images, the GA-BP algorithm can then train the network on the samples starting from these optimized initial weights and thresholds, which improves the performance of the network, speeds up convergence, and avoids falling into local optima.
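One generation of the genetic operators described above can be sketched in Python as follows. The article does not state which crossover and mutation operators it uses for real-coded chromosomes, so the sketch assumes fitness-proportional (roulette-wheel) selection, arithmetic crossover and Gaussian mutation. The parameter `sigma` and all function names are illustrative assumptions.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability f_j / sum(f)."""
    pick = rng.random() * sum(fitness)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if pick < running:
            return individual
    return population[-1]                  # guard against rounding at the upper end


def crossover(c1, c2, rng):
    """Arithmetic crossover (assumed operator): blend two parents gene by gene."""
    lam = rng.random()
    child1 = [lam * x + (1 - lam) * y for x, y in zip(c1, c2)]
    child2 = [(1 - lam) * x + lam * y for x, y in zip(c1, c2)]
    return child1, child2


def mutate(c, pm, rng, sigma=0.1):
    """Gaussian mutation (assumed operator): perturb each gene with probability pm."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < pm else g for g in c]


def next_generation(population, fitness, pc, pm, rng):
    """Selection, then crossover with probability pc, then mutation."""
    selected = [roulette_select(population, fitness, rng) for _ in population]
    offspring = []
    for j in range(0, len(selected) - 1, 2):
        c1, c2 = selected[j], selected[j + 1]
        if rng.random() < pc:
            c1, c2 = crossover(c1, c2, rng)
        offspring += [list(c1), list(c2)]  # pairs that are not crossed are copied
    if len(selected) % 2:                  # odd population: carry the last one over
        offspring.append(list(selected[-1]))
    return [mutate(c, pm, rng) for c in offspring]
```

A driver loop would call `next_generation` repeatedly, recomputing the fitness of each new population, until the termination condition is met, and would then decode the best chromosome into the initial weights and thresholds for BP training.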

Establishment of test database
To validate the method in the article, 20000 paper money images were collected with the purpose-built multi-function money detection instrument, using a light-source sensor with 200 dpi resolution. The samples cover five denominations of the 2005 edition RMB. Each denomination has four faces, giving 20 classes. Each class has 1000 samples, of which 400 are used for training and the other 600 for testing.

Preprocessing of paper money image
The image must be located before its features are extracted, that is, the position of the paper money in the image must be found. In the article, a number of scattered points on the borders of the paper money are detected first, and the least squares method is then used to fit the border lines of the paper money image to these border points (see Figure 2). Because the image is scanned while the paper money is moving, some geometric distortion generally occurs. This distortion has two sources: the tilt of the paper money and its transverse movement during scanning. The slope correction of the paper money image is shown in Figure 3.
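The border fitting step can be sketched in Python as an ordinary least-squares line fit through detected edge points. The sketch assumes a near-horizontal edge expressed as y = m·x + c in pixel coordinates and derives a tilt angle from the fitted slope. How the edge points are detected and how the image is resampled afterward are outside the sketch, and the function names are illustrative.

```python
import math


def fit_line(points):
    """Least-squares fit of y = m*x + c through (x, y) edge points.

    Intended for a near-horizontal edge; a near-vertical edge would be
    fitted as x = m*y + c instead.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("edge points must span more than one x position")
    m = (n * sxy - sx * sy) / denom
    c = (sy - m * sx) / n
    return m, c


def tilt_angle_degrees(m):
    """Express the fitted slope as a rotation angle in degrees."""
    return math.degrees(math.atan(m))
```

Rotating the image by the negative of this angle would remove the tilt, while the distortion caused by transverse movement during scanning needs a separate correction.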

Feature extraction of paper money image
Grid features are adopted as the identification features, and the size of the collected paper money image is 270 × 150 pixels. By analyzing paper money images of different types, the sensitive regions that contribute most to identification can be determined. In the article these regions are divided into small panes of 16 × 6, so each note yields a 96-dimensional feature vector. Each component is the sum of the pixel grey values in the corresponding pane, and the components are normalized to obtain the final feature vector.
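The grid feature computation can be sketched in Python as follows. The sketch reads "16 × 6" as a grid of 16 columns by 6 rows (16 · 6 = 96 panes) laid over the sensitive region, and it normalizes by dividing by the largest pane sum. The article does not specify the normalization, so that choice, like the function name, is an assumption.

```python
def grid_features(region, cols=16, rows=6):
    """Sum grey values in each pane of a cols x rows grid, then normalize.

    region: 2D list of grey values (a list of pixel rows) covering the
    sensitive area. Returns cols * rows features in row-major order,
    scaled so that the largest equals 1 (assumed scaling).
    """
    height, width = len(region), len(region[0])
    features = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            features.append(sum(sum(row[x0:x1]) for row in region[y0:y1]))
    peak = max(features)
    return [f / peak for f in features] if peak else [0.0] * len(features)
```

The integer division of the region size keeps the panes nearly equal when the region dimensions are not exact multiples of 16 and 6.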

Analysis of test result
The purpose of test 1 is to determine the harmonic coefficient $a$ of the mixed fitness function. 12000 samples were tested, and the optimal value was determined by comparing the identification rates obtained with different harmonic coefficients. As Table 1 shows, the identification rate is highest when $a = 0.6$ and decreases when $a$ is either increased or decreased. Figure 5 shows the mixed fitness function error curves and the fitness function value curves for different harmonic coefficients. When $a = 0.6$, convergence is fast, and the fitness function value curve reaches a stable state in fewer iterations.
In the article, the GA whose fitness function is the squared error is the standard GA, denoted SGA, and the GA whose fitness function is the mixed function is the improved GA, denoted IMGA. The purpose of test 2 is to compare, for paper money identification, the BP network in Omatu's article (Omatu, 2007, P.413-417), the combined network of SGA and BP, and the combined network of IMGA and BP. In Figure 6, the fitness error function of IMGA converges faster than that of SGA, and the fitness value of IMGA stabilizes in fewer iterations. The data in Table 2 indicate that the identification rate of the combination of IMGA and the BP network is higher than that of the BP network alone or of the combination of SGA and the BP network.

Conclusions
The BP NN suffers from local minima, whereas the GA has good global search ability, so an improved GA is proposed in the article, and a mixed GA-BP algorithm combining the improved GA and the BP NN is applied to paper money identification. The GA-BP method uses the GA to locate a good point in the solution space and then searches with the BP network along the negative gradient direction. This prevents the BP algorithm from running into local minima and slow convergence, and it overcomes the disadvantages of the GA, namely long search time and slow search speed in the optimization process. The test results indicate that the algorithm in the article has higher reliability and robustness in paper money identification.

The correction terms respectively denote the connection weight corrections and the threshold corrections from the output layer to the hidden layer, and the connection weight corrections and the threshold corrections from the hidden layer to the input layer.

Figure 1. Four faces of the paper money (a. the first face, b. the second face, c. the third face, d. the fourth face)

Table 1. Measured data