Research on Multi-Classification of Credit Rating of Small and Medium-Sized Enterprises in Growth Enterprises Board Based on Fuzzy Ordinal Regression Support Vector Machine

Credit rating classification is needed for the small and medium-sized companies listed on China's growth enterprises board. We selected data on 90 small and medium-sized companies, used fuzzy set theory to quantify the qualitative variables, and reformulated the support vector machine for ordinal regression so that different input points make different contributions to the decision hyperplane. We applied the method to the multi-classification credit rating problem, divided the companies into four categories, and observed good performance. The effectiveness of the improved method is verified on the multi-classification of credit ratings of small and medium-sized logistical companies; the experimental results show that the method is promising and can be applied to other multi-classification problems.


Introduction
Small and medium-sized enterprises (SMEs) are a pillar of China's economy. Statistical data for 2009 show that SMEs accounted for more than 99% of all enterprises in China, contributed more than 60% of GDP and more than 50% of tax revenue, provided nearly 70% of import and export trade, and created about 80% of urban jobs. The country's long-awaited NASDAQ-style second board, China's growth enterprises board, opened with 40 listed firms in November 2009. The Chinese government and public hope that the growth enterprises board will play an important role in stimulating private investment, advancing industrial upgrading, and promoting employment, and that it will allow the capital market to better play its fundamental role in allocating market resources. To date, more than 200 enterprises are listed on the growth enterprises board of the Shenzhen stock market. However, over its first two years, the board once crowned a "high growth potential market" has disappointed investors: share prices declined repeatedly, losses were heavy, managers sold their own shares for profit, and financial frauds occurred. Even patient investors came to question whether companies on the growth enterprises board, far from delivering entrepreneurial returns, instead bring large losses and deficits. Researchers and investors hope that delisting provisions will be put into effect for companies on the growth enterprises board. The China Securities Regulatory Commission has been improving the delisting system to contain "shell" speculation, and non-public bond issuance by SMEs on the growth enterprises board has begun. Thus, it is necessary to study the multi-classification of comprehensive evaluation and credit rating of SMEs on the growth enterprises board.
Currently, the qualitative appraisal method mainly used in practice is the expert analysis method, also called the classical credit analysis method, which uses five factors to analyze the debtor's credit condition (the 5C method): Character, Capacity, Capital, Collateral and Condition. Quantitative assessment relies on statistical methods, which vary widely, such as multiple linear discriminant analysis, logistic analysis models, Fisher model analysis and artificial analysis methods.

Research on Small and Medium-Sized Enterprises' Credit Rating
Xu Zhi-chun (2008) randomly selected 12 financial indexes and 11 non-financial indexes (changes in the external environment, external guarantees, increased investment, decreased appraised value of mortgages, misconduct, negative news such as tax or environmental pollution problems, investigation by government authorities, decreased net cash flow, failure to cooperate with the bank, and changes of manager) for a sample of small and medium-sized enterprises in the CMS database from several provinces, and established a logistic regression model to forecast borrowers' default risk. The early-warning ability was remarkably enhanced, and the early-warning model classified better than random classification. Dai Fen (2009) proposed a credit rating index system for small and medium-sized enterprises that combines non-financial and financial indexes, and used an ant colony neural network to assign the samples to five levels; the results indicated that the ant colony neural network outperforms the traditional BP neural network. Regarding enterprise credit rating from the viewpoint of credit sales, Steven Finlay (2010) selected sales data from a large British catalog retailer and described customers' default behavior with a binary and continuous financial behavior forecasting model. Finlay showed that this model is a feasible classification approach and that the profit optimization boundary can be further calculated using a genetic algorithm.

Support Vector Machines
The support vector machine (SVM) was first proposed by Corinna Cortes and Vapnik in 1995. It performs well on problems with few samples, nonlinearity, and high-dimensional pattern recognition, and it can be applied to other machine learning problems such as function fitting. The SVM can solve regression problems (time series analysis) and pattern recognition problems (classification analysis), and it is also used in comprehensive evaluation and forecasting. The SVM is built on statistical learning theory, specifically VC dimension theory and structural risk minimization: based on limited sample information, it seeks a balance between model complexity (i.e., the learning accuracy on the specific training samples) and learning ability (i.e., the ability to identify any sample without error) in order to obtain the best generalization ability. The classification process is a machine learning process.
For linear classifiers (hyperplanes) in an N-dimensional space, the VC dimension is N+1. The data are points in an n-dimensional space, and they are separated by an (n-1)-dimensional hyperplane, which is usually referred to as a linear classifier. When the number of training samples is limited and the sample size is fixed, a higher VC dimension means a more complicated learning machine with a larger capacity; the VC dimension reflects the learning capacity of the function set. Structural risk minimization ensures classification accuracy (empirical risk) while reducing the VC dimension of the learning machine, thereby controlling the expected risk of the learning machine over the entire sample set. It then finds the best separating plane, the one that leaves the largest margin between the two classes of data, also known as the maximum-margin hyperplane. The SVM maps the input vectors to a higher-dimensional space and establishes a maximum-margin hyperplane in that space. Two parallel hyperplanes are built on either side of the separating hyperplane, and the distance between them is maximized. The larger the distance between these parallel hyperplanes, the smaller the total error of the classifier. Support vectors are the training sample points on the edge of the margin.
The SVM and the neural network have similar learning mechanisms, but the SVM differs from the neural network in that it uses mathematical methods and optimization techniques. The kernel function is the key to the SVM. A set of vectors in a low-dimensional space is often difficult to separate; mapping the vectors to a high-dimensional space makes separation easier but increases the computational complexity, and the kernel function resolves this problem. With an appropriate kernel function, classification in the high-dimensional space is obtained directly from the function, and different kernel functions lead to different SVM algorithms.
Zhou Qifeng et al. (2005) selected more than 1,000 industrial enterprises from a commercial bank in 2003 as samples and used the ratings AAA, AA, A and A- as the output. Their experiments compared the performance of several multi-classification strategies and found that the directed acyclic graph SVM and the embedded space SVM were more suitable for credit risk assessment, offering faster learning and higher forecast accuracy for commercial banks' credit risk assessment modeling. Wun-Hwa Chen et al. (2006) used the SVM to classify commercial banks that were qualified distributors in Taiwan's capital market, selecting stock market information, government financial support, major stockholder financial support and other variables for analysis, and obtained higher accuracy than a neural network. Xiao Wenbing et al. (2007) used support vector machines to establish an individual credit evaluation model and compared the results with those of three fully connected BPN methods. The results showed that the SVM identifies potential loan applicants better than the neural network. Cheng-Lung Huang (2007) used a hybrid method combining genetic algorithms with the support vector machine to analyze Australian and German credit data and obtained better results. Yan Youhao et al. (2007) proposed an improved FSVM based on vague sets, assigning a truth-membership and a false-membership to each data point and reformulating the FSVM so that different input points make different contributions to the decision hyperplane. The effectiveness of the improved FSVM was verified in credit rating, and the experimental results showed that the method is promising. Cheng-Lung Huang et al. (2007) used a support vector machine with a mixture of kernels to evaluate credit risk and demonstrated good performance on a credit dataset from a US commercial bank. Atish P. Sinha et al. (2008) compared the naive Bayes method, logistic regression, decision trees, decision tables, neural networks, the k-nearest neighbor method and the support vector machine in borrower credit evaluation, and concluded that in certain fields where data are limited it is important to stress the synergy between data mining and professional knowledge. Wu Chong et al. (2008) used an integrated fuzzy integral-based support vector machine method to assess customers' credit in an e-commerce environment and found good robustness and generalization ability. Shu-Ting Luo et al. (2009) used support vector machines and a clustering algorithm to rate German credit data and obtained more satisfactory results. Wan Fuyong (2009) used Gaussian kernel SVMs to establish financial risk assessment models for listed companies: starting from 13 key indicators of listed companies, 42 financial risk assessment models were built and their prediction accuracy compared. Wan found that support vector machines and statistical methods perform better and, based on numerical simulations, that the Gaussian kernel SVM is superior for financial risk assessment of listed companies. Wen Jingchen (2010) used Lp-norm-type support vector machines to analyze 6,000 records of major U.S. credit card banks, solving the smoothed problem by the conjugate gradient method, and the results showed that this method was effective.
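The role of the kernel function described in this section can be illustrated with a small sketch. It is not taken from any of the cited studies: the function names, the toy vectors and the value of `gamma` are assumptions chosen for illustration. The sketch checks that a degree-2 polynomial kernel evaluated in the original 2-D space equals an inner product in an explicit 3-D feature space, and it also defines a Gaussian kernel of the kind used in several of the works above.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs that matches poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Toy vectors (hypothetical, for illustration only).
x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, z)                           # computed in 2-D
explicit = dot(poly2_features(x), poly2_features(z))    # computed in 3-D
print(implicit, explicit, rbf_kernel(x, z))
```

The two polynomial values agree up to rounding, which is the point of the kernel: the high-dimensional inner product is obtained without ever constructing the high-dimensional vectors.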

Research Methodology
Comprehensive evaluation and credit rating of small and medium-sized companies on the growth enterprises board are needed so that investors and other stakeholders can make decisions. They also support early warning and the delisting system.
According to the Law of the People's Republic of China on Promotion of Small and Medium-sized Enterprises and the Provisional Regulations on Standards for SMEs, small and medium-sized enterprises in industry, construction, and wholesale and retail trade must meet the following criteria: fewer than 2,000 employees, annual sales revenue below RMB 300 million yuan, or total assets below RMB 400 million yuan. Among them, medium-sized enterprises must have 300 or more employees, sales of RMB 30 million yuan or more, and total assets of RMB 40 million yuan or more; the rest are small enterprises.
From the nearly 300 companies listed on the growth enterprises board, we selected 90 medium-sized companies that satisfy the law and regulations (data collected from http://quote.eastmoney.com/chuangyeban.html). We collected qualitative variables covering basic company information (company name, company code, actual controller, controlling shareholder, and information on the people in charge, such as age and education) and the company's macro environment (industry prospects, industry performance ranking, company location, and executives' annual salary). We also used quantitative variables: total common stock, outstanding shares, total assets, total sales revenue, operating income and net income; liquidity ratios (current ratio, quick ratio, cash ratio, operating cash flow ratio); profitability ratios (return on assets, return on common equity, profit margin, EPS); activity ratios (asset turnover ratio, accounts receivable turnover ratio, inventory turnover ratio); capital structure ratios (debt ratio, interest coverage ratio); and capital market ratios (price earnings ratio, market to book ratio, dividend yield, dividend payout ratio). Because some small and medium-sized companies did not disclose enough information, missing values were set to 0.
Since most of the sample data obtained were qualitative variables, we use the theory of fuzzy sets proposed by Zadeh in 1965, which has been used for handling fuzzy decision-making problems. The elements of a fuzzy set have degrees of membership. A fuzzy set is a pair $(A, m)$ where $A$ is a set and $m : A \to [0, 1]$. For each $x \in A$, $m(x)$ is called the grade of membership of $x$ in $(A, m)$. For a finite set $A = \{x_1, \ldots, x_n\}$, the fuzzy set $(A, m)$ is often denoted by $\{m(x_1)/x_1, \ldots, m(x_n)/x_n\}$. Let $x \in A$. Then $x$ is called not included in the fuzzy set $(A, m)$ if $m(x) = 0$, fully included if $m(x) = 1$, and a fuzzy member if $0 < m(x) < 1$. The set $\{x \in A \mid m(x) > 0\}$ is called the support of $(A, m)$, and the set $\{x \in A \mid m(x) = 1\}$ is called its kernel. After this calculation, the qualitative variables are converted into values in $[0, 1]$ that can be used directly in the subsequent calculations.
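The fuzzy-set definitions above can be sketched in a few lines. The membership grades below are hypothetical and serve only to illustrate how a qualitative variable (here, the education level of the person in charge) could be mapped to $[0, 1]$; the grades actually used in the study are not given in the text.

```python
# Hypothetical grades of membership m(x) for one qualitative variable.
# These values are assumptions for illustration, not the study's data.
EDUCATION_MEMBERSHIP = {
    "secondary": 0.0,
    "bachelor": 0.5,
    "master": 0.8,
    "doctorate": 1.0,
}


def support(fuzzy_set):
    """Elements with m(x) > 0."""
    return {x for x, m in fuzzy_set.items() if m > 0}


def kernel(fuzzy_set):
    """Elements with m(x) = 1 (fully included)."""
    return {x for x, m in fuzzy_set.items() if m == 1}


def encode(values, fuzzy_set):
    """Replace each qualitative value by its grade of membership in [0, 1]."""
    return [fuzzy_set[v] for v in values]


print(sorted(support(EDUCATION_MEMBERSHIP)))
print(sorted(kernel(EDUCATION_MEMBERSHIP)))
print(encode(["master", "secondary", "doctorate"], EDUCATION_MEMBERSHIP))
```

Under these assumed grades, "secondary" is not included, "doctorate" is fully included, and the other two levels are fuzzy members.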
The selected indicators carry overlapping information, for example total assets and debt ratio. Therefore, principal component analysis is used for dimension reduction, yielding a smaller set of derived variables for the subsequent computation.
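As a minimal sketch of the dimension-reduction step, the following code extracts the first principal component of a toy data set by power iteration on the covariance matrix. The data values and function names are assumptions for illustration; the study does not state which implementation or how many components were used.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[r[k] - means[k] for k in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def project(centered, direction):
    """Scores of each row along the given direction."""
    return [sum(a * b for a, b in zip(r, direction)) for r in centered]


# Toy data: two strongly correlated indicators (hypothetical values).
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered = center(data)
cov = covariance(centered)
pc = first_component(cov)
scores = project(centered, pc)

# Share of total variance captured by the first component.
explained = (sum(s * s for s in scores) / (len(scores) - 1)) / (cov[0][0] + cov[1][1])
print(pc, explained)
```

Because the two toy indicators are almost collinear, a single component captures nearly all of the variance, which is the situation the paragraph describes for overlapping indicators.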
Because this is a small-sample classification problem, we select a multi-classification support vector machine algorithm suited to small-sample learning. Given a training set, a support vector machine typically solves a multi-classification problem by decomposing it into a series of two-class problems, training the corresponding two-class machines, and determining the category of an input $x$ from their outputs. Common approaches are the one-versus-one method, the one-versus-the-rest method, and the Crammer-Singer multi-classification support vector machine. Here we select the ordinal regression support vector machine. It assumes that the $M$ categories of inputs in $R^n$ are ordered from 1 to $M$ with a fixed adjacency relationship: class $j$ is adjacent to classes $j-1$ and $j+1$, whereas classes $j-1$ and $j+1$ are not adjacent to each other, and the space can be separated by $M-1$ parallel hyperplanes.
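The geometric picture of $M-1$ parallel hyperplanes can be sketched as a decision rule. Assuming a trained model consists of one weight vector `w` and increasing thresholds `b_1 <= ... <= b_{M-1}` (these names and the numeric values are hypothetical, not results from the study), the predicted class is one plus the number of thresholds that lie below the score `w . x`.

```python
from bisect import bisect_left


def predict_rank(w, thresholds, x):
    """Ordinal class in 1..M for input x.

    The M-1 parallel hyperplanes are w . x = b_j; `thresholds` must be sorted
    in increasing order. A score exactly on a threshold goes to the lower class.
    """
    score = sum(a * b for a, b in zip(w, x))
    return 1 + bisect_left(thresholds, score)


# Hypothetical model with M = 4 ordered classes (three thresholds).
w = (1.0, 0.5)
thresholds = [-1.0, 0.0, 1.0]

for point in [(-2.0, 0.0), (-0.5, 0.0), (0.5, 0.0), (2.0, 2.0)]:
    print(point, predict_rank(w, thresholds, point))
```

Because all hyperplanes share the same normal vector, moving along `w` can only pass through adjacent classes in order, which is the adjacency property stated above.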

 
Here, $x_i^j$ denotes the $i$-th training input of class $j$ (the superscript $j$ indexes the class), and $l^j$ is the number of training points in class $j$.
With a penalty parameter $C > 0$, the primal problem is transformed into a convex quadratic programming problem.
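One common way to write such a primal problem is sketched here, assuming the standard fixed-margin ordinal regression formulation with thresholds $b_1 \le \dots \le b_{M-1}$ and assuming that each point's contribution is weighted by a fuzzy membership $s_i^j \in (0, 1]$. The symbols $w$, $b_j$, $\xi_i^j$, $\xi_i^{*j}$ and $s_i^j$ are illustrative assumptions and are not taken from the original text, whose exact formulation may differ.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}}\quad
  & \frac{1}{2}\lVert w\rVert^{2}
    + C\sum_{j=1}^{M}\sum_{i=1}^{l^{j}} s_i^{j}\left(\xi_i^{j} + \xi_i^{*j}\right) \\
\text{s.t.}\quad
  & (w\cdot x_i^{j}) - b_{j} \le -1 + \xi_i^{j}, \\
  & (w\cdot x_i^{j}) - b_{j-1} \ge 1 - \xi_i^{*j}, \\
  & \xi_i^{j} \ge 0,\quad \xi_i^{*j} \ge 0,
    \qquad j = 1,\dots,M,\; i = 1,\dots,l^{j},
\end{aligned}
\qquad b_{0} = -\infty,\; b_{M} = +\infty .
```

In this sketch, setting every $s_i^j = 1$ recovers the unweighted problem, while a smaller membership lets a point violate its margin at lower cost, so it influences the hyperplanes less. The objective is quadratic in $w$ and all constraints are linear, which is why the problem is a convex quadratic program.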

Conclusion
Using the support vector machine, we classified the credit ratings of small and medium-sized companies on the growth enterprises board into three categories: selling (companies for which, according to the website, the investment rating agencies gave no comments and which had losses), holding, and buying. According to the experimental results, the accuracy rates on the training set were 84.2%, 86.4% and 83.1%, and the accuracy rates on the testing set were 79.1%, 76.8% and 74.3%. The results can be interpreted as follows: the selling level indicates enterprises with poor performance that are expected to cause losses for investors; the holding level indicates enterprises whose performance is currently not so good but whose profits are expected to increase; and the buying level indicates enterprises with very good operating performance. Sometimes the performance of enterprises is affected by national industrial policy and the macroeconomic environment, as in the banking and real estate industries.
Such exceptional cases are adjusted manually by expert judgment in practice, which yields better results. Thus, we have shown that the support vector machine for ordinal regression can be reformulated so that different input points make different contributions to the decision hyperplane, and that it can be used to analyze multi-classification problems and divide the samples into different categories with good performance. The effectiveness of the improved method is verified on the multi-classification of credit ratings of small and medium-sized companies on the growth enterprises board. The experimental results show that the method is promising and can be applied to comprehensive evaluation, ranking problems, and other multi-classification problems. Our future research will focus on how to improve the accuracy of multi-classification. We believe that more suitable parameter and variable selection will improve generalization performance. Extending the multi-class classification to other problems is also part of our future work.