Bankruptcy Prediction Using Support Vector Machines and Feature Selection During the Recent Financial Crisis

This study aims to identify an optimal set of features for predicting firms' bankruptcy events in the current macroeconomic context. To this end, alongside many financial features, we propose new country-specific factors that account for the macroeconomic conditions of the countries where firms operate. Our forecasting model is based on Support Vector Machines (SVMs), a supervised learning tool. First, starting from a wide set of variables commonly used for bankruptcy prediction, we assess the general effectiveness of SVMs, also in comparison with the performance of other commonly used methods. Second, we try to improve the accuracy of forecasts by selecting optimal subsets of variables through a feature selection method. The results show that, in the current socio-economic context, the joint use of SVMs and the proposed feature selection technique significantly improves the accuracy of bankruptcy predictions compared with the other methods examined. Furthermore, we show that the proposed country-specific factors are relevant information for predicting the failure of firms and that most of the ratios proposed by Altman in 1968 are still relevant today.


Introduction
Since 1968, when Altman proposed his z-score test (Altman, 1968), many authors have studied alternative approaches to bankruptcy prediction. The reader can refer to many works in the literature. An almost complete survey on this topic is provided by Ravi Kumar and Ravi (2007). A different approach is adopted by Jeong et al. (2012), where the goal of the work is not to improve the prediction model itself but the method used in the prediction, in this case the architecture of the neural network (NN) system.
Our study contributes to the existing literature by finding a better set of features to predict bankruptcy events of firms in the recent economic context. The first step was a comparative analysis between the performance of methods commonly used in the literature, namely Linear Discriminant Analysis and Logistic regression, and the performance of SVMs, which have recently attracted growing interest in the scientific community. As a benchmark for this initial analysis we considered Altman's z-score model, which is still a very common method for assessing the financial condition of a firm. Afterwards, in order to capture the economic situation, we provided our models with a large set of information (variables), including new macroeconomic indicators, which in our opinion is a noteworthy aspect. Presuming that not all the provided variables were actually useful in the forecast, we performed a feature selection. In this way we identified the variables with the most relevant information content for today's context.
In the past, few researchers have tried to use algorithms for selecting optimal subsets of predictors. In particular, du Jardin (2010) analyzed the impact of correctly selecting the variable subset on the type I error, while Tsai (2009) compared well-known feature selection methods using a Multilayer Perceptron (MLP). Recently, Zhou et al. (2014) proposed an approach to feature selection and parameter optimization for the NN system used in the prediction, based on one of the most common genetic algorithm (GA) techniques.
Regarding previous studies similar to our research, Min and Lee (2005) already used SVMs for bankruptcy prediction, and Salcedo-Sanz et al. (2004) in particular pioneered the use of feature selection and SVMs for predicting the insolvency of non-life insurance companies. Furthermore, Xie et al. (2011) found empirical evidence on the relevance of external market variables for bankruptcy prediction based on SVMs. In the light of their findings, in our work we propose and test the efficacy of using all these elements jointly. We also define new macroeconomic variables and provide new empirical evidence on their relevance at a world level, also in the light of the recent financial crisis. Our work makes it possible to infer whether the explanatory variables of Altman's z-score model are still relevant today and whether the examined macroeconomic factors (and/or other variables considered in the feature selection) have gained importance after the outbreak of the recent financial crisis. Finally, since we also apply other commonly used bankruptcy prediction methods to the same dataset, our results enable us to compare the performance of the examined methods in the recent economic context.
The paper is organized as follows. The classification methods used in this study, i.e. Altman's z-score model, Linear Discriminant Analysis, Logistic regression and SVMs, are briefly described in section 2. Section 3 is devoted to the description of the empirical dataset and the procedure adopted for developing the analyses. Section 4 reports the results of our empirical study, distinguishing among the results obtained using Altman's z-score features, the full set of features, and the attributes selected through a feature selection technique. The final section concludes.

Altman's Z-Score Model
The Altman model tested in this work is the well-known z-score model developed by Altman (1968). Subsequent studies on this topic and extensions of Altman's original model can be found in Altman (1973, 1977), Altman et al. (1977), Altman et al. (1994), Altman and McGough (1974) and Altman and Hotchkiss (2006). The original discriminant function developed by Altman in 1968 is a linear combination of five common business ratios, weighted by their respective coefficients (equation 1):

$$Z_j = a_1 X_{1,j} + a_2 X_{2,j} + a_3 X_{3,j} + a_4 X_{4,j} + a_5 X_{5,j} \qquad (1)$$

where $a_1, \dots, a_5$ are the fixed coefficients of the model and, for each firm j, we have:
$Z_j$ is the overall index (z-score);
$X_{1,j}$ is (Working Capital)$_j$ / (Total Assets)$_j$;
$X_{2,j}$ is (Retained Earnings)$_j$ / (Total Assets)$_j$;
$X_{3,j}$ is (Earnings Before Interest and Taxes)$_j$ / (Total Assets)$_j$;
$X_{4,j}$ is (Market Value Equity)$_j$ / (Book Value of Total Debt)$_j$;
$X_{5,j}$ is (Sales)$_j$ / (Total Assets)$_j$.
Once the z-score $Z_j$ has been calculated for a given company j, firm j is classified into the solvent or the non-solvent group with regard to the critical values of the z-score determined by Altman. In particular, a firm j is expected to be solvent if $Z_j$ is greater than an upper bound $Z_U$, while it is expected to go bankrupt if $Z_j$ is below a lower bound $Z_L$. The area between $Z_L$ and $Z_U$ is defined as the zone of ignorance or gray area because of its susceptibility to misclassification; hence, for $Z_L \le Z_j \le Z_U$ the model does not classify company j into any group. The original Altman model is characterized by a fixed set of variables and fixed values of the coefficients, while the critical values of the z-score can vary depending on the specific characteristics of the sample of firms being analyzed. For the purposes of this work, we consider the values $Z_L = 1.81$ and $Z_U = 2.67$.
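The scoring and three-way classification just described can be sketched in a few lines of Python. This is only an illustration: the coefficient values are the widely cited form of Altman's 1968 function with ratios expressed as decimals, and they are an assumption here rather than values taken from this paper. The function names and the example firm are hypothetical. The bounds 1.81 and 2.67 are the ones adopted in this work.

```python
# Assumption: widely cited form of Altman's (1968) coefficients,
# with the five ratios expressed as decimals.
ALTMAN_WEIGHTS = (1.2, 1.4, 3.3, 0.6, 1.0)

Z_LOWER = 1.81  # lower bound of the gray zone used in this work
Z_UPPER = 2.67  # upper bound of the gray zone used in this work


def z_score(ratios):
    """Linear combination of the five ratios X1..X5 of one firm."""
    if len(ratios) != len(ALTMAN_WEIGHTS):
        raise ValueError("expected exactly five ratios")
    return sum(w * x for w, x in zip(ALTMAN_WEIGHTS, ratios))


def classify(z, lower=Z_LOWER, upper=Z_UPPER):
    """Three-way outcome: solvent, non-solvent, or unclassified (gray zone)."""
    if z > upper:
        return "solvent"
    if z < lower:
        return "non-solvent"
    return "gray zone"


# Hypothetical firm: (X1, X2, X3, X4, X5)
firm = (0.25, 0.30, 0.15, 1.20, 1.10)
z = z_score(firm)
print(f"z = {z:.3f} -> {classify(z)}")
```

Keeping the bounds as parameters reflects the remark that the critical values may vary with the sample, while the coefficients stay fixed.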

Other Classification Methods
The objective of classification methods is to classify observations into two or more mutually exclusive and exhaustive groups using information about a given set of variables measured for each observation. Among these, linear classification methods aim at detecting one or several linear functions of the given set of variables to be used for classification. Traditionally, two of the main linear classification methods are Linear Discriminant Analysis (LDA) and Logistic regression or Logit regression (Logit). Details on the LDA methodology can be found in Kolossa and Haeb-Umbach (2011), while for a description of the Logit regression structure see Boyacioglu (2009) and Tinoco and Wilson (2013). Other linear regression methods can be found in Varmuza and Filzmoser (2009).
Regarding non-linear classification methods, in this work we tested the prediction accuracy of Support Vector Machines, with special reference to the problem of training a classifier able to distinguish between two sets of points. For more details about the theory and the solution approach for the SVM learning problem applied in this study, see Vapnik (1998).

Data
The models presented in this paper are tested on samples of companies comprising both solvent and non-solvent firms, using data collected from Bloomberg. To this end, we selected all the 6,929 companies that were included in the equity index Market World published by Thomson Reuters Datastream at the date of the research.
We first determined the sample of non-solvent companies (the NS-Group) by requiring the bankruptcy date to be later than January 1st, 2007, in order to select only the firms that went bankrupt during or after the recent financial crisis. In order to develop balanced analyses, we selected a random sample of solvent firms with the same cardinality as the NS-Group (the S-Group). As a consequence of the random criterion, the selected sample covers the whole spectrum from healthy to borderline companies, thus avoiding any selection bias, as described in Atiya (2001). In order not to distort the analyses with exchange rate effects, and since all the explanatory variables are ratios between monetary values or pure numbers, all monetary values were collected in their native currency. For each examined independent variable, extreme values were excluded from the group of observations through a Winsorization procedure that eliminates the values below the 0.10th percentile and above the 99.90th percentile. Finally, the observations lacking the data needed to determine all the examined explanatory variables were excluded.
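As an illustration of the outlier treatment, the sketch below drops the observations of one variable that fall outside the 0.10th and 99.90th percentiles. It follows the text's description of excluding extreme values; winsorizing in the strict sense would instead cap them at the percentile values. The percentile definition (linear interpolation between order statistics) and the function names are assumptions, since the paper does not state which definition was used.

```python
import math


def percentile(sorted_values, p):
    """Percentile by linear interpolation; p is in [0, 100]."""
    if not sorted_values:
        raise ValueError("empty sample")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def trim_extremes(values, lower_pct=0.10, upper_pct=99.90):
    """Keep only the observations inside the two percentile bounds."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [v for v in values if low <= v <= high]


# Hypothetical ratio values with one extreme observation at each end.
sample = [-1e6] + [float(i) for i in range(1, 1000)] + [1e6]
cleaned = trim_extremes(sample)
print(len(sample), "->", len(cleaned))
```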
Data were collected on a yearly basis, as at December 31st of each reference year, where the reference year is specific to each company and is determined differently for solvent and non-solvent firms. For the S-Group the year 2012 was chosen, while for the NS-Group we considered two cases: 1) December 31st of the year before the year in which the company went bankrupt, therefore between 0 and 1 years before the bankruptcy date of each firm j; 2) December 31st of two years before the year in which the company went bankrupt, therefore between 1 and 2 years before the bankruptcy date of each firm j.
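The two reference dates for a non-solvent firm follow mechanically from its bankruptcy date, as the short sketch below shows. The function name and the example date are hypothetical.

```python
from datetime import date


def reference_dates(bankruptcy_date):
    """Return the reference dates for case 1 and case 2 of the NS-Group."""
    year = bankruptcy_date.year
    case_1 = date(year - 1, 12, 31)  # 0 to 1 years before bankruptcy
    case_2 = date(year - 2, 12, 31)  # 1 to 2 years before bankruptcy
    return case_1, case_2


# Hypothetical firm that went bankrupt in mid-2009.
one_year, two_years = reference_dates(date(2009, 6, 15))
print(one_year.isoformat(), two_years.isoformat())
```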
This choice is aimed at examining the predictive capability of the models one year ahead (case 1) and two years ahead (case 2) from the moment of the analysis, for each combination of variable set and default forecasting method. The assumption made is that each year investors could apply the models at the reference date of December 31st.
Having defined both the S-Group and the NS-Group(s), we selected two final samples: Sample 1, constituted by the union of the observations of the S-Group and the observations of the NS-Group in case 1, and Sample 2, constituted by the union of the observations of the S-Group and the observations of the NS-Group in case 2. Finally, we defined the dichotomous dependent variable, denoting solvent firms by 0 and non-solvent firms by 1.
Table 1 shows descriptive statistics of all the examined variables for both Sample 1 and Sample 2. The variables from 1 to 28 shown in Table 1 are ratios derived from the accounting data reported in the annual financial reports of firms, chosen among the variables most commonly used by researchers to predict bankruptcy (Du Jardin, 2010; Ravi Kumar & Ravi, 2007). The variables from 29 to 31 are instead country-specific factors, some of which have been determined following an approach that, to the best of our knowledge, has not been used in previous studies for predicting bankruptcy. Similar research on this topic can be found in Laghi et al. (2013) and Lam (2004).
The Sovereign rate 1 Y and the Sovereign rate 10 Y are the gross sovereign rates with maturity of 1 and 10 years respectively. For a given firm j, their values have been set equal to the market value of the official gross rate of the country where company j operated at December 31st of its reference year.
The sovereign rating spread (SRS) was determined on the basis of the historical sovereign ratings (R) issued by Standard and Poor's (2013). Official ratings are issued in textual form (e.g. A- or BBB+), so they must be converted to numerical values in order to be used in the analyses. To this end, unlike other authors who assume that one-notch movements have the same effect on credit spreads regardless of the asset class (e.g. Aunon-Nerin et al., 2002), we associated a different numeric value with each element of the S&P rating scale, so that the numeric value is a function of the rating. These values were set equal to the risk spreads, in basis points, associated with Moody's ratings as estimated by Damodaran (2012) in January of each reference year. Our only contribution to these figures has been to map the risk spreads, originally associated with Moody's ratings, onto the corresponding ratings of the S&P scale. Table 2 shows the numeric values estimated by Damodaran (2012) between 2005 and 2012 that we assigned to each element of the reference S&P rating scale. In addition, positive and negative outlooks issued by S&P have also been considered through a formula (equation 2) that involves the sovereign rating spread attributed to the j-th firm, the rating at time t of the country where the company operates, the numerical value associated with that rating, and the positive or negative outlook, if any, issued for the same country at time t.

Implementation
The analyses presented in this study were developed using different software for each type of default prediction model: the Altman model was implemented using Excel, the LDA and the Logit regression models were applied using Stata, and SVMs were trained and tested through Weka (Waikato Environment for Knowledge Analysis). Within Weka, we used the libSVM classification method, which implements the SMO algorithm for kernelized support vector machines. More details about this method are available in Chang and Lin (2011).
In order to apply and test the general predictive capability of LDA, Logit regressions and SVMs, two different datasets are required: a training set, used to train the models, and a test set, used to verify the efficiency of the learning procedure. Although the test proposed by Altman can predict the default of a firm using only the information related to that firm, in order to guarantee the comparability of results among the examined models the tests using Altman's model were run only on the instances within the test set. We split each sample randomly in two parts, using the proportions 2/3 and 1/3 for the training and the test set respectively. We repeated this procedure five times, generating five pairs of training and test sets with different compositions. This procedure was followed for both Sample 1 and Sample 2.
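The repeated random split can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and seeds are hypothetical, and the split is a plain shuffle, since the text does not say whether the class proportions were preserved within each part.

```python
import random


def random_split(instances, train_fraction=2 / 3, seed=None):
    """Shuffle the instances and cut them into a training and a test part."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def repeated_splits(instances, repetitions=5, base_seed=0):
    """Generate several training/test pairs with different compositions."""
    return [random_split(instances, seed=base_seed + i)
            for i in range(repetitions)]


# Hypothetical sample of 90 firm identifiers.
pairs = repeated_splits(list(range(90)))
for train, test in pairs:
    print(len(train), len(test))
```

Using a distinct seed per repetition keeps each pair reproducible while still giving five different compositions.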
Regarding the setting of the SVM parameters, we followed a strategy based on a grid search procedure. A grid search simply consists of an exhaustive search through a manually specified subset of possible values for the parameters. According to this strategy, we defined a hyperparameter space and evaluated the performance achieved by the SVMs under the different settings. At the end of these evaluations we identified the best configuration as: i) the Gaussian kernel with parameter ; ii) .
Since from a practical point of view incorrectly classifying a non-solvent firm is more serious than incorrectly classifying a solvent firm, we decided to assign different weights to the two classes, with one weight for insolvent firms and another for solvent firms. The best setting for the SVMs turned out to be the same for both prediction horizons.
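The grid search strategy itself is independent of the classifier and can be sketched generically. In the sketch below, `evaluate` is a hypothetical callback standing in for "train an SVM with these settings and return its score". The parameter names `gamma` and `C` are the usual ones for a Gaussian-kernel SVM, while the grid values and the toy scoring function are made up for illustration and are not the ones used in this study.

```python
import math
from itertools import product


def grid_search(param_grid, evaluate):
    """Score every combination in the grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for the model score: peaks at gamma = 0.1 and C = 10."""
    return -((math.log10(params["gamma"]) + 1) ** 2
             + (math.log10(params["C"]) - 1) ** 2)


# Hypothetical hyperparameter space.
grid = {"gamma": [0.01, 0.1, 1.0], "C": [1, 10, 100]}
best, score = grid_search(grid, toy_score)
print(best)
```

Class weights could be added to the grid as one more named parameter, at the cost of multiplying the number of combinations to evaluate.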

Altman's Z-Score Features
In this section we analyze the results achieved by Altman's z-score test and by the other examined methods using the same set of ratios as the former. As already said, one of the strengths of this model is that it can make a prediction using only the data of a single firm. On the other hand, the model has two main weaknesses: i) Altman's z-score model simply consists of a linear function with fixed coefficients for the five regressors. In a context where the world economic scenario has changed, these values may not reflect the current situation; ii) Altman's z-score model provides a range of values (the gray zone) within which a firm is classified as neither solvent nor insolvent.
Considering the previous observation, in order to compare the results obtained with Altman's model with those achieved using the other methods, we forced Altman's model to classify a firm as solvent or insolvent when its score lies within the gray zone. In particular, we defined two different and reasonable rules, reported in Table 3. Rule 1 uses a single threshold value, as specified in Table 3, while Rule 2 uses as its single threshold the midpoint between the lower and the upper bound of the gray zone. The results achieved by these two configurations of Altman's test are compared with those obtained by the LDA, the Logit and the SVM approach. Table 4 reports the results for the prediction one year ahead, while Table 5 reports the same results for the prediction two years ahead.
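A single-threshold rule removes the gray zone by construction. The sketch below illustrates the midpoint rule with the bounds 1.81 and 2.67 adopted in this work. The function names are hypothetical, and the treatment of a score exactly equal to the threshold (classified here as non-solvent) is an assumption, as the text does not specify it.

```python
Z_LOWER = 1.81  # lower bound of the gray zone
Z_UPPER = 2.67  # upper bound of the gray zone


def midpoint_threshold(lower=Z_LOWER, upper=Z_UPPER):
    """Single threshold placed halfway between the two bounds."""
    return (lower + upper) / 2


def forced_classification(z, threshold):
    """Two-way outcome: every firm is either solvent or non-solvent."""
    # Assumption: a score equal to the threshold counts as non-solvent.
    return "solvent" if z > threshold else "non-solvent"


threshold = midpoint_threshold()
for z in (1.5, 2.0, 2.5, 3.0):  # hypothetical z-scores
    print(z, forced_classification(z, threshold))
```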
Before illustrating the results, we explain the notation used in the main tables:
S and NS: S represents a solvent firm, while NS represents a non-solvent firm. In particular, NS/NS is the number of firms correctly classified as non-solvent, while NS/S is the number of non-solvent firms classified as solvent. Similarly, S/NS is the number of solvent firms classified as non-solvent and S/S is the number of solvent firms correctly classified;
cc/all: the percentage of instances correctly classified;
NS/aNS: the percentage of non-solvent firms correctly classified.
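The two percentages follow directly from the four counts, as the sketch below shows. The function name and the counts in the example are hypothetical and are not taken from the tables.

```python
def evaluation_metrics(ns_ns, ns_s, s_ns, s_s):
    """Return (cc/all, NS/aNS) as percentages from the four counts.

    ns_ns: non-solvent firms classified as non-solvent
    ns_s:  non-solvent firms classified as solvent
    s_ns:  solvent firms classified as non-solvent
    s_s:   solvent firms classified as solvent
    """
    total = ns_ns + ns_s + s_ns + s_s
    all_non_solvent = ns_ns + ns_s
    cc_all = 100.0 * (ns_ns + s_s) / total
    ns_ans = 100.0 * ns_ns / all_non_solvent
    return cc_all, ns_ans


# Hypothetical test set with 10 non-solvent and 10 solvent firms.
cc_all, ns_ans = evaluation_metrics(ns_ns=8, ns_s=2, s_ns=1, s_s=9)
print(f"cc/all = {cc_all:.2f}%, NS/aNS = {ns_ans:.2f}%")
```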
First of all we consider the results reported in Table 4, in particular those in the row Mean. Performance can be evaluated from two different points of view. The first is theoretical and focuses on the efficiency of the models measured as the ratio of correctly classified instances to the cardinality of the sample. The second gives more importance to the practical purpose of the bankruptcy test. From the point of view of investors, the misclassification of a solvent firm (S/NS) implies only the loss of an investment opportunity, while the misclassification of a non-solvent firm (NS/S) could cause a capital loss as a consequence of investing in a company that then goes bankrupt. In this context efficiency is measured by the percentage of non-solvent instances in the sample that are correctly classified as non-solvent. Under the first criterion, the best tool appears to be the Logit, which reaches a mean value of 87.04%. Under the second criterion, the best method is Altman's model with threshold value 2.24, which on average correctly classifies 94.84% of the non-solvent instances. A question then arises naturally: which is the best tool in this case? To answer it, the user should first state a preference for one criterion. It is important to notice that, on the basis of the results, the two criteria seem to conflict. While the Logit reaches 87.04% under the first criterion, it obtains only 77.54% under the second. Likewise, Altman 2.24 reaches 73.57% and 94.84% under the first and the second criterion respectively. However, the aim of this work is to find a tool that satisfies both criteria adequately, in other words a model that strikes a balance between them. Thus, without expressing any preference between the two criteria, we can see that the most balanced tool is the SVM, which obtains 83.66% of successes under the first criterion and 86.67% under the second.
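One simple way to make "most balanced" concrete is to compare, for each method, the absolute gap between the two criteria. This reading is an assumption made for illustration, since the text does not define balance formally. The figures are the mean one-year values quoted above; LDA is omitted because its values are not quoted in the text.

```python
# (cc/all, NS/aNS) mean percentages, prediction one year ahead.
results = {
    "Logit": (87.04, 77.54),
    "Altman 2.24": (73.57, 94.84),
    "SVM": (83.66, 86.67),
}


def criterion_gap(scores):
    """Absolute difference between the two criteria, in percentage points."""
    first, second = scores
    return abs(first - second)


for method, scores in results.items():
    print(f"{method}: gap = {criterion_gap(scores):.2f}")

most_balanced = min(results, key=lambda m: criterion_gap(results[m]))
print("smallest gap:", most_balanced)
```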
Repeating the same analysis on the results in Table 5, the SVMs are again the best tool, with 77.65% and 82.22% of successes. In general, when the prediction horizon is extended from one to two years ahead, overall performance decreases.

Full Set of Features
As said at the beginning of this section, the first goal of this work is to assess the accuracy of SVMs at their full potential. Keeping this in mind, we extended the number of ratios from the 5 proposed by Altman to 31, namely those listed in Table 1. For this reason Altman's model is excluded from the following analysis, and the next comparisons are made only between LDA, Logit regression and SVMs. The results of the predictions one year and two years ahead are shown in Table 6 and Table 7 respectively. We can see that, when we pass from 5 to 31 explanatory variables, the performance of all the models decreases with respect to the first criterion while it increases with respect to the second. This means that, as far as generalization capability is concerned, the information contained in the full set of features generates a sort of noise or redundancy. On the other hand, the capability of the models to correctly classify a non-solvent firm increases. However, similarly to what we observed for the case with Altman's variables, the SVM model again seems the most balanced method for both forecast horizons.
The results achieved in this section lead us to suppose that the information carried by the ratios used by Altman may no longer be exhaustive in the current economic context. However, when the number of ratios is extended too much and without a particular logical assumption, performance does not improve under both criteria. This led us to exploit feature selection techniques. In the following section we introduce the feature selection scheme implemented and then show the results achieved by considering only the selected variables.

Feature Selection
Before inducing a model, we have a set of information collected in some features and, most of the time, we do not know which part of it is the most significant. Theoretically, having more features should result in more discriminating power. However, practical experience with machine learning algorithms has shown that this is not always the case.
In this regard, we focus on the importance of selecting the features (also called attributes) to be used in the model. Feature or attribute selection is a technique whose goal is to form a subset of the initial features of the problem in order to improve the performance of the underlying model, in terms of both correctness and speed. The question now is whether it is possible to discard some features, and how to select the correct subset of variables. There is no single answer to this question, and the underlying logic changes depending on the particular method implemented. In the previous section we underlined that, when the set of variables was extended, the performance of the models improved under one criterion while it worsened under the other. By applying feature selection to the SVM model, our aim is to improve prediction performance under both criteria and to verify whether the subset of selected attributes also improves the results achieved by the other two classification methods. To implement this kind of selection we again used the Weka software. Weka offers different kinds of evaluators and search methods. Among them we selected CfsSubsetEval as the evaluation method and BestFirst as the search method. CfsSubsetEval stands for Correlation-based Feature Selection Subset Evaluation, and the basic idea underlying this algorithm is to prefer subsets of features that are highly correlated with the class while having low correlation with each other. BestFirst was used in a forward search direction, which means that it greedily adds features starting from the empty set.
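The idea behind correlation-based subset evaluation can be illustrated with the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The sketch below is not Weka's implementation: it uses absolute Pearson correlation as the correlation measure and a plain greedy forward search, which is a simplification of best-first search (the latter also allows backtracking). All names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset, features, target):
    """High when features track the class and are not redundant."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_selection(features, target):
    """Greedily add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max((cfs_merit(selected + [f], features, target), f)
                          for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical data: 1 = non-solvent, 0 = solvent.
target = [0, 0, 0, 1, 1, 1]
features = {
    "x1": [1, 2, 1, 5, 6, 5],   # informative
    "x2": [2, 2, 1, 5, 5, 6],   # informative but close to x1
    "x3": [3, 1, 2, 3, 1, 2],   # unrelated to the class
}
selected, merit = forward_selection(features, target)
print(selected, round(merit, 3))
```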
It seems clear now that the results achieved by training a machine learning model on the subset of features derived by a feature selection algorithm strictly depend on the composition of the sample. The larger the sample, the lower this dependency. However, we are interested in finding an optimal set of ratios that yields a better prediction of the solvency or non-solvency of a firm and that is also characterized by a certain stability. For this reason we need a deeper approach to verify the performance of the SVMs. To this end, for both Sample 1 and Sample 2, we generated five different subsets of features by applying the feature selection algorithm to each of the five samples. Then, considering the i-th subset of variables, we trained four other SVMs, one for each of the remaining four pairs of training and test sets. In this way we can assess the fitness of a particular subset of features not only on the sample on which the feature selection was made. The underlying idea is that, by selecting among all the subsets of features the one that performs best on average, we obtain the group of attributes that best generalizes the bankruptcy events. Once the optimal subset of features had been identified, we reused the LDA and the Logit models considering only those features, in order to obtain comparable results again. The attributes selected for the prediction one year ahead are reported in Table 8, while Table 9 shows the results achieved by applying the different models considering only those features. It is evident that SVMs obtain definitely higher and more stable performance than the other two models for the predictions both one year ahead and two years ahead. Moreover, we notice that using SVM models with the appropriate information it is possible to reach a prediction accuracy that is higher and more balanced than the one obtained with Altman's variables and with the full set of features.
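The cross-evaluation of the five feature subsets can be sketched as follows. Here `evaluate` is a hypothetical callback standing in for "train an SVM on the training part of a split using only this subset, and return its score on the test part". Which score is averaged is left to the callback, since the text does not specify it, and the toy data are made up.

```python
def best_feature_subset(subsets, splits, evaluate):
    """subsets[i] was selected on splits[i]; score it on all other splits."""
    averages = []
    for i, subset in enumerate(subsets):
        scores = [evaluate(subset, split)
                  for j, split in enumerate(splits) if j != i]
        averages.append(sum(scores) / len(scores))
    best_index = max(range(len(subsets)), key=averages.__getitem__)
    return best_index, averages


# Toy stand-ins: three subsets, three splits, and a made-up score.
toy_subsets = [["a"], ["a", "b"], ["a", "b", "c"]]
toy_splits = [0, 1, 2]


def toy_evaluate(subset, split):
    return len(subset) + split


index, averages = best_feature_subset(toy_subsets, toy_splits, toy_evaluate)
print(index, averages)
```

Excluding the split on which a subset was selected is what makes the average a check of generalization rather than of fit to one particular sample.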

Conclusions
In this work we defined a new set of features, which includes country-specific macroeconomic factors, that improves the accuracy of predictions of firms' bankruptcy events in the recent economic context. Using SVMs jointly with feature selection techniques, we identified the optimal subset of variables and assessed whether the ratios proposed by Altman in 1968 are still relevant today for bankruptcy prediction. Furthermore, we also applied other commonly used bankruptcy prediction methods to the same dataset and compared their performance with that of SVMs.
The results show that, in the current socio-economic context, the joint application of SVMs and the proposed feature selection method significantly improves the accuracy of bankruptcy predictions compared with the examined traditional set of variables and default prediction methods. In particular, the joint use of these elements yields stable success rates around 90% for both one-year-ahead and two-years-ahead predictions.
From an economic point of view, despite the use of a different model (the SVMs) and despite the deep changes in the world economic system that followed the crisis that began in 2007, most of the ratios proposed by Altman in 1968 are still relevant today. Furthermore, the examined macroeconomic indicators appear to be relevant information for predicting bankruptcy. It is evident that in crisis periods, independently of individual firms' characteristics, economic stability at country level and access to credit have become even more important factors for the survival of a company. With reference to sovereign rating spreads, our study demonstrates that sovereign ratings, although their reliability has been widely criticized in recent years, constitute relevant information for predicting whether a firm that operates in a given country will go bankrupt. However, further research should be conducted on this topic in order to develop an internal model for quantifying the risk spreads associated with each element of the reference rating scale.

Table 1.
Descriptive statistics of the examined explanatory variables for Sample 1 and Sample 2

Table 3.
The rules adopted for applying Altman's z-score model

Table 4.
Features proposed by Altman in his z-score test, prediction one year ahead

Table 5.
Features proposed by Altman in his z-score test, prediction two years ahead

Table 6.
All features proposed, prediction one year ahead

Table 7.
All features proposed, prediction two years ahead

Table 8.
Features selected for the prediction one year ahead

Table 9.
Features selected by the algorithm, prediction one year ahead

Table 10 summarizes the attributes selected for the prediction two years ahead, and Table 11 reports the results achieved by applying the different models considering only those features.

Table 10.
Features selected for the prediction two years ahead

Table 11.
Features selected by the algorithm, prediction two years ahead