Financial Distress Prediction Using Distress Score as a Predictor

Financial distress can reflect the condition of a corporation's management. Consequently, the distress score of a corporation should be considered as a new predictor variable in financial distress prediction. ROC curve analysis, one of the methods used to compare the effectiveness of different statistical models, is often used in fields such as psychology and bio-physics to summarize the discriminatory power of a diagnostic test and to compare the performance of different models for binary outcomes. In this research we therefore use ROC curves together with logit models to study the financial distress of manufacturing corporations listed on the Tehran Stock Exchange. We also compare the accuracy of the prediction model that includes the financial distress score with that of the model without it. In terms of prediction and classification accuracy, the results show that prediction accuracy can be enhanced by using the distress score, obtained from DEA, as a new predictor variable in financial distress prediction.


Introduction
Given the effects of corporate financial distress on stakeholder groups, developing models of financial distress prediction has been one of the most attractive fields of financial and economic research.
In all financial failure prediction models, variable selection, also called feature selection, is a fundamental problem that has a significant impact on prediction accuracy. Under the assumption that a corporation's financial statements appropriately reflect all of its characteristics, current prediction models select variables directly from various financial ratios defined on information that appears in those statements (Xu, X. and Wang, Y., 2009). Although financial ratios derived from a corporation's financial statements can reflect some of its characteristics to a certain extent, it is widely recognized that a main cause of financial failure is poor management (Gestel et al., 2006). We therefore believe that the financial distress of a corporation can be a reflection of its management condition. As a result, the corporation's distress score should be selected as a new variable in the financial distress prediction model. The variables used in this research are accordingly divided into five groups: liquidity ratios, profitability ratios, activity ratios, leverage ratios, and the distress score of the corporations.
In this article we present a financial distress prediction model that uses, as a predictor variable, the financial distress score derived from the output-oriented BCC model. We consider the manufacturing corporations listed on the Tehran Stock Exchange whose information was available for the years 2001 to 2008, and we collected their data for this period.
This research examines the performance of ROC curves in predicting corporate financial distress. We use a logit model to study financial distress, and we compare the accuracy of the prediction model with the distress score variable to that of the model without this variable.
The remainder of this paper is organized as follows: Section 2 presents a detailed literature survey that discusses bankruptcy evaluation models. In Section 3, we give a brief exposition of ROC curve analysis. Section 4 describes the data sources, sample selection criteria, and variable selection process. Section 5 presents the empirical results. Section 6 concludes this study.

Literature Review
A review of the research on financial distress and bankruptcy prediction shows that many studies have been undertaken in this field. These studies differ considerably in the number and kinds of predictive factors and in the structure of the models. With a univariate model and multiple discriminant analysis (MDA), respectively, Beaver (1966) and Altman (1968) had a great effect on the prediction and classification of distressed and non-distressed corporations. However, the validity and efficiency of these models depend on certain statistical assumptions. To remove the shortcomings of MDA, Ohlson (1980) suggested logistic regression (logit). This traditional method has been used as a state-of-the-art method for classification and prediction (see Barniv et al. (2002), Poon, Firth, and Fung (1999), and West (2000)). With the advances of recent decades in other fields such as mathematics and computer science, artificial neural networks, fuzzy logic, and data envelopment analysis (DEA) have attracted researchers as tools for designing models.
After receiving wide attention in financial distress prediction for the banking industry, data envelopment analysis has increasingly been used to investigate the financial condition of corporations and to predict their financial distress.
DEA was first proposed by Charnes et al. (1978). Cooper et al. (2006) provided complete knowledge on recent DEA developments. Gattoufi et al. (2004) listed more than 3000 previous DEA contributions. Instead of designing an independent model based on DEA, Xu & Wang (2009) used the efficiency score calculated by this method, together with other financial ratios, as a predictor variable. They first designed three models based on discriminant analysis, logit, and decision trees that used financial ratios only, in order to test the predictive ability of the ratios. The efficiency score calculated for each corporation was then entered into the discriminant analysis, logit, and decision tree models as a new variable. Comparing the results of the new models with those of the initial models showed that adding the efficiency score enhances the predictive ability of the initial models.
A review of past research shows that different researchers use different combinations of inputs and outputs, which is one of the weaknesses of DEA. In this article the method of choosing inputs and outputs follows Premachandra, Bhabra & Sueyoshi (2009). In the context of bankruptcy assessment, the financial ratios for which smaller (inferior) values could cause financial distress are treated as input variables. In contrast, the ratios for which larger (superior) values could cause financial distress are classified as output variables. This selection of inputs and outputs differs from the conventional use of DEA. In conventional DEA-based production analysis, efficient performers form an efficiency frontier and inefficient performers lie within a production possibility set shaped by that frontier. This study takes the opposite approach because we are interested in distress prediction.
The frontier used in this study is a "distress frontier" (not the efficiency frontier of conventional DEA-based production analysis), which contains many distressed corporations (poor performers). Non-distressed (healthy) corporations are expected to lie inside a "distress possibility set", which is shaped by the distress frontier (see Premachandra et al. (2009); Xu, X. and Wang, Y. (2009)).
The mathematical definition of the distress possibility set is like that of the production possibility set in economics. However, the nature of the outputs in financial distress prediction is the opposite of that in production analysis: in production analysis, the higher the outputs the better, whereas in financial distress prediction, the lower the outputs the better. The opposite description likewise applies to the inputs.
ROC curve analysis, one of the methods used to compare the efficiency of different statistical models, is often used in fields such as psychology and bio-physics to measure the accuracy of a diagnostic test and to compare the performance of different models for two-category outcomes (Lloyd, 1998; Marzban, 1998; Pepe, 2000).
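The output-oriented BCC model named in the Introduction is not written out in this paper. For reference, its standard envelopment form can be sketched as below. The notation is an assumption introduced here for illustration: $x_{ij}$ and $y_{rj}$ are the $i$-th input and $r$-th output of corporation $j$, corporation $o$ is the one being evaluated, and $\lambda_j$ are the intensity weights. How the authors map the optimal $\phi$ to the reported distress score is not stated in the text and is left open.

```latex
\begin{aligned}
\max_{\phi,\,\lambda}\quad & \phi \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, && i = 1,\dots,m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge \phi\, y_{ro}, && r = 1,\dots,s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0, && j = 1,\dots,n.
\end{aligned}
```

With inputs and outputs chosen as described above, a corporation with optimal value $\phi^{*} = 1$ lies on the distress frontier, while $\phi^{*} > 1$ places it inside the distress possibility set.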

An Overview of ROC Curve Analysis
For non-probabilistic or categorical prediction, the elements of a contingency table provide a complete representation of performance for any number of classes (Wilks, 1995); with two classes the table is 2×2 and has four elements. Although the table has four elements, there are only two degrees of freedom, one for the distressed and one for the non-distressed firms. The contingency table, in turn, can be reduced to a host of scalar measures of performance, but it is difficult to display and interpret (Tang and Chi, 2005).

Sensitivity and Specificity
To avoid any loss of information (due to the reduction from two degrees of freedom to one), two scalar measures are employed in this paper. The true positive rate (TPR), or 'sensitivity' (the probability of correctly identifying a positive), and the true negative rate (TNR), or 'specificity' (the probability of correctly identifying a negative), are two common measures. Their complements are the false negative rate (FNR) and the false positive rate (FPR), respectively. Together they estimate two different aspects of any dichotomous test (Tang and Chi, 2005).
Because their denominators are different, sensitivity and specificity are not complements of one another. In our case, the sensitivity and specificity of the dichotomous test are defined as follows:

$$\text{Sensitivity} = TPR = \frac{TP}{TP + FN} \tag{1}$$

$$\text{Specificity} = TNR = \frac{TN}{TN + FP} \tag{2}$$

where TP and FN are the numbers of distressed corporations predicted as distressed and as non-distressed, and TN and FP are the numbers of non-distressed corporations predicted as non-distressed and as distressed. TPR indicates the ability of the test to correctly predict all the distressed corporations. The higher the sensitivity, the lower the false negative rate. TNR shows the ability of the test to correctly predict all the corporations which are not distressed. The higher the TNR, the lower the FPR. A good discriminatory test should have high sensitivity and high specificity.
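As a minimal illustration of these two measures, the sketch below computes them from a list of actual and predicted labels. The labels are hypothetical values made up for the example (1 = distressed, 0 = non-distressed), and the function names are ours, not the paper's.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN with 1 = distressed (positive), 0 = non-distressed."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity_specificity(actual, predicted):
    """Return (TPR, TNR); each rate uses its own denominator."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    tpr = tp / (tp + fn)  # share of distressed firms correctly flagged
    tnr = tn / (tn + fp)  # share of non-distressed firms correctly cleared
    return tpr, tnr


# Hypothetical labels: 4 distressed firms followed by 6 non-distressed firms.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tpr, tnr = sensitivity_specificity(actual, predicted)
fnr, fpr = 1 - tpr, 1 - tnr  # complements of sensitivity and specificity
print(f"sensitivity={tpr:.3f} specificity={tnr:.3f} FNR={fnr:.3f} FPR={fpr:.3f}")
```

Note that sensitivity is computed over the distressed firms only and specificity over the non-distressed firms only, which is why the two do not sum to one.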

The Area under the ROC Curve
The area under the ROC curve (AUC) is an important summary index of the underlying distribution of forecasts. This measure is equivalent, up to a linear transformation, to the Gini coefficient (Thomas et al. 2002) and also to the Mann-Whitney-Wilcoxon two-independent-sample non-parametric test statistic (Hanley & McNeil, 1982), and it is referred to in the literature in many ways, including AUC, the c-index or c-statistic, and θ.
Let D and N represent the measurements for a distressed and a non-distressed subject, with high values suggesting a positive result and low values a negative result. Then the AUC is θ = Prob(D > N). The larger the area under the ROC curve, the more accurate the model's predictions. In the case of perfect prediction the area equals 1.0, while an area of 0.5 reflects random forecasts. The non-parametric approximation of θ is

$$\hat{\theta} = \frac{1}{N_D N_N} \sum_{i=1}^{N_D} \sum_{j=1}^{N_N} \psi\left(S_D^{i}, S_N^{j}\right) \tag{3}$$

In Equation (3), $N_D$ and $N_N$ are the numbers of distressed and non-distressed firms, respectively, and

$$\psi(S_D, S_N) = \begin{cases} 1 & \text{if } S_D > S_N \\ 1/2 & \text{if } S_D = S_N \\ 0 & \text{if } S_D < S_N \end{cases}$$

with $S_D$, $S_N$ being the scores of a distressed and a non-distressed subject, respectively (see Bamber, 1975; Tang and Chi, 2005).
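The pairwise (Mann-Whitney) estimate of the area can be sketched in a few lines. The scores below are hypothetical predicted distress probabilities chosen for illustration; the function name is ours.

```python
def auc_mann_whitney(distressed_scores, healthy_scores):
    """Estimate Prob(D > N) by comparing every distressed/non-distressed pair.

    A pair counts 1 when the distressed score is higher, 1/2 on a tie,
    and 0 otherwise; the total is divided by the number of pairs.
    """
    total = 0.0
    for s_d in distressed_scores:
        for s_n in healthy_scores:
            if s_d > s_n:
                total += 1.0
            elif s_d == s_n:
                total += 0.5
    return total / (len(distressed_scores) * len(healthy_scores))


# Hypothetical model scores (higher = more likely distressed).
s_distressed = [0.9, 0.8, 0.6, 0.4]
s_healthy = [0.7, 0.4, 0.3, 0.2, 0.1]

theta_hat = auc_mann_whitney(s_distressed, s_healthy)
print(f"theta_hat = {theta_hat:.3f}")
```

Of the 20 pairs in this example, 17 are ordered correctly and one is tied, so the estimate is 17.5/20. Scores that separate the two groups perfectly give 1.0.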

Research Design
In this section, our data sources, sample selection criteria, and variable selection process are discussed. The design of the research is shown in figure 1 (see Tang and Chi, 2005).

Research Data
The field of study in this research is the manufacturing corporations listed by the Tehran Securities & Exchange Organization; the required data were therefore collected from the database of the Tehran Securities & Exchange Organization. For the period studied (2001-2008), data were available for 304 corporations, 79 of which were distressed and 225 non-distressed.
Data for the distressed and non-distressed corporations were collected for the years T, T-1 and T-2. For the distressed corporations, T, T-1 and T-2 are defined respectively as the year in which financial distress occurred (the corporation lost at least half of its capital in 2 successive years) and one and two years before that. For the non-distressed corporations, they are respectively the year in which the corporation had its highest profit and one and two years before that. The selected sample is divided randomly into two sets: a training set (80 percent) and a testing set (20 percent). The former is used to build and estimate the models and the parameters of the prediction methods, and the latter is used to assess the validity and accuracy of the model's predictions.
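A random 80/20 split of this kind can be sketched as follows. The firm identifiers, the seed, and the function name are hypothetical, and the paper does not say whether the split was stratified by distress status, so a plain shuffle is assumed here.

```python
import random


def train_test_split(firms, train_fraction=0.8, seed=0):
    """Shuffle a copy of the sample and cut it into training and testing sets."""
    shuffled = list(firms)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 304 corporations in the sample.
firms = [f"firm_{i:03d}" for i in range(304)]
train, test = train_test_split(firms)
print(len(train), len(test))
```

With 304 corporations this yields 243 training and 61 testing observations. The models are estimated on the first set only, so the second set gives an out-of-sample check.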

Variable Selection
In statistical models of financial distress prediction, the predictor variables are typically selected with the help of statistical tests that check properties such as normality and independence.
In this article the predictor variables are divided into two parts: one is the distress score of the corporations, obtained from the output-oriented BCC model, and the other comprises financial ratios derived from the financial statements. The selected ratios should reflect profitability, liquidity, activity, and leverage. We chose as potential predictor variables 20 financial ratios whose performance has been demonstrated in earlier research on financial distress prediction. These variables are shown in table 1.

Statistical Results of the Variables
To select the financial ratios that are informative and closely tied to the financial condition of the corporation, we used the Mann-Whitney non-parametric test to check the significance of the difference in each ratio between non-distressed and distressed corporations. Table 2 shows that the variables other than X6, X9, X11, X16, X18 and X20 are significant at the 95% confidence level.
After examining the significance of the differences in the financial ratios between non-distressed and distressed corporations at the 95% confidence level, it is necessary to eliminate those ratios that do not have a high ability to discriminate between distressed and non-distressed corporations. Generally speaking, when the researcher wants to choose, from a large number of independent variables, those to enter into the model, forward stepwise selection (conditional) is a suitable method. Six variables were selected for one year prior (T-1): quick ratio, net working capital/total assets, total debt/total assets, net profit/total assets, earnings before interest and taxes/net sales, and net sales/fixed assets. Four variables were selected for two years prior (T-2): net working capital/total assets, total debt/total assets, earnings before interest and taxes/total debt, and gross profit/net sales. Tables 3 and 4 summarize the results of estimating the logit models (see Fallah Shams et al., 2011).
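The screening step can be illustrated with a small, self-contained version of the Mann-Whitney test. This is a sketch under stated assumptions: it uses the large-sample normal approximation without a tie correction in the variance, the ratio values are hypothetical, and the function names are ours. The paper does not report which software or variant of the test was used.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, z, two-sided p) using the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p_two_sided


# Hypothetical quick ratios for distressed and non-distressed firms.
distressed_quick = [0.2, 0.3, 0.4, 0.5, 0.6]
healthy_quick = [0.9, 1.1, 1.3, 1.5, 1.7, 2.0]

u_stat, z_stat, p_value = mann_whitney_u(distressed_quick, healthy_quick)
print(f"U={u_stat:.1f} z={z_stat:.3f} p={p_value:.4f} keep={p_value < 0.05}")
```

A ratio would be kept as a candidate predictor when its p-value falls below 0.05, which corresponds to the 95% confidence level used in the text.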

Results of Logit Models without Using Distress Score
The cost of falsely classifying a corporation that is in fact approaching distress as non-distressed (type I error) is much higher than the cost of mistakenly classifying a non-distressed corporation as distressed (type II error). In financial distress prediction models there must be a balance between these two errors (FN and FP), which is set by the optimal cutoff point of the model. The optimal cutoff point is the operating point of the model at which the misclassification error (including type I and type II errors) is minimized.
Different cutoffs have been used in prior studies for measuring classification accuracy. For example, Altman (1968) and Deakin (1972) use cutoffs which minimize misclassification; Ohlson (1980) and Palepu (1986) use the cutoffs where the distributions of the two groups intersect; and Barniv et al. (2002) and Frydman et al. (1985) use cutoffs that minimize the number of misclassifications. A review of prior research shows that most researchers have chosen 0.5 as the cutoff point; this research, however, uses the optimal cutoff point, determined by the largest area under the ROC curve, in order to reach the highest overall prediction accuracy. Tables 5 and 6 present the results of the logit models for T-1 and T-2, respectively, at cutoffs from 0.1 to 0.9 on the training set, where θ, or the overall prediction accuracy, ranges from 0.738 to 0.914 for T-1 and from 0.588 to 0.807 for T-2.
The optimal cutoff points are shown in figures 2 and 3.
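The cutoff search can be sketched as a sweep over the nine cutoffs from 0.1 to 0.9. The sketch assumes that the area associated with a single cutoff is that of the one-point ROC curve through (0, 0), (FPR, TPR) and (1, 1), which equals the mean of sensitivity and specificity; the paper does not spell out this computation. The labels and predicted probabilities are hypothetical, and the function names are ours.

```python
def rates_at_cutoff(actual, probabilities, cutoff):
    """Classify as distressed when probability >= cutoff; return (TPR, TNR)."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(actual, probabilities, cutoffs):
    """Return (cutoff, theta) maximizing theta = (TPR + TNR) / 2."""
    scored = []
    for c in cutoffs:
        tpr, tnr = rates_at_cutoff(actual, probabilities, c)
        scored.append((c, (tpr + tnr) / 2))
    return max(scored, key=lambda item: item[1])


# Hypothetical logit outputs: 5 distressed firms, then 10 non-distressed firms.
actual = [1] * 5 + [0] * 10
probabilities = [0.95, 0.82, 0.65, 0.45, 0.35,
                 0.55, 0.42, 0.31, 0.25, 0.22, 0.15, 0.12, 0.08, 0.05, 0.02]
cutoffs = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9

optimal_cutoff, optimal_theta = best_cutoff(actual, probabilities, cutoffs)
print(f"optimal cutoff = {optimal_cutoff}, theta = {optimal_theta:.3f}")
```

In this toy sample the best cutoff is below the conventional 0.5 because distressed firms are the minority and a lower cutoff recovers more of them at a modest cost in specificity.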

The Results of Logit Models with Using Distress Score
As shown in table 7, the sensitivity and specificity of the logit models using the distress score, at the cutoff point with the highest accuracy, are respectively 0.905 and 0.950 for the T-1 model and 0.841 and 0.822 for the T-2 model. The table indicates that this model has high accuracy in recognizing distressed and non-distressed corporations. The value of θ is high: approximately 0.928±0.042 (0.844, 1.012) for the T-1 model and 0.832±0.080 (0.672, 0.992) for the T-2 model.
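The text does not state how θ and its interval were computed. The reported figures are, however, numerically consistent with two simple readings, which the sketch below checks: θ as the mean of sensitivity and specificity at the chosen cutoff, and the bracketed interval as θ plus or minus twice the reported ± value. Both readings are inferences from the published numbers, not statements by the authors, and the function names are ours.

```python
def single_cutoff_theta(sensitivity, specificity):
    """Area under the one-point ROC curve through (0,0), (FPR,TPR), (1,1)."""
    return (sensitivity + specificity) / 2


def two_se_interval(theta, plus_minus):
    """Symmetric interval theta - 2*plus_minus .. theta + 2*plus_minus."""
    return theta - 2 * plus_minus, theta + 2 * plus_minus


# Values reported in the text for the models with the distress score.
theta_t1 = single_cutoff_theta(0.905, 0.950)
theta_t2 = single_cutoff_theta(0.841, 0.822)
low_t1, high_t1 = two_se_interval(0.928, 0.042)
low_t2, high_t2 = two_se_interval(0.832, 0.080)

print(f"T-1: theta={theta_t1:.4f}, interval=({low_t1:.3f}, {high_t1:.3f})")
print(f"T-2: theta={theta_t2:.4f}, interval=({low_t2:.3f}, {high_t2:.3f})")
```

Because the interval is symmetric, its upper bound for the T-1 model exceeds 1, which is outside the range an area under the ROC curve can take.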

Comparative Models Performance
ROC curves and the area under them (AUC) describe the trade-off between sensitivity and specificity, and they allow the performance of different tests on a specific classification task to be compared (Tang, T. C and Chi, L.C, 2005).
The ROC curve of the logit model using the distress score, drawn in figures 4 and 5, lies above and to the left of the curve of the logit model without the distress score. Figures 6 and 7 also show that this model has higher sensitivity and specificity on the training set. Consequently, in classifying distressed and non-distressed corporations, the logit model using the distress score performs better than the logit model without it.
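What it means for one ROC curve to lie above and to the left of another can be made concrete by tracing both curves and comparing their areas. The sketch below does this with the trapezoidal rule. The two score vectors are hypothetical stand-ins for a model without the distress score (A) and a model with it (B); they are not the paper's data, and the function names are ours.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) points, from (0, 0) to (1, 1), one per distinct score."""
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= t)
        fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# 4 distressed firms followed by 5 non-distressed firms (hypothetical).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.6, 0.4, 0.7, 0.4, 0.3, 0.2, 0.1]   # model A
scores_b = [0.9, 0.8, 0.75, 0.5, 0.6, 0.4, 0.3, 0.2, 0.1]  # model B

auc_a = trapezoid_area(roc_points(actual, scores_a))
auc_b = trapezoid_area(roc_points(actual, scores_b))
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}")
```

Model B reaches each true positive rate at a lower false positive rate than model A, so its curve sits higher and further left and encloses a larger area.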

Prediction Accuracy of Models
To assess the prediction accuracy of the T-1 and T-2 models, the testing set was examined. Table 8 and figures 8 and 9 show the results of both models. For the T-1 model without the distress score, the sensitivity, specificity, type I error and type II error are respectively 0.938, 0.840, 0.062 and 0.160, and for the T-2 model they are respectively 1.000, 0.378, 0.000 and 0.622. The value of θ is 0.889±0.028 (0.833, 0.945) for the T-1 model and 0.689±0.030 (0.630, 0.748) for the T-2 model. For the T-1 model using the distress score, the sensitivity, specificity, type I error and type II error are respectively 0.938, 0.889, 0.062 and 0.111, and for the T-2 model they are 1.000, 0.422, 0.000 and 0.578. The value of θ is 0.913±0.026 (0.861, 0.965) for the T-1 model and 0.711±0.027 (0.657, 0.765) for the T-2 model.
The ROC curves of the two models are presented in figures 8 and 9. For T-1, the ROC curve of the model using the distress score lies above and to the left of that of the model without the distress score. The same holds for T-2. As a result, the logit model using the distress score performs better on the testing set than the logit model without it.
Overall, the classification accuracy on the training set and the prediction accuracy on the testing set for T-1 and T-2 are respectively 91.4%, 80.7%, 88.9% and 68.9% for the A models (without the distress score) and 92.8%, 83.2%, 91.3% and 71.1% for the B models (with the distress score). The results thus show that the logit models using the distress score outperform the logit models without it.

Conclusion
Financial distress, and consequently the bankruptcy of economic units, can cause enormous losses at both large and small scales. At the large scale, the financial distress of corporations results in a decrease in Gross Domestic Product (GDP), an increase in unemployment, a waste of the country's resources, and so on. At the small scale, stakeholders such as shareholders, investors, creditors, managers, employees, raw material distributors and customers sustain losses. Therefore predicting the future financial condition of corporations has attracted wide attention from financial researchers, and finding warning indicators of financial distress has become one of the most attractive and significant fields of financial and economic research. A review of the research shows that many researchers have focused on financial ratios to predict financial distress. The distress score of a corporation is also useful for studying its operating condition. Therefore, in this article the distress score is obtained from DEA and then used as a predictor variable alongside the financial ratios. Moreover, this article compares the performance of the models using ROC curve analysis.
The field of study in this research is the corporations listed on the Tehran Stock Exchange; the required data were therefore collected from the database of the Tehran Securities & Exchange Organization. For the period studied (2001-2008), data were available for 304 corporations, 79 of which were distressed and 225 non-distressed.
Data for the distressed and non-distressed corporations were collected for the years T, T-1 and T-2. For the distressed corporations, T, T-1 and T-2 are defined respectively as the year in which financial distress occurred (the corporation lost at least half of its capital in 2 successive years) and one and two years before that. For the non-distressed corporations, they are respectively the year in which the corporation had its highest profit and one and two years before that. The selected sample is divided randomly into two sets: a training set (80 percent) and a testing set (20 percent). The former is used to build and estimate the models and the parameters of the prediction methods, and the latter is used to assess the validity and accuracy of the model's predictions.
The results show that the overall performance of the logit model using the distress score as a predictor variable is better than that of the logit model without it. It can therefore be concluded that prediction accuracy can be enhanced by using the distress score obtained from DEA as a predictor variable in financial distress prediction.
True positive (TP): a distressed corporation predicted as distressed by the model.
True negative (TN): a non-distressed corporation predicted as non-distressed by the model.
False negative (FN): a distressed corporation falsely predicted as non-distressed by the model.
False positive (FP): a non-distressed corporation falsely predicted as distressed by the model.
The aim is to minimize the errors (FN and FP). There is a trade-off between FN and FP: if FP decreases, FN increases, and vice versa. The ROC curve indicates that FN and FP cannot be minimized independently.
The optimal cutoff points shown in figures 2 and 3, equal to 0.4 for the T-1 model and 0.3 for the T-2 model, are determined by the largest area under the ROC curve; θ is equal to 0.914 for the T-1 model and 0.807 for the T-2 model.

Table 1. A list of predictor variables

Table 3. Summary of the Estimated Logit Model (T-1)

Table 4. Summary of the Estimated Logit Model (T-2)