Bankruptcy Prediction: Some Results from a Large Sample of Polish Companies

The Polish economy went through a transition from a centrally-planned economy to capitalism about twenty years ago. Therefore in almost every area of economic science, Polish economists frequently find that theories created for developed countries are not directly applicable under Polish conditions. This statement applies also to the area of bankruptcy prediction. For almost twenty years there have been numerous attempts to construct a Polish model of bankruptcy prediction. The main obstacle in this process is the access to large enough data samples. In this paper we analyze, to the best of our knowledge, the largest database of Polish company bankruptcies. It includes data from 13,288 companies, of which 1,198 went bankrupt. Our intention, using Shumway’s (2001) approach, is to show the aggregated results of fitting many competing bankruptcy model specifications to a large sample of Polish companies. The models are described in terms of their predictive powers, and also the predictors which appeared most often in the set of the best performing models are identified and discussed.


Introduction
The Polish economy has gone through a transition from a centrally-planned economy to capitalism. This change took place about twenty years ago, so the Polish economy is still developing and shows behavior that differs from that observed in developed countries. Therefore, in almost every area of economic science, Polish economists frequently find that theories created for developed countries are not directly applicable to Polish conditions. This statement also applies to the area of bankruptcy prediction.
Economic databases in Poland are still being created, and the available data is often very limited. As a consequence, the main obstacle to bankruptcy model development in Poland is access to sufficiently large data samples. The vast majority of Polish authors had to work with data samples numbering at most one hundred financial statements, and more often no more than a few dozen cases. Moreover, their data rarely spanned periods longer than 4-5 years. In this paper we analyze, to the best of our knowledge, the largest database of Polish company bankruptcies. It includes data from 13,288 companies, of which 1,198 went bankrupt. The data covers the period 1995-2011.
Our intention, using Shumway's (2001) approach, is to show the aggregated results of fitting many competing bankruptcy model specifications to a large sample of Polish companies. The models are described in terms of their predictive power, and the predictors that appeared most often in the set of the best performing models are identified and discussed.

Literature Review
Scientific analysis of bankruptcy prediction began about 80 years ago. Bellovary et al. (2007) list scientific studies in this area carried out between 1930 and 2004. The pioneering research focused on the analysis of single financial ratios (univariate analysis) and on comparisons of their distributions between companies in good financial standing and those that went bankrupt or showed signs of severe problems, such as a lack of profitability.
In 1968 Altman published the first results of a multivariate analysis of bankruptcy data. His paper is often considered to mark the beginning of the modern approach to bankruptcy prediction. Altman's method, the so-called Z-score, was based on multivariate discriminant analysis and remained a standard method of bankruptcy prediction for at least 20 years after its creation. Altman analyzed five financial ratios and showed that these ratios can predict one-year-ahead bankruptcy with very high accuracy. His model was also tested on out-of-sample data, which further supported its quality.
Altman's study triggered growing interest in bankruptcy prediction research. Ohlson (1980) constructed a multivariate logistic model of bankruptcy prediction using nine financial ratios. His study covered a large sample with data on more than 2,000 companies. He reported a prediction accuracy rate of about 85%. His study also included a critique of Altman's discriminant-analysis-based method.
Another well-known study related to bankruptcy prediction was conducted by Zmijewski (1984). Zmijewski developed an approach to financial distress prediction. His analysis focused on profitability (or the lack of it), not on bankruptcy. However, he used his model for one-year-ahead bankruptcy prediction and reported an accuracy rate close to 80%.
Many other scientific papers on bankruptcy prediction models have been published besides the ones mentioned above. These models differ in both the variables used for prediction and the modeling approaches. Bellovary et al. (2007) report that among the 165 models created since the 1960s, the most popular modeling approaches are discriminant analysis, logit and probit models, and neural networks, but methods such as genetic algorithms and machine learning algorithms, e.g. Support Vector Machines, are used as well.
Up until Shumway's (2001) paper, the typical approach to bankruptcy prediction was so-called single-period static modeling. However, Shumway showed that the static approach has many drawbacks and may lead to suboptimal results. The main concern about static modeling is that it ignores multi-period dynamics and therefore does not use all the information provided by the data. As a solution, Shumway proposed using a hazard model. Shumway reestimated Altman's and Zmijewski's models and concluded that almost half of the variables they used are statistically unrelated to the bankruptcy probability.
For the last twenty years there have also been numerous attempts to construct a bankruptcy prediction model for the Polish economy. Historically, the most popular bankruptcy models were linear discriminant models, researched for example by Mączyńska (1994), Pogodzińska and Sojak (1995), Gajdka and Stos (1996), Hadasik (1998) and Hołda (2001). More recently, logistic regression, neural networks and decision trees were explored in the Polish literature by Wędzki (2005) and Hołda (2006).

Data Description
The database was created on the basis of the individual company reports included in the EMIS database, run by ISI Emerging Markets (www.securities.com). The data provided by EMIS covers the yearly reports of medium to large Polish companies collected from various official and unofficial sources. Moreover, this database lists the dates of bankruptcy announcements, if such an announcement was made.
A typical EMIS data file consists of 36 items; however, 14 of them are precalculated financial ratios, whilst 6 others are almost always blank fields. As a result, all the calculations are based upon the following 16 financial statement items: Fixed assets, Current assets, Inventories, Short-term receivables, Cash and cash equivalents, Total assets, Shareholders' equity, Long-term liabilities, Short-term liabilities, Total income, Sales revenues, Operating costs, Operating profit (loss), Gross profit (loss), Net profit (loss), and Depreciation.
We used these quantities to construct a selected set of widely known and recognized financial ratios, i.e.: Operating profitability, Assets to operating income ratio, Return on sales, Return on equity, Long-term liabilities to total assets ratio, Current liabilities to total assets ratio, Equity to assets ratio, Funded capital ratio, Quick ratio, Cash turnover ratio, Accounts receivable turnover, Total assets turnover, Working capital productivity, Current assets to working capital ratio, Inventory to working capital ratio and Current receivables to current liabilities ratio. We also attempted to compute many additional financial ratios, but most of them could very rarely be computed due to missing data and were therefore excluded from further analysis (for details see Table 1).
The generated data set suffered from a typical drawback of financial data, i.e. it contained grossly outlying observations. Left unattended, these might negatively affect any further analysis. Therefore, we applied to each individual financial ratio a procedure called data winsorization, which is commonly used as a preprocessing step in bankruptcy modeling. For each financial ratio we replaced all the observations that were at least 1.5 times the interquartile range below (above) the first (third) sample quartile with the first (third) quartile value minus (plus) 1.5 times the interquartile range. Also, when a financial ratio took an infinite value as a consequence of dividing by zero, we replaced it with the sample maximum (for positive infinity) or the sample minimum (for negative infinity).
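The preprocessing rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `winsorize_iqr` and the toy values are hypothetical, infinities are replaced before the quartiles are computed (the text does not specify the order), and NumPy's default linear interpolation is used for the quartiles.

```python
import numpy as np

def winsorize_iqr(values, k=1.5):
    """Clip one financial ratio to [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: +/- infinity is first replaced by the finite sample
    maximum / minimum, then the fences are computed and applied.
    """
    x = np.asarray(values, dtype=float).copy()
    finite = np.isfinite(x)
    # Division by zero produced infinities: substitute finite extremes.
    x[np.isposinf(x)] = x[finite].max()
    x[np.isneginf(x)] = x[finite].min()
    # Quartiles and interquartile range (missing values are ignored).
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    # Observations beyond the fences are pulled back onto the fences.
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)

# Toy ratio with two gross outliers and one division-by-zero result.
ratio = np.array([0.10, 0.12, 0.15, 0.11, 0.13, 9.0, -7.0, np.inf])
clean = winsorize_iqr(ratio)
```

Values inside the fences are left unchanged, so the procedure only affects the tails of each ratio's distribution.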
All the preprocessing steps yielded a data set containing 88,753 firm-years, where by a firm-year we mean a single set of financial ratios computed for one company on the basis of one calendar year's data. On average each company contributed 6.7 firm-years to the sample.
The most prevalent forecasting horizon in the bankruptcy prediction literature is one year. However, in our opinion it is more useful to lengthen the forecasting horizon to two years, to give a potentially failing company early warning and more time to implement anti-bankruptcy strategies. Consequently, in the next step, for each firm-year T we determined the value of a binary dependent variable: 'one' if the company went bankrupt in year T+2, 'zero' otherwise. If a company went bankrupt in year T+2, then its data from year T is the last firm-year for this company included in the data set. A more detailed numerical description of the final data set is presented in Table 2 and Table 3.
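The construction of the dependent variable can be illustrated as follows. This is a minimal sketch, not the authors' code: the function name `label_firm_years`, the tuple layout and the toy companies are hypothetical.

```python
def label_firm_years(firm_years, bankruptcy_year, horizon=2):
    """Return (company, year, y) rows with y = 1 if the firm fails in year + horizon.

    firm_years:      iterable of (company_id, year) pairs.
    bankruptcy_year: dict company_id -> year of bankruptcy (absent if none).
    """
    rows = []
    for company, year in sorted(firm_years):
        failed_in = bankruptcy_year.get(company)
        # For a bankrupt company, year T = failed_in - horizon is the last
        # firm-year kept; later firm-years are dropped.
        if failed_in is not None and year > failed_in - horizon:
            continue
        y = 1 if failed_in is not None and year + horizon == failed_in else 0
        rows.append((company, year, y))
    return rows

# Company A goes bankrupt in 2005; company B never does.
firm_years = [("A", 2001), ("A", 2002), ("A", 2003), ("A", 2004),
              ("B", 2001), ("B", 2002)]
labels = label_firm_years(firm_years, {"A": 2005})
```

In this toy case the 2003 firm-year of company A receives the value one, its 2004 firm-year is discarded, and all remaining firm-years receive zero.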
In the last step we split the whole data set into two independent subsamples: a learning one and a testing one.
The learning subsample consisted of 55,494 firm-years from the period 1995-2006 (62.5% of all), whilst the testing subsample included 33,259 firm-years from the period 2007-2010 (37.5% of all).
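A time-based split of this kind, together with a check of the reported shares, might look as follows. The function name `split_by_year` and the row layout (year in the second position) are assumptions made for illustration; only the firm-year counts come from the text.

```python
def split_by_year(rows, last_learning_year=2006):
    """Split firm-year rows (company, year, ...) into learning and testing parts."""
    learning = [r for r in rows if r[1] <= last_learning_year]
    testing = [r for r in rows if r[1] > last_learning_year]
    return learning, testing

# Toy rows: (company, year, label).
rows = [("A", 2005, 0), ("A", 2006, 0), ("A", 2007, 1), ("B", 2008, 0)]
learning, testing = split_by_year(rows)

# Shares implied by the firm-year counts reported in the text.
n_learning, n_testing = 55_494, 33_259
share_learning = round(100 * n_learning / (n_learning + n_testing), 1)
share_testing = round(100 * n_testing / (n_learning + n_testing), 1)
```

Splitting by calendar year rather than at random keeps the testing sample strictly later than the learning sample, so the evaluation is out-of-time as well as out-of-sample.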

Modeling Approach Description
In a single-period approach, a researcher considers each bankrupt firm's data only at a fixed time before bankruptcy. Any data preceding that point in time is discarded. Shumway (2001) proposes using a hazard model that exploits all the available information, i.e. each firm's time-series data is included as time-varying covariates.
As a result of this additional data, hazard models produce more precise parameter estimates and may generate more efficient predictions. Importantly, Shumway points out that the hazard model can be formulated as a binary logit model (ordinary logistic regression model). To fit a hazard model using software for logistic regression, each firm-year needs to be a separate observation. We applied Shumway's method and used penalized maximum likelihood to estimate the logistic regression model parameters. More details on penalized maximum likelihood estimation can be found, for example, in Firth (1993) and Fijorek and Sokołowski (2012).
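To make the estimation step concrete, the sketch below fits a logistic regression with a Firth-type penalty by Newton iterations on the modified score. It is an illustration under assumptions, not the authors' implementation: the text only cites Firth (1993) without giving the exact penalty, the function name `firth_logit` is hypothetical, and no step-halving or other safeguards are included. Each row of `X` stands for one firm-year.

```python
import numpy as np

def firth_logit(X, y, max_iter=100, tol=1e-8):
    """Logistic regression fitted by Firth-type penalized maximum likelihood.

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes, one entry per firm-year.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^-1
        # Diagonal of the hat matrix W^1/2 X (X'WX)^-1 X' W^1/2.
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: X'(y - p + h * (1/2 - p)).
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Perfectly separated toy data: ordinary maximum likelihood would diverge,
# while the penalized fit stays finite.
X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y = np.array([0, 0, 1, 1])
beta = firth_logit(X, y)
```

One practical reason to prefer a penalized fit in this setting is that bankruptcies are rare events, and some of the many candidate specifications may otherwise suffer from separation or strongly biased coefficients.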
The conducted research was not limited to the estimation of a single bankruptcy prediction model. Instead, we estimated many competing models and looked for the best performing ones and the variables that constitute them. With 16 explanatory variables (financial ratios) at hand, it was possible to design as many as 65,535 different models (every non-empty subset of the 16 ratios, i.e. 2^16 - 1). For each of them we proceeded according to the following steps:
1) Model coefficients were estimated using learning sample data.
2) Using the estimated coefficients, we computed the bankruptcy probability for each learning sample firm-year.
3) Using all estimated bankruptcy probabilities, we computed their median (Med) and 80th percentile (P80).
4) Using testing sample data, we again computed the probability of bankruptcy for each firm-year, using the model coefficients estimated with learning sample data.
5) We calculated the number (denoted N1-5) of bankruptcies in the testing sample for which the estimated probability of bankruptcy is lower than Med. It shows the number of out-of-sample bankruptcies for which the estimated model produces a bankruptcy probability that is lower than the median of the probabilities obtained for the learning sample data. This represents the undesirable tendency of the model to predict a low bankruptcy probability for a company that in fact went bankrupt. Thus it may be treated as a measure of the out-of-sample 'weakness' of the model.
6) We counted the testing period bankruptcies with a probability of bankruptcy greater than P80; this number is denoted N9-10. N9-10 shows the number of out-of-sample bankruptcies for which the estimated model produces bankruptcy probabilities that fall into the two top (9th and 10th) deciles of the bankruptcy probabilities obtained with the learning sample. This number expresses the desired model behavior of assigning a high bankruptcy probability to a bankrupt company. Therefore we treat it as a measure of the model's predictive ability.
7) Finally, we computed two fractions: (i) N1-5/N, and (ii) N9-10/N, where N is the total number of bankruptcies in the testing period data.
The fractions computed in the last step of the algorithm were tabulated for all the estimated models and are discussed in the next section. For comparison purposes we also computed and plotted the same fractions for the learning sample data.
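The enumeration of specifications and the two evaluation fractions can be sketched as follows. The names `all_specifications` and `tail_fractions` and the toy probabilities are hypothetical; the thresholds are taken from the learning-sample probabilities, as in steps 3), 5) and 6), and NumPy's default percentile interpolation is an assumption.

```python
from itertools import combinations

import numpy as np

def all_specifications(variables):
    """Yield every non-empty subset of the candidate predictors."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)

def tail_fractions(p_learn, p_test, y_test):
    """Return (N1-5/N, N9-10/N) for one fitted model.

    p_learn: fitted bankruptcy probabilities for the learning firm-years.
    p_test:  probabilities for the testing firm-years (same coefficients).
    y_test:  0/1 bankruptcy indicator for the testing firm-years.
    """
    med, p80 = np.percentile(p_learn, [50, 80])
    bankrupt = np.asarray(p_test)[np.asarray(y_test) == 1]
    n = bankrupt.size
    weak = np.sum(bankrupt < med) / n     # bankruptcies scored below Med
    strong = np.sum(bankrupt > p80) / n   # bankruptcies scored above P80
    return weak, strong

# 16 candidate ratios give 2**16 - 1 non-empty subsets.
n_models = sum(1 for _ in all_specifications(range(16)))

# Toy probabilities for a single model.
p_learn = np.linspace(0.01, 1.00, 100)
p_test = np.array([0.90, 0.85, 0.30, 0.60, 0.95, 0.10])
y_test = np.array([1, 1, 1, 1, 0, 0])
weak, strong = tail_fractions(p_learn, p_test, y_test)
```

A good model has a small first fraction and a large second one; in the full procedure each subset returned by `all_specifications` would be fitted on the learning sample before these fractions are computed.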

Results
We begin the analysis of the results by comparing the performance of our models to the model developed by Shumway (2001). Figure 1 presents kernel densities estimated for the ratios N1-5/N and N9-10/N for the learning and testing samples. We can conclude that in the testing sample the best estimated models achieve an N1-5/N ratio of about 9% and an N9-10/N ratio of about 59%. Even the best estimated model in our analysis therefore performs significantly worse than Shumway's (2001) model. There are many possible reasons for this. Firstly, the Polish economy is still developing, which means that Polish entrepreneurs act in harsher conditions than their American counterparts. Therefore, it is probably easier, in general, to predict bankruptcy in the US than in Poland. Secondly, we attempted to predict bankruptcy two years before it happens, whilst Shumway's model predicts one-year-ahead bankruptcies. We think we can safely assume that predicting bankruptcy over a longer horizon is a far harder task. Lastly, our testing sample covers the period of the recent debt-market crisis, which might have changed the main factors driving bankruptcy, i.e. the factors relevant in the learning period may not be as relevant in the testing period due to possible structural changes. It is worth underscoring, however, that the performance of the best estimated models is not bad in absolute terms. Only about 10%-15% of the testing-period bankruptcies were assigned a bankruptcy probability lower than Med. On the other hand, about 55%-58% of these bankruptcies received a bankruptcy probability that falls into the top two deciles of all the estimated bankruptcy probabilities. It is also worth noting that the set of the best models is relatively large: more than 1,500 models achieve results in the range given above.
Besides the models' predictive performance, we also analyzed the statistical significance of predictors across the estimated models. We were most interested in variables that constituted the best performing models and had statistically significant coefficients. The results of this analysis are given in Table 4, along with the results for all the estimated models. The data shown in Table 4 allows us to identify the most and the least important predictors of bankruptcy. The two most important financial indices are Operating profitability and the Cash turnover ratio. These predictors have statistically significant coefficients in all, or almost all, of the best 100-500 analyzed models. In addition to these two financial ratios, there is a group of four other variables that have statistically significant coefficients in the majority of the best analyzed models. These are: Return on sales, Current liabilities to total assets ratio, Equity to assets ratio and Total assets turnover. The next interesting group of variables is formed by the following financial ratios: Long-term liabilities to total assets ratio, Return on equity, Quick ratio, Accounts receivable turnover and Working capital productivity. These variables turn out to be important in 25%-45% of the best 100-500 analyzed models. Finally, the following five financial indices: Assets to operating income ratio, Funded capital ratio, Current assets to working capital ratio, Current receivables to current liabilities ratio and Inventory to working capital ratio, appear very rarely or almost never in the best 100-500 analyzed models.
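A tabulation of this kind could be produced along the following lines. This is a hedged sketch: the function name `significance_share`, the record layout, the ranking by a single score and the toy numbers are all assumptions for illustration and are not results from the paper.

```python
from collections import Counter

def significance_share(models, top_k):
    """Share of the top_k models in which each variable is significant.

    models: list of (score, significant_variables) pairs, one per fitted model.
            A higher score means a better model; the ranking rule is an assumption.
    """
    best = sorted(models, key=lambda m: m[0], reverse=True)[:top_k]
    counts = Counter(var for _, significant in best for var in significant)
    return {var: counts[var] / len(best) for var in counts}

# Made-up records for three models.
models = [
    (0.9, {"operating_profitability", "cash_turnover"}),
    (0.8, {"operating_profitability", "cash_turnover", "return_on_sales"}),
    (0.1, {"quick_ratio"}),
]
shares = significance_share(models, top_k=2)
```

Repeating the computation for several values of `top_k` (for example 100 and 500) and for the full set of models gives the kind of comparison discussed in the text.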
Comparing the figures discussed above with the results shown in the last column of Table 4, we can draw the following conclusions. Firstly, Total assets turnover plays a much bigger role in the group of the best models than, on average, in all the estimated models containing this variable. The opposite may be said for quite a few other indices. This is especially true for the Inventory to working capital ratio, Accounts receivable turnover, the Quick ratio and the Current receivables to current liabilities ratio.
The last part of the analysis examines what kind of relationship exists between bankruptcy probability and each of the variables used (Table 5). To save space we used only data from the best 500 models. Narrowing down the analyzed group of models does not change the conclusions. With one exception, all results reported in Table 5 are consistent with economic theory. The relationship between all profitability (return)-related indices and bankruptcy probability is negative, meaning that higher profitability implies a lower probability of going bankrupt. The same statement applies to all turnover ratios. The relationship between liabilities indices and bankruptcy probability is positive, meaning that the higher the debt, the greater the chance of going bankrupt.
The only exception is the result for the Cash turnover ratio. Economists generally hold that the higher the value of this ratio the better; however, the analyzed data showed the opposite. If a company suffers from problems with liquidity, then its Cash turnover ratio will be very high. Problems with liquidity are encountered very often in the Polish economy. As a result, we suspect that this ratio acts as a liquidity measure and that its high levels indicate troubles with liquidity, hence the positive relationship between this ratio and bankruptcy probability.

Conclusions
The conducted research allows us to form the following conclusions. Firstly, the performance of the best estimated models is satisfactory. Only about 10%-15% of the testing-period bankruptcies were assigned a low bankruptcy probability, and about 55%-58% of bankruptcies received a high bankruptcy probability. The set of the best models is relatively large: more than 1,500 models achieved the above-mentioned results.
This points us to the second conclusion, that it is possible to fit many almost equally good models with markedly different input variables. However, it is possible to identify the variables which appear most often in the set of the best performing models. There is a clear group of financial ratios that seem to be very helpful when predicting bankruptcy in Poland. These are: Cash turnover ratio, Operating profitability, Return on sales, Current liabilities to total assets ratio, Equity to assets ratio and Total assets turnover.
Thirdly, the relationship between bankruptcy probability and all but one of the analyzed variables is consistent with economic theory. The only exception to this rule is the Cash turnover ratio, which seems to be bound up more with liquidity than with real cash turnover. Therefore, its high value signals troubles with liquidity and is one of the strongest factors affecting predicted bankruptcy probability.
Based on the above conclusions we can offer some suggestions regarding the practical use of the conducted research. These suggestions apply to the Polish economy, but we hope that they could be applied, at least to some extent, to other economies, especially ones that used to be centrally-planned and have transformed to capitalism.
Firstly, it seems that it is possible to predict bankruptcy two years ahead, and that this can be done quite accurately. This observation opens the way to constructing a reliable early-warning system for bankruptcies, even on a nationwide level, e.g. similar to the Polish Rapid Reaction Facility (www.isr.parp.gov.pl). Moreover, our results show that there is a large group of models that use widely different inputs (financial ratios) and perform almost equally well in two-year-ahead bankruptcy prediction. This finding is particularly important in real-life situations where companies do not always provide all the necessary data and alternative models are needed to accommodate such cases.
Secondly, our results may be used to form recommendations for bankruptcy threat monitoring at an individual company level. It is not surprising, but nonetheless worth saying, that managers should be very concerned about any liquidity-related problems because this is one of the strongest factors in predicting bankruptcy. Troubles with managing liquidity at a proper level may lead to excessive short-term borrowing, which in turn affects the Current liabilities to total assets (CLTA) ratio. Sudden increases in CLTA should be investigated very closely, as our results show that this financial ratio is also a very strong predictor of bankruptcy. The next item on a bankruptcy threat monitoring checklist should be an analysis of sales-related ratios, especially the Return on sales and Total assets turnover ratios. Their significant decrease, meaning a less efficient use of the company's assets, is also a clear symptom of an incoming crisis.

Figure 1. Kernel density estimates for N1-5/N and N9-10/N for the learning and testing samples

Table 1. Definition of financial ratios considered for and included in the analysis

Table 2. Companies in the sample cross-tabulated by year and type of main activity (farming, forestry, and fishery excluded)

Table 3. Distribution of financial ratios in bankrupt and non-bankrupt firm-years

Table 4. Predictors forming the best performing models

Table 5. Signs of coefficients of variables for the best 500 models