Bankruptcy Prediction Using Bayesian, Hazard, Mixed Logit and Rough Bayesian Models: A Comparative Analysis

Bankruptcy prediction has been an active research topic for business and corporate institutions in recent times. The problem has previously been tackled with Statistical, Market Based and Computational Intelligence models. In this work, we analyze bankruptcy using both parametric and nonparametric prediction techniques. The investigation concentrates on the impact of the choice of cut off points, sampling procedures and the business cycle on the accuracy of bankruptcy prediction models. Misclassification can result in erroneous predictions that impose prohibitive costs on investors and the economy. To test the impact of the choice of cut off points and sampling procedures, four bankruptcy prediction models are examined, viz. Bayesian, Hazard, Mixed Logit and Rough Bayesian. To evaluate the relative performance of the models, a sample of US firms from the Lynn M. LoPucki Bankruptcy Research Database is used. The choice of cut off point and sampling procedure is found to affect the rankings of the models. The results indicate that the empirical cut off point estimated from the training sample resulted in the lowest misclassification costs for all the models. Although the Hazard and Mixed Logit models resulted in lower costs of misclassification in randomly selected samples, the Mixed Logit model did not perform well across varying business cycles. The Hazard model has the highest predictive power. However, the higher predictive power of the Rough Bayesian and Bayesian models when the ratio of the cost of Type I errors to the cost of Type II errors is high is relatively consistent across all sampling methods. This advantage may make Bayesian models more attractive in the current economic environment. This study also compares the performance of bankruptcy prediction models by identifying the conditions under which each model performs better. It is relevant to a varied range of user groups concerned with assessing failure risk, including auditors, shareholders, employees, suppliers, rating agencies and creditors.


Introduction
Bankruptcy prediction (Altman, 1993; Bellovary et al., 2007; Pate, 2002) is an important and challenging topic for business and corporate institutions. Prediction of corporate bankruptcy is of increasing interest to investors or creditors, borrowing organizations and governments alike. It is a key goal of any management. Timely identification of an organization's impending failure is desirable. Bankruptcy is the condition in which an organization cannot meet its debt obligations and petitions a federal district court for either reorganization of its debts or liquidation of its assets. In this action, the property of the debtor is taken over by a receiver or trustee in bankruptcy for the benefit of creditors. An effective and timely prediction is invaluable to a business for evaluating risks or preventing bankruptcy (Altman et al., 1977; Altman, 1993). A fair amount of research has therefore focused on bankruptcy prediction (Agarwal et al., 2008; Altman, 1968, 1993, 2007; Altman et al., 1977; Altman et al., 1994; Beaver et al., 2005; Begley et al., 1996; Chava et al., 2004; Grice et al., 2001; Hensher et al., 2007; Hillegeist et al., 2004; Hsieh, 1993; Jones, 1987; Katz et al., 1985; McKee, 2003; Mensah, 1984; Michael et al., 1999; Ohlson, 1980; Ravi et al., 2007; Robertson et al., 1991; Sarkar et al., 2001; Shumway, 2001; Sun et al., 2007; Tam, 1991; Weiss et al., 2004; West, 1985; Wilson et al., 1994; Zavgren, 1983; Zmijewski, 1984). Early warning signs of impending financial distress allow the manager to act pre-emptively to keep the situation from worsening. Signs of potential financial distress are evident long before bankruptcy occurs (Altman, 1993). Financial distress begins when an organization is unable to meet its scheduled payments or when the projection of future cash flows points to an inability to do so in the near future. The causes leading to business failure and subsequent bankruptcy (Bellovary et al., 2007; Grice et al., 2001; Zavgren, 1993) can be divided into economic, financial, neglect, fraud, disaster and others. Economic factors include industry weakness and poor location. Financial factors include excessive debt and insufficient capital. Research shows that financial difficulties are the result of managerial error and misjudgment. When errors and misjudgments proliferate, it could be a sign of managerial neglect. Corporate fraud became a public concern during the late nineties. However, no models are yet available that could detect and flag corporate fraud. Disaster is sometimes the cause of corporate failure. It includes human error and malice.
Bankruptcy filing is not exclusive to any specific economy. Globalization can spread waves of economic distress across societies and national economies after the original economy witnesses its deleterious impact. Countries like Japan, Belgium, Thailand, Greece and Hungary are developing their own bankruptcy prediction models to deter the disastrous consequences of ultimate financial distress. Predicting corporate failure using past financial data is both a traditional and a contemporary topic in financial business (Altman, 1968; Altman et al., 1977; Altman, 1993; Altman et al., 1994; Beatty et al., 2002; Beaver, 1966; Hensher et al., 2007; Jones et al., 2004; Merton, 1974; Neophytou et al., 2004; Platt et al., 2004; Robertson et al., 1991; Thomson, 1991; Zavgren, 1983). The solution to this problem is a discriminant function that maps the variable space in which observations are defined into a binary set. In the United States, defaults and bankruptcies have increased markedly over the past decade. In fact, 17 out of the 20 largest bankruptcy filings in the United States happened during this period. An appreciable volume of work in the literature has been devoted to forecasting corporate failure. The methodologies employed have been based mainly on various Statistical and Computational Intelligence models. During this distress period, three important Statistical models, viz. Bayesian (Sarkar et al., 2001), Hazard (Shumway, 2001) and Mixed Logit (Jones et al., 2004), have been successfully applied to bankruptcy prediction. All these models have theoretical advantages over existing prediction models.
In this study, we empirically test the comparative power of the three approaches mentioned above along with the Rough Bayesian model (Pawlak, 1982, 1991; Pawlak et al., 1994; Sarkar et al., 2001; Sun et al., 2007; Verikas et al., 2010; Yao, 2007, 2008, 2010; Yao et al., 1992; Yao et al., 1990). We examine the role of cut off point choice and sampling procedures in evaluating the relative performance of bankruptcy prediction models. Specifically, the following four questions are addressed: (i) Do the rankings of bankruptcy prediction models depend on an arbitrary choice of cut off point? (ii) What is the procedure to determine the optimal cut off point? (iii) Do the performance outcomes of bankruptcy prediction models depend on sampling procedures? (iv) How does the cost of Type I and Type II errors influence the performance of bankruptcy prediction models? The first two questions focus on the critical role of the cut off point as it affects bankruptcy prediction models. The third question extends the inquiry to acknowledge sensitivity to changes in the economic cycle. The fourth question provides insight into the fundamental question of the choice of failure prediction model, which is critical to the optimal allocation of resources. The relevance of this research is twofold: (i) The current financial crisis is primarily a result of failure in accounting for credit risk and consequently for default risk by borrowers. This crisis thus demonstrates the need for better methods of evaluating the risk of bankruptcy. (ii) The prudential banking regulation Basel II proposes that banks use internal models to assess their risks, in particular their credit risks, i.e. default risk, and the stockholders' equity necessary to cover the risk of default.
This investigation builds upon prior and concurrent research in three different streams of the bankruptcy prediction literature. First, we build on the studies by Jones et al. (2004), Sarkar et al. (2001), Shumway (2001) and Hensher et al. (2007), who proposed the use of advanced probability modeling in the prediction of corporate bankruptcy. These studies indicate that Bayesian, Hazard and Mixed Logit models have valuable applications in financial distress research. We attempt to extend these findings by focusing on the sensitivity of the performance of their models to cut off point selection, to the sampling procedure and to the ratio of the cost of Type I to Type II errors. The second stream of research demonstrates the importance of industry, changes in accounting regulation and changes in the ability of financial statement data for the forecasting accuracy of bankruptcy prediction models (Beaver et al., 2005; Chava et al., 2004; Hillegeist et al., 2004). The basic notion of this research is that, besides the industry effect, aspects of accounting systems such as the going concern and conservatism principles, among others, limit the performance of any accounting based bankruptcy prediction model. Given these results, we study the impact of the overall economic business cycle on testing bankruptcy prediction models. The third stream of research focuses on the criteria used to compare bankruptcy prediction models, such as classification accuracy, rates of misclassification and cost of misclassification (Begley et al., 1996; Grice et al., 2001; Jones et al., 2004; Sun et al., 2007). In this work, we address usefulness as measured by the rankings of the four models on these three criteria.
This study leads to the following important results. The choice of cut off points and sampling procedures was found to affect the rankings of the models. The results indicate that the empirical cut off point estimated from the training sample resulted in the lowest misclassification costs for the Rough Bayesian model. Although the Hazard, Mixed Logit and Rough Bayesian models resulted in lower costs of misclassification in randomly selected samples, the Mixed Logit model did not perform as well across varying business cycles. In general, the Rough Bayesian model has the highest predictive power. However, the higher predictive power of the Bayesian model when the ratio of the cost of Type I errors to the cost of Type II errors is high is relatively consistent across all sampling methods.
This study is significant to the private and academic sectors because it identifies the conditions under which each model is most appropriate for predicting bankruptcies. Auditors often fail to make accurate judgments on organizations' going concern conditions, notwithstanding their knowledge of the organization (Hopwood et al., 1994; McKee, 2003). An appropriate choice of bankruptcy model can help the auditor in deciding on a disclaimer or qualification as to the going concern nature of the business (Altman et al., 1974). Bankruptcy prediction models can also help audit researchers better understand auditors' biases (Sarkar et al., 2001). Second, in addition to auditors, creditors, stockholders and senior management are all interested in bankruptcy prediction because it affects all of them (Wilson et al., 1994). An early warning model allows management to take corrective actions before it is too late (Whalen et al., 1988). In fact, research in this area assumes greater significance because a poor credit risk model might lead to sub optimal capital allocation (Agarwal et al., 2008). Regulators can intervene early so that mitigating actions can be taken to reduce the expected costs of failure (Thomson, 1991).
Finally, an appropriate model can help price distressed securities, as shown by Katz et al. (1985), who found that abnormal returns can be earned by identifying changes in scores calculated from a bankruptcy model. This paper is organized as follows. In section 2, the related work in bankruptcy prediction and its methodological implications are discussed. The experimental framework is presented in the next section. In section 4, the different data models for analysis are described. This is followed by the experimental results. Finally, conclusions are given in section 6.

Related Work
Beaver (1966) used a univariate approach and found that net income to total debt had the highest predictive ability. Although univariate analysis seems easy and intuitive to implement, the ambiguity of its explanatory power does not provide clear signals (Jones, 1987; Zavgren, 1983). Altman (1968) used Linear Discriminant Analysis (LDA), which assumes equivalent covariance matrices across groups. In 1977, he proposed ZETA analysis, which relaxes the assumption of equivalent covariance matrices for the two groups. However, ZETA analysis is still criticized for assuming a multivariate normal distribution of the independent variables (Jones, 1987; Tam, 1991). Tam (1991) found that most financial variables are not normally distributed. Unlike LDA and ZETA, which are used as discrimination tools, Logit (Martin, 1977; Ohlson, 1980) and Probit (Zmijewski, 1984) are models designed for the estimation of probability. These two models require assumptions only on the distribution of the residuals, thus avoiding the criticisms aimed at LDA and ZETA analysis. While parametric models dominated early research, studies using non parametric models began to appear in the late 1980s. The most commonly used was Artificial Neural Networks (ANN) (Tam, 1991). Over the past two decades, the number of studies related to ANN in bankruptcy prediction is 39, compared with 11 for LDA, 19 for Logit and 3 for Probit (Bellovary et al., 2007). However, a criticism of non parametric models is that the significance of variables is not testable.
In the past decade, three important models, viz. Bayesian (Sarkar et al., 2001), Hazard (Shumway, 2001) and Mixed Logit (Jones et al., 2004), have been applied to bankruptcy prediction. These models have subtle theoretical advantages over previous ones. The Bayesian model applies the well known Bayesian equation. Prior knowledge, practical estimates and subjective preference are easy to incorporate simultaneously or separately into the model. These priors are then adjusted by objective estimates from historical and empirical evidence. Subsequently, the posterior probability is obtained. The notable characteristic of this model is its transparency: it is intuitive and easy to understand. The improvement of the Hazard model over Logit is that the former explicitly models bankruptcy not as a process that happens at a point in time, an assumption made by all previous models, but as a process that lasts for a period of the firm's life. Shumway (2001) argued that the Hazard model is preferable because it incorporates time varying covariates. The advancement of the Mixed Logit model over Logit is that it takes into account both observed and unobserved information. Under this setting, there are two means of modeling unobserved information. The former is the random parameter specification and the latter is the error components approach.
Bankruptcy prediction models are usually evaluated on a sample chosen from a particular time period. Since the distribution of accounting variables is dynamic (Mensah, 1984; Webster et al., 2005), models are likely to be sample specific (Agarwal et al., 2008). Grice et al. (2001) re-estimated Zmijewski's (1984) and Ohlson's (1980) models using time periods other than those used to originally develop the models and found that the accuracy of each model declined from the period 1988-91 to 1992-99. Begley et al. (1996) came to the similar conclusion that the Altman (1968) and Ohlson (1980) models did not perform well in the 1980s even when the coefficients were re-estimated. Robertson et al. (1991) cautioned that one cannot assume that the predictive power of models carries over across industries. For example, LDA, originally developed for the manufacturing industry, is used by practitioners as one of the important indicators of credit worthiness across many industries without re-estimating coefficients. Grice et al. (2001) found that Ohlson's (1980) model was sensitive to industry classification while Zmijewski's (1984) model was not. Zmijewski (1984) identified two sampling errors in evaluating bankruptcy prediction models. The first is choice based sample bias. The sampling procedure for bankruptcy prediction analysis initially identifies two groups of observations, viz. bankrupt and non bankrupt. This procedure violates the assumption of exogenous random sampling, since the probability that a firm enters the sample depends on its observed status. He found that predictive power is upwardly biased when the selected sample has a probability of bankruptcy that deviates from the population probability of bankruptcy. He further suggested that weighted exogenous sample maximum likelihood can be used to adjust for the bias. Platt et al. (2002) tested choice based bias and their results were consistent with Zmijewski's (1984) findings. The second sampling error refers to sample selection bias. Sampling procedures usually eliminate observations with incomplete data. Zmijewski (1984) used a bivariate normal approach to estimate the correlation between bankruptcy and observations with missing data. He found that organizations with missing data would have a higher probability of bankruptcy. In other words, sample selection bias understates bankruptcy probability. Beaver et al. (2005) pointed out that the importance of intangible assets has increased over time due to technology based assets generated through research and development expenditure. In addition, financial derivative markets experienced an explosion in the 1990s. Since financial derivatives are mainly used as a substitute for leverage, such a marked increase would lead to underestimated leverage ratios of organizations. Begley et al. (1996) also state that leverage variables play an important role. Beaver et al. (2005) argued that the degree of discretion in financial statements is increasing. These three developments have a direct effect on financial ratios, which in turn undermines the predictive power of bankruptcy prediction models whose inputs are mainly financial variables.
Changes in regulation may also have an impact on the accuracy of accounting based prediction models. Since 1973, many new standards have been established for various perspectives, such as Statement Number 87 for pensions, Number 106 for post retirement benefits, Number 107 for financial instruments and Number 115 for debt and equity. Any new standard will have an effect in the long run on financial statements (2005). The use of bankruptcy filing has become a strategic consideration since the changes to bankruptcy laws in the late 1970s (1996).
The criteria mainly used to compare bankruptcy prediction models are classification accuracy, rates of misclassification and cost of misclassification. Classification accuracy is used by many researchers (Grice et al., 2001; Jones et al., 2004; Sun et al., 2007; Wiginton, 1980) because of its intuitive nature. Bankruptcy prediction models are applied to a holdout sample or to the original sample used to estimate the parameters of the model, and the sample is then separated into groups according to each organization's observed status. The rate of successful prediction is calculated within each group. The Lachenbruch validation approach is another validation method, used by Altman et al. (1977). It is also known as the Jackknife method, in which one observation is withheld from the estimation sample and its status is predicted. The same procedure is then repeated until each observation has been predicted, and the individual classification accuracy is accumulated over the entire sample.
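The Jackknife procedure described above can be sketched in a few lines. The nearest-class-mean classifier and the six toy scores below are hypothetical stand-ins chosen only to keep the example self-contained; they are not the models or data used in this study.

```python
def nearest_mean_fit(xs, ys):
    """Toy stand-in classifier (an assumption): store the mean score per class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def nearest_mean_predict(means, x):
    """Predict the class whose mean score is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def jackknife_accuracy(xs, ys):
    """Withhold each observation in turn, refit, and predict its status."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = nearest_mean_fit(train_x, train_y)
        hits += nearest_mean_predict(model, xs[i]) == ys[i]
    return hits / len(xs)


# Hypothetical one-variable sample: 1 = bankrupt, 0 = non bankrupt.
scores = [-0.30, -0.20, -0.10, 0.05, 0.10, 0.20]
status = [1, 1, 1, 0, 0, 0]
accuracy = jackknife_accuracy(scores, status)
```

Any estimator with a fit and a predict step could replace the stand-in classifier without changing the loop.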
Unlike classification accuracy, the rate of misclassification counts observations that are incorrectly classified. Two types of error are defined accordingly. Type I and Type II errors refer to incorrect predictions of bankrupt and non bankrupt organizations respectively. Numerically, the rate of error is equal to one minus the rate of classification accuracy in the respective group. Ohlson (1980) demonstrated that the rate of misclassification varies across different cut off points. To find the optimal cut off point, he summed the rates of Type I and Type II errors and found the optimal solution by minimizing the sum. The cost of a Type I error is the loss of principal when the debtor defaults. The cost of a Type II error is an opportunity cost, namely the difference between the interest revenue generated from loans that should have been issued and the risk free rate of return. Altman et al. (1977) examined the 26 largest United States commercial banks and 67 smaller regional banks. They found that the cost of a Type I error is 35 times that of a Type II error. In contrast, Weiss et al. (2004) suggested a way to specify the cost of misclassification in relation to an individual organization's characteristics by incorporating the models of Hull et al. (2000) and Merton (1974).
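A minimal sketch of how a cut off point can be chosen from a training sample by minimizing a cost-weighted sum of error rates. The function names, the toy scores and the simplification of weighting the two rates only by a cost ratio (ignoring class proportions) are assumptions for illustration; with a ratio of 1 the criterion reduces to the sum of Type I and Type II rates attributed above to Ohlson (1980).

```python
def error_rates(scores, labels, cutoff):
    """Type I: bankrupt firm (label 1) scored below the cutoff.
    Type II: non bankrupt firm (label 0) scored at or above the cutoff."""
    bankrupt = [s for s, y in zip(scores, labels) if y == 1]
    healthy = [s for s, y in zip(scores, labels) if y == 0]
    type_1 = sum(s < cutoff for s in bankrupt) / len(bankrupt)
    type_2 = sum(s >= cutoff for s in healthy) / len(healthy)
    return type_1, type_2


def relative_cost(scores, labels, cutoff, cost_ratio):
    """Cost in units of one Type II error; cost_ratio = cost(Type I) / cost(Type II)."""
    type_1, type_2 = error_rates(scores, labels, cutoff)
    return cost_ratio * type_1 + type_2


def empirical_cutoff(scores, labels, cost_ratio):
    """Search the observed training scores for the cheapest cutoff."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda c: relative_cost(scores, labels, c, cost_ratio))


# Hypothetical predicted bankruptcy probabilities and observed status.
train_scores = [0.05, 0.10, 0.20, 0.40, 0.60, 0.90]
train_status = [0, 0, 1, 0, 1, 1]

cheap_type_1 = empirical_cutoff(train_scores, train_status, cost_ratio=0.5)
costly_type_1 = empirical_cutoff(train_scores, train_status, cost_ratio=35)
```

On this toy sample the chosen cutoff falls from 0.60 to 0.20 as Type I errors become more expensive, which is the sensitivity to the cost ratio discussed in the text.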

Experimental Framework
In this re Bankruptc assets mea date of fil solved.
In the literature, the predictive ability of bankruptcy models is usually considered to be sensitive to the number and combination of independent variables. Bellovary et al. (2007) summarized that the number of variables in the literature ranges from 1 to 57 and that a total of 752 different factors are used. However, models using only two variables can have predictive accuracies ranging from 86% to 100%, which is comparable to models using a higher number of independent variables. Beaver et al. (2005) also argued that the selection of independent variables may have only a marginal influence because statement variables are correlated. They further found that the informative power of statement variables is actually decreasing due to changes in the application of financial tools and standards. This loss of informative power in accounting information can be compensated by the use of market variables. In this work, the six variables identified by Beaver et al. (2005) are used as independent variables in all three models. Statement variables are collected from Compustat and market variables are sampled from CRSP. The six variables are defined by Beaver et al. (2005) as follows:
(a) ROA: Net income divided by total assets. This variable captures the ability of an organization to generate income from its assets. In bankruptcy prediction, it is considered a critical element for measuring the ability of an organization to repay its interest or debts.
(e) LSIGMA: The standard deviation of the residual return from a regression of twelve monthly returns of the organization on the monthly returns of the market index. This variable reflects the market perception of the organization's performance.
(f) LRSIZE: The logarithm of the ratio of the market capitalization of the organization to the market capitalization of the market index. Market capitalization is calculated as the number of shares outstanding times the stock price.
LERET and LSIGMA are computed for the 12 month period ending with the third month after fiscal year end. LRSIZE is computed as of the end of the third month after fiscal year end.
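The two market variables defined in words above can be computed as in the following sketch. The function names are hypothetical, the natural logarithm is assumed for LRSIZE, and the residual dispersion uses the ordinary sample standard deviation; the text does not specify either choice.

```python
import math
import statistics


def residual_sigma(firm_returns, market_returns):
    """LSIGMA-style measure: std. dev. of residuals from an OLS regression
    of the firm's monthly returns on the market index returns."""
    mean_m = statistics.fmean(market_returns)
    mean_f = statistics.fmean(firm_returns)
    sxx = sum((m - mean_m) ** 2 for m in market_returns)
    sxy = sum((m - mean_m) * (f - mean_f)
              for m, f in zip(market_returns, firm_returns))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_m
    residuals = [f - (intercept + slope * m)
                 for m, f in zip(market_returns, firm_returns)]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(residuals)


def relative_size(shares_outstanding, price, index_market_cap):
    """LRSIZE-style measure: log of firm market cap over index market cap."""
    return math.log(shares_outstanding * price / index_market_cap)


# Hypothetical twelve monthly returns; the firm tracks the index exactly,
# so the residual standard deviation is numerically zero.
market = [0.01 * k for k in range(12)]
firm = [0.02 + 1.5 * m for m in market]
sigma = residual_sigma(firm, market)
size = relative_size(1_000_000, 10.0, 1e10)
```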

Data Analysis
In order to establish the validity of different bankruptcy predictive models the following parameters are used in this work.
(a) $y_i$ is the status of organization $i$ in a given year, for $i = 1, \ldots, n$.
(b) $x_i$ is the $7 \times 1$ vector of independent variables for organization $i$. Its first element is 1.
(d) $\beta_j$ is the coefficient of the $j$th element of $x_i$, where $\beta_0$ is the intercept.
In the remaining subsections, analysis of bankruptcy prediction is performed using Bayesian, Hazard, Mixed Logit and Rough Bayesian models.

Bayesian Model
The Bayesian model, developed from Bayes' Theorem (Duda et al., 1973; Good, 1965), can be represented with two distinct interpretations. It is a way of understanding how the probability that a theory is true is affected by a new piece of evidence. In the Bayesian interpretation, it expresses how a subjective degree of belief should rationally change to account for evidence. It has applications in a wide variety of contexts ranging from marine biology to the development of Bayesian spam blockers for email systems. In the philosophy of science, it has been used to try to clarify the relationship between theory and evidence. Many insights in the philosophy of science involving confirmation, falsification, the relation between science and pseudoscience and other topics can be made more precise, and sometimes extended or corrected, by using Bayes' Theorem. The application of Bayes' Theorem to update beliefs is called Bayesian Inference. The Bayesian equation for this work is expressed as follows:

$$P(y_i = 1 \mid x_i) = \frac{P(x_i \mid y_i = 1)\,P(y_i = 1)}{\sum_{k \in \{0, 1\}} P(x_i \mid y_i = k)\,P(y_i = k)} \qquad (1)$$

Sarkar et al. (2001) found that the naive model, which assumes independence across predictive variables, is comparable in performance with the composite model, which assumes dependence across some or all predictive variables as shown in Equation (1). The naive model used in this work is as follows:

$$P(y_i = 1 \mid x_i) = \frac{P(y_i = 1)\prod_{j=1}^{6} P(x_{ij} \mid y_i = 1)}{\sum_{k \in \{0, 1\}} P(y_i = k)\prod_{j=1}^{6} P(x_{ij} \mid y_i = k)} \qquad (2)$$

Each variable $x_{ij}$ is continuous and must be discretized before it is given to the model. Sun et al. (2007) found that the optimal level of discretization is either 2 or 3. Here the EPT method (Sun et al., 2007) is used to discretize each input variable into 3 levels, which approximates the continuous distribution by dividing the outcome space into levels with probabilities of occurrence of 18.5%, 63% and 18.5% respectively. The two cut points for discretization are determined in each training sample and are also used to discretize the holdout sample. The bankruptcy rate of the training sample is used as the estimator of the probability of bankruptcy for an individual organization in the holdout sample.
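The discretization and the naive model described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the estimation code of this study: the cut points are taken as nearest-rank 18.5% and 81.5% quantiles of the training values, add-one smoothing is used so that unseen levels do not produce zero likelihoods, and the single-variable ROA sample is invented.

```python
from collections import Counter


def ept_cut_points(values):
    """Training-sample cut points at the 18.5% and 81.5% quantiles
    (nearest-rank rule; the exact quantile rule is an assumption)."""
    s = sorted(values)
    lower = s[int(0.185 * (len(s) - 1))]
    upper = s[int(0.815 * (len(s) - 1))]
    return lower, upper


def discretize(value, cuts):
    """Map a continuous value to level 0, 1 or 2 using the training cut points."""
    lower, upper = cuts
    if value < lower:
        return 0
    return 1 if value <= upper else 2


def fit_naive_bayes(rows, labels, levels=3):
    """rows: lists of discrete levels; labels: 1 = bankrupt, 0 = non bankrupt."""
    prior = sum(labels) / len(labels)  # training bankruptcy rate as the prior
    totals = Counter(labels)
    counts = Counter()
    for row, y in zip(rows, labels):
        for j, level in enumerate(row):
            counts[(y, j, level)] += 1

    def likelihood(y, j, level):
        # Add-one smoothing is an assumption of this sketch.
        return (counts[(y, j, level)] + 1) / (totals[y] + levels)

    return prior, likelihood


def posterior_bankrupt(model, row):
    """Naive Bayes posterior probability of bankruptcy for one discretized row."""
    prior, likelihood = model
    bankrupt, healthy = prior, 1.0 - prior
    for j, level in enumerate(row):
        bankrupt *= likelihood(1, j, level)
        healthy *= likelihood(0, j, level)
    return bankrupt / (bankrupt + healthy)


# Hypothetical training sample with a single variable (ROA).
train_roa = [-0.40, -0.25, -0.10, 0.00, 0.02, 0.04, 0.05, 0.06, 0.08, 0.10, 0.12]
train_status = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

cuts = ept_cut_points(train_roa)
train_rows = [[discretize(v, cuts)] for v in train_roa]
model = fit_naive_bayes(train_rows, train_status)

# Holdout firms are discretized with the training cut points.
low_roa = posterior_bankrupt(model, [discretize(-0.50, cuts)])
high_roa = posterior_bankrupt(model, [discretize(0.30, cuts)])
```

With six variables, each row would simply carry six discretized levels instead of one.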

Hazard Model
Hazard models are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that are associated with that quantity. In a hazard model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. For example, taking a drug may halve one's hazard rate for a stroke, or changing the material from which a manufactured component is constructed may double its hazard rate for failure. Shumway (2001) proved that the multi-period Logit model is equivalent to a discrete time hazard model. The multi-period Logit model is estimated from the data of each organization in each year as if each organization year were an independent observation. The hazard function for one observation is expressed as follows:
Here, $h_i(t)$ is the probability that organization $i$ will go bankrupt in the period from $t$ to $t + 1$. The survival function of the organization is given by:
It is the probability that an organization survives to time $t$. The Logit function is assumed to be the functional form of the hazard function, so Equation (4) can be rewritten as follows:
Again, the likelihood function is given by:
Equation (6) measures the discrete time multi-period model. $t_i$ is the last organization year of organization $i$ in our sample. The only difference from Shumway's model is that the organization's age is omitted from the independent variables in an attempt to maintain consistency of independent variables across the three models.
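A sketch of the multi-period Logit likelihood described above, in which every organization year is treated as one observation. The function names and toy covariates are hypothetical. The sketch assumes that the status indicator is 1 only in the last observed year of a bankrupt firm and that the last year of a surviving firm contributes a survival term; this follows the firm-year description in the text rather than any particular published equation.

```python
import math


def hazard(x, beta):
    """Logit hazard: probability of bankruptcy in one organization year."""
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def survival(firm_years, beta):
    """Probability of surviving every year in firm_years."""
    prob = 1.0
    for x in firm_years:
        prob *= 1.0 - hazard(x, beta)
    return prob


def firm_log_likelihood(firm_years, failed, beta):
    """firm_years: one covariate vector per year (first element 1 for the
    intercept); failed: 1 if the firm went bankrupt in its last observed year."""
    log_lik = math.log(survival(firm_years[:-1], beta))
    h_last = hazard(firm_years[-1], beta)
    log_lik += math.log(h_last) if failed else math.log(1.0 - h_last)
    return log_lik


def sample_log_likelihood(firms, beta):
    """firms: list of (firm_years, failed) pairs."""
    return sum(firm_log_likelihood(years, failed, beta) for years, failed in firms)


# Hypothetical firms with an intercept and one covariate per year.
firms = [
    ([[1, 0.10], [1, 0.00], [1, -0.30]], 1),  # bankrupt in its third year
    ([[1, 0.20], [1, 0.25]], 0),              # still alive after two years
]
ll_at_zero = sample_log_likelihood(firms, beta=[0.0, 0.0])
```

Maximizing `sample_log_likelihood` over `beta` with any numerical optimizer would give the hazard estimates; at `beta = 0` every year has hazard 0.5, so the five firm years contribute `5 * log(0.5)`.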

Mixed Logit Model
The Mixed Logit model (Johnson et al., 2007) is a general statistical model for examining discrete choices. The motivation for the Mixed Logit model arises from the limitations of the standard Logit model. Mixed Logit obviates three primary limitations of standard Logit by allowing for random taste variation, unrestricted substitution patterns and correlation in unobserved factors over time. Mixed Logit can also utilize any distribution for the random coefficients. The Mixed Logit model can approximate to any degree of accuracy any true random utility model of discrete choice, given an appropriate specification of the variables and the distribution of the coefficients. In the Mixed Logit model, the utility associated with each observation can be expressed as follows (Revelt et al., 1998):
In Equation (7), $\eta_i$ is the error component that is correlated among alternatives and heteroskedastic for each individual organization, and $\varepsilon_i$ is the random term with mean zero which is independently and identically distributed over alternatives and individual organizations. If the Logit function is assumed to be the functional form of the probability of bankruptcy, then given $\eta_i$ the conditional Logit model is identical to the traditional Logit model.

Now, 1|
Also, $\eta_i$ can be represented as:
Here, $z_i'$ is a vector of a subset or the full set of observed independent variables and $\mu_i$ is a random vector with mean zero whose density function, with parameters $\Sigma$, can be chosen as any distribution with mean zero. The term $z_i'\mu_i$ induces heteroskedasticity or correlation or both across the unobserved utility components of the alternative statuses of an observation by specifying $\Sigma$ in a desired manner. This term is known as the error components approach, which treats unobserved information as a separate error component in the random component. Also, $s_i'$ is a subset or the full set of $x_i$ and $\gamma_i$ is a vector of random parameters or coefficients. Random coefficients allow heterogeneity across individual firms with respect to their sensitivity to observed exogenous variables. Denoting the $k$th element of $\gamma_i$ as $\gamma_{ik}$, then $\gamma_{ik}$ can be further represented as: (10)
Here, $v_{ik}$ is a random variable following any distribution with mean zero and variance 1. The term $s_i'\gamma_i$ is known as the random parameter approach. In this work, we adopt the random parameter approach in accordance with Jones et al. (2004). The parameters $b_k$ and $\sigma_k$ must be estimated in addition to $\beta$. $v_{ik}$ is defined to be $N(0, 1)$. Thus, $\gamma_{ik}$ is also normally distributed with mean $b_k$ and variance $\sigma_k^2$. Then, the unconditional probability of bankruptcy can be expressed as the expected value of the indicator function:
According to the law of iterated expectations, Equation (11) can be transformed as follows:
In this work, LSIGMA and ROA are selected as the subset $s$ whose coefficients are random. The likelihood function is given by:

The estimates of the Hazard and Mixed Logit models may be inconsistent and biased because of the choice based sample (Zmijewski, 1984). He summarized three ways to estimate models with a choice based sample, viz. (a) weighted exogenous sample maximum likelihood (WESML), (b) conditional maximum likelihood (CML) and (c) full information concentrated maximum likelihood (FICML). WESML is computationally the least complex, so it is used to estimate the Hazard and Mixed Logit models in this work. The log likelihood function of the Hazard model using WESML is adjusted and is as follows:
The log likelihood function of the Mixed Logit model using WESML is adjusted and is given by:
In Equations (14) and (15), $Q_1$ is the population bankruptcy rate (0.847% as reported by Zmijewski, 1984); $Q_0 = 1 - Q_1$; $H_1$ is the sample bankruptcy rate and $H_0 = 1 - H_1$.
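The WESML adjustment can be illustrated for a plain Logit likelihood. In the usual formulation of weighted exogenous sample maximum likelihood, each observation's log likelihood term is weighted by the ratio of the population share to the sample share of its group; the sketch below applies that weighting. The function names and toy data are hypothetical, and the exact form of Equations (14) and (15) in this study may differ.

```python
import math


def wesml_weights(population_rate, sample_rate):
    """Weights for bankrupt and non bankrupt observations:
    population share divided by sample share of each group."""
    w_bankrupt = population_rate / sample_rate
    w_healthy = (1.0 - population_rate) / (1.0 - sample_rate)
    return w_bankrupt, w_healthy


def wesml_log_likelihood(xs, ys, beta, population_rate):
    """Weighted Logit log likelihood for a choice based sample."""
    sample_rate = sum(ys) / len(ys)
    w_bankrupt, w_healthy = wesml_weights(population_rate, sample_rate)
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(b * v for b, v in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-z))
        total += w_bankrupt * math.log(p) if y == 1 else w_healthy * math.log(1.0 - p)
    return total


# A matched sample (50% bankrupt) against a population rate of 0.847%.
weights = wesml_weights(0.00847, 0.5)
xs = [[1, 0.1], [1, -0.2]]
ys = [1, 0]
ll = wesml_log_likelihood(xs, ys, beta=[0.0, 0.0], population_rate=0.00847)
```

In a matched sample the bankrupt observations are heavily down-weighted (about 0.017 each) and the non bankrupt ones up-weighted (about 1.98 each), which is how the choice based bias is offset.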

Rough Bayesian Model
The Rough Bayesian model is developed using probabilistic Rough Sets and Bayesian models. We first review the basic formulations of probabilistic Rough Sets and then present the Rough Bayesian model in the following subsections. (Pawlak, 1982, 1991; Pawlak et al., 1994):

Decision Theoretic
It can be argued with certainty that any object $x$ in the positive region belongs to $C$ and that any object $x$ in the negative region does not belong to $C$. It cannot be decided with certainty whether or not an object $x$ in the remaining boundary regions belongs to $C$. The qualitative categorization in Rough Sets proposed by Pawlak may be too restrictive to be practically useful. Probabilistic Rough Sets enable some tolerance of uncertainty: Pawlak's Rough Sets are generalized by considering the degree of overlap between equivalence classes and the set to be approximated, i.e. $[x]$ and $C$ in equation system (16). The conditional probability of an object belonging to $C$ given that the object is in $[x]$, estimated using cardinalities, is given by:

$$Pr(C \mid [x]) = \frac{|C \cap [x]|}{|[x]|} \qquad (17)$$

In Equation (17), $|\cdot|$ denotes the cardinality of a set. Pawlak and Skowron (1994) suggested that this conditional probability is actually a rough membership function. According to the above definitions, the five regions can be equivalently defined as:
The equation system (18) is defined by using the two extreme values 0 and 1. These are of a qualitative nature, and the magnitude of $Pr(C \mid [x])$ is not taken into account. The main result of decision theoretic Rough Sets is parameterized probabilistic approximations. These are obtained by replacing the values 1 and 0 in equation system (18) by a pair of threshold values $\alpha$ and $\beta$ with $\beta < \alpha$. The $(\alpha, \beta)$ probabilistic positive, positive boundary, boundary, negative boundary and negative regions are defined by the following expressions:
The five probabilistic regions lead to five way decisions (Yao, 2007, 2008, 2010; Yao et al., 1992; Yao et al., 1990). We accept an object $x$ as a member of $C$ if the probability is greater than $\alpha$. We reject $x$ as a member of $C$ if the probability is less than $\beta$. We neither accept nor reject $x$ as a member of $C$ if the probability lies between $\beta$ and $\alpha$; instead we make a decision of deferment. The threshold values $\alpha$ and $\beta$ can be interpreted in terms of the cost or risk of five way classification. The values can be systematically computed by minimizing the overall risk of classification.
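The acceptance, rejection and deferment rules stated above can be sketched as follows. The sketch covers only the three decisions spelled out in the prose, not the finer five-region partition; the function names, the firm identifiers and the threshold values are hypothetical, and probabilities exactly equal to a threshold are deferred by assumption.

```python
def rough_membership(equivalence_class, target):
    """Conditional probability of the target set given the equivalence class,
    estimated from set cardinalities."""
    return len(equivalence_class & target) / len(equivalence_class)


def decide(probability, alpha, beta):
    """Accept above alpha, reject below beta, otherwise defer."""
    if not beta < alpha:
        raise ValueError("thresholds must satisfy beta < alpha")
    if probability > alpha:
        return "accept"
    if probability < beta:
        return "reject"
    return "defer"


# Hypothetical universe of firms; the target set holds the bankrupt ones.
bankrupt = {"f1", "f2", "f3"}
alpha, beta = 0.7, 0.2

all_bankrupt = decide(rough_membership({"f1", "f2", "f3"}, bankrupt), alpha, beta)
mixed = decide(rough_membership({"f1", "f2", "f9", "f10"}, bankrupt), alpha, beta)
none_bankrupt = decide(rough_membership({"f7", "f8"}, bankrupt), alpha, beta)
```

Setting `alpha = 1` and `beta = 0` with non-strict comparisons would recover the purely qualitative case in which only classes fully inside or fully outside the target set are decided.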

Classification Based on Bayes Model
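The conditional probabilities are not always directly derivable from data. In such cases alternative ways are needed to calculate their values. A commonly used method is to apply Bayes Theorem such that:

$$Pr(C \mid [x]) = \frac{Pr(C)\, Pr([x] \mid C)}{Pr([x])} \qquad (21)$$

Here, $Pr(C \mid [x])$ is the posteriori probability of class $C$ given $[x]$; $Pr(C)$ is the apriori probability of class $C$ and $Pr([x] \mid C)$ is the likelihood of $[x]$ with respect to $C$. A monotonically increasing function of the conditional probability may be used to construct an equivalent classifier. This observation can lead to significant analytical and computational simplifications. The probability $Pr([x])$ in Equation (21) can be eliminated by taking the odds form of Bayes Theorem, which, writing $C^c$ for the complement of $C$, is given as follows:

$$\frac{Pr(C \mid [x])}{Pr(C^c \mid [x])} = \frac{Pr([x] \mid C)}{Pr([x] \mid C^c)} \cdot \frac{Pr(C)}{Pr(C^c)}$$

A threshold value on the probability can be interpreted as another threshold value on the odds. For the positive region we have:

$$Pr(C \mid [x]) > \alpha \iff \frac{Pr([x] \mid C)}{Pr([x] \mid C^c)} > \frac{Pr(C^c)}{Pr(C)} \cdot \frac{\alpha}{1 - \alpha}$$

By applying logarithms to both sides of the above inequality we get:

$$\log \frac{Pr([x] \mid C)}{Pr([x] \mid C^c)} > \log \frac{Pr(C^c)}{Pr(C)} + \log \frac{\alpha}{1 - \alpha}$$

Similar expressions can be obtained for the negative and boundary regions. Thus, the five regions can be represented through thresholds on the log likelihood ratio. This interpretation simplifies calculation by eliminating $Pr([x])$. The detailed estimation of the related probabilities needs to be further addressed.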
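The equivalence between a threshold on the posterior probability and a threshold on the log likelihood ratio can be checked numerically. The sketch below assumes a two class setting in which $Pr([x])$ is obtained from the law of total probability; the prior, the two likelihoods and the threshold are hypothetical values chosen only for illustration.

```python
import math


def posterior(prior, lik_c, lik_not_c):
    """Pr(C | [x]) from Bayes Theorem for a class C and its complement."""
    evidence = prior * lik_c + (1 - prior) * lik_not_c
    return prior * lik_c / evidence


def log_odds_threshold(prob_threshold, prior):
    """Threshold on the log likelihood ratio equivalent to a threshold on Pr(C | [x])."""
    if not (0 < prob_threshold < 1 and 0 < prior < 1):
        raise ValueError("threshold and prior must lie strictly between 0 and 1")
    return math.log((1 - prior) / prior) + math.log(prob_threshold / (1 - prob_threshold))


# Hypothetical inputs: 10% prior, likelihoods of the observed description under each class.
prior = 0.1
lik_c, lik_not_c = 0.6, 0.02
alpha = 0.7

post = posterior(prior, lik_c, lik_not_c)
log_lr = math.log(lik_c / lik_not_c)

accept_by_probability = post > alpha
accept_by_log_odds = log_lr > log_odds_threshold(alpha, prior)
```

Both tests reach the same decision, but the second one never needs $Pr([x])$, which is the simplification the odds form is meant to provide.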

Bayesian Model for Estimating Probabilities
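The naive Rough Bayesian model provides a practical way to estimate the conditional probability based on naive Bayesian classification (Duda et al., 1973). The rough set model proposed by Pawlak (1982, 1991) holds information about a set of objects that are represented through an information table with a finite set of attributes. Formally, an information table can be expressed as follows:

$$S = (U, At, \{V_a \mid a \in At\}, \{I_a \mid a \in At\})$$

where $U$ is a finite nonempty set of objects, $At$ is a finite nonempty set of attributes, $V_a$ is the set of values of attribute $a$ and $I_a : U \to V_a$ is an information function. Each object $x$ is thus described by a vector of attribute values $(v_1, \ldots, v_n)$. In practice, it is difficult to analyze interactions between the components of this vector, especially when $n$ is large. A solution to this problem is to calculate the likelihood based on the naive conditional independence assumption (Duda et al., 1973; Pawlak, 1982, 1991). That is, we assume each component $v_i$ to be conditionally independent of every other component $v_j$, $\forall j \ne i$. For the Bayesian interpretation of the five regions based on equation (24), we can add the following naive conditional independence assumptions:

$$Pr([x] \mid C) = \prod_{i=1}^{n} Pr(v_i \mid C), \qquad Pr([x] \mid C^c) = \prod_{i=1}^{n} Pr(v_i \mid C^c)$$

Equations (29) and (30) can be re-expressed in terms of the log likelihood ratio as follows:

$$\log \frac{Pr([x] \mid C)}{Pr([x] \mid C^c)} = \sum_{i=1}^{n} \log \frac{Pr(v_i \mid C)}{Pr(v_i \mid C^c)}$$

Here, $Pr(C)$ and $Pr(v_i \mid C)$ can be estimated from the frequencies of training data by substituting the following values:

$$Pr(C) = \frac{|C|}{|U|}, \qquad Pr(v_i \mid C) = \frac{|m(a_i, v_i) \cap C|}{|C|}$$

Here, $m(a_i, v_i)$ is called the meaning set and is defined as $m(a_i, v_i) = \{x \in U \mid I_{a_i}(x) = v_i\}$, i.e. the set of objects whose attribute value equals $v_i$ with respect to attribute $a_i$. Similarly, $Pr(C^c)$ and $Pr(v_i \mid C^c)$ can be estimated. Equation system (25) can then be rewritten with these estimates; for the positive region, for example, the condition becomes:

$$\sum_{i=1}^{n} \log \frac{Pr(v_i \mid C)}{Pr(v_i \mid C^c)} > \log \frac{Pr(C^c)}{Pr(C)} + \log \frac{\alpha}{1 - \alpha}$$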
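The frequency based estimates under the naive conditional independence assumption can be sketched as follows. The two discrete attributes, the ten toy firms and the threshold 0.8 are hypothetical and serve only to illustrate the calculation; no smoothing is applied, so an attribute value that never occurs in one class raises an error instead of producing an undefined log ratio.

```python
import math


def split_by_class(rows, labels):
    """Separate the rows that belong to class C from those in its complement."""
    in_c = [r for r, y in zip(rows, labels) if y]
    not_c = [r for r, y in zip(rows, labels) if not y]
    return in_c, not_c


def conditional(group, index, value):
    """Relative frequency of `value` for attribute `index` within one class."""
    return sum(1 for row in group if row[index] == value) / len(group)


def log_likelihood_ratio(x, in_c, not_c):
    """Sum of per-attribute log ratios, valid under naive conditional independence."""
    total = 0.0
    for i, v in enumerate(x):
        p_c = conditional(in_c, i, v)
        p_not_c = conditional(not_c, i, v)
        if p_c == 0 or p_not_c == 0:
            raise ValueError("zero frequency: the plain estimate gives an undefined log ratio")
        total += math.log(p_c / p_not_c)
    return total


# Hypothetical training table: (leverage, profit) per firm, True = bankrupt (class C).
rows = [
    ("high", "neg"), ("high", "neg"), ("high", "pos"), ("low", "neg"),
    ("low", "pos"), ("low", "pos"), ("low", "pos"), ("high", "pos"),
    ("low", "neg"), ("low", "pos"),
]
labels = [True, True, True, True, False, False, False, False, False, False]

in_c, not_c = split_by_class(rows, labels)
prior = len(in_c) / len(rows)  # Pr(C) = |C| / |U|

alpha = 0.8  # assumed acceptance threshold
llr = log_likelihood_ratio(("high", "neg"), in_c, not_c)
threshold = math.log((1 - prior) / prior) + math.log(alpha / (1 - alpha))
accept = llr > threshold
```

For the description ("high", "neg") each attribute contributes a ratio of 4.5, so the log likelihood ratio exceeds the threshold and the firm is placed in the positive region of this toy example.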

Experimental Results
This section illustrates the results obtained for bankruptcy prediction using Bayesian, Hazard, Mixed Logit and Rough Bayesian models. A comparative analysis among the stated methods is also highlighted. All the bankruptcy models are implemented in MATLAB on a Pentium IV PC with 1 GB RAM. In the remaining subsections, we discuss the experimental results in terms of cut off point, Type I and Type II errors, optimal cut off point, randomly selected samples and samples in different business cycles and sub cycles.

There are some important methods to choose the cut off point, viz. a cut off point of 50%, suggested by Johnson et al. (2007) for cases where prior probability and costs are difficult to incorporate, and also suggested by Grice et al. (2001), Zmijewski (1984) and Ohlson (1980); an industry failure cut off rate of 0.41%, suggested by Martin (1977) assuming that the apriori probability for group membership is equal to the sample probability; and a cut off point of 3.8%, suggested by Begley et al. (1996) and obtained from Ohlson (1980), that minimizes the sum of Type I and Type II errors.
The results based on different cut off points are illustrated in Table 4. Within each model, Type I or Type II is the rate of Type I or Type II errors and Type I + Type II is the sum of Type I and Type II errors. When the cut off point is 50% and Type I error is the main criterion for validation, the Bayesian model outperforms the others. However, when the sum of Type I and Type II errors is used, Hazard, Mixed Logit and Rough Bayesian models dominate for cut off points of 0.41% and 3.8%. It is to be noted that these results are arbitrary and do not reflect the true quality of the models.
The above cut off point choices have two limitations. First, an inappropriate assumption of equal costs of misclassification exists if the sum of the two errors is used as the validation method. Altman et al. (1977) estimated that the cost of a Type I error is 35 times greater than that of a Type II error. This is because when a Type I error occurs, creditors lose their total principal, in contrast to the opportunity cost resulting from a Type II error. Secondly, the difference in the number of organizations in each group is generally ignored. The number of bankrupt organizations is far smaller than the number of non bankrupt organizations. According to Zmijewski (1984), the frequency of financial failure has not exceeded 0.75% since 1934.

Optimal Cut off Point
An appropriate cut off point which minimizes the cost of misclassification is given by the following equation:

$$\min_{c} \; TC = \sum_{i} \left[ C_{I,i}\, I(p_i < c)\, y_i + C_{II,i}\, I(p_i \ge c)\, (1 - y_i) \right]$$

Here, $TC$ is the total cost of misclassification, $C_{I,i}$ and $C_{II,i}$ are the costs of Type I and Type II error for organization $i$ respectively, $c$ is the cut off point, $p_i$ is the probability of bankruptcy for organization $i$, $y_i$ is the observed status of organization $i$ (taken here as 1 for bankrupt and 0 otherwise), and $I(\cdot)$ is an indicator function that returns 1 if its logical argument is satisfied and 0 otherwise. Weiss et al. (2004) provided a methodology to estimate the level of individual costs. However, this estimation would dramatically increase computational time. So we use a broad specification of costs as in Altman et al. (1977). Equation (36) provides the minimization of the simplified total cost.

$$\min_{c} \; C_I\, T_I(c)\, \pi_B + C_{II}\, T_{II}(c)\, \pi_{NB} \qquad (36)$$

Here, $C_I$ and $C_{II}$ are the costs of Type I and Type II errors, $T_I(c)$ and $T_{II}(c)$ are the rates of Type I and Type II error respectively, each a function of the cut off point $c$, and $\pi_B$ and $\pi_{NB}$ are the rates of bankrupt and non bankrupt firms in the sample. Equation (36) cannot be solved in closed form. Thus, the optimal cut off point is not known in advance for the holdout sample. A proxy for the optimal cut off point can be found using various methods. One technique is to use cut off points discussed in the literature. Another technique is to estimate the cut off point using the training sample. To determine whether an empirical cut off point estimated using the training sample is the best proxy in the holdout sample, the random selection sample is run 30 times.
The training sample is used to evaluate apriori probabilities and to compute an empirical cut off point. The holdout sample is used to generate predictive results. Three other cut off points are selected for comparison, viz. 0.41%, 3.8% and 50%. Paired sample t tests are conducted for each pair of cut off points within each model. The results for the various cut off points are given in Table 5. The first column contains ratios of cost of Type I to cost of Type II errors. Instead of using only one specification of Type I and Type II costs, we consider several ratio specifications as a robustness check. The results suggest that the empirical optimal cut off point calculated with the training sample is the best proxy for the true optimal cut off point for the holdout sample, as it dominates under most specifications of Type I costs over Type II costs by producing the least cost of misclassification. These results are consistent for all four models. After the estimated optimal cut off point, the industry bankruptcy rate of 0.41% is the second best cut off point. It is not appropriate to use it when the ratio of Type I cost to Type II cost is low, since most organizations will be classified as bankrupt, reducing the cost of Type I errors while simultaneously increasing the cost of Type II errors. A cut off point of 50% is generally used when equal costs of Type I and Type II errors are assumed, but the empirical results suggest that it produces a higher cost of misclassification even when the two types of costs are assumed to be equal.
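Because the simplified cost cannot be minimized in closed form, an empirical cut off point can be found by direct search over the training sample. The following is a minimal sketch under stated assumptions: costs are expressed in units of the Type II cost, so only the ratio of Type I to Type II cost enters; a firm is predicted bankrupt when its estimated probability is at least the cut off; and the probabilities and outcomes are hypothetical toy values, not results from this study.

```python
def error_rates(probs, status, cutoff):
    """Type I rate: bankrupt firms (status 1) predicted non bankrupt.
    Type II rate: non bankrupt firms (status 0) predicted bankrupt."""
    bankrupt = [p for p, y in zip(probs, status) if y == 1]
    healthy = [p for p, y in zip(probs, status) if y == 0]
    type1 = sum(p < cutoff for p in bankrupt) / len(bankrupt)
    type2 = sum(p >= cutoff for p in healthy) / len(healthy)
    return type1, type2


def total_cost(probs, status, cutoff, cost_ratio):
    """Simplified cost: ratio * TypeI(c) * share bankrupt + TypeII(c) * share non bankrupt."""
    type1, type2 = error_rates(probs, status, cutoff)
    share_bankrupt = sum(status) / len(status)
    return cost_ratio * type1 * share_bankrupt + type2 * (1 - share_bankrupt)


def empirical_cutoff(probs, status, cost_ratio):
    """Search the observed probabilities, where the step-shaped cost can change."""
    candidates = sorted(set(probs) | {1.0})
    return min(candidates, key=lambda c: total_cost(probs, status, c, cost_ratio))


# Hypothetical training and holdout samples (1 = bankrupt).
train_probs = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.40, 0.60, 0.90]
train_status = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
holdout_probs = [0.02, 0.04, 0.07, 0.30, 0.70]
holdout_status = [0, 0, 1, 0, 1]

cutoff_35 = empirical_cutoff(train_probs, train_status, cost_ratio=35)
cost_empirical = total_cost(holdout_probs, holdout_status, cutoff_35, cost_ratio=35)
cost_at_half = total_cost(holdout_probs, holdout_status, 0.5, cost_ratio=35)
```

With a cost ratio of 35 the search settles on the lowest cut off that still captures every bankrupt firm in the toy training sample, and on the toy holdout sample it yields a much lower cost than the 50% cut off.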

Randomly Selected Samples
As discussed in subsection 5.1, different choices of cut off point can lead to different conclusions about the comparative predictive power of the various models. As a result, tests should be conducted under conditions that are free of the distortion introduced by an arbitrary cut off point while maintaining the true characteristics of the models. The results in subsection 5.2 suggest that cut off points estimated from the training sample are preferable to the other specifications. We use this empirical cut off point to compare the four models. The 30 random samples in subsection 5.2 are used again. Paired sample t tests are performed on the total costs of misclassification for each pair of models. The results are presented in Table 6, which is a reproduction of Table 5 using the optimal cut off point.
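The paired comparison between two models can be sketched as follows. The statistic is the mean of the per sample cost differences divided by its standard error, to be compared with a Student t distribution with n - 1 degrees of freedom. The five cost values per model are hypothetical and stand in for the 30 random samples used in this study; the function name is an assumption.

```python
import math
import statistics


def paired_t_statistic(costs_a, costs_b):
    """Return the paired t statistic and its degrees of freedom.

    costs_a, costs_b: misclassification costs of two models on the same samples.
    """
    if len(costs_a) != len(costs_b):
        raise ValueError("both models must be evaluated on the same samples")
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)  # sample standard deviation
    return mean_diff / std_error, n - 1


# Hypothetical costs of two models on the same five random samples.
model_a = [0.30, 0.28, 0.35, 0.31, 0.29]
model_b = [0.25, 0.27, 0.30, 0.28, 0.27]

t_stat, dof = paired_t_statistic(model_a, model_b)
```

Pairing the costs by sample removes the variation that is common to both models within a sample, which is why the test is run on differences rather than on the two sets of costs separately.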

Samples in Different Business Cycles and Sub Cycles
Since random selection produces results that may be tarnished by a noisy sample, we conduct our analysis around business cycles. There are 4 business cycles in our sample. A comparative study is conducted on adjacent business cycles. Priors, estimates and the optimal cut off point are evaluated from the preceding cycle and the cost of misclassification is calculated with the subsequent cycle. Since the first business cycle is short and its sample size is fairly small, we combine the first two cycles. Tables 7 and 8 illustrate results on business cycles and on sub cycles using the same methodology.

Conclusion
The advantage of Bayesian models when the ratio of cost of Type I to cost of Type II errors is high may make them more attractive in the current economic environment, where many large organizations have come under financial distress, including cases such as Enron, Lehman and GM. These organizations were financed with a significant amount of debt, leading to a high level of Type I error cost. However, they were generally not considered to be candidates for bankruptcy. Therefore, the credit spread of such large organizations will be low, leading to a low level of Type II error cost, which is the opportunity cost for a bank issuing a loan.

This investigation leaves several issues that can be the subject of future studies. Unlike this work, where we took a methodological approach focusing on the sensitivity of the performance of Bayesian, Hazard, Mixed Logit and Rough Bayesian models, one can use an empirical approach to further analyze which model is sensitive to which factors, or which factors influence a model the most. In addition, rather than using a wide specification of cost, a model may be developed that more accurately estimates the cost of both errors at an individual level, helping to better compare bankruptcy prediction models. Finally, two major differences that characterize the United States and Canadian markets are the level of litigation risk and the standard setting approach. An interesting study from an international perspective would be to determine the differential impact of financial ratios and corporate governance quality in predicting bankruptcies.
(a) (Compustat code: net income = DATA172, total assets = DATA6)
(b) ETL: EBITDA divided by total liabilities. EBITDA is earnings before interest, tax, depreciation and amortization. It is calculated as sales minus cost of goods sold minus selling, general and administration expense. ETL is an indicator of the liquidity of an organization, i.e. its ability to generate cash in order to meet interest and principal requirements, especially in the short term. (Compustat code: sales = DATA12, COGS = DATA41, selling, general and administration expense = DATA189)
(c) LTA: Total liabilities divided by total assets, which is a measure of an organization's capital structure. (Compustat code: total liabilities = DATA181)
(d) LERET: The cumulative residual return, defined as the cumulative monthly return for the organization less the cumulative monthly return on the market index of NYSE, AMEX and NASDAQ firms. Share price return is recognized as a leading indicator in economics. A large decline in an organization's return may signal financial difficulties.

Table 2 .
Breakdown of business cycle

Table 3 .
Descriptive statistics of independent variables

Table 4 .
Predictive results with different cut off points

Table 5 .
Random selection and test results

Table 6 .
Costs of misclassification [* marks the least cost at 5% significance level (two or three * in one row indicate that they are not statistically different).]

Hazard, Mixed Logit and Rough Bayesian models are effectively no different in terms of predictive power for all specifications of Type I and Type II cost. In general, Rough Bayesian model has highest predictive power. When the ratio of cost of Type I to cost of Type II errors reaches a very high level, Bayesian model is comparable to the other two. A possible shortcoming of randomly selected samples is that ex post results may be used to explain and predict prior events, creating a noisy sample.

Table 7 .
Results estimated across business cycles [Numbers in bold are the lowest cost within four models for each ]