Determinants of the Levels of Development Based on the Human Development Index: A Comparison of Regression Models for Limited Dependent Variables

This paper examines the factors affecting the level of development of countries using several regression models for limited dependent variables, namely binary logit, probit and Tobit analyses. In this manner, the paper may suggest a road map for highly developed countries to reach the very high development level. For this purpose, the Human Development Indices of 84 countries were analyzed with respect to nine independent variables. The results of all the regression models indicate that life expectancy at birth, expected years of schooling, labour force participation rate (female-male ratio), and GDP per capita have statistically significant effects on the level of development.


Introduction
In the contemporary era, the concept of development has been in greater need of analysis and clarification, and the word has come to be used extraordinarily widely in public discourse, probably more so than ever before in its history (Payne & Phillips, 2010). As the Industrial Revolution brought forth the most rapid development of the productive forces and accumulation of wealth, it is logical to examine both the material and the subjective indicators that may have been responsible for the economic expansion (Jaffee, 1998). Economists use the terms developing and transitional rather than growing to underline that the goals of these countries involve more than simply an increase in output. Thus, development refers to an increase in productive capacity and output by means of a change in the underlying institutions (Colander, 1998). Specifically, development should include increasingly effective institutions in all sectors of society, at economic, social, political, national, regional, and local levels. It should also comprise increasing participation of individuals and groups in the decisions which affect their lives economically and politically (Spalding, 1990). The development of a country is principally associated with both the improvement in basic human needs (health, sanitation, education, etc.) and the growth of per capita income (Mazumdar, 2003; Streeten, 1981), while the main objective of development is simply to create an enabling environment for a better life (UNDP, 1990).
This paper aims to determine the factors affecting the development levels of selected countries using regression models for limited dependent variables. The paper is organized as follows. Section 2 deals with the concepts of human development and the Human Development Index (HDI) in general. Section 3 reviews the existing literature on studies that evaluate the HDI data using several statistical approaches. Section 4 gives general information about the theoretical background of regression models for limited dependent variables. Section 5 introduces the methodology of the study and the data used in the analysis. Section 6 presents the results of the logit, probit, and Tobit analyses and interprets them in detail. Finally, Section 7 provides a general discussion and comparison in terms of the present evidence.

Human Development and the HDI
Human development is evidently about enlarging people's choices on the basis of shared natural resources. Since freedoms and capabilities are a more expansive notion than basic needs, human development can be understood as the expansion of people's freedoms and capabilities to lead lives that they value and have reason to value. In this sense, the human development approach is consistently concerned with making sense of the world and addressing challenges now and in the future (UNDP, 2011a). The human development approach involves two central theses about people and development: one is concerned with evaluating improvements in human lives as a distinctive development objective, and the other with what human beings can do to achieve such improvements, particularly through policy and political changes (Fukuda-Parr, 2003).
The Human Development Reports are independent publications commissioned by the United Nations Development Programme (UNDP). They are recognized as independent intellectual exercises, important tools for raising awareness about human development around the world, and pioneers of methodological innovation and development thinking (UNDP, 2013a). UNDP also introduces regional, national, and local Human Development Reports to enable the responsible people or countries to make comparisons and evaluations with respect to the corresponding evidence. The first Human Development Report was launched in 1990 for the purpose of 'putting people back at the center of the development process with respect to economic policy', while the particular objective of the Human Development Reports was to investigate the progress of the human condition, associated with the removal of disadvantages and the creation of opportunities to lead worthwhile lives (UNDP, 1990; Anand & Sen, 1997, 2000). So far, twenty-two annual reports have been published, covering various similar or different components.
As human development has the characteristics of an abstract variable, it cannot be directly observed and measured. Many researchers argue that GNP per capita cannot capture all aspects of development, even if there is a positive relationship between GNP per capita and social and human welfare, so measuring human development requires a range of socio-economic indicators. Therefore, a satisfactory measurement concept was developed by the UNDP (Nübler, 1995; McGillivray, 1991; Ivanova, Arcelus, & Srinivasan, 1999). The Human Development Index (HDI) is a composite index that measures average achievement in three basic dimensions of human development, namely a long and healthy life, knowledge, and a decent standard of living (UNDP, 2011a). The HDI was initially developed to underline that people and their capabilities, along with economic growth, should be the ultimate criteria for evaluating the development level of a country (UNDP, 2013a). Although a number of authors discuss some drawbacks of the HDI (Trabold-Nübler, 1991; Hicks, 1997; Ivanova, Arcelus, & Srinivasan, 1999; Chowdhury & Squire, 2006; Klugman, Rodríguez, & Choi, 2011), and even propose some modified indices (Paul, 1996; Sharma, 1997; Noorbakhsh, 1998; Chakravarty, 2003; Mazumdar, 2003; Grimm, et al., 2008; Harttgen & Klasen, 2012; Herrero, Martínez, & Villar, 2012; Bilbao-Ubillos, 2013; Blancard & Hoarau, 2013), one of the most significant advantages of the HDI is regarded as its ability to represent both social and economic development through a single value between 0 and 1 (UNDP, 2013b).
The HDI generally comprises three key components: longevity, knowledge, and income. Longevity is measured by life expectancy at birth, knowledge is measured by adult literacy and mean years of schooling, and income in the HDI is a proxy for the bundle of goods and services needed for the best use of human capabilities (ul Haq, 2003). To preserve the simplicity and usefulness of the HDI, the Human Development Reports present a variety of relevant information in detail and use the HDI to summarize some of the major components of human development, exhibiting an alternative emphasis to several standard measures of economic development (Anand & Sen, 2000).
The HDI is the geometric mean of normalized indices measuring achievements in each of the three key components. It is calculated in two steps: creating the dimension indices, and aggregating these subindices to produce the HDI. The first step sets minimum and maximum values to transform the indicators into indices between 0 and 1. The maximum values are the highest observed values in the time series (i.e. 1980-2011 for the Human Development Report 2011). The minimum values are set at 20 years for life expectancy, at zero years for both education variables, and at 100 dollars for per capita Gross National Income (GNI). The subindices are then calculated as follows:
$$\text{Dimension index} = \frac{\text{actual value} - \text{minimum value}}{\text{maximum value} - \text{minimum value}} \qquad (1)$$
This equation is applied to each of the two subcomponents for education, then a geometric mean of the resulting indices is created, and the equation is applied again to this geometric mean, using zero as the minimum and the highest geometric mean of the resulting indices for the time period under consideration as the maximum. This is equivalent to applying the equation directly to the geometric mean of the two subcomponents. For income, the natural logarithm of the actual, minimum, and maximum values is used to account for concavity. Consequently, the HDI is the geometric mean of the three dimension indices, as shown by the following formula (UNDP, 2011a):
$$HDI = I_{Life}^{1/3} \cdot I_{Education}^{1/3} \cdot I_{Income}^{1/3} \qquad (2)$$
Since 2010, the knowledge component of the HDI has been measured by combining the expected years of schooling for a school-age child entering school today with the mean years of prior schooling for adults aged 25 and older. The income measure has changed from purchasing-power-adjusted per capita GDP to purchasing-power-adjusted per capita GNI, which provides a more accurate economic portrait of many developing countries (UNDP, 2011b).
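The two-step calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the UNDP's own implementation: the minimum values are those stated in the text, while the maximum values in `goalposts` and the country values passed to `hdi` are hypothetical placeholders chosen only to make the example run.

```python
import math

def dimension_index(actual, minimum, maximum):
    """Min-max normalisation of an indicator onto the [0, 1] range."""
    return (actual - minimum) / (maximum - minimum)

def hdi(life_exp, mean_school, expected_school, gni_pc, goalposts):
    """Geometric mean of the life, education and income dimension indices."""
    g = goalposts
    i_life = dimension_index(life_exp, 20.0, g["life_max"])
    i_mean = dimension_index(mean_school, 0.0, g["mean_school_max"])
    i_exp = dimension_index(expected_school, 0.0, g["expected_school_max"])
    # Education: geometric mean of the two subindices, rescaled by its own maximum.
    i_edu = dimension_index(math.sqrt(i_mean * i_exp), 0.0, g["edu_combined_max"])
    # Income: natural logarithms of the actual value and of both goalposts.
    i_inc = dimension_index(math.log(gni_pc), math.log(100.0), math.log(g["gni_max"]))
    return (i_life * i_edu * i_inc) ** (1.0 / 3.0)

# Hypothetical maxima and country values, for illustration only.
goalposts = {
    "life_max": 85.0,
    "mean_school_max": 13.0,
    "expected_school_max": 18.0,
    "edu_combined_max": 0.98,
    "gni_max": 100000.0,
}
example = hdi(life_exp=75.0, mean_school=9.0, expected_school=14.0,
              gni_pc=15000.0, goalposts=goalposts)
```

Because the aggregation is a geometric mean, a low value in any one dimension pulls the index down more than it would under an arithmetic mean.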

Literature Review
A number of studies in the literature have concentrated on the statistical implementation of the HDI. Lai (2000) used the weighted principal component method on the human development indicators to measure and analyze the progress of human development in the world, and employed the main principal component to quantify the temporal changes in the human development of several selected countries. Lee, Lin, and Fang (2006) developed a fuzzy multiple objective data envelopment analysis model to assess the relative performance of countries in terms of human development by using optimal weights for the component indices of the HDI.
Antony and Rao (2007) calculated the HDI and Human Poverty Index of Indian states and, using several multivariate statistical methods, developed a composite index that is able to explain variations in poverty, health, nutritional status, and standard of living. Cherchye, Ooghe, and Van Puyenbroeck (2008) focused on robust human development rankings and showed that all proposed ranking procedures can be implemented through linear programming. Finally, they illustrated how their methodology could prove useful for assessing the robustness of human development country rankings and classifications in a statistical way.
Yang and Hu (2008) used both one- and multi-dimensional cluster analysis on China's HDI data for 1982, 1995, 1999, and 2003 to classify China's provinces into four tiers based on the three basic development aspects embedded in the HDI. Grimm et al. (2010) applied a simple approach to compute the three components and the overall HDI for quintiles of the income distribution in an empirical assessment of 32 countries, and found a strong overall negative correlation between the level of human development and inequality in human development.
Abayomi and Pizarro (2013) offered a straightforward methodology for measurement of progress, across many dimensions, using cross-national social indices, which they classified as linear combinations of multivariate country level data onto a univariate score and they suggested a Bayesian approach which yields probabilistic intervals for the point estimates of country scores.
Pinar, Stengos, and Topaloglou (2013) considered a stochastic dominance approach for measured human development such as the official equally-weighted HDI and they compared the official equally-weighted HDI to all possible indices constructed from a set of individual components to achieve the most optimistic scenario for development.
Terzi, Trezzini, and Moroni (2013) developed a partial least squares path model to investigate the causal effects between different types of institutions and human development.
Tofallis (2013) proposed a two-step automatic-democratic approach to weight setting for the new HDI, which has the properties of non-subjectivity, fairness, and convenience. The first step finds the most advantageous set of weights for each nation in turn, and the second step evaluates the associated optimal scores on the corresponding indicators to obtain a single weight set. According to the results, the highest weight was placed on the life expectancy dimension. Wu, Fan, and Pan (2013) employed a super efficiency model to evaluate the rationality of the HDI rankings of 19 OECD countries in 2009, and the empirical results showed that approximately 75% of the evaluated countries had rather different positions in the efficiency rankings and the HDI rankings.

Regression Models for Limited Dependent Variables
One of the most important developments in econometrics is the increasing use of microeconomic data on individual economic units (Chow, 1983). Qualitative response models are regression models in which the dependent variables take discrete values (Amemiya, 1985). Modeling behavior with discrete dependent variables involves two considerations: the functional form for the probabilities, and the convenient estimation technique for alternative models and data sets (Hanushek & Jackson, 1977). Many social phenomena are naturally discrete or qualitative rather than continuous or quantitative, in the sense that an event either occurs or does not occur (e.g. employed vs unemployed, married vs unmarried, guilty vs not guilty), and such binary phenomena take the form of a dichotomous indicator or dummy variable (Pampel, 2000; Allison, 1999). Empirical research in the social sciences has increasingly encouraged the use of probabilistic choice models, including the linear probability model, the logit model and the probit model (Caudill & Jackson, 1989), so many researchers in business and economics are frequently concerned with the analysis of a binary dependent variable, where 0 and 1 denote failure and success respectively. In principle, a sequence of successes and failures is considered in which the chance that a particular trial is a 1 depends on the value of one or more independent variables. The appropriate tests and estimates therefore cover both situations in which the independent variable is preassigned and situations in which the independent variables are functions of the sequence itself (Cox, 1958).
The binary response model has an S-shaped relationship between the probability of an event and the independent variables (Long, 1997). A univariate binary qualitative response model is defined as
$$P(y_i = 1) = F(x_i'\beta), \quad i = 1, 2, \ldots, n \qquad (3)$$
where $\{y_i\}$ denotes a sequence of independent binary random variables taking the value of 0 or 1, $x_i$ denotes a K-vector of known constants, $\beta$ denotes a K-vector of unknown parameters, and F denotes a certain known function (Amemiya, 1985). The value of $F(x_i'\beta)$ must lie between 0 and 1, although $x_i'\beta$ can take any value on the real line (Davidson & MacKinnon, 2004). Since the most serious difficulty with the linear probability model is that its predictions may lie outside the (0, 1) interval, alternative distributional assumptions under which all predictions lie within the appropriate interval were investigated (Pindyck & Rubinfeld, 1981). A formal probit or logit model provides estimates of probabilities, marginal effects, and other derived results based on the normal or logistic distribution (Greene, 2007). In other words, these models analyze how a series of exogenous variables influences the underlying probability (Hanushek & Jackson, 1977).

Binary Logit Model
The outcome variable in a logistic regression model is binary or dichotomous, which distinguishes logistic regression from linear regression. However, the techniques employed in linear regression analysis serve as a reference for the logistic regression approach. When the outcome variable is binary, the conditional mean of the regression equation must be formulated to be bounded between 0 and 1, and the errors follow a binomial rather than a normal distribution (Hosmer & Lemeshow, 2000). Suppose Y denotes a binary variable, X denotes a quantitative explanatory variable, and π(x) denotes the success probability for the binomial distribution. The logistic regression model then has a linear form for the logit of this probability:
$$\text{logit}[\pi(x)] = \log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x \qquad (4)$$
In this context, π(x) can be written in terms of the exponential function as follows (Agresti, 1996):
$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)} \qquad (5)$$
Many of the desirable characteristics of a linear regression model can be achieved through the logit transformation: the logit is linear in its parameters, may be continuous, and may range from -∞ to +∞, depending on the range of x (Hosmer & Lemeshow, 2000). The errors in the logit model are assumed to have a standard logistic distribution with mean zero and variance $\pi^2/3$ (Long, 1997). A dummy variable $y_i$ such that
$$y_i = \begin{cases} 1 & \text{if the } i\text{th unit chose alternative 1} \\ 0 & \text{if the } i\text{th unit did not choose alternative 1} \end{cases} \qquad (6)$$
can be introduced to form the likelihood of observing the sample. Writing $P_i$ for the probability that $y_i = 1$, the likelihood of the sample of n observations is as follows (Chow, 1983):
$$L = \prod_{i=1}^{n} P_i^{y_i} (1 - P_i)^{1 - y_i} \qquad (7)$$
The logit model is particularly convenient for explaining the odds of success or another substantive outcome, or the odds of success for one group relative to another. Odds can be defined as the ratio of the probability of one outcome to the probability of another (Powers & Xie, 2000), which provides a more sensible scale for multiplicative comparisons. In addition, because odds ratios are less sensitive to changes in the marginal frequencies than other measures of association, they are widely used measures of the relationship between two binary variables. Because the linear probability model brings about some crucial problems such as heteroscedasticity, non-normal errors, and biased predictions, the logit model can be regarded as the most popular regression model for limited dependent variables, owing to several advantages such as simpler interpretation and generalization, and desirable sampling properties (Allison, 1999).
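The logit transformation, its inverse, the sample log-likelihood, and the odds-ratio reading of a coefficient can be sketched as follows. This is a minimal single-regressor illustration written for this section; the function names are hypothetical, and the code is not the estimation routine used in the study.

```python
import math

def logistic(z):
    """Inverse logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def log_likelihood(alpha, beta, xs, ys):
    """Log of the Bernoulli sample likelihood for a one-regressor logit model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(alpha + beta * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in x."""
    return math.exp(beta)
```

Maximum likelihood estimation searches for the values of `alpha` and `beta` that make `log_likelihood` as large as possible; exponentiating the estimated slope gives the odds ratio.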

Binary Probit Model
Suppose $x_i$ denotes a vector of characteristics of the ith economic unit and of the object of choice, which serve as explanatory variables for the probability $P_i$ that the ith unit will choose alternative 1. The probit model explains this probability as
$$P_i = F(x_i'\beta) \qquad (8)$$
where β is a vector of unknown coefficients (Chow, 1983). Here F refers to the standardized cumulative normal function
$$F(x_i'\beta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x_i'\beta} e^{-s^2/2} \, ds \qquad (9)$$
where s denotes a random variable which is normally distributed with mean zero and unit variance (Pindyck & Rubinfeld, 1981). Historically, probit analysis has been widely used whenever the analyst has individual or micro data and is considering a model with a discrete dependent variable (Hanushek & Jackson, 1977). The probit analysis assumes that there is an underlying response variable $y_i^*$ defined by the regression relationship
$$y_i^* = x_i'\beta + u_i \qquad (10)$$
where $y_i^*$ is a latent variable that is unobservable in practice. The observed variable is a dummy variable y defined by
$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$
From these relations, the probability can be formulated as
$$P(y_i = 1) = P(u_i > -x_i'\beta) = 1 - F(-x_i'\beta) \qquad (12)$$
where F is the cumulative distribution function of u, and hence the likelihood function is as follows (Maddala, 1983):
$$L = \prod_{y_i = 0} F(-x_i'\beta) \prod_{y_i = 1} \left[1 - F(-x_i'\beta)\right] \qquad (13)$$
The probit model assumes that the underlying probability function is normal rather than logistic, and the derivation of its maximum likelihood estimator exactly parallels that of the maximum likelihood logistic estimator (Hanushek & Jackson, 1977).
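The probit likelihood can be sketched with the standard library alone, using the error function to obtain the standard normal distribution function. The sketch assumes a list of regressor vectors `X` (with a leading 1 for the constant) and a list of 0/1 outcomes `y`; these names and the function names are illustrative, not taken from the study.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_log_likelihood(beta, X, y):
    """Log-likelihood of a probit model for coefficient vector beta."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        index = sum(b * v for b, v in zip(beta, x_i))
        # By symmetry of the normal distribution, 1 - F(-x'beta) = F(x'beta).
        p = norm_cdf(index)
        total += math.log(p) if y_i == 1 else math.log(1.0 - p)
    return total
```

Replacing `norm_cdf` with the logistic function would turn the same routine into the logit log-likelihood, which is the sense in which the two estimators parallel each other.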

Tobit Model
The Tobit model is a censored or truncated regression model in which the range of the dependent variable is constrained. The model is called truncated if the observations outside a specific range are totally lost, and censored if at least the independent variables are observed (Amemiya, 1985). When the latent variable $y_i^*$ is observed if it is greater than zero and is not observed if it is less than or equal to zero, the observed variable $y_i$ is represented by the following formula:
$$y_i = \begin{cases} y_i^* = x_i'\beta + u_i & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases} \qquad u_i \sim IN(0, \sigma^2) \qquad (14)$$
This is known as the Tobit model, which was initially developed by Tobin (1958). It can also be called the censored normal regression model, because some observations on the latent variable are censored (Maddala, 2001). Censoring means that the independent variables are observed for the entire sample, while only limited information about the dependent variable is available for some observations. The Tobit model uses all of the information and provides consistent estimates of the parameters (Long, 1997). The likelihood function of the standard Tobit model is
$$L = \prod_{y_i = 0} \left[1 - \Phi(x_i'\beta / \sigma)\right] \prod_{y_i > 0} \sigma^{-1} \phi\left[(y_i - x_i'\beta) / \sigma\right] \qquad (15)$$
where Φ and φ denote the distribution and density functions of the standard normal variable, respectively (Amemiya, 1985).
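The two parts of the Tobit likelihood, one for censored and one for uncensored observations, can be sketched as below. The `limit` argument is an assumption added here so that the same routine covers a censoring point other than zero; with the default of zero it reduces to the standard model described above. All names are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_log_likelihood(beta, sigma, X, y, limit=0.0):
    """Log-likelihood of a left-censored normal regression model."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        if y_i <= limit:
            # Censored: only the probability of falling at or below the limit is known.
            total += math.log(norm_cdf((limit - xb) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed value.
            total += math.log(norm_pdf((y_i - xb) / sigma) / sigma)
    return total
```

With `limit=0.0` the censored term equals log[1 − Φ(x'β/σ)], because Φ(−z) = 1 − Φ(z).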

Methodology
The 2011 HDI is divided into four quartiles, from "very high" to "low" human development achievement, as introduced in the 2010 HDI. The new classification improved on the functionality of the earlier HDI cut-off points and also reduced the amount of variation within each group (UNDP, 2011b). Within the scope of this classification, the study comparatively performs binary logit, probit, and Tobit analyses for 84 countries that have "very high" and "high" HDI values. In particular, HDI values below the censoring point of 0.783 were treated as censored in the Tobit analysis. Although there are many factors which may affect the level of development of countries, this study classifies these factors into five main groups: health, education, gender, environment, and economics-population. Furthermore, the study seeks the statistically significant factors by experimenting with several combinations of variables. As a result, four variables were statistically significant in all regression models: life expectancy at birth, expected years of schooling, labour participation rate, and GDP per capita (PPP$).
Proportion of seats held by women in a lower or single house or an upper house or senate, expressed as a percentage of total seats
Labour participation rate (female-male ratio): Ratio of female to male of the working age population (ages 15-64) that actively engages in the labour market, by either working or actively looking for work
Urban population: Particulate matter concentrations in terms of fine suspended particulates of human-made or natural origin less than 10 microns in diameter that are capable of penetrating deep into the respiratory tract
GDP per capita (PPP$): Gross domestic product (GDP) expressed in purchasing power parity (PPP) international dollar terms, divided by midyear population
Source: Adapted from "UNDP Human Development Report 2011 - Sustainability and Equity: A Better Future for All"

Binary Logit Analysis Results
Table 2 exhibits the results of the logit analysis for the selected variables. The log-likelihood value of the logit analysis was -14.504 for 84 observations. For models estimated by maximum likelihood, the likelihood ratio test is widely used in place of the F-test. The likelihood ratio of the logit model was 87.01 and it was statistically significant (p < 0.05), which means that the four independent variables jointly explain the outcome. The odds ratio (OR) values enable the analyst to interpret the relationship between the dependent and the independent variables. An OR greater than 1 implies that an increase in the independent variable raises the odds of the event occurring, whereas an OR less than 1 implies that it lowers them. In this context, the logit analysis estimates that a one-unit increase in life expectancy at birth multiplies the odds of a country having a very high rather than a high HDI by nearly 1.5 (OR = 1.57; 95% CI = 1.04-2.39). Expected years of schooling had the largest impact on the level of development of the selected countries: a one-unit increase in this variable multiplies the odds of having a very high rather than a high HDI by approximately 3 (OR = 3.21; 95% CI = 1.39-7.44). Similarly, labour participation rate (OR = 1.08; 95% CI = 1.00-1.16) and GDP per capita (OR = 1.21; 95% CI = 1.05-1.41) also increase the odds of a country having a very high HDI value. Moreover, the goodness-of-fit of the logit model is examined in Table 3.
The pseudo-R² value of the logit model, which is equivalent to McFadden's R², was 0.75. This value implies that the independent variables are able to explain about 75 percent of the relevant changes in the dependent variable. Because pseudo-R² tends to take relatively lower values in regression models for limited dependent variables than R² does in classical regression models, the pseudo-R² value of the logit model in this study can be considered quite acceptable. In addition, the very low value of the Akaike Information Criterion (AIC) and the negative value of the Bayesian Information Criterion (BIC) indicate that the goodness-of-fit of the logit model was satisfactory. A marginal effect expresses the rate of change in one quantity relative to another, that is, the change in the dependent variable per unit change in the independent variable. Since the logit and probit models are linear in the parameters, a unit change in $x_k$ produces a $\beta_k$ change in the latent variable y* (Powers & Xie, 2000). Table 4 presents the marginal effects of the four independent variables for the logit analysis.
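The reported fit statistics are linked by simple arithmetic, which the short check below reproduces. It assumes the usual definitions, LR = 2(lnL_model − lnL_null) and McFadden's R² = 1 − lnL_model / lnL_null; the null log-likelihood is not reported in the text and is backed out here from the reported log-likelihood and likelihood ratio.

```python
def null_log_likelihood(ll_model, lr_stat):
    """Log-likelihood of the constant-only model implied by the LR statistic."""
    return ll_model - lr_stat / 2.0

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R-squared."""
    return 1.0 - ll_model / ll_null

# Values reported in the text for the logit model.
ll_logit = -14.504
lr_logit = 87.01

ll_null = null_log_likelihood(ll_logit, lr_logit)
r2_logit = mcfadden_r2(ll_logit, ll_null)
```

Under these definitions the implied pseudo-R² rounds to 0.75, in line with the value reported for the logit model.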
More specifically, a one-unit change in life expectancy at birth increases the probability of having a very high HDI value by about 6 percentage points (dy/dx = 0.06; 95% CI = -0.005-0.112). Similarly, expected years of schooling had the largest impact (dy/dx = 0.15; 95% CI = -0.003-0.298) on the probability of a very high HDI value, where a one-unit change in this independent variable increases the probability by 15 percentage points. Finally, a one-unit change in labour participation rate (dy/dx = 0.09; 95% CI = -0.002-0.020) and in GDP per capita (dy/dx = 0.03; 95% CI = 0.005-0.044) increases the corresponding probability by 9 and 3 percentage points, respectively.

Binary Probit Analysis Results
The impact of the four independent variables was also analyzed using the binary probit model. As shown in Table 5, the log-likelihood of the probit model was -14.293 with 84 observations. The likelihood ratio of the probit analysis was 87.43, which was statistically significant (p < 0.05). The pseudo-R² value of the probit model was 0.7536. This value implies that the independent variables are able to explain approximately 75 percent of the relevant changes in the dependent variable. Table 5 also shows that all four independent variables have increasing impacts on the HDI values of the selected countries.
Table 6 presents the goodness-of-fit test for the probit model. The pseudo-R² value of the probit model was very similar to McFadden's R² value, which was 0.754. In addition, the very low value of the Akaike Information Criterion (AIC) and the negative value of the Bayesian Information Criterion (BIC) indicate that the goodness-of-fit of the probit model was satisfactory.
Table 7 presents the marginal effects of the four independent variables for the probit analysis. In particular, a one-unit change in life expectancy at birth increases the probability of having a very high HDI value by about 7 percentage points (dy/dx = 0.07; 95% CI = 0.009-0.121). Furthermore, expected years of schooling had the largest impact (dy/dx = 0.17; 95% CI = 0.028-0.305) on the probability of a very high HDI value, where a one-unit change in this independent variable increases the probability by 17 percentage points. Finally, a one-unit change in labour participation rate (dy/dx = 0.01; 95% CI = 0.01-0.021) and in GDP per capita (dy/dx = 0.03; 95% CI = 0.011-0.045) increases the corresponding probability by 1 and 3 percentage points, respectively.
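Marginal effects in these models depend on where they are evaluated, which is why they differ from the coefficients themselves. The sketch below uses the textbook expressions for a continuous regressor, β·p(1 − p) for the logit and β·φ(x'β) for the probit; the evaluation point is hypothetical, and the sketch does not reproduce the values in Tables 4 and 7, whose evaluation point is not stated in the text.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(beta_k, p):
    """Change in P(y = 1) per unit change in x_k at predicted probability p."""
    return beta_k * p * (1.0 - p)

def probit_marginal_effect(beta_k, index):
    """Change in P(y = 1) per unit change in x_k at linear index x'beta."""
    return beta_k * math.exp(-0.5 * index ** 2) / math.sqrt(2.0 * math.pi)

# Logit coefficient implied by the reported odds ratio for expected years of schooling.
beta_schooling = math.log(3.21)
# Hypothetical evaluation point: a country with predicted probability 0.5.
effect_at_half = logit_marginal_effect(beta_schooling, 0.5)
```

The effect is largest where the predicted probability is 0.5 and shrinks toward zero as the probability approaches 0 or 1, so an average or at-the-mean marginal effect is typically smaller than this upper bound.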

Tobit Analysis Results
Table 8 presents the results of the Tobit analysis with respect to the four independent variables and the HDI values. Countries with HDI values below 0.783 were considered high human development countries, and those with HDI values above 0.783 were considered very high human development countries. Accordingly, 39 countries were left-censored and 45 countries were analyzed as uncensored observations. The coefficients of the independent variables can be interpreted as their impact on the latent variable y*. As shown in Table 8, all independent variables were statistically significant (p < 0.05) and had increasing impacts on the dependent variable; however, in contrast to the logit and probit models, the constant term was not statistically significant. The likelihood ratio (LR χ²(4) = 164.79) was also statistically significant (p < 0.05), so the independent variables jointly explain the Tobit model. In practice, pseudo-R² values can be ignored for Tobit models, since they are generally inadequate for interpretation.
Finally, Table 9 exhibits the marginal effects of the four independent variables for the Tobit analysis. In particular, a one-unit change in expected years of schooling increases the probability of having a very high HDI value by nearly 1.5% (dy/dx = 0.0145; 95% CI = 0.0103-0.0188). The other independent variables had small impacts, each under 1%.

Conclusion & Discussion
This paper estimated logit, probit, and Tobit models to determine the factors affecting the level of development of 84 countries which had high and very high HDI values in the Human Development Report 2011. The marginal effects of all three models were also examined to illustrate the results in detail. The results suggested that life expectancy at birth, expected years of schooling, labour participation rate, and GDP per capita had a statistically significant impact on the level of development of the 84 countries analyzed. In all three models, expected years of schooling had the largest impact on the probability of a country having a very high HDI value. Although the values were very similar across models, the logit model appeared to be the most convenient for determining the impact of the four determinants, because its results can be interpreted through odds ratios. Countries with high HDI values may design their policies around all four determinants, but should concentrate especially on the education dimension to reach the standards of the top countries.

Table 1. Dependent and independent variables used in the study

Table 2. Binary logit analysis results

Table 3. Goodness-of-fit test for binary logit model

Table 4. Marginal effects for binary logit analysis

Table 5. Binary probit analysis results

Table 6. Goodness-of-fit test for binary probit model

Table 7. Marginal effects for binary probit analysis

Table 8. Tobit analysis results

Table 9. Marginal effects for Tobit analysis