Use of Shape Restricted Regression Methods for Fitting Model of Per Capita GDP: A Global Economic Scenario of 2018

The aim of the study is to analyze the pattern of gross domestic product (GDP) according to human development index (HDI) for 184 countries of the world. GDP per capita indicates only economic prosperity but not the overall development of the citizens of a country. This research tries to find out the beneath relationship of the financial state and human development of countries using the data of 2018. For demonstrating this analysis several parametric and non-parametric regression methods subject to shape restriction have been used. The study targets to shed light on comparative performance of shape constrained regression with cone projection, polynomial regression, LOESS, isotonic regression with pooled adjacent violators algorithm, kernel regression, smoothing spline and generalized additive model in convex situation.


Introduction
Gross domestic product (GDP) is the market value of all final goods and services produced within a country in a specific time period. It is a globally used indicator of the economic size and growth of a country. GDP per capita is another indicator of economic prosperity per person which is calculated mainly by overall GDP by the country's total population. GDP per capita does not only determine the national wealth of the country but it is also closely related with the living standard of the citizen of that country. Economically developed countries tend to have a higher GDP per capita with smaller population and better civic facilities including mass education accessibility, medical facilities, social security etc. than the less developed countries. Human development index (HDI) is a tool annunciated yearly by United Nations Development Program (UNDP), which evaluates the actual development of the citizen of a country not only their monetary wellbeing. HDI actually assesses three aspects such as living healthy life, education attainability and an admissible standard of livelihood for capturing a crude picture of genuine development of subjects living in a particular region. Statistically, HDI is the geometric mean of normalized indices of three (health, social and economic) attributes, of which the first attribute is characterized by life expectancy, the second one is arithmetic mean of expected years of schooling for children of school entering age and average years of schooling for at least 25 years aged people and the last one is determined by a logarithmic transformation of Gross national income (GNI) per capita. According to the cutoff points referred in Human Development Report 2019, 32.6 percent of 184 countries show very high human development (HDI>= 0.8), 28.8 percent show high human development (0.7-0.8), 19.6 percent show medium development (0.55 -0.7) and 19.6 percent of those countries show low human development (HDI<0.55).
In this study this is tried to observe the relationship between the country's economic growth with the overall prosperity of its citizen's life. For serving this purpose, different statistical regression models (both parametric and non-parametric) have been studied. In section 2 theoretical background of the methods are reviewed along with their practical performance in section 3. Comparison among the models has been presented in section 4.

Data Source
To study the impact of human development index in gross domestic product per capita over the time period 2018, the country specific data values of these variables are employed in this research. The secondary data of nominal GDP per capita (in us dollar) of 184 countries have been taken from International monetary fund (IMF). Luxemburg shows highest per capita in 2018 and Burundi possesses the minimum GDP per capita, only 306.9 US dollar per year. Another key variable is human development index (HDI). This values of HDI is taken from United Nations Development Programme (UNDP)'s official database for the corresponding countries for specific year 2018. Statistical software R (version 4.0.2) is used for computational purpose.

Polynomial Regression
In presence of one predictor, holding both normality and linearity assumption, the relationship between the response and predictor can be explained by simple linear regression model such as, But when the linearity assumption violates i.e., the response and predictor does not have a linear relationship, then a nonlinear regression model can be imposed for better fit. The non-linear regression model can be written as, Here, the function ( ) can take any form and shape. If the function is replaced by a polynomial function of order p, then it is called polynomial regression model and the regression parameters can be obtained by least square method (Gergonne, 1815).

Local Regression and Loess Approach
Local regression is a different approach for fitting flexible non-linear functions involving the fit of a specific point only by its neighborhood observations. The basic concept of local regression comes from the graphical approach of Lowess (Loess). It fits weighted least square by minimizing ∑ ( − 0 − 1 ) 2 =1 at = 0 using weight ( ) and k nearest neighbor algorithm with an appropriate choice of smoothing span f. Greater value of f tends to result in more global and smoother fit. Usually quadratic function is used to fit locally with the tri-cube weight function w(x) as following, where d is the numeric distance from fitted value and given sample value and 0 ≤ ≤ 1.

Shape Restricted Regression
This is a semi parametric model estimation criteria where a constrained regression function and a vector of parameters are estimated using a single cone projection. The shape restricted regression model is used by Liao and Meyer (2014) is as follows, In a matrix form, it can be written as, = + + , where is the vector of parameter and is the covariate matrix. For k covariate, Z is full column rank × matrix and for no covariate case, it contains only a column of ones. This model provides a good picture of predictor X non-parametrically using a flexible assumption of the shape of ( ) with the presence of parametrically modelled covariate. The model accedes eight shapes of the function i.e., increasing, decreasing, convex, concave, increasing convex and increasing concave. Estimation of and is executed from a single polyhedral cone rather than using back fitting algorithm introduced by Cheng (2009).

Isotonic Regression and PAVA
When the predictor holds the assumption of non-decreasing ( ≤ +1 ) pattern, hence the function g(x) is a monotonic (isotonic) function. The isotonic regression is really helpful because of its flexibility to any functional form. Isotonic regression uses weighted least square subject to non-decreasing constraint. The most commonly used algorithm to get this regression is pooled adjacent violators algorithm (PAVA). If 1 < 2 < 3 … < , the PAVA algorithm works with starting with initial value 1 , moving to right until any pair ( , +1 ) violates monotonicity constraint ≤ +1 .
If any > +1 , then they will be replaced by ̅ = + +1 2 . Then −1 > ̅ condition will be checked. If the condition is violated, then the process goes back to previous step and performs for the rest values. Otherwise, the adjacent three values are pooled and replace them with ̅ = −1 + + +1 3 . This step is repeated until the monotonicity constraint prevails (Barlow et al., 1972).
While performing monotonic regression, this function ( ) is estimated through any unconstrained non-parametric method such as Nadaraya-Watson estimate or local linear estimate. Then inverse of the monotonic function ( ̂) can be obtained by the following formula, Here, is a function which measures the probability to find specific distant neighbors from certain values. Kernel regression is used here to estimate density. The kernel uses the bandwidth ℎ . Many options are available to choose an admissible bandwidth of a kernel density estimator such as Sheather and Jones (1991) approach, unbiased and biased cross validation approach etc.

Regression Spline With Basis Function
Regression spline is a technique which fits lower order polynomial regression models into k distinct regions of predictor. It is a very popular regression technique because of its flexibility than higher order polynomials and step functions. It is often called piecewise cubic polynomial regression as it fits third order polynomial model of the form = 0 + 1 + 2 2 + 3 3 + for every knots. For k knots, the number of associated spline coefficients is (k+p+1) as there are k+1 polynomials with p degree of polynomials having p*k constraints. Let the basis functions can be represented with 1 , 2 , … . , +3 and the model becomes = 0 + 1 1 ( ) + 2 2 ( ) + ⋯ + +3 +3 ( ) + .
It is needed to choose the appropriate basis function for a specific application among different spline basis such as, truncated power basis, B-spline, cardinal spline, penalized spline etc. (Knott, 2000).

Smoothing Spline
Smoothing spline aims to find a function which minimizes the loss function incorporating a nonnegative tuning parameter that actually accounts for the roughness of smoothing spline ranging 0 to ∞. The loss function used here is where ′′ ( ) measures the amount by which the slope of a function is changing at t (Green and Silverman, 1994). Unlike cubic splines, the main problem of smoothing spline is to fix an appropriate value of to balance between bias and variance of smoothing spline rather than knots. LOOCV (Leave One Out Cross Validation) method can be employed here to find a possible solution using the following formula, where is iid random variable with mean 0 and variance 2 (Hastie & Tibshirani, 1990). As GAM provides a linear equation, the impact of every single predictor variable in response can be identified individually. GAM with a natural spline is a simpler way to fit if an appropriate set of basis function can be chosen. On the contrary, GAM with smoothing spline includes a tuning parameter to deal with the roughness of the fitted curve which makes the computation little bit more complex. As least square estimates cannot be obtained here, backfitting method can obtain an admissible result.

Practical Performance
We use a dataset of GDP and HDI of 184 countries to demonstrate the methodology for several types of constrained regression models. In this study, it is aimed to fit a nonlinear regression model of GDP with shape constraint, assuming GDP as response variable and HDI as predictor. A scatterplot (fig 1) is drawn to present the worldwide scenario of GDP along with HDI according to continents. shows an upward convex relationship between GDP and HDI. As log transformation does not result in a linear pattern, it is wise to fit a nonlinear regression model with a shape restriction of convexity rather than a linear one. Countries are classified into four classes such as very highly developed (0.8 and above), highly developed (0.7 to 0.8), medium developed (0.55 to 0.7) and poorly developed (0.55 or less) according to their human development index which was introduced in Human Development Report 2014. Demonstration of linear regression models for each of the classes (figure 2(b)) result in poor fitting ( 2 = 0.58, 0.18, 0.000008 and 0.18) for each class and also for overall data ( 2 = 0.2). Such situation can be handled in various parametric and nonparametric way where linearity assumption is disregarded.

Polynomial Regression
Polynomial regression is parametric way to deal with intrinsically nonlinear data where least square estimation is used to predict parameters. As larger degree overfits the data, it is crucial to find the optimal choice of order of polynomial function. Polynomials of several orders have been fitted for this data where the dependent variable is GDP and the predictor is HDI. F test statistic ( = 7.92, = 0.0054) suggests polynomial of 4 th order fits better than any other order. So the non-linear effect of the GDP data can be modelled through a polynomial of degree 4 given by, = 0 + 1 + 2 2 + 3 3 + 4 4 + , = 1,2, … . , This model leads to Multiple 2 = 0.8033 (adjusted 2 =0.7987) and shows the significance of 1 , 2 , 3 and 4 coefficients. Though polynomial regression model is often inflexible to interpret, it has several enviable properties to figure out curvilinear relationship of the response and predictor parametrically.

Local Regression
For the GDP data, Loess is applied for two tuning parameter f. From figure 4 it is observed that, for f=0.8 the fitted line does not capture the total intrinsic pattern as higher value of f lead to use more of the data as training observations. For this study, the admissible choice of smoothing span is 0.6 as the residual standard error is lower for this bandwidth.
This is a non-parametric approach employing moving polynomial although parametric weighted least square is used to fit the estimate in every point of x. Figure 5 shows the Loess fit of GDP data for f= 0.6 with 95% confidence band. Figure 5. Fitting of local regression for f=0.6 with 95% confidence band

Shape Restricted Regression
Shape restricted regression is a semi parametric technique where the response is modelled non-parametrically with predictor with a clear assumption of its shape as well as modelled parametrically with one or more covariates. As our GDP data exhibits an increasing convexity, using this assumption we get the following (figure 6) estimates. As the regression is demonstrated for only one predictor with no categorical or scaled covariate, the parametric part of this regression is absent. The hypothesis testing of 0 : vs 1 : is an exact one sided test which is illustrated with 01 = 0 − 1 0 . We get the value sum of squared residuals for the linear part, 0 = 7.177 and sum of squared residuals for the shape constrained model 1 =1.366. P value (<.001) of this test justifies the use of shape constraint of the model.

Isotonic Regression and PAVA
Isotonic regression is applied to fit monotonically increasing model which has actually piecewise linear form. Pooled adjacent violators algorithm is also used as it is a general approach to deal with convex function and ties to solve isotonic problem. Estimation of isotonic regression with active set algorithm is figured out in 7(a) whereas 7(b) shows the popular PAVA algorithm employing weighted median solver.

B-Spline and Smoothing Spline
Countries are categorized into four distinct groups according to their HDI for classification. So imposing these quartile cutoff values (0.55, 0.7, 0.8) as knots we get the following model with six predictors and intercept.
= 0 + 1 1 ( ) + 2 2 ( ) + ⋯ + 6 6 ( ) Figure 9. Cubic regression spline using basis function with three knots at 0.55, 0.7, 0.8 Spline fitting to GDP data with three knots is displayed graphically in figure 9. The spline function is colored as green and 95% confidence bands are colored as red. It is evident that confidence bands in the boundary region appears narrower. This model results into adjusted 2 =0.7962 and multiple 2 =0.8029, which is a clear indication of better fit. Figure 10. Fitting of Smoothing spline by leave-one-out cross-validation (lOOCV), resulted in 6.56 df

Generalized Additive Model (GAM)
Generalized additive model (GAM) is a linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables. It is called additive model because smooth functions for separate variables are added together. For GDP data, three models are employed. The first model uses smoothing spline of predictor with 6 degrees of freedom. Model 2 uses bs=cr, which implies cubic spline basis defined by a modest sized set of knots spread evenly through the covariate values. They are penalized by the conventional integrated square second derivative cubic spline penalty. Model 3 uses REML of maximum likelihood which may be used for smoothness selection, by viewing the smooth components as random effects. Formula: Model 1: GDP ~ s (HDI, 6) Model 2: GDP ~ s (HDI, bs = "cr") Model 3: GDP ~ s (HDI), method="REML" A comparison between these three GAM models has been illustrated in table 1. Model 2 and 3 gives almost similar result with an admissible value of adjusted 2 . Model 2 is displayed in figure 11 with 95% confidence band. Model 3 shows better estimate as it resulted in lower value of Akaike information criterion. Figure 11. Fitting of generalized additive model (GAM) for model 2

Discussion
A comparative performance of the above fitted non-linear models has been presented in table 2. The performance of the models has been evaluated by some quality measures such as root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). It is apparent from the table that among several parametric and non-parametric approaches, the regression model restricted to convex shape and general additive model (GAM) using maximum likelihood estimation for smoothing parameter gives better estimation as their error measurements are lower than others.

Conclusion
The study is intended to fit a model between two key variables human development index (HDI) and gross domestic product (GDP) of worldwide countries. It is evident from the data that the countries which has low human development index implying low life expectancy, less year of schooling and lower income, has also less GDP per capita. Most of the African countries and few Asian countries belong to this class. On the contrary, most of the European and North American countries show very high human development along with a good prospect in GDP. As the data exhibits convexity pattern, several non-linear methods have been conducted for modelling purpose. Among parametric techniques, polynomial with degree 4 gives a better fit. Among non-parametric techniques, general additive model (GAM) performed well for smoothing. For semi-parametric situation, shape constrained regression cone projection works well under increasing convexity with the presence of scaled or categorical covariates.