The Small Area Estimation of Economic Security: A Proposal

,


Introduction
The objective of this work is to propose a strategy to estimate an indicator of economic security for small areas, defined as geographical areas or domains for which the sample size is too low to obtain reliable estimates. The problem of small area estimation is particularly relevant when a sample survey is planned to provide reliable estimates at a specific geographical level or for specific domains, and reliable estimates for a more detailed level are demanded for some reasons. For example, the availability of reliable information at local level, may help to plan policies to improve households' well-being and reduce territorial economic inequality. This issue is particularly relevant for Italy, whose economic system is characterized by a strong territorial concentration of productive activities, with consequent effects on the territorial distribution of households' richness.
The major contributions to the small area estimation issue are, among others, those from Rao (2003), Rao and Molina (2015), Jiang and Lahiri (2006) and Pfeffermann (2013). The advantage of the small area methods suggested by these authors is that they allow to improve the reliability of "direct estimates" obtained for small area, where direct estimates are estimates obtained simply by using survey weights (for example using the Horvitz-Thompson estimator). The reliability of estimates is improved: i) by borrowing information from auxiliary variables available for the population from the Census or administrative archives; ii) linking the small areas through a model. Applied contributions to the small area estimation problem have so far focused mainly on poverty and inequality, but we think that economic insecurity/security represents a different concept, which also deserves to be investigated.
In this work we focus on economic security, the counterpart of economic insecurity. Economic security is a complex expression that carries a variety of meanings. Although there is no formal unambiguous definition, it is possible to provide a general definition by characterising it as a condition of well-being and wealth. It has strongly influenced the institutional political debate of the last decade in the West, especially since the financial crisis of 2008 (D'Ambrosio and Rohde, 2014) but even more in the last years after the economic consequences due to the pandemic period (Dvoryadkina et al., 2021). crisis. We consider the weighted average of the scores obtained for individuals to obtain an aggregate indicator at area level.
Moreover, there is no convergence among researchers on what is the most suitable outcome to study economic security. For example, some outcomes very commonly considered in the literature are wealth, income, consumption, but also employment, activity rate, or even the comparison of the same indicators of poverty between countries with different levels of well-being (Osberg and Sharpe, 2014).
In this work we consider small area models specified at area level. The basic small area model specified at area level was proposed by Fay and Herriot (1979). Given the nature of the indicator, we propose to consider some longitudinal extensions of the Fay-Herriot model. These longitudinal models include time-specific random effects considering autoregressive processes of order 1 (AR1; see Esteban et al., 2012) and moving average of order 1 (MA1; see Esteban et al., 2016).
We carry out a design-based simulation study to investigate the design-based properties of the estimates obtained from the small area models considered. To this purpose we use longitudinal data taken from the European Union Statistics on Income and Living Conditions (EU-SILC) referred to Italy. The results show a significant improvement in small area estimates obtained from the small area models suggested, especially in the case of MA1 model. The rest of the paper is structured as follows. Section 2 presents the economic insecurity indicators employed in this work. In Sections 3 small area models considered are introduced together with the Bootstrap method used to estimates the variance of direct estimates. In Section 4 the simulation study is described, and results are presented, and Section 5 concludes. D'Ambrosio (2013, 2019) use an axiomatic approach to derive the economic insecurity index. This approach is shared with several other contributions aimed at proposing indexes in economics and social sciences. Some proprieties that a measure in this context should satisfy are defined and then the class of measure that can respect these axioms is develop (Thompson, 2001).

The Economic Insecurity Index
Given a R (T) a ( + 1)-dimensional Euclidean space and the vector ( , … ,0), where T is the number of past periods (lags) considered and 0 represents the current period, the economic insecurity index is defined as a function = 〈 〉 ∈ that for every ∈ ℕ, gives : ℝ ( ) → ℝ. Applying this function to the stream vector of an outcome (for example income or wealth) available for an individual, ( 0 , 1 , … , ) ∈ ℝ ( ) , a score representing the individual level of economic insecurity is obtained.
The following desirable axioms are satisfied by the index suggested by Bossert and D'Ambrosio (2019): Gain Loss monotonicity; Proximity monotonicity; Resources-variation monotonicity; Homogeneity; Translation invariance; Quasilinearity; Stationarity; Loss Priority. Gain Loss monotonicity guarantees that a lower level of insecurity is related to a stream with a gain in the earliest period compared to a stream without changes. Proximity monotonicity guarantees that a loss (or a gain) in the past has less weight than a loss (or a gain) in the present. Resources-variation monotonicity represents an extension of the two previous properties and established that a movement of the stream of the type "first down and then up", which means a loss and then straight forward a gain, produces a decrease in the insecurity score, while an increment in the score is given by a "first up then down movement". The "Homogeneity" property assures that, when the outcome stream is multiplied by a constant, then the score resulting from the index is also multiplied by the same constant. The "Translation Invariance" property assures that if the same amount is added to each outcome value of the stream, the resulting insecurity score remains the same. According to the "Quasilinearity" property, the insecurity score I T (w) corresponding to a stream ∈ ℝ ( ) can be expressed as a function of the differences − −1 observed for the − 1 most recent outcome levels. The "Stationarity" propriety guarantees that if two streams, p and q, are shifted of r periods in the past and s is assigned as outcome levels in the additional periods, the insecurity comparison associated with the two streams remains unchanged. Finally, the "Loss Priority" property establishes that, ceteris paribus, a loss has a stronger impact on individual insecurity than a gain of the same magnitude, in the same period.
The insecurity index that satisfies all these proprieties (Bossert and D'Ambrosio, 2019;Bossert et al., 2022) may be written as: where value denotes the outcome level at time t. 0 , 0 ∈ ℝ ++ and ∈ (0, 0 0 ⁄ ), such that 0 > 0 , for all ∈ ℕ and for all ∈ ℝ ( ) . Coefficients 0 and 0 represent, respectively, the coefficient for the losses and the coefficient for the gains, chosen such as 0 > 0 (this allows the "Loss priority" propriety to be satisfied). represents a discount factor which is set equal for gains and losses. A higher value for the discount factor ensures a higher weight attached to the past and vice-versa.
The index is structured in two parts: the first summation captures all the periods where a loss has occurred, while the second summation regards all the periods where a gain has occurred. We choose to use for the coefficients involved in (1) the same values suggested in Bossert et al. (2022), 0 = 1, = 15 16 ⁄ and = 0.9, although the authors note that the results are quite robust with respect to different choices for these values. Moreover, in this work we choose the household equivalized income as individual outcome for the calculation of the index and, following Hacker et al. (2010), we choose to consider a security index and we take the negative value of (1).

The Small Area Models Considered
Area level models consist of two models: a model linking the small area direct estimates to the underling parameters and a model linking the underling parameters to some auxiliary variables known for the population at small area level. Auxiliary information is known from the Census and/or administrative archives, and therefore it is free of sampling error.
The first area level model proposed in literature is the Fay-Herriot model (Fay and Herriot, 1979). This model represents the cornerstone of small area estimation and consists of two equations. The "sampling models" links the direct estimate to the underlying parameter: where denotes the direct estimate of small area d, is the parameter of interest (in our case the average of the economic security indicator) in area d, and represents the sampling errors independently distributed as The "linking model" links the underlying parameter to some auxiliary information known at population level: contains p auxiliary variables available from the Census or administrative archives for all the small areas, is a vector of p regression coefficients, and are model errors, assumed to be independent and identically distributed (i.i.d.) from N(0, 2 ), with variance 2 unknown. Errors are assumed to be independent from sampling errors .
The variances of direct estimates, 2 , are usually assumed to be known, and substituted with their sample estimates. In our case they are estimated using the bootstrap method and then smoothed using a Generalized Variance Function model, as described in Section 3.2.
Merging the sampling and the linking models, the extended form of the basic Fay-Herriot model is obtained: Small area estimates are obtained through the Empirical Best Linear Unbiased Prediction (EBLUP), which is a weighted combination of the direct estimator and the domain specific regression estimator (Prasad andRao, 1990 andDatta andLahiri, 2000). The BLUP is given by: EBLUP is simply obtained from BLUP by substituting 2 with the corresponding estimate, that can be obtained using moments, ML or REML method.
The MSE of EBLUP is obtained as follows: where: The estimator for (̃) depends on the method used to estimate 2 . If it is estimated using moments or REML method, under regularity conditions it reduces to: where and ̅ (̂2) is the asymptotic variance of the ̂2 estimator (Rao, 2003, Chapter 7).
Many extensions of the classical Fay-Herriot model (FH) have been developed over time. Considering the longitudinal nature of our indicator, we consider two extensions that take advantage of the availability of more waves, by adding time varying effects in the model and specifying an AR(1) or MA(1) process for them. We will call them AR1 and MA1 model respectively. The idea behind these two models is that the reliability of results can be improved by borrowing information both from space and time, using simultaneously time-varying effects and random effects. Therefore, to estimate the parameter for the last wave, previous information can be used through a longitudinal model. Small area AR1 model (Rao and Yu, 1994;Esteban et al., 2012) can be written as follows: where is the direct estimate of the parameter for area d and time t, is a p auxiliary variables vector for time t, is the vector of regression coefficients, 1, are area specific effects constant over time assumed i.i.d. (0, 1 2 ), 2, are time varying effects with common variance 2 2 and following an AR(1) process with parameter , and are the sampling errors assumed to be independently distributed as (0, 2 ).
Small area MA1 model may be written using formulas already seen for model AR1, apart for the time correlation matrix Ω( ) that is substituted by Ω( ) = 1≤ ≤ (Ω ( )), where represents the parameter of the MA(1) process assumed for time varying effects. Matrix Ω ( ) is given by: EBLUP for the small area population mean and its MSE may be obtained as in equation (17) and (18) In this work, Fay-Herriot, AR1 and MA1 models are estimated using R packages "sae" and "saery", and unknown parameters are estimated using REML method.

Bootstrap Variance Estimation
As it is customary, we assume that the variances of the direct estimates are known, and we substitute them with their respective estimates. Given the complex structure of the indicator considered, the estimates of these variances are obtained using a Bootstrap strategy, that is by repeatedly selecting B random samples with replacement from the survey sample by small area, calculating the weighting average of the economic security index for each replication by small area, and then calculating the variance of these estimates by small area. In this work we set B=1,000.
Nevertheless, Bootstrap variances calculated for small areas may be highly instable estimates, given the small number of sampling units available for small areas. For this reason, these variances are usually smoothed using a model. The procedure we used to this purpose belongs to the Generalized Variance Function approach (GVF) (Wolter, 2007). An important aspect to emphasize is that, although this methodology is widely used for the estimation of variances in household surveys, it is primarily a practical methodology and, overall, a general theory supporting the use of one specific model over another has not been developed yet. GVF allows to select a model that enables to link the direct estimates to the estimates of their variances.

Data Used
A simulation is performed to understand the properties of the small area estimators suggested, considering their application to income survey data. A design-based simulation is chosen. The advantage of design-based simulation is that the estimator properties are evaluated under the randomised distribution, i.e. the distribution over all possible samples that could be selected from the population of interest under the sampling design. This allows to have a more realistic view of the small area estimation problem considered. In contrast, using model-based methods, inference is made with respect to the underlying models, that are always approximations. On the other hand, design-based simulations allow the robustness of model-based estimation methods to be assessed against misspecification, by repeatedly sampling from a realistic population. The advantages between choosing a design-based or model-based simulation have been widely discussed, see for example Salvati at al. (2010), Pfeffermann (2013) and Warnholz and Schmid (2016).
In this simulation study, EU-SILC survey data are used. EU-SILC represents the main source of data for the periodic reports of the European Union on the socio-economic conditions and the spread of poverty in the member countries, with indicators focused on income and social exclusion, in a multidimensional approach to the problem of poverty and material deprivation. The sampling strategy adopted in EU-SILC is complex. The sample consists in a rotating panel in which in each successive year of the survey only a portion of the sample is replaced, so that each unit is expected to participate in the survey for 4 years. The sample of households that are introduced in each successive year is selected according to a two-stage stratified sampling scheme, a sampling strategy often used for household surveys. In the case of Italy, the first-stage units are municipalities, stratified by region and population size. In the second stage, households are randomly selected from the municipalities selected in the first stage.
In this simulation the sample is used as the synthetic population, and subdivided into 20 regions that represent the small areas of reference. Due to the lack of information on the first-stage sampling and on the region (NUTS2), we subdivide households within each territorial repartition (NUTS1) into a number of clusters equal to the number of regions in the territorial repartition (from a minimum of 2 regions to a maximum of 6 regions per territorial repartition). Clusters are defined according to the results of a k-means non-hierarchical cluster analysis carried out in each territorial repartition by considering two variables, the equivalized tax on income and social insurance contributions and the equivalized regular tax on wealth. The reason for this choice is that in Italy taxes have a small regional component. In the algorithm, the number of cluster is set equal to the number of regions in the territorial repartition. Clusters obtained are than ranked according to the average of the equivalized taxes, and matched with the regions having the same rank within the territorial repartition according to individual taxed paid on average in the population. We consider this approximation acceptable in the context of a simulation study. The resulting 20 clusters are considered target small areas from which to select random samples.

Results
The size of samples repeatedly selected from the simulated population ranges from a minimum of 18 households in the smallest area to a maximum of 261 households in the largest one. 81 households are selected on average from the areas.
For each random sample of households and for each year, we calculate the weighted average (Horvitz-Thompson estimator) of the security scores obtained for the individuals in the households selected within each area. These averages represent the direct estimates. Then the Bootstrap technique and GVF procedure discussed in Section 2 are applied to estimate the variance of direct estimates. In particular, we use as inputs for the GVF procedure the weighted average of the economic security scores calculated for small area (direct estimates) and their respective Bootstrap variances. To select a unique function for every simulated sample, we apply the GVF procedure to the average of the direct area estimates and the average of the Bootstrap area variances obtained for the 1,000 simulated samples. Then we apply the same function to smooth the variances of direct estimates for all simulated samples.
After comparing several specifications for the smoothing model, we select the following model on the basis of AIC and BIC criteria: ( 2 ( )) = 0 + 1 ( ) To compute this model, we used the ReGenesees package from R, which includes computational algorithms for GVF. Through this model we find the parameters we need to predict the estimated coefficients of variation that will allow us to obtain the smoothed variances, that is: is the variance of the model. In the linking model of the small area models we include auxiliary variables calculated from data related to the populations of the Italian regions available from the National Statistical Institute and from fiscal archives. The model should be parsimonious, but in order to better explain the dependent variable, a number of regressors have to be selected in this case. This is also due to the nature of our indicator, which has a very high variability. The following auxiliary variables are chosen, by fitting a regression model for the averages of the regional direct estimates obtained from the simulated sample, and using a stepwise regression: average income from building (earned for a building), average amount of retirement income, average individual income (calculated on the number of income earners), proportion of the working-age population, proportion of the working-age population calculated for foreigners, percentage of graduates, the ratio between the number of individuals aged 0-14 year-olds and the population between 15 and 64 years old.
The EBLUPs derived from Fay-Herriot, AR1 and MA1 models are obtained using the methodologies seen in Section 3. The following performance measures are calculated to compare the performance of the small area estimators proposed: where ARB means average absolute relative bias and it is a measure of the bias of an estimator (see Rao, 2003). In this formula d denotes the area, while s = 1, ...,1000 denotes the sample, ̂ is the estimate for the d-th area and the s-th sample (Direct, Fay-Herriot, AR1 or MA1) and Y d is the parameter in population for the d-th domain. Then we measure the accuracy of estimates considering: where AMSE is the average mean-squared error of an estimator (Direct, Fay-Herriot, AR1 or MA1). Finally: where St denotes the small area estimator (FH, AR1 or MA1), measures the gain in efficiency provided on average by the small area estimator. AMSE and AEFF measure the accuracy of an estimator in terms of its mean square error.
Results are reported in Table 1. They clearly show the gain in efficiency provided by the small area estimators. MA1 model performs better than both FH and AR1 models in terms of bias. The ARB of the direct estimator is very close to zero, as expected. On the other hand, the ARB of the Fay-Herriot model is slightly lower than that of the AR1, but higher than that of the MA1, therefore MA1 must be preferred in terms of bias.
All small area models provide significantly lower value for AMSE than the direct estimator. The most efficient estimates are produced by MA1 model, that provides a gain in efficiency of 130% with respect to the direct estimator. It is followed by AR1 and then by FH model. It is possible to note that, even though AR1 model provides more biased estimates than FH model, it overall provides more efficient estimates than FH model. Therefore, the simulation highlights that the best model in our case is MA1, but AR1 also appears to provide an overall efficiency gain compared to the FH, although it results more biased. We focus on the best performing small area model, MA1, and we carry out a graphical analysis to understand the properties of this estimator better. Figure 1 highlights that as the sample size increases, the absolute relative bias decreases. It can be noticed that the highest values for the bias correspond to the areas with the lowest sample size, and that the bias decreases as the sample size increases. This highlights that the small area estimator based on the MA1 model tends to be asymptotically design-unbiased and consistent. In Figure 2 the square root of the ratio between MSE(Dir) and MSE(MA1) is plotted against the sample size, to show the efficiency gain provided by the small area estimator with respect to the direct estimator for different sample sizes. This gain seems to be particularly pronounced for those areas with low values of the sample size, although, in general, even in the largest areas, MA1 model provides a noticeable, although slightly less pronounced, gain in efficiency. Finally, Figure 3 compares the square root of the MSE of MA1 estimates with that of direct estimates. All observations are above the diagonal, thus highlighting that MA1 estimates are more reliable than direct estimates in all small areas.

Conclusions
In this work a strategy for the small area estimation of economic security is proposed. To this purpose an indicator obtained by summing absolute differences between levels of equivalent household income in consecutive years, is applied to data taken from EU-SILC sample survey carried out for Italy from 2014 and 2016. The target parameter is the small area average of the individual economic security score. To improve the reliability of direct estimates that can be obtained for small areas, small area models specified at area level are considered. This indicator has been used here for the first time in a small area estimation context, whereas poverty and inequality indicators have usually been considered in the small area literature so far. In addition to the basic Fay-Herriot model, given the nature of the indicator, we proposed to consider some longitudinal extensions of the Fay-Herriot model, specifically two models including temporal random effects. In the first longitudinal model temporal random effects follow an autoregressive process of order 1, in the second one they follow a moving average process of order 1. Thus, we try to improve the reliability of estimates by borrowing information, not only from auxiliary variables available from administrative archives, but also from temporal correlation. The variances of small area direct estimates, to be included in the small area models, are estimated by using a bootstrap methodology, and then smoothing using the GVF method.
The performance of the models proposed is evaluated though a simulation study, based on EU-SILC data. Results obtained from the simulation study highlight that the best performing small area model, both in terms of bias and overall efficiency, is the MA1 model. Moreover, the simulation study provides further evidence of some properties of the estimators: indeed, all models perform better than direct estimator in terms of mean square error. Furthermore, the graphical analysis shows that the small area estimator based on best performing model, MA1 model, tends to be asymptotically design-unbiased and consistent, and that the efficiency gain is relevant for all areas.
However, this work suffers from some limitations. The available auxiliary variables are not highly correlated with the target variable. The variability of the economic security indicator considered in this work makes particularly difficult to find suitable covariates for the small area models. This problem is often encountered in small area application for poverty and inequality indicators. Finally, a further limitation is the low number of waves available. In fact, with a higher number of waves available, the longitudinal models considered may provide more precise estimates for the economic security indicator.

Disclaimer
This paper is based on data from Eurostat, EU Statistics on Income and Living Conditions (2014, 2015, 2016. The responsibility for all conclusions drawn from the data lies entirely with the authors.