Bayesian Approach Using Latent Variable for Zero Truncated Poisson Distribution : Application for Species-Area Relationship

In ecology, understanding the species-area relationship (SARs) is extremely important to determine species diversity. SARs are fundamental to evaluate the impact in this diversity due to destruction of natural habitats, to create biodiversity maps and to determine the minimum area to preserve. In this study, the number of species is observed in different area sizes. These studies are referred in the literature through nonlinear models without assuming any distribution of the data. In this situation, it only makes sense to consider areas in which the number of species is greater than zero. As the dependent variable is a count data, we assume that this variable comes from a known distribution for discrete positive data. In this paper, we used the zero truncated poisson distribution (ZTP) to represent the probability distribution of the random variable “species diversity” and we considered some nonlinear models to describe the relationship between species diversity and habitat area. Among the proposed models in literature, we considered the Arrhenius power function, Persistence function (P1 e P2), Negative Exponential and Chapman-Richards to describe the abundance of species. In this paper, we take a Bayesian approach to fit models. With the purpose of obtaining conditional distributions, we propose the use of latent variables to implement the Gibbs Sampler. In order to progress using the best possible models for data, a comparison of performance between models referred in this paper will be verified through the criteria Extended Akaike Information Criterion (EAIC), Extended Bayesian Information Criterion (EBIC), Deviance Information Criterion (DIC) and Conditional Predictive Ordinate Criterion (CPO). In addition to selecting the best model, it will also assist to define the best selection criterion.


Introduction
One of the fundamental aspects from ecology is the relationship between habitat area and species diversity (Lomolino, 2000).This relationship is essential to understand the biological distribution of species diversity and it is determined by counting the number of distinct species in different sized areas.This studies, known as speciesarea relationship (SAR), are one of the most important tools to create biodiversity maps, to establish relationships between extinction and migration rates and to determinate minimum size areas for species preservation (Arrhenius, 1921;Ulrich et al., 2003).Generally, only one type of organism is observed; the fish diversity in a lagoon, for example.In order to describe SAR, several nonlinear models are suggested in literature (Tjorve, 2003).However, few have been published about comparing different adjustment models.Among suggested models, we are going to consider Arrhenius model, the Persistence models (P1 and P2), Negative Exponential and Chapman-Richards to describe the species abundance (Dengler, 2009).
In these models, the parameter β 0 corresponds to the maximum expected value of species in a specific region, called asymptote.The parameter β 1 is related to the average rate of growth in species diversity.The parameter β 2 defines the shape of the curve and dislocates the inflection point of the function, reflecting the conditions of the region favorable to the growth in species diversity.Since the dependent variable is a count data, it is plausible to assume that it comes from some positive discrete distribution.The most commonly used distribution in count data is Poisson.However, when we have a structural absence of zeros in the data, the most commonly used form is to truncate at the point zero.The objective of this paper is to consider the Zero Truncated Poisson distribution (ZTP) as an alternative to represent the probability distribution of the number of species in determined area.
In this paper, we propose a bayesian approach using latent variables to obtain conditional distributions and Gibbs algorythm implementation.

Truncated Distributions
We will discuss some definitions of truncated distributions used in this paper.Initially we can redistribute the probability distribution functions in a way that the sum keeps being unitary.This way, we can define another variable W = Y in the values of interest, in a way that g(y) = k f (y), y = 1, 2, 3, ..., where f is the probability function of W. As we want Y truncated in zero, we should have So we have the zero-truncated distribution function given by: with the main moments So it is evident that the expected value of the truncated distribution is higher when compared to the non-truncated distribution.This difference is also evident in the variance, however the variance is lower once we have a truncated distribution.

Zero Truncated Poisson Distribution
Since the dependent variable is a count data, it is plausible to assume that it comes from some known positive discrete distribution, such as Poisson distribution.Considering that the event y i = 0 is not observed, we can obtain the zero truncated distributions conditioning the probability functions at the point zero (Van Der Heijden, 2003).
The probability function of the Zero-Truncated Poisson model (Tate, 1958) is defined by: and the first and second moments are This way, it is noticeable that the conditional variance is inferior to the conditional mean.
The fact that a distribution belongs to the exponential family provides great properties to the estimators and has good emphasis in generalized linear models (McCullagh e Nelder, 1989).The probability density functions of exponential family distributions can be expressed as: We will use the systematic component of the model with identity link function.In this case, μ i is specified as a nonlinear function that represents the expected value of the species diversity due to the area.We consider μ i = f (x i ; β) and to f (•) the following functions (Table 1).

Inference
Considering an independent sample (y 1 , ..., y n ), the log-likelihood function of the model is: The estimation of the parameters can be obtained by the maximization of the log-likelihood function.The score function of the log-likelihood ( 9) is expressed by: The log-likelihood function of the model (5) considering the function

Bayesian Inference
We introduce a Bayesian approach using non-informative prior densities.The non-informative priors are used when we expect that the information of the data is dominant.
For the estimation of β through bayesian model, we assume as prior a Multivariated Normal distribution N(β 0 , Σ 0 ), with β 0 given by the estimation of maximum likelihood and Σ 0 given by the inverse of the observed hessian matrix.
By multiplying the expressions ( 5) and ( 12), the joint posterior distribution is given by: In this case, we may use as an approximation to β 0 , Σ 0 the estimates based on the likelihood function ( 11), with Σ 0 approximated by the inverse of the hessian matrix and τ being a known parameter that controls the prior uncertainty about the parameters β.
Since the conditional distributions are unknown, we use the MCMC method through the Metropolis-Hastings algorithm to obtain the estimates of the parameters.The Bayesian confidence interval (Highest Posterior Density) (HPD), fixing the confidence (α) in 95% is given by: Considering a flat Normal distribution as priori, the posterior density to the parameters is given by: We have a situation where the posterior conditional distributions don't have the form of any known distribution and, in this case, the Metropolis-Hastings algorithm is necessary to the posterior calculation.
In order to simplify the conditional distribution for the Gibbs algorithm, we resort to the use of latent variables that allow writing the posterior density as the product of model components.

Latent Variables
The idea of using latent variables is to obtain known forms of complete conditional distributions (Higdon, 1998).
Considering the variables u = (u 1 , ..., u n ), v = (v 1 , ...., v n ) and w = (w 1 , ..., w n ), the joint density to (β, u, v, w) is: We can prove the above equation by verifying that In order to obtain the complete condicional distributions, we can evaluate the situations represented by the variable indicated by the expression ( 16).
In the case of the latent variable u i , we have: As such, the vector of parameters β = β 0 , ..., β q must satisfy the restrictions above to every x i .In this case, we define a subset of the parametric space for β given by: In the case of the latent variable v i , we have: This way, we have: Considering the latent variable w i , we have: Hence, we define the set as: In this case, the indicative variable in the equation ( 16) is equal to 1 if, and only if, β ∈ S u β ∩ S v β ∩ S w β .The conditional probability distribution can be expressed by: hence u i Uniform 0, exp{y i θ i } .
To the variable v i , To the variable w i , hence

Algorithm to the Generation of Samples
Given β 0 = (β 0 1 , ..., β 0 q ), we use the estimates calculated by the maximum likelihood method as initial values.The values will be updated following these steps: 4) We generate a new candidate, β c , from the multivariate normal distribution, N( β, Σ), corresponding to the estimates of the log-likelihood (9) and the hessian matrix.We submit the candidate to the acceptance according the conditions (β ∈ S u β ∩ S v β ∩ S w β ).Otherwise the vector is discarded and β 1 = β c is used.The process is repeated using β 1 = (β 1 1 , ..., β 1 q ) as initial values until the necessary sample is reached.

Model Selection
Several model selection method are proposed in literature.In the paper we will consider the criteria EAIC, EBIC (Brooks, 2002), DIC (Spiegelhalter et al., 2002) and CPO (Pettit & Young, 1990).The first three criteria suggest that the comparison among models are made based on the deviance calculation.

D(β, M
where L(y|β, M i ) is the likelihood function associated to the model M i and C is a constant that is canceled out.A Monte Carlo estimate for the standard deviation is: −2 log(L(y|β ( j) , M i )), i.e., the posterior average deviation.

Extended Akaike Information Criterion (EAIC)
The criterion proposed by Akaike ( 1974) is based on the likelihood function, AIC(β, M i ) = −2 log(L(y|β, M i ))+2p, penalized by the numbers of parameters in the model.Whereas the bayesian information criterion (BIC) ponders the sample size using BIC(β, M i ) = −2 log(L(y|β, M i )) + p log(n).The selection criteria, within the Bayesian context, are obtained through a natural extension considering the posterior density of the parameters of the model.
where p is the number of parameters of the model, n is the sample size and D(β, M i ) = −2 log(L(y|β, M i ) with β equal to the mean of the posterior density.Both criteria (EAIC e EBIC) indicate the best models the lower the obtained value.

Deviance Information Criterion (DIC)
The DIC is consisted by the posterior mean deviance (deviance) penalized by the number of parameters of the model.This criterion is interesting since it can be incorporated during the Monte Carlo simulation.Just like the other criteria, it is an asymptotic approximation to large samples and it is valid when the posterior distribution follows an approximately multivariate normal distribution.Lower values of DIC indicate better adjustment.The DIC is obtained by: where p di = D(β, M i ) − D(β, M i ) measures the complexity of the model i.The criterion suggests a comparison between the mean deviance and the deviance applied in the posterior mean.

Conditional Predictive Ordinate Criterion (CPO)
The Conditional Predictive Ordinate (CPO) (Gelfand et al., 1994) is another criterion widely used in literature to evaluate the model and it is based on cross-validation density.Considering Y = (y 1 , y 2 , ..., y n ), the cross-validation obtained through density p(y k |Y −k ), with Y −k denoting the set of all elements except the k-the observation y −k .This statistic represents the most likely values when the model is adjusted to every observation except y k .For the k-th observation, the CPO is defined by: In this case, p(β|Y −k ) represents the posterior density of β based on data of Y −k .Therefore, from the Equation ( 23) the CPO k is defined as the posterior predictive marginal density of y k given Y −k .Larger values for the CPO k imply better models and lower values indicate influential observations.For most models, there isn't a closed form for the CPO k .Thus, using the samples generated by the Monte Carlo method, we can approximate the calculation of the CPO (Li et al., 2006) by: Summing the information in a simple measure, we chose the model with larger value applied in the natural logarithm of CPO's, called log pseudo marginal likelihood LPML = n i=1 log(CPO i ).

Results and Discussion
In this simulation, we fixed the arbitrary values for β of the Table 1 models, and areas units going from 1 to 40.This areas were divided in sections of size 100, with values generated from the Zero Truncated Poisson distribution, representing the number of different species observed in a specific area.
By calculating the statistics, 10, 40 and 100 replicas were generate Mean Square Error (MSE) and Mean Absolute Percent Error (MAPE) to evaluate the quality adjustment of the model (Table 2, Table 3 and Table 4).The statistics MSE and MAPE are given by: The adjustment of the models were evaluated by four bayesian criteria: DIC, EAIC, EBIC and CPO (Table 5, Table 6 and Table 7) (Spiegelhalter et al., 2002).Using Table 2, we notice that every parameter of the model indicates good adjustment when we compare the actual values to the mean.In these models, the coverage probabilities are close to the expected value of 95.The Table 5 shows the difference between the selection criteria.The Arrhenius, Exponential and Chapman models were correctly selected.
However, a competition between Persistence models and the others is noticeable.It means that, despite the samples being generated with distinct parameters, the models adjust to the data due to their flexibility (Figure 1), with curves approximately superimposed for data generated from Persistence models, resulting in close values to the selection criteria.
Figure 1.Comparison between models

Application to Real Data
The data refer to the species diversity from Hymenoptera order (wasps, bees and ants), observed in a beech forest.
The objective of the study is to relate the number of collected species with the area and estimate the extinction and immigration rates (Ulrich, 2001).In order to estimate the posterior distribution parameters given by 8, 10,000 values were simulated for each parameter and Bayesian criteria of selections were calculated.The priori distributions are given by the maximum likelihood estimates.-196.838 -181.956 -135.576 -130.122 -111.306The Chapman and Persistence 1 models using Poisson distribution show better adjustment, due to lower values of criteria EAIC, DIC and EBIC (Table 8).This fact may be verified by observing Figure 2, in which we notice the two models adjustment adequate to the data.

Discussion
In the proposed approach, we obtained good estimates to the simulated data.For every parameters, the real values are in between the credible intervals.The coverage percents obtained are close to 95% as the sample size is increased.We can notice a depletion in the credible intervals amplitudes.The only exception was verified in the Chapman model for a sized 10 sample, which presented different coverage probabilities from the other models.
We notice a depletion in the sampling errors, which means that the adjusted values are close to the real values, with slightly increase in MSE and MAPE of Arrhenius and Persistence 1 models once we increase from 10 to the 40th sample.
Through the Gibbs algorithm, we notice that the EAIC, EBIC, DIC and CPO models identify, with high probability, the Exponencial, Chapman and Arrhenius models.However, it wasn't possible to verify one single selection criterion that correctly identifies every model.What we have is a high competitivity between the Persistence models.It means that, despite the samples being generated with distinct parameters, the models adjust to the data due to their flexibility.
For the approach using latent variables, the method showed adequate, viable and easy implementation.

Figure 2 .
Figure 2. Model Adjustment for Truncated Poisson for Hymenopteras species

Table 2 .
Bayesian estimation for Poisson distribution n = 10

Table 5 .
Criteria of selection of the models n = 10

Table 8 .
Bayesian selection for the study of Hymenoptera