Bayesian Modelling of Integer Data Using the Generalised Poisson Di ff erence Distribution

Integer-valued random variables arising from the difference of two discrete variables can be seen frequently in various applications. In this paper, we obtain the distribution and derive the properties of the difference of two generalised Poisson variables with unequal parameters. This distribution is adopted to model a set of ultra high frequency (UHF) data relating to FTSE100 index futures using covariates. The unique characteristics of UHF data have introduced new theoretical and computational challenges to both statistical and financial studies. Such data consist of discrete-valued observations and unequally spaced time intervals. We also extend the model to its zero inflated version in order to capture the excess of zeros in the given data set. The analysis is carried out in a Bayesian framework using Markov Chain Monte Carlo methods. Various model diagnostics and model comparisons were undertaken which showed that index changes were explained well by the fitted model.


Introduction
Integer-valued variables (with negative and positive values) arising from the difference of two discrete variables can be seen frequently in various applications, among others, in medicine (Karlis & Ntzoufras, 2006), sport (Karlis & Ntzoufras, 2009), image analysis (Li, Emmerich, Eggermont, Bovenkamp, Bäck, Dijkstra, & Reiber, 2009), finance and risk analysis (Consul, 1986;Shahtahmassebi, 2011).While, in the literature, there are many techniques available for dealing with the difference of two continuous or binary variables, methods for modelling integervalued variables are rare.Available methods are mainly based on normal approximations of discrete distributions.However, in most cases normal approximations are not valid, since such data may take on a small range of integers Karlis and Ntzoufras (2006).Thus, techniques that are based on the distributions defined over both negative and positive integer values may improve our inference of such variables.
A possible choice for such a distribution may be the Poisson difference (PD) distribution by Irwin (1937) and Skellam (1946).This distribution has been used for the difference of any two Poisson variables, even when the two variables are not independent (Karlis & Ntzoufras, 2009).However, there are cases where the PD distribution has a tendency to underestimate values in the tails (Shahtahmassebi, 2011).
To address the problem of underestimation of the tails, there are several possible alternatives to the PD distribution, for example the distribution of the difference of two geometric distributions random variables (Kozubowski & Inusah, 2006) or the distribution of the difference of two generalised Poisson (GP) random variables.In this paper, we investigate the latter because the distribution of the difference of two GP variables may have a wider application as it could have longer or shorter tails compared to the PD distribution and also can be considered as a generalisation of the PD distribution.The distribution of the difference of two GP variables with unequal parameters is based on the idea of the GP distribution put forward by Consul (1986).The generalised Poisson difference (GPD) distribution has two additional parameters.We obtain the probability generating function (pgf) and the cumulant generating function (cgf) of the GPD distribution as well as a recurrence relation for all the cumulants.
We show that the GPD distribution with three parameters (Consul, 1986) is a special case of our GPD distribution.We introduce a zero inflated version of the GPD (ZGPD) distribution and use it for modelling the FTSE100 index changes from March 25 2008.In order to validate the fitted model, we obtain one-step ahead predictions of the index change for the next trading day (March 26, 2008).We assess the performance of the posterior predictive distribution with the probability integral transform (PIT) modified for the case of integer-valued variables (Liesenfeld, Nolte, & Pohlmeier, 2006).Finally, model comparisons were performed using the deviance information criterion (DIC) and the Bayes factor (BF).We carried out all our analyses in R. The R functions for evaluating and generating random samples from the GPD and ZGPD densities were written by the authors.
The remainder of the paper is organised as follows.In Section 2, we introduce the GPD distribution and obtain its properties.Section 3 provides a graphical presentation of the GPD distribution.Section 4 describes the data set available for this study and introduces the ZGPD model.The fitting of the ZGPD model using the Bayesian framework is illustrated in Section 5. Results are presented in Section 6.Finally, we conclude and provide an overview of possible extensions in Section 7.

Generalised Poisson Difference Distribution
Consider the GP distribution (Chandra, Roy, & Ghosh, 2013;Consul & Jain, 1973;Hubert, Lauretto, & Stern, 2009;Lerner, Lone, & Rao, 1997;Srivastava & Chen, 2010;Tuenter, 2006) (following the notation of Consul & Jain, 1973) of a random variable X with the following probability function where λ > 0, max(−1, −λ/m) ≤ θ ≤ 1, and m(≥ 4) is the largest positive integer for which λ + mθ > 0 when θ < 0. This condition is considered to ensure that there are at least five classes with non-zero probability when θ is negative.It can be seen that the GP distribution introduces over or underdispersion whether θ is positive or negative.Also, it reduces to the Poisson distribution when θ = 0; see Consul and Famoye (2006, Chapter 9) and Nikoloulopoulos and Karlis (2008) for further details.Now, let us assume X ∼ GP X (λ 1 , θ 1 ) and Y ∼ GP Y (λ 2 , θ 2 ) are independently distributed.The joint distribution of X and Y is given by (following the notation of Consul, 1986) Therefore, the distribution of the difference for any value of z ∈ Z, where We set lower limits for θ 1 and θ 2 to ensure that there are at least five classes of non-zero probabilities at both tails when θ 1 < 0 or θ 2 < 0. Thus, we set max where m 1 , m 2 ≥ 4 are the largest positive integers in which λ 1 + m 1 θ 1 > 0 and λ 2 + m 2 θ 2 > 0. Therefore, for any z > m 1 when θ 1 < 0, or z < −m 2 when θ 2 < 0, we have Furthermore, parameters λ 1 and θ 1 refer to the positive half and parameters λ 2 and θ 2 to the negative half of the GPD distribution.It is difficult to simplify the probability function defined in (2) in a compact form.We generally denote this distribution by GPD(λ 1 , λ 2 , θ 1 , θ 2 ) in which it can have the following particular forms when some of the parameters are zero.
It can be seen that the third case where θ 1 = θ 2 = 0 is the Poisson difference distribution (Skellam, 1946).
We obtain the following recurrence relation for the cumulants of the probability distribution of the random variable where ] (mean of the GPD distribution μ = L 1 ).The expression for the other cumulants of the GPD distribution can be obtained by using (3) recursively for k = 1, 2, 3, . . . .For example L k , for k = 2, 3, 4, is as follows 7  .
where L 2 = σ 2 , the variance of the GPD distribution.Furthermore, the the coefficients of skewness and kurtosis for the random variable Z are obtained as follows .
By having θ 1 = θ 2 , it can be seen that the GPD distribution of Consul (1986) is a special case of the GPD distribution we introduced here.
It can be seen that the GPD distribution may arise as the difference of any two discrete variables which do not necessarily follow the GP distribution.To see this consider X 1 and X 2 follow GP distribution and we can clearly see that this is equivalent to the difference of two GP variables (Karlis & Ntzoufras, 2009).

Graphical Presentation of GPD Distribution
The behaviour of a random variable Z following the GPD distribution with different values of λ 1 , λ 2 , θ 1 and θ 2 is illustrated in this section.To study only the effect of θ 1 and θ 2 on the GPD distribution, first we assume λ 1 = λ 2 = λ and change the values of θ 1 and θ 2 (Figure 1).In general, the distribution has a wider span over Z as either θ 1 or θ 2 increases.The distribution also becomes short tailed as either of the parameters decrease.For a fixed λ, the sign of ] determines the skewness of the GPD distribution.Now, let us assume θ 1 = θ 2 = θ.It can be seen from Figure 2 that when λ 1 < λ 2 , the GPD distribution becomes a skewed distribution to the left.For any given value of θ, when λ 1 decreases, the GPD tends to have a shorter tail.
Since the following type of symmetry holds for the GPD distribution therefore for a given value of θ the GPD distribution tends to be skewed to the right when λ 1 > λ 2 (not illustrated).Also, for any given values of λ 1 and λ 2 , the form of the GPD is substantially altered by varying θ.

Application
The availability of trade-by-trade (commonly known as ultra high-frequency (UHF)) data on transactions, in the recent decade, has revolutionised data processing and statistical modelling techniques in finance (McCulloch & Tsay, 2001;Tsay, 2005;Liesenfeld, Nolte, & Pohlmeier, 2006;Pacurar, 2008).The unique characteristics of UHF data has introduced new theoretical and computational challenges to both statistical and financial studies (Liesenfeld, Nolte, & Pohlmeier, 2006;Pacurar, 2008;Tsay, 2005).Such data consist of (i) discrete-valued observations, as the price change in consecutive transactions is in a multiple of tick size, where one tick is defined as the minimum amount by which the price of the market can change and (ii) unequally spaced time intervals; see Tsay (2005, Chapter 5) and Liesenfeld, Nolte, and Pohlmeier (2006) and references therein for further details on analysing UHF data sets.).Furthermore, one tick size is considered to be 0.5 FTSE100 index.Each contract is valued as £10 per index point, therefore the value of one tick is £5.

Data: FTSE100 Index Change
A one-minute index change (10:00:00-10:01:00) is illustrated in Figure 4(b) which shows unequally spaced time intervals and price discreteness.Table 1 provides the relative frequencies of index change for March 25, 2008.It shows that the positive and the negative index changes are almost symmetrically distributed around zero.It can be seen that about 69% of the indices have not changed at all and about 20% had a change of 1 tick size.In addition, changes of 4 or more ticks are rare (235 and 225 in left and right tails respectively, around 1% in total), in which the minimum and maximum changes occurred with the size of -18 and 15 ticks, respectively (see Appendix B).

Model for Index Change
Assume P t i is the index of the most recent transaction at time t i .Here t i is a continuous time, but indices are only updated when a transaction actually occurs.We may consider the index, P t i , as where N t i is the number of transactions recorded in the interval from time 0 up to time t i and Z i is the index change associated with the ith transaction.In practice many of the Z i 's are zero.Instead of modelling the index process itself, we model the index change process as a non-stationary and non-linear process (Rydberg & Shephard, 2003).
Here, we assume only the difference of two consecutive indices (an integer-valued variable) is observed and follows the GPD distribution with the probability mass function given in Equation ( 2).
In real life applications, count (or integer-valued) data sets, such as dental (Karlis & Ntzoufras, 2006;Mwalili, 2008), spatial (Agarwal, Gelfand, & Citron-Pousty, 2002) and sports (Karlis & Ntzoufras, 2009), may contain an excess of zero values, i.e. more than what the model would predict.We have already mentioned that in our data set about 69% of index changes were zero, a figure which may be underestimated by the GPD model.By introducing an extra probability parameter, a zero-inflated distribution enables us to capture the possible excess of zero values.

Zero-Inflated Generalised Poisson Difference Model
Zero-inflated distributions are used when there is an excess of zeros, relative to the expected frequency of zeros.In the current data set about 70% of index changes are zero.Thus, we extend the GPD distribution to its zero inflated version in order to capture the possible excess of zeros in a given data set.Let Z be a ZGPD random variable with the following probability function for z ∈ Z, where f gpd (z|λ 1 , θ 1 , λ 2 , θ 2 ) is the GPD probability function given by (2) and p ∈ (0, 1) is the proportion of extra zeros.According to (5), the first part of the model represents the probability of all zero values, while the probability of z 0 is given by the second part.Thus, the mean and the variance of a ZGPD random variable Z are Further details on the zero inflated distributions can be found in (Johnson, Kotz, & Kemp, 2005) and references therein.Now let us assume Z, an integer-valued variable, is a response variable which follows the ZGPD(λ 1 , λ 2 , θ 1 , θ 2 , p) distribution.We adopt a set up similar to a ZPD model which has been used widely in statistics literature (see Karlis and Ntzoufras (2009) and references therein).Suppose there are n observations and k explanatory variables.We define the parameters of the model as follows where λ 1 = (λ 1,1 , . . ., λ 1,n ) t and λ 2 = (λ 2,1 , . . ., λ 2,n ) t , X is a matrix of covariates of size n × k and, α 1 = (α 1,0 , . . ., α 1,k ) t and α 2 = (α 2,0 , . . ., α 2,k ) t are vectors of parameters of the ZGPD model.For simplicity, we assume θ 1 , θ 2 and p are fixed over the whole trading day.Furthermore, we set log(1 − θ 1 ) = β 1 and log(1 − θ 2 ) = β 2 in the model which allows us to have parameters over the real line.The advantage of modelling λ 1 and λ 2 in this manner is that it allows us to identify whether each covariate has a similar or different affect on the negative and the positive values within the same model.
We assess the effect of the the previous index change (z i−1 ), the volume of the previous transaction (v i−1 ) and the time duration between two consecutive transactions (Δt i = t i − t i−1 ), on the current index change.Let us point out that having Δt allows us to consider unequally spaced time intervals in the model.Thus, the ZGPD model in Equations ( 8)-( 9), in terms of the above covariates, is

Prior Distribution
To fully specify a Bayesian model, we need to specify the prior distribution.When no information is available, we propose to use normal prior distributions for the parameters of the ZGPD model with mean equal to zero and a large variance (e.g. 10 4 ) to express prior ignorance.For the proportion of the excess of zeros p, we propose a uniform distribution defined in the (0, 1) interval.
Nevertheless, the Bayesian approach enables us to incorporate external information to our inference via our prior distribution.Also, the Bayesian approach can be used sequentially by using the previous posterior distribution as the current prior distribution which speeds up the process of updating our model.In other words, the information from the previous day's transactions can be used to specify our prior distribution (Karlis & Ntzoufras, 2009).

Posterior Distribution
We obtain our inference based on the posterior distribution of the parameter vector ψ = (α 1 , α 2 , β 1 , β 2 , p).The posterior distribution of ψ can be obtained as where the likelihood is given by where I {0} is an indicator function and f prior (ψ) is the prior distribution given by It can be seen that the posterior distribution is not analytically tractable.Thus, in order to generate samples from the posterior distribution we use MCMC methods, more specifically, the random walk Metropolis-Hastings (M-H) algorithm.

Metropolis Hastings Algorithm
In the M-H algorithm, we choose the normal distribution to be our proposal distribution, such that in the kth iteration, for k = 1, . . ., m, where m is the number of iterations, the normal distribution is centred at the values from the previous iteration, i.e. ψ cand ∼ N(ψ (k−1) , σ 2 ), for some value of σ 2 .In addition, (i) An important point to consider in the case of random walk chains is the choice of the value of the dispersion parameter of the proposal distribution.A large value for the variance allows a greater variation from the previous value, but will lead to a very small acceptance rate.On the other hand, a small value of the variance results in draws which are close to the previous value with a high acceptance rate (Gamerman & Lopes, 2006).The optimal choice for the variance of the normal proposal is σ 2 = c 2 Σ, where c ≈ 2.4/ √ d (d is the dimension of the parameter vector) and Σ is the variance-covariance matrix based on the curvature of the posterior at the mode (Tanner, 1998).
(iii) Parameters are updated one at the time.

Simulating Posterior Predictive Distribution
Let Z * = (Z * 1 , . . ., Z * s ) t be the s values which we wish to predict.The posterior predictive distribution is defined as where f like (Z * |ψ) may be given as Therefore, by obtaining the predictive distribution for each component of index change we may be able to evaluate the predictive distribution of index change.For this purpose, we add the following steps to the MCMC algorithm.At the jth iteration, j = 1, . . ., m, where m is the number MCMC iterations, repeat the following steps, for i = 1, . . ., s: 1,i and θ * ( j) 2,i are the ZGPD model parameters which are obtained by substituting the values of α ( j)  1 , α ( j) 2 , β ( j) 1 and β ( j) 2 at the j th iteration in ( 8)-( 9), and where a random sample from the GPD distribution is obtained as the difference of the two generalised Poisson random variables.

Model Checking
The posterior predictive distribution can be used to probabilistically quantify the response variable which enables us to assess the goodness of fit and overall performance of the model.Therefore, if the predictive distribution, in general, is in agreement with the observed data, this implies a good fit of the model (Karlis & Ntzoufras, 2009).However, considering the size of the data and the fact that most of the values (70%) of index change is zero (no index change), it may be difficult to find suitable diagnostics measures.Here, a randomised version of the PIT is implemented to measure how well the predictive distribution, especially in the tails, is able to explain the density of index change for the next trading day (Liesenfeld, Nolte, & Pohlmeier, 2006).In order to use the PIT, we have to construct intervals based on the cumulative predictive distribution of the ith observed index change and the ith observed index change minus 1, as follows where z i is the ith index change in the data set from the next trading day and p(•) represents the estimated counterpart of the conditional probability given in Equation ( 5).It can be seen that Equations ( 14) and ( 15) form intervals with upper and lower limits of u u i and u l i , respectively.If the model is correctly specified, random draws from such intervals will have a standard uniform distribution.To test the idea of the uniformity of the constructed values, one can use the Kolmogrov-Smirnov (K-S) test or transform the values to the standard normal distribution and plot a quantile-quantile (Q-Q) plot of the standard normal distribution against the transformed values of random draws.Here, we used both the K-S test and the Q-Q plot to judge the efficacy of the predicted values.Furthermore, in order to decide how much complexity is necessary to fit the data, model comparison is also undertaken.That is, whether the full model fits the data properly or a model with fewer parameters has a similar performance.For this purpose, we use the deviance information criterion (DIC): where p D = Davg (z) − D θ(z) is the effective number of parameters and Davg (z) = 1 L L l=1 D(z, θ l ), where L is the number of MCMC iterations after burn-in, is a measure of goodness of fit, in which D θ(z) = D(z, θ(z)) where D(z, θ) = −2 log f (z|θ); see Gelman, Carlin, Stern, and Donald (2003, Chapter 6) for further details.It should be noted that when comparing the Bayesian models, a smaller value of DIC suggests a better fit.

Results
The Gelman and Rubin's convergence diagnostic with the statistic value of 1.01 suggests that we can consider the convergence of the MCMC chains after 15000 iterations (Gelman & Rubin, 1992).A further 5000 samples were collected after burn-in.The posterior mean, standard deviation and 95% credible intervals of the posterior distribution of model parameters are provided in Table 3.For none of the model parameters the 95% credible intervals contained zero, indicating that all three covariates have a significant effect on the index change.It can be seen that the previous index change has a larger effect on λ 2 compared to λ 1 .This implies that in the next transaction a switch from a positive to a negative index change is more likely than a switch from a negative to a positive one.We can also see that the effect of volume on the expected value of the index change tends to be small.Since the posterior means of α 1,3 and α 2,3 are almost the same, the mean index change at the next transaction, irrespective of when it occurs, is expected to be zero, assuming all the other variables (previous volume and index change) were fixed.The posterior means of θ 1 and θ 2 have values of 0.228 and 0.217 suggesting that the fitted model has longer tails than the corresponding ZPD model.Finally, the 95% credible interval of p is (0.124, 0.215) which clearly indicates the presence of excess of zeros and, therefore, the necessity of fitting a zero-inflated model.

ZGPD and ZPD Models
In order to show that the ZGPD model with θ 1 θ 2 (ZGPD θ 1 θ 2 ) outperforms a ZGPD with θ 1 = θ 2 (ZGPD θ 1 =θ 2 ) model and a ZPD model, we use the Bayes factor (BF), which is defined as the ratio of marginal likelihood under one model to the marginal likelihood under the other model (Gelman, Carlin, Stern, & Donald, 2003, July;Kass & Raftery, 1995): where ψ and ψ are vectors of parameters under the M 1 and M 2 distributions, respectively and f like,M 1 (z|M 1 ) and f like,M 2 (z|M 2 ) are marginal likelihoods under M 1 and M 2 models.A Bayes factor greater than one ( or log(BF) > 0) supports M 1 model over M 2 model.We obtain the BF using the "MCMCpack" package in R.

Conclusion
In this paper we introduced the GPD distribution with four parameters.The GPD distribution can be considered as an alternative to the PD distribution when the data has longer or shorter tails compared to the PD distribution.We obtained the pgf and the cgf for the GPD distribution and explored the theoretical characteristics of the distribution.We established a recurrence relation for all the cumulants of a GPD random variable and derived the first four cumulants.In addition, we showed that the GPD distribution suggested by Consul (1986) is a special case of our GPD distribution.
We proposed the ZGPD model and illustrated its application by modelling the FTSE100 index changes in a Bayesian framework implemented via MCMC methods.The Bayesian estimation of the parameters of interest were presented in detail.One of our principle interests concerned the distribution of index change for the whole trading day and we have shown that our approach models the behaviour of index change well.Results from the data revealed that the previous index change and the time duration between two consecutive transactions significantly affect the FSTE100 index change.Finally, various model diagnostics were carried out which showed that index changes were explained well by the fitted ZGPD model.Furthermore, we compared our results with the results from the ZGPD θ 1 =θ 2 and the ZPD models and showed that the ZGPD θ 1 θ 2 model fits significantly better than the ∂ψ(β) Figures 1(a) and 3(b) show the GPD (with θ 1 > θ 2 ) and PD (θ 1 = θ 2 ) distributions with identical values of λ 1 = λ 2 = λ.It can be seen that the GPD distribution has a shorter left tail and a longer right tail in comparison with the PD distribution.
Figure 4. (a) FTSE100 index change futures: the discrete structure of index change process can be observed for March 25, 2008.(b) One minute of FTSE100 index change presents the main features of UHF data, e.g.price discreteness and unequally spaced time intervals Here, we analyse a set of ultra high frequency index changes from FTSE100 index futures (Figure 4(a)) recorded on March 25, 2008, traded on the London International Financial Futures and Option Exchange (LIFFE).The normal trading hours of the FTSE100 are from 08:00 until 16:30.Therefore, for simplicity, any transaction beyond these hours are discarded from the analysis.Thus, the number of transactions on March 25, eliminating 913 transactions, reduces from 46,180 to 45,267 transactions (Figure 4(a)).Furthermore, one tick size is considered to Figure 5.A normal Q-Q plot of random samples drawn from intervals based on the cumulative predictive distribution (a) ZGPD and (b) ZPD models

Table 2 .
Table 2 confirms that the model with three covariates and 11 parameters has the lowest value of DIC.DIC, point-estimate, average deviance and estimated number of parameters for each of five models fitted to the FTSE100 index change λ 2 .(A.8)By differentiating ψ(β) in (A.2) w.r.t λ 1 , then differentiating partially w.r.t θ 1 again, we have By substituting (A.9) and (A.10) into (A.8),we eliminate ∂T 1 /∂θ 1 and ∂T 2 /∂θ 2 , and obtain the following relation