Recent Developments in Recursive Estimation for Time Series Models

Recently there has been growing interest in the joint estimation of location and scale parameters using combined estimating functions. Combined estimating functions were studied in Liang et al. (2011) for models with finite variance errors and in Thavaneswaran et al. (2013) for models with infinite variance stable errors. In this paper, a theorem on recursive estimation based on estimating functions is first extended to the multi-parameter setup, and it is shown that the unified approach can be used to estimate the location parameter recursively for models with finite variance or infinite variance errors. The method is then applied to the joint estimation of the location and scale parameters for regression models with ARCH errors and RCA models with GARCH errors.


Introduction
Estimating function theory is well suited to financial data (see Bera et al. (2006)). Ghahramani and Thavaneswaran (2009, 2011) studied GARCH model identification and recursive estimation by combining least squares and least absolute deviation estimating functions, and the method has been applied to identify several financial time series models. Combined estimating functions were also studied in Liang et al. (2011) and in Thavaneswaran et al. (2013, 2015) for the multi-parameter setup. Thavaneswaran and Ravishanker (2015) studied recursive estimation for circular time series models using estimating functions. In this paper, the combined estimating function method is first applied to obtain joint optimal recursive estimates of the parameters in autoregressive models with t-distributed errors, and then in regression models with ARCH errors and RCA models with GARCH errors. Combinations of least squares (LS) and quadratic estimating functions, as well as combinations of LS and least absolute deviation (LAD) estimating functions, are considered.
The following example motivates the use of estimating function theory for recursive estimation of the parameter in certain time series models with stable errors.
Consider an AR(1) process $y_t = \phi y_{t-1} + \varepsilon_t$, where $\{\varepsilon_t\}$ is an i.i.d. sequence of symmetric stable random variables with characteristic function $c(u) = \exp(-|u|^{\lambda})$, where $0 < \lambda \le 2$. Closed form representations of the density exist only when $\lambda = 1$ ($\varepsilon_t$ follows a Cauchy distribution) or when $\lambda = 2$ ($\varepsilon_t$ follows a Gaussian distribution). Moreover, the second moments are not finite when $0 < \lambda < 2$. Interest centers on estimating the parameter $\phi$ based on the observations $y_1, \ldots, y_n$. Merkouris (2007) proposed the estimating function approach to estimate $\phi$. In Thavaneswaran et al. (2013), joint estimates of location and scale parameters are derived for a class of autoregressive (AR) models, a class of Random Coefficient Autoregressive (RCA) models with stable errors, and a class of AR models with stable Autoregressive Conditionally Heteroscedastic (ARCH) errors. That paper also discusses fast, on-line, recursive parametric estimation of the location parameter based on transformed estimating functions, using simulation studies and a real financial time series. Recursive (or on-line) estimation, in which the estimate of the parameter at time $t + 1$ is the estimate at time $t$ plus an adjustment, is advantageous when there is a long stretch of data and observations become available successively over time. Recursive estimation of the parameter has been studied using nonlinear estimating functions by Thavaneswaran and Heyde (1999), and also using combined estimating functions.
Suppose that $\{y_t, t = 1, \ldots, n\}$ is a realization of a discrete-time stochastic process whose distribution depends on a vector parameter $\theta$ belonging to an open subset $\Theta$ of the $p$-dimensional Euclidean space. Let $(\Omega, \mathcal{F}, P_\theta)$ denote the underlying probability space, and let $\mathcal{F}^y_t$ be the $\sigma$-field generated by $\{y_1, \ldots, y_t\}$, $t \ge 1$. Let $h_t(\theta) = h_t(y_1, \ldots, y_t, \theta)$, $1 \le t \le n$, be specified $q$-dimensional vectors that are martingale differences. We consider the class $\mathcal{M}$ of zero mean and square integrable $p$-dimensional martingale estimating functions of the form
$$\mathcal{M} = \Big\{ g_n(\theta) : g_n(\theta) = \sum_{t=1}^{n} a_{t-1}(\theta) h_t(\theta) \Big\},$$
where $a_{t-1}(\theta)$ are $p \times q$ matrices depending on $y_1, \ldots, y_{t-1}$, $1 \le t \le n$, and the parameter $\theta$. The estimating functions $g_n(\theta)$ are further assumed to be almost surely differentiable with respect to the components of $\theta$ and such that $E[\partial g_n(\theta)/\partial\theta \mid \mathcal{F}^y_{n-1}]$ and $E[g_n(\theta) g_n(\theta)' \mid \mathcal{F}^y_{n-1}]$ are nonsingular for all $\theta \in \Theta$ and for each $n \ge 1$. The expectations are always taken with respect to $P_\theta$. Estimators of $\theta$ can be obtained by solving the estimating equation $g_n(\theta) = 0$. Furthermore, the $p \times p$ matrix $E[g_n(\theta) g_n(\theta)' \mid \mathcal{F}^y_{n-1}]$ is assumed to be positive definite for all $\theta \in \Theta$. Then, in the class $\mathcal{M}$ of all zero mean and square integrable martingale estimating functions, the optimal estimating function $g^*_n(\theta)$, which maximizes, in the partial order of nonnegative definite matrices, the information matrix
$$I_{g_n}(\theta) = \Big(E\Big[\frac{\partial g_n(\theta)}{\partial\theta} \,\Big|\, \mathcal{F}^y_{n-1}\Big]\Big)' \big(E[g_n(\theta) g_n(\theta)' \mid \mathcal{F}^y_{n-1}]\big)^{-1} \Big(E\Big[\frac{\partial g_n(\theta)}{\partial\theta} \,\Big|\, \mathcal{F}^y_{n-1}\Big]\Big),$$
is given by
$$g^*_n(\theta) = \sum_{t=1}^{n} \Big(E\Big[\frac{\partial h_t(\theta)}{\partial\theta} \,\Big|\, \mathcal{F}^y_{t-1}\Big]\Big)' \big(E[h_t(\theta) h_t(\theta)' \mid \mathcal{F}^y_{t-1}]\big)^{-1} h_t(\theta),$$
and the corresponding optimal information reduces to
$$I_{g^*_n}(\theta) = \sum_{t=1}^{n} \Big(E\Big[\frac{\partial h_t(\theta)}{\partial\theta} \,\Big|\, \mathcal{F}^y_{t-1}\Big]\Big)' \big(E[h_t(\theta) h_t(\theta)' \mid \mathcal{F}^y_{t-1}]\big)^{-1} \Big(E\Big[\frac{\partial h_t(\theta)}{\partial\theta} \,\Big|\, \mathcal{F}^y_{t-1}\Big]\Big).$$
For estimating functions $G(\theta, \mathcal{F}^y_{t-1})$, the following theorem provides the recursive form of the optimal estimator based on the optimal estimating function in the multi-parameter case. The proof is similar to that of the theorem in Thavaneswaran and Heyde (1999) for the scalar parameter case.
Then the recursive estimator for $\theta$ based on the optimal estimating function $G(\theta, \mathcal{F}^y_{t-1})$ is given by , where $I_p$ is the $p \times p$ identity matrix and $b^*_{t-1}$ is a function of $g$, $\theta$ and the observations. If $g(x) = x$, then , while for any function $g$ (e.g. if $g$ is the score function, then) .
For the scalar parameter case where $b^*_{t-1}$ does not depend on $\theta$, we have the following corollary as a special case.
Corollary 2. In the class $\mathcal{G}$ of all unbiased estimating functions $g(m_t(\theta, \mathcal{F}^y_{t-1}))$ based on the martingale difference $m_t(\theta, \mathcal{F}^y_{t-1}) = y_t - \mu_t(\theta, \mathcal{F}^y_{t-1})$, the recursive estimator for $\theta$ based on the optimal estimating function is given by , while for any function $g$ (e.g. if $g$ is the score function, then) .
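To make the linear case $g(x) = x$ concrete, the following is a minimal sketch of the familiar recursive least squares update for a scalar model $y_t = \theta f_{t-1} + \varepsilon_t$. It is an illustration under assumed notation rather than a reproduction of the recursion stated in the corollary: the function name `rls_scalar`, the gain variable `K`, and the default initial values are hypothetical.

```python
def rls_scalar(ys, fs, theta0=0.0, K0=1e6, sigma2=1.0):
    """Recursive least squares for y_t = theta * f_t + e_t with scalar theta.

    ys, fs     : equal-length sequences of responses and regressors.
    theta0, K0 : assumed initial estimate and initial gain (illustrative).
    sigma2     : assumed error variance.
    Returns the list of successive estimates.
    """
    theta, K = theta0, K0
    path = []
    for y, f in zip(ys, fs):
        denom = sigma2 + f * f * K
        # new estimate = old estimate + gain * one-step prediction error
        theta = theta + (K * f / denom) * (y - theta * f)
        # gain update, written in a form that avoids cancellation
        K = K * sigma2 / denom
        path.append(theta)
    return path
```

For an AR(1) series `y`, one would call `rls_scalar(y[1:], y[:-1])`. With `theta0 = 0` the final value equals the ridge-type batch solution $\sum f_t y_t / (\sigma^2/K_0 + \sum f_t^2)$, so a large `K0` recovers ordinary least squares.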
Corollary 3 (Thavaneswaran and Abraham (1988)). For nonlinear time series models of the form $y_t = \theta f(t-1, y) + \varepsilon_t$, where $\{\varepsilon_t\}$ is an uncorrelated sequence with mean zero and variance $\sigma^2_\varepsilon$, the recursive estimate based on the optimal linear estimating function $\sum_{t=1}^{n} a_{t-1}(y_t - \theta f(t-1, y))$ is given by
We describe recursive estimation of $\phi$ for known scale $c$. Based on $g^*_n(\theta)$ in Thavaneswaran et al. (2013), the optimal estimate is obtained by solving the estimating equation . By using the vector form of the recursive algorithm (1) and (2) and letting $g = \sin$ and $b^*_{t-1} = k(u, \lambda) y_{t-1}$, it is easy to show that the recursive equations for $\phi$ become where $I_p$ is the $p \times p$ identity matrix, and
Recursive estimation for location parameters has recently been studied and applied to real data for infinite variance stable processes (Thavaneswaran et al. (2013)). We now show that these results can be obtained as a direct corollary to Theorem 1. Consider the RCA($p$) process defined as $y_t = \sum_{j=1}^{p} (\phi_j + b_{t,j}) y_{t-j} + \varepsilon_t$. Assume that $\{\varepsilon_t\}$ is a sequence of i.i.d. $\lambda$-stable random variables with location parameter zero and known scale parameter $c$. Assume that the components of $\{b_t\}$ are mutually independent, and are independent of $\{\varepsilon_t\}$. Assume that $\{b_{t,j}\}$ is a sequence of i.i.d. $\lambda$-stable random variables with location parameter zero and known scale parameter Then the distribution of $y_t$, conditional on the past, is stable with location parameter $\phi' y_{t-1}$ and scale parameter $c_t(\phi, \beta)$, where variation of the RCA process Based on the martingale difference $\sin$ , the optimal estimate of $\phi$ is obtained by solving the estimating equation where $k(u, \lambda) = \dfrac{2u \exp(-|u|^{\lambda})}{1 - \exp(-|2u|^{\lambda})}$. By using the recursive algorithm (2.5) and (2.6) and letting $g = \sin$ Now it is easily shown that the recursive equations become
The algorithm described in (5) and (6) gives the new estimate at time $t$ as the old estimate at time $t - 1$ plus an adjustment. Given initial values $\phi_0$ and $J_0$, we can compute the estimate recursively. The recursive estimate $\hat{\phi}_t$ in (6) is usually referred to as an 'on-line' estimate, and it is computationally feasible, especially when data arrive sequentially.
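The "old estimate plus adjustment" structure can be sketched for an AR(1) model with unit-scale symmetric stable errors as follows. This is a Newton-type stochastic approximation built from the martingale difference $\sin(u(y_t - \phi y_{t-1}))$ and the weight $k(u, \lambda)$ defined above; the specific update for $J_t$, the function names, and the default tuning values are assumptions made for illustration and are not the recursive equations of the text, which are not reproduced here.

```python
import math

def k_weight(u, lam):
    """k(u, lambda) = 2u exp(-|u|^lam) / (1 - exp(-|2u|^lam)); requires u != 0."""
    return 2.0 * u * math.exp(-abs(u) ** lam) / (1.0 - math.exp(-abs(2.0 * u) ** lam))

def recursive_sin_ar1(ys, u=1.0, lam=1.0, phi0=0.0, J0=1.0):
    """On-line estimate of phi in y_t = phi * y_{t-1} + e_t (unit-scale stable e_t).

    Assumed update (illustrative, with u > 0):
        J_t   = J_{t-1} + k * u * exp(-|u|^lam) * y_{t-1}^2
        phi_t = phi_{t-1} + k * y_{t-1} * sin(u * (y_t - phi_{t-1} * y_{t-1})) / J_t
    """
    k = k_weight(u, lam)
    slope = u * math.exp(-abs(u) ** lam)  # u * E[cos(u * e_t)] for unit scale
    phi, J = phi0, J0
    path = []
    for prev, cur in zip(ys[:-1], ys[1:]):
        J += k * slope * prev * prev                      # accumulated information
        phi += k * prev * math.sin(u * (cur - phi * prev)) / J
        path.append(phi)
    return path
```

Because the sine transform is bounded, a single extreme observation changes the estimate by at most $k |y_{t-1}| / J_t$, which is the practical reason for using a transformed residual when the errors have infinite variance. Cauchy errors ($\lambda = 1$) for a simulation can be drawn as `math.tan(math.pi * (random.random() - 0.5))`.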
As a special case of the RCA($p$) model with stable errors where $b_t = 0$, the AR($p$) model with stable errors is defined as $y_t = \phi' y_{t-1} + \varepsilon_t$, where $\phi' = (\phi_1, \ldots, \phi_p)$, $y'_{t-1} = (y_{t-1}, \ldots, y_{t-p})$, and the $\varepsilon_t$'s are i.i.d. random variables following a $\lambda$-stable distribution with location parameter zero and scale parameter $c$. That is, with $b_t = 0$ in the RCA model and $c_t = c$ in (3) and (4), the optimal estimate of $\phi$ is obtained simply by solving the recursive equations where $I_p$ is the $p \times p$ identity matrix, and

Joint Recursive Estimation of the Location and Scale Parameters
For the single parameter case, recursive estimation using estimating functions has been studied in Thavaneswaran and Abraham (1988), Thavaneswaran and Heyde (1999), and Ghahramani and Thavaneswaran (2012). In this section we study the recursive estimation of the location and scale parameters for nonlinear time series models with finite variance and infinite variance errors.
Now we consider a real-valued discrete-time stochastic process $\{y_t, t = 1, 2, \ldots\}$ with conditional moments
In order to estimate the parameter $\theta$ based on the observations $y_1, \ldots, y_n$, we consider two classes of martingale differences, viz., $\{m_t(\theta) = y_t - \mu_t(\theta), t = 1, \ldots, n\}$ and the generalized martingale differences $\{M_t(\theta) = q(m_t(\theta)) - E[q(m_t(\theta)) \mid \mathcal{F}^y_{t-1}], t = 1, \ldots, n\}$, so that the quadratic estimating functions and the LS and LAD combinations arise as special cases. The quadratic variations of $m_t(\theta)$ and $M_t(\theta)$, and the quadratic covariation of $m_t(\theta)$ and $M_t(\theta)$, are respectively
Here, $q$ is any function differentiable with respect to $\theta$, chosen such that $\langle M \rangle_t$ and $\langle m, M \rangle_t$ exist. The optimal estimating functions based on the martingale differences $m_t(\theta)$ and $M_t(\theta)$ are respectively
Then the information associated with $g^*_m(\theta)$ and $g^*_M(\theta)$ are respectively
Let $g_1$, $g_2$ be fixed unbiased estimating functions having finite and positive variances, and such that the expectations of $\partial g_1/\partial\theta$ and $\partial g_2/\partial\theta$ are finite with $E[\partial g_1/\partial\theta] \ne 0$. The following lemma, which gives the form of the combined estimating function as a linear combination of orthogonal as well as nonorthogonal estimating functions, can be used to obtain the recursive estimates of the location and scale parameters.
For the discrete-time stochastic process $\{y_t\}$, the following theorem first extends the results in Liang et al. (2011) for quadratic estimating functions to the combined estimating functions based on the martingale differences $m_t(\theta)$ and the generalized martingale differences $M_t(\theta)$, and then provides the form of the estimates based on the generalized combined estimating functions.
Theorem 5. For the general model in (7) and (8), in the class of all combined estimating functions of the form
(a) the optimal estimating function is given by , where
(c) the recursive estimate for $\theta$ is given by where $I_p$ is the $p \times p$ identity matrix, and $a^*_{t-1}$ and $b^*_{t-1}$ can be calculated by substituting $\hat{\theta}_{t-1}$ in equations (13) and (14) respectively.
(d) for the scalar parameter case, the recursive estimate of $\theta$ is given by
Proof. The proofs of parts (a) and (b) are similar to the proof given in Liang et al. (2011). Parts (c) and (d) extend the results in Thavaneswaran and Heyde (1999) to the generalized combined estimating function $g^*_C(\theta)$ with a vector parameter. A detailed proof of the theorem is given in Appendix A.
Note 1. The optimal information matrix based on the first $i$ observations is given by $-E$ can be interpreted as the observed information matrix associated with the optimal combined estimating function $g^*_C(\theta)$.

Autoregressive Models with Student's t Errors
Consider an autoregressive model of the form $y_t = \theta y_{t-1} + \sigma(\theta)\varepsilon_t$, where $\sigma(\theta)$ is any differentiable function of $\theta$, and $\{\varepsilon_t\}$ are uncorrelated random variables having the Student's t distribution with $\nu$ degrees of freedom and density function
$$f(\varepsilon) = \frac{\Gamma\big(\frac{\nu+1}{2}\big)}{\sqrt{\pi\nu}\,\Gamma\big(\frac{\nu}{2}\big)} \Big(1 + \frac{\varepsilon^2}{\nu}\Big)^{-\frac{\nu+1}{2}}.$$
Then the conditional mean and variance of $y_t$ are $\mu_t(\theta) = \theta y_{t-1}$ and $\sigma^2_t(\theta) = \sigma^2(\theta)\,\nu/(\nu - 2)$ for $\nu > 2$. Maximum likelihood estimation for this model is complicated; the LAD estimating function method was discussed in Thavaneswaran and Heyde (1999), and the combined estimating function method in Ghahramani and Thavaneswaran (2009).
We consider two classes of estimating functions generated by the martingale differences $\{m_t = y_t - \mu_t, t = 1, \ldots, n\}$ and $\{M_t = -\mathrm{sgn}(m_t), t = 1, \ldots, n\}$. For $\nu > 2$, it can be easily shown that the LS estimating function based on $m_t$ is given by
On the other hand, for any value of $\nu$, the LAD estimating function based on $M_t$ is given by with information
For $\nu \le 2$, the LS estimating function is not defined, whereas the LAD estimating function provides an estimate of $\theta$. For $\nu > 2$, neither the LS nor the LAD estimating function is optimal. In the class of all combined estimating functions of the form $\{g_C(\theta) :$ the optimal estimating function is given by where $c_1$ and $c_2$ are constants and $\rho = E$
Furthermore, the conditional information $I_{g^*_C}(\theta)$ is given by
In order to obtain the recursive equations for $\theta$, we first derive the derivatives of $a$ where $\delta$ is the Dirac delta function. Then the recursive estimate of $\theta$ is given by
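The contrast between the LS and LAD estimating functions can be illustrated with a small batch (non-recursive) sketch for the AR(1) mean. It assumes, for simplicity, that the scale does not depend on $\theta$, so the LS equation is $\sum y_{t-1}(y_t - \theta y_{t-1}) = 0$ and the LAD equation is $\sum y_{t-1}\,\mathrm{sgn}(y_t - \theta y_{t-1}) = 0$; the function names are hypothetical, and the weighted-median solution of the LAD equation is a standard device rather than something taken from the text.

```python
def ls_ar1(ys):
    """Root of the LS estimating equation sum y_{t-1} (y_t - theta * y_{t-1}) = 0."""
    num = sum(p * c for p, c in zip(ys[:-1], ys[1:]))
    den = sum(p * p for p in ys[:-1])
    return num / den

def lad_ar1(ys):
    """A root of the LAD equation sum y_{t-1} sgn(y_t - theta * y_{t-1}) = 0.

    Since y_{t-1} sgn(y_t - theta y_{t-1}) = |y_{t-1}| sgn(y_t / y_{t-1} - theta),
    a sign change occurs at the weighted median of the ratios y_t / y_{t-1}
    with weights |y_{t-1}|.
    """
    pairs = sorted((c / p, abs(p)) for p, c in zip(ys[:-1], ys[1:]) if p != 0.0)
    half = 0.5 * sum(w for _, w in pairs)
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:
            return ratio
    raise ValueError("need at least one nonzero lagged value")
```

For the short series `[1.0, 0.5, 0.25, 25.0, 12.5]`, in which one transition breaks the pattern $y_t = 0.5\,y_{t-1}$, `lad_ar1` returns exactly 0.5 while `ls_ar1` is pulled slightly above it.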

Recursive Estimation for Location and Scale Parameters of a Cauchy Distribution
The Cauchy distribution has the probability density function
$$f(x; \theta, b) = \frac{1}{\pi} \, \frac{b}{(x - \theta)^2 + b^2},$$
where $\theta$ is the location parameter, specifying the location of the peak of the distribution, and $b$ is the scale parameter, which specifies the half-width at half-maximum (HWHM). We are interested in estimating the parameters $\theta$ and $b$ jointly. Because the parameters of the Cauchy distribution do not correspond to a mean and a variance, attempting to estimate them using a sample mean and a sample variance will not succeed. Maximum likelihood can also be used to estimate $\theta$ and $b$. However, this is complicated by the fact that it requires finding the roots of a high degree polynomial, and there can be multiple roots that represent local maxima. Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively inefficient for small samples. To solve the estimation problem, we use the estimating function method and obtain the recursive estimates. The advantages of the method include the simplicity of constructing the estimating equations and the explicit calculation of the estimates.
If $X_1, \ldots, X_n$ are independent and identically distributed random variables from the Cauchy distribution, then based on the martingale differences the estimating function is given by
For simplicity, let $\beta = e^{-b|u_2|}$; then the estimating function becomes
Therefore the recursive equations for $\theta$ and $\beta$ are
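As a batch illustration of the kind of moment conditions involved (not the recursive equations themselves), the sketch below uses the characteristic function of the Cauchy distribution, $E[e^{iuX}] = \exp(iu\theta - b|u|)$, which implies $E[\sin(u_1(X - \theta))] = 0$ and $E[\cos(u_2(X - \theta))] = e^{-b|u_2|} = \beta$. The choice of these two functions, the function names, and the tuning constants `u1`, `u2` are assumptions made for illustration.

```python
import math

def cauchy_moment_functions(xs, theta, beta, u1=1.0, u2=1.0):
    """Sample averages of h1 = sin(u1 (x - theta)) and h2 = cos(u2 (x - theta)) - beta.

    With beta = exp(-b |u2|), both averages have mean zero at the true (theta, b).
    """
    n = len(xs)
    g1 = sum(math.sin(u1 * (x - theta)) for x in xs) / n
    g2 = sum(math.cos(u2 * (x - theta)) for x in xs) / n - beta
    return g1, g2

def scale_from_beta(beta, u2=1.0):
    """Invert beta = exp(-b |u2|) to recover the scale parameter b."""
    return -math.log(beta) / abs(u2)
```

Working with $\beta$ instead of $b$ makes the second condition linear in the unknown, so for a given $\theta$ the value of $\beta$ solving `g2 = 0` is simply the sample average of the cosine terms; `scale_from_beta` then maps it back to $b$.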

AR(p) Models with Stable Errors
Consider the AR($p$) process given by $y_t = \phi' y_{t-1} + \varepsilon_t$, where $\phi' = (\phi_1, \ldots, \phi_p)$, $y'_{t-1} = (y_{t-1}, \ldots, y_{t-p})$, and the $\varepsilon_t$'s are i.i.d. random variables following a $\lambda$-stable distribution with location parameter zero and scale parameter $c$. In order to estimate the vector parameter $\theta = (\phi', c)'$, we consider a class of martingale differences
Then the optimal estimating function based on $h_t(\theta)$ is and the corresponding information matrix is
Therefore the recursive equations for $\theta$ and $\beta$ are , where

Regression Model with ARCH Errors
Consider a regression model with ARCH($s$) errors $\varepsilon_t$ of the form
In this model, the conditional mean is $\mu_t = x_t'\beta$, the conditional variance is $\sigma^2_t = h_t$, and the conditional skewness and excess kurtosis are assumed to be constants $\gamma$ and $\kappa$, respectively. It follows from Theorem 1 that the optimal component quadratic estimating function for the parameter vector $\theta = (\beta_1, \ldots, \beta_r, \alpha_0, \ldots, \alpha_s)'$
It is of interest to note that when $\{\varepsilon_t\}$ are conditionally Gaussian, so that $\gamma = 0$ and $\kappa = 0$,
The optimal quadratic estimating functions for $\beta$ and $\alpha$ based on the estimating functions $m_t = y_t - x_t'\beta$ and $M_t = m_t^2 - h_t$ are respectively given by
Moreover, the information matrix for $\theta = (\beta', \alpha')'$ in (3.8) has $I_{\beta\alpha} = I_{\alpha\beta} = 0$,
Now we consider the recursive estimation of $\beta$ and $\alpha$. For $g^*_C(\beta)$,
Hence, by using the recursive algorithm (2.5) and (2.6), the recursive equations for $\beta$ are given by
Then, by using with the result
Also, the recursive equations for $\alpha$ are given by
Example. We consider the simple linear regression model with ARCH(1) errors defined as $y_t = \beta y_{t-1} + \varepsilon_t$, where $E$
The parameters of interest are $(\beta, \alpha') = (\beta, \alpha_0, \alpha_1)$. If we assume $\gamma = 0$ and $\kappa = 0$, then and
Also, the derivatives of $m_t$ and $M_t$ with respect to $\beta$ and $\alpha$ are given by
Then the recursive equations for $\beta$ and $\alpha$ are derived as
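The two martingale differences used in the example can be computed directly from data. The sketch below assumes the usual ARCH(1) form $h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2$ with $\varepsilon_{t-1} = y_{t-1} - \beta y_{t-2}$, which is an assumption here because the defining equation is not reproduced above; the function name is hypothetical.

```python
def arch1_martingale_differences(ys, beta, alpha0, alpha1):
    """Return (m_t, h_t, M_t) for t = 2, ..., len(ys) - 1, where
    m_t = y_t - beta * y_{t-1},
    h_t = alpha0 + alpha1 * m_{t-1}^2   (assumed ARCH(1) conditional variance),
    M_t = m_t^2 - h_t.
    Two lagged observations are needed before the first usable t.
    """
    out = []
    for t in range(2, len(ys)):
        m_prev = ys[t - 1] - beta * ys[t - 2]   # lagged residual
        h = alpha0 + alpha1 * m_prev ** 2       # conditional variance
        m = ys[t] - beta * ys[t - 1]            # current residual
        out.append((m, h, m * m - h))
    return out
```

Under conditional Gaussianity ($\gamma = 0$, $\kappa = 0$) the two differences are conditionally uncorrelated, with $\langle m \rangle_t = h_t$ and $\langle M \rangle_t = 2h_t^2$, which is why the information matrix is block diagonal in $\beta$ and $\alpha$.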

RCA Models with GARCH Errors
Consider the RCA model with GARCH errors of the form where $\{b_t\}$ and $\{\varepsilon_t\}$ are uncorrelated zero mean processes with unknown variance $\delta = \sigma^2_b$ and variance $\sigma^2_\varepsilon(\theta)$ with unknown parameter $\theta$, respectively. Further, we denote the skewness and excess kurtosis of $\{b_t\}$ by $\gamma_b$, $\kappa_b$, which are known, and of $\{\varepsilon_t\}$ by $\gamma_\varepsilon(\theta)$, $\kappa_\varepsilon(\theta)$, respectively. In this model, the conditional mean is $\mu_t = y_{t-1}\theta$ and the conditional variance is
The parameter $\theta$ appears simultaneously in the mean and the variance. Let $m_t = y_t - \mu_t$ and $M_t = m_t^2 - \sigma_t^2$, so that the conditional excess kurtosis is $\kappa_t = \langle M \rangle_t / \sigma_t^4 - 2$. Then
It follows from Theorem 1 that the optimal component quadratic estimating function for the parameter vector $\theta$ where
The calculation of $\partial h_t / \partial\theta$ is not straightforward, and the recursive form of $h_t$ must be taken into account. Computation of $\partial h_t / \partial\theta$ yields:
Example. We consider the RCA model with GARCH(1, 1) errors defined as where $\{b_t\}$ and $\{\varepsilon_t\}$ are Gaussian processes with $\gamma_b = 0$, $\kappa_b = 0$, $\gamma_\varepsilon(\theta) = 0$ and $\kappa_\varepsilon(\theta) = 0$. In order to estimate the parameter vector $\theta = (\theta, \delta, \omega, \alpha_1, \beta_1)'$, we have
Then, taking the derivatives of $a^*_{t-1}$ and $b^*_{t-1}$ with respect to $\theta$, we obtain where
The recursive estimates of $\theta$ are given by
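The remark that the derivative of $h_t$ must respect its recursive form can be illustrated for a plain GARCH(1, 1) variance $h_t = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}$. The sketch below treats the residual sequence as given and differentiates only with respect to $(\omega, \alpha_1, \beta_1)$; both simplifications, the fixed starting value `h0`, and the function name are assumptions for illustration, since in the RCA model the residuals also depend on $\theta$.

```python
def garch11_variance_and_gradient(eps, omega, alpha1, beta1, h0):
    """Propagate h_t = omega + alpha1 * eps_{t-1}^2 + beta1 * h_{t-1} together with
    its gradient with respect to (omega, alpha1, beta1).

    eps : residual sequence, treated as fixed (an assumption of this sketch).
    h0  : assumed fixed starting variance, so its gradient is zero.
    Returns (h, (dh/domega, dh/dalpha1, dh/dbeta1)) at the last time point.
    """
    h = h0
    d_omega = d_alpha = d_beta = 0.0
    for e in eps:
        # differentiate the recursion before overwriting h: d_beta needs h_{t-1}
        d_omega = 1.0 + beta1 * d_omega
        d_alpha = e * e + beta1 * d_alpha
        d_beta = h + beta1 * d_beta
        h = omega + alpha1 * e * e + beta1 * h
    return h, (d_omega, d_alpha, d_beta)
```

Each derivative obeys the same first-order recursion as $h_t$ itself, so the gradient is carried forward at constant cost per observation, which is what makes it usable inside an on-line estimator.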
Expanding the above equation, the coefficients for $g_1$ and $g_2$ are given by and $\mathrm{Var}(g_1)\mathrm{Var}(g_2) - \mathrm{Cov}^2(g_1, g_2)$.
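The role of the determinant $\mathrm{Var}(g_1)\mathrm{Var}(g_2) - \mathrm{Cov}^2(g_1, g_2)$ can be seen in a small scalar-parameter sketch of the generic optimal linear combination of two unbiased estimating functions. The weights below solve $\Sigma w = d$, where $\Sigma$ is the covariance matrix of $(g_1, g_2)$ and $d = (E[\partial g_1/\partial\theta], E[\partial g_2/\partial\theta])'$. This is the standard result up to a common constant factor and sign convention, and it is offered as an illustration rather than as the exact coefficients of the lemma; the function name is hypothetical.

```python
def optimal_combination(d1, d2, v1, v2, c12):
    """Weights (w1, w2) for g_C = w1 * g1 + w2 * g2 and the resulting information.

    d1, d2 : E[dg1/dtheta], E[dg2/dtheta]
    v1, v2 : Var(g1), Var(g2)
    c12    : Cov(g1, g2)
    """
    det = v1 * v2 - c12 ** 2
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    w1 = (d1 * v2 - d2 * c12) / det
    w2 = (d2 * v1 - d1 * c12) / det
    info = d1 * w1 + d2 * w2  # d' Sigma^{-1} d
    return w1, w2, info
```

When $g_1$ and $g_2$ are orthogonal (`c12 = 0`) the information is the sum $d_1^2/v_1 + d_2^2/v_2$ of the individual informations; in general it is never smaller than either one alone.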
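Proof of Theorem 1. We choose two orthogonal martingale differences $m_i(\theta) = x_i - \mu_i(\theta)$ and $\psi_i(\theta) = M_i(\theta) - \dfrac{\langle m, M \rangle_i}{\langle m \rangle_i}\, m_i(\theta)$, where the conditional variance of $\psi_i(\theta)$ is given by $\langle \psi \rangle_i = \langle M \rangle_i - \dfrac{\langle m, M \rangle_i^2}{\langle m \rangle_i}$. That is, $m_i(\theta)$ and $\psi_i(\theta)$ are uncorrelated with conditional variances $\langle m \rangle_i$ and $\langle \psi \rangle_i$, respectively. Moreover, the optimal martingale estimating function and associated information based on the martingale differences $\psi_i(\theta)$ are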
Then the combined estimating function based on $m_i$ and $\psi_i$ becomes and satisfies the sufficient condition for optimality where $K$ is a constant matrix. Hence, $g^*_C(\theta)$ is optimal in the class $\mathcal{G}_C$, and part (a) follows. Since $m_i$ and $\psi_i$ are orthogonal, the information
The proof of parts (c) and (d) is as follows. The optimal combined estimating function based on $m_i(\theta)$ and $M_i(\theta)$ is given by
Then, using Taylor's expansion for $g^*_n(\theta)$, we have
Substituting the recursive estimate for $\theta$ at each step, the estimate based on the first $i - 1$ observations is given by $\hat{\theta}_{i-1}$
When the $i$th observation becomes available, the estimate becomes
Hence it is easy to show that the recursive equations for $\theta$ take the form (15)-(16).