Hellinger Distance Estimation of Strongly Dependent Multi-Dimensional Gaussian Processes

In this paper we determine the minimum Hellinger distance estimator for stationary multi-dimensional Gaussian processes with long-range dependence. Under suitable regularity assumptions, we establish the almost sure convergence and the asymptotic distribution of this estimator.


Introduction
In this paper, we extend the results of N'dri and Hili (2011) to the multivariate case. Within the framework of this study, we consider a sequence $\{X_i\}_{i \ge 1}$ that is an $\mathbb{R}^d$-valued stationary mean-zero Gaussian process with density $f(x, \theta_0)$, where $x \in \mathbb{R}^d$ and $\theta_0$ is assumed to belong to a compact subset $\Theta$ of $\mathbb{R}^q$. Set $X_n = (X_n^{(1)}, \dots, X_n^{(d)})$, with $E X_n^{(p)} = 0$ and $E (X_n^{(p)})^2 = \gamma^{(p,p)}(0) = \sigma(\theta_0)$ for $1 \le p \le d$, where the function $\sigma$ is defined on $\mathbb{R}^q$. We define $E(X_n^{(p)} X_{n+t}^{(k)}) = \gamma^{(p,k)}(t)$ for $n \in \mathbb{N}^*$, $1 \le p, k \le d$ and $t \in \mathbb{N}^*$. We suppose that for each $\theta$ there exists $0 < \alpha(\theta) < 1$ such that the correlations $\gamma^{(p,k)}(t)$ decrease to zero like $t^{-\alpha(\theta)} L(t)$ as $t \to +\infty$, where $L$ is a slowly varying function at infinity, i.e. $\lim_{t \to +\infty} L(ts)/L(t) = 1$ for every $s$ such that $0 < s < +\infty$.

The study of random processes with correlations decaying at hyperbolic rates presents interesting and challenging probabilistic problems. Progress has been made in the past two decades on the theoretical aspects of the subject. Recent applications have confirmed that such data arise in a large number of fields, including hydrology, geophysics, turbulence, economics and finance. Many stochastic models have been developed for the description and analysis of this phenomenon. For recent developments, see Barndorff-Nielsen (1998); Bollerslev and Mikkelsen (1996); Ding, Granger, and Engle (1993); Lecourt (2000); Lo (1991); Mignon (1998); Ogata and Abe (1991); Robinson (1994); Tse (1998); Vilasuso (2002); Zumbach (2004).
The purpose of this paper is to estimate the parameter $\theta_0$ by the minimum Hellinger distance (MHD) method. A good estimator of $\theta_0$ should have two essential properties: it should be efficient if the postulated model for the data is in fact true, and its distribution should not be greatly perturbed if the assumed model is only approximately true. It was long thought that there was an inherent contradiction between the aims of robustness and efficiency; that is, a robust estimator could not be efficient and vice versa. It is now known that the MHD approach introduced by Beran (1977) is one way of reconciling these conflicting concepts. For parametric models, it has been shown that minimum Hellinger estimators achieve efficiency at the model density and simultaneously have excellent robustness properties. Interest in the MHD technique of parametric estimation is therefore motivated by the fact that these estimators are both efficient and robust. The first examples of MHD estimators concern i.i.d. sequences of random variables; see Beran (1977, 1978, 1981) and Yang (1991). For strongly mixing samples, see Hili (1995); for bilinear and nonlinear models, see Hili (1999, 2008). For linear univariate strongly dependent processes, see Bitty and Hili (2010). For univariate strongly dependent processes, see N'dri and Hili (2011).
The present paper is organized as follows. Section 2 introduces the notation and the hypotheses. Preliminary lemmas are proved in Section 3. In Section 4, we establish the almost sure convergence and the asymptotic distribution of the estimator $\hat\theta_n$ of the true parameter $\theta_0$. Section 5 reports the results of simulations that evaluate the finite-sample performance of the proposed estimator.

Notations
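Let $\mathcal{F} = \{ f(\cdot, \theta) \}_{\theta \in \Theta}$ be a family of functions indexed by a compact set $\Theta \subset \mathbb{R}^q$ such that, for each fixed $\theta \in \Theta$, $f(\cdot, \theta): \mathbb{R}^d \to \mathbb{R}$ is a positive integrable function. Denote by $H_2$ the Hellinger distance,

$$H_2(f, g) = \left( \int_{\mathbb{R}^d} \left| f^{1/2}(x) - g^{1/2}(x) \right|^2 dx \right)^{1/2}.$$

Given observations $X_1, \dots, X_n$ with density $f(\cdot, \theta_0)$, where $\theta_0 \in \Theta$, we want to construct an estimator of the parameter $\theta_0$ over the class $\mathcal{F}$. To do this, we choose the value of $\theta$ that minimizes the functional $H_2(f(\cdot, \theta), f_n(\cdot))$, where $f_n$ is the kernel density estimator, a nonparametric estimator of $f(\cdot, \theta)$, defined as

$$f_n(x) = \frac{1}{n b_n^d} \sum_{i=1}^{n} K\left( \frac{x - X_i}{b_n} \right),$$

where $(b_n)$ is a sequence of bandwidths and $K(\cdot)$ is a kernel function. Let us consider the following assumptions.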
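The minimization described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the paper: the dimension is d = 1, the toy sample is i.i.d. rather than long-range dependent, the model is a zero-mean Gaussian with unknown scale sigma, the integral is approximated by a Riemann sum, and the minimization is a plain grid search. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(sample, grid, bandwidth):
    """Kernel density estimate f_n evaluated at each point of `grid` (d = 1)."""
    n = len(sample)
    return [
        sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)
        for x in grid
    ]


def normal_density(x, sigma):
    """Model density f(x, theta) with theta = sigma (zero-mean Gaussian)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def squared_hellinger(f_vals, g_vals, step):
    """Riemann-sum approximation of the integral of (sqrt(f) - sqrt(g))^2."""
    return step * sum((math.sqrt(f) - math.sqrt(g)) ** 2 for f, g in zip(f_vals, g_vals))


def mhd_estimate(sample, candidates, bandwidth, lo=-8.0, hi=8.0, m=321):
    """Return the candidate sigma that minimizes the Hellinger distance to f_n."""
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    f_n = kernel_density(sample, grid, bandwidth)

    def distance(sigma):
        model = [normal_density(x, sigma) for x in grid]
        return squared_hellinger(model, f_n, step)

    return min(candidates, key=distance)


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.5) for _ in range(400)]  # i.i.d. toy data, true sigma = 1.5
candidates = [0.5 + 0.05 * k for k in range(51)]    # sigma in [0.5, 3.0]
sigma_hat = mhd_estimate(sample, candidates, bandwidth=400 ** (-1 / 5))
```

Minimizing the squared distance gives the same minimizer as minimizing the distance itself, which is why the square root is omitted. The bandwidth 400^(-1/5) is an illustrative choice, not the one used in the simulations of this paper.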

Hypotheses
Assumptions A , for 1 ≤ j ≤ q is continuous and for every j, the function where τ is a positive real number.

Preliminary Lemmas
In this section we establish the lemmas needed for the almost sure convergence and the asymptotic distribution of the estimator of the parameter $\theta_0$. A key ingredient is the diagram formula for the expectations of products of Hermite polynomials over a Gaussian vector. We first recall this formula, which is needed in the proof of the main theorem of this section. A diagram (or graph) $G$ of order $(l_1, \dots, l_p)$ is a set of points $\{(j, l): 1 \le j \le p, 1 \le l \le l_j\}$, called vertices, and a set of pairs of these points $\{((j, l), (k, m)): 1 \le j < k \le p, 1 \le l \le l_j, 1 \le m \le l_k\}$, called edges, such that every vertex is of degree one. Observe that edges connect vertices of different levels. We denote the set of edges of the diagram $G$ by $E(G)$. Given an edge $\omega = ((j, l), (k, m))$, let $d_1(\omega) = j$ and let $d_2(\omega) = k$. With this notation the diagram formula is, n ) be a stationary mean-zero Gaussian vector such that E X (p) Moreover, define and We have $E(G(x, X_j)) = 0$ and As a matter of fact $\int_{\mathbb{R}^d} K^2(z)\,dz < +\infty$, so by Assumption (A1) and by the dominated convergence theorem, we conclude that where We say that $G$ has Hermite rank $\nu$ if the Hermite coefficient Furthermore, by the orthogonality of the Hermite polynomials, we have, In the following, we show the almost sure convergence to zero of $T_{1,n}(x)$, $T_{2,n}(x)$ and $T_{3,n}(x)$. Our approach uses the same methodology as in Arcones (1994).
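The notion of Hermite rank used above can be made concrete with a small numerical sketch. It is a simplified univariate illustration, not the multivariate expansion of Arcones (1994) used in this paper: it assumes a standard normal variable Z, computes the probabilists' Hermite polynomials by their three-term recurrence, approximates the coefficients E[g(Z) H_k(Z)] with the trapezoidal rule, and returns the smallest order k ≥ 1 whose coefficient is not negligible. The function names, the integration range and the tolerance are all illustrative assumptions.

```python
import math


def hermite(k, x):
    """Probabilists' Hermite polynomial H_k(x) via H_{j+1} = x H_j - j H_{j-1}."""
    h_prev, h = 1.0, x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, x * h - j * h_prev
    return h


def hermite_coefficient(g, k, half_width=10.0, m=4001):
    """Approximate E[g(Z) H_k(Z)] for Z ~ N(0, 1) with the trapezoidal rule."""
    step = 2.0 * half_width / (m - 1)
    total = 0.0
    for i in range(m):
        x = -half_width + i * step
        weight = 0.5 if i in (0, m - 1) else 1.0
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * g(x) * hermite(k, x) * phi
    return total * step


def hermite_rank(g, max_order=8, tol=1e-8):
    """Smallest k >= 1 with a non-negligible Hermite coefficient, else None."""
    for k in range(1, max_order + 1):
        if abs(hermite_coefficient(g, k)) > tol:
            return k
    return None


rank_square = hermite_rank(lambda x: x * x - 1.0)  # centred square, even function
rank_cube = hermite_rank(lambda x: x ** 3)         # odd function
```

The centred square is itself the second Hermite polynomial, so its first-order coefficient vanishes and its rank is 2, whereas the cube has a non-zero first-order coefficient E[Z^4] = 3 and therefore rank 1.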
Proof. Using Assumptions (B1) and (B2), it follows that We have Using the dominated convergence theorem and Assumption (A1), we conclude that Similarly, it follows converges to zero as $n \to +\infty$. So given $\delta > 0$, for $n$ large enough, the right-hand side in (2) becomes smaller than and it is enough to verify that this term converges almost surely to zero as $n \to +\infty$. Thus, Tchebychev's inequality gives an upper bound with a variance term: for any $\varepsilon > 0$, Now, the argument in Arcones (1994), together with Lemma 3.1 and Theorem 3.1 in Taqqu (1975), implies that, Therefore, From Assumption (B3) and the Borel-Cantelli lemma, we deduce that for every $x \in \mathbb{R}^d$, $T_{1,n}(x)$ converges almost surely to zero as $n \to +\infty$.
Proof. Using Assumptions (B1) and (B2), it follows that for every, k such that As in the proof of Lemma 1, it is easy to check that the terms with mathematical expectations cancel each other in the limit. So we are left with checking the almost sure convergence to zero of Using Tchebychev's inequality, it follows that for every $\varepsilon > 0$, As in the proof of Lemma 1, we get that Consequently, We get the result by using Assumption (B3) and the Borel-Cantelli lemma.
Repeating the same argument, it follows that for every $\varepsilon > 0$, Again as in the proof of Lemma 1, we get that Therefore, We use Assumption (B3) and the Borel-Cantelli lemma to conclude the proof.
By Assumption (B2) and for $n$ large enough, We use again Assumption (B3) and the Borel-Cantelli lemma to conclude the proof.
Theorem 1 Assume that Assumptions (A1), (B1), (B2) and (B3) are satisfied. Then for an integer k such that n )) converges almost surely to zero when $n \to +\infty$ for every $x \in \mathbb{R}^d$.
Proof. For an integer $k$ such that $n^{\upsilon} < k \le (n + 1)^{\upsilon}$, we write On one side, we show by Lemma 1 that the term converges almost surely to zero as $n \to +\infty$. Then, using Lemma 2, we show that converges almost surely to zero as $n \to +\infty$. Lastly, the almost sure convergence to zero of is proved by Lemma 3. Therefore, the almost sure convergence to zero of $T_{1,n}(x)$, $T_{2,n}(x)$ and $T_{3,n}(x)$, proved in Lemmas 1 to 3, concludes the proof of Theorem 1.
Remark 1 Using the fact that the density is continuous and bounded, we show that

Asymptotic Properties of the Estimator
A good estimator should have two essential properties. First, it should be efficient if the postulated model for the data is in fact true. Second, it should be robust, that is to say, its distribution should not be greatly perturbed if the assumed model is only approximately true. In this section, we study in Theorem 2 the efficiency property of the MHD estimator. For the proof of this theorem, we use Theorem 1 and the continuity of the functional $T$.
In Theorem 3, we study the asymptotic distribution of this estimator. First we state the following lemma, which is required in the proof.
Proof. We have, where $\Sigma(\theta_0)$ is the variance-covariance matrix. By the continuity of the density and by the dominated convergence theorem, we conclude that $E f_n(x) - f(x, \theta_0) \to 0$ as $n \to +\infty$. Moreover, we show as in the proof of Lemma 1 that the subsequence corresponding to terms of order $n^{\upsilon}$ for each $x \in \mathbb{R}^d$. This implies that for each $x \in \mathbb{R}^d$ and Using Theorem 1 and Tchebychev's inequality, we deduce that for $\varepsilon > 0$, Then we have Under Assumptions (A1), (B1)-(B3) and from the Borel-Cantelli lemma, we conclude that for all $x \in \mathbb{R}^d$, $f_n(x)$ converges almost surely (a.s.) to $f(x, \theta_0)$. Then Therefore $f_n \to f$ a.s. as $n \to +\infty$ in the Hellinger topology.
To establish the asymptotic distribution of $\hat\theta_n$, we need some further notation. Define S (., Furthermore, it is known (see, e.g., Major (1981), Ho and Sun (1990)) that there are spectral measures $G^{(p,q)}$, for ) be the joint random spectral measure which is the limit of where $\mathcal{B}[-\pi, \pi]$ is the Borel $\sigma$-algebra on $[-\pi, \pi]$ and $Z_{G^{(p,p)}}$ are random spectral measures associated with the spectral measures $G^{(p,p)}$.
Proof. From Theorem 2 in Beran (1977), we deduce that where $V_k$ is a $(q \times q)$-matrix whose components tend to zero in probability as $k \to +\infty$.
For $b \ge 0$, $a > 0$ we have the algebraic identity and we have Using the same approach as in the proof of Lemma 1, it follows from Theorem 1 in combination with Assumptions (A5) and (B3) that there exists a positive real number $\lambda$ such that From Assumptions (B2) and (B3), we deduce that Using the dominated convergence theorem we conclude that Moreover, by Assumption (A5) we get Let $E f_k(x) - f(x, \theta_0) := k^{-\eta}$ where $\nu\alpha(\theta_0)/4 < \eta < 1$; by Assumptions (B2) and (A5), we get Using again the dominated convergence theorem, we conclude that have the same asymptotic distribution. Denote by Now, the previous argument implies that Let us define We have We define Therefore, where Using the same arguments as in the proof of Lemma 1, we get Furthermore, for large $k$ and by Theorem 1, there exists an integer $\upsilon$ such that, and choose Since the right-hand side is summable, the Borel-Cantelli lemma implies that Using again Theorem 1, we conclude that Hence $J_k(u, \theta_0) \to 0$ a.s. as $k \to +\infty$. We deduce that $k^{\nu\alpha(\theta_0)/2} L^{\nu/2}(k)\,(\hat\theta_k - \theta_0)$ has the same asymptotic distribution as We suppose that the rank of the function $G_k(\cdot)$ is $\nu$, $1 \le \nu < 1/\alpha(\theta_0)$ and $u = (\nu, \dots, \nu) \in \mathbb{R}^d$. Using Theorem 6 in Arcones (1994), we conclude that p $(l_1, \dots, l_\nu)$ is the number of $l_1, \dots, l_\nu$ that are equal to $p$. This concludes the proof.
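The algebraic identity invoked at the start of this proof, for $b \ge 0$ and $a > 0$, is presumably the one used by Beran (1977) to linearize a difference of square roots. It is stated here as an assumption about the intended formula:

```latex
b^{1/2} - a^{1/2}
  = \frac{b - a}{2 a^{1/2}}
  - \frac{(b - a)^2}{2 a^{1/2} \bigl( b^{1/2} + a^{1/2} \bigr)^2}.
```

It can be checked from $b^{1/2} - a^{1/2} = (b - a)/(b^{1/2} + a^{1/2})$: subtracting this from $(b - a)/(2 a^{1/2})$ leaves $(b - a)(b^{1/2} - a^{1/2}) / \bigl(2 a^{1/2} (b^{1/2} + a^{1/2})\bigr)$, which equals the second fraction. Applied with $b = f_k(x)$ and $a = f(x, \theta_0)$, the first term is linear in $f_k - f$ and the second is a quadratic remainder.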

Simulations
In this section, we investigate the finite-sample properties of the MHD estimator. For this purpose we consider the univariate Fractional Gaussian Noise (FGN) process. Let $X_1, \dots, X_n$ be $n$ observations of this process. For $0 < H < 1$, FGN is a zero-mean stationary Gaussian process with autocovariance sequence and density function .
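As an illustration of the process considered here, the sketch below assumes the standard FGN autocovariance gamma(k) = (sigma^2 / 2) (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}); the parametrization of the scale used in this paper may differ. It also simulates a short path by Cholesky factorization of the covariance matrix, which is a simple generic method and not the routine of the "longmemo" package. For 1/2 < H < 1 this autocovariance behaves like H(2H - 1) sigma^2 k^{2H-2} for large k, that is, hyperbolic decay with exponent alpha = 2 - 2H in (0, 1). Function names are hypothetical.

```python
import math
import random


def fgn_autocovariance(k, hurst, sigma2=1.0):
    """Standard FGN autocovariance at lag k (assumed parametrization)."""
    k = abs(k)
    h2 = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** h2 - 2.0 * k ** h2 + abs(k - 1) ** h2)


def cholesky_lower(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][p] * lower[j][p] for p in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_fgn(n, hurst, sigma2=1.0, seed=0):
    """Simulate n FGN observations as L z, with L L^T the covariance matrix."""
    rng = random.Random(seed)
    gamma = [fgn_autocovariance(k, hurst, sigma2) for k in range(n)]
    cov = [[gamma[abs(i - j)] for j in range(n)] for i in range(n)]
    lower = cholesky_lower(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(lower[i][p] * z[p] for p in range(i + 1)) for i in range(n)]


path = simulate_fgn(n=150, hurst=0.8)
```

The Cholesky approach costs on the order of n^3 operations, so it is only suitable for short paths such as this one; faster exact methods exist for sample sizes like n = 1000.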
We estimate the scale $\sigma(H)$ of the process by simulation. The kernel density estimator $f_n$ is constructed using the Gaussian kernel and the bandwidth $b_n = n^{-1/10}$, with sample size $n = 1000$. For the simulations, we use the R package "longmemo" and the MDEstimator function. Table 1 shows the consistency of the MHD estimator. To illustrate the robustness of the MHD estimator, we proceed as follows: in the MHD estimation, we replace $f_n$ by $f_{n,\alpha}$, defined as $f_{n,\alpha} = (1 - \alpha) f_n + \alpha\,\delta_{[0,1]}$, where $\alpha \in [0, 1]$ and $\delta_{[0,1]}$ is the uniform density on the interval $[0, 1]$. This gives the following table, with nine values of $\alpha$.
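The contamination scheme can be sketched as follows. The sketch makes simplifying assumptions that are not from the paper: the exact N(0, 1) density stands in for the kernel estimate f_n, the fitted model is a zero-mean Gaussian with unknown scale sigma, the Hellinger integral is a Riemann sum, and the minimization is a grid search over a small hypothetical set of candidates. Here alpha denotes the contamination level, not the memory parameter, and the four levels used are illustrative rather than the nine values of the paper.

```python
import math


def normal_density(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def uniform01_density(x):
    """Uniform density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def contaminate(f_n, alpha):
    """Return the mixture density (1 - alpha) * f_n + alpha * Uniform[0, 1]."""
    return lambda x: (1.0 - alpha) * f_n(x) + alpha * uniform01_density(x)


def mhd_sigma(density, candidates, lo=-8.0, hi=8.0, m=1601):
    """Grid-search the sigma whose Gaussian density is Hellinger-closest to `density`."""
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    target = [math.sqrt(density(x)) for x in grid]

    def distance(sigma):
        return step * sum(
            (math.sqrt(normal_density(x, sigma)) - t) ** 2 for x, t in zip(grid, target)
        )

    return min(candidates, key=distance)


def f_n(x):
    # Stand-in for the kernel estimate: the exact N(0, 1) density (assumption).
    return normal_density(x, 1.0)


candidates = [0.5 + 0.05 * k for k in range(31)]  # sigma in [0.5, 2.0]
estimates = {
    alpha: mhd_sigma(contaminate(f_n, alpha), candidates)
    for alpha in (0.0, 0.05, 0.1, 0.2)
}
```

Comparing the entries of `estimates` across contamination levels shows how far the fitted scale moves away from the uncontaminated value, which is the kind of comparison the robustness table is meant to report.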

Table 1. Simulations on the estimation of the parameter σ(H)

Table 2. Robustness of the MHD estimator