Kernel Estimation of the Gradient of a Probability Density From Contaminated Associated Observations

We consider the estimation of the gradient of the density function of a positively associated random process (X_i)_i from noisy observations. We establish asymptotic expressions for the variance of the gradient of the density estimator. We consider the case of algebraic decay of the tail of the characteristic function of the noise ε_i.


Introduction
We consider the problem of estimating the gradient of the multivariate probability density of a stationary process using observations that are corrupted by additive noise. For each integer p ≥ 1, we assume that there exists a joint probability density f(x) = f(x_1, ..., x_p) for X_1, ..., X_p, where (X_i)_i is a real-valued stationary process. Consider the deconvolution problem

Y_i = X_i + ε_i, i = 1, 2, ...

Such a model, in which measurements are contaminated by errors, arises in many fields where the measurements cannot be observed directly. The noise process (ε_i)_i consists of independent and identically distributed random variables; we assume furthermore that it is independent of the process (X_i)_i, with known marginal density h(x). Let g(x) be the joint probability density function of the random variables Y_1, ..., Y_p, which is given by

g(x) = ∫_{R^p} f(u) h̄(x − u) du,

where h̄(u) = Π_{j=1}^p h(u_j) and u = (u_1, ..., u_p). We consider the gradient ∇f(x) in multivariate density deconvolution when the process (X_i)_i is associated. Our aim is to study the estimation of ∇f(x) from the noisy observations (Y_i)_{i=1}^n. This is clearly a multidimensional density deconvolution problem for dependent data.
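As a concrete illustration of the measurement model Y_i = X_i + ε_i, the following sketch simulates contaminated observations. The Gaussian AR(1) latent process, the Laplace noise, and the parameter values are illustrative assumptions made only for this sketch; the setting of the paper requires only a stationary associated process and i.i.d. noise with known marginal density.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
rho, lam = 0.5, 2.0  # AR(1) coefficient and noise scale (illustrative choices)

# Latent stationary process (X_i): a Gaussian AR(1); with rho >= 0 its
# covariance function is nonnegative, hence the process is positively associated.
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1.0 - rho**2)  # stationary initial distribution
for i in range(1, n):
    x[i] = rho * x[i - 1] + e[i]

# Additive i.i.d. noise (eps_i), independent of (X_i), with known density h.
eps = rng.laplace(scale=1.0 / lam, size=n)

y = x + eps  # the contaminated observations (Y_i) actually available
```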
In Chacon, Duong and Wand (2011), the authors investigate kernel estimators of multivariate density derivative functions using general bandwidth matrix selectors. Given a random sample X_1, X_2, ..., X_n drawn from a common probability density f, they provide results on mean integrated squared error convergence, both asymptotically and for finite samples. The influence of the bandwidth matrix on convergence is established.
The deconvolution problem for the estimation of f(x) has been investigated by many authors; we cite the work of Fan (1991) and Masry (2001), among others. Most of the papers cited above address how to estimate the unknown density and compute the rate of convergence for a specific error process. Fan (1991) used a kernel density estimator to estimate the unknown density f, as well as its derivatives, in the case of i.i.d. observations with p = 1. Masry (2003) developed an estimate of the multivariate probability density when the underlying process (X_i)_i is associated (p ≥ 1).
We recall the definition of association for collections of random variables.
Definition 1 The sequence (X_n)_{n∈Z} is said to be positively associated if for every finite subcollection (X_{i_1}, ..., X_{i_n}) and every pair of coordinate-wise nondecreasing functions G_1, G_2: R^n → R,

cov(G_1(X_{i_1}, ..., X_{i_n}), G_2(X_{i_1}, ..., X_{i_n})) ≥ 0,

whenever the covariance is defined. This definition was introduced by Esary, Proschan, and Walkup (1967). Furthermore, positive association seems to be a natural assumption for modelling certain clinical trials, such as those described in Ying and Wei (1994). It is also known, see Pitt (1982), that Gaussian processes are positively associated if and only if their covariance function is nonnegative. An important property of associated random variables is that noncorrelation implies independence; the only other classical framework in which this holds is the Gaussian one. One may therefore hope that dependence will appear in this case only through the covariance structure, which also justifies the study of such processes. Indeed, a covariance is much easier to compute than a mixing coefficient; a main inconvenience of mixing is that there are only few mixing models for which the mixing coefficients can be explicitly evaluated. We note that association and mixing define two distinct but not disjoint classes of processes.
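Pitt's characterization can be checked empirically: for a Gaussian pair with positive covariance, any two coordinate-wise nondecreasing functions should have nonnegative covariance. The particular functions G_1, G_2 and the covariance matrix below are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Bivariate Gaussian with positive covariance: by Pitt (1982) the pair is
# positively associated, so Cov(G1(X), G2(X)) >= 0 for all coordinate-wise
# nondecreasing G1, G2.  We check this for one such pair by Monte Carlo.
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

g1 = np.maximum(x[:, 0], 0.0) + x[:, 1]   # nondecreasing in each coordinate
g2 = np.tanh(x[:, 0]) + np.tanh(x[:, 1])  # nondecreasing in each coordinate

c = np.cov(g1, g2)[0, 1]
print(c)  # nonnegative, as association requires
```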
Example 2 (see Louhichi, 2000) Let (ε_i) be a sequence of i.i.d. random variables and (μ_i)_{i∈Z} a sequence of real numbers. Set X_j^n = Σ_{|i|≤n} μ_i ε_{j−i}, and assume that there exists X_j such that lim_{n→∞} X_j^n = X_j a.s., sup_n E|X_j^n| < ∞ and |X_j| < ∞ a.s. The linear process is X_j = Σ_{i∈Z} μ_i ε_{j−i}. If the sequence (μ_i)_{i∈Z} is nonnegative and Σ_j |μ_j| < ∞, then (X_j)_{j∈Z} is associated.
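A minimal numerical sketch of the truncated linear process X_j^n of the example. The nonnegative, summable coefficients μ_i = 0.5^{|i|}, the truncation level, and the Gaussian innovations are assumptions of this sketch only, not of the example itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Truncated linear process X_j^n = sum_{|i|<=n} mu_i eps_{j-i} with
# nonnegative, summable coefficients mu_i = 0.5**|i|; by the example,
# the limiting linear process (X_j) is then associated.
trunc = 20                                  # truncation level n of the example
mu = 0.5 ** np.abs(np.arange(-trunc, trunc + 1))
assert mu.min() >= 0.0                      # nonnegativity of the coefficients

m = 500                                     # number of X_j values to produce
eps = rng.normal(size=m + 2 * trunc)        # i.i.d. innovations (eps_i)
x = np.convolve(eps, mu, mode="valid")      # x[j] = sum_i mu_i eps_{j-i}
```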

Notations
We denote the characteristic functions of f, g, h and h̄ by φ_f, φ_g, φ_h and φ_h̄ respectively. Then φ_g(t) = φ_f(t) φ_h̄(t) and φ_h̄(t) = Π_{j=1}^p φ_h(t_j), where t = (t_1, ..., t_p). Let us consider ĝ_n(x), a kernel-type estimate of g(x), that is

ĝ_n(x) = (1/((n − p + 1) h_n^p)) Σ_{j=0}^{n−p} K̄((x − Y_j)/h_n), Y_j = (Y_{j+1}, ..., Y_{j+p}).

Let K̄(x) = Π_{j=1}^p K(x_j), where K is a real-valued, even, bounded density function on the real line satisfying K(x) = O(|x|^{−1−δ}) for some δ > 0, and denote its Fourier transform by φ_K(t). Assumptions will be made on φ_K(t) and φ_h(t) which will ensure that φ_K(t)/φ_h(t/h_n) ∈ L_1 ∩ L_∞, where L_1 is the space of Lebesgue integrable functions and L_∞ the space of bounded functions.
For every h_n > 0 define the deconvolution kernel

W_{h_n}(x) = (1/(2π)^p) ∫_{R^p} e^{−i t·x} φ_K̄(t)/φ_h̄(t/h_n) dt, where φ_K̄(t) = Π_{j=1}^p φ_K(t_j).

The choice of a product-type kernel is not essential and is made for the sake of simplicity.
The kernel density estimator of the unknown density of X is defined as follows. Let (h_n)_{n≥1} be a sequence of positive numbers such that h_n → 0 as n → ∞; given the observations (Y_i)_{i=1}^n, the estimate of f(x) is defined by

f_n(x) = (1/((n − p + 1) h_n^p)) Σ_{j=0}^{n−p} W_{h_n}((x − Y_j)/h_n),

where Y_j = (Y_{j+1}, ..., Y_{j+p}) and it is assumed that n > p.
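For intuition, here is a minimal univariate (p = 1) sketch of the deconvolution estimator. It assumes a Gaussian kernel K and Laplace(1/λ) noise, a choice made here only because 1/φ_h(t/h_n) = 1 + t²/(h_n λ)² then turns the deconvolution kernel into the closed form W(u) = K(u) − K''(u)/(h_n λ)²; the paper's special case instead treats exponential noise, and the sample size and bandwidth below are arbitrary illustrative choices.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def deconv_kernel(u, h, lam):
    # W(u) = K(u) - K''(u) / (h*lam)**2 for Gaussian K and Laplace(1/lam)
    # noise: 1/phi_h(t/h) = 1 + t**2/(h*lam)**2, and multiplying phi_K(t)
    # by t**2 corresponds to -K'' in x-space.  For Gaussian K,
    # K''(u) = (u**2 - 1) * K(u).
    return gauss(u) * (1.0 - (u**2 - 1.0) / (h * lam) ** 2)

def f_hat(x, y, h, lam):
    # Deconvolution density estimate f_n(x) from the noisy sample y (p = 1).
    u = (x[:, None] - y[None, :]) / h
    return deconv_kernel(u, h, lam).mean(axis=1) / h

# Illustration: recover a standard normal f from Laplace-contaminated data.
rng = np.random.default_rng(3)
n, lam, h = 5000, 2.0, 0.35            # illustrative sample size and bandwidth
y = rng.normal(size=n) + rng.laplace(scale=1.0 / lam, size=n)
grid = np.linspace(-3.0, 3.0, 61)
est = f_hat(grid, y, h, lam)
```

Near x = 0 the estimate is close to the standard normal density despite the contamination; an ordinary kernel estimate applied directly to y would instead target the oversmoothed density g.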
Another expression of f_n(x) is

f_n(x) = (1/(2π)^p) ∫_{R^p} e^{−i t·x} φ_K̄(h_n t) φ_n(t)/φ_h̄(t) dt,

where φ_n(t) is the standard estimate of the characteristic function φ_g(t):

φ_n(t) = (1/(n − p + 1)) Σ_{j=0}^{n−p} e^{i t·Y_j}.

The difficulty of deconvolution depends on the smoothness of the distribution of the error variable ε. By smoothness of the error distribution we mean the order of decay of the characteristic function φ_h(t) of (ε_i) as |t| → ∞. We say that the distribution of ε is algebraically decreasing, or ordinary smooth of order β, if φ_h(t) satisfies

a_0 |t|^{−β} ≤ |φ_h(t)| ≤ a_1 |t|^{−β} for |t| large,

where a_0, a_1 and β are positive real numbers.
(A2) The 2p-dimensional joint density g(u, v) of the vectors Y_0 and Y_l exists for l ≥ p.
(A3) The process is associated and its covariance function c_j = cov(X_{j+1}, X_1) satisfies Σ_{j=1}^∞ j^δ c_j < ∞ for some δ > 1 + 2/p.

We need some lemmas for the proofs of the main results.

Some Auxiliary Lemmas
The result from real analysis that is needed here is the following (see for instance Wheeden & Zygmund, 1977, p. 189):

Lemma 4 Assume that φ_h(t) and φ_K(t) satisfy

Lemma 5 Assume that φ_h(t) and φ_K(t) are twice continuously differentiable with bounded derivatives such that φ_h(t/h_n) ; then F_{h_n}(t) ∈ L_1 and is twice continuously differentiable with bounded derivatives. Since F_{h_n}(t) ∈ L_1, integration by parts gives and, by the Riemann-Lebesgue lemma, Under such smoothness conditions on φ_h and φ_K we will show that, in fact, from which we obtain a bound for its L_1-norm. Based on (2), we get and thus By assumption i), we have Thus, using iii), we have Proceeding in the same way as in Lemma 7, we show that h_n^β ||W_{h_n}(x)||_1 ≤ const. We conclude, in view of Lemmas 7 and 8, that h_n^{pβ} ||g_{n,k}(x)||_1 ≤ const.

Main Results
Let us now come back to the main purpose of this paper, which is the study of the estimation of the gradient of the probability density under association.

Proposition 7 Assume that ∇f(x) ∈ L_1(R^p) and that f is continuous. Then, for all x ∈ R^p, we have E∇f_n(x) → ∇f(x) as n → ∞.

Theorem 8 Let n h_n^{(2β+1)p+1} → ∞ as n → ∞. Under assumptions A1-A3 and the conditions on the kernel function and on the distribution of the errors in Lemmas 4, 5 and 6, we have:

where 1_p is the matrix all of whose elements are 1, and where B_1 is defined in Lemma 4.
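A sketch of the object these results concern: differentiating the univariate (p = 1) deconvolution estimate gives ∇f_n(x) = (1/(n h_n²)) Σ_j W'((x − Y_j)/h_n). As before, the Gaussian kernel, Laplace noise, and parameter values are illustrative assumptions made so that W' has a closed form; they are not the paper's special case.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def deconv_kernel_deriv(u, h, lam):
    # Closed-form W'(u) for Gaussian K and Laplace(1/lam) noise:
    # W(u) = gauss(u) * (1 - c*(u**2 - 1)) with c = 1/(h*lam)**2, hence
    # W'(u) = -u * gauss(u) * (1 + c*(3 - u**2)).
    c = 1.0 / (h * lam) ** 2
    return -u * gauss(u) * (1.0 + c * (3.0 - u**2))

def grad_f_hat(x, y, h, lam):
    # Estimate of the gradient f'(x) from the contaminated sample y (p = 1).
    u = (x[:, None] - y[None, :]) / h
    return deconv_kernel_deriv(u, h, lam).mean(axis=1) / h**2

rng = np.random.default_rng(4)
n, lam, h = 20000, 2.0, 0.4            # illustrative choices
y = rng.normal(size=n) + rng.laplace(scale=1.0 / lam, size=n)
grid = np.array([-1.0, 0.0, 1.0])
g = grad_f_hat(grid, y, h, lam)
# For a standard normal f, f'(x) = -x f(x): positive at -1, near 0 at 0,
# negative at 1; the estimate reproduces these signs.
```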

Special Case of the Main Result
We choose K Gaussian. The distribution of ε is the exponential distribution h(x) = λ e^{−λx}, x ≥ 0; its characteristic function is then

φ_h(t) = λ/(λ − it), so that |φ_h(t)| = λ (λ² + t²)^{−1/2},

and the noise is therefore ordinary smooth of order β = 1.
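The order of decay can be checked numerically: |t|^β |φ_h(t)| with β = 1 stabilizes at λ as |t| grows, confirming that exponential noise is ordinary smooth of order 1. The value λ = 2 is an arbitrary choice.

```python
import numpy as np

lam = 2.0
t = np.array([1e2, 1e4, 1e6])

# Characteristic function of the exponential(lambda) noise:
# phi_h(t) = lambda / (lambda - i t), |phi_h(t)| = lambda / sqrt(lambda**2 + t**2).
phi = lam / (lam - 1j * t)

# |t|**beta * |phi_h(t)| with beta = 1 stabilises at lambda: ordinary smooth
# of order beta = 1, matching the algebraic-decay definition above.
scaled = np.abs(t) * np.abs(phi)
print(scaled)  # approaches lam = 2.0
```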

Proof of Proposition 3
We have from (3): We use, where * denotes the convolution operator. Then, applying Lemma 3, we get We conclude that E∇f_n(x) → ∇f(x).

Proof of Theorem
First part: we conclude using Lemma 4. ii) We can write: By the Cauchy-Schwarz inequality, we get We deduce and and then i) and ii) give lim

Contribution of S_1.
Assume that the vectors Y_0 = (Y_1, ..., Y_p) and Y_j = (Y_{j+1}, ..., Y_{j+p}) have a joint probability density g(u, v) of order 2p for j ≥ p, and let g(u) be the probability density of Y_0. Define the dependence index of the process (Y_i) by for some M_2 < ∞.
Let g(u′, u″, u‴) be the joint probability density of (Y_1, ..., Y_{l+p}), with u′, u″, u‴ having dimensions l, p − l and l respectively. Let We deduce then It follows by Lemmas 4, 5 and 6 that

Next we consider S_2. Let where we used Lemmas 5 and 6 and inequality (6). Hence we get

We bound S_3 for the associated process, and we use the following lemma of Birkel.

Lemma 9 (Birkel, 1988) Let (V_i; i ∈ I) be a finite collection of associated (PA) random variables. Let I_1 and I_2 be subsets of I and let H_j be functions on R^{|I_j|}, j = 1, 2, with bounded first-order partial derivatives. Then

|cov(H_1(V_i; i ∈ I_1), H_2(V_j; j ∈ I_2))| ≤ Σ_{i∈I_1} Σ_{j∈I_2} ||∂H_1/∂v_i||_∞ ||∂H_2/∂v_j||_∞ cov(V_i, V_j),

where ||·||_∞ stands for the sup norm.
In our case the functions H_j are built from g_n. The derivative can be written: Hence, applying Lemmas 5 and 6, we get For the associated process (Y_j), we apply Lemma 9 above.
where we have used the fact that cov(Y_{j+l}, Y_i) = cov(X_{j+l}, X_i), due to the independence of (X_j) and (ε_j) and the i.i.d. assumption on the (ε_j). Thus where c_l = cov(X_{l+1}, X_1) is the covariance function of (X_i)_i.
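The covariance identity used above is easy to confirm by simulation: adding i.i.d. noise that is independent of (X_j) leaves the lag covariances unchanged. The AR(1) latent process, the Laplace noise, and the parameter values below are illustrative assumptions of this sketch only.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
rho = 0.6

# Latent Gaussian AR(1) X with lag-1 covariance rho / (1 - rho**2) = 0.9375.
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1.0 - rho**2)
for i in range(1, n):
    x[i] = rho * x[i - 1] + e[i]

# Contaminated observations: independent i.i.d. noise added to X.
y = x + rng.laplace(scale=0.5, size=n)

# Lag-1 sample covariances: since the noise is independent of (X_j) and
# i.i.d., cov(Y_{j+1}, Y_j) = cov(X_{j+1}, X_j).
cx = np.cov(x[1:], x[:-1])[0, 1]
cy = np.cov(y[1:], y[:-1])[0, 1]
print(cx, cy)  # close to each other (and to 0.9375)
```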
We now select π n = h Theorem 4 and Proposition 3 yield the quadratic mean convergence of the estimate ∇ f n (x).

Conclusion
We have studied in this paper the problem of estimating the gradient of the multivariate probability density of a stationary process using observations that are corrupted by additive noise. An important problem in nonparametric estimation is the estimation of the mode, i.e., the location of an isolated maximum of the unknown density. Nonparametric estimation of the mode of a density function via kernel methods may be considered when the data are contaminated associated observations. One can then study the asymptotic properties of such mode estimates.