Lp-Adaptive Estimation Under Partially Linear Constraint in Regression Model

We study the problem of multivariate estimation in the nonparametric regression model with random design. We assume that the regression function to be estimated possesses a partially linear structure, where the parametric and nonparametric components are both unknown. Based on the Goldenshluger-Lepski methodology, we propose an estimation procedure that adapts to the smoothness of the nonparametric component by selecting from a family of specific kernel estimators. We establish a global oracle inequality (under the Lp-norm, 1 ≤ p < ∞) and examine the performance of the estimator over anisotropic Hölder spaces.


Introduction
We observe (X_1, Y_1), . . . , (X_n, Y_n) ∈ R^d × R satisfying

Y_i = g(X_i) + ζ_i,  i = 1, . . . , n,  (1)

where d ≥ 2, g is an unknown function from [0, 1]^d to R, the design points {X_i}_{i=1}^n are i.i.d. random variables uniformly distributed on [0, 1]^d, and the noise variables {ζ_i}_{i=1}^n are i.i.d. centered real random variables having a symmetric distribution. The sequences {ζ_i}_{i=1}^n and {X_i}_{i=1}^n are assumed to be independent. In addition, we assume that g has a partially linear structure, that is, there exist an unknown parameter β ∈ R^{d_1} and an unknown function f defined on [0, 1]^{d_2} with values in R such that

g(X_i) = U_i' β + f(T_i),  (2)

where X_i = (U_i, T_i) ∈ R^{d_1} × R^{d_2} with d_1 + d_2 = d, and the prime denotes transposition. Moreover, we assume that d_1 and d_2 are known and that the conditional covariance matrix E[(U_1 − E(U_1 | T_1))(U_1 − E(U_1 | T_1))'] is non-singular.
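The model above can be simulated directly. The sketch below fixes illustrative values for n, d_1, d_2, β and the function f; all of these are unknown quantities in the paper and are chosen here purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (n, d1, d2, beta and f are unknown in the paper;
# they are fixed here only to simulate from the model).
n, d1, d2 = 1000, 2, 2
beta = np.array([1.5, -0.7])

def f(t):
    # a smooth nonparametric component on [0, 1]^2 (hypothetical choice)
    return np.sin(2 * np.pi * t[:, 0]) + t[:, 1] ** 2

# Uniform design on [0, 1]^d with d = d1 + d2, split as X_i = (U_i, T_i).
X = rng.uniform(size=(n, d1 + d2))
U, T = X[:, :d1], X[:, d1:]

# i.i.d. centered noise with a symmetric distribution, independent of the design.
zeta = rng.normal(scale=0.5, size=n)

# Model (1)-(2): Y_i = g(X_i) + zeta_i with g(X_i) = U_i' beta + f(T_i).
Y = U @ beta + f(T) + zeta
```

Because the coordinates of a uniform vector on [0, 1]^d are independent, U and T are independent here, which is what later allows the influence of β to be removed by an appropriate kernel weighting.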
Partially linear models are semiparametric models since they contain both parametric and nonparametric components. They are more flexible than standard linear models, and they may be preferred to a completely nonparametric regression because of the well-known "curse of dimensionality". Partially linear models have many applications. (Engle, Granger, Rice & Weiss, 1986) were among the first to consider models of this kind; they analyzed the relationship between temperature and electricity usage. The bibliography concerning these models is very extensive and we refer readers to (Härdle, Liang & Gao, 2000) and the references therein.
In this paper, using the ideas of (Goldenshluger & Lepski, 2012), we propose an estimation procedure that adapts to the smoothness of the nonparametric component f appearing in (2) by selecting from a family of specific kernel estimators, based on the observations D_n = {(X_1, Y_1), . . . , (X_n, Y_n)}. For the proposed estimator, we establish a global oracle inequality and show how to use it to derive minimax adaptive results when f belongs to an anisotropic Hölder space. To the best of our knowledge, the Goldenshluger and Lepski method has been used for adaptive estimation in the framework of multivariate regression with random design in (Comte & Lacour, 2013), (Lepski & Serdyukova, 2014) and (Nguyen, 2014). Our work is close to the last two. However, there are major differences: (Nguyen, 2014) considered a general regression model, while we make a structural assumption about the regression function, as in (Lepski & Serdyukova, 2014), who considered the case where the regression function has a single-index structure. The problem of adaptive estimation of a multivariate function was also studied by (Goldenshluger & Lepski, 2008), (Goldenshluger & Lepski, 2009), (Goldenshluger & Lepski, 2011a), (Goldenshluger & Lepski, 2014), (Lepski, 2013a) and (Rebelles, 2015) in the white Gaussian noise and density models.
To measure the performance of estimators of the nonparametric component f, we use the risk function determined by the L_p-norm ‖·‖_p, 1 ≤ p < ∞: for the function f in (2) and an arbitrary estimator f̃_n based on the observations D_n, we consider the risk

R_p^{(q)}[f̃_n, f] = ( E_{β,f}^{(n)} ‖f̃_n − f‖_p^q )^{1/q},

where E_{β,f}^{(n)} denotes the expectation with respect to the probability measure P_{β,f}^{(n)} of the observations D_n. Here ‖g‖_p = ( ∫_{[δ,1−δ]^{d_2}} |g(x)|^p dx )^{1/p} and 0 < δ < 1/2 is a given number. We restrict the L_p-risk from the cube [0, 1]^{d_2} to [δ, 1 − δ]^{d_2} in order to avoid boundary effects.
Throughout this paper, p, q and δ are supposed to be fixed, ‖·‖ is the Euclidean norm on R^{d_1}, and the distribution of the noise satisfies the following condition.
Assumption 1 There exist P > 0 and s > max{2q, 2p} such that:
This assumption represents the link between the noise and the loss function. It can be found in (Baraud, 2002) for the unidimensional case and in (Nguyen, 2014) for the multidimensional case.
The rest of this paper is organized as follows. In Section 2, we present the estimation procedure and preliminary results. Section 3 is devoted to the main results: we establish the oracle inequality and derive the minimax adaptive properties of our estimator.

Selection Rule
In this section we motivate and explain our procedure. Let L and K be two kernel functions, defined on R^{d_1} and R^{d_2} respectively, with values in R, satisfying the following assumptions.
Assumption 2
Assumption 3
Let H_n be a given subset of (0, 1]^{d_2} defined by:
where ⌊log n⌋ denotes the integer part of log n.
Vol. 12, No. 6; 2020
Consider the family of kernel estimators:
where, for any h = (h_1, . . . , h_{d_2}),
Here and later, for any u, v ∈ R^{d_2}, u/v stands for the coordinate-wise division.
Remark 2.1 The family of estimators F(H_n) does not depend on β. Indeed, we have, for all β ∈ R^{d_1},
because (U_i, T_i) is uniformly distributed on [0, 1]^d and according to Assumption 2 (iii). Thus, the use of the second kernel L makes it possible to get rid of the influence of the parameter β.
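A plausible form of the kernel estimator f̂_h, consistent with Remark 2.1, weights each observation by L(U_i) so that the linear part β'U_i vanishes in expectation. The sketch below is an illustration under assumed kernels: w is a linear weight on [0, 1] with ∫ w = 1 and ∫ v w(v) dv = 0 (a stand-in for the moment condition of Assumption 2 (iii)), and K1 is a boxcar kernel; neither is claimed to be the paper's choice.

```python
import numpy as np

def w(v):
    # weight on [0, 1] with integral 1 and first moment 0:
    # int_0^1 (4 - 6v) dv = 1 and int_0^1 v (4 - 6v) dv = 0,
    # so the contribution of beta' U_i cancels in expectation.
    return np.where((v >= 0) & (v <= 1), 4.0 - 6.0 * v, 0.0)

def L_kernel(U):
    # product kernel on [0, 1]^{d1}
    return np.prod(w(U), axis=1)

def K1(t):
    # boxcar kernel on R with integral 1 (illustrative choice)
    return np.where(np.abs(t) <= 0.5, 1.0, 0.0)

def f_hat(x, U, T, Y, h):
    """Kernel estimate of f at a point x in [0,1]^{d2}, bandwidth vector h."""
    # coordinate-wise division (x - T_i) / h, as in the text's notation u/v
    Kh = np.prod(K1((x - T) / h) / h, axis=1)
    return np.mean(Y * L_kernel(U) * Kh)
```

Under the uniform design, E[Y_i L(U_i) K_h(x − T_i)] = β' (∫ u L(u) du) + ∫ f(t) K_h(x − t) dt, and the first term is zero by construction of w, which is the mechanism described in Remark 2.1.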
We propose a data-driven selection from the family F(H_n) which leads to the estimator f̂_n. Next, for the underlying function f, we find an explicit upper bound for the risk R_p^{(q)}[f̂_n, f]. More precisely, we prove that

R_p^{(q)}[f̂_n, f] ≤ C inf_{h ∈ H_n} R_p^{(q)}[f̂_h, f] + γ_n.  (3)

Here C ≥ 1 is a universal constant, the remainder term γ_n is independent of f, and f belongs to the class
where f_∞ > 0 is a given number. Let h* ∈ arg min_{h ∈ H_n} R_p^{(q)}[f̂_h, f]. Then f̂_{h*} is called the oracle and inequality (3) is called an oracle inequality. It guarantees that when γ_n is negligible with respect to inf_{h ∈ H_n} R_p^{(q)}[f̂_h, f], the risk of f̂_n is of the same order as that of the oracle f̂_{h*}.
Our selection rule uses auxiliary estimators that are constructed as follows: for h, η ∈ H_n, define the kernel K_h ⋆ K_η by
and let f̂_{h,η} denote the estimator associated with this kernel:
Let τ = (τ_1, . . . , τ_{d_2}), where τ_i = log^{−1/d_2}(n), and consider the following notation:
where C(p) = a(p) ∨ B_p, with a(p) = 15p/log(p) and B_p the known constant given in Theorem 6.36 of (Folland, 1999). For every h ∈ H_n, let
where β̂ is the least squares estimator of β proposed by (Robinson, 1988).
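Robinson's double-residual least squares estimator, referenced above, regresses the residual Y − E(Y | T) on the residual U − E(U | T). A minimal sketch, with the conditional expectations replaced by Nadaraya-Watson smoothers (the Gaussian kernel and the bandwidth h are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def nw_smooth(T, V, h):
    """Nadaraya-Watson estimate of E(V | T) at each sample point (Gaussian kernel)."""
    # pairwise squared distances between design points T_i, T_j
    D2 = ((T[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-D2 / (2.0 * h ** 2))
    W /= W.sum(axis=1, keepdims=True)     # rows sum to one
    return W @ V

def robinson_beta(U, T, Y, h=0.1):
    """Least squares of the residual Y - E(Y|T) on the residual U - E(U|T)."""
    U_res = U - nw_smooth(T, U, h)
    Y_res = Y - nw_smooth(T, Y, h)
    coef, *_ = np.linalg.lstsq(U_res, Y_res, rcond=None)
    return coef
```

Because U is independent of T under the uniform design, the smoothing bias of the nonparametric part lands in the Y-residual and is asymptotically orthogonal to the U-residual, which is why β̂ attains the parametric n^{−1/2} rate used in Proposition 2.12.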
Here, x_+ = max(0, x). The selected bandwidth ĥ is defined by the rule (6)-(8), and our final estimator is f̂_n := f̂_ĥ.
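The selection rule follows the generic Goldenshluger-Lepski scheme: compare each auxiliary estimator f̂_{h,η} with f̂_η, penalize by a majorant of the stochastic error, and pick the bandwidth minimizing excess plus penalty. A schematic sketch on a finite grid (the majorant M(h) and the discrete L_p-norm are placeholders for the paper's exact quantities):

```python
import numpy as np

def gl_select(estimates, pair_estimates, majorant, p=2):
    """
    Generic Goldenshluger-Lepski bandwidth selection on a finite grid H.

    estimates:      dict h -> f_hat_h, values on a common evaluation grid
    pair_estimates: dict (h, eta) -> f_hat_{h,eta} on the same grid
    majorant:       dict h -> M(h), majorant of the stochastic error of f_hat_h
    """
    H = list(estimates)

    def lp(v):
        # discrete L_p-norm on the evaluation grid
        return float(np.mean(np.abs(v) ** p)) ** (1.0 / p)

    crit = {}
    for h in H:
        # excess term: sup_eta ( ||f_hat_{h,eta} - f_hat_eta||_p - M(eta) )_+
        excess = max(
            max(lp(pair_estimates[(h, eta)] - estimates[eta]) - majorant[eta], 0.0)
            for eta in H
        )
        crit[h] = excess + majorant[h]
    return min(crit, key=crit.get)        # selected bandwidth h_hat
```

The comparison f̂_{h,η} versus f̂_η mimics a bias proxy, while M(·) dominates the stochastic term, so the criterion imitates the unobservable bias-variance trade-off.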

Auxiliary Lemmas
According to Assumption 2, the stochastic term of the estimator f̂_h can be decomposed as follows
Similarly, for the auxiliary estimator we have
The biases of the estimators f̂_h(x) and f̂_{h,η}(x) are denoted by B_h(x) and B_{h,η}(x), respectively.
Proposition 2.2 Suppose that Assumption 3 is fulfilled. For any η ∈ H_n, we have
Proof. Using Young's inequality and Assumption 3, we obtain:
Thus, we have
and we deduce the result.
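The Young's inequality step invoked here is presumably the standard convolution bound; in its textbook form (a reconstruction of the step, with g a generic function in L_p and K_h the rescaled kernel, not the paper's exact display):

```latex
\| K_h \star g \|_p \;\le\; \| K_h \|_1 \, \| g \|_p
\;=\; \| K \|_1 \, \| g \|_p ,
\qquad 1 \le p \le \infty ,
```

where the equality holds because the L_1-norm is invariant under the bandwidth rescaling K_h(t) = (∏_j h_j)^{-1} K(t/h); Assumption 3 is then presumably what guarantees that ‖K‖_1 is finite.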
Proposition 2.3 Suppose that f ∈ F and Assumptions 1, 2 and 3 are fulfilled. Then for all n ≥ N = max{ñ, n̄} we have
To prove this proposition we need the following lemmas. The first is due to (Goldenshluger & Lepski, 2011b); it is an immediate consequence of the Bennett inequality for empirical processes (see Bousquet, 2002) and of the standard arguments allowing one to derive the Bernstein inequality from the Bennett inequality.
Lemma 2.5 Suppose that f ∈ F and Assumptions 2 and 3 are fulfilled. Then for any n ≥ 2 we have:
Proof. We can write
For p ≥ 1, there exists a countable subset Υ of the unit ball of L_q(R^{d_2}), with 1/q = 1 − 1/p, such that
Set
Note that for all λ ∈ Λ, we have E(λ(Z_1)) = 0. We need to find upper bounds for E‖ξ_h^{(1)}‖_p, sup_{λ∈Λ} E(λ^2(Z_1)) and sup_{λ∈Λ} ‖λ‖_∞.
Therefore, we deduce that
If p > 2, using Rosenthal's inequality, we have
Using Hölder's inequality, we can write
Applying Theorem 6.18 and Theorem 6.36 in (Folland, 1999), we obtain
Applying Lemma 2.4 we have:
Thus, we obtain
Lemma 2.6 Suppose that f ∈ F and Assumptions 2 and 3 are fulfilled. Then for any n ≥ 2 we have
Proof. Using Fubini's theorem and Rosenthal's inequality, we have
Lemma 2.7 Suppose that f ∈ F and Assumptions 1, 2 and 3 are fulfilled. Then there exists an integer N such that for any n ≥ N, we have
Proof. Recall that
We can write
Using Proposition 1 in (Lepski, 2013b), with ψ(·) = |·|, = √2 − 1, y = 2n^{1/4} and c = 1, there exists an integer ñ such that for all n > ñ
Now, we consider P(‖ξ_τ^{(2)}‖_∞ ≥ 1). First, we remark that
If s > 2q, then for all n > n̄ we have P(‖ξ_τ^{(2)}‖_∞ ≥ 1) ≤ 4κ^2 log^2(n) L̄^2 P n^{−q}.
Thus, we obtain for all n > N = max{ñ, n̄}
Here, we have used the inequality exp{−x} ≤ exp{m log(m)} x^{−m} for all x > 0 and m > 0.
Proof of Proposition 2.3. Let us define A = {‖f̂‖_∞ ≤ f_∞}. Using Lemma 2.5 we obtain
According to Lemmas 2.6 and 2.7, we deduce that there exists an integer N such that for all n > N
Proposition 2.8 Suppose that Assumptions 1, 2 and 3 are fulfilled. Then, there exists n_1 ≥ 2 such that for any n ≥ n_1, we have
Proof. Using Proposition 2.2, we have
The condition s > max{2p, 2q} given in Assumption 1 ensures that the exponent lies in (0, 1/2). We have the following decomposition
Since ζ_1 is centered and symmetric, we have
We need to find upper bounds for E‖ξ_h^{(2)}‖_p, sup_{λ∈Λ} E(λ^2(Z_1)) and sup_{λ∈Λ} ‖λ‖_∞. We can write |λ(z)| ≤ ‖W(·, z)‖_p and
Applying Theorem 6.18 and Theorem 6.36 in (Folland, 1999), we obtain
Applying Lemma 2.4 with U_2(h, p), A_2^2(h, p) and B_2(h, p), we have
There exists n_1 = arg sup_{n∈N} {n^{−q} exp(−n/7) − n^{−q/2} ≥ 0} such that for any n > n_1, we have
There exists n_1 = arg sup_{n∈N} {C_2(p) exp(−n^q C_2(p)) − n^{−1/2} ≥ 0} such that for all n > n_1
and for all n > n_1,
κ L̄ (2P)^{1/l} for l ∈ [1, 2], and a(l) κ L̄ (P + P^{l/2})^{1/l} for l > 2.
Proposition 2.9 Suppose that Assumptions 2 and 3 are fulfilled. Then there exists an integer N̄ such that for all n > N̄ we have
To prove Proposition 2.9, we need the following lemma.
Lemma 2.10 Suppose that Assumptions 2 and 3 are fulfilled. Then for all n ≥ 2:
Proof. For p ∈ [1, ∞), using Fubini's theorem and Rosenthal's inequality, we have:
Proof of Proposition 2.9. We can write
For p ≥ 1, we have
We need to find upper bounds for E‖ξ_h^{(3)}‖_p, sup_{λ∈Λ} E(λ^2(Z_1)) and sup_{λ∈Λ} ‖λ‖_∞.
Here, we have used the inequality exp(−x) ≤ exp(m log(m)) x^{−m} for all x > 0 and m > 0.
Thus, we obtain that for all n ≥ 2
and finally, we deduce the result.
Proposition 2.11 Suppose that Assumptions 1, 2 and 3 are fulfilled. Then there exists n_3 ≥ 2 such that for any n ≥ n_3,
Proof. Recall that
Taking into account that
we obtain
and
In view of (9), (10) and (11) we have, respectively:
Using Markov's inequality, we get
There exists an integer n_3 such that for all n ≥ n_3 all the right-hand sides of (12), (13), (14) and (15) are smaller than 1. From all the above, it can be deduced that
Proposition 2.12 There exists n_5 ∈ N such that for all n ≥ n_5 we have, for any 1 ≤ q ≤ 2,
Proof. Using the fact that β̂_j − β_j = O_P(n^{−1/2}) for j = 1, . . . , d_1, we can write: for all ω > 0, there exists γ > 0 such that P(|β̂_j − β_j| ≥ γ n^{−1/2}) < ω.
Using Hölder's inequality, we have
Using the fact that E(β̂_j − β_j)^2 = O(n^{−1}) for j = 1, . . . , d_1, that is, that there exist C > 0 and N ∈ N such that for any n
there exists n_4 such that for all n ≥ n_4 the right-hand side of (16) is smaller than 1. It can be deduced that

Oracle Inequality
Note that B_τ is the bias of the estimator f̂_τ. Let f̂_n be the estimator obtained from the selection rule (6), (7), (8).
Theorem 3.1 Suppose that Assumptions 1, 2 and 3 are fulfilled. Then there exists an integer N_1 such that for any n ≥ N_1 and any f belonging to the class F defined by (4), we have, for any 1 ≤ q ≤ 2,
where Q, d, d_1 and T(2q) are constants depending on p, q, f_∞, κ and L̄.

Remark 3.2
(i) The oracle inequality of Theorem 3.1 is the key technical tool for bounding the L_p-risk of this estimator on the anisotropic Hölder classes H_{d_2}(α, L).
(ii) The main difficulty in extending it to q > 2 is the use of the squared risk of β̂_j − β_j. Indeed, analyzing the proof of Proposition 2.12, we remark that the following relations should be fulfilled
If 1 ≤ q ≤ 2, these relations hold. However, we were not able to obtain them in the case q > 2. Note nevertheless that if such relations could be established, our results would extend to q > 2.
Proof. Thanks to the triangle inequality, formula (7) and the definition of ĥ, we have for any h ∈ H_n
Using Fubini's theorem and Young's inequality, for any h, η ∈ H_n, we have
According to (Goldenshluger & Lepski, 2012), Proposition 2.2 and formula (6), we obtain
Thus, using Propositions 2.8, 2.9, 2.11 and 2.12, we deduce that there exists an integer N_1 such that for all n ≥ N_1 we get
Finally, we obtain

Adaptive Estimation
In this section, we use the previously established oracle inequality to study the adaptive properties of the estimator f̂_n over the scale of anisotropic Hölder classes H_{d_2}(α, L), α ∈ (0, l]^{d_2}, L > 0, where l > 0 is fixed.
Here D_i^{(m)} f denotes the m-th order partial derivative of f with respect to the variable t_i, and ⌊u⌋ is the largest integer strictly less than u. In this section, we assume that the kernel function K satisfies Assumption 3 as well as the following assumption.
Assumption 4
(a) K(u) = K(u_1, . . . , u_{d_2}) = ∏_{j=1}^{d_2} K(u_j), where u ∈ R^{d_2} and K is a unidimensional kernel function.
(d) ‖K‖_∞ < +∞.
(e) ∫ K(t) t^j dt = 0, for all 0 < j < l.
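The display defining the anisotropic Hölder class H_{d_2}(α, L) referenced above did not survive extraction; a standard definition consistent with the surrounding description (bounded partial derivatives and directional Hölder continuity of the highest derivative; this is a reconstruction, not necessarily the paper's exact normalization) reads:

```latex
\mathbb{H}_{d_2}(\alpha, L)
= \Big\{\, f : [0,1]^{d_2} \to \mathbb{R} \;:\;
  \big\| D_i^{(m)} f \big\|_\infty \le L, \quad
  m = 0, \dots, \lfloor \alpha_i \rfloor, \\
  \big| D_i^{(\lfloor \alpha_i \rfloor)} f(t + u e_i)
      - D_i^{(\lfloor \alpha_i \rfloor)} f(t) \big|
  \le L \, |u|^{\alpha_i - \lfloor \alpha_i \rfloor},
  \quad \forall\, i = 1, \dots, d_2 \,\Big\},
```

where e_i is the i-th canonical basis vector and, as in the text, ⌊α_i⌋ is the largest integer strictly less than α_i.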