The Hannan-Quinn Proposition for Linear Regression


Joe Suzuki

Abstract

We consider the variable selection problem in linear regression. Suppose that we have random variables $X_1,\cdots,X_m,Y,\epsilon$ ($m\geq 1$) such that $Y=\sum_{k\in \pi}\alpha_kX_k+\epsilon$ with $\pi\subseteq \{1,\cdots,m\}$ and reals $\{\alpha_k\}_{k=1}^m$, where $\epsilon$ is assumed to be independent of any linear combination of $X_1,\cdots,X_m$. Given $n$ examples $\{(x_{i,1},\cdots,x_{i,m},y_i)\}_{i=1}^n$ independently emitted from $(X_1,\cdots,X_m, Y)$, we wish to estimate the true $\pi$ based on information criteria of the form $H+(k/2)d_n$, where $H$ is the likelihood with respect to $\pi$ multiplied by $-1$, and $\{d_n\}$ is a positive real sequence. If $d_n$ is too small, consistency fails because of overestimation. For autoregression, Hannan and Quinn proved that $d_n=2\log\log n$ is the minimum rate satisfying strong consistency. This paper settles the statement in the affirmative for linear regression. Thus far, no proof of the proposition was available, although $d_n=c\log\log n$ for some $c>0$ was known to be sufficient.
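The criterion described above can be illustrated with a minimal sketch: for each candidate subset $\pi$, fit ordinary least squares, evaluate the Gaussian negative log-likelihood $H$, and add the penalty $(k/2)d_n$ with $k=|\pi|$ and the Hannan-Quinn rate $d_n=2\log\log n$. The function names below (`neg_log_likelihood`, `select_subset`) are illustrative, not from the paper, and exhaustive enumeration of subsets is assumed to be feasible (small $m$).

```python
import itertools
import math

import numpy as np


def neg_log_likelihood(y, X_sub):
    """Gaussian negative log-likelihood of the OLS fit, with the
    variance estimated by maximum likelihood (illustrative helper)."""
    n = len(y)
    if X_sub.shape[1] == 0:
        resid = y  # empty subset: no regressors
    else:
        beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
        resid = y - X_sub @ beta
    sigma2 = max(float(resid @ resid) / n, 1e-12)  # guard against zero
    return 0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def select_subset(X, y, d_n=None):
    """Return the subset pi minimizing H + (k/2) * d_n over all
    subsets of columns of X; d_n defaults to the rate 2 log log n."""
    n, m = X.shape
    if d_n is None:
        d_n = 2 * math.log(math.log(n))
    best, best_score = set(), math.inf
    for r in range(m + 1):
        for pi in itertools.combinations(range(m), r):
            k = len(pi)
            score = neg_log_likelihood(y, X[:, list(pi)]) + (k / 2) * d_n
            if score < best_score:
                best, best_score = set(pi), score
    return best
```

With data generated as $Y = 2X_1 + \epsilon$ and $m = 3$, the procedure should include the true variable in its selection; larger choices of $d_n$ penalize overestimation more heavily, smaller ones risk selecting spurious variables.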


This work is licensed under a Creative Commons Attribution 4.0 License.