The Aitchison and Aitken Kernel Function Revisited

Over three decades ago Aitchison and Aitken proposed a novel kernel function for estimating the density functions of underlying distributions in discrete input spaces. To the best of our knowledge, it has not been shown whether this kernel function is positive definite (i.e., a reproducing kernel function) on these spaces. Its positive definiteness would enrich and enlarge its applicability domain: a positive definite kernel function has an associated Reproducing Kernel Hilbert Space, a framework on which a variety of powerful statistical and machine learning schemes can be developed. This paper aims to demonstrate that Aitchison and Aitken’s kernel function is indeed positive definite on discrete metric spaces. We also touch on possible applications of the proposed theorem.


Introduction
In a seminal paper Aitchison and Aitken (1976) proposed a kernel function defined on discrete descriptor/input spaces. These authors introduced this kernel function, which is henceforth referred to as the AA-kernel, for estimating density functions in binary input spaces (Aitchison & Aitken, 1976). Its simple non-parametric nature together with its consistency properties have made the AA-kernel a useful tool for generating discriminant functions. For example, classifiers based on the AA-kernel have recently been widely employed in cheminformatics classification problems where the molecules are represented by zero-one (i.e., binary) variables (Harper et al., 2001; Hert et al., 2004; Wilton et al., 2006; Lowe et al., 2011). Furthermore, in the past few years R-packages featuring the AA-kernel have started to appear in the literature.
According to Aronszajn (1950), to every positive definite kernel function (PDKF), in the sense defined below, on X × X there corresponds a unique Reproducing Kernel Hilbert Space (RKHS) on X, where X can be any non-empty set (Wahba, 1998). RKHS provides a general framework on which a diverse set of powerful data analysis tools (the so-called Reproducing Kernel Hilbert Space Methods) can be developed, whereby density function estimation, the widely popular Support Vector Machines (Vapnik, 1995) and function approximation from finite data, to name but a few, can be viewed as special cases (Poggio & Girosi, 1997; Wahba, 1990; Hofmann et al., 2008).
The AA-kernel, which is defined on a discrete metric space, is not a Gaussian function in this space, but it can be viewed as the counterpart of a Gaussian kernel function defined on a Euclidean ("standard") metric space (Aitchison & Aitken, 1976). It is well documented that a Gaussian kernel function is a PDKF on standard metric spaces (Berg, 1998), but the same cannot be said for the AA-kernel on its discrete metric space. To the best of the author's knowledge, it has not been demonstrated whether (or not) the AA-kernel is positive definite; a "positive result" would significantly enlarge the applicability domain of the AA-kernel: as stated in the preceding paragraph, for a positive definite kernel function there is a corresponding unique Reproducing Kernel Hilbert Space. This means that if one proves the AA-kernel to be positive definite, then general probabilistic or deterministic data analysis models based on Reproducing Kernel Hilbert Spaces defined on discrete metric input spaces can be devised.
The following section gives the main definition and several important properties of PDKFs, which are relevant to the topic addressed in this paper. Also in this section the AA-kernel is defined. In Section 3, we demonstrate that the AA-kernel is positive definite. The final section gives our concluding remarks, citing possible applications of the theorem proposed in this paper.

The AA-Kernel and Positive Definite Kernel Function (PDKF)
In the context of the work presented in this paper, a kernel function is a two-input symmetrical function (Shawe-Taylor & Cristianini, 2004, Chapter 3).
The AA-kernel is a two-input symmetrical function given as (Aitchison & Aitken, 1976)

K(x_i, x_j; λ) = λ^{n − d(x_i, x_j)} (1 − λ)^{d(x_i, x_j)},  (1)

where x_i and x_j are discrete variables ∈ X = B^n with B = {0, 1, ..., c − 1}; 0.5 ≤ λ ≤ 1; n and c (≥ 2) refer to the number of discrete entries that each variable has and the number of categories that an entry can assume, respectively; and d(x_i, x_j) is a discrete metric defined on B^n, i.e., it denotes the number of disagreements between corresponding elements of x_i and x_j. (Equation 1 is stated for the binary case c = 2, which is the setting adopted throughout this work.)
In this work, for clarity and without loss of generality, we set c to 2, i.e., B = {0, 1}. The two boundary values, λ = 0.5 and λ = 1, lead to two extreme forms of density estimation: the uniform distribution over B^n and the relative frequencies of appearance of the given discrete data, respectively. These two values of λ are therefore excluded, that is, 0.5 ≤ λ ≤ 1 becomes 0.5 < λ < 1 (see Aitchison & Aitken, 1976).
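To make the two extremes concrete, the sketch below evaluates the AA kernel density estimate, assumed here to take the standard form p̂(x) = (1/N) Σ_i K(x, x_i; λ), at the two boundary values of λ (a minimal Python/NumPy illustration; the function name aa_density is ours):

```python
import numpy as np

def aa_density(x, data, lam):
    """AA kernel density estimate at binary point x from a binary sample `data` (N x n)."""
    n = data.shape[1]
    d = (data != x).sum(axis=1)                 # Hamming distances from x to each sample point
    return np.mean(lam ** (n - d) * (1 - lam) ** d)

data = np.array([[0, 1], [0, 1], [1, 1], [0, 0]])  # N = 4 points in B^2

# lam = 0.5: every kernel value collapses to 0.5^n, i.e. the uniform distribution over B^n
assert aa_density(np.array([1, 0]), data, 0.5) == 0.25

# lam -> 1: the estimate tends to the relative frequency of x in the sample (here 2/4)
assert abs(aa_density(np.array([0, 1]), data, 1 - 1e-12) - 0.5) < 1e-9
```

At λ = 0.5 the estimate no longer depends on the data at all, while at λ = 1 it reduces to counting exact matches; this is why the two boundary values are excluded.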
Note that, for binary vectors,

d(x_i, x_j) = (x_i − x_j)^T (x_i − x_j) = x_i^T x_i + x_j^T x_j − 2 x_i^T x_j,  (2)

which proffers a simple way to compute the value of d(x_i, x_j). Equation 2 certainly does not imply that B^n is a normed space. Instead, here (x_i − x_j)^T (x_i − x_j) merely represents a convenient scheme to calculate d(x_i, x_j).
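As an illustration, the following sketch (Python/NumPy; the helper names hamming and aa_kernel are ours) checks that (x_i − x_j)^T (x_i − x_j) coincides with the count of disagreeing entries for binary vectors, and evaluates the AA-kernel itself:

```python
import numpy as np

def hamming(x_i, x_j):
    """d(x_i, x_j) = (x_i - x_j)^T (x_i - x_j): number of disagreements for binary vectors."""
    diff = x_i - x_j
    return int(diff @ diff)

def aa_kernel(x_i, x_j, lam):
    """AA-kernel for binary vectors (c = 2): lam^(n - d) * (1 - lam)^d."""
    n = len(x_i)
    d = hamming(x_i, x_j)
    return lam ** (n - d) * (1.0 - lam) ** d

x_i = np.array([1, 0, 1, 1, 0])
x_j = np.array([1, 1, 1, 0, 0])
assert hamming(x_i, x_j) == np.sum(x_i != x_j)  # matches a direct disagreement count
print(aa_kernel(x_i, x_j, 0.8))                  # d = 2, so lam^3 (1 - lam)^2 ≈ 0.02048
```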
Having defined and described the AA-kernel, for completeness we now briefly discuss what a positive definite kernel function (PDKF) is. We also cite a number of useful properties of PDKFs, which we deem most relevant for the purpose of this paper.
Definition (Wahba, 1998) A two-argument symmetric function K(x_i, x_j; λ) is said to be a positive definite kernel function on X × X if, for any N and any N (data) points x_1, ..., x_N ∈ X, the N × N matrix with elements K(x_i, x_j; λ) satisfies

Σ_{i,j=1}^{N} α_i α_j K(x_i, x_j; λ) ≥ 0,  (3)

where α_i, α_j ∈ R, λ is a real-valued tunable smoothing parameter and X is any non-empty set.
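The definition can be probed numerically: the sketch below (Python/NumPy; illustrative only, not a proof) builds the Gram matrix of the AA-kernel on random binary data and inspects its smallest eigenvalue, which should be non-negative up to round-off if the kernel is positive definite:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, lam = 10, 30, 0.7

X = rng.integers(0, 2, size=(N, n))            # N random binary points in B^n
D = (X[:, None, :] != X[None, :, :]).sum(-1)   # pairwise Hamming distances d(x_i, x_j)
K = lam ** (n - D) * (1 - lam) ** D            # Gram matrix with entries K(x_i, x_j; lam)

eigvals = np.linalg.eigvalsh(K)                # K is symmetric, so eigvalsh applies
print(eigvals.min())                           # non-negative up to round-off
assert eigvals.min() > -1e-10
```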
Note that in the case of the AA-kernel X is B n , i.e., x i and x j can be considered as n-dimensional vectors.
Before proceeding further to show that the AA-kernel is a PDKF, which constitutes the core objective of this paper, a highly useful proposition is provided.The proposition encapsulates several important closure properties of PDKFs, which are relevant for the purpose of this paper.The proof of this proposition can be found in Shawe-Taylor and Cristianini's book (2004).
Proposition If g_1, g_2 and h are PDKFs over X × X, a, γ ∈ R^+, f(.) is a real-valued function on X, and x_i and x_j ∈ X, then the following are also PDKFs in the sense defined above:
(i) g_1(x_i, x_j) + g_2(x_i, x_j);
(ii) a g_1(x_i, x_j);
(iii) g_1(x_i, x_j) g_2(x_i, x_j);
(iv) f(x_i) f(x_j);
(v) exp(γ h(x_i, x_j));
(vi) the pointwise limit of a convergent sequence of PDKFs.
Note also that x_i^T x_j is a linear kernel function, itself a PDKF.
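These closure properties can be sanity-checked at the Gram-matrix level, where each one translates into a statement about positive semi-definite matrices (element-wise products of Gram matrices correspond to Schur products). The sketch below (Python/NumPy; illustrative, not a proof) does so for random Gram matrices:

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Smallest eigenvalue of the symmetrised matrix is non-negative up to tol."""
    return np.linalg.eigvalsh((M + M.T) / 2).min() > -tol

rng = np.random.default_rng(1)
N = 20
A1 = rng.standard_normal((N, 5))
A2 = rng.standard_normal((N, 5))
G1 = A1 @ A1.T                  # Gram matrix of a linear kernel on the rows of A1 (a PDKF)
G2 = A2 @ A2.T
f = rng.standard_normal(N)      # values f(x_1), ..., f(x_N) of an arbitrary real function

assert is_psd(G1 + G2)          # (i)  closure under addition
assert is_psd(3.0 * G1)         # (ii) closure under positive scaling
assert is_psd(G1 * G2)          # (iii) closure under element-wise (Schur) product
assert is_psd(np.outer(f, f))   # (iv) the f(x_i) f(x_j) form
assert is_psd(np.exp(G1))       # (v)  element-wise exp of a PDKF
```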

AA-Kernel Function Is Positive Definite
This section constitutes the nub of the paper.First we formulate a theorem stating that the AA-kernel is positive definite.We then provide the full proof of the theorem.
Theorem 1 If 0.5 < λ < 1, and x_i and x_j ∈ B^n with B = {0, 1}, then the AA-kernel K(x_i, x_j; λ) = λ^{n − d(x_i, x_j)} (1 − λ)^{d(x_i, x_j)} is a positive definite kernel function on B^n × B^n.

Proof. The AA-kernel can be written as

K(x_i, x_j; λ) = λ^n ((1 − λ)/λ)^{d(x_i, x_j)} = λ^n (1 − β)^{d(x_i, x_j)},  (4)

which, using Equation 2, can be rewritten as

K(x_i, x_j; λ) = λ^n (1 − β)^{x_i^T x_i} (1 − β)^{x_j^T x_j} (1 − β)^{−2q};  (5)

then, after some simple algebraic manipulations (expanding (1 − β)^{−2q} as a binomial series, which converges because 0 < β < 1 whenever 0.5 < λ < 1), Equation 5 becomes

K(x_i, x_j; λ) = λ^n (1 − β)^{x_i^T x_i} (1 − β)^{x_j^T x_j} Σ_{m=0}^{∞} γ_m (2q)(2q + 1) ⋯ (2q + m − 1),  (6)

where β, γ_m and q denote (2λ − 1)/λ, β^m/m! and q = q(x_i, x_j) = x_i^T x_j (a linear kernel function), respectively; with m being a non-negative integer (the m = 0 term of the product is understood to be 1).

The factor λ^n (1 − β)^{x_i^T x_i} (1 − β)^{x_j^T x_j} is of the form a f(x_i) f(x_j) and is therefore a PDKF by the Proposition; each product (2q)(2q + 1) ⋯ (2q + m − 1) is a polynomial in the linear kernel q with non-negative coefficients and is hence also a PDKF (closure under sums, positive scaling and products); and every coefficient γ_m is positive. Consequently K(x_i, x_j; λ), being a pointwise-convergent sum of PDKFs with positive coefficients, is itself a PDKF. ∎
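The factorisation at the heart of the argument, namely λ^{n − d}(1 − λ)^d = λ^n (1 − β)^{x_i^T x_i} (1 − β)^{x_j^T x_j} (1 − β)^{−2q} with β = (2λ − 1)/λ and q = x_i^T x_j, can be verified numerically for random binary vectors (a minimal Python/NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 8, 0.75
beta = (2 * lam - 1) / lam          # so that 1 - beta = (1 - lam)/lam

for _ in range(100):
    x_i = rng.integers(0, 2, n)
    x_j = rng.integers(0, 2, n)
    d = int((x_i - x_j) @ (x_i - x_j))            # Hamming distance via Equation 2
    q = int(x_i @ x_j)                            # linear kernel
    lhs = lam ** (n - d) * (1 - lam) ** d         # AA-kernel, Equation 1
    rhs = (lam ** n
           * (1 - beta) ** (x_i @ x_i)
           * (1 - beta) ** (x_j @ x_j)
           * (1 - beta) ** (-2 * q))              # factorised form
    assert abs(lhs - rhs) < 1e-12
```

The check relies only on the identity d(x_i, x_j) = x_i^T x_i + x_j^T x_j − 2q for binary vectors, so agreement holds for every draw.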

Summary
Over three decades ago Aitchison and Aitken proposed a novel kernel function for estimating the density functions of underlying distributions in discrete metric spaces. To the best of our knowledge, it has not been shown whether this kernel function is positive definite on discrete metric spaces. The positive definiteness of this kernel function would enrich and enlarge its applicability domain, because a PDKF has an associated Reproducing Kernel Hilbert Space (RKHS). An RKHS provides an excellent framework on which a variety of powerful statistical and machine learning schemes can be developed, as discussed at length in some of the references cited in Section 1.
We therefore anticipate that the theorem proposed and proven in this paper can be applied wherever (in statistics or machine learning) the application of models based on the RKHS concept is deemed appropriate for the analysis of discrete datasets.