Various Proofs of the Sylvester Criterion for Quadratic Forms

In the first part of the paper we present several proofs of the so-called Sylvester criterion for quadratic forms, some of which are short and simple. In the second part of the paper we give an algebraic proof of the Sylvester criterion for quadratic forms subject to a linear homogeneous system.


Introduction
Given a real symmetric matrix A of order n and its associated quadratic form

Q(x) = x^⊤Ax, x ∈ R^n, (1)

one of the most used and taught criteria to test the positive (or negative) definiteness of (1) is the so-called Sylvester criterion. Whereas the proof of the necessary part of this criterion is rather simple, the proof of the sufficient part is often omitted or is given in a somewhat long and intricate way. The aim of the first part of the present paper (Section 2) is to offer various proofs of the Sylvester criterion for quadratic forms, in the hope that some of the said proofs can be useful for a course of Linear Algebra delivered to undergraduates.
In the second part of the paper (Section 3) we give a relatively simple algebraic proof of a well-known necessary and sufficient determinantal criterion for the positive (negative) definiteness of a quadratic form subject to a system of homogeneous linear equations. This determinantal criterion may therefore be considered a generalization of the Sylvester criterion to a "constrained quadratic form".
We recall that the (i, j) minor of the element a_{ij} of a matrix A of order n, denoted M_{ij}, is the determinant of the (n−1)×(n−1) matrix obtained by deleting the i-th row and the j-th column of A. The (i, j) cofactor of the element a_{ij} of A, denoted C_{ij}, is given by (−1)^{i+j} M_{ij}. We denote by A^{(k)} the k-th order leading principal submatrix of A, or k-th order North-West principal submatrix of A, i.e. the square submatrix of order k consisting of the first k rows and first k columns of A, k = 1, ..., n.
The k-th order leading principal minor or North-West principal minor of A is det(A^{(k)}), denoted also ∆_k, k = 1, ..., n. A k-th order principal minor of the n×n matrix A is the k-th order leading principal minor of P^⊤AP, where P is some permutation matrix of order n. There are n!/(k!(n−k)!) possible k-th order principal minors.
Theorem 1 (Sylvester criterion). Let A be a real symmetric matrix of order n. Then (1) is positive definite (equivalently: matrix A is positive definite) if and only if

∆_k > 0, k = 1, ..., n.

The quadratic form (1) is negative definite (equivalently: matrix A is negative definite) if and only if

(−1)^k ∆_k > 0, k = 1, ..., n.

Remark 1.
We note that the second part of Theorem 1 follows at once from the first part, by remarking that if A is positive definite, then −A is negative definite (if A is negative definite, then −A is positive definite), and that, by basic properties of determinants, it holds that det(−A^{(k)}) = (−1)^k det(A^{(k)}).
Other basic properties related to (1) are the following ones.
Theorem 2. Matrix A is positive definite if and only if all its eigenvalues are positive; A is negative definite if and only if all its eigenvalues are negative.
In contrast with the Sylvester criterion, the proof of Theorem 2 is quite short and simple, being based on the so-called "spectral theorem" or "principal axes theorem": any symmetric matrix can be diagonalized by an orthogonal transformation.
We note that if A is positive definite, then det(A) > 0: owing to Theorem 2, the eigenvalues of A are all positive and, being det(A) = λ_1 λ_2 ··· λ_n, we have the thesis.
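These facts are easy to check numerically. The following sketch (NumPy, with randomly generated test data not taken from the paper) verifies that the eigenvalue test of Theorem 2, the minor test of Theorem 1 and the identity det(A) = λ_1 ··· λ_n agree on a positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# A = M^T M + I is symmetric positive definite by construction (test data).
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)

eigvals = np.linalg.eigvalsh(A)                            # Theorem 2
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 5)]   # Theorem 1

assert np.all(eigvals > 0)                     # all eigenvalues positive
assert all(d > 0 for d in minors)              # all leading principal minors positive
assert np.isclose(np.linalg.det(A), np.prod(eigvals))  # det(A) = product of eigenvalues
```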
We note further that if A is positive definite, then obviously any North-West principal submatrix A^{(k)}, k = 1, ..., n, is positive definite as well. Other useful properties concerning (1) are given by the following results (see, e.g., Horn and Johnson (1985)).
Theorem 3. The quadratic form (1) is positive definite if and only if there exists a nonsingular matrix Q of order n, not necessarily unique, such that A = Q^⊤Q.

Theorem 4. If A is positive definite, then A^{−1} exists and is positive definite. (It is sufficient to note that if λ ≠ 0 is an eigenvalue of the nonsingular matrix A (if λ = 0, then A would be singular), then 1/λ is an eigenvalue of A^{−1}.)
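Theorem 3 can be illustrated with a Cholesky factorization: for a positive definite A, NumPy returns a lower triangular L with A = LL^⊤, so Q = L^⊤ is one admissible (non-unique) choice. A minimal sketch with randomly generated test data:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M.T @ M + np.eye(3)          # symmetric positive definite test matrix

L = np.linalg.cholesky(A)        # lower triangular, A = L L^T
Q = L.T                          # hence A = Q^T Q with Q nonsingular
assert np.allclose(A, Q.T @ Q)

# Q is not unique: for any orthogonal U, (UQ)^T (UQ) = Q^T U^T U Q = A as well.
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
assert np.allclose(A, (U @ Q).T @ (U @ Q))
```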

Various Proofs of the Sylvester Criterion
A) A straightforward proof.
i) Proof of the necessary conditions.
The necessary conditions are usually proved in an easy way in most papers and books: for each k, k = 1, ..., n, let us choose a vector x ∈ R^n, x ≠ 0, with the first k components arbitrary real numbers, collected in a vector x^k ∈ R^k, and with the last (n − k) components equal to zero. Then

Q(x) = x^⊤Ax = (x^k)^⊤ A^{(k)} x^k > 0,

so that A^{(k)} is positive definite and, by Theorem 2, ∆_k, being the product of the (positive) eigenvalues of A^{(k)}, is positive. ii) Proof of the sufficient conditions.
We use the "complete induction method". Obviously the condition is true for k = 1. In order to prove that the same is true for k = 2, we have to verify that A^{(2)} is positive definite. From a_{11} > 0 and a_{11}a_{22} − (a_{12})² > 0 it follows that a_{22} > 0 and therefore also tr(A^{(2)}) = a_{11} + a_{22} > 0. Since ∆_2 > 0 and tr(A^{(2)}) > 0, taking the characteristic equation of A^{(2)} into account, we get that the eigenvalues of A^{(2)} are both positive and therefore A^{(2)} is positive definite.
Then, we have to prove the implication: if A^{(k)} is positive definite and ∆_{k+1} > 0, then A^{(k+1)} is positive definite. Usually the above implication is proved by means of numerous algebraic transformations. It can be easily proved as follows. Let us suppose that A^{(k+1)} is not positive definite. It follows that A^{(k+1)} must possess at least two negative eigenvalues, say λ′ and λ″ (note that ∆_{k+1} > 0 excludes zero eigenvalues; if A^{(k+1)} had exactly one negative eigenvalue, then we would have ∆_{k+1} < 0). As A^{(k+1)} is symmetric, there exist two orthogonal eigenvectors, say x and y, corresponding to the said two negative eigenvalues: x^⊤y = 0.
We can therefore choose a linear combination of x and y, u = αx + βy, such that u ≠ 0 and the last entry of u is zero (x and y are both different from the zero vector, but they cannot both have only the last component different from zero, otherwise they would not be orthogonal). Then we have

u^⊤A^{(k+1)}u = α² x^⊤A^{(k+1)}x + β² y^⊤A^{(k+1)}y < 0,

the mixed terms vanishing since x^⊤A^{(k+1)}y = λ″ x^⊤y = 0; indeed, from A^{(k+1)}x = λ′x, with λ′ < 0, we get x^⊤A^{(k+1)}x = λ′x^⊤x < 0 and similarly we get y^⊤A^{(k+1)}y < 0. But, the last entry of u being zero, u^⊤A^{(k+1)}u coincides with the value of the form associated with A^{(k)} at the vector of the first k components of u: then A^{(k)} would not be positive definite, against the induction assumption. So all eigenvalues of A^{(k+1)} must be positive, i.e. A^{(k+1)} is positive definite.
B) Another simplified proof of the sufficiency of the Sylvester criterion.
We have said that the proofs of ii) in the previous section A) are usually quite long and/or intricate. See, e.g., Bomze (1995), Mirsky (1955), Kemp and Kimura (1978), Simon and Blume (1994). An exception is given by Samuels (1966), whose proof we report here for the reader's convenience. This proof relies on the following lemma.
Lemma 1. Let A be a real symmetric matrix of order n, partitioned as

A = [ A^{(n−1)}   a
      a^⊤         a_{nn} ],

with A^{(n−1)} nonsingular. Then there exists a nonsingular matrix P such that P^⊤AP is the block-diagonal matrix with blocks A^{(n−1)} and c, where c = det(A)/det(A^{(n−1)}).

Proof. Since A^{(n−1)} is nonsingular, the vector a has a unique representation as a linear combination of the columns of A^{(n−1)}. Let α_1, ..., α_{n−1} be the coefficients of this linear combination of the said columns. Let P^⊤ be the matrix obtained from the identity matrix I by replacing its last row by [−α_1, ..., −α_{n−1}, 1]. Then P is nonsingular and, by the symmetry of A, it is the desired matrix. Since det(P) = 1, we have det(P^⊤AP) = det(A) = det(A^{(n−1)})·c, and the second part of the lemma follows at once.
On the grounds of the previous lemma, the second "step" of the inductive proof of ii) sub A) is immediate: let A be a symmetric matrix of order n with positive leading principal minors. Then A satisfies the assumptions of the lemma; the positive definiteness of x^⊤Ax is equivalent to the positive definiteness of

y^⊤A^{(n−1)}y + (∆_n/∆_{n−1}) x_n²,

where y denotes the vector of the first (n − 1) components of x: y = [x_1, x_2, ..., x_{n−1}]^⊤. This form is positive definite since the first term is positive by the induction assumption and the coefficient of the second term is positive, thanks to Lemma 1.
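The reduction behind Samuels' argument can be checked numerically: the Schur complement c = a_{nn} − a^⊤(A^{(n−1)})^{−1}a equals ∆_n/∆_{n−1}, and the quadratic form splits accordingly. A sketch with randomly generated test data (the change of variables below follows the congruence of Lemma 1):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)                    # symmetric, A^(3) nonsingular

A3 = A[:3, :3]                             # A^(n-1)
a = A[:3, 3]                               # off-diagonal part of the last column
c = A[3, 3] - a @ np.linalg.solve(A3, a)   # Schur complement

# c = det(A)/det(A^(n-1)), i.e. Delta_n / Delta_{n-1}.
assert np.isclose(np.linalg.det(A), np.linalg.det(A3) * c)

# The form splits: x^T A x = y^T A3 y + c x_n^2, with y = x[:3] + A3^{-1} a x_n.
x = rng.standard_normal(4)
y = x[:3] + np.linalg.solve(A3, a) * x[3]
assert np.isclose(x @ A @ x, y @ A3 @ y + c * x[3] ** 2)
```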
We point out that a similar and elegant proof of the above sufficient condition is provided by Hestenes (1975).

C) Proofs by means of the Lagrange-Jacobi reduction formula.
There are various proofs of the Sylvester criterion which are based on the Lagrange-Jacobi reduction formula; some proofs are clear and efficient from a didactic point of view, but rather long, such as, for example, the proofs of Gantmacher (1959), Hadley (1961) and Hohn (1973). Some other proofs are more concise but also more difficult to grasp, and therefore not very suitable for a course to undergraduates; this is the case, e.g., of the classical paper of Debreu (1952) and of the books of Aleskerov and others (2011) and Murata (1977). Other proofs are incomplete, in the sense that they are performed only for n = 2 or n = 3, such as in Bellman (1970). The proof of Truchon (1987) follows essentially the one of Debreu.
As the proof of the necessary conditions in the Sylvester criterion does not entail difficulties, we concentrate on the proof of the sufficiency part. We assume that all North-West principal minors of A are positive:

∆_k > 0, k = 1, ..., n.

Let us consider the upper triangular matrix B = [b_{ij}], where the elements b_{ij} are defined as follows:

b_{ij} = C_{ji}(j)/∆_j for i ≤ j; b_{ij} = 0 for i > j,

where C_{ji}(j) denotes the cofactor of a_{ji} in the North-West principal submatrix A^{(j)}, being, by definition, C_{11}(1) = 1. We note that matrix B is nonsingular, being (with ∆_0 = 1 by definition)

det(B) = b_{11}b_{22} ··· b_{nn} = (∆_0/∆_1)(∆_1/∆_2) ··· (∆_{n−1}/∆_n) = 1/∆_n ≠ 0.

It follows that B has an inverse B^{−1}. We now introduce the vectors x ∈ R^n and y ∈ R^n and make the one-to-one transformation x = By. Therefore

Q(y) = y^⊤(B^⊤AB)y = y^⊤Cy

is a symmetric quadratic form having the same sign of Q(x). But matrix C is also a diagonal matrix, being lower triangular and symmetric. Moreover it holds, with ∆_0 = 1 by definition,

c_{jj} = ∆_{j−1}/∆_j, j = 1, ..., n. (2)

Indeed, in order to evaluate C = B^⊤AB, it is convenient to evaluate first the product AB = [α_{hj}]. As matrix B is upper triangular, we have

α_{hj} = (1/∆_j) Σ_{i=1}^{j} a_{hi} C_{ji}(j).

The sum in the last expression is equal to ∆_j if h = j, and it is equal to zero if h < j (use the Laplace second theorem on determinants: for h < j it is the expansion of a determinant with two equal rows). Therefore α_{hj} = 0 if h < j: hence matrix AB is lower triangular and we have α_{jj} = 1, j = 1, ..., n. But B^⊤ is a lower triangular matrix (it is the transpose of an upper triangular matrix) and therefore the product C = B^⊤(AB) will be lower triangular too. As C = C^⊤, C is a diagonal matrix. Being α_{jj} = 1, we have c_{jj} = b_{jj}·1 and, taking into account the definition of b_{ij}, we get relation (2).
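The construction of B and relation (2) can be verified numerically. The sketch below (randomly generated positive definite test matrix, cofactors computed by brute force) checks that B^⊤AB is diagonal with entries ∆_{j−1}/∆_j:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)                    # positive definite, so all Delta_k > 0

def cof(S, r, c):
    """Cofactor of entry (r, c) (0-based indices) of the square matrix S."""
    minor = np.delete(np.delete(S, r, axis=0), c, axis=1)
    return (-1) ** (r + c) * np.linalg.det(minor)

Delta = [1.0] + [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]  # Delta_0 = 1

# b_ij = C_ji(j)/Delta_j for i <= j (cofactor taken inside A^(j)); zero otherwise.
B = np.zeros((n, n))
B[0, 0] = 1.0 / Delta[1]                   # C_11(1) = 1 by definition
for j in range(2, n + 1):
    for i in range(1, j + 1):
        B[i - 1, j - 1] = cof(A[:j, :j], j - 1, i - 1) / Delta[j]

C = B.T @ A @ B
assert np.allclose(C, np.diag([Delta[j - 1] / Delta[j] for j in range(1, n + 1)]))
```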
From Q(y) = y^⊤Cy we have the following Lagrange-Jacobi reduction formula, from which the Sylvester criterion is immediate:

Q(y) = Σ_{j=1}^{n} (∆_{j−1}/∆_j) y_j².

Obviously, the last relation can be equivalently expressed in the form

Q = Σ_{j=1}^{n} (∆_j/∆_{j−1}) z_j², with z_j = (∆_{j−1}/∆_j) y_j, j = 1, ..., n.

This last expression has been obtained by Czaki (1970) in a quick way: we assume, as before, that ∆_r ≠ 0 for r = 1, ..., n, and denote, as before, by C_{ij}(r) the cofactor pertaining to an arbitrary element a_{ij} of the North-West principal submatrix A^{(r)} of order r. Now, let us introduce a new vector v ∈ R^n, given by the nonsingular transformation v = Tx, where T is a triangular, not necessarily orthogonal, matrix built, row by row, from the cofactors C_{ri}(r) of the submatrices A^{(r)}, with the symmetry conditions a_{ij} = a_{ji} taken into consideration. After posing ∆_0 = 1 by definition, the original quadratic form Q(x) = x^⊤Ax can be converted into the following diagonal form:

Q = Σ_{r=1}^{n} (∆_r/∆_{r−1}) v_r²,

because C_{rr}(r) = ∆_{r−1} for r = 1, 2, ..., n. From this diagonal form the Sylvester criterion is again immediate.

D) Proofs which use the "Inclusion Principle" for eigenvalues.
Some authors (e.g. Franklin (1968), Horn and Johnson (1985), Johnson (1970)) shorten the "second step" of the sufficiency proof sub A) by means of the so-called Inclusion Principle for eigenvalues (of a symmetric matrix), also known as Cauchy's Interlace Theorem, which is in turn a special case of the more general Courant-Fischer Theorem.
Rather short proofs of Cauchy's Interlace Theorem are given by Hwang (2004) and by Warrington (2005).
Theorem 5 (Cauchy Interlace Theorem). Let A = [a_{ij}] be a symmetric matrix of order n, with n > 1. Let A^{(n−1)} be the North-West principal submatrix of A of order (n − 1). Let α_1 ≥ α_2 ≥ ... ≥ α_n be the eigenvalues of A and let β_1 ≥ β_2 ≥ ... ≥ β_{n−1} be the eigenvalues of A^{(n−1)}. Then

α_1 ≥ β_1 ≥ α_2 ≥ β_2 ≥ ... ≥ β_{n−1} ≥ α_n.

With the help of Theorem 5, the sufficient part of the proof of the Sylvester criterion by induction can be simply performed as follows. Suppose that ∆_i ≡ det(A^{(i)}) > 0 for all i ≤ n. Then A^{(1)} = a_{11} is obviously positive definite. Supposing that A^{(k)} is positive definite for some k < n, we have to prove that A^{(k+1)} is positive definite. If α_1 ≥ ... ≥ α_{k+1} are the eigenvalues of A^{(k+1)} and if β_1 ≥ ... ≥ β_k are the eigenvalues of A^{(k)}, the Cauchy Interlace Theorem tells us that

α_i ≥ β_i ≥ α_{i+1}, i = 1, ..., k.

But all β_i are positive, since A^{(k)} is positive definite, therefore all α_i, i = 1, ..., k, are positive. But, being

∆_{k+1} = α_1 α_2 ··· α_{k+1} > 0,

we conclude that also α_{k+1} > 0 and therefore that A^{(k+1)} is positive definite.

E) A proof based on the spectral theorem.

The necessity part of the proof of the Sylvester criterion shows no particular difficulty. For the sufficient part we need the following two lemmas.
Lemma 2. Let v_1, ..., v_n be a basis of a finite-dimensional vector space V. Suppose W is a k-dimensional subspace of V. If m < k, then there exists a nonzero vector in W which belongs to the span of v_{m+1}, ..., v_n.
Proof. Let w_1, ..., w_k be a basis for W. If the statement of the lemma were false, then the k + (n − m) > n vectors w_1, ..., w_k, v_{m+1}, ..., v_n would be linearly independent, which contradicts dim V = n.

Lemma 3. Let A be a symmetric (real) matrix of order n. Suppose W is a k-dimensional subspace of R^n such that w^⊤Aw > 0 for all w ∈ W, w ≠ 0. Then A has at least k (counting multiplicity) positive eigenvalues.

Proof.
By the spectral theorem, there is an orthonormal basis v_1, ..., v_n of R^n consisting of eigenvectors of A, with corresponding eigenvalues λ_1, ..., λ_n. Suppose that the first m eigenvalues are positive and the remaining ones are not. If m < k, then Lemma 2 implies that there exists a nonzero vector w in W which may be written as

w = c_{m+1}v_{m+1} + ... + c_n v_n.

Applying our hypothesis that the quadratic form defined by A is positive on W, we obtain

0 < w^⊤Aw = λ_{m+1}c_{m+1}² + ... + λ_n c_n².

However, we assumed that λ_{m+1}, ..., λ_n are all non-positive, so that w^⊤Aw ≤ 0, which is a contradiction. Thus m ≥ k.

Now, it is possible to prove the sufficiency of the Sylvester criterion by induction on n. For n = 1, there is nothing to prove, as we already know. Suppose that the sufficiency of Sylvester's criterion is true for the (n−1)×(n−1) leading principal submatrix of A, i.e. for A^{(n−1)}. Let W denote the subspace of R^n consisting of all vectors with zero n-th coordinate. Let P denote the projection of R^n onto the subspace spanned by the first (n−1) coordinates with respect to the standard basis. Then, for every w ∈ W, w ≠ 0, we have

w^⊤Aw = (Pw)^⊤A^{(n−1)}(Pw) > 0

by our induction hypothesis. By the preceding lemma, A has at least (n−1) positive eigenvalues (counting multiplicity). Therefore the n-th eigenvalue is also positive since, otherwise, the n-th leading principal minor, i.e. det(A), would be non-positive, which is a contradiction.
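The interlacing inequalities of Theorem 5 are easy to check numerically on a random symmetric matrix (NumPy sketch; `eigvalsh` returns eigenvalues in increasing order, so we reverse them to match the decreasing ordering used above):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                # arbitrary symmetric matrix

alpha = np.linalg.eigvalsh(A)[::-1]              # eigenvalues of A, decreasing
beta = np.linalg.eigvalsh(A[:-1, :-1])[::-1]     # eigenvalues of A^(n-1), decreasing

# Cauchy interlacing: alpha_i >= beta_i >= alpha_{i+1}, i = 1, ..., n-1.
tol = 1e-10
for i in range(n - 1):
    assert alpha[i] >= beta[i] - tol
    assert beta[i] >= alpha[i + 1] - tol
```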

The Sylvester Criterion for Quadratic Forms Subject to Linear Constraints
In many applications of real quadratic forms, especially in optimization problems with equality constraints, we are interested in the sign of the quadratic form (1) for x ∈ S, S being the set of all nonzero solutions of the homogeneous system of linear equations Bx = 0, where B is a real (m, n) matrix, with m < n.
The related extension of the Sylvester criterion for the above problem has been treated by many authors; see, e.g., Afriat (1951), Bellman (1970), Chabrillac and Crouzeix (1984), Debreu (1952), Farebrother (1977), Lancaster (1968), Mann (1943), Murata (1977), Samuelson (1947). Perhaps the first paper which extends the Sylvester criterion to the case under examination is the one of Mann (1943), where the related proof is, however, quite long and involves many algebraic transformations (there is also a misprint in the second one of Mann's conditions). The treatment of Debreu (1952) is sound and rather stringent; however, this author makes use of tools of Mathematical Analysis, whereas the problem under consideration is a typical algebraic problem. Chabrillac and Crouzeix (1984) and Farebrother (1977) employ the Schur complement, the inertia law and a classical result of Finsler. The treatments of Lancaster (1968) and Samuelson (1947) are interesting, but not complete and rather unsatisfactory. The papers of Black and Morimoto (1968) and Väliaho (1982) are concise and in the same spirit as our proof. The paper of Afriat (1951) is rather intricate; the proof of Truchon (1987) expands the concise proof of Debreu (1952) and is therefore quite long. McFadden (1978) presents some equivalent determinantal conditions for a symmetric matrix to be positive definite subject to homogeneous linear constraints. There are some obscure points: e.g., it is stated that the matrix has a positive principal minor of each order. Moreover, the conditions of McFadden are in general less operative than the Sylvester conditions. Two other interesting papers are the one of Cherubino (1957), in Italian, and the one of Shostak (1954), in Russian.
We try to give an algebraic proof of the said generalization of the Sylvester criterion, a proof which uses only classical results of Linear Algebra. We base our treatment on the paper of Giorgi (2003). Let us consider the quadratic form (1), with A a symmetric matrix of order n, subject to the linear constraints

Bx = 0, (4)

where B is of order (m, n), with m < n and rank(B) = m.
Theorem 7. Without loss of generality, let us suppose that the first m columns of B are linearly independent. Then:
(i) Q(x) = x^⊤Ax > 0 for all x ≠ 0 such that Bx = 0 if and only if the leading principal minors of order 2m + p, p = 1, 2, ..., n − m, of the matrix

H = [ 0     B
      B^⊤   A ]

(of order m + n, where 0 denotes the zero matrix of order m) have the sign of (−1)^m. Denoting by ∆̄_k the k-th leading principal minor of H, we must therefore have (−1)^m ∆̄_k > 0, k = 2m + 1, ..., m + n.
(ii) Q(x) = x^⊤Ax < 0 for all x ≠ 0 such that Bx = 0 if and only if the leading principal minors ∆̄_{2m+p} of H, of order 2m + p, p = 1, 2, ..., n − m, have the sign of (−1)^{m+p}.
Remark 3. Also in the previous theorem, statement (ii) derives from statement (i) by observing that A is negative definite under the constraints if and only if −A is positive definite under the same constraints.
Remark 4. In some papers and books the following bordered matrix is considered:

H̄ = [ A    B^⊤
      B    0 ].

In this case, obviously, the previous conditions (i) and (ii) have to be suitably modified, by making reference to the leading principal minors of H̄.

Remark 6. We have to note that the assumption that the rank of B (rank(B) = m) is given by the first m columns of B is essential in order to have necessary and sufficient conditions. In the absence of this assumption, conditions (i) and (ii) of Theorem 7 are only sufficient conditions, and they imply rank(B) = m. This is the case of the usual sufficient second-order optimality conditions imposed on the Lagrangian function of optimization problems with equality constraints. Consider, e.g., the following simple example.
It is obvious that the quadratic form is positive definite on the constraint x_3 = 0 (hence here vector b is b = [0, 0, 1]). Yet we have ∆̄_3 = 0, so that condition (i) of Theorem 7 is not satisfied: the first entry of b is zero, and the first column of B is not linearly independent.

Proof of Theorem 7. On the grounds of Remark 3, we prove only statement (i). Let us partition matrix B into two submatrices B_1 and B_2, where B_1 contains the first m columns of B, so that det(B_1) ≠ 0, and B = [B_1 B_2]. Similarly, we separate vector x ∈ R^n into two blocks:

x = [x^1, x^2]^⊤, with x^1 ∈ R^m, x^2 ∈ R^{n−m}.

Finally, we partition matrix A into four submatrices, as follows:

A = [ A_{11}   A_{12}
      A_{21}   A_{22} ],

where A_{21} = A_{12}^⊤, A_{11} is symmetric of order m, A_{12} is of order (m, n − m) and A_{22} is symmetric of order (n − m).

Now we compute

Q(x) = x^⊤Ax = (x^1)^⊤A_{11}x^1 + (x^1)^⊤A_{12}x^2 + (x^2)^⊤A_{21}x^1 + (x^2)^⊤A_{22}x^2.

The m variables x_1, x_2, ..., x_m can be eliminated, as x must satisfy relation (4). Taking into account that (4) gives x^1 = −B_1^{−1}B_2 x^2, we obtain

Q(x) = Q_2(x^2) = (x^2)^⊤ A* x^2,

where A* is the symmetric matrix of order (n − m) given by

A* = (B_1^{−1}B_2)^⊤A_{11}(B_1^{−1}B_2) − (B_1^{−1}B_2)^⊤A_{12} − A_{21}(B_1^{−1}B_2) + A_{22}.

Putting J = −B_1^{−1}B_2 we can also compact the above expression:

A* = J^⊤A_{11}J + J^⊤A_{12} + A_{21}J + A_{22}.

In other words, the sign of Q(x) under the constraints Bx = 0 coincides with the sign of Q_2(x^2), without constraints. Then it is possible to establish a congruence relation between two symmetric matrices, which allows us to compare the sign of the North-West principal minors of the square matrix A* (of order n − m) with the North-West principal minors of the square matrix H.
More precisely, we form suitable nonsingular matrices C, M and N, built by blocks from the identity matrices I_m and I_{n−m}, from B_1, B_2 and from J (I_k denoting the identity matrix of order k), and we verify that C^⊤PC = H, where P is a symmetric matrix whose South-East diagonal block of order (n − m) is A*. By performing, by blocks, the product C^⊤PC, we realize that for every p = 1, 2, ..., n − m, the North-West principal minor ∆̄_{2m+p} of order 2m + p of H is linked to the North-West principal minor ∆*_p of order p of A* by the formula

∆̄_{2m+p} = (−1)^m (det B_1)² ∆*_p.

We obtain therefore that the signs of ∆*_p and (−1)^m ∆̄_{2m+p} coincide, for p = 1, 2, ..., n − m.
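The determinantal link between H and A* can be verified numerically. The sketch below (randomly generated test data; B_1 is nonsingular for a generic draw) checks the relation ∆̄_{2m+p} = (−1)^m (det B_1)² ∆*_p for every p:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 2, 5
B = rng.standard_normal((m, n))
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                           # arbitrary symmetric matrix

B1, B2 = B[:, :m], B[:, m:]                 # B1 assumed nonsingular (generic case)
J = -np.linalg.solve(B1, B2)
A11, A12 = A[:m, :m], A[:m, m:]
A21, A22 = A[m:, :m], A[m:, m:]
Astar = J.T @ A11 @ J + J.T @ A12 + A21 @ J + A22

H = np.block([[np.zeros((m, m)), B], [B.T, A]])

for p in range(1, n - m + 1):
    lhs = np.linalg.det(H[:2 * m + p, :2 * m + p])      # leading minor of H
    rhs = (-1) ** m * np.linalg.det(B1) ** 2 * np.linalg.det(Astar[:p, :p])
    assert np.isclose(lhs, rhs)
```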
Remark 7. The case where the constraints of the quadratic form are given by a non-homogeneous system of the type

Bx = c (5)

is studied by Shostak (1954). Also Heal, Hughes and Tarling (1974) consider a constraint system of the type (5), but then they treat (in an unsatisfactory way) only the case c = 0. For the reader's convenience we report the results of Shostak.
Again we consider the quadratic form (1) subject to (5), where B is of order (m, n), m < n, rank(B) = m, and where the determinant of the square submatrix B_1, formed by the first m columns of B, is different from zero. Let us transform the variables in (1) and in (5) by means of the following substitution: x_j = y_j/y_{n+1}, j = 1, ..., n (y_{n+1} ≠ 0).
Then (1) becomes a quadratic form (6) in the n + 1 variables y_1, ..., y_{n+1}, while (5) becomes a homogeneous linear system (7): By' − cy_{n+1} = 0, where y' = [y_1, ..., y_n]^⊤. The problem is then that of characterizing the sign of (6) under the constraints (7). The positive or negative definiteness of the said constrained quadratic form depends on the sign of the North-West principal minors, of order 2m + 1, ..., m + n + 1, of the bordered matrix

H' = [ 0      B     −c
       B^⊤    A     0
       −c^⊤   0^⊤   0 ],

where the zero blocks have the appropriate orders.

Remark 5.
In the case of only one constraint, of the type bx = 0, with b ∈ R^n and with b_1 ≠ 0, the previous conditions (i) and (ii) of Theorem 7 become, respectively:
a) Q(x) is positive definite on the set of nontrivial solutions of bx = 0 if and only if

∆̄_3 = det [ 0    b_1    b_2
            b_1  a_{11} a_{12}
            b_2  a_{21} a_{22} ] < 0, ..., ∆̄_{n+1} = det(H) < 0.

b) Q(x) is negative definite on the set of nontrivial solutions of bx = 0 if and only if

∆̄_3 = det [ 0    b_1    b_2
            b_1  a_{11} a_{12}
            b_2  a_{21} a_{22} ] > 0,

∆̄_4 = det [ 0    b_1    b_2    b_3
            b_1  a_{11} a_{12} a_{13}
            b_2  a_{21} a_{22} a_{23}
            b_3  a_{31} a_{32} a_{33} ] < 0, etc.
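For the single-constraint case, conditions a) can be checked directly. The sketch below builds the bordered matrix for a positive definite A (so Q is positive definite on every subspace, in particular on bx = 0) and verifies both the sign conditions and, as a cross-check, the definiteness of Q on the null space of b (obtained here via a QR factorization, a helper route not used in the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)               # positive definite test matrix
b = rng.standard_normal(n)            # generic constraint vector (b_1 != 0 a.s.)

# Bordered matrix H for the single constraint b x = 0.
H = np.zeros((n + 1, n + 1))
H[0, 1:] = b
H[1:, 0] = b
H[1:, 1:] = A

# Condition a): the leading minors of order 3, ..., n+1 must all be negative.
minors = [np.linalg.det(H[:k, :k]) for k in range(3, n + 2)]
assert all(d < 0 for d in minors)

# Cross-check: Q restricted to the hyperplane b x = 0 is positive definite.
base = np.column_stack([b, rng.standard_normal((n, n - 1))])
Qfull, _ = np.linalg.qr(base)         # first column spans b; the rest span b-perp
N = Qfull[:, 1:]
assert np.all(np.linalg.eigvalsh(N.T @ A @ N) > 0)
```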