Discussing a More Fundamental Concept Than the Minimal Residual Method to Solve Linear Systems in a Krylov Subspace



Introduction
In this paper we derive a better Krylov subspace solution method by maximizing the orthogonal projection, instead of minimizing the residual, to solve the following linear system:
Ax = b, (1)
where x ∈ R^n is an unknown vector, to be determined from a given non-singular coefficient matrix A ∈ R^{n×n}, i.e., Rank(A) = n, and the input b ∈ R^n.
Given an initial guess x_0, from Equation (1) we have an initial residual
r_0 = b − Ax_0. (2)
Upon letting z = x − x_0, Equation (1) is equivalent to Az = r_0, which can be used to search for a descent direction z after giving an initial residual r_0. Liu (2013a, 2013b, 2014a) has proposed new methods by minimizing the following merit function:
min_z ||r_0||² ||Az||² / (r_0 · Az)², (3)
to obtain a faster descent direction z in the iterative solution of Equation (1).
In the numerical solution of linear systems, the Krylov subspace methods constitute one of the most important classes of numerical methods (Dongarra, 2000; Saad, 1981; Freund & Nachtigal, 1991; van den Eshof & Sleijpen, 2004; Liu, 2013c). The iterative algorithms that are applied to solve large-scale linear systems are mostly preconditioned Krylov subspace methods (Simoncini & Szyld, 2007).
Suppose that we have an m-dimensional Krylov subspace generated by the coefficient matrix A from the right-hand side vector r_0 in Equation (2):
K_m := Span{r_0, Ar_0, . . ., A^{m−1} r_0}. (4)
Let L_m = AK_m. The idea of GMRES is to use the Galerkin method to search for the solution z ∈ K_m such that the residual b − Ax = r_0 − Az is perpendicular to L_m (Saad & Schultz, 1986). It can be proven that this solution z ∈ K_m minimizes the residual (Saad, 2003):
min_{z ∈ K_m} ||r_0 − Az||. (5)
Throughout this paper the 2-norm of a vector x is denoted by ||x||.
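The minimization in Equation (5) can be illustrated numerically. The following NumPy sketch (an illustration, not the paper's algorithm: the Krylov basis is orthonormalized by a QR factorization rather than the Arnoldi process) solves min_{z ∈ K_m} ||r_0 − Az|| for increasing m and checks that the residual is nonincreasing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)  # a generic non-singular test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0                                  # initial residual of Equation (2)

def krylov_min_residual(A, r0, m):
    """Minimize ||r0 - A z|| over z in K_m = span{r0, A r0, ..., A^(m-1) r0}."""
    K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(m)])
    Q, _ = np.linalg.qr(K)                       # orthonormal basis of K_m, for stability
    c, *_ = np.linalg.lstsq(A @ Q, r0, rcond=None)
    return Q @ c

residuals = [np.linalg.norm(r0 - A @ krylov_min_residual(A, r0, m))
             for m in range(1, n + 1)]
# since K_1 is contained in K_2, etc., the minimal residual can only
# decrease with m, and at m = n it vanishes up to roundoff
```

Because the subspaces are nested, enlarging m never worsens the minimal residual; this is the monotonicity property that makes GMRES-type methods attractive.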
Recently, Liu (2014b) has developed a new theory to find the double optimal solution of Equation (1), based simultaneously on the two minimizations in Equations (3) and (5). Here, we only use a merit function similar to that in Equation (3) and employ a scaling-invariant property of the proposed merit function to derive a maximal projection (MP) solution in the Krylov subspace. More importantly, we can prove that the MP solution is better than the least squares (LS) solution, and we provide an exact equation estimating the difference between the two.
The remaining parts of this paper are arranged as follows. In Section 2 we start from an m-dimensional Krylov subspace to express the solution, whose coefficients are optimized in Section 3, where a new merit function is proposed for finding the optimal expansion coefficients and a closed-form MP solution of Equation (1) is derived. The comparisons between the MP and LS solutions are performed in Section 4, and an important improvement is verified. Examples of linear problems and discussions are given in Section 5 to display some advantages of the present methodology for finding an approximate solution of Equation (1).

A Krylov Subspace Method
For the linear system (1), by using the Cayley–Hamilton theorem we can expand A^{-1} by
A^{-1} = −(1/a_0)(A^{n−1} + a_{n−1}A^{n−2} + · · · + a_2 A + a_1 I_n), (6)
and hence the solution x is given by
x = −(1/a_0)(A^{n−1} + a_{n−1}A^{n−2} + · · · + a_2 A + a_1 I_n) b, (7)
where the coefficients a_0, a_1, . . ., a_{n−1} appear in the characteristic equation for A:
λ^n + a_{n−1}λ^{n−1} + · · · + a_1 λ + a_0 = 0. (8)
Here, we assume that a_0 = −det(A) ≠ 0. In practice, the above formula for the solution x is quite difficult to realize, since the coefficients a_j, j = 0, 1, . . ., n − 1 are hard to find, and the computation of the higher-order powers of A is very expensive when n is a large positive integer.
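The Cayley–Hamilton expansion of A^{-1} can be checked numerically. The sketch below is illustrative only; note that `np.poly` returns the characteristic polynomial coefficients in the convention λ^n + c[1]λ^{n−1} + · · · + c[n], which orders the coefficients in reverse relative to the a_j labelling above:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # generically non-singular

# characteristic polynomial: lambda^n + c[1] lambda^(n-1) + ... + c[n]
c = np.poly(A)
# Cayley-Hamilton: A^n + c[1] A^(n-1) + ... + c[n] I = 0, hence
# A^(-1) = -(1/c[n]) (A^(n-1) + c[1] A^(n-2) + ... + c[n-1] I)
powers = [np.linalg.matrix_power(A, k) for k in range(n)]
A_inv = -(1.0 / c[n]) * sum(c[n - 1 - k] * powers[k] for k in range(n))
```

This also makes the paper's point concrete: obtaining the c[j] requires the full spectrum, and the powers A^k are expensive for large n, which is why the truncated Krylov expansion below is preferred.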
The idea of a projection method, including GMRES, is to search for a solution x in a subspace of dimension m ≪ n, smaller than the original space R^n. In doing so, the higher-order expansion terms in Equation (6) are truncated, and we can find the lower-order expansion coefficients through a suitably designed optimization in a Krylov subspace. A basic ingredient of the Krylov subspace method is the construction of an orthonormal set of linearly independent bases. We describe how to set up the bases u_k, k = 1, . . ., m by the Krylov subspace method. Suppose that we have an m-dimensional Krylov subspace generated by the coefficient matrix A from the right-hand side vector b in Equation (1):
K_m := Span{b, Ab, . . ., A^{m−1} b}.
Then the Arnoldi process is used to normalize and orthogonalize the Krylov vectors, such that u_i · u_j = δ_{ij}, i, j = 1, . . ., m, where δ_{ij} is the Kronecker delta symbol. The resulting matrix
U := [u_1, . . ., u_m]
is an n × m Arnoldi matrix with its jth column being the vector u_j. Because u_1, . . ., u_m are linearly independent and m < n, U has full column rank, that is, Rank(U) = m. The expansion of x in the Krylov subspace is denoted by x ∈ K_m. Then we can prove the following result, where we minimize ||r|| as shown in Figure 1(a).
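The orthonormalization step described above can be sketched as follows. This is a minimal Gram–Schmidt version of the Arnoldi process, without the Hessenberg-matrix bookkeeping or the breakdown handling that a production implementation would carry:

```python
import numpy as np

def arnoldi(A, b, m):
    """Build an orthonormal basis u_1, ..., u_m of K_m = span{b, Ab, ..., A^(m-1) b}.
    Returns the n x m matrix U whose j-th column is u_j."""
    n = len(b)
    U = np.zeros((n, m))
    U[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ U[:, j - 1]
        for i in range(j):                      # orthogonalize against previous u_i
            w -= (U[:, i] @ w) * U[:, i]
        U[:, j] = w / np.linalg.norm(w)         # normalize (assumes no breakdown)
    return U

rng = np.random.default_rng(1)
n, m = 10, 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
U = arnoldi(A, b, m)
# the columns satisfy u_i . u_j = delta_ij, so U has full column rank m
```

By construction U^T U = I_m, which is exactly the property Rank(U) = m used in the theorems below.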
Theorem 1 For x ∈ K_m, and b ≠ 0 ∈ R^n a given vector, the best x and y = Ax which satisfy
min_{x ∈ K_m} ||b − Ax||², (9)
are given by
x = Xb, (10)
y = Eb, (11)
where X = UDJ^T, and E = AX.
Proof. Because of x ∈ K_m, x can be expanded as
x = Uα, (12)
where α := (α_1, . . ., α_m)^T ∈ R^m. Let J be an n × m matrix:
J := AU. (13)
By the assumption of the full ranks of A and U, J has full rank with Rank(J) = m. Then y = Ax can be written as
y = Jα. (14)
Expanding the squared residual we have
||b − Jα||² = ||b||² − 2b · (Jα) + α^T Cα, (15)
where
C := J^T J. (16)
A dot between two vectors signifies the inner product of these two vectors. Because J has full rank, C is an m × m positive definite matrix, whose inverse is denoted by
D := C^{-1}. (17)
Inserting Equations (16) and (17) into Equation (15), taking the differential with respect to α and setting it to zero, we can find
α = DJ^T b. (18)
Feeding it into Equation (12) we obtain Equation (10), while Equation (11) is obtained from Equation (10) by multiplying by A on both sides. □
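The closing step α = DJ^T b with C = J^T J and D = C^{-1} is the normal-equations solution of min_α ||b − Jα||. A small check, using a random full-column-rank matrix as a stand-in for J = AU:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 4
J = rng.standard_normal((n, m))      # stand-in for J = A U, assumed full column rank
b = rng.standard_normal(n)

C = J.T @ J                          # m x m, positive definite since Rank(J) = m
alpha = np.linalg.solve(C, J.T @ b)  # normal equations: alpha = D J^T b, D = C^(-1)
y = J @ alpha                        # minimizer of ||b - J alpha||

# cross-check against a generic least-squares solver
alpha_lstsq, *_ = np.linalg.lstsq(J, b, rcond=None)
```

In exact arithmetic the two answers coincide; in practice one avoids forming C explicitly when J is ill-conditioned, since cond(C) = cond(J)².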

Maximizing the Orthogonal Projection
In this section we will propose a new merit function to improve the solution in Theorem 1.

An Orthogonal Projection of b onto y
Let y := Ax, and we attempt to establish a merit function such that its minimization leads to the best fit of y to b, because Ax = b is exactly the equation we want to solve.
We consider finding the best approximation of y to b. The orthogonal projection of b onto y is regarded as the approximation of b by y, as shown in Figure 1(a), whose error vector is written as
e = b − ((b · y)/||y||²) y,
where the dot denotes the inner product. The best approximation can be found with y minimizing the squared norm of the error vector, or equivalently maximizing the squared norm of the orthogonal projection of b onto y, i.e.,
max_y (b · y)²/||y||². (23)
For this reason the solution of the above equation will be named the maximal projection (MP) solution, to distinguish it from the well-known least squares (LS) solution.
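The error vector and the projection norm used here can be verified directly. The sketch below checks that the error is orthogonal to y and that the Pythagorean identity splits ||b||² into the projection part, which is the quantity maximized in Equation (23), and the error part:

```python
import numpy as np

rng = np.random.default_rng(3)
b = rng.standard_normal(6)
y = rng.standard_normal(6)

proj = (b @ y) / (y @ y) * y      # orthogonal projection of b onto y
e = b - proj                      # error vector, orthogonal to y
proj_sq = (b @ y) ** 2 / (y @ y)  # squared norm of the projection (Equation (23))

# Pythagoras: ||b||^2 = ||proj||^2 + ||e||^2, with e . y = 0
```

Since ||b||² is fixed, maximizing proj_sq over y is the same as minimizing the squared error ||e||², which is the geometric content of the MP criterion.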

A Main Result
The maximum in Equation (23) is equivalent to the minimization of the following merit function:
min_y { f := ||b||² ||y||² / (b · y)² }, (25)
since ||b||² is fixed and f is the ratio of ||b||² to the projection norm in Equation (23). However, this is a quite difficult optimization problem, and how to solve it is given below.
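Two properties of this merit function are used below and are easy to check numerically, assuming the reconstructed form f = ||b||²||y||²/(b · y)²: f is invariant under rescaling of y (the scaling-invariance mentioned in the Introduction), and f ≥ 1 by the Cauchy–Schwarz inequality, with equality exactly when y is parallel to b:

```python
import numpy as np

def f(b, y):
    """Merit function ||b||^2 ||y||^2 / (b . y)^2 (assumed form); >= 1 by Cauchy-Schwarz."""
    return (b @ b) * (y @ y) / (b @ y) ** 2

rng = np.random.default_rng(6)
b = rng.standard_normal(5)
y = rng.standard_normal(5)

# scaling invariance: f(b, c*y) = f(b, y) for any c != 0
# lower bound: f(b, y) >= 1, attained iff y is parallel to b
```

The scaling invariance is what allows the extra scaling factor α_0 in the MP expansion below to be determined separately from the direction of y.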
Theorem 2 For x ∈ K_m, and b ≠ 0 ∈ R^n a given vector, the best x and y = Ax which satisfy the minimization in Equation (25) are given in closed form by Equations (26)-(28). Moreover, we have the implication and identity in Equations (29) and (30).

Proof. The proof of this theorem is quite lengthy and we divide it into four parts. (A) Because of x ∈ K_m we can expand x by
x = α_0 b + Uα, (31)
where α_0 is a scaling factor to be determined below, and α := (α_1, . . ., α_m)^T ∈ R^m is the collection of m expansion coefficients. Here, we intentionally divide the coefficient preceding u_1 = b/||b|| into two parts: α_0||b|| and α_1. Due to Equation (31), y = Ax reads as
y = y_0 + Jα, (32)
where J was defined by Equation (13) and
y_0 := α_0 Ab. (33)
With the help of Equation (32), the terms b · y and ||y||² in Equation (25) can be written as Equations (34) and (35). From the necessary condition for the minimization of f we have
∇_α f = 0, (36)
in which ∇_α denotes the gradient with respect to α. Thus, we can derive Equation (37), with the quantities y_1 and y_2 defined by Equations (38) and (39).
(B) With the help of Equation (18), Equations (35) and (39) can be written as Equations (40) and (41). From Equation (37) we can observe that y_2 is proportional to y_1, which is supposed to be
y_2 = 2λ y_1, (42)
where 2λ is a multiplier to be determined, abiding by the principle of simplicity. From the second equality, by cancelling the common term 2y_1 on both sides, we obtain Equation (43). Then, by Equations (38), (41) and (42) we arrive at Equation (44), with λ given by Equation (45). Inserting Equation (44) into Equations (34) and (40) we obtain Equations (46) and (47), where
E := JDJ^T. (48)

Remark 1 In Equation (10), Xb is known as a least squares solution of Equation (1) in the Krylov subspace x ∈ K_m, which minimizes the residual ||b − Ax||. Obviously, Equation (10) is a special case of Equation (26) with α_0 = 0. Theorem 2 indicates that the minimization in Equation (25) is a more fundamental concept than the usual minimization of the residual ||b − Ax||², as shown in Equation (29).

Comparing the Maximal Projection and Least Squares Solutions
In this section we compare the two optimal solutions of Equation (1) derived in Theorems 1 and 2, and prove some important results.
Lemma 1 For Rank(J) = m, the α in Theorem 1 that appears in y = Jα of Equation (14) is a least squares solution of the following least squares problem:
min_α ||b − Jα||². (62)

Proof. The α in Equation (14) satisfies Equation (62), being a least squares solution of the overdetermined linear system Jα = b. In the sense of Penrose, we have
α = J†b, (64)
where J† is the Penrose pseudo-inverse of J (Trefethen & Bau III, 1997), and Jα is the projection of b to the nearest point in the space Range(J), as shown in Figure 1(b). □

Lemma 2 For Rank(J) = m, the α in y − y_0 = Jα of Equation (32) is a least squares solution of the following least squares problem:
min_α ||b − y_0 − Jα||². (65)

Proof. By using Equations (44) and (57) we obtain Equation (66). The above α satisfies Equation (65), being a least squares solution of the overdetermined linear system Jα = b − y_0. Hence, we have
α = J†(b − y_0), (68)
where J† is the Penrose pseudo-inverse of J, and Jα is the projection of b − y_0 to the nearest point in the space Range(J), as shown in Figure 1(c). □

Remark 2 From Equations (32), (68) and (48) we have y − y_0 = Jα = E(b − y_0), where the orthogonal projector E plays the role of projecting b − y_0 onto the space Range(J) ⊂ Range(A). On the other hand, by setting α_0 = 0 in Equation (26) and using Equations (12) and (64), we can write
x = UJ†b, (69)
which is a least squares solution in the Krylov subspace for the minimization in Equation (9). Because of α_0 = 0, the point y_0 in Figure 1(c) moves to the zero point, as shown in Figure 1(b). In general, α_0 is not necessarily a small number, and thus the above solution will be less accurate than that obtained from Equation (26), which can be recast to
x = α_0 b + UJ†(b − y_0). (70)
Then we can claim that the maximal projection solution in Equation (70) is better than the least squares solution obtained from the minimum of the residual in Equation (9), which is recast in Equation (69).
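Lemma 1's characterization α = J†b can be checked with NumPy's pseudo-inverse: the residual b − Jα is orthogonal to Range(J), which is exactly the statement that Jα is the nearest point to b in that subspace:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 3
J = rng.standard_normal((n, m))   # full column rank (generically)
b = rng.standard_normal(n)

alpha = np.linalg.pinv(J) @ b     # Penrose pseudo-inverse solution of J alpha ~ b
y = J @ alpha                     # nearest point to b in Range(J)

# orthogonality of the residual to Range(J): J^T (b - y) = 0
```

The same computation with b replaced by b − y_0 gives the α of Lemma 2, since only the right-hand side of the overdetermined system changes.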
Below we prove two main results about the residuals of the optimal solutions derived from Theorems 1 and 2; before that, we need the following lemma.
Lemma 3 Both E and I_n − E are projection operators, which render x

Example 1
Finding an nth-order polynomial function p(x) = a_0 + a_1 x + · · · + a_n x^n to best match a continuous function f(x) in the interval x ∈ [0, 1]:
min ∫_0^1 [p(x) − f(x)]² dx,
leads to a problem governed by Equation (1), where A is the (n + 1) × (n + 1) Hilbert matrix defined by
A_{ij} = 1/(i + j − 1),
x is composed of the n + 1 coefficients a_0, a_1, . . ., a_n appearing in p(x), and is uniquely determined by the function f(x).
The Hilbert matrix is a notorious example of a highly ill-conditioned matrix. Equation (1) with a matrix A having a large condition number usually displays the behavior that an arbitrarily small perturbation of the data on the right-hand side may lead to an arbitrarily large perturbation of the solution on the left-hand side. The ill-posedness of Equation (1) with the above coefficient matrix A increases very fast with n. Todd (1954) has proven that the condition number of the Hilbert matrix grows exponentially, asymptotically of the order of (1 + √2)^{4n}/√n. We consider an exact solution with x_j = 1, j = 1, . . ., n, and b_i given by
b_i = Σ_{j=1}^n 1/(i + j − 1).
Then we solve this problem by using the LS and MP solutions under n = 300. It can be seen from Figure 2 that the MP solutions are slightly better than the LS solutions. When we take m = 12, the solution is the best one, with the maximum error being 8.96 × 10^{-4} and the residual being 4.18 × 10^{-9}.

In the MFS the solution of u at the field point z = (x, t) can be expressed as a linear combination of the fundamental solutions U(z, s_j):
u(z) = Σ_{j=1}^n c_j U(z, s_j),
where n is the number of source points, c_j are unknown coefficients, and s_j are source points located in the complement Ω^c of Ω = [0, ℓ] × [0, T]. For the heat conduction equation the basis functions are constructed from the fundamental solution U(z, s_j), and n = 2m_1 + m_2.
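The explosive growth of the Hilbert matrix's condition number is easy to reproduce. A small sketch follows; the paper's experiment uses n = 300, far beyond what `np.linalg.cond` can report reliably in double precision, so modest sizes are used here:

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix A[i, j] = 1 / (i + j - 1) with 1-based indices i, j."""
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1.0)

conds = [np.linalg.cond(hilbert(n)) for n in (4, 6, 8, 10)]
# the condition number grows roughly geometrically with n

# right-hand side for the exact solution x_j = 1: b = A @ ones
A = hilbert(10)
b = A @ np.ones(10)
```

Already at n = 10 the condition number exceeds 10^{12}, so a direct solve loses most of the significant digits; this is what motivates testing the MP and LS solutions on this matrix.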
Since the BHCP is highly ill-posed, the coefficient matrix A in Equation (85) is seriously ill-conditioned. To overcome the ill-posedness of Equation (85) we can use the MP to solve this problem. Here we compare the optimal solution with an exact solution: u(x, t) = cos(πx) exp(−π²t).
For the case with T = 1, the value of the final time data is of the order of 10^{-4}, which is small compared with the value of the initial temperature f(x) = u_0(x) = cos(πx) to be retrieved, which is O(1). We impose a relative random noise with an intensity σ = 10% on the final time data, which is used to test the stability of the MP solution.

Discussions
Because least squares methods are popularly used in the mathematical, physical, and engineering sciences (Blais, 2010), the results presented in this paper are quite significant and promising: a more fundamental and better solution than the least squares solution exists. Theorem 5 guarantees that the MP solution is better than the LS solution, as shown in Example 1 for a direct problem. For the inverse problem of Example 2 the superiority of the MP solution is fully exposed: the LS solution fails thoroughly, but the MP solution is still workable, giving solutions with higher accuracy and higher robustness against a large noise of up to 10%. More studies are required to test the performance of the newly developed maximal projection solution in the Krylov subspace for other systems. Because the theorems were proven without restriction on m, the maximal projection solution is always better than the LS solution, independently of the Krylov subspace and its dimension m. The new methodology presented here may shed new light on numerical methods based on the least squares method.

Figure 1. (a) The approximation of b by y; (b) the relation of b, y and Jα used in Lemma 1; (c) the relation of b − y_0, y − y_0 and Jα used in Lemma 2

In Figure 2(a) we show the values of α_0 of the MP solution with respect to m in the range 6 ≤ m ≤ 13. The values of α_0 are in the range [0.3, 2.5]. The residuals and the maximum errors of x_i with respect to m are compared in Figures 2(b) and 2(c).

Figure 3. For the backward heat conduction problem under a noise: (a) α_0 of the MP solution; comparison of (b) residual errors and (c) maximum errors obtained by the MP and LS solutions

After imposing the boundary conditions and the final time condition on Equation (84) we obtain a linear equations system:
Ax = b. (85)
Under the parameters m_1 = 11 and m_2 = 6, and hence n = 28, we first plot the values of α_0 of the MP solution with respect to m in the range 5 ≤ m ≤ 16, as shown in Figure 3(a). The values of α_0 are in the range [−6.5, 9.4]. The residual error ||b − Ax|| with respect to m is plotted in Figure 3(b), while the maximum error of u(x, 0) is plotted in Figure 3(c), of which m = 16 is the best. With m = 16 we can obtain a very accurate solution, with the maximum error being 9.37 × 10^{-3}. The solutions obtained by the LS method are very bad, which shows that the LS solution is not applicable to this inverse problem. Because α_0 is quite large, neglecting it in Equation (10) causes a large error, as shown in Figures 3(b) and 3(c) by dashed lines.