Accelerated Bidirectional Modifications of the Steepest Descent Method for Ill-posed Linear Algebraic Systems with Dynamics-theoretical and Optimization Interpretation

It is well known that the classical numerical algorithm of the steepest descent method (SDM) is effective for well-posed linear systems, but performs poorly for ill-posed ones. In this paper we propose accelerated and/or bidirectional modifications of the SDM, namely the accelerated steepest descent method (ASDM), the bidirectional method (2DM), and the accelerated bidirectional method (A2DM). The starting point is a manifold defined in terms of a minimum functional and a fictitious time variable; nevertheless, in the end the fictitious time variable disappears, so that we arrive at purely iterative algorithms. The proposed algorithms are justified by a dynamics-theoretical and optimization interpretation. The accelerator plays the prominent role of switching from slow convergence to a new regime in which the functional decreases stepwise in an intermittent and ceaseless manner. Three examples of solving ill-posed systems are examined and comparisons are made with exact solutions and with the existing algorithms of the SDM, the Barzilai-Borwein method, and the random SDM, revealing that the new algorithms ASDM and A2DM have better computational efficiency and accuracy even for highly ill-posed systems.


Ill-posed Problems and Remedy
In this paper we propose robust and easily implemented new methods to solve the system of linear algebraic equations

Ax = b, (1)

where the coefficient matrix A ∈ R^{n×n} is a given positive definite matrix, which may be ill-conditioned; the right-hand side vector b ∈ R^n is the input data, which may be corrupted by noise; and x ∈ R^n is the unknown vector to be sought.
We thus encounter the problem that the numerical solution of Eq. (1) may deviate from the exact one to a great extent when A is highly ill-conditioned and/or b is disturbed by noise. On some occasions (for example, design code regulations require engineers to consider hundreds of load combinations b), the unknown vectors x may preferably be solved for by first inverting A, but this again causes great difficulty when A has a large condition number.
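This sensitivity is easy to demonstrate numerically. The sketch below is our illustration (the matrix and perturbation are hypothetical, not from the paper): a nearly singular 2×2 system in which a perturbation of b of size 10^{-4} shifts the solution by an amount of order one.

```python
import numpy as np

# A nearly singular matrix: the two rows are almost linearly dependent.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])           # chosen so that the exact solution is (1, 1)

x_clean = np.linalg.solve(A, b)
b_noisy = b + np.array([0.0, 1e-4])   # a perturbation of size 1e-4 in b
x_noisy = np.linalg.solve(A, b_noisy)

print(np.linalg.cond(A))                  # roughly 4e4
print(np.linalg.norm(x_noisy - x_clean))  # order-one change in the solution
```

The change in the solution exceeds the perturbation of b by roughly the condition number, which is exactly the amplification mechanism discussed above.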
Many numerical methods used in computational mechanics are confronted with ill-posed linear algebraic equations, as demonstrated in the literature (Liu, 2007a; Liu, 2007b; Liu, 2007c; Lorentz & Benallal, 2005). Indeed, the solution of ill-posed linear algebraic equations is an important issue for many engineering and scientific problems. A good numerical method to solve Eq. (1) is not only important in its own right but also beneficial in applications such as (i) optimization problems, including linear and nonlinear programming, (ii) Newton's, quasi-Newton's, and homotopy methods for systems of nonlinear equations, and (iii) the finite difference, boundary element, and finite element methods for partial differential equations.
To account for the sensitivity to noise, a common practice is to invoke a regularization method to tackle this sort of ill-posed problem (Kunisch & Zou, 1998; Resmerita, 2005; Wang & Xiao, 2001; Xie & Zou, 2002), where a suitable regularization parameter is utilized to depress the bias in the computed solution via a better balance of the approximation error and the propagated data error. Several such techniques were developed after the pioneering work of Tikhonov & Arsenin (1977). For a large scale system the main choice is to use an iterative regularization algorithm, where the regularization parameter is represented by the number of iterations. The iterative algorithms work if an early stopping criterion is erected to prevent the reconstruction of noisy components in the approximated solutions. To solve ill-posed linear problems, several methods have been developed, such as the fictitious time integration method used as an ill-conditioning filter (Liu & Atluri, 2009a), a modified polynomial expansion method (Liu & Atluri, 2009b), the nonstandard group preserving scheme (Liu & Chang, 2009), and a natural regularization (Liu, Hong, & Atluri, 2010).
A quantitative measure of the ill-posedness of Eq. (1) is the condition number of the square matrix A (Stewart, 1973):

cond(A) = ∥A∥ ∥A^{-1}∥, (2)

where ∥A∥ is the Frobenius norm of A. For an arbitrary ϵ > 0, there exists a matrix norm ∥A∥ such that ρ(A) ≤ ∥A∥ ≤ ρ(A) + ϵ, where ρ(A) is the radius of the spectrum of A. Therefore, the condition number of A can be estimated by

cond(A) = max_{λ∈σ(A)} |λ| / min_{λ∈σ(A)} |λ|, (3)

where σ(A) is the collection of the eigenvalues of A.
The linear algebraic system (1) with A having a large condition number usually suffers from the numerical instability problem that an arbitrarily small perturbation of the right-hand side b may lead to a large perturbation of the solution x on the left-hand side. Roughly speaking, the numerical solution of Eq. (1) may lose the accuracy of k decimal points in x when cond(A) = 10^k. Problems with ill-conditioned A appear in several fields. A famous example of a highly ill-conditioned matrix is the Hilbert matrix. It arises naturally in finding an n-degree polynomial p(x) = a_0 + a_1 x + ... + a_n x^n that best approximates a given function f(x) on the interval [0, 1] in the least-squares sense:

min_{a_0,...,a_n} ∫_0^1 [p(x) − f(x)]^2 dx. (4)

The minimization problem (4) leads to Eq. (1), where A is the (n + 1) × (n + 1) Hilbert matrix, whose elements are of the form

A_{ij} = 1/(i + j − 1); (5)

x is composed of the n + 1 coefficients a_0, a_1, ..., a_n of p(x), and is uniquely determined by the function f(x).
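The rapid growth of the Hilbert matrix's condition number can be checked directly. The helper below is our illustration (the function name is ours), using the 1-based entry formula A_ij = 1/(i + j − 1):

```python
import numpy as np

def hilbert(n):
    # Entries with 1-based indices: A_ij = 1/(i + j - 1).
    i, j = np.indices((n, n))         # 0-based index grids
    return 1.0 / (i + j + 1.0)        # (i+1) + (j+1) - 1 = i + j + 1

for n in (4, 8, 12):
    print(n, np.linalg.cond(hilbert(n)))   # grows roughly like e^{3.5 n}
```

Already at n = 12 the condition number exceeds 10^{15}, which by the rule of thumb above wipes out essentially all decimal digits in double precision.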

The Steepest Descent Method
Solving Eq. (1) by the steepest descent method (Jacoby, 1972; Ostrowski, 1973) is equivalent to solving the minimization problem

min_{x∈R^n} φ(x), (6)

where

φ(x) = (1/2) x^T A x − b^T x, (7)

in which the superscript T signifies the transpose. By using the Ritz variational principle one can derive the following SDM algorithm.
(i) Give an initial x_0.
(ii) For k = 0, 1, 2, ..., repeat the following computations:

r_k = b − A x_k,  x_{k+1} = x_k + (r_k^T r_k / r_k^T A r_k) r_k. (8)

If ∥r_k∥ < ε for a given stopping criterion ε, then stop; otherwise, go to step (ii).
For the SDM the residual vector r_k is the steepest descent direction of the function φ at the point x_k. But when ∥r_k∥ is rather small, the computed r_k may deviate from the actual steepest descent direction to a great extent due to the round-off error of the computing machine, which usually leads to numerical instability of the SDM.
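For reference, the algorithm above can be sketched in a few lines of Python (our sketch; the function name and the test matrix are illustrative, not from the paper):

```python
import numpy as np

def sdm(A, b, x0, eps=1e-10, max_iter=10_000):
    x = x0.astype(float).copy()
    for k in range(max_iter):
        r = b - A @ x                         # residual = steepest descent direction
        if np.linalg.norm(r) < eps:
            break
        x = x + (r @ r) / (r @ (A @ r)) * r   # exact line-search steplength
    return x, k

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                    # a well-conditioned SPD test matrix
b = np.array([1.0, 2.0])
x, k = sdm(A, b, np.zeros(2))
print(x, k)
```

On this well-conditioned matrix the iteration converges in a few dozen steps; it is on ill-conditioned matrices that the slow, zig-zagging behavior discussed below sets in.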

Steplengths in the SDM, BBM, and RSDM
Modifications of the SDM have recurred. These modifications, although somewhat ad hoc in some cases, have stimulated a new interest in the SDM because it is recognized that the gradient vector itself is not a bad choice of direction; rather, the steplength originally dictated by the classical SDM is to blame for the slow convergence behavior. Barzilai & Borwein (1988) were the first to present a new choice of steplength through a two-point stepsize. For the quadratic functional (7) the algorithm of the Barzilai-Borwein method (BBM) is

x_{k+1} = x_k + (r_{k−1}^T r_{k−1} / r_{k−1}^T A r_{k−1}) r_k, (9)

where r_k = b − A x_k, with initial guesses r_0 = 0 and x_0 = 0. Although failing to guarantee the descent of the minimum functional values, they were able to produce a substantial improvement of the convergence speed in certain tests. Their results have initiated many researches on the SDM, for example, Raydan (1993, 1997), Friedlander et al. (1999), Raydan & Svaiter (2002), Dai & Liao (2002), Dai et al. (2002), Dai & Yuan (2003), Fletcher (2005), Yuan (2006), etc. Among these efforts, a very interesting modification of the steplength was the random SDM (abbreviated to RSDM) proposed by Raydan & Svaiter (2002), namely

x_{k+1} = x_k + θ_k (r_k^T r_k / r_k^T A r_k) r_k, (10)

where θ_k are random numbers in [0, 2].
These algorithms (8), (9), and (10) may be summarized in the unified form

x_{k+1} = x_k + η_k r_k,

and it can be seen that they differ mainly in the steplengths η_k.
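The three update rules can be put side by side in code. The sketch below is our illustration: for a quadratic functional the BB steplength reduces to the previous Cauchy step (one standard reading), and the RSDM draws θ_k uniformly from [0, 2]; the function names are ours.

```python
import numpy as np

def cauchy_step(A, r, r_prev, rng):
    # SDM: exact line-search (Cauchy) steplength.
    return (r @ r) / (r @ (A @ r))

def bb_step(A, r, r_prev, rng):
    # BBM on a quadratic: reuse the previous Cauchy steplength.
    return (r_prev @ r_prev) / (r_prev @ (A @ r_prev))

def rsdm_step(A, r, r_prev, rng):
    # RSDM: randomly relaxed Cauchy steplength, theta_k in [0, 2].
    return rng.uniform(0.0, 2.0) * cauchy_step(A, r, r_prev, rng)

def iterate(A, b, x0, step, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    r_prev = b - A @ x
    for _ in range(n_iter):
        r = b - A @ x
        if np.linalg.norm(r) < 1e-13:
            break
        x, r_prev = x + step(A, r, r_prev, rng) * r, r
    return x
```

All three iterations share the residual direction r_k; swapping the `step` function is the only difference, which mirrors the unified form above.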

A New Stage-Invariant Manifold
In this paper we approach this steplength problem from a quite different point of view of dynamics on an invariant manifold in the product space of the state space and a fictitious time axis, and propose new strategies.
From Eqs. (1) and (7) it is easy to prove that the minimum is

min φ(x) = φ(x*) = −(1/2) (x*)^T A x*,

because A is positive definite, where x* is the solution of Eq. (1). We take

ϕ(x) = φ(x) + c_0 = (1/2) x^T A x − b^T x + c_0, (11)

where c_0 is a constant such that ϕ(x) > 0. Therefore, solving Eq. (1) by the steepest descent method amounts to solving the minimization problem

min_{x∈R^n} ϕ(x), (12)

where ϕ(x) is given by Eq. (11). It is obvious that the minima of ϕ(x) and φ(x) occur at the same place x = x*.
There are several regularization methods available for treating Eq. (1) when A is ill-conditioned. In this paper we consider an iterative regularization method for Eq. (1) by investigating the evolutional behavior of x governed by a dynamical system defined on a manifold formed from ϕ(x):

h(x, t) := Q(t) ϕ(x) = C(t), (13)

with the requirements that Q(t) be positive (Q(t) > 0) and strictly increasing (Q̇(t) > 0), and that C(t) > 0. The requirement that Q(t) be a strictly increasing function of a fictitious time variable t enables us to have an absolutely convergent property in solving Eq. (1). We will utilize such a strictly increasing positive Q(t) to play the role of an accelerator in the dynamics of the iterative methods. We expect h(x, t) = C(t) to be an invariant manifold in the space of (x, t) for a dynamical system to be specified further. Note that by the above implicit function we have in fact required x to be a function x(t) of t; that is, we are considering the trajectory

ϕ(x(t)) = C(t)/Q(t) (14)

in the invariant manifold (13). Therefore, when taking the differentiation of Eq. (14) with respect to t, we have to fulfill the consistency condition

Q̇(t) ϕ(x) − Q(t) r^T ẋ = Ċ(t), (15)

where

r = b − A x (16)

is the residual vector, so that ∇ϕ(x) = −r, and an overdot indicates differentiation with respect to t.
Let there be a gradient flow in the state space of x:

ẋ = α r, (17)

where α is to be determined. Inserting Eq. (17) into Eq. (15) and using Eq. (16), we get

α = (Q̇ ϕ − Ċ) / (Q r^T r). (18)

With this we obtain an evolution equation for x(t):

ẋ = [(Q̇ ϕ − Ċ) / (Q r^T r)] r. (19)

This is the governing equation for the dynamical system defined on the invariant manifold (13). Since, in order to be qualified as an accelerator, Q(t) has been chosen to be a strictly increasing positive function of t, we have an absolutely convergent property in solving Eq. (1) by searching the minimum of ϕ through the following equation:

ϕ̇ = (Ċ − Q̇ ϕ) / Q. (20)

When t is large, the above equation, with the aid of the accelerator Q(t), helps the functional ϕ to attain its minimum faster, and meanwhile the solution of Eq. (1) is obtained more quickly.

Dynamics of Iterative Algorithms
In this section we derive a discrete analogue of the foregoing continuous time dynamics and develop accelerated algorithms based on the discrete time dynamics. Since the fictitious time variable is now discrete, we arrive at purely iterative algorithms with discrete k ∈ {0, 1, 2, ...} superseding continuous t.
Here and in the following, we use the symbols x_k = x(k∆t), ẋ_k = ẋ(k∆t), Q_k = Q(k∆t), and C_k = C(k∆t), where ∆t is the discrete time increment. Hence the trajectory (14) on the invariant manifold (13) becomes

ϕ(x_k) = C_k / Q_k. (21)

Accelerated SDM
We use the Euler scheme to discretize the evolution equation (17). Upon denoting the steplength by η_k = ∆t α_k and enforcing the consistency on the manifold, the optimal steplength turns out to be

η_k = r_k^T r_k / r_k^T A r_k, (23)

and hence

x_{k+1} = x_k + (r_k^T r_k / r_k^T A r_k) r_k. (25)

It is recognized that this is indeed the algorithm (8) of the steepest descent method (SDM). The trajectory of x_{k+1} generated by Eq. (25) is kept on the invariant manifold if and only if C_{k+1} satisfies Eq. (27). Despite the optimal property (23) and the globally convergent property, the SDM algorithm (25) converges linearly and performs poorly in ill-posed cases (Akaike, 1959). To accelerate the convergent iterations we stick to such a C_{k+1} as in Eq. (27), but replace Q_{k+1} = 1, implied in Eq. (26), by the accelerator with Q_1 > 1 and Q_{k+1} > Q_k. According to Eq. (27), we choose r_k^T r_k and r_k^T A r_k to be the factors of the accelerator. After derivation, we find that the minimization problem with the best accelerator, given by Eq. (29) and containing a parameter γ, together with Eq. (22) leads to the accelerated algorithm devised below. The range of γ contained in Eq. (29) is 0 ≤ γ < 1, which is investigated in the next subsection.
Then, we devise the following algorithm:
(i) Give an initial x_0.
(ii) For k = 0, 1, 2, ..., repeat the following computations:

r_k = b − A x_k,  x_{k+1} = x_k + (1 − γ) (r_k^T r_k / r_k^T A r_k) r_k.

If ∥r_k∥ < ε for a given stopping criterion ε, then stop; otherwise, go to step (ii).
For convenience the algorithm is referred to as the accelerated steepest descent method (ASDM). If instead γ = 0, the above algorithm reduces to the classical steepest descent method (SDM). The SDM and ASDM are unidirectional methods in the sense that at each step the optimizations of their algorithms are performed within a one-dimensional space in which the optimal steplengths are found.
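Under the reading above, in which γ = 0 recovers the SDM, the ASDM step is the Cauchy step scaled by 1 − γ. The sketch below is our illustration of that reading only; it does not reproduce the paper's accelerator bookkeeping in Q_k and C_k.

```python
import numpy as np

def asdm(A, b, x0, gamma=0.1, eps=1e-10, max_iter=50_000):
    """Relaxed steepest descent: gamma = 0 recovers the classical SDM,
    while 0 < gamma < 1 scales the Cauchy step by (1 - gamma)."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < eps:
            break
        x = x + (1.0 - gamma) * (r @ r) / (r @ (A @ r)) * r
    return x, k

A = np.diag([1.0, 100.0])            # cond(A) = 100: mildly ill-conditioned
b = np.array([1.0, 100.0])           # exact solution (1, 1)
x, k = asdm(A, b, np.zeros(2), gamma=0.1)
print(k)
```

The under-relaxed step breaks the strict alternation between two search directions that makes the classical SDM zig-zag on ill-conditioned problems.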

How Does the Accelerator Work?
In the last subsection we proposed an accelerated modification of the SDM. For this modification, by specifying c_0 = (1/2) (x*)^T A x*, where A x* = b, it is easy to obtain

ϕ(x) = (1/2) (x − x*)^T A (x − x*) > 0 for x ≠ x*.

Upon using the Kantorovich inequality, we get the upper bound of ϕ_{k+1}/ϕ_k as follows:

ϕ_{k+1}/ϕ_k ≤ 1 − (1 − γ^2) 4 λ_min λ_max / (λ_min + λ_max)^2, (35)

where λ_min and λ_max are the smallest and largest eigenvalues of A. From inequality (35) it follows that γ ∈ (−1, 1) is sufficient for a strictly decreasing sequence {ϕ_k}. However, γ ∈ (−1, 0) leads to Q_k < 1 based on Eq. (29); therefore, we have to choose γ ∈ (0, 1) in order to have an accelerator with Q_1 > 1 and Q_{k+1} > Q_k. The θ_k in the algorithm (10) of the random SDM indeed plays the role of 1 − γ. According to the above analysis, based on the viewpoint of dynamics, we suggest that the random numbers be chosen from θ_k ∈ (0, 1) (abbreviated to RSDM1) rather than the original ad hoc range θ_k ∈ [0, 2] (abbreviated to RSDM).
Another index is the squared residual norm r_{k+1}^T r_{k+1}. Using again the Kantorovich inequality, we derive the upper bound of r_{k+1}^T r_{k+1} / r_k^T r_k as in inequality (36). According to the estimate (36), a strictly decreasing r_{k+1}^T r_{k+1} may exist merely in a restricted range of γ. Observe that the range of strictly decreasing r_{k+1}^T r_{k+1} for γ ≠ 0 appears to be narrower than that for γ = 0. This provides one of the reasons why the irregularly jumping behavior of the residual errors for the accelerated algorithm ASDM (0 < γ < 1) is more violent than that of the classical algorithm SDM (γ = 0). The corresponding behavior of the SDM (γ = 0) was explored in (Nocedal et al., 2002).
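Under the same reading (step equal to (1 − γ) times the Cauchy step, with ϕ(x) = (1/2)(x − x*)^T A (x − x*)), the per-step Kantorovich bound ϕ_{k+1}/ϕ_k ≤ 1 − (1 − γ^2) 4λ_min λ_max/(λ_min + λ_max)^2 can be checked numerically. The experiment below is our illustration; the matrix, spectrum, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 6, 0.3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([1.0, 2.0, 5.0, 20.0, 80.0, 200.0])
A = Q @ np.diag(lam) @ Q.T               # SPD matrix with a known spectrum
x_star = rng.standard_normal(n)
b = A @ x_star

phi = lambda x: 0.5 * (x - x_star) @ A @ (x - x_star)
bound = 1.0 - (1.0 - gamma**2) * 4.0 * lam[0] * lam[-1] / (lam[0] + lam[-1])**2

x, ratios = np.zeros(n), []
for _ in range(60):
    r = b - A @ x
    x_new = x + (1.0 - gamma) * (r @ r) / (r @ (A @ r)) * r
    ratios.append(phi(x_new) / phi(x))
    x = x_new

print(max(ratios), bound)                # every observed ratio stays below the bound
```

The observed ratios fluctuate from step to step but never exceed the bound, and they stay strictly positive, which is the strict monotone decrease of {ϕ_k} claimed above.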

Bidirectional Methods
In deriving the unidirectional methods we used the Euler scheme to discretize the dynamical system (17). Now we discretize the flow ẋ as the weighted difference of states:

ẋ(k∆t) = [x((k + 1)∆t) − β_k x(k∆t)] / ∆t,

so that the dynamical system (17) becomes a two-parameter update in which x_{k+1} is a combination of x_k and r_k, as given by Eqs. (37)-(39). The trajectory (26) of x_{k+1} generated by Eq. (37) with Eq. (39) will be kept on the invariant manifold if and only if Eq. (40) holds. Note that Q_{k+1} = 1 is implied in Eq. (26). In order to speed up the rate of convergence, we stick to Eq. (40) but replace Q_{k+1} = 1 with the accelerator Q_1 > 1 and Q_{k+1} > Q_k. According to Eq. (40), we select a_{k1}, a_{k2}, a_{k3}, d_{k1}, and d_{k2} to be the factors of the accelerator Q_{k+1}. Based on our investigation, we find that, with the best accelerator containing a parameter γ ∈ [0, 1), the extremal of the optimization problem occurs at Eq. (43) instead of Eq. (38). Following the discrete time dynamics of x_{k+1} of Eq. (37) with Eq. (43), x_{k+1} evolves, generating the trajectory (21) on the invariant manifold if and only if C_{k+1} and Q_{k+1} satisfy Eqs. (40) and (41), respectively.
According to the above derivation, an iterative algorithm is proposed as below.
(i) Give an initial x_0.
(ii) For k = 0, 1, 2, ..., we repeat the computations of Eqs. (37) and (43). If x_k converges according to a given stopping criterion ∥r_k∥ < ε, then stop; otherwise, go to step (ii).
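The paper's exact 2DM/A2DM coefficients (Eqs. (37)-(43)) are not reproduced here, but the underlying idea of an optimal plane search can be sketched as follows (our illustration): choose x_{k+1} = β x_k + η r_k by minimizing φ over the plane spanned by x_k and r_k, which reduces to a 2 × 2 linear solve. This also makes the privileged ray observed later in Examples 2 and 3 transparent: if x* lies on the ray through x_k, the plane search lands on it in one step.

```python
import numpy as np

def plane_search_step(A, b, x):
    # Minimize phi(beta*x + eta*r) over (beta, eta), with r = b - A x:
    # the normal equations form a 2x2 system in the A-inner product.
    r = b - A @ x
    M = np.array([[x @ (A @ x), x @ (A @ r)],
                  [x @ (A @ r), r @ (A @ r)]])
    rhs = np.array([b @ x, b @ r])
    if abs(np.linalg.det(M)) < 1e-30:    # degenerate plane: fall back to SDM
        return x + (r @ r) / (r @ (A @ r)) * r
    beta, eta = np.linalg.solve(M, rhs)
    return beta * x + eta * r

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
x_star = np.ones(2)
b = A @ x_star
x1 = plane_search_step(A, b, 0.5 * np.ones(2))  # x0 lies on the ray through x*
print(x1)                                        # recovers x* in one step
```

For a generic starting point the plane search can never do worse than keeping x_k (take β = 1, η = 0), so φ decreases monotonically, in contrast to the one-dimensional line search of the SDM.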

Example 1
In this example we consider the highly ill-posed two-dimensional linear system of Eq. (44). The condition number is cond(A) = 1.59 × 10^13, where A = B^T B is positive definite and B denotes the coefficient matrix, which is not symmetric, of Eq. (44). The exact solution is (x*, y*) = (1, 1). The seven algorithms SDM, BBM, RSDM, RSDM1, ASDM, 2DM, and A2DM are applied to find the solutions of Eq. (44), and their results are displayed correspondingly in Figs. 1 to 7.
To promote a better understanding of these methods and, in particular, the influence of initial guesses, we run the algorithms starting from 360 initial points (x_0, y_0) distributed on the circle (x − 1)^2 + (y − 1)^2 = 1. Figure 1(a) shows the 360 × 20 points (x_k, y_k) obtained during the 19 iterations of the SDM for the 360 initial points on that circle; for illustration purposes, 2 initial points (x_0, y_0) and their first iteration points (x_1, y_1) are each marked by a cross and connected by thin lines to show their orbits. Although the residual errors shown in some of the (b) parts are greater than those of the other methods, this does not do harm but instead demonstrates the low reliability of the residual errors under a large condition number, as noted in (Nocedal et al., 2002). Summarizing, for the ill-posed linear system (44), the A2DM and 2DM display superior performance, with their iterations converging rapidly to the solution point (x*, y*) = (1, 1) in fewer than 19 steps.

Example 2
In this example we consider the highly ill-posed linear equation (1) with A being the Hilbert matrix with elements given in Eq. (5).
The ill-posedness of Eq. ( 1) increases fast with n.
In order to compare the numerical solutions with exact solutions, we suppose that x*_1 = x*_2 = ... = x*_n = 1, and then by Eq. (5) we have

b_i = Σ_{j=1}^{n} 1/(i + j − 1), i = 1, ..., n. (45)

We solve this problem for the case n = 50. The resulting linear system of equations is highly ill-posed, the condition number being as large as 4.79 × 10^18. The seven methods are applied to find the solutions of Eq. (1), and their results are displayed in Figs. 8 to 10. We used γ = 0.15 for the A2DM and γ = 0.05 for the ASDM.
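The test problem can be set up as follows (our sketch; variable names are ours). With x*_i = 1, the right-hand side entries are the Hilbert row sums b_i = Σ_j 1/(i + j − 1):

```python
import numpy as np

n = 50
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1.0)            # Hilbert matrix: A_ij = 1/(i + j - 1), 1-based
b = A.sum(axis=1)                  # right-hand side for the exact solution x* = 1
x_star = np.ones(n)

print(np.linalg.norm(A @ x_star - b))   # consistency check: essentially zero
print(np.linalg.cond(A))                # numerically enormous
```

Note that the numerically computed condition number of the 50 × 50 Hilbert matrix is itself unreliable, since the smallest singular values fall below the round-off level; it only confirms that the system is far beyond double-precision resolution.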
The A2DM and 2DM have a privileged set of initial guesses x_i = c, ∀c ∈ R, which happen to lie on a ray from the origin x_i = 0 through the solution point x*_i = 1. If we start from these initial guesses, the A2DM and 2DM converge in only one step. Therefore, we chose another initial guess. Under the stopping criterion ε = 10^−8, the RSDM1 and the A2DM converge in 99985 iterations and 81660 iterations, respectively, but the others do not converge within 10^5 iterations. Figure 8 shows the residual errors r_k^T r_k versus the step number k, the result of A2DM being compared with those of the other six methods in six plots. It is seen that A2DM has a faster convergence rate than the others. In this 50-dimensional case, 2DM does not work as well as it did in the 2-dimensional case (Example 1). Observe that the residual errors of SDM and 2DM change smoothly and converge slowly, whereas those of ASDM, BBM, RSDM, RSDM1, and A2DM, on the contrary, change violently.
Figure 9 shows the solution errors ∥x_k − x*∥ versus the step number k. The solution errors of all methods remain nearly stationary during a relatively long period from step k = 10^1 to step k = 10^4. In this duration the performance of A2DM is better only than that of 2DM. But beyond 10^4 iterations, the solution errors of A2DM vary to a larger extent and succeed in attaining much lower values. Figure 10 displays the 50 components |(x_{N_f} − x*)_i|/(x*)_i of the numerical errors at the final converged step N_f; the distance between the final point and the solution point x*_i = 1 is the shortest for A2DM. Both figures confirm that A2DM renders the most accurate results in the end.

Example 3
As an application of the new algorithms we consider a polynomial interpolation problem. In other words, given some data points, such as those obtained by sampling a measurement, the aim is to find a polynomial that goes exactly through these points.
Given a set of m data points (u_i, v_i), where no two u_i are the same, one looks for a polynomial p(u) of degree at most m − 1 with the property

p(u_i) = v_i, i = 1, ..., m, (46)

where u_i ∈ [a, b], with [a, b] being the spatial interval of the problem domain.
The unisolvence theorem states that such a polynomial p(u) exists and is unique, which can be proved by using the Vandermonde matrix. Suppose that the interpolation polynomial is in the form

p(u) = Σ_{j=1}^{m} x_j u^{j−1}, (47)

where 1, u, ..., u^{m−1} constitute a monomial basis. The statement that p(u) interpolates the data points means that Eq. (46) must hold.
If we substitute Eq. (47) into Eq. (46), we obtain a system of linear equations in the unknown coefficients x_j:

Σ_{j=1}^{m} u_i^{j−1} x_j = v_i, i = 1, ..., m. (48)

We have to solve the above system for the x_j to construct the interpolant p(u).
In order to compare the numerical solutions with exact solutions, we suppose that x*_1 = x*_2 = ... = x*_m = 1. Then we obtain v_1, ..., v_m according to Eq. (48). In this case we take m = 100 and u_i = −1 + 2i/100 as the nodal points. The condition number of this system is cond(A) = 1.36 × 10^19, where A = B^T B and B denotes the coefficient matrix.
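The interpolation system can be assembled as follows (our sketch; `np.vander` with `increasing=True` gives B[i, j] = u_i^j). Since u_m = 1, the last entry satisfies v_m = m = 100 exactly, a handy consistency check:

```python
import numpy as np

m = 100
u = -1.0 + 2.0 * np.arange(1, m + 1) / 100      # nodes u_i = -1 + 2i/100
B = np.vander(u, m, increasing=True)            # B[i, j] = u_i ** j, j = 0..m-1
x_star = np.ones(m)
v = B @ x_star                                  # data generated by x*_j = 1

A = B.T @ B                                     # normal-equations matrix of the text
print(v[-1])                                    # u_100 = 1, so v_100 = 100 exactly
```

Monomial bases on nearly a hundred nodes make the Vandermonde matrix, and even more so its normal-equations form B^T B, extremely ill-conditioned, which is precisely why this problem is a stringent test for the iterative methods.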
The seven methods are applied to find the solutions of Eq. (48), and their results are displayed in Figs. 11 to 13. As in the last example, the A2DM and 2DM have a privileged set of initial guesses x_i = c, ∀c ∈ R, which happen to lie on a ray from the origin x_i = 0 through the solution point x*_i = 1. If starting from these initial guesses, the A2DM and 2DM would converge in only one step. Therefore, we start instead from a fairly ordinary initial guess x_i = (−1)^i × 0.5, i = 1, 2, ..., 100, and set a stopping criterion ε = 10^−8. None of the methods converges within 10^5 iterations. The comparison of the residual errors for these methods is shown in Fig. 11. The residual errors of A2DM, plotted in six charts against the other six methods, exhibit a faster convergence rate almost all the way through the whole iterative process. It can also be seen that ASDM performs better than BBM, RSDM, and RSDM1.
It is interesting to examine more closely the behavior of the residual errors exhibited in Fig. 11 of this example as well as in Fig. 8 of the previous example. We observe that SDM and 2DM have similar trends of smoother and slower decreasing rates, but the others exhibit more violently oscillating behavior. These common phenomena for the BBM, RSDM, RSDM1, ASDM, and A2DM may be attributed to their steplength modifications by a multiplicative scalar factor. Since we know the exact solution of problem (48), we plot the solution errors ∥x_k − x*∥ of these seven methods for all steps k in Fig. 12. The result of A2DM is compared with those of the other six methods in the six plots. The solution errors of A2DM converge faster than those of the others at nearly all steps k. Similarly, the numerical errors of A2DM are smaller than those of the others, as can be seen in Fig. 13.

Concluding Remarks
We have proposed accelerated and/or bidirectional modifications of the classical steepest descent method (SDM), namely the accelerated SDM (ASDM), the bidirectional method (2DM), and the accelerated 2DM (A2DM). The proposed methods are all furnished with a dynamics-theoretical and optimization interpretation. By embedding the minimization problem of a quadratic functional into an invariant manifold in the product space of the state space and the fictitious time axis, we have constructed a continuous time dynamical system for the unknown vector x in the state space. Using different discretization schemes, we have constructed the invariant manifolds of the unidirectional method and the 2DM. Locally they pursue an optimal line search and an optimal plane search, respectively. The unidirectional method is nothing but the classical SDM. On the basis of the dynamics of trajectories, we have equipped the SDM and 2DM with accelerators Q_1 > 1 and Q_{k+1} > Q_k in order to improve the convergence rate, and have thus developed the accelerated algorithms ASDM and A2DM through optimization processes.
The iterative algorithms so developed retain the simplicity of the classical SDM algorithm, and are robust and easy to implement. Moreover, the operations of both multiplication and addition are of the same order N^2 for the SDM, ASDM, 2DM, and A2DM in N-dimensional problems. Comparisons of the SDM, BBM, RSDM, RSDM1, ASDM, 2DM, and A2DM methods are made and demonstrated in the numerical experiments of three highly ill-posed problems.
In Example 1, we have observed that the initial guesses have no structured influence on the 2DM and A2DM, whose iterations converge rapidly to the solution point. For the example of the Hilbert matrix, the RSDM1 and A2DM converge within 10^5 iterations under the stopping criterion ε = 10^−8, but the A2DM shows the lower value of the solution error in the end. Even though none of the methods converges within 10^5 iterations under the stopping criterion ε = 10^−8 in Example 3, the A2DM exhibits a faster convergence rate almost all the way through the whole iterative process.
The A2DM consequently displays the best performance in all cases. The 2DM works well merely in the 2-dimensional case.
The ASDM is a second choice besides the A2DM.According to the numerical experiments, the bidirectional methods of 2DM and A2DM (the second column in Table 1) display better efficiency and accuracy than the unidirectional methods of SDM and ASDM (the first column in Table 1), and the accelerated methods of ASDM and A2DM (the second row in Table 1) work better than the original methods (the first row in Table 1).
Table 1. Classification of the proposed modifications

              unidirectional method    bidirectional method
original      SDM                      2DM
accelerated   ASDM                     A2DM

The merits of the A2DM are (i) the point of view of global dynamics (continuous time dynamics for the virtual time method, and discrete time dynamics for the iterative algorithm) with Q_1 > 1 and Q_{k+1} > Q_k as an accelerator, and (ii) both the steplength and the direction at each step being determined locally by an optimization in a two-dimensional subspace without sophisticated computation.
In each of the seven figures, the 360 initial guesses were taken on the circle (x − 1)^2 + (y − 1)^2 = 1, and the iterations stopped at the 19th step (k = 19). The iterations of the 20 points (x_k, y_k), k = 0, 1, ..., 19, per initial guess for the 360 initial guesses are plotted in the (a) parts of the seven figures. In the (b) and (c) parts are plotted the residual errors r_19^T r_19 and the solution errors √((x_19 − x*)^2 + (y_19 − y*)^2), respectively, at step k = 19 for the 360 initial guesses.

Figure 1. Example 1. Evolution of iterations, residual errors, and solution errors for SDM
Figure 9. Example 2. Comparison of solution errors for A2DM with the other six methods
Figure 11. Example 3. Comparison of residual errors for A2DM with the other six methods