
Chapter 8

Iterative Methods for Solving Linear Systems

8.1 Convergence of Sequences of Vectors and Matrices

In Chapter 6 we discussed some of the main methods for solving systems of linear equations. These methods are direct methods, in the sense that they yield exact solutions (assuming infinite precision!).

Another class of methods for solving linear systems consists in approximating solutions using iterative methods. The basic idea is this: Given a linear system Ax = b (with A a square invertible matrix), find another matrix B and a vector c, such that

1. The matrix I − B is invertible;

2. The unique solution x of the system Ax = b is identical to the unique solution u of the system u = Bu + c,

and then, starting from any vector u0, compute the sequence (uk) given by

uk+1 = Buk + c,  k ∈ N.

Under certain conditions (to be clarified soon), the sequence (uk) converges to a limit u which is the unique solution of u = Bu + c, and thus of Ax = b.

Consequently, it is important to find conditions that ensure the convergence of the above sequences and to have tools to compare the “rate” of convergence of these sequences. Thus, we begin with some general results about the convergence of sequences of vectors and matrices.
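As an illustration, here is a minimal sketch of this fixed-point iteration in Python with NumPy (our own example, not from the text; the function name, tolerance tol, and iteration cap are hypothetical choices):

```python
import numpy as np

def fixed_point_iteration(B, c, u0, tol=1e-10, max_iter=1000):
    """Iterate u_{k+1} = B u_k + c until successive iterates are close."""
    u = u0.astype(float)
    for _ in range(max_iter):
        u_next = B @ u + c
        if np.linalg.norm(u_next - u) <= tol:
            return u_next
        u = u_next
    return u  # may not have converged; see the convergence criteria below
```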

Let (E, ‖ ‖) be a normed vector space. Recall that a sequence (uk) of vectors uk ∈ E converges to a limit u ∈ E if for every ε > 0, there is some natural number N such that

‖uk − u‖ ≤ ε,  for all k ≥ N.


We write

u = limk→∞ uk.

If E is a finite-dimensional vector space and dim(E) = n, we know from Theorem 7.3 that any two norms are equivalent, and if we choose the norm ‖ ‖∞, we see that the convergence of the sequence of vectors uk is equivalent to the convergence of the n sequences of scalars formed by the components of these vectors (over any basis). The same property applies to the finite-dimensional vector space Mm,n(K) of m × n matrices (with K = R or K = C), which means that the convergence of a sequence of matrices Ak = (aij⁽ᵏ⁾) is equivalent to the convergence of the m × n sequences of scalars (aij⁽ᵏ⁾), with i, j fixed (1 ≤ i ≤ m, 1 ≤ j ≤ n).

The first theorem below gives a necessary and sufficient condition for the sequence (Bᵏ) of powers of a matrix B to converge to the zero matrix. Recall that the spectral radius ρ(B) of a matrix B is the maximum of the moduli |λi| of the eigenvalues of B.

Theorem 8.1. For any square matrix B, the following conditions are equivalent:

(1) limk→∞ Bᵏ = 0,

(2) limk→∞ Bᵏv = 0, for all vectors v,

(3) ρ(B) < 1,

(4) ‖B‖ < 1, for some subordinate matrix norm ‖ ‖.

Proof. Assume (1) and let ‖ ‖ be a vector norm on E and ‖ ‖ be the corresponding matrix norm. For every vector v ∈ E, because ‖ ‖ is a matrix norm, we have

‖Bᵏv‖ ≤ ‖Bᵏ‖ ‖v‖,

and since limk→∞ Bᵏ = 0 means that limk→∞ ‖Bᵏ‖ = 0, we conclude that limk→∞ ‖Bᵏv‖ = 0, that is, limk→∞ Bᵏv = 0. This proves that (1) implies (2).

Assume (2). If we had ρ(B) ≥ 1, then there would be some eigenvector u (≠ 0) and some eigenvalue λ such that

Bu = λu,  |λ| = ρ(B) ≥ 1,

but then the sequence (Bᵏu) would not converge to 0, because Bᵏu = λᵏu and |λᵏ| = |λ|ᵏ ≥ 1. It follows that (2) implies (3).

Assume that (3) holds, that is, ρ(B) < 1. By Proposition 7.9, we can find ε > 0 small enough that ρ(B) + ε < 1, and a subordinate matrix norm ‖ ‖ such that

‖B‖ ≤ ρ(B) + ε,

which is (4).


Finally, assume (4). Because ‖ ‖ is a matrix norm,

‖Bᵏ‖ ≤ ‖B‖ᵏ,

and since ‖B‖ < 1, we deduce that (1) holds.

The following proposition is needed to study the rate of convergence of iterative methods.

Proposition 8.2. For every square matrix B and every matrix norm ‖ ‖, we have

limk→∞ ‖Bᵏ‖^{1/k} = ρ(B).

Proof. We know from Proposition 7.4 that ρ(B) ≤ ‖B‖, and since ρ(B) = (ρ(Bᵏ))^{1/k}, we deduce that

ρ(B) ≤ ‖Bᵏ‖^{1/k}  for all k ≥ 1,

and so

ρ(B) ≤ limk→∞ ‖Bᵏ‖^{1/k}.

Now, let us prove that for every ε > 0, there is some integer N(ε) such that

‖Bᵏ‖^{1/k} ≤ ρ(B) + ε  for all k ≥ N(ε),

which proves that

limk→∞ ‖Bᵏ‖^{1/k} ≤ ρ(B),

and our proposition.

For any given ε > 0, let Bε be the matrix

Bε = B / (ρ(B) + ε).

Since ρ(Bε) = ρ(B)/(ρ(B) + ε) < 1, Theorem 8.1 implies that limk→∞ Bεᵏ = 0. Consequently, there is some integer N(ε) such that for all k ≥ N(ε), we have

‖Bεᵏ‖ = ‖Bᵏ‖ / (ρ(B) + ε)ᵏ ≤ 1,

which implies that

‖Bᵏ‖^{1/k} ≤ ρ(B) + ε,

as claimed.
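Proposition 8.2 is easy to check numerically. The following sketch (our own, not from the text; the matrix B is an arbitrary example) estimates ρ(B) from the eigenvalues and watches ‖Bᵏ‖^{1/k} approach it:

```python
import numpy as np

B = np.array([[0.5, 0.3],
              [0.1, 0.4]])

rho = max(abs(np.linalg.eigvals(B)))               # spectral radius rho(B)
for k in (1, 5, 20, 100):
    # subordinate 2-norm of B^k, then take the k-th root
    est = np.linalg.norm(np.linalg.matrix_power(B, k), 2) ** (1.0 / k)
    print(k, est)                                   # approaches rho(B) as k grows
print("rho(B) =", rho)
```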

We now apply the above results to the convergence of iterative methods.


8.2 Convergence of Iterative Methods

Recall that an iterative method for solving a linear system Ax = b (with A invertible) consists in finding some matrix B and some vector c, such that I − B is invertible, and the unique solution x of Ax = b is equal to the unique solution u of u = Bu + c. Then, starting from any vector u0, we compute the sequence (uk) given by

uk+1 = Buk + c,  k ∈ N,

and say that the iterative method is convergent iff

limk→∞ uk = u,

for every initial vector u0.

Here is a fundamental criterion for the convergence of any iterative method based on a matrix B, called the matrix of the iterative method.

Theorem 8.3. Given a system u = Bu + c as above, where I − B is invertible, the following statements are equivalent:

(1) The iterative method is convergent.

(2) ρ(B) < 1.

(3) ‖B‖ < 1, for some subordinate matrix norm ‖ ‖.

Proof. Define the vector ek (the error vector) by

ek = uk − u,

where u is the unique solution of the system u = Bu + c. Clearly, the iterative method is convergent iff

limk→∞ ek = 0.

We claim that

ek = Bᵏe0,  k ≥ 0,

where e0 = u0 − u.

This is proved by induction on k. The base case k = 0 is trivial. By the induction hypothesis, ek = Bᵏe0, and since uk+1 = Buk + c, we get

uk+1 − u = Buk + c − u,

and because u = Bu + c and ek = Bᵏe0 (by the induction hypothesis), we obtain

uk+1 − u = Buk − Bu = B(uk − u) = Bek = BBᵏe0 = Bᵏ⁺¹e0,

proving the induction step. Thus, the iterative method converges iff

limk→∞ Bᵏe0 = 0.

Consequently, our theorem follows by Theorem 8.1.
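The error recursion ek = Bᵏe0 is also easy to verify numerically; here is a sketch with an arbitrary example matrix (our own, not from the text):

```python
import numpy as np

B = np.array([[0.4, 0.2],
              [0.1, 0.3]])
c = np.array([1.0, 2.0])
u_star = np.linalg.solve(np.eye(2) - B, c)     # unique solution of u = Bu + c

u = np.zeros(2)                                 # u0 = 0
for k in range(5):
    u = B @ u + c
# After 5 steps, u_5 - u_star should equal B^5 (u_0 - u_star).
lhs = u - u_star
rhs = np.linalg.matrix_power(B, 5) @ (np.zeros(2) - u_star)
print(np.allclose(lhs, rhs))                    # True
```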


The next proposition is needed to compare the rate of convergence of iterative methods. It shows that asymptotically, the error vector ek = Bᵏe0 behaves at worst like (ρ(B))ᵏ.

Proposition 8.4. Let ‖ ‖ be any vector norm, let B be a matrix such that I − B is invertible, and let u be the unique solution of u = Bu + c.

(1) If (uk) is any sequence defined iteratively by

uk+1 = Buk + c,  k ∈ N,

then

limk→∞ sup_{‖u0−u‖=1} ‖uk − u‖^{1/k} = ρ(B).

(2) Let B1 and B2 be two matrices such that I − B1 and I − B2 are invertible, assume that both u = B1u + c1 and u = B2u + c2 have the same unique solution u, and consider any two sequences (uk) and (vk) defined inductively by

uk+1 = B1uk + c1,
vk+1 = B2vk + c2,

with u0 = v0. If ρ(B1) < ρ(B2), then for any ε > 0, there is some integer N(ε), such that for all k ≥ N(ε), we have

sup_{‖u0−u‖=1} ( ‖vk − u‖ / ‖uk − u‖ )^{1/k} ≥ ρ(B2) / (ρ(B1) + ε).

Proof. Let ‖ ‖ be the subordinate matrix norm. Recall that

uk − u = Bᵏe0,

with e0 = u0 − u. For every k ∈ N, we have

(ρ(B1))ᵏ = ρ(B1ᵏ) ≤ ‖B1ᵏ‖ = sup_{‖e0‖=1} ‖B1ᵏe0‖,

which implies

ρ(B1) ≤ sup_{‖e0‖=1} ‖B1ᵏe0‖^{1/k} = ‖B1ᵏ‖^{1/k},

and statement (1) follows from Proposition 8.2.

Because u0 = v0, we have

uk − u = B1ᵏe0,
vk − u = B2ᵏe0,


with e0 = u0 − u = v0 − u. Again, by Proposition 8.2, for every ε > 0, there is some natural number N(ε) such that if k ≥ N(ε), then

sup_{‖e0‖=1} ‖B1ᵏe0‖^{1/k} ≤ ρ(B1) + ε.

Furthermore, for all k ≥ N(ε), there exists a vector e0 = e0(k) such that

‖e0‖ = 1  and  ‖B2ᵏe0‖^{1/k} = ‖B2ᵏ‖^{1/k} ≥ ρ(B2),

which implies statement (2).

In light of the above, we see that when we investigate new iterative methods, we have to deal with the following two problems:

1. Given an iterative method with matrix B, determine whether the method is convergent. This involves determining whether ρ(B) < 1, or equivalently whether there is a subordinate matrix norm such that ‖B‖ < 1. By Proposition 7.8, this implies that I − B is invertible (since ‖−B‖ = ‖B‖, Proposition 7.8 applies). A numerical check of this criterion is sketched after this list.

2. Given two convergent iterative methods, compare them. The iterative method which is faster is the one whose matrix has the smaller spectral radius.
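Both problems reduce to computing spectral radii. A small sketch with NumPy (our own illustration; the helper names are hypothetical):

```python
import numpy as np

def spectral_radius(B):
    """rho(B): maximum modulus of the eigenvalues of B."""
    return max(abs(np.linalg.eigvals(B)))

def is_convergent(B):
    """The iterative method with matrix B converges iff rho(B) < 1 (Theorem 8.3)."""
    return spectral_radius(B) < 1.0

def faster_method(B1, B2):
    """Of two convergent methods, the one with the smaller spectral radius wins."""
    return "B1" if spectral_radius(B1) < spectral_radius(B2) else "B2"
```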

We now discuss three iterative methods for solving linear systems:

1. Jacobi’s method

2. Gauss-Seidel’s method

3. The relaxation method.

8.3 Description of the Methods of Jacobi, Gauss-Seidel, and Relaxation

The methods described in this section are instances of the following scheme: Given a linear system Ax = b, with A invertible, suppose we can write A in the form

A = M − N,

with M invertible, and “easy to invert,” which means that M is close to being a diagonal or a triangular matrix (perhaps by blocks). Then, Au = b is equivalent to

Mu = Nu + b,

that is,

u = M⁻¹Nu + M⁻¹b.


Therefore, we are in the situation described in the previous sections with B = M⁻¹N and c = M⁻¹b. In fact, since A = M − N, we have

B = M⁻¹N = M⁻¹(M − A) = I − M⁻¹A,

which shows that I − B = M⁻¹A is invertible. The iterative method associated with the matrix B = M⁻¹N is given by

uk+1 = M⁻¹Nuk + M⁻¹b,  k ≥ 0,

starting from any arbitrary vector u0. From a practical point of view, we do not invert M, and instead we solve iteratively the systems

Muk+1 = Nuk + b,  k ≥ 0.

Various methods correspond to various ways of choosing M and N from A. The first two methods choose M and N as disjoint submatrices of A, but the relaxation method allows some overlapping of M and N.

To describe the various choices of M and N, it is convenient to write A in terms of three submatrices D, E, F, as

A = D − E − F,

where the only nonzero entries in D are the diagonal entries in A, the only nonzero entries in E are entries in A below the diagonal, and the only nonzero entries in F are entries in A above the diagonal. More explicitly, if

A =
⎡ a11      a12      a13      ···   a1 n−1      a1n     ⎤
⎢ a21      a22      a23      ···   a2 n−1      a2n     ⎥
⎢ a31      a32      a33      ···   a3 n−1      a3n     ⎥
⎢  ⋮        ⋮        ⋮       ⋱       ⋮          ⋮      ⎥
⎢ an−1 1   an−1 2   an−1 3   ···   an−1 n−1    an−1 n  ⎥
⎣ an 1     an 2     an 3     ···   an n−1      an n    ⎦

then

D =
⎡ a11   0     0     ···   0           0    ⎤
⎢ 0     a22   0     ···   0           0    ⎥
⎢ 0     0     a33   ···   0           0    ⎥
⎢ ⋮     ⋮     ⋮     ⋱     ⋮           ⋮    ⎥
⎢ 0     0     0     ···   an−1 n−1    0    ⎥
⎣ 0     0     0     ···   0           an n ⎦


−E =
⎡ 0        0        0        ···   0        0 ⎤
⎢ a21      0        0        ···   0        0 ⎥
⎢ a31      a32      0        ···   0        0 ⎥
⎢ ⋮        ⋮        ⋱        ⋱     ⋮        ⋮ ⎥
⎢ an−1 1   an−1 2   an−1 3   ⋱     0        0 ⎥
⎣ an 1     an 2     an 3     ···   an n−1   0 ⎦

−F =
⎡ 0   a12   a13   ···   a1 n−1   a1n    ⎤
⎢ 0   0     a23   ···   a2 n−1   a2n    ⎥
⎢ 0   0     0     ⋱     a3 n−1   a3n    ⎥
⎢ ⋮   ⋮     ⋮     ⋱     ⋱        ⋮      ⎥
⎢ 0   0     0     ···   0        an−1 n ⎥
⎣ 0   0     0     ···   0        0      ⎦

In Jacobi’s method, we assume that all diagonal entries in A are nonzero, and we pick

M = D
N = E + F,

so that

B = M⁻¹N = D⁻¹(E + F) = I − D⁻¹A.

As a matter of notation, we let

J = I − D⁻¹A = D⁻¹(E + F),

which is called Jacobi’s matrix. The corresponding method, Jacobi’s iterative method, computes the sequence (uk) using the recurrence

uk+1 = D⁻¹(E + F)uk + D⁻¹b,  k ≥ 0.

In practice, we iteratively solve the systems

Duk+1 = (E + F)uk + b,  k ≥ 0.

If we write uk = (u1ᵏ, . . . , unᵏ), we solve iteratively the following system:

a11 u1ᵏ⁺¹ =                −a12 u2ᵏ    −a13 u3ᵏ    ···   −a1n unᵏ    + b1
a22 u2ᵏ⁺¹ =   −a21 u1ᵏ                 −a23 u3ᵏ    ···   −a2n unᵏ    + b2
  ⋮
an−1 n−1 un−1ᵏ⁺¹ = −an−1 1 u1ᵏ   ···   −an−1 n−2 un−2ᵏ   −an−1 n unᵏ   + bn−1
an n unᵏ⁺¹ =  −an 1 u1ᵏ    −an 2 u2ᵏ   ···   −an n−1 un−1ᵏ    + bn
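In code, one Jacobi sweep can be sketched like this (our own illustration, not from the text; it assumes all diagonal entries a_ii are nonzero):

```python
import numpy as np

def jacobi(A, b, u0, num_iters=100):
    """Jacobi's method: D u_{k+1} = (E + F) u_k + b, assuming a_ii != 0."""
    D = np.diag(np.diag(A))
    R = D - A                          # R = E + F: zero diagonal, -a_ij off it
    u = u0.astype(float)
    for _ in range(num_iters):
        u = (R @ u + b) / np.diag(A)   # dividing by a_ii solves D u_{k+1} = ...
    return u
```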

Observe that we can try to “speed up” the method by using the new value u1ᵏ⁺¹ instead of u1ᵏ in solving for u2ᵏ⁺¹ using the second equation, and more generally, by using u1ᵏ⁺¹, . . . , ui−1ᵏ⁺¹ instead of u1ᵏ, . . . , ui−1ᵏ in solving for uiᵏ⁺¹ in the ith equation. This observation leads to the system

a11 u1ᵏ⁺¹ =                  −a12 u2ᵏ      −a13 u3ᵏ    ···   −a1n unᵏ    + b1
a22 u2ᵏ⁺¹ =   −a21 u1ᵏ⁺¹                   −a23 u3ᵏ    ···   −a2n unᵏ    + b2
  ⋮
an−1 n−1 un−1ᵏ⁺¹ = −an−1 1 u1ᵏ⁺¹   ···   −an−1 n−2 un−2ᵏ⁺¹   −an−1 n unᵏ   + bn−1
an n unᵏ⁺¹ =  −an 1 u1ᵏ⁺¹   −an 2 u2ᵏ⁺¹   ···   −an n−1 un−1ᵏ⁺¹    + bn,


which, in matrix form, is written

Duk+1 = Euk+1 + Fuk + b.

Because D is invertible and E is lower triangular, the matrix D − E is invertible, so the above equation is equivalent to

uk+1 = (D − E)⁻¹Fuk + (D − E)⁻¹b,  k ≥ 0.

The above corresponds to choosing M and N to be

M = D − E
N = F,

and the matrix B is given by

B = M⁻¹N = (D − E)⁻¹F.

Since M = D − E is invertible, we know that I − B = M⁻¹A is also invertible.

The method that we just described is the iterative method of Gauss-Seidel, and the matrix B is called the matrix of Gauss-Seidel and denoted by L1, with

L1 = (D − E)⁻¹F.

One of the advantages of the method of Gauss-Seidel is that it requires only half of the memory used by Jacobi’s method, since we only need

u1ᵏ⁺¹, . . . , ui−1ᵏ⁺¹, ui+1ᵏ, . . . , unᵏ

to compute uiᵏ⁺¹. We also show that in certain important cases (for example, if A is a tridiagonal matrix), the method of Gauss-Seidel converges faster than Jacobi’s method (in this case, they both converge or diverge simultaneously).
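A componentwise sketch of one Gauss-Seidel pass (our own illustration, not from the text) makes the in-place update explicit: new components overwrite old ones as soon as they are computed, which is exactly why only one vector of storage is needed:

```python
import numpy as np

def gauss_seidel(A, b, u0, num_iters=100):
    """Gauss-Seidel: solve the ith equation using already-updated components."""
    n = len(b)
    u = u0.astype(float)
    for _ in range(num_iters):
        for i in range(n):
            # u[:i] already holds u^{k+1} components; u[i+1:] still holds u^k.
            s = A[i, :i] @ u[:i] + A[i, i + 1:] @ u[i + 1:]
            u[i] = (b[i] - s) / A[i, i]
    return u
```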

The new ingredient in the relaxation method is to incorporate part of the matrix D into N: we define M and N by

M = D/ω − E,
N = ((1 − ω)/ω)D + F,

where ω ≠ 0 is a real parameter to be suitably chosen. Actually, we show in Section 8.4 that for the relaxation method to converge, we must have ω ∈ (0, 2). Note that the case ω = 1 corresponds to the method of Gauss-Seidel.


If we assume that all diagonal entries of D are nonzero, the matrix M is invertible. The matrix B is denoted by Lω and called the matrix of relaxation, with

Lω = (D/ω − E)⁻¹ (((1 − ω)/ω)D + F) = (D − ωE)⁻¹((1 − ω)D + ωF).

The number ω is called the parameter of relaxation. When ω > 1, the relaxation method is known as successive overrelaxation, abbreviated as SOR.

At first glance, the relaxation matrix Lω seems a lot more complicated than the Gauss-Seidel matrix L1, but the iterative system associated with the relaxation method is very similar to the method of Gauss-Seidel, and is quite simple. Indeed, the system associated with the relaxation method is given by

(D/ω − E)uk+1 = (((1 − ω)/ω)D + F)uk + b,

which is equivalent to

(D − ωE)uk+1 = ((1 − ω)D + ωF)uk + ωb,

and can be written

Duk+1 = Duk − ω(Duk − Euk+1 − Fuk − b).

Explicitly, this is the system

a11 u1ᵏ⁺¹ = a11 u1ᵏ − ω(a11 u1ᵏ + a12 u2ᵏ + a13 u3ᵏ + · · · + a1 n−2 un−2ᵏ + a1 n−1 un−1ᵏ + a1n unᵏ − b1)
a22 u2ᵏ⁺¹ = a22 u2ᵏ − ω(a21 u1ᵏ⁺¹ + a22 u2ᵏ + a23 u3ᵏ + · · · + a2 n−2 un−2ᵏ + a2 n−1 un−1ᵏ + a2n unᵏ − b2)
...
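These componentwise updates translate directly into code. The sketch below (our own illustration, not from the text; the parameter name omega is our choice) implements the relaxation update; omega = 1 recovers Gauss-Seidel, and omega > 1 gives SOR:

```python
import numpy as np

def sor(A, b, u0, omega=1.5, num_iters=100):
    """Relaxation method: weighted correction of u_i; needs 0 < omega < 2."""
    n = len(b)
    u = u0.astype(float)
    for _ in range(num_iters):
        for i in range(n):
            # Residual of the ith equation, using the newest available components.
            r = A[i, :i] @ u[:i] + A[i, i:] @ u[i:] - b[i]
            u[i] = u[i] - omega * r / A[i, i]
    return u
```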