Most modern iterative methods for solving large sparse linear systems are based on subspace projection. These methods search for an approximate solution within a subspace of the global space. As the dimension of the search space increases, it becomes closer to the entire space, and thus the approximate solution becomes closer to the true solution. The most renowned subspace projection methods are Krylov subspace methods, which are extensively used in scientific computing. The most popular Krylov subspace methods include the Conjugate Gradient (CG) method, the Generalized Minimal Residual (GMRES) method, the BiConjugate Gradient Stabilized (BiCGSTAB) method, and the Preconditioned Conjugate Gradient (PCG) method.
Projection Methods
Consider the real linear system

$$A x = b,$$

where $A$ is an $n \times n$ real matrix. The idea of projection techniques is to extract an approximate solution to the above problem from a subspace of $\mathbb{R}^n$. If $\mathcal{K}$ is this search subspace and $m$ is its dimension, then $m$ constraints need to be imposed to be able to extract such an approximation. A typical way of describing these constraints is to impose $m$ (independent) orthogonality conditions. Specifically, the residual vector $b - A\tilde{x}$ is constrained to be orthogonal to $m$ linearly independent vectors $w_1, \dots, w_m$, which span the constraint subspace $\mathcal{L}$; that is, we require the approximate solution $\tilde{x}$ to satisfy the following conditions

$$\tilde{x} \in \mathcal{K}, \qquad b - A\tilde{x} \perp \mathcal{L}.$$
This framework is common to many different mathematical methods; the conditions above are known as the Petrov-Galerkin conditions.
There are two broad classes of projection methods:
Orthogonal: the constraint subspace $\mathcal{L}$ is the same as the search subspace $\mathcal{K}$, i.e., $\mathcal{L} = \mathcal{K}$.
Oblique: the constraint subspace $\mathcal{L}$ is different from the search subspace $\mathcal{K}$, i.e., $\mathcal{L} \neq \mathcal{K}$.
General Projection Methods
Let $A$ be an $n \times n$ real matrix, and let $\mathcal{K}$ and $\mathcal{L}$ be two $m$-dimensional subspaces of $\mathbb{R}^n$. A general projection method onto $\mathcal{K}$ and orthogonal to $\mathcal{L}$ is a process which finds an approximate solution $\tilde{x}$ of $Ax = b$ by requiring

$$\tilde{x} \in \mathcal{K}, \qquad b - A\tilde{x} \perp \mathcal{L}.$$

As an iterative method, however, we want to exploit the knowledge of some initial guess $x_0$ to the solution, so at every iteration the approximation is sought in the affine space $x_0 + \mathcal{K}$ instead of $\mathcal{K}$. Hence the approximation problem should be redefined as follows:

$$\text{find } \tilde{x} \in x_0 + \mathcal{K} \text{ such that } b - A\tilde{x} \perp \mathcal{L}.$$

Let us write the approximation in the general form $\tilde{x} = x_0 + \delta$ with $\delta \in \mathcal{K}$, and denote the initial residual vector by $r_0 = b - A x_0$; then the condition above becomes

$$\text{find } \delta \in \mathcal{K} \text{ such that } r_0 - A\delta \perp \mathcal{L}.$$
The above problem can be written in matrix form. Let $V = [v_1, \dots, v_m]$ be an $n \times m$ matrix whose columns form a basis of $\mathcal{K}$ and, similarly, let $W = [w_1, \dots, w_m]$ be an $n \times m$ matrix whose columns form a basis of $\mathcal{L}$. Writing $\delta = V y$ with $y \in \mathbb{R}^m$, the problem becomes

$$W^T (r_0 - A V y) = 0.$$

This system has the explicit solution

$$y = (W^T A V)^{-1} W^T r_0$$

if $W^T A V$ is invertible, and in this case the approximate solution is given by

$$\tilde{x} = x_0 + V (W^T A V)^{-1} W^T r_0.$$
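To make the formula concrete, here is a minimal Julia sketch (not from the original text) of a single projection step with given basis matrices; the function name projection_step is hypothetical, and forming $W^T A V$ densely is done only for clarity.

```julia
using LinearAlgebra

# A minimal sketch of one general projection step, assuming V and W are n×m
# matrices whose columns form bases of K and L, respectively.
function projection_step(A, b, x0, V, W)
    r0 = b - A * x0                 # initial residual
    y  = (W' * A * V) \ (W' * r0)   # solve the m×m projected system W'(r0 - A*V*y) = 0
    return x0 + V * y               # approximate solution in the affine space x0 + K
end
```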
There are two important special choices of $\mathcal{K}$ and $\mathcal{L}$ which guarantee the invertibility of $W^T A V$:
Error Projection: $A$ is positive definite, i.e., $(Ax, x) > 0$ for all $x \neq 0$, and $\mathcal{L} = \mathcal{K}$;
Residual Projection: $A$ is invertible and $\mathcal{L} = A\mathcal{K}$.
These two special cases give rise to some of the most popular iterative methods for solving linear systems, and they correspond to two optimization problems (a small numerical check follows this list):
$A$ positive definite and $\mathcal{L} = \mathcal{K}$: the projected solution minimizes the energy ($A$-norm) of the error, $E(\tilde{x}) = \| x_* - \tilde{x} \|_A = \big( (x_* - \tilde{x}), A (x_* - \tilde{x}) \big)^{1/2}$, over the affine space $x_0 + \mathcal{K}$, where $x_*$ is the true solution of the linear system $Ax = b$.
$A$ invertible and $\mathcal{L} = A\mathcal{K}$: the projected solution minimizes the Euclidean norm $R(\tilde{x}) = \| b - A\tilde{x} \|_2$ of the residual vector $b - A\tilde{x}$, over the affine space $x_0 + \mathcal{K}$.
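The following small Julia check (an illustrative sketch with arbitrarily chosen sizes and a random subspace, not part of the original text) verifies both optimality properties numerically.

```julia
using LinearAlgebra

n, m = 50, 5
B = randn(n, n)
A = B' * B + n * I                  # symmetric positive definite test matrix (assumed setup)
b = randn(n); x0 = zeros(n)
r0 = b - A * x0
xtrue = A \ b
V = randn(n, m)                     # columns span a random search subspace K

# Error projection (L = K): minimizes the A-norm of the error over x0 + K
x_err = x0 + V * ((V' * A * V) \ (V' * r0))
# Residual projection (L = A*K): minimizes the 2-norm of the residual over x0 + K
W = A * V
x_res = x0 + V * ((W' * A * V) \ (W' * r0))

# Any other element of x0 + K should do no better in the corresponding norm
x_other = x0 + V * randn(m)
err_norm(x) = sqrt(dot(xtrue - x, A * (xtrue - x)))   # A-norm of the error
@assert err_norm(x_err) <= err_norm(x_other) + 1e-8
@assert norm(b - A * x_res) <= norm(b - A * x_other) + 1e-8
```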
One-dimensional Projection Processes
One-dimensional projection processes are the simplest projection methods. They are defined when

$$\mathcal{K} = \operatorname{span}\{v\} \quad \text{and} \quad \mathcal{L} = \operatorname{span}\{w\},$$

where $v$ and $w$ are two vectors in $\mathbb{R}^n$. In this case, the new approximation takes the form $x \leftarrow x + \alpha v$ (a SAXPY operation; in Julia, this can be written with muladd), and the Petrov-Galerkin condition $r - \alpha A v \perp w$ yields

$$\alpha = \frac{(r, w)}{(A v, w)}.$$
There are three popular one-dimensional projection methods:
Steepest Descent Method: $v = r$ and $w = r$;
Minimal Residual Iteration: $v = r$ and $w = Ar$ (a minimal Julia sketch of this iteration follows the list);
Residual Norm Steepest Descent Method: $v = A^T r$ and $w = A v = A A^T r$.
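Below is a minimal Julia sketch of the Minimal Residual Iteration; the function name, stopping rule, and iteration limit are illustrative assumptions, and the method requires $A$ to be positive definite to be well defined.

```julia
using LinearAlgebra

# Minimal Residual Iteration: one-dimensional projection with v = r and w = A*r.
function minimal_residual(A, b, x; tol = 1e-8, maxiter = 10_000)
    r = b - A * x
    for _ in 1:maxiter
        Ar = A * r
        α = dot(r, Ar) / dot(Ar, Ar)   # Petrov-Galerkin: (r - α*A*v, w) = 0 with v = r, w = A*r
        x = muladd.(α, r, x)           # SAXPY-style update x ← x + α r
        r = b - A * x
        norm(r) < tol && break
    end
    return x
end
```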
Krylov Subspace Methods
A Krylov subspace method is a method for which the search subspace $\mathcal{K}_m$ is the Krylov subspace

$$\mathcal{K}_m(A, r_0) = \operatorname{span}\{ r_0, A r_0, A^2 r_0, \dots, A^{m-1} r_0 \},$$

where $r_0 = b - A x_0$. We denote $\mathcal{K}_m(A, r_0)$ simply by $\mathcal{K}_m$ when there is no ambiguity. The different versions of Krylov subspace methods arise from different choices of the constraint subspace $\mathcal{L}_m$ and from the ways in which the system is preconditioned.
Viewed from the standpoint of approximation theory, Krylov subspace methods produce approximations of the form

$$A^{-1} b \approx x_m = x_0 + q_{m-1}(A)\, r_0,$$

where $q_{m-1}$ is a certain polynomial of degree $m-1$. In the simplest case where $x_0 = 0$, the above formula reduces to

$$A^{-1} b \approx q_{m-1}(A)\, b,$$

that is, $A^{-1} b$ is approximated by a polynomial of degree $m-1$ in $A$ applied to the initial residual vector $r_0 = b$.
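For illustration only, the following hypothetical snippet builds the "power" basis $\{r_0, A r_0, \dots, A^{m-1} r_0\}$ that the polynomial viewpoint suggests; its rapidly growing condition number is the usual argument for working with an orthonormal basis (the Arnoldi process below) instead.

```julia
using LinearAlgebra

# Naive Krylov ("power") basis: columns r0, A*r0, ..., A^(m-1)*r0.
function krylov_matrix(A, r0, m)
    K = zeros(eltype(r0), length(r0), m)
    K[:, 1] = r0
    for j in 2:m
        K[:, j] = A * K[:, j-1]
    end
    return K
end

A = randn(20, 20); b = randn(20)   # arbitrary small example
K = krylov_matrix(A, b, 10)
@show cond(K)                      # typically very large: the columns are nearly dependent
```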
The following two broad choices for $\mathcal{L}_m$ give rise to the best-known techniques:
$\mathcal{L}_m = \mathcal{K}_m$, or its minimum-residual variant $\mathcal{L}_m = A\mathcal{K}_m$;
$\mathcal{L}_m = \mathcal{K}_m(A^T, r_0)$, that is, the methods based on defining $\mathcal{L}_m$ to be a Krylov subspace associated with $A^T$.
Arnoldi Process
The Arnoldi process is a method for constructing an orthonormal basis of the Krylov subspace $\mathcal{K}_m$. It can be viewed as an orthogonal projection method onto $\mathcal{K}_m$ for general non-Hermitian matrices.
At each step of the Arnoldi process, the algorithm multiplies the previous Arnoldi vector $v_j$ by $A$ and then orthogonalizes the resulting vector against all previous $v_i$'s by a Gram-Schmidt process. It stops if the computed vector vanishes, i.e., becomes linearly dependent on the previous vectors. The Arnoldi process is summarized as follows:
Arnoldi's Process (exact arithmetic)
Input: a general non-Hermitian $n \times n$ matrix $A$, a vector $v$ of the same size, and an integer $m$.
Output: an orthonormal basis $V_m = [v_1, \dots, v_m]$ of the Krylov subspace $\mathcal{K}_m(A, v)$ and the corresponding upper Hessenberg matrix $H_m$ s.t. $H_m = V_m^T A V_m$.
Initialization:
Choose a vector $v_1$ of norm 1, e.g., $v_1 = v / \|v\|_2$.
For $j = 1, 2, \dots, m$:
# classical Gram-Schmidt orthogonalization (should be replaced by MGS or HO)
Compute $h_{ij} = (A v_j, v_i)$ for $i = 1, 2, \dots, j$.
Compute $w_j = A v_j - \sum_{i=1}^{j} h_{ij} v_i$.
Let $h_{j+1, j} = \| w_j \|_2$.
If $h_{j+1, j} = 0$ then Stop.
Let $v_{j+1} = w_j / h_{j+1, j}$.
End
Return $V_m = [v_1, \dots, v_m]$ and $H_m = (h_{ij})_{1 \le i, j \le m}$.
It can be shown that if the Arnoldi process does not stop before $m$ steps, then the Arnoldi vectors $v_1, v_2, \dots, v_m$ form an orthonormal basis of the Krylov subspace $\mathcal{K}_m(A, v_1)$. Furthermore, the $(m+1) \times m$ matrix $\bar{H}_m$ of the coefficients $h_{ij}$ is upper Hessenberg and satisfies the relation $A V_m = V_{m+1} \bar{H}_m$. If we denote by $H_m$ the $m \times m$ matrix obtained from $\bar{H}_m$ by removing the last row, then the following relations hold:

$$A V_m = V_m H_m + w_m e_m^T = V_{m+1} \bar{H}_m$$

and

$$V_m^T A V_m = H_m.$$
In practice, the orthogonalization in the Arnoldi process should be implemented using the modified Gram-Schmidt (MGS) process or Householder (HO) reflections for numerical stability. Due to round-off errors, the classical Gram-Schmidt (CGS) process is usually not stable enough in numerical computations, which leads to a loss of orthogonality of the Arnoldi vectors.
The key difference in MGS compared to CGS is that the coefficients $h_{ij}$ are computed using the partially orthogonalized vector $w_j$ instead of the original vector $A v_j$. Hence, MGS has the same complexity as CGS, but it is more numerically stable.
As another, even more reliable, alternative, the Householder orthogonalization uses reflection matrices of the form $P_k = I - 2 w_k w_k^T$ with $\|w_k\|_2 = 1$. However, since the column vectors to be orthogonalized are not available in advance in Arnoldi's process (they are obtained as $A v_j$, which requires computing the current basis vector $v_j$ first), the Householder algorithm constructs $v_j$ as

$$v_j = P_1 P_2 \cdots P_j e_j,$$

where $P_1, \dots, P_j$ are the previously determined Householder matrices. This vector is then multiplied by $A$ to get $z_{j+1} = A v_j$, and the subsequent Householder transforms are applied to it.

In this Householder algorithm, the unit vector $w_j$ defining $P_j$ satisfies

$$(w_j)_i = 0 \ \text{for } i = 1, \dots, j-1, \qquad (P_j z_j)_i = 0 \ \text{for } i = j+1, \dots, n.$$

Since the chosen $w_j$ satisfies $(w_j)_i = 0$ for $i < j$, the matrix $P_j$ has a $(j-1)$-dimensional identity block in its top-left corner, so the reflection acts only on the trailing coordinates, that is

$$P_j = \begin{pmatrix} I_{j-1} & 0 \\ 0 & \hat{P}_j \end{pmatrix}.$$

Therefore, $P_k e_i = e_i$ for every $k > i$, which implies that $v_i = P_1 P_2 \cdots P_j e_i$ for all $i \le j$; since $P_1 P_2 \cdots P_j$ is orthogonal and the $e_i$ are orthonormal, $v_i$ and $v_j$ are orthogonal. Consequently, the set $\{v_1, \dots, v_m\}$ is mutually orthogonal (indeed orthonormal).

Thus, this algorithm can be viewed as performing a QR decomposition of the matrix $[v, A v_1, A v_2, \dots, A v_m]$, i.e.,

$$[v,\, A v_1,\, \dots,\, A v_m] = V_{m+1} R_{m+1},$$

where $R_{m+1}$ is upper triangular, and where the columns are only available progressively, since each $A v_j$ depends on the basis vector $v_j$ determined by the previous reflectors $P_1, \dots, P_j$.
@doc""" Arnoldi(A, ini, m; orthogonalization) Arnoldi method for computing the orthogonal basis V and the upper Hessenberg matrix H. """ function Arnoldi(A::AbstractMatrix{T}, ini::AbstractVector{T}, m::Int; orthogonalization::String = "MGS") where T n = size(A, 2)
if norm(ini) == 0 throw(ArgumentError("Initial vector is zero")) elseif size(ini, 1) != n throw(ArgumentError("Initial vector has wrong size: $(size(ini, 1))")) end
V = zeros(T, n, m+1) # Store orthogonal basis H = zeros(T, m+1, m) # Upper Hessenberg matrix V[:,1] = ini / norm(ini) # First orthogonal basis vector
if orthogonalization == "CGS"# Classical Gram-Schmidt for j = 1:m # Compute A * q_k v = A * V[:, j]
# Orthogonalization step (Classical Gram-Schmidt) H[1:j, j] = V[:, 1:j]' * v # Compute h_ij always using original v for i = 1:j v -= H[i, j] * V[:, i] end
H[j+1, j] = norm(v) V[:, j+1] = v / H[j+1, j] end elseif orthogonalization == "MGS"# Modified Gram-Schmidt for j = 1:m # Compute A * q_k v = A * V[:, j]
# Orthogonalization step (Modified Gram-Schmidt) for i = 1:j H[i, j] = dot(V[:, i], v) # Compute h_ij using partially orthogonalized v v -= H[i, j] * V[:, i] end
H[j+1, j] = norm(v) V[:, j+1] = v / H[j+1, j] end
# MGS could be impoved even further by using *double orthogonalization*: # Whenever the final vector is computed, a test is performed to compare its norm # with the norm of the initial norm(Av_j). If the reduction falls below a certain # threshold, indicating severe cancellation might have occurred, a second orthogonalization # is made. This is called double orthogonalization. elseif orthogonalization == "HO"# Householder z = V[:, 1] W = zeros(T, n, m+1) for j = 1 : m+1 W[:, j] = [zeros(T, j-1); z[j] - norm(z[j:n]); z[j+1:n]] W[:, j] = W[:, j] / norm(W[:, j]) if j == 1 V[:, j] = [zeros(T, j-1); 1; zeros(T, n-j)] V[:, j] = V[:, j] - 2 * W[:, j] * dot(W[:, j], V[:, j]) else H[1:j,j-1] = [z[1:j-1]; norm(z[j:n])] V[:, j] = [zeros(T, j-1); 1; zeros(T, n-j)] for i = j:-1:1 V[:, j] -= 2 * W[:, i] * dot(W[:, i], V[:, j]) end end if j < m+1 z = A * V[:, j] for i = 1:j z -= 2 * W[:, i] * dot(W[:, i], z) end end end else throw(ArgumentError( "Unsupported orthogonalization method: $orthogonalization, use CGS, MGS, or HO" )) end return V[:, 1:m], H[1:m, 1:m] end
```julia
# Test code
if isinteractive()   # Only execute this part of the code in interactive mode
    N = 100
    A::Matrix{Float64} = rand(Float64, N, N)
    b::Vector{Float64} = rand(Float64, N)
    m = 99

    @time V, H = Arnoldi(A, b, m; orthogonalization = "CGS")
    @printf("Error of Arnoldi with CGS: %.4e\n", norm(V'*A*V - H))

    @time V, H = Arnoldi(A, b, m; orthogonalization = "MGS")
    @printf("Error of Arnoldi with MGS: %.4e\n", norm(V'*A*V - H))

    @time V, H = Arnoldi(A, b, m; orthogonalization = "HO")
    @printf("Error of Arnoldi with HO: %.4e\n", norm(V'*A*V - H))
end
```
The output of the above code is as follows (in interactive mode), which shows the errors of Arnoldi (here we consider $\| V_m^T A V_m - H_m \|_F$ with $m = 99$) with different orthogonalization methods:
Output
```text
  0.000890 seconds (109 allocations: 322.984 KiB)
Error of Arnoldi with CGS: 2.2750e-12
  0.006154 seconds (112 allocations: 322.906 KiB)
Error of Arnoldi with MGS: 2.5814e-13
  0.001214 seconds (611 allocations: 1.188 MiB)
Error of Arnoldi with HO: 1.3080e-13
```
The following table shows the computational complexity of the Arnoldi process with different orthogonalization methods, where MGSR stands for Modified Gram-Schmidt with Reorthogonalization.
|         | CGS | MGS | MGSR | HO |
|---------|-----|-----|------|----|
| Flops   | $2m^2 n$ | $2m^2 n$ | $4m^2 n$ | $4m^2 n - \tfrac{4}{3} m^3$ |
| Storage | $(m+1)n + \tfrac{1}{2} m^2$ | $(m+1)n + \tfrac{1}{2} m^2$ | $(m+1)n + \tfrac{1}{2} m^2$ | $(m+1)n - \tfrac{1}{2} m^2$ |
Arnoldi's process is also a powerful tool for finding eigenvalues. In fact, it was discovered that the eigenvalues of the Hessenberg matrix $H_m$ (which are called Ritz values and can be computed efficiently, for instance with the QR algorithm) obtained from a number of steps $m$ smaller than $n$ can provide accurate approximations to some eigenvalues of the original matrix. This leads to an efficient technique for approximating the eigenvalues of large sparse matrices, and it can be verified easily in Julia by using the eigen function. An example of the Arnoldi iteration demonstrating the convergence of Ritz values to the eigenvalues of a matrix is shown below.
Figure: Arnoldi iteration demonstrating convergence of Ritz values (red) to the eigenvalues (black) of a matrix composed of uniform random values on the domain [-0.5, +0.5].
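As an illustrative sketch (the matrix, size, and number of steps are arbitrary choices, and the Arnoldi routine defined above is assumed to be available), one can compare the Ritz values with the true spectrum:

```julia
using LinearAlgebra

N, m = 200, 30
M = rand(N, N) .- 0.5                         # uniform random entries on [-0.5, 0.5]
V, H = Arnoldi(M, rand(N), m; orthogonalization = "MGS")
ritz = eigvals(H)                             # Ritz values: eigenvalues of the small H_m
spec = eigvals(M)                             # full spectrum, computed here only for comparison
# As m grows, the outermost Ritz values tend to approach the outermost eigenvalues of M,
# e.g. compare maximum(real, ritz) with maximum(real, spec).
```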
FOM
Given an initial guess $x_0$ to the linear system $Ax = b$, we first consider an orthogonal projection method, which takes $\mathcal{L} = \mathcal{K} = \mathcal{K}_m(A, r_0)$. Here $r_0 = b - A x_0$ is the initial residual vector. This method seeks an approximate solution $x_m$ from the affine space $x_0 + \mathcal{K}_m$ by imposing the Petrov-Galerkin condition

$$b - A x_m \perp \mathcal{K}_m.$$

If we let $v_1 = r_0 / \|r_0\|_2$ be the first Arnoldi vector and set $\beta = \|r_0\|_2$, then recall that the Arnoldi iteration generates an orthonormal basis $V_m$ of the Krylov subspace $\mathcal{K}_m$ which satisfies

$$V_m^T A V_m = H_m,$$

and a direct computation shows that

$$V_m^T r_0 = V_m^T (\beta v_1) = \beta e_1.$$

As a result, if we write the approximate solution as $x_m = x_0 + V_m y_m$ with correction vector $V_m y_m$, then the Petrov-Galerkin condition can be written as

$$V_m^T \big( r_0 - A V_m y_m \big) = 0,$$

which gives

$$y_m = H_m^{-1} (\beta e_1).$$

Therefore, the approximate solution is given by

$$x_m = x_0 + V_m y_m = x_0 + V_m H_m^{-1} (\beta e_1).$$

The residual vector is such that

$$b - A x_m = r_0 - A V_m y_m = \beta v_1 - V_m H_m y_m - h_{m+1, m} (e_m^T y_m)\, v_{m+1},$$

since $A V_m = V_m H_m + h_{m+1, m} v_{m+1} e_m^T$. Recalling that $H_m y_m = \beta e_1$ and $\beta v_1 = V_m (\beta e_1)$, we have $\beta v_1 - V_m H_m y_m = 0$, which leads to

$$b - A x_m = - h_{m+1, m} (e_m^T y_m)\, v_{m+1}.$$

Hence we have

$$\| b - A x_m \|_2 = h_{m+1, m} \, | e_m^T y_m |.$$
A method based on this approach is called the Full Orthogonalization Method (FOM).
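The residual identity above can be checked numerically with the Arnoldi routine defined earlier; the sketch below (with an arbitrarily chosen test matrix) calls Arnoldi with $m + 1$ steps so that the entry $h_{m+1,m}$ is available as H[m+1, m].

```julia
using LinearAlgebra

n, m = 100, 20
A = diagm(1.0:n) + 0.05 * randn(n, n)   # mildly nonsymmetric test matrix (illustrative choice)
b = rand(n); x0 = zeros(n)
r0 = b - A * x0
β = norm(r0)

V, H = Arnoldi(A, r0, m + 1; orthogonalization = "MGS")
y  = H[1:m, 1:m] \ [β; zeros(m - 1)]    # FOM: solve H_m y_m = β e_1
xm = x0 + V[:, 1:m] * y

lhs = norm(b - A * xm)                  # true residual norm
rhs = H[m+1, m] * abs(y[m])             # h_{m+1,m} |e_m' y_m|
println("residual norm = $lhs, predicted = $rhs")   # should agree up to round-off
```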
A rough estimate of the cost of FOM is determined as follows. If $N_z$ is the number of nonzero elements of $A$, then $m$ steps of the Arnoldi process will require $m$ matrix-vector products at the cost of $2 m N_z$. Each of the Gram-Schmidt steps costs approximately $4 j n$ operations, which brings the total over the $m$ steps to approximately $2 m^2 n$. Thus, on average, a step of FOM (with CGS or MGS) costs approximately $2 N_z + 2 m n$ flops.
Regarding storage, $m$ vectors of length $n$ are stored in $V_m$. Additional vectors must be used to keep the current solution and the right-hand side vector, and a scratch vector for the matrix-vector product. In addition, the Hessenberg matrix $H_m$ must be saved. Thus, the total storage is approximately $(m + 3) n + \tfrac{1}{2} m^2$. In most cases $m$ is small relative to $n$, so this cost is dominated by the first term.
@doc""" FOM(A, b, m; ini, orthogonalization) Full Orthogonalization Method (FOM) for solving linear systems Ax = b. """ function FOM(A::Matrix{T}, b::Vector{T}, m::Int; ini::Vector{T} = zeros(T, size(A, 1)), orthogonalization::String = "MGS" ) where T r = b - A * ini if norm(r) == 0 return ini end V, H = Arnoldi(A, r, m; orthogonalization = orthogonalization) y = H \ [norm(r); zeros(T, m-1)] x = ini + V * y return x end
@doc""" RFOM(A, b; ini, orthogonalization, tol) Restarted Full Orthogonalization Method (RFOM) for solving linear systems Ax = b. """ function RFOM(A::Matrix{T}, b::Vector{T}; ini::Vector{T} = zeros(T, size(A, 1)), orthogonalization::String = "MGS", tol::Float64 = 1e-8 ) where T x = ini m = 1 r = b - A * x while maximum(abs, r) > tol && m < size(A, 1) x = FOM(A, b, m; ini = x, orthogonalization = orthogonalization) r = b - A * x m += 1 end return x end
```julia
using LinearAlgebra

# GMRES algorithm
function GMRES(A::AbstractMatrix{T}, b::AbstractVector{T}, tol=1e-6, max_iter=1000) where T
    n = size(A, 2)
    K = 0

    # Initial guess
    x = zeros(T, n)
    r0 = b - A * x        # Initial residual
    beta = norm(r0)

    # If the initial residual is less than the tolerance, return the zero solution
    if beta < tol
        return x
    end

    # Orthogonalization process (Arnoldi with modified Gram-Schmidt)
    Q = zeros(T, n, max_iter+1)         # Store orthogonal basis
    H = zeros(T, max_iter+1, max_iter)  # Upper Hessenberg matrix
    Q[:, 1] = r0 / beta                 # First orthogonal basis vector

    for k = 1:min(max_iter, n)          # the Krylov dimension cannot exceed n
        # Compute A * q_k
        v = A * Q[:, k]
        # Orthogonalization step
        for j = 1:k
            H[j, k] = dot(Q[:, j], v)
            v -= H[j, k] * Q[:, j]
        end
        # Compute the norm of the remaining vector
        H[k+1, k] = norm(v)
        if H[k+1, k] < tol
            # If the norm is small enough, stop early
            K = k
            break
        end
        # Update orthogonal basis
        Q[:, k+1] = v / H[k+1, k]
        K = k
    end

    # Form the right-hand side vector beta * e_1
    k = K
    g = [beta; zeros(T, k)]

    # Solve the (k+1) x k least-squares problem min ||g - H_bar * y|| for y
    y = H[1:k+1, 1:k] \ g
    x += Q[:, 1:k] * y
    return x
end
```
```julia
# Test code
if isinteractive()   # Only execute this part of the code in interactive mode
    A::Matrix{Float32} = [4 -1 0 0; -1 4 -1 0; 0 -1 4 -1; 0 0 -1 3]
    b::Vector{Float32} = [15, 10, 10, 10]
    @time x_gmres = GMRES(A, b)
    println("GMRES solution: ", x_gmres)
    println("Ax = ", A * x_gmres)
end
```