10. Nonlinear Optimization (Multiple Variables)
10.1 Overview
This section addresses the problem of finding the minimum point of a multivariable nonlinear function (or, by reversing the sign, the maximum point):
\[
\min f(x_1, x_2, \dots, x_n)
\]
If \(x_1, x_2, \dots, x_n\) are taken to be the elements of the column vector \(\boldsymbol{x}\), the problem can be written concisely as:
\[
\min f(\boldsymbol{x})
\]
Similar to the single-variable case, we assume that the target region contains only one minimum point, meaning the method focuses solely on the vicinity of that specific minimum. This is referred to as local optimization or finding a local minimum.
10.2 Methods for Nonlinear Optimization of Multivariable Functions
10.2.1 Descent Method
A descent method finds the minimum point iteratively by an update of the form:
\[
\boldsymbol{x_{k+1}} = \boldsymbol{x_k} + \alpha_k\boldsymbol{d_k}
\]
where \(\boldsymbol{d_k}\) is called the direction vector, and \(\alpha_k\) the step size.
This process is visualized below. First, a direction vector \(\boldsymbol{d_k}\) is chosen pointing toward the expected minimum. Then, the step size \(\alpha_k\) is determined by single-variable optimization along that direction. Repeating this iteration moves the point progressively closer to the true minimum.
10.2.1.1 Steepest Descent Method
The steepest descent method chooses the direction vector as the negative gradient, \(\boldsymbol{d_k} = -\nabla f(\boldsymbol{x_k})\), the direction in which \(f\) decreases most rapidly. If the contour lines are nearly circular, convergence to the minimum is quick, but in practical problems convergence is often much slower than expected.
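As an illustration of the iteration \(\boldsymbol{x_{k+1}} = \boldsymbol{x_k} + \alpha_k\boldsymbol{d_k}\) with the steepest-descent direction, here is a minimal VBA sketch. It is not an XLPack routine: the test function \(f(x_1, x_2) = x_1^2 + x_1x_2 + 2x_2^2\) (minimum at the origin) is a hypothetical example, and a small fixed step size stands in for the single-variable optimization of \(\alpha_k\).
Sub SteepestDescentDemo()
    Dim X1 As Double, X2 As Double, G1 As Double, G2 As Double
    Dim Alpha As Double, K As Long
    '--- Hypothetical test function f = x1^2 + x1*x2 + 2*x2^2, minimum at (0, 0)
    X1 = 3: X2 = -2                          ' initial point
    Alpha = 0.1                              ' fixed step size (illustration only)
    For K = 1 To 100
        G1 = 2 * X1 + X2: G2 = X1 + 4 * X2   ' gradient of f
        X1 = X1 - Alpha * G1                 ' x(k+1) = x(k) + alpha * d with d = -grad f
        X2 = X2 - Alpha * G2
    Next
    Debug.Print X1, X2                       ' approaches (0, 0)
End Sub
In practice, a line search would determine \(\alpha_k\) at every step instead of the fixed value used here, which typically reduces the number of iterations considerably.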
10.2.1.2 Newton’s Method
If the function \(f\) is differentiable, its gradient is defined as:
\[
\nabla f(\boldsymbol{x}) =
\begin{pmatrix}
{\partial}f(\boldsymbol{x})/{\partial}x_1 \\
{\partial}f(\boldsymbol{x})/{\partial}x_2 \\
\vdots \\
{\partial}f(\boldsymbol{x})/{\partial}x_n \\
\end{pmatrix}
\]
If \(f\) is twice differentiable, its Hessian matrix is defined as:
\[
\boldsymbol{H}(\boldsymbol{x}) =
\begin{pmatrix}
{\partial}^2f(\boldsymbol{x})/{{\partial}x_1{\partial}x_1} & {\partial}^2f(\boldsymbol{x})/{{\partial}x_2{\partial}x_1} & \cdots & {\partial}^2f(\boldsymbol{x})/{{\partial}x_n{\partial}x_1} \\
{\partial}^2f(\boldsymbol{x})/{{\partial}x_1{\partial}x_2} & {\partial}^2f(\boldsymbol{x})/{{\partial}x_2{\partial}x_2} & \cdots & {\partial}^2f(\boldsymbol{x})/{{\partial}x_n{\partial}x_2} \\
\vdots & \vdots & \ddots & \vdots \\
{\partial}^2f(\boldsymbol{x})/{{\partial}x_1{\partial}x_n} & {\partial}^2f(\boldsymbol{x})/{{\partial}x_2{\partial}x_n} & \cdots & {\partial}^2f(\boldsymbol{x})/{{\partial}x_n{\partial}x_n} \\
\end{pmatrix}
\]
If the second derivatives of \(f\) are continuous, \(\boldsymbol{H}(\boldsymbol{x})\) is a symmetric matrix; at a local minimum it is also positive semidefinite, and positive definite in the typical case of a nondegenerate minimum.
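As a small worked example with the hypothetical quadratic used in the steepest-descent sketch above, \(f(x_1, x_2) = x_1^2 + x_1x_2 + 2x_2^2\) gives
\[
\nabla f(\boldsymbol{x}) =
\begin{pmatrix}
2x_1 + x_2 \\
x_1 + 4x_2 \\
\end{pmatrix}, \qquad
\boldsymbol{H}(\boldsymbol{x}) =
\begin{pmatrix}
2 & 1 \\
1 & 4 \\
\end{pmatrix}
\]
Here the Hessian is constant, symmetric, and positive definite.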
To find the local minimum, Newton’s method solves \(\nabla f(\boldsymbol{x}) = 0\) as a nonlinear system of equations, producing the iteration formula:
\[
\begin{aligned}
& \boldsymbol{d_k} = -\boldsymbol{H}(\boldsymbol{x_k})^{-1}\nabla f(\boldsymbol{x_k}) \\
& \boldsymbol{x_{k+1}} = \boldsymbol{x_k} + \boldsymbol{d_k}
\end{aligned}
\]
Here, \(\boldsymbol{d_k}\) is the descent direction, often called the Newton direction. The pure Newton method uses \(\alpha_k = 1\), but in practice a single-variable optimization (line search) along the Newton direction is usually performed to determine \(\alpha_k\).
Newton’s method requires caution because computing the Hessian matrix can be computationally expensive, and the Hessian matrix may lose its positive-definiteness during iterations.
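The following minimal sketch (not an XLPack routine) applies one Newton step to the hypothetical quadratic introduced above, inverting the 2x2 Hessian explicitly. Because the function is quadratic, the single step with \(\alpha_k = 1\) lands exactly on the minimum (0, 0).
Sub NewtonStepDemo()
    Dim X1 As Double, X2 As Double, G1 As Double, G2 As Double
    Dim H11 As Double, H12 As Double, H22 As Double
    Dim Det As Double, D1 As Double, D2 As Double
    '--- Hypothetical test function f = x1^2 + x1*x2 + 2*x2^2
    X1 = 3: X2 = -2                          ' initial point
    G1 = 2 * X1 + X2: G2 = X1 + 4 * X2       ' gradient
    H11 = 2: H12 = 1: H22 = 4                ' Hessian (constant for a quadratic)
    Det = H11 * H22 - H12 * H12
    D1 = -(H22 * G1 - H12 * G2) / Det        ' Newton direction d = -H^(-1) * grad f
    D2 = -(H11 * G2 - H12 * G1) / Det
    X1 = X1 + D1: X2 = X2 + D2               ' x(k+1) = x(k) + d  (alpha = 1)
    Debug.Print X1, X2                       ' prints 0 and 0
End Sub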
10.2.1.3 Quasi-Newton Method
To avoid these difficulties, quasi-Newton methods replace the exact Hessian \(\boldsymbol{H}(\boldsymbol{x_k})\) with a positive-definite approximation \(\boldsymbol{H_k}\) that is updated at each iteration.
A widely used approach is the BFGS (Broyden-Fletcher-Goldfarb-Shanno) update:
\[
\boldsymbol{H_{k+1}} = \boldsymbol{H_k} - \boldsymbol{H_k}\boldsymbol{s_k}(\boldsymbol{H_k}\boldsymbol{s_k})^T/(\boldsymbol{s_k}^T\boldsymbol{H_k}\boldsymbol{s_k}) + \boldsymbol{y_k}\boldsymbol{y_k}^T/(\boldsymbol{s_k}^T\boldsymbol{y_k})
\]
where \(\boldsymbol{s_k} = \boldsymbol{x_{k+1}} - \boldsymbol{x_k}\) and \(\boldsymbol{y_k} = \nabla f(\boldsymbol{x_{k+1}}) - \nabla f(\boldsymbol{x_k})\).
Let \(\boldsymbol{H_0}\) be an appropriate positive definite symmetric matrix, for example the identity matrix, and apply this recurrence formula iteratively.
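A minimal sketch of this update for general n is shown below (an illustration only, not an XLPack routine; Optif0 applies an update of this kind internally). It assumes zero-based arrays, as in the VBA examples later in this section, and \(\boldsymbol{s_k}^T\boldsymbol{y_k} > 0\), the condition under which the update preserves positive definiteness.
Sub BFGSUpdate(N As Long, H() As Double, S() As Double, Y() As Double)
    '--- Overwrite H (N x N, indexed 0..N-1) with the BFGS update,
    '    given s = x(k+1) - x(k) and y = grad f(x(k+1)) - grad f(x(k))
    Dim Hs() As Double, SHs As Double, SY As Double
    Dim I As Long, J As Long
    ReDim Hs(N - 1)
    SHs = 0: SY = 0
    For I = 0 To N - 1
        Hs(I) = 0
        For J = 0 To N - 1
            Hs(I) = Hs(I) + H(I, J) * S(J)   ' Hs = H * s
        Next
        SHs = SHs + S(I) * Hs(I)             ' s^T H s
        SY = SY + S(I) * Y(I)                ' s^T y (must be positive)
    Next
    For I = 0 To N - 1
        For J = 0 To N - 1
            H(I, J) = H(I, J) - Hs(I) * Hs(J) / SHs + Y(I) * Y(J) / SY
        Next
    Next
End Sub
Starting from \(\boldsymbol{H_0}\) equal to the identity matrix, such a routine would be called once per iteration with the latest \(\boldsymbol{s_k}\) and \(\boldsymbol{y_k}\).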
10.2.2 Trust Region Method
A trust region is a ball (a circle in two dimensions) of radius \(\delta_k\) centered at the current point \(\boldsymbol{x_k}\).
Using a second-order Taylor expansion about \(\boldsymbol{x_k}\), a local quadratic model \(q_k(\boldsymbol{s})\) is constructed:
\[
q_k(\boldsymbol{s}) = f(\boldsymbol{x_k}) + \boldsymbol{s}^T \nabla f(\boldsymbol{x_k}) + (1/2)\boldsymbol{s}^T\boldsymbol{H_k}\boldsymbol{s}
\]
where \(\boldsymbol{H_k}\) is either the Hessian matrix at \(\boldsymbol{x_k}\) or an approximation to it, such as the quasi-Newton matrix.
The trust region radius \(\delta_k\) reflects the domain where the quadratic approximation is expected to be valid.
The trust region method minimizes \(q_k(\boldsymbol{s})\) over the region to determine \(\boldsymbol{s_k}\), and updates the point as \(\boldsymbol{x_{k+1}} = \boldsymbol{x_k} + \boldsymbol{s_k}\). In terms of the descent method, this determines the direction and the step size simultaneously.
If \(\boldsymbol{H_k}\) is positive definite and \(||\boldsymbol{H_k}^{-1}\nabla f(\boldsymbol{x_k})|| \leq \delta_k\), the minimizer of the model lies inside the trust region and is given by \(\boldsymbol{s_k} = -\boldsymbol{H_k}^{-1}\nabla f(\boldsymbol{x_k})\). Otherwise, the minimizer of the model lies on the boundary of the trust region; the minimum point on that boundary is then computed and adopted as \(\boldsymbol{s_k}\).
The trust-region radius \(\delta_k\) is dynamically expanded or reduced based on convergence progress.
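A common way to drive this adjustment (the exact thresholds vary between implementations) is to compare the actual reduction in \(f\) with the reduction predicted by the model:
\[
\rho_k = (f(\boldsymbol{x_k}) - f(\boldsymbol{x_k} + \boldsymbol{s_k}))/(q_k(\boldsymbol{0}) - q_k(\boldsymbol{s_k}))
\]
If \(\rho_k\) is small (for example, below about 1/4), the quadratic model is not trusted and \(\delta_k\) is reduced (the step may also be rejected); if \(\rho_k\) is close to 1 and \(\boldsymbol{s_k}\) reaches the trust-region boundary, \(\delta_k\) is enlarged.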
10.3 Solving Nonlinear Optimization Problems for Multivariable Functions Using XLPack
For nonlinear optimization problems, you can use the VBA subroutine Optif0, which applies the quasi-Newton method. This subroutine approximates derivatives (gradient and Hessian matrix) using finite differences and the BFGS formula, eliminating the need for users to compute derivatives manually. Optif0 can also be accessed through the XLPack solver.
The solution found is a local minimum; global minimum points cannot, in general, be determined. Multiple local minima may exist, so choosing an initial value close to the desired minimum is essential to ensure convergence to the intended solution.
Example Problem
Find the minimum point of the following function:
\[
f(x_1, x_2) = 100(x_2 – x_1^2)^2 + (1 – x_1)^2
\]
The function shape is shown below (the vertical axis is logarithmic):
This function, known as the Rosenbrock function, is commonly used as a test problem: the minimum point (1, 1) is sought starting from the initial point (-1.2, 1). Although an initial value closer to the minimum would normally be preferred, the test intentionally starts from a more distant point. The function features a deep curved valley, which makes it challenging to reach the minimum.
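For reference, differentiating gives the gradient
\[
\nabla f(x_1, x_2) =
\begin{pmatrix}
-400x_1(x_2 - x_1^2) - 2(1 - x_1) \\
200(x_2 - x_1^2) \\
\end{pmatrix}
\]
which vanishes only at (1, 1), so that is the sole stationary point and the global minimum. (Optif0 approximates this gradient internally by finite differences, so it does not need to be coded.)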
10.3.1 Solving Using VBA Program (1)
Below is an example VBA program using Optif0.
Sub F(N As Long, X() As Double, Fval As Double)
    '--- Rosenbrock function
    Fval = 100 * (X(1) - X(0) ^ 2) ^ 2 + (1 - X(0)) ^ 2
End Sub

Sub Start()
    Const NMax = 10, Ndata = 5
    Dim N As Long, X(NMax) As Double, Xpls(NMax) As Double, Fpls As Double
    Dim Info As Long, I As Long
    N = 2
    For I = 0 To Ndata - 1
        '--- Input data
        X(0) = Cells(6 + I, 1): X(1) = Cells(6 + I, 2)
        '--- Find min point of equation
        Call Optif0(N, X(), AddressOf F, Xpls(), Fpls, Info)
        '--- Output result
        Cells(6 + I, 3) = Xpls(0): Cells(6 + I, 4) = Xpls(1): Cells(6 + I, 5) = Info
    Next
End Sub
An external subroutine defines the function, and Optif0 is called with the initial values.
Optif0 iteratively computes the closest minimum starting from the given initial estimate. This program tests five different initial values to examine the solutions obtained.
The tested initial values include (-1.2, 1), (-1, 1), (-1, -1), (0, 1), and (0, 0).
Running this program yielded the following results:
Some initial values, like (-1, 1), failed to converge.
10.3.2 Solving Using VBA Program (2)
Below is an example using Optif0_r, the reverse communication version (RCI).
Sub Start()
    Const NMax = 10, Ndata = 5
    Dim N As Long, X(NMax) As Double, Xpls(NMax) As Double, Fpls As Double
    Dim Info As Long, I As Long
    Dim XX(NMax - 1) As Double, YY As Double, IRev As Long
    N = 2
    For I = 0 To Ndata - 1
        '--- Input data
        X(0) = Cells(6 + I, 1): X(1) = Cells(6 + I, 2)
        '--- Find min point of equation
        IRev = 0
        Do
            Call Optif0_r(N, X(), Xpls(), Fpls, Info, XX(), YY, IRev)
            If IRev = 1 Then YY = 100 * (XX(1) - XX(0) ^ 2) ^ 2 + (1 - XX(0)) ^ 2
        Loop While IRev <> 0
        '--- Output result
        Cells(6 + I, 3) = Xpls(0): Cells(6 + I, 4) = Xpls(1): Cells(6 + I, 5) = Info
    Next
End Sub
Instead of using an external function, this version computes the function value in the calling program whenever Optif0_r returns with IRev = 1, assigns it to YY, and calls Optif0_r again; the loop ends when IRev returns 0. More details on RCI are available here.
Running this program produced the same results.
10.3.3 Solving Using the Solver
The XLPack Solver Add-in can also solve the optimization problem using the “Multivariate Nonlinear Optimization” function. The formula (=100*(C8-B8^2)^2+(1-B8)^2) is entered in cell B9.
For more information about the solver, refer to this link.