Calculus of Variations

These notes are based on the excellent book by van Brunt [1]. Most equations stated below require additional conditions (e.g., Jacobians being non-zero) to be well-posed and/or to have unique solutions. To keep the exposition short, I do not mention these conditions and assume that they hold.

It is customary to start a discussion of the calculus of variations with some sample problems. The study of the calculus of variations was initiated by the demands of classical mechanics. Below are some problems which are considered classical in this area.

Some Sample Problems

Catenary Problem:

Suppose that we wish to hang a wire with end points \((x_0,y_0)\) and \((x_1,y_1)\) so as to minimize the total potential energy. Taking \(y(x)\) as the height of the wire at point \(x\), and \(m\) as the density, we can write the problem as

\[\displaystyle\min_{y\in C^1[x_0,x_1]:y(x_0)=y_0, y(x_1)=y_1}\int_{x_0}^{x_1}my(x)\sqrt{1+(y'(x))^2}dx.\]

Obviously, one can get rid of \(m\) as it is just a constant multiplier.

The optimal solution of the above formulation can be a wire of any length. In the real world, we are generally given a wire of length \(L\) and seek the solution under this additional constraint (generally referred to as an isoperimetric constraint).

Catenoid Problem: This problem can be thought of as a variant of the catenary problem. Suppose we wish to fix a wire with end points as above and rotate it around the \(x\)-axis. What would be the optimal \(y(x)\) that minimizes the surface area of the resulting 3D object? This problem can be formulated as

\[\displaystyle\min_{y\in C^1[x_0,x_1]:y(x_0)=y_0, y(x_1)=y_1}\int_{x_0}^{x_1}2\pi y(x)\sqrt{1+(y'(x))^2}dx.\]

This differs from the catenary problem only by a constant multiple and hence has the same optimizer.

Brachistochrone Problem

This problem is also similar in nature to the catenary problem. Again, our wire is hung with end points \((x_0,y_0)\) and \((x_1,y_1)\), where we now assume \(x_0<x_1\) and \(y_0>y_1\). We wish to place a bead on the wire and let it slide from \((x_0,y_0)\) to \((x_1,y_1)\). Our objective is to find the shape of the wire which minimizes the time to reach the bottom point. In fact, the name comes from the Greek "βράχιστος χρόνος", meaning "shortest time". To formulate this problem we need to find the speed of the bead when it reaches the point \((x,y(x))\), which can be done simply using conservation of energy:

\(mgy_0-mgy(x)=\frac{1}{2}mv(x)^2\Rightarrow v(x)=\sqrt{2g(y_0-y(x))}\).

Now, the arclength between \(x\) and \(x+dx\) is equal to \(\sqrt{1+(y'(x))^2}dx\), so traversing it takes time \(\frac{\sqrt{1+(y'(x))^2}dx}{\sqrt{2g(y_0-y(x))}}\). Integrating, we can formulate our problem as

\[\displaystyle\min_{y\in C^1[x_0,x_1]:y(x_0)=y_0, y(x_1)=y_1}\int_{x_0}^{x_1}\frac{\sqrt{1+(y'(x))^2}}{\sqrt{2g(y_0-y(x))}}dx.\]

By the change of variables \(w(x)=y_0-y(x)\) (so that \(w'(x)=-y'(x)\) and \((w'(x))^2=(y'(x))^2\)), we can formulate an equivalent problem as

\[\displaystyle\min_{w\in C^1[x_0,x_1]:w(x_0)=0, w(x_1)=y_0-y_1}\int_{x_0}^{x_1}\frac{\sqrt{1+(w'(x))^2}}{\sqrt{2gw(x)}}dx.\]

Cycloids: The brachistochrone problem is solved by a curve called a cycloid. This curve was known before the brachistochrone problem was posed and has the property that wherever on the curve you release the bead (with zero velocity), it takes the same amount of time for the bead to reach the end point. This property earns cycloids the name isochrone.
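For reference (this parametric form is standard, but it is not derived in these notes), in the \(w\)-coordinates above the minimizing cycloid can be written as

\[x(\theta)=x_0+a(\theta-\sin\theta),\qquad w(\theta)=a(1-\cos\theta),\qquad \theta\in[0,\theta_1],\]

where the constants \(a\) and \(\theta_1\) are chosen so that the curve passes through the second endpoint \((x_1, y_0-y_1)\).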

Hamilton’s Principle

Suppose a particle is in \(\mathbb{R}^3\); then the forces acting on this particle can be calculated by differentiating the so-called potential energy \(V\). Also, suppose that the kinetic energy of the particle is \(T=\frac{1}{2}m((x'(t))^2+(y'(t))^2+(z'(t))^2)\). Then, we define \(L=T-V\) as the Lagrangian. If a particle moves from a point \(r(t_0)\) to \(r(t_1)\), Hamilton’s principle states that the path taken should be a stationary point of \(\int_{t_0}^{t_1}L(t)dt\).
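As a simple illustration (not taken from the notes), consider a particle of mass \(m\) moving vertically under uniform gravity, so that \(V=mgz\). The Lagrangian is

\[L=\frac{1}{2}m(z'(t))^2-mgz(t),\]

and making \(\int_{t_0}^{t_1}L\,dt\) stationary (via the Euler-Lagrange equations derived in the next section) recovers Newton’s law \(mz''=-mg\).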

Necessary Conditions Using the First Variation: Euler-Lagrange Equations

This section assumes familiarity with why the necessary condition \(\nabla F(x)=0\) for optimality of unconstrained problems in finite dimensions holds. The condition relies on the fact that, at a stationary point, small variations lead to zero change (in the sense of a derivative).

So, let \(y(x)\) be a stationary point; then any small change \(\epsilon \eta(x)\) satisfies something similar to the gradient condition above. Here, we are assuming that the boundary conditions are given, which implies that \(\eta(x)=0\) on the boundary. If

\[J(y)=\int_{x_0}^{x_1}f(x,y,y')dx\]

is the objective we are trying to minimize (or maximize), then

\(\displaystyle\frac{J(y+\epsilon\eta)-J(y)}{\epsilon}=\frac{\int_{x_0}^{x_1}\left(f(x,y+\epsilon\eta,y'+\epsilon\eta')-f(x,y,y')\right)dx}{\epsilon}\to 0\), as \(\epsilon\to 0.\)

Now, using the first order Taylor approximation, this implies

\[\int_{x_0}^{x_1}\left(\eta\,\partial_y f +\eta'\,\partial_{y'}f\right)dx=0.\]

Now, apply the integration-by-parts formula to the second term, using the fact that \(\eta(x)=0\) on the boundary:

\[\int_{x_0}^{x_1}\eta\left(\partial_y f-\frac{d}{dx}\partial_{y'}f\right)dx=0\]

for all admissible \(\eta\), which in turn implies

\(\partial_y f-\frac{d}{dx}\partial_{y'}f=0\).

These are called the Euler-Lagrange equations. It is not guaranteed that these equations will have a (unique) solution. There might be anomalies, which can be analyzed using the approaches to the existence and uniqueness of boundary-value problems.
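As a worked example (standard, but not spelled out in these notes), take the catenary/catenoid integrand \(f(x,y,y')=y\sqrt{1+(y')^2}\). Since \(f\) has no explicit \(x\)-dependence, the Euler-Lagrange equation implies the first integral \(f-y'\partial_{y'}f=\mathrm{const}\), because

\[\frac{d}{dx}\left(f-y'\partial_{y'}f\right)=\partial_x f+y'\left(\partial_y f-\frac{d}{dx}\partial_{y'}f\right)=0.\]

For this \(f\),

\[f-y'\partial_{y'}f=y\sqrt{1+(y')^2}-\frac{y(y')^2}{\sqrt{1+(y')^2}}=\frac{y}{\sqrt{1+(y')^2}}=c,\]

and separating variables gives the catenary \(y=c\cosh\left(\frac{x-a}{c}\right)\), with \(a\) and \(c\) determined by the boundary conditions.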

Inverse Problems

The question can also be started from the Euler-Lagrange equations and traced backwards: given a differential equation, can we find an objective function whose Euler-Lagrange equation is the one that is given? This is called the inverse problem.
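For instance, the equation \(y''+y=0\) is the Euler-Lagrange equation of \(J(y)=\int \frac{1}{2}\left((y')^2-y^2\right)dx\), since \(\partial_y f-\frac{d}{dx}\partial_{y'}f=-y-y''\). A quick symbolic check is sketched below; it uses SymPy's `euler_equations` helper, which is my choice and not part of the notes.

```python
# Sketch: verify that f = (y'^2 - y^2)/2 has y'' + y = 0 as its
# Euler-Lagrange equation (the use of SymPy is an assumption of this example).
from sympy import Function, Rational, Symbol
from sympy.calculus.euler import euler_equations

x = Symbol('x')
y = Function('y')

# Candidate integrand f(x, y, y') = (y'^2 - y^2) / 2
f = Rational(1, 2) * (y(x).diff(x)**2 - y(x)**2)

print(euler_equations(f, y(x), x))
# Expected: [Eq(-y(x) - Derivative(y(x), (x, 2)), 0)], i.e. y'' + y = 0.
```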

Constrained Problems

Sometimes additional constraints are imposed on the optimization problem, just as in optimization problems in \(\mathbb{R}^n\). They are dealt with in the same fashion in variational problems, using Lagrange multipliers. An example of such problems is the isoperimetric problem.

It turns out that some constraints are easier to deal with than others. If a constraint does not involve the derivative, i.e., it can be written as \(g(x,y)=0\), it is called holonomic, and the problem simplifies greatly compared to \(g(x,y,y')=0\), a nonholonomic constraint. For holonomic constraints, simple conditions on the derivatives of \(g\) and \(L\) are enough to guarantee that a Lagrange multiplier exists. For nonholonomic constraints the situation is more difficult.
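For example (a standard treatment, stated here without derivation), the catenary problem with the length constraint \(\int_{x_0}^{x_1}\sqrt{1+(y')^2}dx=L\) is handled by introducing a multiplier \(\lambda\) and applying the Euler-Lagrange equations to the augmented integrand

\[(y+\lambda)\sqrt{1+(y')^2},\]

which, by the same first-integral argument as in the catenary example above, yields \(y+\lambda=c\cosh\left(\frac{x-a}{c}\right)\); the constants \(a\), \(c\) and \(\lambda\) are fixed by the two boundary conditions and the length constraint.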

Variable Endpoints

So far, we have assumed that the endpoints \((x_0,y_0)\) and \((x_1, y_1)\) are given. This assumption is crucial once we perform the integration by parts. What happens if we relax it? The assumption can be relaxed in two steps. In the first step, we will assume that we know \(x_0\) and \(x_1\) but are free to choose the values of \(y\) at the endpoints.

Known \(x_0\) and \(x_1\)

Again, performing the same steps

\[\begin{array}{rl} J(y+\epsilon\eta)-J(y)&=\displaystyle\int_{x_0}^{x_1}f(x,y+\epsilon \eta(x),y'+\epsilon\eta'(x))-f(x,y,y')dx\\ &\approx \displaystyle\int_{x_0}^{x_1}\epsilon(\eta(x)\partial_yf + \eta'(x)\partial_{y'}f)dx\\ &=\displaystyle\left.\epsilon\eta(x)\partial_{y'}f\right|_{x_0}^{x_1}+\int_{x_0}^{x_1}\epsilon\eta(x)\left(\partial_yf - \frac{d}{dx}\partial_{y'}f\right)dx. \end{array}\]

The boundary conditions assure that the first term is 0 when \(y_0\) and \(y_1\) are prescribed, since then \(\eta(x_0)=\eta(x_1)=0\). However, when they are not given, \(\eta(x_0)\) and \(\eta(x_1)\) can be arbitrary; choosing \(\eta\) to vanish at one endpoint but not the other, we see that making the first term 0 for all such \(\eta\) requires

\[\left.\partial_{y'}f\right|_{x_0}=\left.\partial_{y'}f\right|_{x_1}=0.\]

These, together with the usual Euler-Lagrange equations, characterize the necessary conditions.
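As a small illustration (not from the notes), consider

\[J(y)=\int_0^1\frac{1}{2}(y'(x))^2dx,\qquad y(0)=0,\quad y(1)\ \mbox{free}.\]

The Euler-Lagrange equation gives \(y''=0\), so \(y(x)=ax\), and the natural boundary condition \(\left.\partial_{y'}f\right|_{x=1}=y'(1)=a=0\) leaves only \(y\equiv 0\) as a stationary point.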

General Case

This case is considerably more difficult than the one above. Again, fix \((x_0,y_0)\) and \((x_1,y_1)\) and consider a small variation around these points, \((\hat{x}_0,\hat{y}_0)\) and \((\hat{x}_1,\hat{y}_1)\), where \(\hat{x}_i=x_i+\epsilon X_i\) and \(\hat{y}_i=y_i+\epsilon Y_i\) for \(i=0,1\). Again write the difference:

\[\begin{array}{rl} \displaystyle\int_{\hat{x}_0}^{\hat{x}_1}f(x,\hat{y},\hat{y}')dx-\int_{x_0}^{x_1}f(x,y,y')dx&\approx \displaystyle\epsilon \left(\left.\eta(x)\partial_{y'}f\right|_{x_0}^{x_1}+\int_{x_0}^{x_1}\eta(x)\left(\partial_yf - \frac{d}{dx}\partial_{y'}f\right)dx\right.\\&\quad\quad\quad+\left.\displaystyle X_1f(x_1,y(x_1),y'(x_1))-X_0f(x_0, y(x_0), y'(x_0))\vphantom{\int_{x_0}^{x_1}}\right). \end{array}\label{eq:expansion}\]

The trick here is to realize that, for \(i=0,1\),

\[\hat{y}_i=y_i+\epsilon Y_i=\hat{y}(\hat{x}_i)= y_i+\epsilon X_iy'(x_i)+\epsilon\eta(x_i)+\epsilon^2 X_i\eta'(x_i)+O(\epsilon^2).\]

The term \(\epsilon^2 X_i\eta'(x_i)\) is \(O(\epsilon^2)\) and can be absorbed into the error term. Hence, we have

\[\eta(x_i)=Y_i-X_iy'(x_i)+O(\epsilon).\]

Now, substituting this expression for \(\eta(x_i)\) into the first term on the right-hand side of \((\ref{eq:expansion})\) gives

\[\begin{array}{rl} \displaystyle\int_{\hat{x}_0}^{\hat{x}_1}f(x,\hat{y},\hat{y}')dx-\int_{x_0}^{x_1}f(x,y,y')dx&\approx \displaystyle\epsilon \left(\int_{x_0}^{x_1}\eta(x)\left(\partial_yf - \frac{d}{dx}\partial_{y'}f\right)dx+Y_1\left.\frac{\partial f}{\partial y'}\right|_{x_1}-Y_0\left.\frac{\partial f}{\partial y'}\right|_{x_0}\right.\\&\quad\quad\quad+\left.\displaystyle X_1\left.\left(f-y'\frac{\partial f}{\partial y'}\right)\right|_{x_1}-X_0\left.\left(f-y'\frac{\partial f}{\partial y'}\right)\right|_{x_0}\vphantom{\int_{x_0}^{x_1}}\right). \end{array}\]

Our freedom to choose \(\eta(x)\), \(Y_i\) and \(X_i\) (\(i=0,1\)) implies that our necessary conditions are

\[\partial_y f-\frac{d}{dx}\partial_{y'}f=0\]

and on the boundary points

\[\frac{\partial f}{\partial y'}=0 \mbox{ and }f-y'\frac{\partial f}{\partial y'}=0\ \left(\mbox{hence also }f=0\right).\]

Transversality Conditions

The boundary conditions above can be thought of as

\[\left.p\delta_y-H\delta_x\right|_{x_0}^{x_1}=0,\]

where

\[p=\frac{\partial f}{\partial y'}, H=y'p-f, \delta_y(x_i)=Y_i \mbox{ and }\delta_x(x_i)=X_i, i=0,1.\]

Now, we can think of our problem as having two independent (decision) variables which can be parameterized as

\[r(s)=(x(s),y(s)).\]

Keeping this view in mind, we can write the conditions on boundary points as

\[\frac{dy}{ds}p-\frac{dx}{ds}H=0.\]

These conditions are known as the transversality conditions.
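As an illustration (a standard example, not worked out in the notes), suppose the right endpoint is only required to lie on a given curve \(y=g(x)\). Then \(\delta_y=g'(x)\delta_x\) along the curve, and the transversality condition at that endpoint becomes

\[p\,g'(x)-H=\frac{\partial f}{\partial y'}g'(x)+f-y'\frac{\partial f}{\partial y'}=0.\]

For the arclength integrand \(f=\sqrt{1+(y')^2}\) this reduces to \(y'g'=-1\): the extremal (a straight line) must meet the target curve at a right angle.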

The Hamiltonian Formulation

The Legendre Transform

Suppose that we have a strictly convex function \(f\). The simplest way to think of the function is obviously as the pairs \((x,f(x)).\) Suppose that we are instead given a pair \((p,H(p))\), where \(p\) corresponds to the slope of the tangent to the function at some point and \(H(p)\) is a function from which we can recover both the point \(x\) at which the tangent has slope \(p\) and the value of the function there. This new pair provides another way to characterize our function; that is, there is a one-to-one relationship between \((x,f(x))\) and \((p, H(p))\). To achieve this, we define \(H(p)=-f(x)+px\) (or more precisely \(H(p)=-f((f')^{-1}(p))+p(f')^{-1}(p)\)).

By taking the derivative, we see that

\[\frac{dH(p)}{dp}=(f')^{-1}(p) \quad\mbox{ and }\quad -H(p)+ p\frac{dH(p)}{dp}=f(x).\label{eq:HamiltonManip}\]

**Example:** Suppose that we wish to find the transformation for \(f(x)=x^2\). First, \((f')^{-1}(p)=p/2\), and therefore \(H(p)=-p^2/4+p^2/2=p^2/4.\) Now, we see the Legendre transformation as

\[(x,x^2)\rightarrow (p,p^2/4).\]
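As a quick consistency check against \((\ref{eq:HamiltonManip})\) (this check is mine, not from the notes):

\[\frac{dH(p)}{dp}=\frac{p}{2}=(f')^{-1}(p),\qquad -H(p)+p\frac{dH(p)}{dp}=-\frac{p^2}{4}+\frac{p^2}{2}=\frac{p^2}{4}=x^2=f(x),\]

since \(x=p/2\).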

The Hamiltonian

The objective function of a variational problem admits three arguments and is generally written as \(\int f(x,y,y')dx\). Here, there is a complicated relationship (the derivative) between the second and the third argument. Using the logic behind the Legendre transform, we can convert the problem into an equivalent form where the second and the third arguments are somewhat independent. To do so, fix \(x\) and \(y\), think of \(f(x,y,y')\) only as a function of \(y'\), and find the corresponding Legendre transform. Set \(p=\frac{\partial f}{\partial y'}\) and

\(H(x,y,p)=-f(x,y,(\frac{\partial f}{\partial y'})^{-1}(x,y,p))+(\frac{\partial f}{\partial y'})^{-1}(x,y,p)p\).

This can be done in \(n\) dimensions as well by setting \(p_i=\frac{\partial f}{\partial y_i'}.\) The function \(H\) is called the Hamiltonian. The function

\[f(x,\vec{y}, \vec{y'})=-H(x,\vec{y},\vec{p})+\sum_{i=1}^ny_i'p_i\]

is called the Lagrangian.
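For the mechanical Lagrangian of Hamilton's principle (a standard example, not carried out in the notes), take the independent variable to be time and \(f(t,y,y')=\frac{1}{2}m(y')^2-V(y)\) for a one-dimensional particle. Then

\[p=\frac{\partial f}{\partial y'}=my',\qquad H(t,y,p)=-f+y'p=\frac{p^2}{2m}+V(y),\]

i.e., the Hamiltonian is the total (kinetic plus potential) energy.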

Hamilton’s Equations

After we write the Hamiltonian, we have two sets of unknown functions, \((\vec{p},\vec{y})\).

The definition of the Hamiltonian implies that \(p_k=\frac{\partial f}{\partial y'_k}\) and also \(\frac{\partial H}{\partial y_k}=-\frac{\partial f}{\partial y_k}\), and the Euler-Lagrange equations state \(\frac{d}{dx}\frac{\partial f}{\partial y'_k}-\frac{\partial f}{\partial y_k}=0\). Combining these, we have

\(p_k'=-\frac{\partial H}{\partial y_k}\) and \(y_k'=\frac{\partial H}{\partial p_k}\) (using \((\ref{eq:HamiltonManip})\)).

These equations are called Hamilton’s equations.
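For example, for the harmonic oscillator Hamiltonian \(H=\frac{p^2}{2m}+\frac{1}{2}ky^2\) (a standard example, not in the notes), Hamilton's equations read \(y'=p/m\) and \(p'=-ky\), which together recover \(my''+ky=0\). A minimal numerical sketch of integrating these equations follows; the semi-implicit (symplectic) Euler scheme and the parameter values are my choices, not the notes'.

```python
# Sketch: integrate y' = dH/dp = p/m and p' = -dH/dy = -k*y
# with semi-implicit (symplectic) Euler; all parameters are illustrative.
m, k = 1.0, 1.0
y, p = 1.0, 0.0            # initial "position" and momentum
dx, steps = 0.01, 1000     # step size in the independent variable x

for _ in range(steps):
    p -= k * y * dx        # p' = -dH/dy
    y += (p / m) * dx      # y' =  dH/dp

# For m = k = 1 the exact flow keeps y^2 + p^2 = 1; the printed value
# should stay close to 1, reflecting approximate energy conservation.
print(y, p, y**2 + p**2)
```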

Symplectic Maps

In some cases, it is worthwhile to rewrite the problem with a change of coordinates in the Hamiltonian, using \(Y_i=Y_i(x,y,p)\), \(P_i=P_i(x,y,p)\) and expressing the Hamiltonian as \(\hat{H}(x,Y,P).\) A very interesting case is when the difference of the Lagrangians corresponding to these Hamiltonians is the derivative of a smooth function \(\phi(x,y,Y)\), i.e.,

\[\displaystyle\frac{d}{dx}\phi(x,y,Y)=\left(\sum_{i=1}^np_iy_i'-H(x,y,p)\right)-\left(\sum_{i=1}^nP_iY_i'-\hat{H}(x,Y,P)\right).\]

Using simple calculus (differential of a function), we also know that

\[\displaystyle\frac{d}{dx}\phi(x,y,Y)=\sum_{i=1}^n\left(\frac{\partial\phi}{\partial y_i}y_i'+\frac{\partial\phi}{\partial Y_i}Y_i'\right)+\frac{\partial\phi}{\partial x}.\]

Now, a term-by-term comparison of these two equations yields

\[p_i=\frac{\partial\phi}{\partial y_i},\quad P_i=-\frac{\partial\phi}{\partial Y_i}\mbox{ and }\hat{H}(x,Y,P)=H(x,y,p)+\frac{\partial\phi}{\partial x}.\]
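A simple concrete instance (standard, written here for the hypothetical choice \(n=1\)): take \(\phi(x,y,Y)=yY\). Then

\[p=\frac{\partial\phi}{\partial y}=Y,\qquad P=-\frac{\partial\phi}{\partial Y}=-y,\qquad \hat{H}(x,Y,P)=H(x,-P,Y),\]

so the map exchanges coordinates and momenta (up to a sign), and since \(\frac{\partial\phi}{\partial x}=0\) the new Hamiltonian is just the old one expressed in the new variables.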

Hamilton-Jacobi Equation

Now, suppose we can find a symplectic map, with \(\hat{H}\) and \(\phi\) as above, for our Hamiltonian system, where

\[\displaystyle\hat{H}=H+\frac{\partial\phi}{\partial x}=0.\]

Replacing \(p_i=\frac{\partial\phi}{\partial y_i}\), we have the so-called Hamilton-Jacobi (H-J) Equation as

\[\displaystyle H(x,y,\frac{\partial\phi}{\partial y_i})+\frac{\partial\phi}{\partial x}=0.\]

Now, the famous theorem of Hamilton and Jacobi can be stated as follows:

Theorem (Hamilton-Jacobi): If \(\phi(x,y,\alpha)\) is a solution to the H-J equation, then the solution to Hamilton's equations is given by \(\frac{\partial\phi}{\partial \alpha_i}=-\beta_i\) and \(\frac{\partial\phi}{\partial y_i}=p_i,\) where the \(\beta_i\) are arbitrary constants.

Using this theorem, we can solve a variational problem using the following procedure (a worked example follows the list):

  1. Find the Hamiltonian for the variational problem
  2. Solve the H-J equation to find \(\phi.\)
  3. Using the theorem, find \(y(x,\alpha, \beta)\) as a general solution.
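As a minimal worked example (mine, not from the notes), take \(f(x,y,y')=\frac{1}{2}(y')^2\), so that \(p=y'\) and \(H=\frac{1}{2}p^2\). The H-J equation is

\[\frac{1}{2}\left(\frac{\partial\phi}{\partial y}\right)^2+\frac{\partial\phi}{\partial x}=0,\]

and \(\phi(x,y,\alpha)=\alpha y-\frac{1}{2}\alpha^2 x\) is a solution depending on the parameter \(\alpha\). The theorem then gives

\[\frac{\partial\phi}{\partial \alpha}=y-\alpha x=-\beta\quad\Rightarrow\quad y=\alpha x-\beta,\qquad p=\frac{\partial\phi}{\partial y}=\alpha,\]

i.e., the extremals are straight lines with constant \(p\), as expected for this functional.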

[1] B. van Brunt (2004), The Calculus of Variations, Springer, New York.
