Deep Dive: Hessians
Recently, I was taught the second derivative test for multivariable functions, which goes something like this:
Let $(a, b)$ be a critical point of $f(x, y)$, and define $D$ to be $f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}(a, b)^2$. Then we have:
- If $D > 0$ and $f_{xx}(a, b) > 0$, then $(a, b)$ is a local minimum.
- If $D > 0$ and $f_{xx}(a, b) < 0$, then $(a, b)$ is a local maximum.
- If $D < 0$, then $(a, b)$ is a saddle point.
- If $D = 0$, the test is inconclusive.
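To see the test in action, here's a quick sketch of my own using sympy (the example function $f(x, y) = x^3 - 3x + y^2$ is an arbitrary choice, not one from the discussion above) that applies the test mechanically:

```python
# Apply the second derivative test to every critical point of an example function.
import sympy as sp

x, y = sp.symbols("x y")
f = x**3 - 3*x + y**2                       # arbitrary example function

fx, fy = sp.diff(f, x), sp.diff(f, y)
critical_points = sp.solve([fx, fy], [x, y], dict=True)

fxx, fyy, fxy = sp.diff(f, x, 2), sp.diff(f, y, 2), sp.diff(f, x, y)
D = fxx * fyy - fxy**2

for pt in critical_points:
    d_val, fxx_val = D.subs(pt), fxx.subs(pt)
    if d_val > 0 and fxx_val > 0:
        label = "local minimum"
    elif d_val > 0 and fxx_val < 0:
        label = "local maximum"
    elif d_val < 0:
        label = "saddle point"
    else:
        label = "inconclusive"
    print(pt, label)
```

For this particular function, it reports a local minimum at $(1, 0)$ and a saddle point at $(-1, 0)$.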
No proof was given, so I decided to go and try to find my own intuition and ended up discovering some fascinating things along the way.
I. MOTIVATION
The first question we should ask ourselves is: what visual properties do a function’s extrema and saddle points have in 3D?
There are several ways to answer this question, but I will begin by considering the function's gradient field. For instance, here's the gradient field for $f(x, y) = x^2 + y^2$:
We can clearly see that all the gradients point away from the local minimum; for a local maximum, it would be the opposite.
On the other hand, here's the plot and gradient field of $f(x, y) = x^2 - y^2$, a clear saddle shape:
[Figures: plot of the surface (left) and its gradient field (right)]
Here, we see some vectors pointing toward the critical point, while some point away. This makes sense as a saddle point is, by definition, a local max along some paths and a local min along others.
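As a side note, gradient-field pictures like these are easy to generate; here's a rough sketch of my own using numpy and matplotlib (the grid bounds and arrow density are arbitrary choices) for the two surfaces above:

```python
# Plot the gradient fields of x^2 + y^2 (all arrows point outward) and
# x^2 - y^2 (arrows point outward along x, inward along y).
import numpy as np
import matplotlib.pyplot as plt

xs, ys = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))

fig, axes = plt.subplots(1, 2, figsize=(10, 5))

axes[0].quiver(xs, ys, 2 * xs, 2 * ys)       # gradient of x^2 + y^2 is (2x, 2y)
axes[0].set_title("gradient of x^2 + y^2")

axes[1].quiver(xs, ys, 2 * xs, -2 * ys)      # gradient of x^2 - y^2 is (2x, -2y)
axes[1].set_title("gradient of x^2 - y^2")

plt.show()
```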
These observations thus motivate a method to quickly check whether the gradients in the neighborhood of a critical point point away from it, toward it, or in both directions, which brings us to…
II. HESSIANS
A Hessian in 2 dimensions is defined like so:

$$H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$$
Immediately, we recognize $D = f_{xx} f_{yy} - f_{xy}^2$ as the determinant of this matrix. As we will soon find, this is no coincidence.
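If you'd like to verify that symbolically, here's a small sanity check of my own using sympy's `hessian` helper (the example function is arbitrary):

```python
# Check that D = f_xx * f_yy - f_xy**2 equals the determinant of the Hessian.
import sympy as sp

x, y = sp.symbols("x y")
f = x**3 * y**2 + sp.sin(x * y)            # arbitrary smooth example function

H = sp.hessian(f, (x, y))                  # [[f_xx, f_xy], [f_yx, f_yy]]
D = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2

print(sp.simplify(H.det() - D))            # prints 0
```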
Note: for all following explanations, assume WLOG that the critical point in question is the origin, $(0, 0)$.
But first, let's think about what this matrix is telling us. Let's multiply it by the vector $\vec{v} = \begin{pmatrix} x \\ y \end{pmatrix}$:

$$H\vec{v} = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} f_{xx}\,x + f_{xy}\,y \\ f_{yx}\,x + f_{yy}\,y \end{pmatrix}$$

In words, this multiplication tells us how much a move by $\vec{v}$ will change the function's gradient. Moreover, since $\nabla f(0, 0) = \vec{0}$ at a critical point,

$$\nabla f(\vec{v}) \approx \nabla f(0, 0) + H\vec{v} = H\vec{v}.$$

Thus, $H\vec{v}$ is simply a linear approximation of the gradient at $\vec{v}$.
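Here's a quick numerical sanity check of my own (the function $f(x, y) = e^{x^2 + y^2}$ is an arbitrary example with a critical point at the origin): for a small step $\vec{v}$, $H\vec{v}$ lands close to the true gradient at $\vec{v}$.

```python
# Compare the exact gradient of exp(x^2 + y^2) at a small displacement v
# against the linear approximation H @ v, where H is the Hessian at the origin.
import numpy as np

def grad(x, y):
    e = np.exp(x**2 + y**2)
    return np.array([2 * x * e, 2 * y * e])   # exact gradient of exp(x^2 + y^2)

H = np.array([[2.0, 0.0],                      # Hessian of exp(x^2 + y^2),
              [0.0, 2.0]])                     # evaluated at the origin

v = np.array([0.05, -0.03])                    # a small step away from the origin
print(grad(*v))                                # exact gradient at v
print(H @ v)                                   # linear approximation, close for small v
```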
At this point, we should formalize what it means for the field to point away from or toward the origin. If, for all $\vec{v} \neq \vec{0}$,

$$\vec{v} \cdot H\vec{v} > 0,$$

then all the gradients point away from the origin, and thus $(0, 0)$ is a local minimum (note that the left term is just the dot product between $\vec{v}$ and its approximated gradient $H\vec{v}$). Take a moment to reason out the intuition behind this definition if you haven't already.
Similarly, if these dot products are all less than 0, $(0, 0)$ is a local max; if some are greater than 0 and others are less than 0, we have a saddle point; and otherwise, the test is inconclusive.
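Here's a rough numerical sketch of my own of this classification; it just samples random directions $\vec{v}$ and checks the sign of $\vec{v} \cdot H\vec{v}$, so treat it as an illustration rather than a rigorous test:

```python
# Classify a critical point by sampling directions v and inspecting the sign
# of the dot product between v and its approximated gradient H @ v.
import numpy as np

def classify(H, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n_samples, 2))
    dots = np.einsum("ij,ij->i", v, v @ H.T)   # v . (H v) for each sample
    if np.all(dots > 0):
        return "local minimum (all dot products positive)"
    if np.all(dots < 0):
        return "local maximum (all dot products negative)"
    if np.any(dots > 0) and np.any(dots < 0):
        return "saddle point (mixed signs)"
    return "inconclusive"

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # Hessian of x^2 + y^2
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # Hessian of x^2 - y^2
```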
Those well-versed in linear algebra might already see where this is going…
III. DEFINITE MATRICES
The Wikipedia article on definite matrices, in all its length, somehow manages to gloss over the key intuitions, which I hope to cover here.
First, recall that the Hessian must be a symmetric matrix, since $f_{xy} = f_{yx}$ (mixed partials commute for any twice continuously differentiable function). An important property of all symmetric matrices is that they all correspond to a diagonal transform (i.e. a dilation, potentially non-uniform) on a rotated set of perpendicular axes.
The axes are shown here in red, and we can clearly see the dilation stretches along one of the axes and shrinks along the other.
As long as neither axis is dilated by a negative factor (i.e. is subject to a reflection), we can see intuitively that the resultant vector field will be pointing away from the origin. On the other hand, if both are dilated by negative factors, all the vectors will be pointing toward the origin. And if one stretch factor is negative and the other positive, we will have a mix of vectors pointing in both directions—this corresponds to the saddle case.
Now we see where the determinant comes in: if we think of the determinant of a matrix transform as the amount it scales area by, we can see that the determinant of our transform with stretch factors $\lambda_1$ and $\lambda_2$ should just be $\lambda_1 \lambda_2$. If $\lambda_1$ and $\lambda_2$ are both positive or both negative, corresponding to the local minimum and local maximum cases, respectively, then the determinant will be positive. Otherwise, the determinant must either be less than 0, which corresponds to the saddle case, or equal to 0, in which case the test is inconclusive.
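To see the stretch-factor picture numerically, here's a small sketch of my own using numpy's `eigh`: the eigenvectors give the perpendicular axes, the eigenvalues are the stretch factors along them, and their product matches the determinant. The example matrix is arbitrary.

```python
# Decompose a symmetric matrix into perpendicular axes (eigenvectors) and
# stretch factors (eigenvalues), and confirm their product is the determinant.
import numpy as np

H = np.array([[1.0, 2.0],
              [2.0, 1.0]])                     # arbitrary symmetric example

stretch_factors, axes = np.linalg.eigh(H)
print(stretch_factors)                         # [-1.  3.]: one negative, so a saddle
print(axes.T @ axes)                           # identity: the axes are perpendicular
print(np.prod(stretch_factors))                # -3.0
print(np.linalg.det(H))                        # -3.0 (up to floating point error)
```

(`eigh` is numpy's symmetric/Hermitian eigensolver, which is what guarantees the orthonormal axes here.)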
IV. FURTHER READING
Hopefully, you’ve gained a better intuition for both Hessians and the second derivative test for multivariable functions. If you’re curious, here are some more concepts to explore:
- A symmetric matrix has perpendicular eigenvectors. N.B.: you actually already saw this!
- The determinant of any matrix is the product of its eigenvalues.
- What does a matrix transpose mean visually?