This is really cool! Took me a minute to grok, so hopefully I can save someone a tiny bit of effort, and anyone who understands this better than me can correct any misconceptions I had.
Pick random points (blue/black squares) and sample the function (the red line) there. What does that tell you about the function? Well, the function can’t be much bigger than those points nearby those point, right? Assuming the function is continuous at least. But what about the parts of the function that aren’t near where you sampled?
Well those could be a bunch bigger, but the closer they are to sampled points, the less big they could be in theory. Let’s say they can’t be bigger than k times the distance from a sampled point (the green line) plus the value at that point. But what is k?
k is the “Lipschitz constant” of the function. You can think of it kinda like the maximum slope of the function, but that’s not 100% accurate. How to we estimate k?
We’ll take each triplet of 3 consecutive points on the function, and create the unique parabola that goes through them (you did this in 8th grade, remember?). Whichever of these parabolas (that have a maximum, not a minimum) has the highest max will be the “winner” (morphing dark green parabola). Let’s take the slope of that parabola at the lowest point on it. For math reasons (that I totally understand, trust me), this is a good estimation of k.
That we have a decent estimation for places where the max of the function can lie, lets start sampling the function there (at the highest spikes of the green line). Whenever we do that, we either find a new global maximum, or we decrease the how much error our estimation can have by getting rid of the highest points on the green line. When the difference between the highest point on the green line and the highest point we’ve found (this is the maximum error) is small enough, we stop, and report the biggest point we’ve found as the max of the function, and we know we’re pretty close and how close we are.
The big thing I’m wondering: This works spectacularly on 1D functions. But that already wasn’t hard to do. How does this approach scale up to higher dimensional functions?
Also, this only really works on a bounded function, as the green line goes towards infinity beyond the first and last points. How do you deal with that?