Category Archives: optimization

SRN – Further and stronger analogy between sampling and optimization by Arnak Dalalyan

Lately, I have been pondering the connection between Monte Carlo based sampling methods and optimation methods. It brings me to read a paper that I have been wanting to read in detail long ago. In this week’s Sunday Reading Notes series on Phyllis with Data, I discuss the paper “Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent” by Arnak Dalalyan. 

This paper focuses on the Langevin Monte Carlo algorithm, which I believe is also called the Unadjusted Langevin algorithm (ULA) in contrast to the Metropolis adjusted Langevin algorithm (MALA). Using the Wasserstein distance as the metric, this paper establishes an upper bound of the Wasserstein distance, more precisely it shows that one needs O(\frac{d}{\epsilon^2}\log(\frac{d}{\epsilon})) number of iterations to reach the precision level . Considering the close connection between ULA and gradient descent algorithm for optimization, the author proves an upper bound for the L2 distance for gradient descent by cleverly utilizing a tempering argument (details in section 3 of the paper). The well-known result for optimization is that it takes O(\log(\frac{1}{\epsilon})) iterations to reach precision in gradient descent. Intuitively sampling should be a more difficult task than optimization because it requires an exploration of the whole parameter space, which can be of high-dimension. The bounds in this paper confirm this intuition. More importantly, it points out how these two problems are “continuously” connected (as the temperature converges to 0) and that this connection naturally leads to the deduction of the optimization bound from the sampling bound. 

Admittedly this is not a new paper as it was first arxived in 2016, there have been many papers tackling similar problems in sampling, for example using the KL-divergence (Cheng and Bartleet 2019) or extending the results to MALA (Dwivedi et al. 2019). When I first picked up this paper two years ago as a second-year Ph.D. student, I was intimidated by the proof, which in fact was only 6-pages long. This time, as I sat down and went through the proof line by line, I found it quite accessible and elegant. I have been equipped with the mathematical knowledge required to understand the proof in college. This time, I was defeated by my own fear. For me, this paper is not only a delightful read, but also a wake-up call to trust myself.