Hello 2021! Here is the first post of the Sunday Reading Notes series in 2021. This week, I continue reading about optimal scaling of Markov chain Monte Carlo algorithms, and I move on to a new class of algorithms: MALA.
Roberts and Rosenthal (1998, JRSSB) concerns the optimal scaling of the Metropolis-adjusted Langevin algorithm (MALA) for target distributions of the product form $\pi_d(x) = \prod_{i=1}^{d} f(x_i)$. Unlike the random walk Metropolis algorithm, MALA's proposal distribution uses the gradient of $\log \pi_d$ at the current state $X_n$:
$$Y_{n+1} = X_n + \sigma_d Z_{n+1} + \frac{\sigma_d^2}{2} \nabla \log \pi_d(X_n).$$
Here $\sigma_d^2$ is the step variance and $Z_{n+1}$ is a $d$-dimensional standard normal. Following this proposal, we perform an accept / reject step, so that the resulting Markov chain has the desired stationary distribution: the proposal is accepted with probability
$$\alpha(X_n, Y_{n+1}) = \min\left\{ 1, \frac{\pi_d(Y_{n+1})\, q(Y_{n+1}, X_n)}{\pi_d(X_n)\, q(X_n, Y_{n+1})} \right\},$$
where $q$ is the Gaussian proposal density above, and otherwise the chain stays at $X_n$.
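To make this concrete, here is a minimal sketch of one MALA transition in Python. It is my own illustration rather than code from the paper; the names `mala_step`, `log_pi`, and `grad_log_pi`, and the standard normal example target, are all assumptions.

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, sigma2, rng):
    """One MALA transition: Langevin proposal + Metropolis-Hastings correction."""
    # Proposal: Y = x + (sigma^2 / 2) * grad log pi(x) + sigma * Z
    mean_fwd = x + 0.5 * sigma2 * grad_log_pi(x)
    y = mean_fwd + np.sqrt(sigma2) * rng.standard_normal(x.shape)
    # Log proposal densities q(x -> y) and q(y -> x), up to a common constant
    mean_bwd = y + 0.5 * sigma2 * grad_log_pi(y)
    log_q_fwd = -np.sum((y - mean_fwd) ** 2) / (2.0 * sigma2)
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2.0 * sigma2)
    # Accept with probability min{1, pi(y) q(y, x) / (pi(x) q(x, y))}
    log_alpha = log_pi(y) + log_q_bwd - log_pi(x) - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return y, True
    return x, False

# Example: i.i.d. standard normal target with step variance sigma^2 = l^2 d^(-1/3)
d, l, n_iters = 50, 1.5, 5000
rng = np.random.default_rng(1)
x = rng.standard_normal(d)
log_pi = lambda v: -0.5 * np.sum(v ** 2)   # log pi_d up to an additive constant
grad_log_pi = lambda v: -v
accepted = 0
for _ in range(n_iters):
    x, ok = mala_step(x, log_pi, grad_log_pi, l ** 2 * d ** (-1 / 3), rng)
    accepted += ok
print("empirical acceptance rate:", accepted / n_iters)
```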
There are some smoothness assumptions on the log-density $g = \log f$: the absolute values of the first eight derivatives of $g$ are bounded, $g'$ is Lipschitz, and all moments of $f$ exist.
Theorem 3 (diffusion limit of first component)
Consider the chain $X^d$ starting from stationarity and following the MALA algorithm with variance $\sigma_d^2 = \ell^2 d^{-1/3}$ for some $\ell > 0$. Let $J_t$ be a Poisson process with rate $d^{1/3}$, and let $U^d$ be the first component of $X^d_{J_t}$, i.e. $U^d_t = \left( X^d_{J_t} \right)_1$. Then, as $d \to \infty$, the process $U^d \Rightarrow U$, where $U$ is a Langevin diffusion defined by
$$dU_t = \sqrt{h(\ell)}\, dB_t + \frac{h(\ell)}{2}\, g'(U_t)\, dt.$$
Here $h(\ell) = 2 \ell^2 \Phi\!\left( -K \ell^3 / 2 \right)$ is the speed measure and $\Phi$ is the standard normal distribution function.
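To get a feel for what this limit looks like, here is a small Euler-Maruyama simulation of the limiting diffusion. This is purely illustrative, not from the paper; the function name `simulate_limit`, the step size, and the value of $h$ are my own choices.

```python
import numpy as np

def simulate_limit(h, g_prime, u0=0.0, dt=1e-3, n_steps=100_000, seed=0):
    """Euler-Maruyama scheme for dU_t = sqrt(h) dB_t + (h / 2) g'(U_t) dt."""
    rng = np.random.default_rng(seed)
    u = np.empty(n_steps + 1)
    u[0] = u0
    for t in range(n_steps):
        u[t + 1] = (u[t] + 0.5 * h * g_prime(u[t]) * dt
                    + np.sqrt(h * dt) * rng.standard_normal())
    return u

# For f standard normal, g'(x) = -x: an Ornstein-Uhlenbeck process run at
# speed h, with stationary distribution N(0, 1).
path = simulate_limit(h=0.6, g_prime=lambda x: -x)
print("sample mean:", path.mean(), "sample variance:", path.var())
```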
The skeleton of the proof is similar to that of the diffusion limit of random walk Metropolis. We find sets $F_d$ with $\pi_d(F_d) \to 1$, and on these sets the generators converge uniformly.
Theorem 4 (convergence of acceptance rate)
As $d \to \infty$, the acceptance rate satisfies
$$a(d, \ell) \to a(\ell) := 2 \Phi\!\left( -\frac{K \ell^3}{2} \right), \quad \text{where} \quad K = \sqrt{ \mathbb{E}_f\!\left[ \frac{5\, g'''(X)^2 - 3\, g''(X)^3}{48} \right] }.$$
As a result, when $h(\ell)$ is maximized, the acceptance rate is $a(\hat{\ell}) \approx 0.574$. Also, the step variance should be tuned to be $\sigma_d^2 = \hat{\ell}^2 d^{-1/3}$, where $\hat{\ell} = \arg\max_{\ell} h(\ell)$.
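A few lines of Python confirm the 0.574 figure numerically. Since the maximizer of $h$ scales so that $K \hat{\ell}^3$ is constant, the optimal acceptance rate does not depend on $K$; the value $K = 1$ below is an arbitrary choice of mine.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

K = 1.0  # arbitrary: the optimal acceptance rate is the same for every K > 0

def h(l):
    """Speed h(l) = 2 l^2 Phi(-K l^3 / 2) of the limiting Langevin diffusion."""
    return 2.0 * l ** 2 * norm.cdf(-K * l ** 3 / 2.0)

# Maximize the speed of the limiting diffusion over the scaling constant l
res = minimize_scalar(lambda l: -h(l), bounds=(1e-6, 5.0), method="bounded")
l_hat = res.x
print(f"optimal l:        {l_hat:.4f}")
print(f"acceptance rate:  {2.0 * norm.cdf(-K * l_hat ** 3 / 2.0):.3f}")  # ~ 0.574
```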
MALA enjoys a faster convergence rate, of order $O(d^{1/3})$, compared to order $O(d)$ for RWM. But let's also keep in mind that MALA requires an extra step at each iteration to compute the gradient $\nabla \log \pi_d$, which takes $O(d)$ operations.
Reference:
Roberts, G. O., & Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(1), 255-268.