## SRN – the magical 0.574

Hello 2021! This week, I keep reading about optimal scaling of Markov chain Monte Carlo algorithms and I move on to a new class of algorithms: MALA. This time, the magical number is 0.574.

Hello 2021! Here is the first post in the Sunday Reading Notes series in the year of 2021. This week, I keep reading about optimal scaling of Markov chain Monte Carlo algorithms and I move on to a new class of algorithms: MALA.

Roberts and Rosenthal (1998, JRSSB) concerns the optimal scaling of Metropolis-Adjusted Langevin algorithm (MALA) for target distributions of the form ${\pi^d(x) = \prod_{i=1}^d f(x_i)}$. Unlike random walk Metropolis algorithm, in MALA, the proposal distribution uses the gradient of ${\pi^d}$ at ${x_t}$:

$\displaystyle x^{\star} = x_t + \sigma_n Z_{t+1} + \frac{\sigma^2}{2} \nabla \log (\pi^d(x_t)).$

Here ${\sigma_n}$ is the step variance and ${Z_{t+1} \sim \mathcal{N}(0,I_d)}$ is a d-dimensional standard normal. Following this proposal, we perform an accept / reject step, so that the resulting Markov chain has the desired stationary distribution:
$\displaystyle \alpha_d(x_t,x^{\star}) = \min \left\{ 1, \frac{\pi^d(x^{\star})}{\pi^d(x_t)} \frac{q_d(x^{\star},x_t)}{q_d(x_t,y_{t+1})}\right\}.$

There are some smoothness assumptions on the log-density ${g = \log f}$: absolute value of the first 8 order derivatives are bounded, ${g'}$ is Lipschitz, and all moments exist for ${f}$.

Theorem 3 (diffusion limit of first component)

Consider the chain ${\{X\}}$ starting from stationarity and following the MALA algorithm with variance ${\sigma_d^2 = l^2 n^{-1/3}}$ for some ${l > 0}$. Let ${\{J\}}$ be a Poisson process with rate ${n^{1/3}}$, ${\Gamma^n_t = X_{J_t}}$ and ${U^d}$ is the first component of ${\Gamma^n}$, i.e. ${U^d_t = X^d_{J_t,1}}$. We must have, as ${d \to \infty}$, the process ${U^d \to U}$, where ${U}$ is a Langevin diffusion defined by
$\displaystyle dU_t = \sqrt{h(l)} dB_t + \frac{1}{2} h(l) \frac{d}{dx} \log (\pi_1(U_t))dt.$

Here ${h(l) = 2 l^2 \Phi(-Kl^3/2)}$ is the speed measure and ${K = \sqrt{\mathbb{E} \left[\frac{5g'''(X)^3 - 3g''(X)}{48}\right]} > 0}$.
The skeleton of the proof is similar to that of proving diffusion limit of random walk Metropolis. We will find sets ${F_d^* \subseteq \mathbb{R}^d}$ such that ${\mathbb{P}\left(\Gamma_s^d \in F_d^{\star} \ \forall \ 0 \le s \le t\right) \to 1}$ and on these sets the generators converges uniformly.

Theorem 4 (convergence of acceptance rate)
$\displaystyle \lim_{d \to \infty} \alpha_d(l) = \alpha(l)$ where ${a(l) = 2 \Phi(-Kl^3/2).}$
As a result, when ${h(l)}$ is maximized, the acceptance rate ${\alpha(l) \approx 0.574}$. Also the step variance should be tuned to be ${\sigma_d^2 = (\hat{l}_d)^2 n^{-1/3}.}$

MALA enjoys a faster convergence rate of order ${n^{1/3}}$ compared to order ${n}$ of RWM. But let’s also keep in mind that MALA requires an extra step to compute the gradients ${\log \pi(x) = \sum_{i=1}^n g'(x)}$, which takes ${n}$ steps to compute.

Reference:

Roberts, G. O., & Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology)60(1), 255-268.