# SRN – Informed proposals for local MCMC in discrete spaces by Zanella (Part I)

This week I am reading ‘Informed proposals for local MCMC in discrete spaces’ by Giacomo Zanella. This paper is about designing MCMC algorithms for discrete-valued high-dimensional parameters, and its goal is similar to that of the papers discussed in previous posts (Hamming ball sampler & auxiliary-variable HMC). I decided to split the Sunday Reading Notes on this paper into two parts, because I find many interesting ideas in it.

In this paper, Zanella comes up with locally-balanced proposals. Suppose $\pi(x)$ is the target density and $K_{\sigma}(x,dy)$ is an uninformed proposal. We assume that as $\sigma \to 0$ the kernel $K_{\sigma}(x,dy)$ converges to the delta measure at $x$. Zanella seeks to modify this uninformed proposal so that it incorporates information about the target $\pi$ and is biased towards areas with higher density. An example of a locally-balanced proposal is $Q_{\sqrt{\pi}} (x,dy) = \frac{\sqrt{\pi(y) }K_{\sigma}(x,dy)}{(\sqrt{\pi} * K_{\sigma})(x)}$. This kernel is reversible with respect to $\sqrt{\pi(x)}(\sqrt{\pi} * K_{\sigma})(x)dx$, which converges to $\pi(x)dx$ as $\sigma \to 0.$ [Note that the normalizing constant is the convolution $(\sqrt{\pi} * K_{\sigma})(x) = \int \sqrt{\pi(y)} K_{\sigma}(x,dy)$.]
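To convince myself of the reversibility claim, here is a minimal numerical sketch on a finite state space. The setup is my own toy assumption and not from the paper: six states on a ring, a symmetric uninformed kernel `K` that moves to either neighbour with probability 1/2, and a random target `pi`. The helper name `locally_balanced_kernel` is hypothetical.

```python
import numpy as np

def locally_balanced_kernel(pi, K):
    """Return Q(x, y) proportional to sqrt(pi(y)) * K(x, y) and its normalizer Z(x)."""
    weights = K * np.sqrt(pi)[None, :]   # entry (x, y) is sqrt(pi(y)) * K(x, y)
    Z = weights.sum(axis=1)              # Z(x) = sum_y sqrt(pi(y)) * K(x, y)
    return weights / Z[:, None], Z

# Toy setup (an assumption for illustration): 6 states on a ring.
n = 6
rng = np.random.default_rng(0)
pi = rng.random(n)
pi /= pi.sum()

# Symmetric uninformed proposal: step to the left or right neighbour.
K = np.zeros((n, n))
for x in range(n):
    K[x, (x - 1) % n] = 0.5
    K[x, (x + 1) % n] = 0.5

Q, Z = locally_balanced_kernel(pi, K)

# Candidate invariant measure (unnormalized): sqrt(pi(x)) * Z(x).
mu = np.sqrt(pi) * Z

# Detailed balance holds when mu(x) Q(x, y) is symmetric in (x, y).
flow = mu[:, None] * Q
print(np.allclose(flow, flow.T))
```

The check works because $\mu(x) Q(x,y) = \sqrt{\pi(x)}\sqrt{\pi(y)} K(x,y)$, which is symmetric whenever $K$ is symmetric.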

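More generally, Zanella considers a class of pointwise informed proposals of the form $Q_{g,\sigma}(x,dy) = \frac{1}{Z_{g}(x)}\cdot g\left(\frac{\pi(y)}{\pi(x)}\right) K_{\sigma}(x,dy),$ where $Z_{g}(x)$ is the normalizing constant. Zanella suggests choosing the function $g$ so that it satisfies $g(t) = t g(1/t);$ the choice $g(t) = \sqrt{t}$ from the example above is one such function.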
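The condition $g(t) = t g(1/t)$ is easy to check numerically. The sketch below tests it on a grid of positive values. Only $g(t) = \sqrt{t}$ appears in this post; the other two candidates, $g(t) = t/(1+t)$ and $g(t) = t$, are my own additions for illustration, the last one as a function that fails the condition. The helper name `satisfies_balance` is hypothetical.

```python
import numpy as np

def satisfies_balance(g):
    """Check g(t) == t * g(1/t) on a log-spaced grid of positive t."""
    ts = np.logspace(-3, 3, 61)
    return bool(np.allclose(g(ts), ts * g(1.0 / ts)))

candidates = {
    "sqrt(t)": np.sqrt,                  # the choice used in this post
    "t/(1+t)": lambda t: t / (1.0 + t),  # illustrative extra candidate
    "t": lambda t: t,                    # fails: t * g(1/t) = 1, not t
}

results = {name: satisfies_balance(g) for name, g in candidates.items()}
for name, ok in results.items():
    print(f"g(t) = {name}: balanced = {ok}")
```

For $g(t) = \sqrt{t}$ the identity can also be seen by hand: $t g(1/t) = t \cdot t^{-1/2} = \sqrt{t}$.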

I will save the discussion on locally-balanced proposals and Peskun optimality for Part II. In this part, I want to discuss Section 5: Connection to MALA and gradient-based MCMC. In continuous space, the point-wise informed proposal $Q_{g,\sigma}$ would be infeasible to sample from because of the term $g\left(\frac{\pi(y)}{\pi(x)}\right) .$ If we take a first-order Taylor expansion of $\log \pi$ around $x$, so that $\frac{\pi(y)}{\pi(x)} \approx \exp \left( \nabla \log \pi(x)^{\top} (y-x) \right)$, we would have $Q_{g,\sigma}^{(1)}(x,dy) \propto g \left( \exp ( \nabla \log \pi(x)^{\top} (y-x)) \right) K_{\sigma}(x,dy).$ If we choose $g(t) = \sqrt{t}$ and $K_{\sigma}(x,\cdot) =N(x,\sigma^2)$, this is the MALA proposal.
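To see why this gives MALA, it helps to complete the square. This is my own working rather than a quote from the paper, and I write the Gaussian kernel as $N(x, \sigma^2 I)$ in $d$ dimensions (the scalar case is $d = 1$). With $g(t) = \sqrt{t}$, the density of $Q_{g,\sigma}^{(1)}(x,\cdot)$ is proportional to

$$\exp\left( \tfrac{1}{2} \nabla \log \pi(x)^{\top} (y-x) - \frac{\|y-x\|^2}{2\sigma^2} \right).$$

Collecting the terms in $y$,

$$\tfrac{1}{2} \nabla \log \pi(x)^{\top} (y-x) - \frac{\|y-x\|^2}{2\sigma^2} = -\frac{1}{2\sigma^2} \left\| y - x - \frac{\sigma^2}{2} \nabla \log \pi(x) \right\|^2 + \frac{\sigma^2}{8} \|\nabla \log \pi(x)\|^2.$$

The last term does not depend on $y$ and is absorbed into the normalizing constant, so

$$Q_{g,\sigma}^{(1)}(x,\cdot) = N\left( x + \frac{\sigma^2}{2} \nabla \log \pi(x),\ \sigma^2 I \right),$$

which is the usual MALA proposal with step size $\sigma$.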

I find this connection very interesting, although I do not have a good intuition about where this connection comes from. One way to explain it is that gradient-based MCMC in continuous space is using local information to design informed proposals. In the conclusions, the author mentions that this connection should improve robustness of gradient-based MCMC schemes and help with parameter tuning.

References:

•  Zanella, G. (2017). Informed proposals for local MCMC in discrete spaces. arXiv preprint arXiv:1711.07424.