This is a super delayed Sunday Reading Notes post and today I come back to finish discussing the informed proposals paper by Giacomo Zanella. In Part I of my discussions, I reviewed point-wise informed proposals and locally-informed criterion. In Part II, I hope to focus on asymptotical optimality of locally informed proposals and their simulation studies.
The author uses Peskun ordering to compare efficiency of MH schemes. He deduces that ‘locally-balanced proposals are asymptotically optimal in terms of Peskun ordering’ as dimensionality of the underlying state space increases. Conditions for asymptotic optimality are indeed mild (Proposition 1) for the three illustrative examples: 1) independent binary components, 2) weighted permutation and 3) Ising model.
In Section 4, Zanella points out a connection between the balancing function and acceptance probability function (APF) for MH algorithms, which I find very interesting. He also shows that the optimal proposal for independent binary variables is Barker choice The proof goes by finding the limiting continuous-time process of the MH chain and finding the optimal for the limiting process.
The simulation studies use the illustrative examples: weighted permutations and Ising models. The comparisons are in terms of 1) acceptance rate, 2) number of successful flips per computation time and 3) mixing of some summary statistics. The second criterion concerns the trade-off between computational cost (of calculating the informed proposal) and statistical accuracy (by producing efficient moves). For simple target distributions (such as Uniform), using locally-balanced proposals does not bring much benefits but it achieves a much higher number of flips per unit time for more complicated and ‘rough’ targets. See second row of Figure 1 of the paper.
I find it interesting that globally-balanced proposals (aka ‘naively-informed’ or ) are extremely sensitive to initialization. Looking at the effective sample size (ESS) per time, the chains from GB has much more stable behavior if initialized from stationarity. See GB v.s. GB( station.) in Figure 2. But in Figure 5, this phenomenon does not show up for Ising models: initializing from stationarity does not yield ESS performance comparable to that of LBs.
In the simulation studies section, the author emphasizes the cost-vs-efficiency trade-off, which I find very important. I feel I have ignored this aspect of designing MCMC algorithms and should think more about it in my future studies. The author indicates that ‘computations required to sample from locally-balanced proposals are trivially parallelizable’. This is also something very interesting to me and I hope to learn more about multi-core computations during my PhD.
In his discussions, the author makes reference to Multiple-Try Metropolis. The connection between LB proposals and MT proposals is not entirely obvious to me, but I intuitively agree with the author’s comment that the weight function in MT serves a very similar purpose as the balancing function term in LB.
From my downloaded version of the paper, the transition kernel of the first equation on Page 1 is wrong, because the terms needs to be multiplied by its acceptance probability
- Zanella, G. (2017). Informed proposals for local MCMC in discrete spaces. arXiv preprint arXiv:1711.07424.
- Liu, J. S., Liang, F., & Wong, W. H. (2000). The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association, 95(449), 121-134.