After a very long break, I come back to the Sunday Reading Notes and write about the paper “A Frame for Adaptive MCMC Targeting Multimodal Distributions” by Emilia Pompe et al. This paper proposes a Monte Carlo method to sample from multimodal distributions. The “divide and conquer” idea separates the sampling task into (1) identifying the modes, (2) sample efficiently within modes and (3) moving between the modes. While the paper emphasizes more heavily on addressing task (2) and (3) and establishing ergodicity , it provides a framework that unites all three tasks.
The main algorithm in this paper is Jumping Adaptive Multimodal Sampler (JAMS). It has a burn-in algorithm that uses optimization-based procedure to identify modes, to merge them and to estimate covariance matrices around the modes. After task (1) is solved by the burn-in algorithm, JAMS alternates between the later two tasks with local moves that samples around a mode and jump moves that facilitate crossing the low density regions. The local moves are adaptive in the sense that it relies on local kernels with parameters that are learned on the fly. The jump moves propose a new mode and choose a new point around the new mode.
This paper proves ergodicity of JAMS by considering it as a case of a general class of Auxiliary Variable Adaptive MCMC. It also provides extensive simulation studies on multivariate Gaussian (d = 200) , mixtures of t-distributions and bananas and a Bayesian model for sensor network localisation. The results are compared against Adaptive Parallel Tempering (APT). In all the cases tested, JAMS out performs APT in terms of root mean squared error scaled by dimension of the problem. JAMS is also more robust to initialization in term of recovering all the modes.
I usually think of Monte Carlo integration and optimization as very similar tasks and even neglect the area of optimization. This paper is a wake up call for me, as I believe the current JAMS algorithm relies on a successful mode identifying stage. The discussions in Section 6 also suggest mode finding as the bottle neck.