It has been two years since I first tried to understand the paper ‘Controlled Sequential Monte Carlo’ by Jeremy Heng et al. Over the past two years, I have picked this paper up many times, and every time I read it, I understood it a little better. I am finally ready to share my understanding of this paper on this blog.

This paper opens with an introduction to the Feynman-Kac path measure. A Feynman-Kac path measure is determined by a triple: (1) an initial distribution, (2) a collection of Markov kernels, and (3) a collection of potential functions. For readers familiar with state-space models, this is nothing new: the triple can correspond to (1) the initial distribution of the hidden process, (2) the Markov kernels of the hidden process, and (3) the observation densities.
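To make the triple concrete, here is the path measure written out. The symbols are labels I am choosing for this post ($\mu$ for the initial distribution, $M_t$ for the kernels, $G_t$ for the potentials, time running from $0$ to $T$), so treat this as a sketch rather than a quote from the paper:

$$
\mathbb{Q}(dx_{0:T}) = \frac{1}{Z}\, G_0(x_0) \prod_{t=1}^{T} G_t(x_{t-1}, x_t)\; \mu(dx_0) \prod_{t=1}^{T} M_t(x_{t-1}, dx_t),
\qquad
Z = \int G_0(x_0) \prod_{t=1}^{T} G_t(x_{t-1}, x_t)\; \mu(dx_0) \prod_{t=1}^{T} M_t(x_{t-1}, dx_t).
$$

For a state-space model with initial density $\nu$, transition density $f(x_t \mid x_{t-1})$ and observation density $g(y_t \mid x_t)$ (again, my notation), choosing $\mu = \nu$, $M_t(x_{t-1}, dx_t) = f(x_t \mid x_{t-1})\,dx_t$, $G_0(x_0) = g(y_0 \mid x_0)$ and $G_t(x_{t-1}, x_t) = g(y_t \mid x_t)$ makes $\mathbb{Q}$ the smoothing distribution $p(x_{0:T} \mid y_{0:T})$ and $Z$ the marginal likelihood $p(y_{0:T})$.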

For each Feynman-Kac path measure, there is a sequential Monte Carlo (SMC) algorithm (a.k.a. a particle filter) that provides a particle approximation of the path measure, together with an estimate of its normalizing constant. Furthermore, Feynman-Kac path measures can be twisted by a policy, i.e. a collection of positive functions. Although twisting changes the initial distribution, the kernels and the potentials, the path measure at the terminal time and its normalizing constant stay the same.
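Here is a minimal sketch of the SMC algorithm attached to the "natural" triple, i.e. the bootstrap particle filter, on a toy two-state hidden Markov model. The model parameters, the observation sequence and all function names are hypothetical choices for illustration; the exact forward algorithm is included only so the particle estimate of $\log Z$ has something to be compared against.

```python
import numpy as np

# Hypothetical two-state HMM: initial distribution mu, transition matrix P
# (the Markov kernel) and emission matrix E, where E[x, y] = g(y | x).
mu = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.8, 0.2], [0.3, 0.7]])
ys = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]


def forward_log_likelihood(mu, P, E, ys):
    """Exact log p(y_{0:T}) via the forward algorithm (for comparison only)."""
    alpha = mu * E[:, ys[0]]
    log_z = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for y in ys[1:]:
        alpha = (alpha @ P) * E[:, y]
        log_z += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_z


def propagate(x, P, rng):
    """Move each particle one step through the Markov kernel P."""
    x_new = np.empty_like(x)
    for k in range(P.shape[0]):
        idx = x == k
        x_new[idx] = rng.choice(P.shape[1], size=idx.sum(), p=P[k])
    return x_new


def bootstrap_pf_log_likelihood(mu, P, E, ys, n_particles, rng):
    """Bootstrap particle filter estimate of log Z = log p(y_{0:T})."""
    x = rng.choice(len(mu), size=n_particles, p=mu)  # sample x_0 from mu
    log_z = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            x = propagate(x, P, rng)
        w = E[x, y]                   # potential G_t = observation density
        log_z += np.log(w.mean())     # Z-hat is the product of mean weights
        x = rng.choice(x, size=n_particles, p=w / w.sum())  # multinomial resampling
    return log_z


rng = np.random.default_rng(0)
exact = forward_log_likelihood(mu, P, E, ys)
estimate = bootstrap_pf_log_likelihood(mu, P, E, ys, 20_000, rng)
print(exact, estimate)
```

The estimate is random; with this many particles it should land close to the exact value, but it will not match it exactly.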

To do inference in state-space models, we are interested in the smoothing distribution of $x_{0:T}$ given $y_{0:T}$, or in the marginal likelihood $p(y_{0:T})$. The smoothing distribution admits several Feynman-Kac path measure representations, with the marginal likelihood as the normalizing constant. The most natural one is described above, and it corresponds to the bootstrap particle filter. But it is not unique. For example, the path measure associated with a one-step look-ahead kernel corresponds to the popular ‘fully-adapted auxiliary particle filter’. As mentioned above, each of these path measures can be twisted. Proposition 1 defines an optimal policy with respect to each path measure. Optimality is meant in the sense that the SMC estimator of the normalizing constant under the twisted path measure equals the normalizing constant almost surely, i.e. it has zero variance. The optimal policy with respect to the path measure corresponding to the bootstrap particle filter is the backward information filter $\psi^*_t(x_t) = p(y_{t:T} \mid x_t)$.
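The zero-variance property is easy to check numerically on a toy discrete model. The sketch below assumes a policy $\psi_t(x_t)$ that depends on the current state only, and uses the twisting I understand the paper to describe: the initial distribution becomes $\mu(x_0)\psi_0(x_0)/\mu(\psi_0)$, each kernel becomes $M_t(x_{t-1}, x_t)\psi_t(x_t)/M_t(\psi_t)(x_{t-1})$, and the potentials absorb the normalizers as $G_t(x_t)\,M_{t+1}(\psi_{t+1})(x_t)/\psi_t(x_t)$ (with an extra factor $\mu(\psi_0)$ at $t = 0$ and $M_{T+1}(\psi_{T+1}) := 1$). The HMM and all function names are hypothetical. Plugging in the backward information filter makes every weight equal, so even 10 particles recover $\log Z$.

```python
import numpy as np

# Hypothetical two-state HMM (toy example).
mu = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.8, 0.2], [0.3, 0.7]])
ys = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]


def backward_information_filter(P, E, ys):
    """psi[t][x] = p(y_{t:T} | x_t = x), computed by a backward recursion."""
    T = len(ys) - 1
    psi = [None] * (T + 1)
    psi[T] = E[:, ys[T]]
    for t in range(T - 1, -1, -1):
        psi[t] = E[:, ys[t]] * (P @ psi[t + 1])
    return psi


def twisted_smc_log_likelihood(mu, P, E, ys, psi, n_particles, rng):
    """SMC on the path measure twisted by the policy psi; returns log Z-hat."""
    K, T = len(mu), len(ys) - 1
    # M_{t+1}(psi_{t+1})(x) for every state x, with M_{T+1}(psi_{T+1}) := 1.
    m_psi = [P @ psi[t + 1] for t in range(T)] + [np.ones(K)]
    mu_psi = mu @ psi[0]  # mu(psi_0)
    x = rng.choice(K, size=n_particles, p=mu * psi[0] / mu_psi)
    log_z = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # twisted kernel: M_t(x_{t-1}, x_t) psi_t(x_t) / M_t(psi_t)(x_{t-1})
            x_new = np.empty_like(x)
            for k in range(K):
                idx = x == k
                q = P[k] * psi[t]
                x_new[idx] = rng.choice(K, size=idx.sum(), p=q / q.sum())
            x = x_new
        # twisted potential: G_t(x_t) M_{t+1}(psi_{t+1})(x_t) / psi_t(x_t)
        w = E[x, y] * m_psi[t][x] / psi[t][x]
        if t == 0:
            w = w * mu_psi
        log_z += np.log(w.mean())
        x = rng.choice(x, size=n_particles, p=w / w.sum())
    return log_z


psi = backward_information_filter(P, E, ys)
exact = np.log(mu @ psi[0])  # log p(y_{0:T})
rng = np.random.default_rng(0)
estimates = [twisted_smc_log_likelihood(mu, P, E, ys, psi, 10, rng) for _ in range(5)]
print(exact, estimates)  # every run matches `exact` up to floating-point rounding
```

Replacing `psi` with a cruder policy (for example all ones, which gives back the bootstrap filter) brings the Monte Carlo variance back.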

The optimal policies are usually intractable, but they can be approximated. After running any SMC algorithm, we have a particle approximation of the path measure, which gives us a particle approximation of the optimal policy $\psi^*_t$. We can use regression to learn a simple policy $\psi_t$ that minimizes the difference between $\psi_t$ and the particle approximation of $\psi^*_t$. This regression is done recursively, starting at $t = T$ and ending at $t = 0$, and the paper calls it approximate dynamic programming. Given this policy, we have a new path measure, which means that we can run another particle filter according to it and approximate the optimal policy with respect to the new measure. This process is called policy refinement. According to the authors, only a few iterations of policy refinement are necessary in practice. They also offer performance monitoring criteria based on effective sample size and regression residuals.
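Below is a minimal sketch of one backward pass of approximate dynamic programming, starting from a bootstrap particle filter (i.e. the initial policy is $\psi \equiv 1$). The linear Gaussian model, the log-quadratic policy class $\log \psi_t(x) = -(a_t x^2 + b_t x + c_t)$, the parameter values and all function names are my own assumptions for illustration, not the authors' code. At each time the regression target is $\log g(y_t \mid x) + \log M_{t+1}(\psi_{t+1})(x)$ evaluated at the particles, with $\psi_{t+1}$ the policy already fitted. Because the optimal policy is exactly log-quadratic in this model, the least-squares fit is exact up to rounding; in a nonlinear model it would only be approximate, which is where the regression residuals become informative.

```python
import numpy as np


def log_gauss_policy_integral(m, s2, a, b, c):
    """log of the integral of N(x'; m, s2) * exp(-(a x'^2 + b x' + c)) over x'.

    Closed form, valid when the precision 1/s2 + 2a is positive."""
    prec = 1.0 / s2 + 2.0 * a
    return -0.5 * np.log(s2 * prec) + (m / s2 - b) ** 2 / (2 * prec) - m**2 / (2 * s2) - c


def bootstrap_particles(ys, rho, sigma, tau, n, rng):
    """Bootstrap PF for x_t = rho x_{t-1} + sigma eps_t, y_t = x_t + tau eta_t.

    Returns the particles at each time, before resampling."""
    xs = []
    x = rng.normal(size=n)  # x_0 ~ N(0, 1)
    for t, y in enumerate(ys):
        if t > 0:
            x = rho * x + sigma * rng.normal(size=n)
        xs.append(x)
        logw = -0.5 * (y - x) ** 2 / tau**2
        w = np.exp(logw - logw.max())
        x = x[rng.choice(n, size=n, p=w / w.sum())]
    return xs


def adp(xs, ys, rho, sigma, tau):
    """Backward pass fitting log psi_t(x) = -(a_t x^2 + b_t x + c_t) by least squares."""
    T = len(ys) - 1
    coefs, max_resid = [None] * (T + 1), 0.0
    for t in range(T, -1, -1):
        x = xs[t]
        # log g(y_t | x) for the Gaussian observation density
        target = -0.5 * (ys[t] - x) ** 2 / tau**2 - 0.5 * np.log(2 * np.pi * tau**2)
        if t < T:
            # + log M_{t+1}(psi_{t+1})(x), using the policy fitted at t + 1
            target = target + log_gauss_policy_integral(rho * x, sigma**2, *coefs[t + 1])
        features = np.column_stack([x**2, x, np.ones_like(x)])
        theta, *_ = np.linalg.lstsq(features, target, rcond=None)
        max_resid = max(max_resid, np.abs(features @ theta - target).max())
        coefs[t] = tuple(-theta)  # (a_t, b_t, c_t)
    return coefs, max_resid


# Usage on simulated data (parameters are arbitrary choices).
rng = np.random.default_rng(1)
rho, sigma, tau, T = 0.9, 1.0, 0.5, 20
x_true = [rng.normal()]
for _ in range(T):
    x_true.append(rho * x_true[-1] + sigma * rng.normal())
ys = np.array(x_true) + tau * rng.normal(size=T + 1)

xs = bootstrap_particles(ys, rho, sigma, tau, 500, rng)
coefs, max_resid = adp(xs, ys, rho, sigma, tau)
# Normalizing constant implied by the fitted policy: mu(psi_0) with mu = N(0, 1).
log_z = log_gauss_policy_integral(0.0, 1.0, *coefs[0])
print(max_resid, log_z)
```

Policy refinement would then run a new SMC pass under the path measure twisted by the fitted policy and repeat the regression; that outer loop is not shown here.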

What fascinates (and puzzles) me is that errors do not accumulate over iterations of either policy refinement or approximate dynamic programming. I hope to read this paper in greater detail and to report back to my readers soon.