SRN – my second attempt to understand ‘Controlled SMC’ by Heng et al.

It has been two years since I last tried to understand the paper ‘Controlled Sequential Monte Carlo’ by Jeremy Heng et al. Over the past two years, I picked this paper up many times, and every time I read it, I understood it a little better. I am finally ready to share my understanding of this paper on this blog.

It has been two years since I last tried to understand the paper ‘Controlled Sequential Monte Carlo’ by Jeremy Heng et al. Over the past two years, I picked this paper up many times, and every time I read it, I understood it a little better. I am finally ready to share my understanding of this paper on this blog.

This paper opens with an introduction to the Feynman-Kac path measure. A triple of (1) initial distribution, (2) a collection of Markov kernels, and (3) potential functions determines a Feynman-Kac path measure. For readers familiar with state-space models, this is not something new: the triple can correspond to (1) the initial distribution of hidden process, (2) Markov kernels of the hidden process and (3) observation densities.

The Markov chain on path space
Feynman-Kac path measure defined by the Markov chain and the potentials.

For each Feynman-Kac path measure, there is a sequential Monte Carlo algorithm (aka particle filters) that provides a particle approximation of the path measure. Furthermore, Feynman-Kac path measures can be twisted. Although all the kernels and the potentials are changed, the path measure is the same at the terminal time and the normalizing constant at the terminal time Z_T stays the same.

The Markov chain twisted by policy \phi
Eq. (13): The twisted potentials. Eq. (12): rewrite the path measure with the twisted potentials and twisted kernels.

To infer state-space models, we are interested in the smoothing distribution x_{0:T} given y_{0:T} or the marginal likelihood p(Y_{0:T}). The smoothing distribution has Feynman-Kac measure representations, with the marginal likelihood being the normalizing constant.  The most natural one is described above, and it corresponds to the bootstrap particle filter. But it is not unique. For example, the path measure associated with a 1-step look ahead kernel corresponds to the popular ‘fully-adapted auxiliary particle filter’. As mentioned above, each of these path measures can be twisted. Proposition 1 defines an optimal policy with respect to each path measure. Optimality is defined in the sense of approximating the normalizing constant almost surely. The optimal policy with respect to the path measure corresponding to the bootstrap particle filter is the backward information filter \phi^{\star}(x_t) = p(y_{t:T}|x_t)

The optimal policies \phi^{\star} are usually intractable, but they can be approximated. After running any SMC algorithm, we have a particle approximation of the path measure, which gives us a particle approximation of the optimal policy. We can use regression to learn a simple policy \phi that minimized the difference between \log\phi and particle approximation of \log\phi^{\star}. This regression is done recursively starting at t = T and ending at t = 0, and it is called approximate dynamic programming in the paper.  Given this policy, we have a new path measure, which means that we can run another particle filter according to it and find the optimal policy with respect to the new measure. This process is called policy refinement. According to the authors, in practice, only a few iterations of policy refinement is necessary. They also offer performance monitoring criteria based on effective sample size and regression residuals. 

What fascinates (and puzzles) me is that errors do not accumulate over either iteration of policy refinement or approximate dynamic programming. I hope to read this paper is greater detail and be able to report back to my readers soon. 

SRN – The Iterated Auxiliary Particle Filter by Guarniero et al.

In this week’s Sunday Reading Post, I want to discuss the paper ‘The Iterated Auxiliary Particle Filter’ by Pieralberto Guarniero, Adam Johansen and Antony Lee.  The algorithm proposed, iAPF, approximates an idealized particle filter, which would have zero variance in terms of estimating the marginal likelihood of a hidden Markov model. It is worth mentioning that finding if we were to use an idealized particle filter, we would only need one particle instead of a system of particles for marginal likelihood estimations and that the idealized particle filter would be the backward smoothing provided a perfect forward filtering step. 

In this week’s Sunday Reading Post, I want to discuss the paper ‘The Iterated Auxiliary Particle Filter’ (iAPF) by Pieralberto Guarniero, Adam Johansen and Antony Lee.  

The algorithm proposed, iAPF, approximates an idealized particle filter, which would have zero variance in terms of estimating the marginal likelihood of a hidden Markov model. The motivation is that if we were to use an idealized particle filter, we would only need one particle instead of a system of particles for marginal likelihood estimations. The idealized particle filter would be the backward-smoothing provided a perfect forward-filtering step. 

This paper builds upon previous studies of particle filters, including twisted particle filter and the Auxiliary Particle Filter (APF). The idea is to introduce a family of ‘twisted’ models through functions \psi in order to define a new model that would have the same marginal likelihood as the original model. Because the optimal policy \psi^\star,which corresponds to the idealized particle filter, is intractable in general cases, the authors use a series of functional approximations \psi^0,\psi^1, \cdots, \psi^l to approximate \psi^\star.

The key the question to me is how to and how well can we approximate \psi^\star. The ‘how to’ qustions is addressed in Section 3.3. The series of backward recursions that are used to approximate \psi^\star relies on the iterative relationship \psi^\star(x_t) = p(y_t|x_t) f(x_t,\psi^\star_{t+1}) = p(y_t|x_t)\int_X f(x_{t},x_{t+1})\psi^{\star}_{t+1}(x_{t+1}) dx_{t+1}. In Algorithms 3, the authors suggest that we ‘choose \psi^{t} on the basis of \psi_{t}^{1:N} and \psi_t^{1:N}. Later in Section 5, they mention that this step is implemented with a parametric optimization step for their experiments. In the ‘Discussions’ section, the authors mention alternatives such as nonparametric estimates which would require much higher cost. More importantly is the ‘how well’ part, because it is also possible that the class of functions \Psi that we consider does not contain the optimal function \psi. I think this is a very interesting and importantly case and the authors report ‘fairly well’ performance of iAPF in this case. 

As demonstrated in the ‘Applications and Examples’ section, “iAPF can provide substantially better estimates of the marginal likelihood than the Bootstrap Particle Filter (BPS) at the same computational cost.

Log-likelihood estimates using BPF with N = 1000,10000 particles and iAPF with N = 100 particles. We can see from the width of the box plots that iAPF has smaller variances compared BPF by using fewer particles.  Source: (Guarniero et al 2017)

References:

  • Guarniero, P., Johansen, A. M., & Lee, A. (2017). The iterated auxiliary particle filter. Journal of the American Statistical Association112(520), 1636-1647.
  • Pitt, M. K., & Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. Journal of the American statistical association94(446), 590-599
  • Whiteley, N., & Lee, A. (2014). Twisted particle filters. The Annals of Statistics42(1), 115-141.