This Sunday Reading Notes post is about a recent article on variational inference (VI).
In variational inference, we approximate a posterior distribution $p(z \mid x)$ by finding a distribution $q(z; \lambda^*)$ that is the ‘closest’ to $p(z \mid x)$ among a collection of distributions $Q = \{q(z; \lambda)\}$. Once a divergence between $p$ and $q$ has been chosen, we can rely on optimization algorithms such as stochastic gradient descent to find $\lambda^*$.
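The ‘exclusive’ Kullback-Leibler (KL) divergence $\mathrm{KL}(q \| p)$ has been popular in VI, due to the ease of working with an expectation with respect to the approximating distribution $q$. This article, however, considers the ‘inclusive’ KL, $\mathrm{KL}(p \| q) = E_{p(z \mid x)}\left[\log \frac{p(z \mid x)}{q(z; \lambda)}\right]$.
Minimizing $\mathrm{KL}(p \| q)$ is equivalent to minimizing the cross entropy $L_{KL}(\lambda) = E_{p(z \mid x)}[-\log q(z; \lambda)]$, whose gradient is $g_{KL}(\lambda) = \nabla_\lambda L_{KL}(\lambda) = E_{p(z \mid x)}[-\nabla_\lambda \log q(z; \lambda)]$.
If we can find unbiased estimates of $g_{KL}(\lambda)$, then with a Robbins-Monro schedule $\{\varepsilon_k\}$, that is, step sizes with $\sum_k \varepsilon_k = \infty$ and $\sum_k \varepsilon_k^2 < \infty$, we can use stochastic gradient descent to approximate $\lambda^*$.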
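The equivalence between the inclusive KL and the cross entropy takes one line to check. The derivation below is a sketch in the notation $p(z \mid x)$ for the posterior and $q(z; \lambda)$ for the approximating family, and it assumes that the gradient and the expectation can be interchanged.

```latex
\begin{aligned}
\mathrm{KL}(p \,\|\, q)
  &= E_{p(z \mid x)}\big[\log p(z \mid x)\big]
     - E_{p(z \mid x)}\big[\log q(z; \lambda)\big] \\
  &= \text{const} + L_{KL}(\lambda),
  \qquad L_{KL}(\lambda) = E_{p(z \mid x)}\big[-\log q(z; \lambda)\big], \\
\nabla_\lambda L_{KL}(\lambda)
  &= -\,E_{p(z \mid x)}\big[\nabla_\lambda \log q(z; \lambda)\big].
\end{aligned}
```

The first term does not depend on $\lambda$, so only the cross entropy matters for the optimization. The difficulty is that the remaining expectation is taken under the posterior itself, which is exactly the distribution we cannot sample from directly.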
This article proposes Markovian Score Climbing (MSC) as another way to approximate $\lambda^*$. Given a Markov kernel $M(z' \mid z; \lambda)$ that leaves the posterior distribution $p(z \mid x)$ invariant, one step of the MSC iterations operates as follows.
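(*) Sample $z_k \sim M(\cdot \mid z_{k-1}; \lambda_{k-1})$.
(*) Compute the gradient $\nabla_\lambda \log q(z_k; \lambda_{k-1})$.
(*) Set $\lambda_k = \lambda_{k-1} + \varepsilon_k \nabla_\lambda \log q(z_k; \lambda_{k-1})$, where $\varepsilon_k$ is the step size.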
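The three steps above can be sketched in a few lines of NumPy. This is a toy illustration under my own assumptions, not the authors' implementation: the target `log_p_tilde` is an unnormalized Gaussian $N(2, 1.5^2)$ standing in for the posterior, the variational family is Gaussian with parameters $\lambda = (\mu, \log\sigma)$, the kernel is an independence Metropolis-Hastings step that proposes from the current $q$ (one simple kernel that leaves the target invariant, chosen for brevity), and the step sizes $\varepsilon_k = 0.1 / k^{0.6}$ are one arbitrary Robbins-Monro schedule. All function names are hypothetical.

```python
import numpy as np

def log_p_tilde(z):
    # Unnormalized log target (assumption: a toy N(2, 1.5^2) "posterior").
    return -0.5 * ((z - 2.0) / 1.5) ** 2

def log_q(z, lam):
    # Log density of the Gaussian family, lam = (mu, log sigma).
    mu, rho = lam
    return -0.5 * ((z - mu) / np.exp(rho)) ** 2 - rho - 0.5 * np.log(2 * np.pi)

def score(z, lam):
    # Gradient of log q(z; lam) with respect to (mu, log sigma).
    mu, rho = lam
    sigma2 = np.exp(2 * rho)
    return np.array([(z - mu) / sigma2, (z - mu) ** 2 / sigma2 - 1.0])

def imh_kernel(z, lam, rng):
    # Independence Metropolis-Hastings step with proposal q(.; lam).
    # It leaves the target invariant for any fixed lam (assumed kernel choice).
    mu, rho = lam
    z_prop = mu + np.exp(rho) * rng.standard_normal()
    log_w_prop = log_p_tilde(z_prop) - log_q(z_prop, lam)
    log_w_curr = log_p_tilde(z) - log_q(z, lam)
    if np.log(rng.uniform()) < log_w_prop - log_w_curr:
        return z_prop
    return z

def msc(num_iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    lam = np.array([0.0, 0.0])  # start from q = N(0, 1)
    z = 0.0                     # initial state of the Markov chain
    for k in range(1, num_iters + 1):
        eps = 0.1 / k ** 0.6             # Robbins-Monro step size
        z = imh_kernel(z, lam, rng)      # sample z_k given z_{k-1}, lam_{k-1}
        lam = lam + eps * score(z, lam)  # one-sample score-climbing update
    return lam

lam_hat = msc()
mu_hat, sigma_hat = lam_hat[0], np.exp(lam_hat[1])
print(mu_hat, sigma_hat)
```

Because the toy target lies inside the Gaussian family, the fitted mean and standard deviation should end up close to 2 and 1.5. Parameterizing by $\log\sigma$ keeps the scale positive without any constraint handling.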
The authors prove that $\lambda_k \to \lambda^*$ almost surely and illustrate it on the skew normal distribution. One advantage of MSC is that only one sample is required per $\lambda$ update. Also, the Markov kernel $M(\cdot \mid z_{k-1}; \lambda_{k-1})$ provides a systematic way of incorporating information from the current sample $z_{k-1}$ and the current parameter $\lambda_{k-1}$. As the authors point out, one example of such a kernel is a conditional SMC update [Section 2.4.3 of Andrieu et al., 2010].
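To make the dependence on the current sample and the current parameter concrete, here is a sketch of a conditional importance sampling kernel. It is a simpler single-step relative of the conditional SMC update and is not taken from either reference. The current state is retained as one particle, the remaining particles are drawn from $q$, and the next state is resampled according to the importance weights. The toy target, the function names and the fixed values `mu=0.0`, `sigma=3.0`, `num_particles=10` are all assumptions made for illustration.

```python
import numpy as np

def log_p_tilde(z):
    # Unnormalized log target (assumption: a toy N(2, 1.5^2) "posterior").
    return -0.5 * ((z - 2.0) / 1.5) ** 2

def log_q(z, mu, sigma):
    # Gaussian log density up to an additive constant; the constant
    # cancels when the importance weights are normalized.
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma)

def cis_kernel(z_prev, mu, sigma, num_particles, rng):
    # Keep the current state as the first particle and draw the rest from q.
    particles = np.empty(num_particles)
    particles[0] = z_prev
    particles[1:] = mu + sigma * rng.standard_normal(num_particles - 1)
    # Self-normalized importance weights p_tilde / q, computed stably.
    log_w = log_p_tilde(particles) - log_q(particles, mu, sigma)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # The next state is one particle chosen with probability equal to its weight.
    return particles[rng.choice(num_particles, p=w)]

rng = np.random.default_rng(1)
z = 0.0
chain = np.empty(5000)
for k in range(chain.size):
    z = cis_kernel(z, mu=0.0, sigma=3.0, num_particles=10, rng=rng)
    chain[k] = z
print(chain.mean(), chain.std())
```

With the parameters of $q$ held fixed, the chain targets the toy posterior, so its empirical mean and standard deviation should be near 2 and 1.5. Inside MSC the same kernel would be called with the current $\lambda_{k-1}$ in place of the fixed `mu` and `sigma`.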
While this article definitely provides a general purpose VI method, I am more intrigued by the MCMC samples $z_k$. What can we say about the samples $\{z_k\}$? Can we make use of them?
References:
Naesseth, C. A., Lindsten, F., & Blei, D. (2020). Markovian score climbing: Variational inference with KL(p || q). arXiv preprint arXiv:2003.10374.
Andrieu, C., Doucet, A., & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269-342.