## [TM] rainbow parentheses in RStudio

On a Mac: RStudio preferences -> Display -> Enable rainbow parentheses.

## [TM] assigning a value to a name in R

In the RStudio editor and in the R console, `Alt` + `-` gives us `<-`.

This is indeed very trivial but really deserves a post!


Such great news for my fingers!

## [TM] Stein’s lemma

This post is about Stein’s lemma, which first appeared in Stein’s landmark 1981 paper. The lemma leads to Stein’s unbiased risk estimator and is useful for proving central limit theorems. It has also been called ‘Gaussian integration by parts’, which is in fact a high-level description of the proof.

Lemma 1 (Stein’s lemma) If ${x}$ follows the standard normal distribution, then $\displaystyle \mathbb{E}\left[x f(x) \right] = \mathbb{E}\left[f'(x)\right],$

if the expectations are well-defined.

Proof: If ${f}$ is ‘nice’, then $\displaystyle \begin{array}{rcl} \mathbb{E}\left[f'(x)\right] &=& \int_{-\infty}^{\infty} f'(x) \frac{\exp(-x^2/2)}{\sqrt{2\pi}} dx\\ &=& 0 - \int_{-\infty}^{\infty} f(x) \left(-x \frac{\exp(-x^2/2)}{\sqrt{2\pi}} \right) dx \\ &=& \mathbb{E}\left[x f(x)\right], \end{array}$ where the boundary term ${\left[f(x)\phi(x)\right]_{-\infty}^{\infty}}$ vanishes because the Gaussian density decays faster than ${f}$ grows. $\Box$

It is also convenient to write ${\phi(x)}$ for the standard normal density and remember that $\displaystyle \phi'(x) = -x \phi(x).$
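As a quick numerical sanity check (my addition, not part of the original derivation), Stein’s identity can be verified by Monte Carlo. The sketch below uses ${f(x) = \sin(x)}$, an arbitrary smooth choice; both sides should then equal ${\mathbb{E}[\cos(x)] = e^{-1/2} \approx 0.6065}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

# f(x) = sin(x), so f'(x) = cos(x)
lhs = np.mean(x * np.sin(x))  # E[x f(x)]
rhs = np.mean(np.cos(x))      # E[f'(x)]

# Both sides should be close to the closed form exp(-1/2)
print(lhs, rhs)
```

With a million draws the two averages agree to about two decimal places, which is all this check is meant to show.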

Stein’s lemma can be generalized to exponential family distributions. In particular, for multivariate normals, if ${y \sim \mathrm{MVN}_{p}(\mu, \sigma^2 I)}$ and ${\hat{\mu} = r(y)}$ is any differentiable estimator, then we have $\displaystyle \textrm{cov}\left(\hat{\mu}_i, y_i\right) = \sigma^2 \mathbb{E}\left[\frac{\partial \hat{\mu}_i}{\partial y_i}\right].$

This is Equation (12.56) in ‘Computer Age Statistical Inference’ by Efron and Hastie.
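This identity is also easy to check by Monte Carlo in one dimension. The sketch below uses ${\hat{\mu} = \tanh(y)}$, an arbitrary nonlinear differentiable estimator of my choosing (the values of ${\mu}$ and ${\sigma}$ are arbitrary too).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.3, 1.0
y = rng.normal(mu, sigma, 2_000_000)

# Nonlinear differentiable estimator: mu_hat = tanh(y)
mu_hat = np.tanh(y)

# Left side: sample covariance of (mu_hat, y)
cov_lhs = np.mean((mu_hat - mu_hat.mean()) * (y - y.mean()))

# Right side: sigma^2 * E[d/dy tanh(y)] = sigma^2 * E[1 - tanh(y)^2]
rhs = sigma**2 * np.mean(1.0 - mu_hat**2)

print(cov_lhs, rhs)
```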

This post is the second in the category ‘trivial matters’, where I formally write down some notes to myself about identities, tricks, and facts that I repeatedly use but (unfortunately) need to re-derive every time I use them. Although these posts are short, they discuss important topics. The term ‘trivial matters’ is used sarcastically, because my blood boils every time I see phrases like ‘obviously’ or ‘it is trivial to show that …’ when I grade students’ homework.

## [TM] Change of Variables in MCMC

This post is about change of variables in Markov chain Monte Carlo (MCMC), which is used quite often when the target distribution is supported on a subset of ${\mathbb{R}^n}$. For example, the Exponential and Log-Normal distributions are supported only on the positive reals.

Consider a target distribution ${\pi(x)}$ supported on a subset ${S \subset \mathbb{R}^n}$. If we use a random-walk proposal ${q_X(x' \mid x) = \mathrm{MVN}(x' ; x,\Sigma)}$, then we might end up with a proposal ${x'}$ such that ${\pi(x') = 0}$, and this can lead to too few acceptances in the MCMC chain. If we can find a transformation ${h: S \to \mathbb{R}^n}$ that is one-to-one, differentiable, and onto ${\mathbb{R}^n}$, then we can instead propose ${x' = h^{-1}(y')}$ where ${y' \sim q_Y(\cdot \mid y = h(x))}$. This always yields a proposal ${x'}$ such that ${\pi(x') > 0.}$

Of course, when we employ such a transformation in the proposal kernel, we need to be careful about evaluating the proposal densities. We know that the acceptance probability is ${\alpha(x,x') = 1 \wedge \frac{\pi(x') q(x \mid x')}{\pi(x) q(x' \mid x)}}$, and it should be no surprise that ${q_X(x' \mid x) \not= q_Y(y' \mid y)}$ unless ${h}$ is the identity map.

Let’s work out the acceptance ratio together carefully. Recall how change of variables works: if ${Y \sim f_Y(y)}$ and we set ${X = h^{-1}(Y)}$ (equivalently, ${y = h(x)}$), then the pdf of ${X}$ is $\displaystyle f_X(x) = f_Y(h(x))|J_{h}(x)|.$
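A quick numerical illustration of this formula (a sketch I added, not part of the original post): take ${h = \log}$, so ${X = \exp(Y)}$ is log-normal when ${Y}$ is standard normal, and compare an empirical interval probability against the integral of ${f_Y(\log x) \cdot (1/x)}$. The interval ${[0.5, 2]}$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Y ~ N(0, 1) and X = h^{-1}(Y) = exp(Y), i.e. h = log
y = rng.standard_normal(1_000_000)
x = np.exp(y)

# Change-of-variables density: f_X(t) = f_Y(log t) * |J_h(t)|, with |J_h(t)| = 1/t
def f_X(t):
    return np.exp(-np.log(t) ** 2 / 2) / (np.sqrt(2 * np.pi) * t)

# Check P(a <= X <= b): empirical frequency vs. midpoint-rule integral of f_X
a, b = 0.5, 2.0
empirical = np.mean((x >= a) & (x <= b))
mid = np.linspace(a, b, 10_000, endpoint=False) + (b - a) / 20_000
integral = f_X(mid).mean() * (b - a)

print(empirical, integral)
```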

When we apply this to the kernels ${q_Y}$ and ${q_X}$ we get that $\displaystyle q_X(x' \mid x ) = q_Y(h(x') \mid h(x)) \cdot |J_{h}(x')|.$

Example 1 (Symmetric proposal on the transformed space) If ${q_Y(y' \mid y)}$ is a symmetric proposal, then the acceptance probability becomes $\displaystyle \alpha(x,x') = 1 \wedge \frac{\pi(x') |J_{h}(x)|}{\pi(x)|J_{h}(x')|} .$

Here are two common transformations.

Example 2 (Log-transformation for ${x}$ supported on ${\mathbb{R}_+}$)

If ${h = \log}$, then ${|J_{h}(x)| = 1 / x}$ and the acceptance probability is $\displaystyle \alpha(x,x') = 1 \wedge \frac{\pi(x')x'}{\pi(x)x}.$
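A minimal sketch of Example 2 in action: a random-walk Metropolis sampler in log-space for a target supported on ${\mathbb{R}_+}$. The target (Exponential(1)), the step size, and the chain length are all my arbitrary choices; the point is that every proposal lands on the positive reals and the Jacobian-corrected ratio ${\pi(x')x' / (\pi(x)x)}$ does the rest.

```python
import numpy as np

rng = np.random.default_rng(3)

def target(x):
    # Exponential(1) density (up to a constant), supported on the positive reals
    return np.exp(-x)

x = 1.0
samples = []
for _ in range(200_000):
    # Random-walk proposal on y = log(x); x' = exp(y') is always positive
    y_prop = np.log(x) + 0.8 * rng.standard_normal()
    x_prop = np.exp(y_prop)
    # Acceptance ratio with the Jacobian correction: pi(x') x' / (pi(x) x)
    if rng.random() < (target(x_prop) * x_prop) / (target(x) * x):
        x = x_prop
    samples.append(x)

samples = np.array(samples[50_000:])  # discard burn-in
print(samples.mean())  # Exponential(1) has mean 1
```

Every sampled state is strictly positive, and no proposal is ever wasted on ${\pi(x') = 0}$.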

Example 3 (Logit transformation for ${x}$ supported on ${(0,1)}$) If ${h(x) = \log(\frac{x}{1 - x})}$, then the inverse transformation is ${h^{-1}(y) = \frac{\exp(y)}{1 + \exp(y)}}$ and ${|J_{h}(x)| = \frac{1}{x(1-x)}}$. The acceptance probability is $\displaystyle \alpha(x,x') = 1 \wedge \frac{\pi(x')x'(1-x')}{\pi(x)x(1-x)}.$
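And a similar sketch for Example 3, with a logit-space random walk. Again the target (Beta(2,2), mean 1/2) and the step size are my arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def target(x):
    # Beta(2, 2) density up to a constant, supported on (0, 1)
    return x * (1 - x)

def logit(x):
    return np.log(x / (1 - x))

x = 0.5
samples = []
for _ in range(200_000):
    # Random-walk proposal on y = logit(x); x' always lands in (0, 1)
    y_prop = logit(x) + 1.0 * rng.standard_normal()
    x_prop = 1.0 / (1.0 + np.exp(-y_prop))
    # Acceptance ratio with the Jacobian correction:
    # pi(x') x'(1 - x') / (pi(x) x(1 - x))
    ratio = (target(x_prop) * x_prop * (1 - x_prop)) / (target(x) * x * (1 - x))
    if rng.random() < ratio:
        x = x_prop
    samples.append(x)

samples = np.array(samples[50_000:])  # discard burn-in
print(samples.mean())  # Beta(2, 2) has mean 1/2
```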

This post is the first in the category ‘trivial matters’, where I formally write down some notes to myself about tricks and facts that I repeatedly use but (unfortunately) need to re-derive every time I use them.