SRN – evidence and cross-validation

A paper that has intrigued me recently is “On the marginal likelihood and cross-validation” by E. Fong and C. C. Holmes, published in Biometrika.

Model evidence appears as marginal likelihood or prior predictive in this paper:

\displaystyle p_{\mathcal{M}}(y_{1:n}) = \int f_{\theta}(y_{1:n}) d\pi(\theta).
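As a quick numerical illustration (not from the paper), this prior-predictive integral can be checked against its closed form in a conjugate Beta-Bernoulli model; the prior parameters and toy data below are hypothetical choices.

```python
import math
import random

random.seed(0)
a, b = 1.0, 1.0      # Beta(a, b) prior on theta (illustrative choice)
y = [1, 0, 1, 1]     # toy Bernoulli data

# Closed form: p(y_{1:n}) = B(a + s, b + n - s) / B(a, b)
lbeta = lambda x, z: math.lgamma(x) + math.lgamma(z) - math.lgamma(x + z)
s, n = sum(y), len(y)
exact = math.exp(lbeta(a + s, b + n - s) - lbeta(a, b))

# Monte Carlo estimate of the prior-predictive integral:
# draw theta ~ pi, average the likelihood f_theta(y_{1:n})
draws = 100_000
est = 0.0
for _ in range(draws):
    theta = random.betavariate(a, b)
    est += theta ** s * (1 - theta) ** (n - s)
est /= draws

print(exact, est)  # the Monte Carlo estimate should be close to the exact value
```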

Because the paper seeks to connect model evidence with cross-validation (CV), which measures predictive performance on held-out test sets, it first notes that the log evidence decomposes into a sum of log posterior predictive probabilities:

\displaystyle \log p_{\mathcal{M}}(y_{1:n}) = \sum_{i=1}^{n} \log p_{\mathcal{M}}(y_i \mid y_{1:i-1}) = \sum_{i=1}^{n} \log \int f_{\theta}(y_i) d\pi(\theta \mid y_{1:i-1}).

So the log evidence can be interpreted as a sequential predictive scoring rule with score function
\displaystyle s(y_i \mid y_{1:i-1}) = \log p_{\mathcal{M}}(y_i \mid y_{1:i-1}).
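A minimal sketch of this decomposition, again assuming a conjugate Beta-Bernoulli model with toy data (my choice, not the paper's): the closed-form log evidence matches the accumulated sequential log predictive scores.

```python
import math

a, b = 1.0, 1.0      # Beta(a, b) prior (illustrative choice)
y = [1, 0, 1, 1]     # toy binary observations

# Closed-form log evidence for the Beta-Bernoulli model
lbeta = lambda x, z: math.lgamma(x) + math.lgamma(z) - math.lgamma(x + z)
s = sum(y)
log_evidence = lbeta(a + s, b + len(y) - s) - lbeta(a, b)

# Chain-rule decomposition: sum of sequential log posterior predictives,
# with p(y_i = 1 | y_{1:i-1}) = (a + successes so far) / (a + b + i - 1)
log_seq, succ = 0.0, 0
for i, yi in enumerate(y):
    p1 = (a + succ) / (a + b + i)
    log_seq += math.log(p1 if yi == 1 else 1 - p1)
    succ += yi

print(log_evidence, log_seq)  # identical up to floating-point error
```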

The paper begins by arguing that the log posterior predictive is the unique scoring rule that guarantees coherence: when the data are exchangeable, the order in which they arrive should not change the result of inference.

The authors then show the equivalence between the evidence and cumulative cross-validation scores. In leave-{p}-out CV there are {\binom{n}{p}} held-out test sets, and each receives a predictive score when the remaining data are used for training. The leave-{p}-out CV score, denoted {S_{CV}(y_{1:n};p)}, is the average of these predictive scores. When the log posterior predictive probability is used as the scoring rule, we have

\displaystyle \log p_{\mathcal{M}}(y_{1:n}) = \sum_{p=1}^n S_{CV}(y_{1:n};p).

So the log evidence is also the sum of the leave-{p}-out CV scores over all values {p = 1,\ldots,n}.
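This identity can be verified exactly for small {n} by brute force. The sketch below reuses the hypothetical Beta-Bernoulli setup from above and takes the score of a held-out set to be the average log posterior predictive of its points given the training data; enumerating all held-out sets for every {p} and summing the CV scores reproduces the log evidence.

```python
import math
from itertools import combinations

a, b = 1.0, 1.0          # Beta(a, b) prior (assumed for illustration)
y = [1, 0, 1, 1]         # toy exchangeable binary data
n = len(y)

def log_marginal(data):
    # log B(a + s, b + f) - log B(a, b) for the Beta-Bernoulli model
    s, f = sum(data), len(data) - sum(data)
    lbeta = lambda x, z: math.lgamma(x) + math.lgamma(z) - math.lgamma(x + z)
    return lbeta(a + s, b + f) - lbeta(a, b)

def log_pred(yi, train):
    # log posterior predictive p(y_i | train) under Beta-Bernoulli conjugacy
    s = sum(train)
    p1 = (a + s) / (a + b + len(train))
    return math.log(p1 if yi == 1 else 1 - p1)

def s_cv(p):
    # leave-p-out CV score: average over all C(n, p) held-out sets of the
    # average log predictive of each held-out point given the training data
    total, sets = 0.0, 0
    for test in combinations(range(n), p):
        train = [y[i] for i in range(n) if i not in test]
        total += sum(log_pred(y[i], train) for i in test) / p
        sets += 1
    return total / sets

lhs = log_marginal(y)
rhs = sum(s_cv(p) for p in range(1, n + 1))
print(lhs, rhs)  # the two sides agree up to floating-point error
```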