In this post we'll be looking at the dynamics of MCMC following Ma et al. 2015 by looking at how to construct stationary distributions for MCMC using stochastic differential equations.
MCMC and SGMCMC
Markov chain monte carlo (MCMC) is an algorithm to samples from a target probability density function \(p^*(x)\). Assume that we can evaluate \(p^*(x)\) up to a normalizing constant, there are a variety of MCMC algorithms that can be used to sample from \(p^*(x)\). The one that we'll look at is stochastic gradient MCMC (SGMCMC). These algorithms access the score function, \(\nabla \log p^*(x)\), and use it to construct a stochastic differential equation (SDE) that has \(p^*(x)\) as its stationary distribution. The SDE is then discretized and run until it reaches a stationary distribution which will be constructed to be \(p^*(x)\). In this post, we'll go through how to come up with SGMCMC from first principals.
Consider the multivariate Ito SDE:
Our goal will be to figure out how to choose \(\mu\) so that the stationary distribution is \(p^*(x)\). In order to do so, we first need to look at the marginal distribution of the SDE at any time \(t\), \(p_t(x)\). This is given by the Fokker-Planck equation. Let \(D_{ij} = \sigma_{ik}\sigma_{jk}\). Then the Fokker-Planck equation is:
It is also true that if \(Q_{ij}(x,t)\) is a skew symmetric matrix \((Q=-Q^T)\), then \(\frac{\partial^2 f Q_{ij}}{\partial x^i \partial x^j} = 0\) for any function \(f\), so we also have that
Notice that this equation now resembles the continuity equation, \(\frac{\partial p_t}{\partial t} + \frac{\partial p_tV_i}{\partial x_i} = 0\) where \(V\) is the vector field that determines how particles evolve. So in the case of the SDE, we have that:
The stationary distribution of the SDE is achieved when \(V=0\) (which leaves \(p_t\) constant) which will clearly be equal to our target when we set
This is the "complete recipe" described in Ma et al. for constructing SGMCMC algorithms. The proof that this is complete (describes all MCMC algorithms) is given in the appendix of the paper and basically shows how to construct \(Q\). Now we can take a lookg at the dynamics of MCMC.
MCMC dynamics
In the first part we saw that the SDE with the form
where
will have \(p^*(x)\) as its stationary distribution and that the dynamics of the SDE are given by
where \(p_t\) is the marginal distribution of the SDE at time \(t\).