In my previous post on flow matching, we went over how to construct a continuous normalizing flow that transforms samples from a user-specified prior, \(p_{t=0}\), to a target probability distribution, \(p_{t=1}\), that we can sample from. In order to do this, we assumed that we could construct some probability path \(p_t\) that can be written as a marginal distribution over a latent variable:
In this post we'll go over one easy way to choose \(p_y\) and \(p_{t|y}\), and see how this relates to diffusion models.
Gaussian conditional flows
If we choose \(p_0 = N(0,I)\), then we can make the following choices for \(p_y\) and \(p_{t|y}\):
where \(\mu_t\) and \(\Sigma_t\) are differentiable functions of \(t\) and \(y\) that satisfy the following conditions:
We can verify that these choices of \(p_y\) and \(p_{t|y}\) give us the correct marginals at \(t=0\) and \(t=1\):
Vector field that generates the probability path
In order to use these choices of \(p_y\) and \(p_{t|y}\), we need to be able to compute the vector field that generates the probability path. Because we are working with simple distributions like Gaussians, everything is available to us in closed form.
To find the vector field, we first need the equation for the path that a sample evolves along. Notice that we can sample \(x_t \sim p_{t|y}(x_t|y)\) using the model:
So we can differentiate \(x_t\) to get the conditional vector field:
The simplest choice of \(\mu_t\) and \(\Sigma_t\) that satisfies our boundary conditions is:
This leads us to the optimal transport conditional vector fields from the flow matching paper and the straight-path example from my post on flow matching, which has
Even though we are free to parametrize \(\Sigma_t\) in terms of \(y\) as well as \(t\), going forward we will only consider the case where \(\Sigma_t\) depends on \(t\) alone so that things simplify. This is also the case that appears in practice.
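To make the straight-path case concrete, here is a small numerical sketch (the function and variable names are my own, not from any library): we Euler-integrate the conditional vector field \(u_t(x|y) = \frac{y - x}{1 - t}\), which comes from differentiating \(x_t = (1-t)x_0 + ty\), and check that it reproduces the closed-form path.

```python
import numpy as np

rng = np.random.default_rng(0)

def ot_conditional_vf(x, t, y):
    # u_t(x | y) = (y - x) / (1 - t) for mu_t(y) = t*y, sigma_t = 1 - t
    return (y - x) / (1.0 - t)

y = 2.0             # conditioning sample from the target distribution
x = rng.normal()    # x_0 ~ N(0, 1), the prior sample
x0 = x
n_steps, t_end = 1000, 0.99
dt = t_end / n_steps
t = 0.0
for _ in range(n_steps):
    x = x + dt * ot_conditional_vf(x, t, y)
    t += dt

# The flow of the straight path is x_t = (1 - t) * x_0 + t * y
closed_form = (1 - t_end) * x0 + t_end * y
print(abs(x - closed_form))
```

Because the solution is linear in \(t\), Euler integration happens to be exact here up to floating point error; for general \(\mu_t\) and \(\Sigma_t\) you would see discretization error instead.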
Relationship between diffusion and flow matching
Now that we have the equations that describe the probability path, we can relate this to diffusion models by relating the vector fields that generate the probability path to the score function.
CNF from an SDE
The appendix of the flow matching paper has a section that shows how to go from a stochastic differential equation to the vector field that generates its flow. This is expressed through the Fokker-Planck equation: if we have a stochastic differential equation of the form \(dx = f_t dt + g_t dw\), then its probability path has the form
where \(\Delta\) is the Laplace operator. If we rearrange terms to match the form of the continuity equation, we have that
where \(w_t\) is the vector field that generates the flow of \(p_t\). So the vector field that generates the flow of \(p_t\) is given by \(w_t = f_t - \frac{g_t^2}{2}\nabla \log p_t\).
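As an illustrative sketch of this conversion (the helper name `probability_flow_drift` is mine), we can take a toy SDE with \(f_t = 0\) and constant \(g\), replace the noise with the drift \(f_t - \frac{g_t^2}{2}\nabla\log p_t\), and check that the resulting deterministic flow reproduces the SDE's variance growth.

```python
import numpy as np

rng = np.random.default_rng(0)

def probability_flow_drift(x, f, g, score):
    # w_t = f_t - (g_t^2 / 2) * score: the vector field whose flow
    # shares the SDE's marginals p_t (via Fokker-Planck / continuity)
    return f - 0.5 * g**2 * score

# Toy SDE: dx = g dw (f = 0, constant g) started from N(0, s0^2),
# so p_t = N(0, s0^2 + g^2 t) and score(x, t) = -x / (s0^2 + g^2 t).
s0, g, T, n_steps = 1.0, 0.5, 1.0, 2000
dt = T / n_steps
x = rng.normal(scale=s0, size=100_000)
for i in range(n_steps):
    t = i * dt
    var = s0**2 + g**2 * t
    score = -x / var
    x = x + dt * probability_flow_drift(x, 0.0, g, score)

print(x.var())  # should be close to s0^2 + g^2 * T = 1.25
```

The deterministic flow spreads the samples out at exactly the rate the diffusion term would, which is the content of the Fokker-Planck rearrangement above.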
SDE from a CNF
Next, we'll go in the reverse direction and show how CNFs that are constructed using conditional Gaussian probability paths can be expressed as stochastic differential equations.
Recall that our probability path has the form:
To get the score function of the probability path, we'll first show how the score function of the conditional Gaussian relates to the vector field that generates its flow.
Score function and vector field of a Gaussian
Recall that the score function of a Gaussian is given by
and the vector field that generates the flow is
We can multiply the score function by \(\Sigma_t^\frac{1}{2}\) and then plug it into the vector field equation to relate the two:
Now that we have a relationship between the score and vector field for the conditional distribution, we can take a look at how this relates to the score function and vector field of the marginal distribution.
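Here is a quick numerical check of the score/vector-field relation for a 1D conditional Gaussian, using illustrative values for \(\mu_t\), \(\sigma_t\) and their time derivatives (all names are my own):

```python
import numpy as np

# Conditional Gaussian path in 1D: p_{t|y} = N(mu_t, sigma_t^2).
# Vector field: u_t(x) = (dsigma/dt / sigma) * (x - mu) + dmu/dt
# Score:        s_t(x) = -(x - mu) / sigma^2
# Relation:     u_t(x) = -(dsigma/dt) * sigma * s_t(x) + dmu/dt
mu, dmu = 0.7, 1.3           # mu_t(y) and its time derivative (example values)
sigma, dsigma = 0.4, -0.9    # sigma_t and its time derivative
x = np.linspace(-3, 3, 7)

u_direct = (dsigma / sigma) * (x - mu) + dmu
score = -(x - mu) / sigma**2
u_from_score = -dsigma * sigma * score + dmu

print(np.allclose(u_direct, u_from_score))  # True
```

In 1D, \(\Sigma_t^{1/2} = \sigma_t\), so multiplying the score by \(\sigma_t\) recovers \(-\frac{x - \mu_t}{\sigma_t}\), which is exactly the piece that appears in the vector field.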
Relationship of marginals
The key property that we will exploit is the posterior expectation property that both the score function and marginal vector field satisfy:
This is easily seen for the score function because
and the marginal vector field because of the continuity equation (see my flow matching post).
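The posterior expectation identity for the score can be checked numerically on a toy example: take \(p_y\) supported on two points, so that both the marginal score and the posterior weights are available in closed form (all names here are mine; the finite-difference comparison is just a sanity check):

```python
import numpy as np

# Two-point "data" distribution p_y: y = y1 or y2 with equal weight.
# Conditional path: p_{t|y} = N(t*y, (1-t)^2). We check the identity
#   grad log p_t(x) = E_{y|x}[ grad log p_{t|y}(x|y) ]
y_vals = np.array([-1.0, 2.0])
w = np.array([0.5, 0.5])
t, x = 0.6, 0.5
mu = t * y_vals
sig2 = (1 - t) ** 2

def gauss_pdf(z, m, v):
    return np.exp(-0.5 * (z - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

# Posterior weights p(y | x_t = x)
lik = w * gauss_pdf(x, mu, sig2)
post = lik / lik.sum()

cond_scores = -(x - mu) / sig2
score_via_posterior = (post * cond_scores).sum()

# Finite-difference check of grad log p_t(x)
eps = 1e-6
logp = lambda z: np.log((w * gauss_pdf(z, mu, sig2)).sum())
score_fd = (logp(x + eps) - logp(x - eps)) / (2 * eps)

print(abs(score_via_posterior - score_fd))  # ~0 up to finite-difference error
```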
So now we can plug and chug our equations for the conditional distribution into these marginal equations:
Notice the similarity between this equation and the one we get from the Fokker-Planck equation. Next, we'll show that in the conditional optimal transport case, the first term can be simplified fully.
Optimal transport conditional paths
We can look at the special case where we use the optimal transport conditional path where \(\mu_t(y) = ty\) and \(\Sigma_t = (1-t)^2I\). In this case, we can fully simplify the expression of the marginal distribution's vector field by exploiting the fact that \(\frac{d\mu_t(y)}{dt} = \frac{1}{t}\mu_t(y)\). Again, because we want to relate everything to the score function, let's write \(\mu_t(y)\) in terms of \(\nabla \log p_{t|y}\):
Now let's simplify the posterior expectation:
Now we're almost done! Next, we'll substitute \(\Sigma_t = (1-t)^2I\) into the full expression:
Our final expression that relates the vector field that generates the probability path and the score function of a CNF that is constructed using conditional probability paths is:
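One easy sanity check of this relation (a sketch, under the assumption that for the straight path the marginal vector field works out to \(v_t(x) = \frac{x}{t} + \frac{1-t}{t}\nabla\log p_t(x)\), which follows from the simplifications above): with a single data point \(y\), the marginal path reduces to the conditional one, so the formula should recover \(\frac{y - x}{1 - t}\).

```python
import numpy as np

# With one data point y, p_t = N(t*y, (1-t)^2), so the marginal vector
# field v_t(x) = x/t + ((1 - t)/t) * grad log p_t(x) must reduce to the
# conditional straight-path field (y - x)/(1 - t).
y, t = 1.5, 0.4
x = np.linspace(-2, 2, 9)

score = -(x - t * y) / (1 - t) ** 2         # grad log N(x; t*y, (1-t)^2)
v_marginal = x / t + ((1 - t) / t) * score
v_conditional = (y - x) / (1 - t)

print(np.allclose(v_marginal, v_conditional))  # True
```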