Rectified Flow Toy

Rectified Flow is a technique for improving generative models such as image generators. Below is an interactive visualization of how it works.

Consider building a generative model for images of cats. A naive way to do this is to make a list of pairs (noise_image, cat_image) and train a deep neural net in the usual supervised way. You generate cat images by constructing a random noise image and plugging it into the model.
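To make this concrete, here's a minimal sketch of the naive approach, using 2-D points in place of images so it runs in seconds (the PyTorch model, sizes, and training details are illustrative choices, not from the original):

```python
# Naive approach: pair each "image" with a fixed, arbitrary noise sample
# and regress noise -> image directly.
import torch
import torch.nn as nn

dim, n = 2, 512                # 2-D points stand in for images in this toy
ys = torch.randn(n, dim)       # the "cat images": samples from a target distribution
xs0 = torch.rand(n, dim)       # arbitrary random noise paired with them

model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    loss = ((model(xs0) - ys) ** 2).mean()   # plain supervised regression
    opt.zero_grad(); loss.backward(); opt.step()

# Generate by plugging in fresh noise.
with torch.no_grad():
    samples = model(torch.rand(16, dim))
```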

This doesn't work very well because, in addition to learning the distribution of cat images, it has to learn the completely arbitrary way they are mapped to noise images. Since the noise images are random anyway, what if we could choose them differently so they mapped more naturally to cat images? Rectified flow is a procedure for doing that.

Some notation: Let's call the noise images $x_i$ and the cat images $y_i$. The subscript $i$ indexes into the training set of images. We start by choosing random $x_i^0$s, which we'll replace with better ones $x_i^1$, etc.

Since we've made $x$ and $y$ belong to the same vector space (such as 256x256x3 RGB images), we can talk about the difference $x_i - y_i$ and about the path between them. We can interpolate points between them, such as $t x_i + (1-t) y_i$.

The procedure starts by training a temporary model to follow the reverse path from $y_i$ to $x_i^0$. We can do this by training on the mapping from random interpolations to the difference vector:

$$ t x_i^0 + (1-t) y_i \to x_i^0 - y_i \quad \forall\, t \in [0,1] $$
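Continuing the toy sketch above, training that temporary model might look like the following; I'm assuming the network also receives $t$ as an input, which the formula leaves implicit:

```python
# Temporary "velocity" model: given a point on the straight path between
# y_i and x_i^0, plus t, predict the difference vector x_i^0 - y_i.
v_net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(v_net.parameters(), lr=1e-3)

for step in range(5000):
    t = torch.rand(n, 1)                    # random t in [0, 1] for each pair
    z = t * xs0 + (1 - t) * ys              # interpolated point on the path
    pred = v_net(torch.cat([z, t], dim=1))
    loss = ((pred - (xs0 - ys)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```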

We can now use this network in a differential equation solver to trace the path from $y_i$ to $x_i$, by following the flow for 1 unit. This would exactly reproduce the path if there were just one image, but with many images we should expect paths to nearly intersect. Because the learned flow field can only assign one velocity to each point, it averages the directions of paths that pass close together, so both retraced paths deflect and end at different points. We call these endpoints $x_i^1$.
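A fixed-step Euler solver is enough for a sketch (the step count here is an arbitrary choice):

```python
# Trace each path from y_i (t = 0) toward the noise side (t = 1) by
# following dz/dt = v_net(z, t). Near-crossings deflect the paths,
# which is what produces the new endpoints x_i^1.
steps = 100
dt = 1.0 / steps
z = ys.clone()
with torch.no_grad():
    for k in range(steps):
        t = torch.full((n, 1), k * dt)
        z = z + dt * v_net(torch.cat([z, t], dim=1))
xs1 = z   # the rectified noise samples x_i^1
```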

The new set of points $x_i^1$ will have a distribution similar to the original $x_i^0$ (i.e., uniform random), but the $x_i^1 \to y_i$ mapping is easier for a model to learn. Choosing a random noise image and plugging it into this model should generate a better cat image than plugging it into a model trained on $x_i^0 \to y_i$.
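The final step is just the naive training from before, but on the rectified pairs; sketched here for completeness:

```python
# Retrain the direct noise -> image model on the rectified pairs x_i^1 -> y_i.
final = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(final.parameters(), lr=1e-3)

for step in range(2000):
    loss = ((final(xs1) - ys) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Generate as before: fresh noise in, (hopefully sharper) samples out.
with torch.no_grad():
    samples = final(torch.rand(16, dim))
```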

Here's an interactive visualization. You can drag the image samples $y_i$ and the original noise samples $x_i^0$ around. The gray lines link corresponding pairs. Notice that they often cross. The intermediate points show the process of following the flow, leading to the new random samples $x_i^1$. You would then train the final model to map $x_i^1$ to $y_i$.



Further reading

An Introduction to Flow Matching

The paradox of diffusion distillation

Perspectives on diffusion