Rectified Flow Toy
Rectified Flow is a technique for improving generative models such as image generators. Below is an interactive visualization of how it works.
Consider building a generative model for images of cats. A naive way to do this is to make a list of pairs (noise_image, cat_image) and train a deep neural net in the usual supervised way. You generate cat images by constructing a random noise image and plugging it into the model.
This doesn't work very well because, in addition to learning the distribution of cat images, the model has to learn the completely arbitrary way they are mapped to noise images. Since the noise images are random anyway, what if we could choose them differently so that they map more naturally to cat images? Rectified flow is a procedure for doing that.
Some notation: Let's call the noise images $z_i$ and the cat images $x_i$. The subscript $i$ indexes into the training set of images. We start by choosing random $z_i$s, which we'll replace with better ones, $z'_i$.
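Since we've made the noise images $z_i$ and the cat images $x_i$ belong to the same vector space (such as 256x256x3 RGB images), we can talk about the difference $z_i - x_i$ and about the path between them. We can interpolate points between them, such as $x_i + t(z_i - x_i)$ for $t$ between 0 and 1.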
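As a minimal sketch of the interpolation described above, the snippet below treats images as NumPy arrays of the same shape. The names `x` (a cat image), `z` (its noise image), and `interpolate` are illustrative assumptions, and the tiny 4x4x3 arrays stand in for real 256x256x3 images.

```python
import numpy as np

def interpolate(x, z, t):
    """Point a fraction t of the way from image x (t=0) to noise image z (t=1)."""
    return x + t * (z - x)

rng = np.random.default_rng(0)
x = rng.uniform(size=(4, 4, 3))  # stand-in for a cat image (hypothetical data)
z = rng.uniform(size=(4, 4, 3))  # noise image in the same vector space

difference = z - x                # the difference vector between the pair
midpoint = interpolate(x, z, 0.5) # a point halfway along the straight path
```

Because both arrays live in the same vector space, the difference and every interpolated point are themselves arrays of the same shape.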
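The procedure starts by training a temporary model $v$ to follow the reverse path from the cat image $x_i$ to the noise image $z_i$. We can do this by training on the mapping from random interpolations to the difference vector:

$$v\big(x_i + t(z_i - x_i)\big) \approx z_i - x_i, \qquad t \sim \mathrm{Uniform}(0, 1)$$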
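The training objective for the temporary model can be sketched as a mean squared error over random interpolations. This is a sketch under stated assumptions, not the page's actual implementation: images are flattened to rows of a matrix, `x` holds the cat images and `z` the matching noise images, `model` is any callable mapping a batch of points to a batch of predicted difference vectors, and the name `flow_matching_loss` is hypothetical.

```python
import numpy as np

def flow_matching_loss(model, x, z, rng, samples_per_pair=16):
    """Mean squared error between model(interpolation) and the difference z - x.

    x, z: arrays of shape (n, d), one flattened image per row.
    model: callable mapping an (m, d) array of points to (m, d) predictions.
    """
    n, d = x.shape
    diff = z - x                                       # target for every point on pair i's path
    t = rng.uniform(size=(n, samples_per_pair, 1))     # random positions along each path
    points = x[:, None, :] + t * diff[:, None, :]      # shape (n, samples_per_pair, d)
    pred = model(points.reshape(-1, d)).reshape(n, samples_per_pair, d)
    return np.mean(np.sum((pred - diff[:, None, :]) ** 2, axis=-1))

rng = np.random.default_rng(0)
x = np.array([[0.0, 0.0]])   # a single "image" (a 2-D point for illustration)
z = np.array([[3.0, 4.0]])   # its noise partner

# With one pair, a constant model that always outputs z - x fits perfectly.
perfect = lambda p: np.tile(z - x, (len(p), 1))
# A model that always outputs zero pays the full squared length of z - x.
zero = lambda p: np.zeros_like(p)

loss_perfect = flow_matching_loss(perfect, x, z, rng)
loss_zero = flow_matching_loss(zero, x, z, rng)
```

In practice this loss would be minimized over the parameters of a neural network by gradient descent; the two hand-written models here only illustrate what the loss rewards.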
We can now use this network in a differential equation solver to trace the path from the cat image $x_i$ toward the noise image $z_i$, by following the flow for one unit of time. It would exactly reproduce the path if there were just one image, but when there are many images, we should expect paths to nearly intersect. When this happens, both retraced paths will deflect and end at different points. We call these new endpoints $z'_i$.
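The retracing step can be sketched with a fixed-step Euler solver. To keep the example self-contained, a Gaussian-weighted average of nearby training targets stands in for the trained network; that smoother is a swapped-in assumption, not the model the page uses. Here `x` holds the images (2-D points for illustration), `z` their noise partners, and all function names and parameters (`make_training_set`, `velocity`, `follow_flow`, `bandwidth`, `steps`) are hypothetical.

```python
import numpy as np

def make_training_set(x, z, samples_per_pair, rng):
    """Random interpolations along each pair's path, each labeled with z - x."""
    n, d = x.shape
    t = rng.uniform(size=(n, samples_per_pair, 1))
    points = x[:, None, :] + t * (z - x)[:, None, :]
    targets = np.broadcast_to((z - x)[:, None, :], points.shape)
    return points.reshape(-1, d), targets.reshape(-1, d)

def velocity(p, points, targets, bandwidth=0.05):
    """Stand-in for the trained network: Gaussian-weighted average of nearby targets."""
    d2 = np.sum((points - p) ** 2, axis=1)
    # Subtracting the minimum keeps the largest weight at 1, so the sum is never zero.
    w = np.exp(-(d2 - d2.min()) / (2 * bandwidth ** 2))
    return (w[:, None] * targets).sum(axis=0) / w.sum()

def follow_flow(start, points, targets, steps=100):
    """Euler integration of dp/dt = velocity(p) for one unit of time."""
    p = np.array(start, dtype=float)
    dt = 1.0 / steps
    for _ in range(steps):
        p = p + dt * velocity(p, points, targets)
    return p

rng = np.random.default_rng(0)

# One pair: every target is the same vector, so the path is reproduced.
x_one = np.array([[0.0, 0.0]])
z_one = np.array([[1.0, 2.0]])
pts_one, tgt_one = make_training_set(x_one, z_one, 200, rng)
end_one = follow_flow(x_one[0], pts_one, tgt_one)

# Two pairs whose straight paths cross: the retraced paths see a blended
# velocity near the crossing and need not end at their original partners.
x = np.array([[0.0, 0.0], [0.0, 1.0]])
z = np.array([[1.0, 1.0], [1.0, 0.0]])
points, targets = make_training_set(x, z, 200, rng)
z_new = np.array([follow_flow(xi, points, targets) for xi in x])
```

A real implementation would likely use an adaptive ODE solver and the trained network in place of `velocity`; fixed-step Euler is chosen here only because it makes the "follow the flow for one unit of time" step easy to read.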
The new set of points $z'_i$ will have a similar distribution to the original $z_i$ (i.e., uniform random), but the mapping from $z'_i$ to $x_i$ is easier for a model to learn. Choosing a random noise image and plugging it into a model trained on the pairs $(z'_i, x_i)$ should generate a better cat image than a model trained on the original pairs $(z_i, x_i)$.
Here's an interactive visualization. You can drag the image samples $x_i$ and original noise samples $z_i$ around. The gray lines link corresponding pairs. Notice that they often cross. The intermediate points show the process of following the flow, leading to the new random samples $z'_i$. You would then train the final model to map $z'_i$ to $x_i$.
[Interactive visualization; requires a browser with WebGPU support.]
Notes
Further reading
An Introduction to Flow Matching
The paradox of diffusion distillation
Perspectives on diffusion