Method enables better control of GAN image generators’ output

January 19, 2024

Generative adversarial networks (GANs) can produce remarkably realistic synthetic images. Trained on a set of real images, a GAN learns a mapping from a latent distribution to the image distribution represented in the training dataset.

Modifying images by controlling GANs is a lively topic of research, whose applications include dataset creation and augmentation, image editing, and entertainment. Researchers have developed ever more sophisticated techniques for both exploring and structuring latent spaces, in order to understand how movement through the spaces translates to modification of synthetic images’ properties.

In a paper that we presented at this year’s European Conference on Computer Vision (ECCV), my colleagues and I describe a new technique that offers precise control over GAN outputs. Unlike prior techniques, ours can hold selected image attributes steady — say, the location and appearance of one sofa in a room — while varying others.

In this sequence of images, we hold one feature of a GAN-generated image (the sofa, boxed in red in the first image) steady while varying the others around it.

Prior approaches to controlling GANs depended on linear trajectories through the latent space, along which some feature would vary — say, the age of the faces being generated, or the extent to which they were smiling or frowning. Researchers either looked for existing axes in a latent space, in which case the correlations with image features were rarely exact, or they intentionally structured the space so that it lent itself to linear trajectories, in which case they had to know in advance which image features they wanted to control.
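To make the linear approach concrete, here is a minimal sketch of a linear latent traversal, assuming a toy eight-dimensional latent space. The `age_direction` axis, the latent size, and the step values are hypothetical stand-ins. In practice the direction would come from analyzing or structuring a trained GAN's latent space, and each latent code on the path would be passed through the generator to produce an image.

```python
import math
import random


def linear_traversal(z, direction, alphas):
    """Return the latent codes z + alpha * d for each alpha.

    `direction` is normalized to unit length so that alpha is the
    distance moved along the axis.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [[zi + a * ui for zi, ui in zip(z, unit)] for a in alphas]


random.seed(0)
latent_dim = 8  # toy size; real GAN latents are much larger

# Sample a starting latent code from a standard normal distribution.
z0 = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]

# Hypothetical axis assumed to correlate with one attribute (say, age).
age_direction = [1.0] + [0.0] * (latent_dim - 1)

# Five latent codes on a straight line through z0.
path = linear_traversal(z0, age_direction, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Every point on `path` differs from `z0` only along the chosen axis, which is exactly why the approach works well only when that single axis tracks the attribute of interest.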

Our method can find nonlinear trajectories through a latent space that hold some properties steady (in this case, the identity of a synthetic, nonexistent face) and vary other properties (hair length or color).

Instead of correlating spatial axes with predetermined features, our method plots a nonlinear trajectory through a GAN’s latent space. Consequently, it can work with existing GANs, regardless of the structure of their latent spaces. That means we can, in principle, control multiple arbitrary attributes.
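The idea of a curved path that holds one property steady can be illustrated with a toy example. The sketch below is not the algorithm from our paper. It is a generic projected-gradient walk over a two-dimensional latent space, with two hypothetical attribute functions, `radius_sq` (held fixed) and `height` (varied), standing in for measurements that would normally be taken on generated images.

```python
import math


def grad(f, z, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at z."""
    g = []
    for i in range(len(z)):
        hi, lo = list(z), list(z)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def constrained_step(z, f_fixed, f_vary, step):
    """Move to increase f_vary while keeping f_fixed constant to first order.

    Assumes the gradient of f_fixed is nonzero and that the two
    gradients are not parallel at z.
    """
    g_fix = grad(f_fixed, z)
    g_var = grad(f_vary, z)
    # Remove the component of the "vary" direction that would change
    # the fixed attribute.
    scale = dot(g_var, g_fix) / dot(g_fix, g_fix)
    d = [v - scale * f for v, f in zip(g_var, g_fix)]
    norm = math.sqrt(dot(d, d))
    return [zi + step * di / norm for zi, di in zip(z, d)]


# Hypothetical toy attributes of a 2-D latent code.
def radius_sq(z):
    return z[0] ** 2 + z[1] ** 2  # attribute to hold steady


def height(z):
    return z[1]  # attribute to vary


z = [1.0, 0.0]
trajectory = [z]
for _ in range(100):
    z = constrained_step(z, radius_sq, height, step=0.01)
    trajectory.append(z)
```

Here the trajectory bends along a circular arc: `height` rises substantially while `radius_sq` drifts only slightly from its starting value. A straight-line move in the `height` direction would have changed both attributes at once.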

By the same token, we can control features that would be difficult for humans to annotate accurately, and that would therefore be difficult to capture by modifying the structure of the latent space. For instance, taking the Fourier transform of an image, we can fix the high-frequency characteristics and vary the low-frequency characteristics, or vice versa. The resulting images are clearly distinct, but their differences are difficult to describe:

A source image (far left), followed by three images in which low-frequency characteristics are held steady, while high-frequency characteristics are varied, and three images in which the reverse is true. It would be difficult for a human annotator to label the differences between the images.
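For readers unfamiliar with the frequency view of an image, the following sketch shows one way to split an image into low- and high-frequency parts with a Fourier transform. It is only an illustration: the `cutoff` radius, the toy 8×8 image, and the function names are assumptions, and it does not show how the attributes are defined in our paper. The transform is a naive, dependency-free implementation; in practice a library routine such as `numpy.fft.fft2` would replace it.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        out.append(total / n if inverse else total)
    return out


def dft2(image, inverse=False):
    """2-D DFT: transform every row, then every column."""
    rows = [dft(r, inverse) for r in image]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def real_image(spectrum):
    """Inverse-transform a spectrum and keep the real part."""
    return [[c.real for c in row] for row in dft2(spectrum, inverse=True)]


def split_frequencies(image, cutoff):
    """Split an image into low- and high-frequency parts that sum to it."""
    h, w = len(image), len(image[0])
    spectrum = dft2(image)
    low = [[0j] * w for _ in range(h)]
    high = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            # Distance from the zero frequency, accounting for wrap-around.
            fu, fv = min(u, h - u), min(v, w - v)
            target = low if math.hypot(fu, fv) <= cutoff else high
            target[u][v] = spectrum[u][v]
    return real_image(low), real_image(high)


# Toy image: a slow horizontal wave plus a fine checkerboard pattern.
size = 8
image = [
    [math.cos(2 * math.pi * x / size) + 0.5 * (-1) ** (x + y) for x in range(size)]
    for y in range(size)
]
low, high = split_frequencies(image, cutoff=2.0)
```

In this toy case `low` recovers the slow wave and `high` recovers the checkerboard. Either part could then serve as the quantity to hold steady while the other is allowed to change.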

Finally, most work on controllable GANs has focused on synthetic faces, which simplifies the problem somewhat, since the same facial characteristics tend to inhabit approximately the same regions of the image. Our method, because it plots local trajectories through an arbitrary latent space, can handle more diverse types of images.