In art history, perspective took painters a long time to master.
Across prehistory, antiquity, and the Middle Ages, artists composed scenes largely by intuition, without consistent conventions of depth and space. It wasn’t until the 1410s – during the Italian Renaissance, when architect Filippo Brunelleschi painted panels of the Florentine streets – that linear perspective entered the mainstream art world.
Perspective remains an integral part of artistic composition, with painters and photographers studying methods to obtain the most accurate, aesthetically pleasing results.
However, with DragGAN, even the most inexperienced of rookies can now articulate their vision through spatial manipulation.
Demo credit: DragGAN / Compiled by @_akhaliq on Twitter
Created by researchers from the Max Planck Institute for Informatics, Google, MIT CSAIL, and Saarbrücken Research Center, the program allows users to shift objects in photos and illustrations to warp perspectives.
This is done by plotting “handle” (red) points as well as “target” (blue) points on a 2D image, then simply dragging the handle points toward the targets to spatially morph the image.
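The drag interaction can be pictured as each handle point taking repeated small steps toward its target. Below is a minimal geometric sketch in plain NumPy; it illustrates only the point-motion idea, not DragGAN’s actual feature-space optimization, and the function name is illustrative.

```python
import numpy as np

def drag_step(handle, target, step=2.0):
    """Move a 2D handle point a fixed small distance toward its target.

    DragGAN performs an analogous small update each optimization
    iteration; this sketch only shows the geometric intuition.
    """
    direction = target - handle
    dist = np.linalg.norm(direction)
    if dist <= step:
        return target.astype(float).copy()  # close enough: snap to target
    return handle + step * direction / dist

# Repeat until the handle point reaches the target point.
handle = np.array([10.0, 10.0])
target = np.array([30.0, 25.0])
while not np.allclose(handle, target):
    handle = drag_step(handle, target)
```

In DragGAN itself, each such step re-optimizes the image so that the content under the handle point actually moves, rather than just relocating a marker.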
DragGAN supports a range of subject matter, including landscapes, humans, animals, and cars.
For portrait editing, it also allows users to change facial expressions, haircuts, poses, and lighting. Additionally, users can isolate areas in the picture using a masking tool, leaving the rest undisturbed.
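The masking behavior described above can be sketched as a simple per-pixel blend: edits apply only inside the user-drawn region, and everything outside keeps its original values. The helper below is a hypothetical illustration, not DragGAN’s actual API.

```python
import numpy as np

def apply_masked_edit(image, edited, mask):
    """Keep `edited` pixels where mask is True; keep `image` elsewhere.

    image, edited: (H, W, 3) arrays; mask: (H, W) boolean array.
    """
    mask3 = mask[..., None]  # broadcast the 2D mask over color channels
    return np.where(mask3, edited, image)
```

In practice the mask constrains which region the optimization is allowed to change, but the user-facing effect is the same: the unmasked area is left undisturbed.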
As hinted at in its name, the tech uses generative adversarial networks (GANs), which can generate realistic images, text, music, videos, and 3D objects.
GANs work by pitting two neural networks – a “generator” and a “discriminator” – against each other. The generator turns random noise into new samples meant to resemble the training data. The discriminator, meanwhile, is fed both real and generated data and is taught to differentiate between the two. As training alternates, each network improves against the other, until the generator produces output the discriminator can no longer reliably distinguish from real data.
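The adversarial loop described above can be sketched at toy scale. The snippet below trains a one-parameter-pair generator G(z) = w·z + b to mimic samples from a normal distribution centered at 3, against a logistic discriminator D(x) = sigmoid(a·x + c). This is a hand-derived, hypothetical illustration of the alternating updates – nothing like the large StyleGAN networks DragGAN actually builds on.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

w, b = 1.0, 0.0          # generator parameters: G(z) = w*z + b
a, c = 0.1, 0.0          # discriminator parameters: D(x) = sigmoid(a*x + c)
lr, batch = 0.05, 64

for _ in range(2000):
    z = rng.standard_normal(batch)          # noise fed to the generator
    real = 3.0 + rng.standard_normal(batch) # "real" data: N(3, 1)
    fake = w * z + b                        # generated samples

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(a * real + c), sigmoid(a * fake + c)
    grad_a = np.mean(-(1 - d_real) * real + d_fake * fake)
    grad_c = np.mean(-(1 - d_real) + d_fake)
    a, c = a - lr * grad_a, c - lr * grad_c

    # Generator step (non-saturating loss): push D(fake) toward 1.
    d_fake = sigmoid(a * (w * z + b) + c)
    grad_w = np.mean(-(1 - d_fake) * a * z)
    grad_b = np.mean(-(1 - d_fake) * a)
    w, b = w - lr * grad_w, b - lr * grad_b

# After training, generated samples should center near the real mean of 3.
gen_mean = float(np.mean(w * rng.standard_normal(10_000) + b))
```

Each iteration alternates one gradient step for the discriminator and one for the generator – the same tug-of-war, scaled up enormously, that lets full GANs produce photorealistic images.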
Photo credit: DragGAN
Initially introduced by Goodfellow et al. in 2014, the architecture has since powered popular image generators such as the StyleGAN family, on which DragGAN itself builds. (Text-to-image tools like DALL-E 2 and Midjourney rely on the related diffusion approach instead.) With this project, Google has yet another notch on its belt in its AI race against Microsoft and even Adobe, which recently launched its Firefly offering.
As with any generative AI tool, GAN-based programs such as DragGAN can be a powerful vehicle for disinformation, enabling realistic deepfakes in audio, photography, and video.
Regardless, this has opened up a whole host of possibilities for photographers, editors, marketers, and novice artists. DragGAN can even serve as a companion app for AI-generated images, letting users manipulate them directly rather than refining prompts with extra keywords.
Photoshoots will also become more efficient, as quick fixes can be made in post-production. Unlike Photoshop, which requires a fair bit of knowledge and experience to master, DragGAN is intuitive, empowering individuals to produce professional-looking output through a simpler, more accessible interface.
Currently, only a research paper has been released, and DragGAN is not yet publicly available in its final form.
This was published as a part of AI Odyssey, a section on generative AI developments featured in Tech in Asia’s emerging tech newsletter.
Delivered every Tuesday via email and through the Tech in Asia website, this free newsletter breaks down the biggest stories and trends in emerging tech. If you’re not a subscriber, get access by registering here.