A new AI system can create a variety of images, both realistic and surreal, from natural language descriptions. It has the potential to be game-changing, but not without some concerns.
Algorithms and AI continue to make waves across the art industry.
The latest breakthrough comes from the research lab OpenAI, which has just unveiled a new version of its DALL-E program. The software is a text-to-image generation tool that produces artwork from a written description supplied by the user.
Aptly called DALL-E 2, this new AI won’t be available to the public, but researchers can sign up to preview its capabilities. OpenAI says it plans to make its software available for use in third-party apps eventually – though no word on when this will happen.
For now, the program will be tested by vetted partners.
Users aren’t allowed to upload generated images that may ‘cause harm’, and must disclose what they’re using the AI for.
How does DALL-E 2 work?
While I am by no means a machine-learning expert, the basic idea is that DALL-E 2 is trained on a huge library of existing images. The algorithm is shown a wealth of captioned pictures and then creates new artwork based on the patterns it has learned.
Say you wanted to create an image of a tiger on a canoe. Weird, right? But DALL-E 2 draws on what it has learned about what a ‘canoe’ and a ‘tiger’ each look like, and creates a single piece that convincingly combines the two.
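DALL-E 2 itself can’t be called from code yet, but a comparable open-source diffusion model gives a feel for the workflow. The sketch below uses Hugging Face’s diffusers library with a Stable Diffusion checkpoint, which is an assumption chosen purely for illustration, not the system OpenAI has built.

```python
# Sketch: text-to-image with an open diffusion model (not DALL-E 2 itself).
# Assumes the `diffusers` and `torch` packages are installed and a GPU is available.
import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available text-to-image diffusion pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# The prompt plays the same role as a DALL-E 2 description.
prompt = "a tiger paddling a canoe down a river, photorealistic"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("tiger_canoe.png")
```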
DALL-E 2 builds on CLIP, a computer vision system OpenAI developed alongside the first DALL-E. OpenAI says the new software generates images using ‘diffusion’, a process in which a picture starts as a pattern of random dots and is gradually refined until it matches the description.
This happens via a two-stage model: a ‘prior’ first converts your text into a CLIP image embedding, essentially a numerical summary of what the picture should contain, and a diffusion ‘decoder’ then generates the picture itself from that embedding.
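To make the two stages concrete, here is a minimal sketch. The text embedding uses the openly released CLIP model via Hugging Face’s transformers library; the prior and decoder are hypothetical toy stand-ins that only illustrate the shape of the pipeline, not OpenAI’s actual, unreleased components.

```python
# Sketch of the two-stage idea: CLIP text embedding -> prior -> diffusion decoder.
# The CLIP call below is real (Hugging Face `transformers`); `toy_prior` and
# `toy_decoder` are hypothetical placeholders for OpenAI's unreleased models.
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stage 0: embed the text prompt with CLIP.
inputs = processor(text=["a tiger on a canoe"], return_tensors="pt", padding=True)
text_embedding = clip.get_text_features(**inputs)  # shape: (1, 512)

def toy_prior(text_emb: torch.Tensor) -> torch.Tensor:
    """Hypothetical 'prior': map a text embedding to an image embedding."""
    return text_emb + 0.01 * torch.randn_like(text_emb)

def toy_decoder(image_emb: torch.Tensor, steps: int = 50) -> torch.Tensor:
    """Hypothetical diffusion 'decoder': start from random noise and refine it
    step by step, conditioned on the image embedding."""
    image = torch.randn(1, 3, 64, 64)  # pure noise to begin with
    for _ in range(steps):
        # A real decoder would predict and remove noise with a trained network;
        # here we only nudge the pixels to illustrate the iterative refinement.
        image = image - 0.01 * image
    return image

image_embedding = toy_prior(text_embedding)
generated = toy_decoder(image_embedding)
print(generated.shape)  # torch.Size([1, 3, 64, 64])
```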
OpenAI’s demo video gives a brief demonstration of what’s possible, showing off AI-generated cats, realistically edited versions of pre-existing images, and a system of object labelling that allows DALL-E 2 to understand your prompts. It’s truly impressive stuff.
Interestingly, OpenAI stresses that there are still errors and issues to iron out.
Mislabelled objects in the training data can cause the algorithm to produce pictures that don’t line up with the text description provided. If, for example, its training images include a photo of a car labelled as a ‘plane’, the generator could be led completely off course, sending back a BMW instead of a Boeing.
In addition, very specific prompts won’t work well until the AI has seen and learned what the relevant objects look like.
Asking for a town or rare species of animal may result in wonky, incorrect images until the algorithm has been improved. Keep in mind this is only the second iteration of DALL-E, so we’ll no doubt see even more mind-bending demos in the future.