Is DALL·E the latest breakthrough in artificial intelligence?
It seems there’s no end to the fascinating innovations coming out of the world of AI. DALL·E, the most recent tool developed by OpenAI, was announced just months after the company unveiled its groundbreaking GPT-3 technology.
DALL·E is another exciting breakthrough that demonstrates the ability to turn words into images. As a natural extension of GPT-3, DALL·E takes pieces of text and generates images rather than words in response.
In this episode of Short and Sweet AI, I discuss DALL·E in more detail, how it differs from GPT-3, and how it was developed.
In this episode, find out:
What DALL·E is
How DALL·E can generate images from words
What unintended yet useful behaviors DALL·E can produce
Hello to you who are curious about AI. I’m Dr. Peper and today I’m talking about DALL·E.
In a previous episode, I highlighted a new type of AI tool called GPT-3. GPT-3 is a machine learning language model, trained on hundreds of billions of words, that generates poetry, stories, even computer code. Within months of announcing GPT-3, OpenAI released DALL·E. DALL·E is not just another breathtaking breakthrough in AI technology. It demonstrates a machine’s ability to manipulate visual concepts through language.
The name DALL·E is a blend of the surrealist artist Salvador Dalí and Pixar’s animated robot WALL·E. What it does is simple but also revolutionary. It’s a natural extension of GPT-3: a 12-billion-parameter version of the GPT-3 model, trained on a large dataset of paired text and images.
DALL·E takes text prompts and responds not with words but with images. If you give the system the text prompt “an armchair in the shape of an avocado,” it generates an image to match it. It’s a very powerful text-to-image technology. It gives you the ability to create an image of what you want to see using language, because DALL·E doesn’t recognize images, it draws them. And by the way, I would buy one of those avocado chairs if they existed.
You can visit OpenAI’s website and play with images generated by this astounding technology: a radish in a tutu walking a dog, a robot giraffe, a spaghetti knight. Some of the images depict real-world objects; others, like a cube of clouds, are things that don’t exist.
How Does It Work?
Text-to-image algorithms aren’t new, but earlier systems were limited to simple subjects such as birds and flowers. DALL·E is significantly different from those that came before because it uses the GPT-3 neural network architecture to train on text paired with images.
DALL·E uses the language understanding provided by GPT-3, together with its own underlying structure, to create an image prompted by text. Each time, it generates a large set of candidate images. Then another machine learning model called CLIP ranks the candidates and determines which pictures best match the text. As a result, the illustrations are much more coherent and reflect a blend of more complex concepts. This is what makes DALL·E the most realistic text-to-image system ever produced.
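The ranking step can be sketched in a few lines. This is a minimal illustration, not CLIP itself: real CLIP maps text and images into a shared embedding space with learned encoders, then scores each candidate by how close its embedding is to the text embedding. Here the "embeddings" are toy hand-written vectors, and the image names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means a perfect match."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(text_embedding, image_embeddings):
    """Return candidate image IDs ordered best-match-first, CLIP-style."""
    scored = [(cosine_similarity(text_embedding, emb), img_id)
              for img_id, emb in image_embeddings.items()]
    return [img_id for _, img_id in sorted(scored, reverse=True)]

# Toy 3-dimensional embeddings (real CLIP embeddings are produced by
# trained text and image encoders and are hundreds of dimensions wide).
text_emb = [0.9, 0.1, 0.0]
candidates = {
    "image_a": [0.8, 0.2, 0.1],   # points in nearly the same direction
    "image_b": [0.0, 1.0, 0.0],   # unrelated
    "image_c": [0.1, 0.0, 0.9],   # unrelated
}
print(rank_images(text_emb, candidates))  # → ['image_a', 'image_b', 'image_c']
```

The key design idea is that generation and ranking are separate models: DALL·E proposes many candidates, and a scoring model keeps only the ones that best fit the caption.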
Unintended But Useful Behaviors
DALL·E also demonstrates another example of “zero-shot visual reasoning.” Zero-shot learning, or ZSL, is the ability of models to perform tasks they weren’t specifically trained to do. These are unintended but useful behaviors.
In the case of GPT-3, it can write computer code even though it wasn’t trained to do coding. DALL·E “learned” to generate images from captions and, given the right text prompt, can transform images into sketches. Another task it wasn’t specifically trained to do was designing custom text on street signs. Essentially, DALL·E can behave like a Photoshop filter.
It also shows an understanding of visual concepts. It can, in a sense, answer questions visually. When shown an incomplete grid of images that followed a hidden pattern, DALL·E was able to fill in the missing cells with matching images, without being told what the pattern was.
Creativity Is a Measure of Intelligence
Many experts believe that grounding language in visual understanding, as DALL·E does, makes AI smarter. This machine learning system can take two unrelated concepts, such as an armchair and an avocado, and put them together in a coherent, new way. This is stunning because the ability to coherently blend concepts and use them in new ways is key to creativity. In essence, the machine stores information about our world and generalizes from it in a very human-like way. And in the AI world, creativity is one measure of intelligence. So, is this how machine intelligence becomes human-like intelligence?
Thanks for listening, I hope you found this helpful. Be curious and if you like this episode, please leave a review and subscribe because then you’ll receive these episodes weekly. From Short and Sweet AI, I’m Dr. Peper.