I can’t believe I somehow missed it when OpenAI introduced DALL-E in January 2021, a neural network that could “generate images from text descriptions,” so I’m certainly not going to miss it now that OpenAI has unveiled DALL-E 2. As they describe it, “DALL-E 2 is a new AI system that can create realistic images and art from a description in natural language.” The name, by the way, is a playful combination of the animated robot WALL-E and the idiosyncratic artist Salvador Dalí.
Credit: DALL-E 2/OpenAI
This is not your father’s AI. If you think it’s just about art, think again. If you think it doesn’t matter for healthcare, well, you’ve been warned.
Here are further descriptions of what OpenAI is claiming:
- “DALL·E 2 can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles.
- DALL·E 2 can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account.
- DALL·E 2 can take an image and create different variations of it inspired by the original.”
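To make those capabilities a bit more concrete, here is a minimal sketch of what “create images from a text description” looks like when you call OpenAI’s image-generation endpoint; note that the public API came after the research preview described here, and the prompt (one of OpenAI’s own demo prompts), image size, and output handling below are illustrative assumptions, not anything from the announcement:

```python
import os
import requests

# Rough sketch of a text-to-image request to OpenAI's image-generation endpoint.
# The prompt, size, and response handling are illustrative choices.
response = requests.post(
    "https://api.openai.com/v1/images/generations",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "prompt": "an astronaut riding a horse in a photorealistic style",
        "n": 1,
        "size": "1024x1024",
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["data"][0]["url"])  # URL of the generated image
```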
Here’s their video:
I’ll leave it to others to explain exactly how it does all that, aside from saying it uses a process called diffusion, “which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.” The end result is that, relative to DALL-E, DALL-E 2 “generates more realistic and accurate images with 4x greater resolution.”
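For the curious, that quoted description maps onto a very simple loop: start from random noise and repeatedly apply a learned denoising step. The toy sketch below illustrates the idea only; in the real model the denoiser is a large neural network conditioned on the text prompt, and denoise_step here is a made-up stand-in:

```python
import numpy as np

def denoise_step(noisy_image, step, total_steps):
    """Stand-in for a trained denoiser. A real diffusion model uses a neural
    network (conditioned on the text prompt) to predict and remove noise;
    here we just nudge pixel values toward a fixed target for illustration."""
    target = np.full_like(noisy_image, 0.5)  # the "image" being recovered
    return noisy_image + (target - noisy_image) / (total_steps - step)

# "Starts with a pattern of random dots..."
image = np.random.rand(64, 64, 3)

# "...and gradually alters that pattern towards an image."
total_steps = 50
for step in range(total_steps):
    image = denoise_step(image, step, total_steps)
```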
Devin Coldewey, writing in TechCrunch, marvels:
It’s hard to overstate the quality of these images compared with other generators I’ve seen. Although there are almost always the kinds of “tells” you expect from AI-generated imagery, they’re less obvious and the rest of the image is way better than the best generated by others.
OK, it’s true that DALL-E isn’t coming up with the ideas for art on its own, but it is creating never-seen-before images, like a koala dunking a basketball or the Mona Lisa with a mohawk. If that’s not AI being creative, it’s close.
-----------
Sam Altman, OpenAI’s CEO, had a blog post with several interesting thoughts about DALL-E 2. He starts out by saying: “For me, it’s the most delightful thing to play with we’ve created so far. I find it to be creativity-enhancing, helpful for many different situations, and fun in a way I haven’t felt from technology in a while.” I’m a big believer in Steven Johnson’s maxim that the future is where people are having the most fun, so that really hit home for me.
Mr. Altman outlines six things he believes are noteworthy about DALL-E 2:
- “This is another example of what I think is going to be a new computer interface trend: you say what you want in natural language or with contextual clues, and the computer does it.
- It sure does seem to “understand” concepts at many levels and how they relate to each other in sophisticated ways.
- Although I firmly believe AI will create lots of new jobs, and make many existing jobs much better by doing the boring bits well, I think it’s important to be honest that it’s increasingly going to make some jobs not very relevant (like technology frequently does)
- A decade ago, the conventional wisdom was that AI would first impact physical labor, and then cognitive labor, and then maybe someday it could do creative work. It now looks like it’s going to go in the opposite order.
- It’s an example of a world in which good ideas are the limit for what we can do, not specific skills.
- Although the upsides are great, the model is powerful enough that it's easy to imagine the downsides.”
On that last point, OpenAI has sharply restricted which images DALL-E was trained on and who has access to it; it also watermarks each image it generates, reviews all generated images, and restricts the use of real individuals’ faces. They recognize the potential for abuse. Oren Etzioni, chief executive of the Allen Institute for AI, warned The New York Times: “There is already disinformation online, but the worry is that this scales disinformation to new levels.”
Mr. Altman indicated that there might be a product launch this summer, with broader access, but Mira Murati, OpenAI’s head of research, was firm: “This is not a product. The idea is to understand capabilities and limitations and give us the opportunity to build in mitigation.”
OpenAI algorithms researcher Prafulla Dhariwal told Fast Company: “Vision and language are both key parts of human intelligence; building models like DALL-E 2 connects these two domains. It’s a very important step for us as we try to teach machines to perceive the world the way humans do, and then eventually develop general intelligence.”
Credit: OpenAI
As their video says, “DALL-E helps humans understand how advanced AI systems see and understand our world.”
------------
I don’t have any artistic skill whatsoever, but, as Mr. Altman suggested, we’re building towards “a world in which good ideas are the limit for what we can do, not specific skills.” In that world, as Mr. Altman also suggested, AI may do creative and cognitive work before physical labor. We’ve already met Ai-Da, an AI-driven “robot artist,” and we’re going to see other examples of creative AI.
OpenAI already has OpenAI Codex, an “AI system that can convert natural language to code.” There are AI tools that can write, including one powered by OpenAI, and ones that can compose music. And, of course, Google has a host of AI initiatives specifically oriented towards health.
Healthcare in general, and the practice of medicine in particular, has long been seen as a uniquely human endeavor. Its practitioners claim it is a blend of art and science, not easily reducible to computer code. Even if healthcare is finally acknowledging that AI is good at, say, recognizing radiology images, it maintains that this is still a long way from diagnosing patients and their complex situations, much less advising or comforting them.
Perhaps we should ask DALL-E 2 to draw them a picture of what that might look like.