Remember when Hayao Miyazaki called an AI-created animation “an insult to life itself”?
Researchers from Nvidia have created an image translation AI that will almost certainly have you second-guessing everything you see online. The system can change day into night, winter into summer, and house cats into cheetahs with minimal training materials.
What they’re doing with generative adversarial networks these days is insane. Watching them in action is the first time I’ve felt that jobs are at risk. I could imagine the possibility and talk about how future AI and robotics would lead to such an age, but until I discovered GANs, I never really had an idea of how it would happen. And the craziest thing is that, despite all the reassurances we’ve been giving ourselves about how robots will only do the jobs we don’t want to do and that the future will be filled with artists, it’s the _creative_ jobs that might be going away first.
If I could have the chance to coin a term, “media synthesis” sounds good. I mean it sounds dystopian, but there are amazing possibilities as well.
A slew of news stories about media synthesis has come out in the past few days.
And there are also these older ones (some dating back to 2014!)
A year-old video talking about image synthesis:
And a more recent one, this one from DeepMind:
GANs can generate gorgeous 1024×1024 images now
These are not images that are plucked from Google via a text-to-image search. The computer is essentially “imagining” these things and people based on images it’s seen before. Of course, it took thousands of hours with ridiculously strong GPUs to do it, but it’s been done.
Oh, and here’s image translation.
Once you realize that AI can democratize the creation of art and entertainment, the possibilities really do become endless— for better and for worse. I choose to focus on the better.
You see, I can’t draw for shit. My level now isn’t much better than a 9-year-old art student’s, and I’ve not bothered to practice because I just can’t seem to get depth right in drawings, and my hand doesn’t seem to want to make any line look natural. Yet I’ve always imagined making a comic. I’m much more adept at writing and narrative, so if only I didn’t have to worry about drawing (you know, the part that defines comics as comics), I’d be in the clear.
GANs could help me do that. With an algorithm of that sort, I could generate stylized people who look hand-drawn, set them in different poses, and generate a panel in a variety of art styles. That’s not the same as one of those filters that takes a picture of a person and makes it look like a cartoon by adding vexel or cel-shading. It means actually generating an image of a person from scratch, with cartoon/anime proportions instead of realistic ones.
Right now, I don’t know how possible that is. But the crazy thing is that I don’t think we’ll be waiting long for such a thing.
And it’s certainly not the only aspect of media synthesis that’s on the horizon.
Lyrebird claims it can recreate any voice using just one minute of sample audio
Want realistic-sounding speech without hiring voice actors? There’s an algorithm for that too.
Japanese AI Writes a Novel, Nearly Wins Literary Award
Want an epic, thought-provoking novel or poem but you have virtually no writing skills? There’s an algorithm for that too. And if you’re like me and you prefer to write your own novels/stories, then there’s going to be an algorithm that edits them better than any professional and turns that steaming finger turd into a polished platinum trophy.
And this is from 2016:
Want to create an awesome painting but the best you can do is a shitty doodle in MS Paint? There’s an algorithm for that.
Like a particular genre of music but you can’t find a band making the exact sort of music you’d love? Can’t make music yourself? There’s an algorithm for that.
Need to create a world map for a story or video game? There’s an algorithm for that.
Related to what I mentioned before. Just doodle whatever, and the algorithm will take care of the rest.
OpenAI’s co-founder Greg Brockman thinks that in 2018 we will see “perfect” video synthesis from scratch and speech synthesis. I don’t think it’ll be perfect, but definitely near perfect.
All this acts as a base for the overarching trend: a major disruption in the entertainment industry. Algorithms can already generate fake celebrities, so how long before cover models are out of a job? We can create very realistic human voices and it’s only going to get better; once we fine-tune intonation and timbre, voice actors could be out of a job too. The biggest bottleneck to photorealism and physics-based realism in video games is time and money, because all those gold-plated pixels in the latest AAA game cost thousands of dollars each. At some point, you hit diminishing returns on the time and money you invest, so why not use algorithms to fill in that gap? If you have no skills at creating a video game, why not use an algorithm to design assets and backgrounds for you? If we get to a point where it’s advanced enough, it could even code the damn game for you.
I hold no delusions about the time frame: very little of this is going to be on your computer within five years. You can use DeepDream and DeepArt and various DL voice synthesis programs, but it’s all still very early in development. There will still be voice actors and animators in 2025. They’ll still be fields you can get into and get paid for as a career. Comic and manga creators also won’t be replaced anytime soon. If anything, it might take a bit longer for them precisely because of the nature of cartooning.
Neural networks today are fantastic at repainting a pre-existing image or using images they’ve seen before to create something new. But so far, they lack the ability to actually stylize the image. There’s no way to exaggerate features like you’d see in a cartoon. We know networks understand anime eyes, but they don’t seem to be able to create an actual anime character based on images they’ve seen. If you fed a computer 1,000 anime stills and then input your own portrait, it wouldn’t give you huge eyes or unrealistically sharpened/cutened features; it’d just recolor your portrait to make it toon-shaded. Likewise, I can’t make my friend look like a character from The Simpsons with any algorithm that currently exists. He’d just have crayon-yellow skin and a flesh-colored snout, but his skeletal and muscular structure wouldn’t actually be altered to fit The Simpsons’ distinctive style.
No network today can do that. It might be possible within a couple of years to at least get a GAN to approximate it, but it won’t be until the mid-2020s at the earliest that we’ll see “filters” that could change my portrait into an actual cartoon. As of right now, making an algorithm “cartoonify” a person simply means adding vector graphics or cel-shading.
Now, that wouldn’t be a problem if you used text-to-image synthesis. You could cut out the middleman and go straight to generating new characters from scratch. And in 2018, I bet that we might see the first inklings of this in a very basic way. In a lab, we’ll get a comic created entirely by algorithm.
Input text describing a character— if I had to come up with something, I’d make it simple and just go with “round head with stick figure body”.
Do the same thing for others. Describe the ways their limbs bend. If they have mouths, describe whether or not they’re open. If there are speech bubbles, what do they look like and how big are they? Etc. etc.
Perhaps you could be more daring and feed a network thousands of images from a pre-chosen art style, but I’m being conservative.
Right now, a neural network that can actually make narrative sense is a damn-near impossible thing to create. So if you want to achieve causality and progression in such a story, you’ll still need a human to make sense of it. Thus, this comic will likely be organized by a human even if the images are entirely AI-generated.
The applications that require static images, enhancing motion, or generating limited dynamic media will certainly take off. In ten years, I bet the entire manga industry in Japan will be crashing and burning (the industry over there is so overworked that it wouldn’t take much to cause a crash), while American hold-outs cling bitterly to canon-centric relevance and all the plebs generate every single disturbing plotline they can imagine with beloved characters.
The early 2020s will be a time of human creativity enhanced by algorithms. A time when you can generate assets, design objects without needing to hire anyone, and refine written content while still maintaining fully human control over the creative process. I can already see the first major YouTube animation with 1 million+ views that’s mostly animated by a team, with AI filling in a lot of the blanks to give it a more professional feel and generating the voices for the characters. Dunno when it’ll happen, but it will happen very soon. Much sooner than a lot of people are comfortable with. But don’t expect to type “Make me a comic” and get a full-fledged story out of it. The AI will generate the content for you, but it’s up to you to make sense of it and choose what you think works. So you have to generate each panel yourself, choose the best ones, choose good colors, and then organize those panels. The AI won’t do it for you because early ’20s AI will likely still lack common sense and narrative-based logic.
TLDR: researchers are working on/refining algorithms that can generate media, including visual art, music, speech, and written content. In the early 2020s, content creators will be using these to bring professional-quality works to life with teams as small as possible. It may be possible for a single person with no skills in art, voice acting, or music to put together a long-running comic with voice acting and original music using only a computer program and their own skills at writing a story. This will likely be the first really major, really public example of automation takin’ teh jerbs.