In the world of AI, where creativity often meets technology, Runway, a major player in video-generative AI, has teamed up with Getty Images, a giant in the realm of stock images and videos. They're crafting a new AI model, dubbed the Runway <> Getty Images Model (RGM), which might not win any awards for its name but holds promise in its function.
The RGM aims to serve as a foundation for companies to create customized models for generating video content. Think of it as a starting point, a canvas ready to be painted on with the unique colors of a company's needs. Runway's enterprise clients can fine-tune the RGM on their own data, shaping it to their liking.
And yes, there's a nod to OpenAI's GPT models here: a general foundation model that others customize for their own purposes. Runway isn't just playing in the AI sandbox; they're aiming to be the sandbox, especially in the video-AI playground. They envision a future where their AI could even be used in Hollywood or broadcasting, crafting videos that are indistinguishable from real life. This ambition suggests that RGM might leapfrog the limitations of Runway's current Gen-2 model, stepping into an era of truly lifelike video.
Don’t Hold Your Breath for Realistic Videos from AIs
In a recent experiment, I tested Runway Gen-2, a leading video-generative AI model, to see whether it could produce photorealistic deepfake content. The outcome? It fell short of expectations. The generated videos were noticeably lacking in realism, a clear sign that the hype was, well, just hype. Take a look for yourself if you're skeptical.
The crux of the problem with video-generative AI, including RGM, is its lack of understanding of basic physical and biological principles. To create lifelike movement in videos, an AI must grasp concepts like gravity, inertia, and the limitations of biological motion – think of how joints move or how different objects interact in a physical space. This complexity isn't just a small hurdle; it's a significant barrier to realism.
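To make that barrier concrete, here's a toy sketch in Python – with illustrative numbers of my own, not anything from Runway – of one constraint a model would have to internalize: an object in free fall traces a fixed trajectory, so consecutive frames can't just look plausible in isolation; they have to agree with the physics.

```python
# A toy sketch (not any model's actual internals) of the physical consistency
# a video model would need to respect: an object in free fall must follow
# y(t) = y0 - 0.5 * g * t^2, so consecutive frames are not arbitrary.
G = 9.81    # m/s^2, gravitational acceleration
FPS = 24    # frame rate of a hypothetical generated clip
Y0 = 2.0    # starting height in metres (illustrative value)

for frame in range(8):
    t = frame / FPS
    y = max(Y0 - 0.5 * G * t * t, 0.0)  # clamp at the ground
    print(f"frame {frame}: object height = {y:.3f} m")
```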
Here's an example of a video generated by Runway Gen-2.
I've previously pointed out that the future of video-generative AI likely lies in processing 3D data. This data includes structured information about objects, their dimensions, and how they relate to each other spatially. Unfortunately, Getty Images doesn't have a 3D data repository. Yet, Runway seems to be aware of the importance of 3D data. They have a 3D Capture feature in beta, allowing users to upload 360-degree videos of objects or scenes, which Runway then renders in 3D. This feature doesn't quite solve the AI's understanding of physics or biomechanics, but it's a step toward more realistic video generation.
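To give a sense of what "structured 3D data" looks like in practice, here's a minimal sketch; the class and field names are my own illustration, not Getty's or Runway's schema. The point is that dimensions and spatial relationships become explicit numbers rather than something inferred from pixels.

```python
# A minimal sketch of "structured 3D data": objects carry explicit dimensions
# and positions, so spatial relationships can be computed rather than guessed
# from pixels. Class and field names are illustrative, not any vendor's schema.
from dataclasses import dataclass
import math

@dataclass
class SceneObject:
    name: str
    position: tuple[float, float, float]    # world coordinates, in metres
    dimensions: tuple[float, float, float]  # width, height, depth, in metres

def centre_distance(a: SceneObject, b: SceneObject) -> float:
    """Straight-line distance between two objects' centres."""
    return math.dist(a.position, b.position)

camera = SceneObject("yashica_camera", (0.0, 0.80, 0.0), (0.12, 0.09, 0.07))
table = SceneObject("table", (0.0, 0.75, 0.0), (1.20, 0.05, 0.60))
print(f"The camera sits {centre_distance(camera, table):.2f} m above the table's centre.")
```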
For instance, here's a video created using Runway Gen-2 3D of an old Yashica camera. You can view the camera from any angle, and the quality of this video is noticeably better than the previous ones.
While Runway's RGM might expedite the reshooting process in filmmaking, it's not quite ready to generate video content from scratch. The path to achieving that level of sophistication remains a challenging one.
More Multimodal Inputs and the Right Partners Can Cement Runway as the Definitive Leader
Runway's Gen-2 3D video generation, while not as sleekly integrated as something like ChatGPT paired with DALL-E, still represents a blend of different types of input leading to a single type of output – a bit like a multi-ingredient stew resulting in one flavor, albeit not as complex as GPT-4. GPT-4, through ChatGPT, really embraces the multimodal approach, handling a variety of input and output formats – think Excel, PowerPoint, images, Word, PDFs, and the like. In contrast, most current video-generative AIs are more limited, mainly accepting inputs in three forms – video, images, and text – and outputting in just two: video and images. The diagram below should clarify this a bit more.
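Alongside that diagram, here's the same picture in rough code terms; the modality names and the helper are my own illustration, not part of any vendor's API.

```python
# A rough sketch of the gap described above; the modality names and the helper
# are illustrative, not part of any Runway or OpenAI API.
CURRENT_INPUTS = {"text", "image", "video"}
CURRENT_OUTPUTS = {"image", "video"}

def is_supported(inputs: set[str], output: str) -> bool:
    """True if a typical current video-generative AI could handle the request."""
    return inputs <= CURRENT_INPUTS and output in CURRENT_OUTPUTS

print(is_supported({"text", "image"}, "video"))           # True: a standard Gen-2 request
print(is_supported({"text", "motion_capture"}, "video"))  # False: mocap input isn't accepted yet
```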
But the potential future for video-generative AIs? It’s a bit more ambitious. Imagine a system where adding 3D data and motion capture files (like BVH files) becomes the norm. This would elevate the realism of generated videos significantly. The ideal endgame for these AI models is a deep understanding of physics, motion, biomechanics, and dimensionality, allowing them to create motion capture or 3D animation files from scratch. That's when they'll become truly formidable tools for filmmakers, producers, and videographers.
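For context, BVH is a plain-text motion-capture format: a joint hierarchy followed by per-frame channel values. Here's a minimal sketch of reading its MOTION block, assuming a hypothetical file name and a well-formed layout.

```python
# A minimal sketch of reading the MOTION block of a BVH motion-capture file.
# BVH stores a joint hierarchy followed by per-frame channel values; the file
# name and the helper below are illustrative, and a well-formed file is assumed.
def load_bvh_motion(path: str):
    """Return (frame_time_seconds, frames), where frames is a list of float lists."""
    with open(path) as f:
        lines = [line.strip() for line in f]

    start = lines.index("MOTION")
    frame_count = int(lines[start + 1].split(":")[1])
    frame_time = float(lines[start + 2].split(":")[1])
    frames = [[float(v) for v in line.split()]
              for line in lines[start + 3 : start + 3 + frame_count]]
    return frame_time, frames

frame_time, frames = load_bvh_motion("walk_cycle.bvh")  # hypothetical file
print(f"{len(frames)} frames at {1 / frame_time:.1f} fps, "
      f"{len(frames[0])} channels per frame")
```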
Getting there, though, is a colossal task. It requires a vast trove of visual data. Runway's partnership with Getty Images is a step in the right direction for current needs, but looking ahead, Getty's repository might fall short of the more advanced requirements. To reach this envisioned future, video-generative AI companies should consider branching out to collaborate with entities specializing in 3D and motion-capture data.
Firms like Bandai Namco, with its extensive collection of motion-capture data, or TurboSquid, known for high-quality 3D models, could be invaluable partners. Currently, these 3D data sources are more niche, catering primarily to professionals, not the general public. This makes them smaller but no less crucial. The first video-generative AI company to successfully build a network of 3D data partnerships could well become the leader in this evolving field. Will Runway take this path and maintain its position atop the video-generative AI hill? Time will tell.
While I'm excited about the prospect of realistic AI video generation, I also suspect its development still faces significant challenges. Bringing 3D data and motion capture into the training mix in order to produce realistic objects and motion will likely slow an already bottlenecked training phase even further.