Stability AI introduces Stable Video 4D, a 4D video generator

Jul 24, 2024

As generative neural networks have grown popular, many AI tools for video creation have become available, such as Sora, Haiper and Luma AI. The developers at Stability AI have now introduced something new: Stable Video 4D, a neural network built on the company's existing Stable Video Diffusion model, which converts images into video. The new tool takes this concept further: it accepts a video as input and generates multiple new videos of the same scene from 8 different perspectives.


“We believe that Stable Video 4D will be used in filmmaking, gaming, AR/VR and other areas where there is a need to view dynamically moving 3D objects from arbitrary angles,” says Varun Jampani, head of 3D research at Stability AI.

This isn’t the first time Stability AI has gone beyond 2D video generation. In March, the company announced the Stable Video 3D algorithm, which allows users to create short 3D videos based on an image or text description. With the launch of Stable Video 4D, the company takes a significant step forward. While 3D, or three dimensions, is usually understood as a type of image or video with depth, 4D does not add another spatial dimension. Instead, the four dimensions are width (x), height (y), depth (z) and time (t). This means that Stable Video 4D allows you to view moving 3D objects from different viewpoints and at different points in time.
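The idea of viewing an object from different viewpoints and at different points in time can be pictured as a grid of 2D images indexed by camera view and by frame. The Python sketch below is purely illustrative and is not Stability AI's code or API: the names `FrameKey`, `build_view_time_grid`, `fixed_view_video` and `fixed_time_orbit` are hypothetical, and the frame count is an arbitrary assumption. Only the number of views (eight) comes from the article.

```python
from dataclasses import dataclass

NUM_VIEWS = 8  # from the article: eight output videos, one per perspective


@dataclass(frozen=True)
class FrameKey:
    """Identifies one rendered 2D image (width x height) in the 4D result."""
    view: int  # which camera angle, 0 .. NUM_VIEWS - 1
    time: int  # which frame of the clip


def build_view_time_grid(num_frames, num_views=NUM_VIEWS):
    """Return grid[view][time]; each cell stands in for one 2D image."""
    return [[FrameKey(v, t) for t in range(num_frames)] for v in range(num_views)]


def fixed_view_video(grid, view):
    """One camera angle across all frames: an ordinary video."""
    return grid[view]


def fixed_time_orbit(grid, time):
    """All camera angles at one instant: a frozen look around the object."""
    return [row[time] for row in grid]


# Hypothetical clip length; the article only says "several seconds".
grid = build_view_time_grid(num_frames=5)
print(len(grid), "views x", len(grid[0]), "frames")
```

Reading a row of the grid gives a normal video from one angle, while reading a column gives a multi-view snapshot of a single moment. Together the two readings cover the x, y, z and t dimensions described above.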

“The key aspects that enabled us to create Stable Video 4D were that we combined the strengths of our previously released Stable Video Diffusion and Stable Video 3D models, and enhanced them with a carefully curated dataset of dynamically moving 3D objects,” explained Jampani. He added that Stable Video 4D is the first algorithm of its kind in which a single neural network performs both image synthesis and video generation. Existing alternatives use separate neural networks for these tasks.

“Stable Video 4D completely synthesizes eight new videos from scratch, using the input video as a guide. There is no explicit transfer of information about pixels from input to output; all this transfer of information is carried out implicitly by the neural network,” Jampani said. For now, he noted, Stable Video 4D can handle videos a few seconds long that show a single object against a simple background. In the future, the developers plan to improve the algorithm so that it can process more complex videos.