Chinese tech giant Tencent has announced HunyuanVideo, an advanced artificial-intelligence model for video generation released as open source. For the first time, the code and weights of a video model with such capabilities are available to everyone.

Image source: Tencent

HunyuanVideo, according to Tencent, generates videos at the level of the world's leading closed-source systems: its output is distinguished by high picture quality, varied object motion in the frame, visual-audio synchronization, and generation stability. With 13 billion parameters, it is the largest open-source model for video generation. The HunyuanVideo package includes a framework with tools for data management, tools for jointly training models that work with images and video, and infrastructure for large-scale model training and inference.

Tencent tested the model with the support of the professional community, which found HunyuanVideo superior in quality to the closed projects Runway Gen-3 and Luma 1.6. To achieve this result, the developers adopted a hybrid dual-stream to single-stream Transformer architecture. In the first stage, video and text tokens are processed independently by separate stacks of Transformer blocks, so each modality is handled without interference from the other. In the single-stream stage, the video and text tokens are concatenated and passed through shared Transformer blocks, enabling efficient fusion of the multimodal data. This lets the model capture the complex relationships between visual and semantic information and improves its overall performance.
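The two-stage flow described above can be sketched in a few lines. This is a minimal, illustrative toy (it is not Tencent's implementation): real Transformer blocks with attention are stood in for by simple residual layers, and all names, sizes, and weights here are made up for the example. What it shows is only the data flow — modality-specific blocks first, then concatenation into one shared stream.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size (illustrative only)

def block(tokens, W):
    # Stand-in for a full Transformer block: a residual nonlinear map.
    return tokens + np.tanh(tokens @ W)

# Dual-stream stage: separate (hypothetical) weights per modality.
W_video = rng.normal(scale=0.1, size=(D, D))
W_text = rng.normal(scale=0.1, size=(D, D))
# Single-stream stage: shared weights for multimodal fusion.
W_shared = rng.normal(scale=0.1, size=(D, D))

video_tokens = rng.normal(size=(16, D))  # 16 video tokens
text_tokens = rng.normal(size=(4, D))    # 4 text tokens

# Stage 1: each modality passes through its own blocks, without interference.
v = block(video_tokens, W_video)
t = block(text_tokens, W_text)

# Stage 2: concatenate into a single stream and process jointly.
fused = block(np.concatenate([v, t], axis=0), W_shared)
print(fused.shape)  # (20, 8) — all 20 tokens now share one representation
```

The design choice the sketch illustrates: keeping the streams separate early lets each modality learn its own representation, while the shared late stage is where cross-modal relationships are actually modeled.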

With the release of HunyuanVideo, Tencent has taken a significant step toward democratizing AI-powered video creation. With its code openly available, the model could reshape the video-generation ecosystem.
