AI-NEWS · August 28, 2024

Higher Quality, Better Visual Effects! Zhipu Open-Sources CogVideoX-5B Video Generation Model

The ModelScope community recently announced the official open-source release of CogVideoX-5B, a larger version of Zhipu's open-source, Sora-like video generation model developed in China.

Compared with the earlier CogVideoX-2B, the new model significantly improves the quality and visual effects of generated video.

CogVideoX-5B is a large-scale DiT (diffusion transformer) model designed for text-to-video generation. It uses a 3D causal variational autoencoder (3D causal VAE) and an expert Transformer, combines text and video embeddings, encodes positions with 3D-RoPE, and applies a 3D full attention mechanism to model space and time jointly.
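To make the 3D-RoPE idea more concrete, here is a minimal sketch in plain Python. It is not CogVideoX's actual implementation. It assumes each token has a (t, h, w) position in the video latent grid and that the channels of a query or key vector are split into three groups, one per axis, with ordinary 1D rotary encoding applied to each group. The function names `rope_1d` and `rope_3d`, the `dims` channel split, and the frequency base of 10000 are illustrative assumptions.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles."""
    assert len(vec) % 2 == 0, "channel count must be even"
    half = len(vec) // 2
    out = []
    for i in range(half):
        # Pair i rotates at frequency base^(-i / half), as in standard RoPE.
        theta = pos * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_3d(vec, t, h, w, dims):
    """Apply 1D RoPE per axis to separate channel groups, then concatenate.

    `dims` gives the (assumed) number of channels reserved for the
    time, height, and width axes; it must sum to len(vec).
    """
    assert sum(dims) == len(vec) and all(d % 2 == 0 for d in dims)
    out, start = [], 0
    for pos, d in zip((t, h, w), dims):
        out.extend(rope_1d(vec[start:start + d], pos))
        start += d
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative 16-channel query and key with a hypothetical 4/6/6 split.
q = [1.0, 0.0] * 8
k = [0.5, -0.5] * 8
dims = (4, 6, 6)

# Same (t, h, w) offset between query and key at two different locations.
score_a = dot(rope_3d(q, 1, 2, 3, dims), rope_3d(k, 0, 1, 1, dims))
score_b = dot(rope_3d(q, 3, 4, 5, dims), rope_3d(k, 2, 3, 3, dims))
print(abs(score_a - score_b) < 1e-9)
```

Because every channel pair is rotated by an angle proportional to position, the dot product between an encoded query and key depends only on their offset along each axis. That relative property is what lets attention treat "one frame later, two rows down" the same way wherever it occurs in the video.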

Additionally, the model is trained with progressive training techniques, which enable it to generate high-quality videos that are coherent, longer in duration, and rich in motion.

Model Link:

https://modelscope.cn/models/ZhipuAI/CogVideoX-5b

© Copyright AIbase Base 2024, Click to View Source – https://www.aibase.com/news/11318
