CogVideo

Video generator

Generates videos from text and images using large-scale pretrained transformer models

Text-to-video and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

GitHub

9k stars
127 watching
859 forks
Language: Python
Last commit: 7 days ago
Linked from 1 awesome list

Tags: cogvideox, image-to-video, llm, sora, text-to-video, video-generation

Related projects:

| Repository | Description | Stars |
|---|---|---|
| thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,080 |
| doubiiu/dynamicrafter | Generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,596 |
| thudm/cogview | A framework for generating images from text using transformers. | 1,723 |
| doubiiu/tooncrafter | Generates cartoon-style videos from two images using pre-trained diffusion models. | 5,372 |
| tachibanayoshino/animeganv2 | A tool that converts landscape photos and videos into anime style using deep learning techniques. | 5,102 |
| coqui-ai/tts | A deep learning toolkit for generating human-like speech from text. | 35,453 |
| thudm/codegeex | A multilingual code generation model, pre-trained on 20 programming languages, that can generate executable programs and translate code between languages. | 8,240 |
| facebookresearch/co-tracker | A transformer-based model for tracking any point in a video that brings some of the benefits of optical flow to tracking. | 3,850 |
| threestudio-project/threestudio | A unified framework for 3D content generation using Python and various machine learning-based algorithms. | 6,319 |
| vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
| internlm/internlm-xcomposer | A large vision-language model that can understand and generate text from visual inputs, with support for long-context input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue. | 2,521 |
| nvidia/vid2vid | A PyTorch implementation of a video-to-video translation method for generating photorealistic videos from semantic label maps or other input data. | 8,607 |
| coqui-ai/stt | A toolkit for building and deploying speech-to-text models using deep learning techniques. | 2,283 |
| cumulo-autumn/streamdiffusion | A pipeline-level solution for real-time interactive image generation using diffusion-based techniques. | 9,736 |
| clovaai/stargan-v2 | A Python implementation of an image-to-image translation model for generating diverse images across multiple domains. | 3,506 |