VideoLLaMA2
Video generator
An audio-visual language model designed to understand video and audio content
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
871 stars
10 watching
60 forks
Language: Python
Last commit: 8 days ago

Related projects:
Repository | Description | Stars |
---|---|---|
showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications. | 538 |
aspiers/ly2video | Converts music represented by a GNU LilyPond file into a video containing a horizontally scrolling music staff synchronized with audio rendering. | 158 |
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
dcdmllm/momentor | A video Large Language Model designed for fine-grained comprehension and localization in videos with a custom Temporal Perception Module for improved temporal modeling | 54 |
showlab/show-1 | This project enables text-to-video generation by combining pixel and latent diffusion models | 1,103 |
damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing | 272 |
mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,213 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
singularity42/vgan-tensorflow | An implementation of a deep learning model to generate videos with dynamic scenes | 15 |
nus-hpc-ai-lab/videosys | A toolkit for high-performance video generation and processing using deep learning techniques | 1,773 |
rupertluo/valley | An offline video assistant system powered by large language models and computer vision techniques. | 211 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
boheumd/ma-lmm | This project develops an AI model for long-term video understanding | 244 |
bryandlee/tune-a-video | Unofficial implementation of a deep learning model to generate or modify video content | 191 |
radi-cho/datasetgpt | A command-line interface to generate textual datasets with Large Language Models | 293 |