VideoLLaMA2
Video processor
An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing.
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
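Video-LLMs of this kind typically sample a small, fixed number of frames from each clip before handing them to the vision encoder. The snippet below is a minimal, generic sketch of uniform frame sampling with OpenCV; the helper name, frame count, and file path are illustrative assumptions and do not reflect VideoLLaMA 2's actual preprocessing code.

```python
# Minimal sketch of uniform frame sampling, a common preprocessing step
# for Video-LLMs. Illustrative only; not VideoLLaMA 2's API.
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 8) -> np.ndarray:
    """Uniformly sample `num_frames` RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)  # shape: (num_frames, H, W, 3)

# Example usage (hypothetical file path):
# clip = sample_frames("demo.mp4", num_frames=8)
```

Uniform sampling preserves the temporal span of the clip while bounding the number of visual tokens the language model must attend to.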
957 stars
11 watching
62 forks
Language: Python
last commit: about 2 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications | 545 |
| aspiers/ly2video | Converts music in a GNU LilyPond file into a video with a horizontally scrolling music staff synchronized to the rendered audio | 158 |
| dvlab-research/llama-vid | A video language model that builds on large language models to extract visual and text features from videos | 748 |
| dcdmllm/momentor | A video large language model designed for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling | 58 |
| showlab/show-1 | Enables text-to-video generation by combining pixel-based and latent diffusion models | 1,110 |
| damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing | 272 |
| mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,246 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| singularity42/vgan-tensorflow | An implementation of a deep learning model that generates videos with dynamic scenes | 15 |
| nus-hpc-ai-lab/videosys | A comprehensive toolkit for high-performance video generation and processing | 1,819 |
| rupertluo/valley | An offline video assistant system powered by large language models and computer vision techniques | 210 |
| lxtgh/omg-seg | An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,336 |
| boheumd/ma-lmm | Develops an AI model for long-term video understanding | 254 |
| bryandlee/tune-a-video | Unofficial implementation of a deep learning model for generating and modifying video content | 191 |
| radi-cho/datasetgpt | A command-line interface for generating textual datasets with large language models | 293 |