VideoLLaMA2

Video processor

An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing.

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

GitHub

957 stars
11 watching
62 forks
Language: Python
Last commit: about 2 months ago

Related projects:

Repository | Description | Stars
showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications. | 545
aspiers/ly2video | Converts music represented by a GNU LilyPond file into a video containing a horizontally scrolling music staff synchronized with audio rendering. | 158
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos. | 748
dcdmllm/momentor | A video large language model designed for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling. | 58
showlab/show-1 | Enables text-to-video generation using a combination of pixel and latent diffusion models. | 1,110
damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing. | 272
mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models. | 1,246
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs. | 270
singularity42/vgan-tensorflow | An implementation of a deep learning model to generate videos with dynamic scenes. | 15
nus-hpc-ai-lab/videosys | A comprehensive toolkit for high-performance video generation and processing. | 1,819
rupertluo/valley | An offline video assistant system powered by large language models and computer vision techniques. | 210
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336
boheumd/ma-lmm | Develops an AI model for long-term video understanding. | 254
bryandlee/tune-a-video | Unofficial implementation of a deep learning model to generate or modify video content. | 191
radi-cho/datasetgpt | A command-line interface to generate textual datasets with large language models. | 293