VideoLLaMA2
Video processor
An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing.
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
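Video-LLMs of this kind typically sample a small, fixed number of frames from each clip before handing them to the vision encoder. The snippet below is a minimal, generic sketch of uniform frame sampling with OpenCV; the helper name, frame count, and file path are illustrative assumptions and do not reflect VideoLLaMA 2's actual preprocessing code.

```python
# Minimal sketch of uniform frame sampling, a common preprocessing step
# for Video-LLMs. Illustrative only; not VideoLLaMA 2's API.
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 8) -> np.ndarray:
    """Uniformly sample `num_frames` RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)  # shape: (num_frames, H, W, 3)

# Example usage (hypothetical file path):
# clip = sample_frames("demo.mp4", num_frames=8)
```

Uniform sampling preserves the temporal span of the clip while bounding the number of visual tokens the language model must attend to.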
957 stars
11 watching
62 forks
Language: Python
last commit: about 2 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications | 545 |
| aspiers/ly2video | Converts music in a GNU LilyPond file into a video with a horizontally scrolling music staff synchronized to the rendered audio | 158 |
| dvlab-research/llama-vid | A video language model that builds on large language models to extract visual and text features from videos | 748 |
| dcdmllm/momentor | A video large language model designed for fine-grained comprehension and localization in videos, with a custom Temporal Perception Module for improved temporal modeling | 58 |
| showlab/show-1 | Enables text-to-video generation by combining pixel-based and latent diffusion models | 1,110 |
| damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing | 272 |
| mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,246 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| singularity42/vgan-tensorflow | An implementation of a deep learning model that generates videos with dynamic scenes | 15 |
| nus-hpc-ai-lab/videosys | A comprehensive toolkit for high-performance video generation and processing | 1,819 |
| rupertluo/valley | An offline video assistant system powered by large language models and computer vision techniques | 210 |
| lxtgh/omg-seg | An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,336 |
| boheumd/ma-lmm | Develops an AI model for long-term video understanding | 254 |
| bryandlee/tune-a-video | Unofficial implementation of a deep learning model for generating and modifying video content | 191 |
| radi-cho/datasetgpt | A command-line interface for generating textual datasets with large language models | 293 |