Video-LLaMA

Video understanding model

An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

GitHub

3k stars

33 watching

265 forks

Language: Python

Last commit: over 1 year ago

blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining

Related projects:

Repository	Description	Stars
opengvlab/llama-adapter	An implementation of LLaMA-Adapter, a parameter-efficient method for fine-tuning LLaMA models to follow instructions	5,775
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
haotian-liu/llava	A large language-and-vision assistant trained with visual instruction tuning	20,683
llava-vl/llava-next	Develops large multimodal models for various computer vision tasks including image and video analysis	3,099
facico/chinese-vicuna	An instruction-following Chinese LLaMA-based model project aimed at training and fine-tuning with LoRA on low-resource hardware	4,152
pku-yuangroup/video-llava	A vision-language model that learns a unified visual representation of images and videos for visual understanding	3,071
damo-nlp-sg/videollama2	An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing.	957
meta-llama/llama3	Provides pre-trained and instruction-tuned Llama 3 language models and tools for loading and running inference	27,527
hiyouga/llama-factory	A tool for efficiently fine-tuning large language models across multiple architectures and methods.	36,219
scisharp/llamasharp	An efficient C#/.NET library for running Large Language Models (LLMs) on local devices	2,750
openlmlab/openchinesellama	An incremental pre-trained Chinese large language model based on the LLaMA-7B model	234
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities.	3,229
dvlab-research/llama-vid	A vision-language model that represents each video frame with two tokens so that large language models can process long videos	748
meta-llama/llama	Inference code for loading and running Llama models	56,832
damo-nlp-sg/llm-zoo	A collection of information about various large language models used in natural language processing	272