Video-LLaMA

Video understanding model

An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

GitHub

3k stars
32 watching
260 forks
Language: Python
last commit: 6 months ago
Topics: blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining

Related projects:

| Repository | Description | Stars |
|---|---|---|
| opengvlab/llama-adapter | An implementation of an efficient and accurate method for fine-tuning language models to follow instructions | 5,754 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
| llava-vl/llava-next | Develops large multimodal models for computer vision tasks, including image and video analysis | 2,872 |
| facico/chinese-vicuna | An instruction-following Chinese LLaMA-based model project aimed at training and fine-tuning models on specific hardware configurations for efficient deployment | 4,142 |
| pku-yuangroup/video-llava | Enables large language models to reason over images and videos simultaneously by learning a unified visual representation before projection | 2,990 |
| damo-nlp-sg/videollama2 | An audio-visual language model designed to understand and generate video content | 871 |
| meta-llama/llama3 | Provides pre-trained and instruction-tuned Llama 3 language models and tools for loading and running inference | 27,138 |
| hiyouga/llama-factory | A unified platform for fine-tuning many large language models with a variety of training approaches and methods | 34,436 |
| scisharp/llamasharp | A C#/.NET library for efficiently running large language models (LLMs) on local devices | 2,673 |
| openlmlab/openchinesellama | An incrementally pre-trained Chinese large language model based on the LLaMA-7B model | 234 |
| dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211 |
| dvlab-research/llama-vid | A video language model that uses large language models to generate visual and text features from videos | 733 |
| meta-llama/llama | A collection of tools and utilities for deploying, fine-tuning, and using large language models | 56,437 |
| damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing | 272 |