Video-LLaMA

Video understanding model

An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

GitHub

3k stars
33 watching
265 forks
Language: Python
last commit: 8 months ago
Topics: blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining

Related projects:

Repository | Description | Stars
opengvlab/llama-adapter | An implementation of a method for efficiently fine-tuning language models to follow instructions | 5,775
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732
haotian-liu/llava | A system that combines large language and vision models to follow visual instructions | 20,683
llava-vl/llava-next | Large multimodal models for computer vision tasks, including image and video analysis | 3,099
facico/chinese-vicuna | An instruction-following Chinese LLaMA-based model project aimed at training and fine-tuning models on specific hardware configurations for efficient deployment | 4,152
pku-yuangroup/video-llava | A vision-language model that learns a unified visual representation for image and video understanding | 3,071
damo-nlp-sg/videollama2 | An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing | 957
meta-llama/llama3 | Pre-trained and instruction-tuned Llama 3 language models, with tools for loading them and running inference | 27,527
hiyouga/llama-factory | A tool for efficiently fine-tuning large language models across multiple architectures and methods | 36,219
scisharp/llamasharp | An efficient C#/.NET library for running large language models (LLMs) on local devices | 2,750
openlmlab/openchinesellama | An incrementally pre-trained Chinese large language model based on LLaMA-7B | 234
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,229
dvlab-research/llama-vid | A vision-language model for long-video understanding that represents each video frame with a small number of tokens | 748
meta-llama/llama | A collection of tools and utilities for deploying, fine-tuning, and utilizing large language models | 56,832
damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing | 272