Video-LLaMA
Video understanding model
An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
3k stars
32 watching
260 forks
Language: Python
last commit: 6 months ago
Topics: blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining
Related projects:
Repository | Description | Stars |
---|---|---|
opengvlab/llama-adapter | A parameter-efficient method for fine-tuning LLaMA to follow instructions by adding only a small number of trainable parameters | 5,754 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
haotian-liu/llava | A large multimodal assistant that connects a vision encoder with a language model and is trained via visual instruction tuning | 20,232 |
llava-vl/llava-next | Develops large multimodal models for various computer vision tasks including image and video analysis | 2,872 |
facico/chinese-vicuna | An instruction-following Chinese LLaMA-based model that can be fine-tuned and deployed on modest consumer hardware | 4,142 |
pku-yuangroup/video-llava | Enables large language models to perform visual reasoning on images and videos simultaneously by learning a united visual representation before projection | 2,990 |
damo-nlp-sg/videollama2 | An audio-visual language model that improves spatial-temporal modeling and audio understanding for video content | 871 |
meta-llama/llama3 | Provides pre-trained and instruction-tuned Llama 3 language models and tools for loading and running inference | 27,138 |
hiyouga/llama-factory | A unified platform for fine-tuning multiple large language models with various training approaches and methods | 34,436 |
scisharp/llamasharp | A C#/.NET library to efficiently run Large Language Models (LLMs) on local devices | 2,673 |
openlmlab/openchinesellama | An incremental pre-trained Chinese large language model based on the LLaMA-7B model | 234 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211 |
dvlab-research/llama-vid | A video language model that represents each frame with just two tokens, letting large language models process long videos efficiently | 733 |
meta-llama/llama | Inference code and minimal examples for loading and running Meta's Llama language models | 56,437 |
damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing | 272 |
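
Star counts like the ones in the table above go stale quickly. The snippet below is a minimal sketch for refreshing them with the public GitHub REST API (`GET /repos/{owner}/{repo}`, unauthenticated and rate-limited); it is not part of Video-LLaMA itself, the repository list is an illustrative subset, and the `requests` dependency is assumed.

```python
"""Minimal sketch: refresh the star counts listed above via the public GitHub API.

Assumes only the `requests` package and the unauthenticated GitHub REST API
(rate-limited to roughly 60 requests per hour).
"""
import requests

# Illustrative subset of the repositories from the table above.
REPOS = [
    "DAMO-NLP-SG/Video-LLaMA",
    "haotian-liu/LLaVA",
    "PKU-YuanGroup/Video-LLaVA",
    "DAMO-NLP-SG/VideoLLaMA2",
    "hiyouga/LLaMA-Factory",
]

for full_name in REPOS:
    # GET /repos/{owner}/{repo} returns repository metadata as JSON,
    # including the current "stargazers_count".
    resp = requests.get(f"https://api.github.com/repos/{full_name}", timeout=10)
    resp.raise_for_status()
    print(f"{full_name}: {resp.json()['stargazers_count']:,} stars")
```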