Video-LLaMA
Video understanding model
An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
3k stars
33 watching
265 forks
Language: Python
last commit: 8 months ago
Topics: blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining
Related projects:
Repository | Description | Stars |
---|---|---|
opengvlab/llama-adapter | A parameter-efficient method for fine-tuning LLaMA-based language models to follow instructions | 5,775 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
llava-vl/llava-next | Develops large multimodal models for various computer vision tasks including image and video analysis | 3,099 |
facico/chinese-vicuna | An instruction-following Chinese LLaMA-based model designed to be trained and deployed on low-cost hardware | 4,152 |
pku-yuangroup/video-llava | A multimodal model that learns a unified visual representation for joint image and video understanding | 3,071 |
damo-nlp-sg/videollama2 | An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing | 957 |
meta-llama/llama3 | Provides pre-trained and instruction-tuned Llama 3 language models and tools for loading and running inference | 27,527 |
hiyouga/llama-factory | A tool for efficiently fine-tuning large language models across multiple architectures and methods | 36,219 |
scisharp/llamasharp | An efficient C#/.NET library for running Large Language Models (LLMs) on local devices | 2,750 |
openlmlab/openchinesellama | An incremental pre-trained Chinese large language model based on the LLaMA-7B model | 234 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,229 |
dvlab-research/llama-vid | A video language model that compresses each frame into a few tokens, enabling LLMs to process long videos | 748 |
meta-llama/llama | Inference code and utilities for the LLaMA family of large language models | 56,832 |
damo-nlp-sg/llm-zoo | A collection of information about various large language models used in natural language processing | 272 |