 Video-LLaMA
 Video understanding model
 An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
3k stars
 33 watching
 265 forks
 
Language: Python 
last commit: over 1 year ago
Topics: blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining
 Related projects:
| Repository | Description | Stars | 
|---|---|---|
|  | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 | 
|  | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 | 
|  | A system that uses large language and vision models to generate and process visual instructions | 20,683 | 
|  | Develops large multimodal models for various computer vision tasks including image and video analysis | 3,099 | 
|  | An instruction-following Chinese LLaMA-based model project aimed at training and fine-tuning models on specific hardware configurations for efficient deployment | 4,152 | 
|  | A deep learning framework for generating videos from text inputs and visual features | 3,071 | 
|  | An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing | 957 | 
|  | Provides pre-trained and instruction-tuned Llama 3 language models and tools for loading and running inference | 27,527 | 
|  | A tool for efficiently fine-tuning large language models across multiple architectures and methods | 36,219 | 
|  | An efficient C#/.NET library for running Large Language Models (LLMs) on local devices | 2,750 | 
|  | An incremental pre-trained Chinese large language model based on the LLaMA-7B model | 234 | 
|  | An open-source framework for training large language models with vision capabilities | 3,229 | 
|  | An image-based language model that uses large language models to generate visual and text features from videos | 748 | 
|  | A collection of tools and utilities for deploying, fine-tuning, and utilizing large language models. | 56,832 | 
|  | A collection of information about various large language models used in natural language processing | 272 |