LLaVA-NeXT
Multimodal model developer
Develops large multimodal models for a range of computer vision tasks, including image and video analysis
3k stars
37 watching
266 forks
Language: Python
Last commit: 4 months ago
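Since the repository centers on running large multimodal models over images, a minimal inference sketch may help orient readers. The sketch below uses the Hugging Face transformers port of LLaVA-NeXT (the llava-hf checkpoints) rather than the repository's own inference scripts; the checkpoint name, prompt template, and image URL are illustrative assumptions based on the Mistral-7B variant.

```python
# Minimal LLaVA-NeXT inference sketch via the Hugging Face transformers port.
# Assumes: transformers >= 4.39 (plus accelerate for device_map="auto"), a GPU,
# and the llava-hf Mistral-7B checkpoint; other LLaVA-NeXT variants also exist.
import requests
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed checkpoint

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Fetch an example image (any RGB image works; this URL is a placeholder).
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The Mistral-7B variant expects an [INST] ... [/INST] prompt with an <image> slot.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

Quantized loading follows the same `from_pretrained` call with a `quantization_config` argument, which can bring the 7B variant within reach of a single consumer GPU.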
Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| | A deep learning framework for generating videos from text inputs and visual features | 3,071 |
| | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
| | An audio-visual language model designed to understand and respond to video content with improved instruction-following capabilities | 2,842 |
| | An open-source framework for training large language models with vision capabilities | 3,229 |
| | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| | An all-in-one demo for interactive image processing and generation | 353 |
| | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
| | An efficient C#/.NET library for running Large Language Models (LLMs) on local devices | 2,750 |
| | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| | A unified framework for testing generative language models on a range of evaluation tasks | 7,200 |
| | A tool for efficiently fine-tuning large language models across multiple architectures and methods | 36,219 |
| | A toolkit for fine-tuning and running inference with large machine learning models | 8,312 |
| | A multimodal large language model series developed by the Qwen team to understand and process images, videos, and text | 3,613 |
| | A framework for training large language models using scalable and optimized GPU techniques | 10,804 |