CogVLM
Visual Language Model
Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems.
a state-of-the-art-level open visual language model | 多模态预训练模型
6k stars
68 watching
419 forks
Language: Python
last commit: 9 months ago cross-modalitylanguage-modelmulti-modalpretrained-modelsvisual-language-models
Related projects:
Repository | Description | Stars |
---|---|---|
| Generates videos from text and images using large language models | 9,761 |
| A general-purpose language model pre-trained with an autoregressive blank-filling objective and designed for various natural language understanding and generation tasks. | 3,207 |
| An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
| Develops large language models capable of processing multiple data types and modalities | 6,394 |
| A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179 |
| A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870 |
| An evaluation toolkit for large vision-language models | 1,514 |
| A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| A tool for packaging and deploying machine learning models in a standard, production-ready container environment. | 8,169 |
| An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data. | 7,672 |
| A large language model designed to process and generate visual information | 956 |
| An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| A platform for training, serving, and evaluating large language models to enable tool use capability | 4,888 |
| A visualization grammar that enables rapid creation of data-driven visualizations with concise declarations and infers complex details. | 12,155 |
| Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,948 |