CogVLM
Visual Language Model
Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems.
A state-of-the-art-level open visual language model | Multimodal pretrained model
6k stars
66 watching
415 forks
Language: Python
last commit: 6 months ago
Topics: cross-modality, language-model, multi-modal, pretrained-models, visual-language-models
Related projects:
Repository | Description | Stars |
---|---|---|
thudm/cogvideo | Generates videos from text and images using large language models | 9,156 |
thudm/glm | A general-purpose language model pre-trained with an autoregressive blank-filling objective and designed for various natural language understanding and generation tasks. | 3,199 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
opengvlab/internvl | A pioneering open-source alternative to commercial multimodal models with a family of large-scale language and vision models. | 6,014 |
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045 |
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,619 |
open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
replicate/cog | A tool for packaging and deploying machine learning models in a standard, production-ready container environment. | 8,081 |
thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data. | 7,659 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool use capability | 4,843 |
antvis/g2 | A visualization grammar that enables rapid creation of data-driven visualizations with concise declarations and infers complex details. | 12,130 |
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,906 |