CogVLM

Visual Language Model

Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems.

A state-of-the-art-level open visual language model | Multimodal pretrained model

GitHub

6k stars
66 watching
415 forks
Language: Python
Last commit: 6 months ago
Topics: cross-modality, language-model, multi-modal, pretrained-models, visual-language-models

Related projects:

Repository | Description | Stars
thudm/cogvideo | Generates videos from text and images using large language models | 9,156
thudm/glm | A general-purpose language model pre-trained with an autoregressive blank-filling objective and designed for various natural language understanding and generation tasks | 3,199
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754
opengvlab/internvl | A pioneering open-source alternative to commercial multimodal models with a family of large-scale language and vision models | 6,014
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,045
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs | 12,619
open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets | 1,343
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232
replicate/cog | A tool for packaging and deploying machine learning models in a standard, production-ready container environment | 8,081
thudm/glm-130b | An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data | 7,659
opengvlab/visionllm | A large language model designed to process and generate visual information | 915
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720
openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool use capability | 4,843
antvis/g2 | A visualization grammar that enables rapid creation of data-driven visualizations with concise declarations and infers complex details | 12,130
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops | 5,906