CogVLM

Visual Language Model

Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems.

A state-of-the-art-level open visual language model | Multimodal pre-trained model

GitHub

6k stars

68 watching

419 forks

Language: Python

Last commit: about 2 years ago

cross-modality, language-model, multi-modal, pretrained-models, visual-language-models

Related projects:

Repository	Description	Stars
thudm/cogvideo	Generates videos from text and images using large language models	9,761
thudm/glm	A general-purpose language model pre-trained with an autoregressive blank-filling objective and designed for various natural language understanding and generation tasks.	3,207
opengvlab/llama-adapter	An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy	5,775
opengvlab/internvl	Develops large language models capable of processing multiple data types and modalities	6,394
qwenlm/qwen-vl	A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks	5,179
openbmb/minicpm-v	A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs.	12,870
open-compass/vlmevalkit	An evaluation toolkit for large vision-language models	1,514
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions	20,683
replicate/cog	A tool for packaging and deploying machine learning models in a standard, production-ready container environment.	8,169
thudm/glm-130b	An open-source implementation of a large bilingual language model pre-trained on vast amounts of text data.	7,672
opengvlab/visionllm	A large language model designed to process and generate visual information	956
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
openbmb/toolbench	A platform for training, serving, and evaluating large language models to enable tool use capability	4,888
antvis/g2	A visualization grammar that enables rapid creation of data-driven visualizations with concise declarations and infers complex details.	12,155
open-mmlab/mmcv	Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops.	5,948