Visual-CoT

Visual reasoning engine

A framework for training multi-modal language models with a focus on visual inputs and providing interpretable thoughts.

[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

GitHub

162 stars

1 watching

7 forks

Language: Python

last commit: 8 months ago

Related projects:

Repository	Description	Stars
rowanz/r2c	An open-source project providing PyTorch code and data for a deep learning model that enables visual commonsense reasoning.	466
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
zhegan27/semantic_compositional_nets	A deep learning framework providing a model architecture and training code for image captioning using semantic compositional networks	70
davidmascharka/tbd-nets	An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks.	348
cod3licious/conec	A library for training and evaluating a type of word embedding model that extends the original Word2Vec algorithm	20
rucaibox/comvint	Creating synthetic visual reasoning instructions to improve the performance of large language models on image-related tasks	18
cadene/vqa.pytorch	A PyTorch implementation of visual question answering with multimodal representation learning	718
bigredt/vico	Multi-sense word embeddings learned from visual cooccurrences	25
360cvgroup/360vl	A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities.	32
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
tianyi-lab/hallusionbench	An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy	259
kazuto1011/deeplab-pytorch	PyTorch implementation of DeepLab v2 for semantic segmentation on COCO-Stuff and PASCAL VOC datasets	1,098
jayleicn/clipbert	An efficient framework for end-to-end learning on image-text and video-text tasks	709
satwikkottur/visualword2vec	Learning word embeddings from abstract images to improve language understanding	19
kdexd/virtex	A pretraining approach that uses semantically dense captions to learn visual representations and improve image understanding tasks.	556