mm-cot

Multimodal reasoning model

An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference.

Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come).
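The decoupled framework described above can be pictured as two separate models run in sequence. The following is only a minimal sketch under stated assumptions, not the repository's code: the class `TwoStagePipeline`, the toy stand-in models, and the prompt layout are all hypothetical, and it assumes the answer stage receives the rationale produced by the first stage.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type aliases; the real repository's interfaces may differ.
VisionFeatures = Sequence[float]
Generator = Callable[[str, Optional[VisionFeatures]], str]


@dataclass
class TwoStagePipeline:
    """Decoupled pipeline: one model writes a rationale, another infers the answer."""

    rationale_model: Generator  # stage 1 (hypothetical callable)
    answer_model: Generator     # stage 2 (hypothetical callable)

    def run(self, question: str, vision: Optional[VisionFeatures] = None) -> dict:
        # Stage 1: generate a rationale from the question and optional vision features.
        rationale = self.rationale_model(question, vision)
        # Stage 2 (assumed layout): append the rationale to the input, then infer the answer.
        augmented = f"{question}\nRationale: {rationale}"
        answer = self.answer_model(augmented, vision)
        return {"rationale": rationale, "answer": answer}


# Toy stand-ins so the sketch runs without any trained weights.
def toy_rationale_model(text: str, vision: Optional[VisionFeatures]) -> str:
    return "Magnets attract when opposite poles face each other."


def toy_answer_model(text: str, vision: Optional[VisionFeatures]) -> str:
    return "(A)" if "opposite poles" in text else "(B)"


pipeline = TwoStagePipeline(toy_rationale_model, toy_answer_model)
result = pipeline.run("Will these magnets attract or repel? (A) attract (B) repel")
```

Keeping the two stages as independent callables mirrors the "decoupled" idea: each stage could be trained or swapped on its own, as long as the second accepts the first's output as part of its input.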

GitHub

4k stars

56 watching

317 forks

Language: Python

Last commit: about 1 year ago


arxiv.org/abs/2302.00923

Related projects:

Repository	Description	Stars
deepcs233/visual-cot	A framework for training multi-modal language models with a focus on visual inputs and providing interpretable thoughts.	162
pku-yuangroup/moe-llava	A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks.	2,023
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities.	3,229
karpathy/neuraltalk	A system for generating image descriptions using neural networks.	5,414
soolab/ddcot	Tools and methods for multimodal reasoning in language models through prompting.	35
lupantech/scienceqa	A dataset and software framework for building multimodal reasoning systems to answer science questions.	615
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities.	1,299
subho406/omninet	An implementation of a unified architecture for multi-modal multi-task learning using PyTorch.	515
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training.	24
dair-ai/ml-papers-explained	Explanations of key concepts and advances in machine learning.	7,352
open-mmlab/mmcv	Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops.	5,948
ucsc-vlaa/sight-beyond-text	An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models.	19
byungkwanlee/moai	Improves performance on vision-language tasks by integrating computer vision capabilities into large language models.	314
qubvel-org/segmentation_models.pytorch	A comprehensive library for training and applying deep learning models for image segmentation.	9,829
hxyou/idealgpt	A deep learning framework for iteratively decomposing vision and language reasoning via large language models.	32