mm-cot

Multimodal reasoning model

An implementation of multimodal chain-of-thought reasoning in language models. It uses a decoupled, two-stage training framework: the model first generates a rationale, then infers the answer from it.

Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come).
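
The decoupled rationale-then-answer flow described above can be pictured with a minimal sketch. It is not the repository's code: `Example`, `build_input`, `two_stage_answer`, and the two toy models are hypothetical names introduced here for illustration, and the prompt layout is an assumption. Each "model" is simply a callable that takes text plus optional image features and returns text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Example:
    """One multimodal question (hypothetical structure, for illustration only)."""
    question: str
    context: str
    options: Sequence[str]
    image_features: Optional[Sequence[float]] = None  # stand-in for vision features


# Assumption: a model is any callable (text, image_features) -> generated text.
Model = Callable[[str, Optional[Sequence[float]]], str]


def build_input(ex: Example) -> str:
    """Flatten question, context, and lettered options into one prompt string."""
    opts = " ".join(f"({chr(ord('A') + i)}) {o}" for i, o in enumerate(ex.options))
    return f"Question: {ex.question}\nContext: {ex.context}\nOptions: {opts}"


def two_stage_answer(ex: Example, rationale_model: Model, answer_model: Model) -> Tuple[str, str]:
    """Stage 1 generates a rationale; stage 2 infers the answer from input + rationale."""
    stage1_input = build_input(ex)
    rationale = rationale_model(stage1_input, ex.image_features)
    # The generated rationale is appended to the original input for the second stage.
    stage2_input = f"{stage1_input}\nRationale: {rationale}"
    answer = answer_model(stage2_input, ex.image_features)
    return rationale, answer


# Toy stand-ins so the sketch runs without any trained model.
def toy_rationale_model(text: str, image_features: Optional[Sequence[float]]) -> str:
    return "A magnet attracts iron."


def toy_answer_model(text: str, image_features: Optional[Sequence[float]]) -> str:
    return "The answer is (A)." if "Rationale:" in text else "The answer is unknown."


example = Example(
    question="Which object will the magnet attract?",
    context="A magnet is placed near two objects.",
    options=["iron nail", "plastic spoon"],
)
rationale, answer = two_stage_answer(example, toy_rationale_model, toy_answer_model)
```

Keeping the two stages as separate callables mirrors the decoupling: each stage can be trained or swapped independently, and the second stage only ever sees the rationale as extra input text.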

GitHub

4k stars
56 watching
313 forks
Language: Python
last commit: 5 months ago

Related projects:

Repository | Description | Stars
deepcs233/visual-cot | Develops a multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134
pku-yuangroup/moe-llava | Develops a neural network architecture for multi-modal learning with large vision-language models | 1,980
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211
karpathy/neuraltalk | A system for generating image descriptions using neural networks | 5,411
soolab/ddcot | This implementation provides tools and methods for multimodal reasoning in language models through prompting | 33
lupantech/scienceqa | Develops a framework for multimodal reasoning and question answering in science and other domains using natural language processing and machine learning techniques | 606
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,298
subho406/omninet | An implementation of a unified architecture for multi-modal multi-task learning using PyTorch | 512
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24
dair-ai/ml-papers-explained | An explanation of key concepts and advancements in the field of Machine Learning | 7,315
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops | 5,906
ucsc-vlaa/sight-beyond-text | This repository provides an official implementation of a research paper exploring the use of multi-modal training to enhance language models' truthfulness and ethics in various applications | 19
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311
qubvel-org/segmentation_models.pytorch | A PyTorch library for building and training neural networks for image segmentation tasks | 9,696
hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models | 32