mm-cot
Multimodal reasoning model
An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference.
Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (more updates to come)
4k stars
56 watching
313 forks
Language: Python
last commit: 5 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
deepcs233/visual-cot | Develops a multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134 |
pku-yuangroup/moe-llava | Develops a neural network architecture for multi-modal learning with large vision-language models | 1,980 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,211 |
karpathy/neuraltalk | A system for generating image descriptions using neural networks | 5,411 |
soolab/ddcot | Tools and methods for multimodal reasoning in language models through prompting | 33 |
lupantech/scienceqa | Develops a framework for multimodal reasoning and question answering in science and other domains using natural language processing and machine learning techniques. | 606 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
subho406/omninet | An implementation of a unified architecture for multi-modal multi-task learning using PyTorch. | 512 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
dair-ai/ml-papers-explained | An explanation of key concepts and advancements in the field of Machine Learning | 7,315 |
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,906 |
ucsc-vlaa/sight-beyond-text | Official implementation of a research paper exploring the use of multi-modal training to enhance language models' truthfulness and ethics | 19 |
byungkwanlee/moai | Improves performance of vision language tasks by integrating computer vision capabilities into large language models | 311 |
qubvel-org/segmentation_models.pytorch | A PyTorch library for building and training neural networks for image segmentation tasks. | 9,696 |
hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |