mm-cot
Multimodal reasoning model
An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference.
Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (more updates to come)
4k stars
56 watching
317 forks
Language: Python
Last commit: 8 months ago
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A framework for training multi-modal language models with a focus on visual inputs and providing interpretable thoughts. | 162 |
| | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks. | 2,023 |
| | An open-source framework for training large language models with vision capabilities. | 3,229 |
| | A system for generating image descriptions using neural networks. | 5,414 |
| | This implementation provides tools and methods for multimodal reasoning in language models through prompting. | 35 |
| | A dataset and software framework for building multimodal reasoning systems to answer science questions. | 615 |
| | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| | An implementation of a unified architecture for multi-modal multi-task learning using PyTorch. | 515 |
| | Develops and trains models for vision-language learning with decoupled language pre-training. | 24 |
| | An explanation of key concepts and advancements in the field of Machine Learning. | 7,352 |
| | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,948 |
| | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models. | 19 |
| | Improves performance of vision language tasks by integrating computer vision capabilities into large language models. | 314 |
| | A comprehensive library for training and applying deep learning models for image segmentation. | 9,829 |
| | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |