groundingLMM
Multimodal conversational model
An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
797 stars
31 watching
37 forks
Language: Python
last commit: 3 months ago
Topics: foundation-models, llm-agent, lmm, vision-and-language, vision-language-model
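GLaMM's grounded conversation generation interleaves free-form text with pixel-level masks: the model emits a special segmentation token (written here as `[SEG]`) after each grounded phrase, and a mask decoder produces one mask per such token. As a minimal sketch of the output structure only — the helper name and splitting heuristic below are illustrative assumptions, not the repository's API — a response can be post-processed into (phrase, mask) pairs:

```python
# Hypothetical post-processing sketch (not the repository's API): pair each
# grounded phrase in a generated response with its predicted mask. One mask
# is assumed per [SEG] token, in order of appearance.
import numpy as np

def pair_phrases_with_masks(response: str, masks: list) -> list:
    """Split a response on [SEG] markers and zip each preceding phrase
    with the corresponding mask, in order of appearance."""
    phrases = [p.strip() for p in response.split("[SEG]")[:-1]]
    assert len(phrases) == len(masks), "expected one mask per [SEG] token"
    return list(zip(phrases, masks))

# Example with dummy 4x4 masks standing in for model-predicted masks.
response = "A dog [SEG] leaps to catch a frisbee [SEG]"
masks = [np.zeros((4, 4), dtype=bool), np.ones((4, 4), dtype=bool)]
for phrase, mask in pair_phrases_with_masks(response, masks):
    print(f"{phrase!r} -> mask of shape {mask.shape}")
```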
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A small language model designed to run efficiently on edge devices with minimal resource requirements | 607 |
| | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 73 |
| | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,246 |
| | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098 |
| | An open-source multimodal LLM project with a vision-centric design | 1,799 |
| | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 |
| | An implementation of a multimodal language model with capabilities for comprehension and generation | 585 |
| | A collection of multilingual language models trained on a dataset of instructions and responses in various languages | 94 |
| | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 92 |
| | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage | 1,849 |
| | An LLaMA-based multimodal language model with various instruction-following and multimodal variants | 17 |
| | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19 |
| | A framework for building versatile multimodal large language models with synergistic comprehension and creation capabilities | 402 |