groundingLMM

Multimodal conversational model

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), an end-to-end trained model and the first of its kind capable of generating natural language responses seamlessly integrated with object segmentation masks.

GitHub

781 stars
31 watching
37 forks
Language: Python
last commit: 6 months ago
Topics: foundation-models, llm-agent, lmm, vision-and-language, vision-language-model

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mbzuai-oryx/mobillama | A small language model designed to run efficiently on edge devices with minimal resource requirements | 595 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
| mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,213 |
| openbmb/viscpm | A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages | 1,089 |
| cambrian-mllm/cambrian | An open-source multimodal LLM project with a vision-centric design | 1,759 |
| opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 467 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for both comprehension and generation | 576 |
| mbzuai-nlp/bactrian-x | A collection of multilingual language models trained on a dataset of instructions and responses in various languages | 94 |
| neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 91 |
| kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs | 1,825 |
| alpha-vllm/wemix-llm | A LLaMA-based multimodal language model with various instruction-following and multimodal variants | 17 |
| ucsc-vlaa/sight-beyond-text | Official implementation of a research paper exploring multi-modal training to enhance language models' truthfulness and ethics | 19 |
| runpeidong/dreamllm | A framework for building versatile multimodal large language models with synergistic comprehension and creation capabilities | 394 |