groundingLMM

Multimodal conversational model

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), an end-to-end trained model and the first of its kind capable of generating natural language responses seamlessly integrated with object segmentation masks.

GitHub

781 stars
31 watching
37 forks
Language: Python
last commit: 6 months ago
Topics: foundation-models, llm-agent, lmm, vision-and-language, vision-language-model

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mbzuai-oryx/mobillama | A small language model designed to run efficiently on edge devices with minimal resource requirements | 595 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 72 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550 |
| mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,213 |
| openbmb/viscpm | A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages | 1,089 |
| cambrian-mllm/cambrian | An open-source multimodal LLM project with a vision-centric design | 1,759 |
| opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 467 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for both comprehension and generation | 576 |
| mbzuai-nlp/bactrian-x | A collection of multilingual language models trained on a dataset of instructions and responses in various languages | 94 |
| neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 91 |
| kohjingyu/fromage | A framework for grounding language models to images and handling multimodal inputs and outputs | 478 |
| yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs | 1,825 |
| alpha-vllm/wemix-llm | A LLaMA-based multimodal language model with various instruction-following and multimodal variants | 17 |
| ucsc-vlaa/sight-beyond-text | Official implementation of a research paper exploring multi-modal training to enhance language models' truthfulness and ethics | 19 |
| runpeidong/dreamllm | A framework for building versatile multimodal large language models with synergistic comprehension and creation capabilities | 394 |