groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), a first-of-its-kind model that generates natural language responses seamlessly integrated with object segmentation masks.

GitHub

747 stars
28 watching
37 forks
Language: Python
Last commit: 4 months ago
Topics: foundation-models, llm-agent, lmm, vision-and-language, vision-language-model