groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), a first-of-its-kind model that generates natural language responses seamlessly integrated with object segmentation masks.
747 stars
28 watching
37 forks
Language: Python
last commit: 4 months ago
Topics: foundation-models, llm-agent, lmm, vision-and-language, vision-language-model