ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

GitHub

282 stars
5 watching
21 forks
Language: Python
Last commit: 3 months ago
Topics: chatbot, clip, cvpr2024, foundation-models, gpt-4, gpt-4-vision, llama, llama2, llava, multi-modal, vision-language, visual-prompting
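
To illustrate what "arbitrary visual prompts" means in practice, here is a minimal inference sketch. It does not use the repo's own scripts; it assumes the Hugging Face port of the checkpoint (`llava-hf/vip-llava-7b-hf`) and the `VipLlavaForConditionalGeneration` class shipped in recent `transformers` releases, and the image path, annotation, and prompt template are illustrative placeholders.

```python
# Minimal sketch, assuming the Hugging Face port "llava-hf/vip-llava-7b-hf"
# and a transformers version that includes VipLlavaForConditionalGeneration.
# The image file and the red-arrow annotation referenced in the question
# are hypothetical; the prompt template may differ from the repo's own.
import torch
from PIL import Image
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The visual prompt (an arrow, box, scribble, etc.) is drawn directly onto
# the image pixels beforehand; the text question simply refers to it.
image = Image.open("annotated_example.jpg")
prompt = "USER: <image>\nWhat is the object pointed to by the red arrow?\nASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```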