Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python
Topics: instruction-tuning, large-vision-language-model, multi-modal