Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
3k stars
28 watching
207 forks
Language: Python
last commit: 11 days ago
Topics: instruction-tuning, large-vision-language-model, multi-modal