Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

GitHub

3k stars
32 watching
243 forks
Language: Python
Last commit: 4 months ago
Topics: blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining