LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

693 stars
14 watching
43 forks
Language: Python
Last commit: 2 months ago