LLaMA-VID
Video and image understanding
A video language model that represents each video frame with just two tokens (a context token and a content token), allowing large language models to understand long videos efficiently
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
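The "2 tokens" in the title refers to compressing each frame's many patch embeddings into a context token (attending patch features against the text query) and a content token (pooling patch features). A minimal pure-Python sketch of that idea, with made-up dimensions and a plain scaled dot-product attention as a stand-in for the model's actual attention module (not the project's real implementation):

```python
import math

def two_token_frame(patch_feats, text_query):
    """Compress one frame's patch features into two tokens.

    patch_feats: list of patch embedding vectors (lists of floats)
    text_query:  a single text-query embedding vector
    Returns (context_token, content_token). This is a hypothetical
    simplification of LLaMA-VID's context/content token scheme.
    """
    d = len(text_query)
    # Attention scores: scaled dot product between the query and each patch.
    scores = [sum(q * p for q, p in zip(text_query, pf)) / math.sqrt(d)
              for pf in patch_feats]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Context token: attention-weighted sum of patch features.
    context = [sum(w * pf[i] for w, pf in zip(weights, patch_feats))
               for i in range(d)]
    # Content token: mean pooling over patch features.
    content = [sum(pf[i] for pf in patch_feats) / len(patch_feats)
               for i in range(d)]
    return context, content
```

Whatever the real token generators look like, the payoff is the same: a frame costs 2 tokens instead of hundreds, so an hour of video fits in an LLM context window.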
748 stars
14 watching
45 forks
Language: Python
last commit: 7 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An all-in-one demo for interactive image processing and generation | 353 |
| | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge | 1,923 |
| | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| | An implementation of a multimodal generation assistant using large language models and various image editing techniques | 463 |
| | A large language model designed to process and generate visual information | 956 |
| | A system for scaling large language models to process and understand visual information from multiple images efficiently | 183 |
| | A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities | 32 |
| | An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing | 957 |
| | An implementation of a multimodal language model with capabilities for comprehension and generation | 585 |
| | A multimodal LLM designed to handle text-rich visual questions | 270 |
| | A collection of libraries and tools for processing multimedia content | 1,086 |
| | A web-based video processing tool that uses AI to facilitate cultural and linguistic tasks such as transcription, translation, and audio synthesis | 1,980 |
| | A Lua binding for a fast image processing library with low memory needs | 129 |
| | A library of fast computer vision algorithms implemented in C++ for speed, operating over numpy arrays | 855 |
| | An interactive control system for text generation in multi-modal language models | 135 |