Vitron

Vision Model

An end-to-end, pixel-level vision LLM for understanding, generating, segmenting, and editing both static images and dynamic videos.

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

GitHub

406 stars
14 watching
24 forks
Language: Python
Last commit: 3 months ago
Topics: mllm, multimodal-large-language-models, segmentation