Vitron
Vision Model
An end-to-end vision LLM for understanding, generating, segmenting, and editing both static images and dynamic videos.
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
406 stars
14 watching
24 forks
Language: Python
last commit: 3 months ago
Tags: mllm, multimodal-large-language-models, segmentation