Vitron

Vision Model

An end-to-end vision LLM for understanding, generating, segmenting, and editing both static images and dynamic videos.

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

406 stars

14 watching

24 forks

Language: Python

last commit: almost 2 years ago

mllm, multimodal-large-language-models, segmentation