Ovis
A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
321 stars
3 watching
13 forks
Language: Python
last commit: 17 days ago
Topics: chatbot, llama3, multimodal, multimodal-large-language-models, multimodality, qwen, vision-language-learning, vision-language-model