Ovis
Multimodal aligner
A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
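The structural-alignment idea can be illustrated with a minimal sketch: instead of projecting continuous vision features directly into the LLM, each image patch is mapped to a probability distribution over a learnable visual vocabulary, and its embedding is the probability-weighted average of a visual embedding table, mirroring how text tokens index a text embedding table. All names and dimensions below are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes (assumptions for the sketch, not Ovis's real config)
vocab_size = 8    # visual vocabulary size
embed_dim = 4     # embedding dimension shared with the LLM
num_patches = 3   # image patches produced by the visual tokenizer

rng = np.random.default_rng(0)

# Learnable visual embedding table, structurally analogous to the
# LLM's text embedding table.
visual_embedding_table = rng.normal(size=(vocab_size, embed_dim))

# Visual tokenizer output: per-patch logits over the visual vocabulary.
patch_logits = rng.normal(size=(num_patches, vocab_size))

# Probabilistic visual tokens: each patch is a distribution, not one index.
token_probs = softmax(patch_logits)

# Patch embedding = expectation over the visual embedding table.
visual_embeds = token_probs @ visual_embedding_table

print(visual_embeds.shape)  # (3, 4)
```

The resulting `visual_embeds` live in the same kind of discrete, table-indexed embedding space as text tokens, which is the structural alignment the description refers to.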
575 stars
7 watching
33 forks
Language: Python
last commit: 3 months ago
Topics: chatbot, llama3, multimodal, multimodal-large-language-models, multimodality, qwen, vision-language-learning, vision-language-model
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 34 |
| | Aligns large language models' behavior through fine-grained correctional human feedback to improve trustworthiness and accuracy. | 245 |
| | An implementation of a multimodal language model with capabilities for comprehension and generation. | 585 |
| | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models. | 19 |
| | Aligns large multimodal models with human intentions and values using various algorithms and fine-tuning methods. | 270 |
| | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
| | Extends pretrained models to handle multiple modalities by aligning language and video representations. | 751 |
| | An open-source project that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 models with multimodal datasets. | 259 |
| | A multimodal AI model that enables real-world vision-language understanding applications. | 2,145 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| | A plugin for aligning bilingual parallel texts by re-positioning text and applying highlighting. | 7 |
| | Proposes a novel method for calibrating attention distributions in multimodal models to improve contextualized representations of image-text pairs. | 30 |
| | A system designed to enable large multimodal models to understand arbitrary visual prompts. | 302 |
| | A large vision-language model using a mixture-of-experts architecture to improve performance on multi-modal learning tasks. | 2,023 |
| | A large language model designed to process and generate visual information. | 956 |