EAGLE

Multimodal model builder

Builds high-resolution multimodal LLMs by combining multiple vision encoders and input resolutions

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

GitHub

549 stars

31 watching

45 forks

Language: Python

last commit: 11 months ago

Tags: demo, eagle, gpt4, huggingface, large-language-models, llama, llama3, llava, llm, lmm, lvlm, mllm, nvdia

arxiv.org/pdf/2408.15998

Related projects:

Repository	Description	Stars
ailab-cvc/seed	An implementation of a multimodal language model with capabilities for comprehension and generation	585
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities	1,299
llava-vl/llava-interactive-demo	An all-in-one demo for interactive image processing and generation	353
baaivision/eve	A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities	246
runpeidong/dreamllm	A framework to build versatile Multimodal Large Language Models with synergistic comprehension and creation capabilities	402
lyuchenyang/macaw-llm	A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation	1,568
nvlabs/edm	A set of tools and techniques for designing and improving diffusion-based generative models	1,447
luogen1996/lavin	An open-source implementation of a vision-language instructed large language model	513
multimodal-art-projection/omnibench	Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously	15
opengvlab/multi-modality-arena	An evaluation platform for comparing multi-modality models on visual question-answering tasks	478
mlpc-ucsd/bliva	A multimodal LLM designed to handle text-rich visual questions	270
bytedance/lynx-llm	A framework for training GPT4-style language models with multimodal inputs using large datasets and pre-trained models	231
open-mmlab/multimodal-gpt	Trains a multimodal chatbot that combines visual and language instructions to generate responses	1,478
yfzhang114/slime	Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types	143
neulab/pangea	An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts	92