SEED

Multimodal LLM

An implementation of a multimodal language model capable of both comprehension and generation

Official implementation of SEED-LLaMA (ICLR 2024).

GitHub

585 stars

15 watching

32 forks

Language: Python

last commit: almost 2 years ago

foundation-model, multimodal, vision-language

ailab-cvc.github.io/seed

Related projects:

Repository	Description	Stars
ailab-cvc/seed-bench	A benchmark for evaluating large language models' ability to process multimodal input	322
alpha-vllm/wemix-llm	A LLaMA-based multimodal language model with various instruction-following and multimodal variants	17
khanrc/honeybee	An implementation of a multimodal language model using locality-enhanced projection techniques	435
mlpc-ucsd/bliva	A multimodal LLM designed to handle text-rich visual questions	270
pleisto/yuren-baichuan-7b	A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks	73
llava-vl/llava-interactive-demo	An all-in-one demo for interactive image processing and generation	353
lyuchenyang/macaw-llm	A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation	1,568
nvlabs/eagle	Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions	549
multimodal-art-projection/omnibench	A benchmark for evaluating multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously	15
yuliang-liu/monkey	An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage.	1,849
bytedance/lynx-llm	A framework for training GPT4-style language models with multimodal inputs using large datasets and pre-trained models	231
dvlab-research/prompt-highlighter	An interactive control system for text generation in multi-modal language models	135
dvlab-research/llama-vid	An image-based language model that uses large language models to generate visual and text features from videos	748
neulab/pangea	An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts	92
runpeidong/dreamllm	A framework to build versatile Multimodal Large Language Models with synergistic comprehension and creation capabilities	402