cambrian

Vision-based LLM

An open-source multimodal LLM project with a vision-centric design

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

GitHub

2k stars
21 watching
114 forks
Language: Python
last commit: 22 days ago
Topics: chatbot, clip, computer-vision, dino, instruction-tuning, large-language-models, llms, mllm, multimodal-large-language-models, representation-learning

Related projects:

Repository | Description | Stars
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities and supports fine-tuning for various tasks | 72
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks | 781
bobazooba/xllm-demo | A demo project showcasing the customization possibilities of the xllm library | 9
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,550
victordibia/llmx | An API that provides a unified interface to multiple large language models for chat fine-tuning | 79
bytedance/lynx-llm | A framework for training GPT4-style language models with multimodal inputs using large datasets and pre-trained models | 229
bobazooba/xllm | A tool for training and fine-tuning large language models using advanced techniques | 380
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576
damo-nlp-mt/polylm | A polyglot large language model designed to address limitations in current LLM research and provide better multilingual instruction-following capability | 76
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,089
terminaldweller/milla | An IRC bot that interacts with language models to provide answers and offers customizable syntax highlighting | 5
nvlabs/eagle | Develops high-resolution multimodal LLMs by combining multiple vision encoders and various input resolutions | 539
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477
evolvinglmms-lab/longva | A model for long-context transfer from language to vision | 334
phellonchen/x-llm | A framework that enables large language models to process and understand multimodal inputs from various sources such as images and speech | 306