cambrian

Vision-based LLM

An open-source multimodal LLM project with a vision-centric design

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

GitHub

2k stars
23 watching
117 forks
Language: Python
last commit: 3 months ago
Topics: chatbot, clip, computer-vision, dino, instruction-tuning, large-language-models, llms, mllm, multimodal-large-language-models, representation-learning

Related projects:

Repository | Description | Stars
pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797
bobazooba/xllm-demo | A demo project showcasing the customization possibilities of the XLLM library | 9
lyuchenyang/macaw-llm | A multimodal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568
victordibia/llmx | An API that provides a unified interface to multiple large language models for chat fine-tuning | 79
bytedance/lynx-llm | A framework for training GPT-4-style language models with multimodal inputs using large datasets and pre-trained models | 231
bobazooba/xllm | A tool for training and fine-tuning large language models using advanced techniques | 387
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 585
damo-nlp-mt/polylm | A polyglot large language model designed to address limitations in current LLM research and provide better multilingual instruction-following capability | 77
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098
terminaldweller/milla | An IRC bot that interacts with language models to provide answers and has customizable syntax highlighting | 5
nvlabs/eagle | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 549
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478
evolvinglmms-lab/longva | An open-source project that enables the transfer of language understanding to vision capabilities through long-context processing | 347
phellonchen/x-llm | A framework that enables large language models to process and understand multimodal inputs from various sources such as images and speech | 308