cambrian

Vision-based LLM

An open-source multimodal LLM project with a vision-centric design

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

GitHub

2k stars

23 watching

117 forks

Language: Python

last commit: 12 months ago

Topics: chatbot, clip, computer-vision, dino, instruction-tuning, large-language-models, llms, mllm, multimodal-large-language-models, representation-learning

cambrian-mllm.github.io/

Related projects:

Repository	Description	Stars
pleisto/yuren-baichuan-7b	A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks	73
mbzuai-oryx/groundinglmm	An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations	797
bobazooba/xllm-demo	A demo project showcasing customization possibilities of an XLLM library	9
lyuchenyang/macaw-llm	A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation	1,568
victordibia/llmx	An API that provides a unified interface to multiple large language models for chat fine-tuning	79
bytedance/lynx-llm	A framework for training GPT4-style language models with multimodal inputs using large datasets and pre-trained models	231
bobazooba/xllm	A tool for training and fine-tuning large language models using advanced techniques	387
ailab-cvc/seed	An implementation of a multimodal language model with capabilities for comprehension and generation	585
damo-nlp-mt/polylm	A polyglot large language model designed to address limitations in current LLM research and provide better multilingual instruction-following capability	77
openbmb/viscpm	A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages	1,098
terminaldweller/milla	An IRC bot that interacts with language models to provide answers and has customizable syntax highlighting	5
nvlabs/eagle	Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions	549
open-mmlab/multimodal-gpt	Trains a multimodal chatbot that combines visual and language instructions to generate responses	1,478
evolvinglmms-lab/longva	An open-source project that enables the transfer of language understanding to vision capabilities through long context processing	347
phellonchen/x-llm	A framework that enables large language models to process and understand multimodal inputs from various sources such as images and speech	308