fromage
Multimodal model framework
A framework for grounding language models to images and handling multimodal inputs and outputs
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
478 stars
12 watching
35 forks
Language: Jupyter Notebook
last commit: about 1 year ago
Topics: computer-vision, large-language-models, machine-learning, natural-language-processing
Related projects:
Repository | Description | Stars |
---|---|---|
yuliang-liu/monkey | An end-to-end image captioning system built on large multimodal models, with tools for training, inference, and demos | 1,849 |
kohjingyu/gill | A software framework for generating images and text using large language models | 440 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
baaivision/emu | A multimodal generative model framework | 1,672 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
pku-yuangroup/languagebind | Extends pretrained models to multiple modalities by aligning language and video representations | 751 |
shizhediao/davinci | Implements a unified multimodal learning framework for generative vision-language models | 43 |
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098 |
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
elliottd/groundedtranslation | Trains multilingual image description models using neural sequence models and extracts hidden features from trained models. | 46 |
joez17/chatbridge | A unified multimodal language model capable of interpreting and reasoning about various modalities without paired data. | 49 |
stanford-crfm/levanter | A framework for training large language models that prioritizes legibility, scalability, and reproducibility | 527 |
modeloriented/drwhy | A collection of tools and guidelines for building responsible machine learning models | 682 |