fromage

Multimodal model framework

A framework for grounding language models to images and handling multimodal inputs and outputs

🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".

478 stars
12 watching
35 forks
Language: Jupyter Notebook
last commit: about 1 year ago
Topics: computer-vision, large-language-models, machine-learning, natural-language-processing

Related projects:

Repository | Description | Stars
yuliang-liu/monkey | A toolkit for building conversational AI models that can process images and text inputs. | 1,825
kohjingyu/gill | A software framework for generating images and text using large language models. | 430
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation. | 1,550
baaivision/emu | A multimodal generative model framework. | 1,659
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33
pku-yuangroup/languagebind | Extending pretraining models to handle multiple modalities by aligning language and video representations. | 723
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications. | 43
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages. | 1,089
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 14
elliottd/groundedtranslation | Trains multilingual image description models using neural sequence models and extracts hidden features from trained models. | 46
joez17/chatbridge | A unified multimodal language model capable of interpreting and reasoning about various modalities without paired data. | 47
stanford-crfm/levanter | A framework for building and training large language models with focus on reproducibility, scalability, and performance. | 516
modeloriented/drwhy | A collection of tools and guidelines for building responsible machine learning models. | 680