VIMA

Robot learner

An implementation of a general-purpose robot learning model using multimodal prompts

Official algorithm implementation of the ICML'23 paper "VIMA: General Robot Manipulation with Multimodal Prompts"

GitHub

774 stars
17 watching
86 forks
Language: Python
Last commit: 7 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
| autoviml/auto_viml | Automatically builds multiple machine learning models using a single line of code | 524 |
| jdelacroix/simiam | Educational tool for robotics that bridges theory and practice using MATLAB | 103 |
| vita-epfl/crowdnav | Develops robot navigation policies in crowded spaces using reinforcement learning and attention mechanisms | 598 |
| open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
| llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 351 |
| xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences | 77 |
| sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
| dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 132 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
| ethereum/evmlab | Utilities for interacting with the Ethereum virtual machine | 366 |
| bekovmi/segmentation_tutorial | A tutorial project on teaching model training scripts using Config API Catalyst | 8 |
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33 |
| mkorpela/robomachine | Automates test generation based on user input and system behavior models | 100 |