VIMA

Robot learner

An implementation of a general-purpose robot learning model using multimodal prompts

Official algorithm implementation of the ICML'23 paper "VIMA: General Robot Manipulation with Multimodal Prompts"

GitHub

781 stars

17 watching

88 forks

Language: Python

Last commit: almost 2 years ago

Related projects:

Repository	Description	Stars
ailab-cvc/seed	An implementation of a multimodal language model with capabilities for comprehension and generation	585
autoviml/auto_viml	Automatically builds multiple machine learning models using a single line of code	526
jdelacroix/simiam	Educational tool for robotics that bridges theory and practice using MATLAB	103
vita-epfl/crowdnav	Develops robot navigation policies in crowded spaces using reinforcement learning and attention mechanisms	607
open-mmlab/multimodal-gpt	Trains a multimodal chatbot that combines visual and language instructions to generate responses	1,478
llava-vl/llava-interactive-demo	An all-in-one demo for interactive image processing and generation	353
xverse-ai/xverse-v-13b	A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences	78
sergioburdisso/pyss3	A Python package implementing an interpretable machine learning model for text classification with visualization tools	336
dvlab-research/prompt-highlighter	An interactive control system for text generation in multi-modal language models	135
vpgtrans/vpgtrans	Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs	270
multimodal-art-projection/omnibench	Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously	15
ethereum/evmlab	Utilities for interacting with the Ethereum virtual machine	367
bekovmi/segmentation_tutorial	A tutorial project on writing model training scripts with the Catalyst Config API	8
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs	33
mkorpela/robomachine	Automates test generation based on user input and system behavior models	101