VIMA
Robot learner
An implementation of a general-purpose robot learning model using multimodal prompts
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
774 stars
17 watching
86 forks
Language: Python
last commit: 7 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 576 |
autoviml/auto_viml | Automatically builds multiple machine learning models using a single line of code | 524 |
jdelacroix/simiam | Educational tool for robotics that bridges theory and practice using MATLAB | 103 |
vita-epfl/crowdnav | Develops robot navigation policies in crowded spaces using reinforcement learning and attention mechanisms | 598 |
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
llava-vl/llava-interactive-demo | An all-in-one demo for interactive image processing and generation | 351 |
xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences | 77 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 132 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 14 |
ethereum/evmlab | Utilities for interacting with the Ethereum virtual machine | 366 |
bekovmi/segmentation_tutorial | A tutorial on writing model training scripts with the Catalyst Config API | 8 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33 |
mkorpela/robomachine | Automates test generation based on user input and system behavior models | 100 |