visprog
Visual task solver
A system that uses code generation and execution to solve complex visual tasks from natural language instructions.
Official code for VisProg (CVPR 2023 Best Paper!)
693 stars
15 watching
65 forks
Language: Python
Last commit: 3 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
allenai/ai2thor | An open-source platform for simulating complex interactions between agents and virtual environments. | 1,172 |
opengvlab/controlllm | An open-source framework for augmenting large language models with tools by searching on graphs to solve complex real-world tasks. | 186 |
kunpengli1994/vsrn | An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching. | 294 |
opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets. | 90 |
autoviml/autoviz | Automatically generates insightful visualizations from datasets of any size with minimal code. | 1,729 |
lavi-lab/visual-table | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge. | 14 |
milvlg/prophet | An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks. | 267 |
aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 30 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
lzane/vrp-using-sa-with-matlab | A Matlab implementation of simulated annealing to solve the vehicle routing problem with multiple vehicles and constraints. | 124 |
penghao-wu/vstar | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs. | 527 |
ethanjwright/vs-tasks.nvim | A plugin to manage and run tasks in a project similar to VS Code's task implementation. | 183 |
cormanz/smartgpt | Provides an LLM with the ability to complete complex tasks by breaking them down into smaller problems and collecting information from external sources. | 1,755 |
nathanflurry/visualprogramminglanguage | An early prototype of a visual programming language designed to assemble executable Swift code using touch or Apple Pencil input. | 1,186 |
gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 97 |