visprog

Visual task solver

A system that uses code generation and execution to solve complex visual tasks from natural language instructions.

Official code for VisProg (CVPR 2023 Best Paper!)
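
The "code generation and execution" idea in the description can be illustrated with a minimal Python sketch. This is not VisProg's actual code or API: the `Step` structure, the `LOCATE` and `COUNT` modules, the toy list-of-dicts "image", and the hard-coded `generate_program` (standing in for a language model) are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One line of a generated program (hypothetical structure)."""
    output: str                                            # name the result is stored under
    module: str                                            # which module to call
    inputs: Dict[str, str] = field(default_factory=dict)   # kwarg -> name of an earlier result
    consts: Dict[str, Any] = field(default_factory=dict)   # kwarg -> literal value


# Toy "visual" modules. The image is assumed to be a list of labelled objects.
def locate(image: List[dict], label: str) -> List[dict]:
    return [obj for obj in image if obj["label"] == label]


def count(boxes: List[dict]) -> int:
    return len(boxes)


MODULES: Dict[str, Callable[..., Any]] = {"LOCATE": locate, "COUNT": count}


def generate_program(instruction: str) -> List[Step]:
    """Stand-in for the code-generation step.

    A real system would prompt a language model here; this stub only
    handles instructions of the form "How many <label>s?".
    """
    label = instruction.rstrip("?").split()[-1].rstrip("s")  # naive singularization
    return [
        Step("BOXES", "LOCATE", inputs={"image": "IMAGE"}, consts={"label": label}),
        Step("ANSWER", "COUNT", inputs={"boxes": "BOXES"}),
    ]


def execute(program: List[Step], image: List[dict]) -> Any:
    """Run each step in order, passing earlier results by name."""
    state: Dict[str, Any] = {"IMAGE": image}
    for step in program:
        kwargs = {arg: state[name] for arg, name in step.inputs.items()}
        kwargs.update(step.consts)
        state[step.output] = MODULES[step.module](**kwargs)
    return state["ANSWER"]


image = [{"label": "cat"}, {"label": "dog"}, {"label": "cat"}]
program = generate_program("How many cats?")
print(execute(program, image))  # prints 2
```

Keeping generation and execution separate means the intermediate results in `state` can be inspected step by step, which is one plausible reason to structure such a system as a program rather than a single model call.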

693 stars
15 watching
65 forks
Language: Python
Last commit: 3 months ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| allenai/ai2thor | An open-source platform for simulating complex interactions between agents and virtual environments. | 1,172 |
| opengvlab/controlllm | An open-source framework for augmenting large language models with tools by searching on graphs to solve complex real-world tasks. | 186 |
| kunpengli1994/vsrn | An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching. | 294 |
| opendatalab/vigc | Autonomously generates high-quality image-text instruction fine-tuning datasets. | 90 |
| autoviml/autoviz | Automatically generates insightful visualizations from datasets of any size with minimal code. | 1,729 |
| lavi-lab/visual-table | A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge. | 14 |
| milvlg/prophet | An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks. | 267 |
| aidc-ai/parrot | A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages. | 30 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
| lzane/vrp-using-sa-with-matlab | A Matlab implementation of simulated annealing to solve the vehicle routing problem with multiple vehicles and constraints. | 124 |
| penghao-wu/vstar | PyTorch implementation of guided visual search mechanism for multimodal LLMs. | 527 |
| ethanjwright/vs-tasks.nvim | A plugin to manage and run tasks in a project similar to VS Code's task implementation. | 183 |
| cormanz/smartgpt | Provides an LLM with the ability to complete complex tasks by breaking them down into smaller problems and collecting information from external sources. | 1,755 |
| nathanflurry/visualprogramminglanguage | An early prototype of a visual programming language designed to assemble executable Swift code using touch or Apple Pencil input. | 1,186 |
| gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 97 |