visprog

Visual task solver

A system that solves complex visual tasks from natural-language instructions by generating code and then executing it.
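The generate-then-execute idea in the description can be illustrated with a toy sketch. This is not the VisProg implementation, and every name in it is hypothetical. `generate_program` is a hard-coded stand-in for the code-generation step, the `CROP_LEFT` and `COUNT` modules are invented for illustration, and the "image" is a list of labelled positions rather than pixels.

```python
from typing import Any, Callable, Dict, List, Tuple

# One program step: (output variable, module name, keyword arguments).
Step = Tuple[str, str, Dict[str, Any]]

# Toy "image": (label, horizontal position in [0, 1]) pairs instead of pixels.
ToyImage = List[Tuple[str, float]]


def crop_left(state: Dict[str, Any], image: str) -> ToyImage:
    """Hypothetical module: keep only objects in the left half (x < 0.5)."""
    return [(label, x) for label, x in state[image] if x < 0.5]


def count(state: Dict[str, Any], image: str, label: str) -> int:
    """Hypothetical module: count objects with the given label."""
    return sum(1 for name, _ in state[image] if name == label)


# Registry mapping module names used in programs to Python callables.
MODULES: Dict[str, Callable[..., Any]] = {"CROP_LEFT": crop_left, "COUNT": count}


def generate_program(instruction: str) -> List[Step]:
    """Stand-in for code generation; a real system would query a model here."""
    if instruction == "How many cats are on the left?":
        return [
            ("LEFT", "CROP_LEFT", {"image": "IMAGE"}),
            ("ANSWER", "COUNT", {"image": "LEFT", "label": "cat"}),
        ]
    raise ValueError(f"no program available for: {instruction!r}")


def execute(program: List[Step], image: ToyImage) -> Any:
    """Run each step in order, storing every result under its output name."""
    state: Dict[str, Any] = {"IMAGE": image}
    for output_name, module_name, kwargs in program:
        state[output_name] = MODULES[module_name](state, **kwargs)
    return state["ANSWER"]


def solve(instruction: str, image: ToyImage) -> Any:
    """Generate a program for the instruction, then execute it on the image."""
    return execute(generate_program(instruction), image)
```

Keeping generation and execution separate means the intermediate program can be inspected before it runs, and each step's output stays available in `state` for debugging.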

Official code for VisProg (CVPR 2023 Best Paper!)

GitHub

697 stars

15 watching

65 forks

Language: Python

Last commit: 11 months ago

Related projects:

Repository	Description	Stars
allenai/ai2thor	An open-source platform for simulating complex interactions between agents and virtual environments.	1,208
opengvlab/controlllm	An open-source framework for augmenting large language models with tools by searching on graphs to solve complex real-world tasks.	187
kunpengli1994/vsrn	An open-source PyTorch implementation of a visual semantic reasoning model for image-text matching.	294
opendatalab/vigc	Autonomously generates high-quality image-text instruction fine-tuning datasets.	91
autoviml/autoviz	Automatically generates insightful visualizations from datasets of any size with minimal code.	1,749
lavi-lab/visual-table	A project that generates visual representations tailored for general visual reasoning, leveraging hierarchical scene descriptions and instance-level world knowledge.	14
milvlg/prophet	An implementation of a two-stage framework designed to prompt large language models with answer heuristics for knowledge-based visual question answering tasks.	270
aidc-ai/parrot	A method and toolkit for fine-tuning large language models to perform visual instruction tasks in multiple languages.	34
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
lzane/vrp-using-sa-with-matlab	A Matlab implementation of simulated annealing to solve the vehicle routing problem with multiple vehicles and constraints.	125
penghao-wu/vstar	A PyTorch implementation of a guided visual search mechanism for multimodal LLMs.	541
ethanjwright/vs-tasks.nvim	A plugin that integrates task management with VS Code's Editor Tasks functionality.	187
cormanz/smartgpt	Provides an LLM with the ability to complete complex tasks by breaking them down into smaller problems and collecting information from external sources.	1,757
nathanflurry/visualprogramminglanguage	An early prototype of a visual programming language designed to assemble executable Swift code using touch or Apple Pencil input.	1,187
gordonhu608/mqt-llava	A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens.	101