visdial

Dialog Agent

A system for an AI agent to engage in natural dialog about visual content using a combination of encoder and decoder architectures.

[CVPR 2017] Torch code for Visual Dialog


228 stars
18 watching
69 forks
Language: Lua
Last commit: about 6 years ago
computer-vision, deep-learning, natural-language-processing, torch

Related projects:

Repository | Description | Stars
open3da/ll3da | An interactive system for understanding and interacting with 3D environments using natural language | 255
agenta-ai/agenta | An end-to-end platform for building and deploying large language model applications | 1,624
ucsc-vlaa/sight-beyond-text | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19
blazored/modal | A reusable UI component for displaying a customizable dialog in Blazor applications | 789
dialogflow/dialogflow-ruby-client | A Ruby SDK for interacting with the Dialogflow API natural language processing service | 141
geek-ai/magent | A platform for multi-agent reinforcement learning research and development | 1,700
macournoyer/neuralconvo | An implementation of a conversational model using sequence-to-sequence learning and LSTM layers in Torch | 777
dvlab-research/prompt-highlighter | An interactive control system for text generation in multi-modal language models | 135
allenai/visprog | A system that uses code generation and execution to solve complex visual tasks from natural language instructions | 697
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 429
gulvarol/surreal | This project involves generating synthetic human data to train 3D models of human appearance and behavior | 590
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,336
vinhnx/inkchatgpt | An application that enables users to upload documents and converse with an AI-powered language model | 9
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 270
fmaclen/hollama | A minimal web-based interface for interacting with large language models | 542