GPT4Tools

Conversational image interface

An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings.

GPT4Tools is a system that automatically decides which visual foundation models to call and controls them, letting the user interact with images during a conversation.

GitHub

762 stars
13 watching
59 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517
vinhnx/inkchatgpt | An application that enables users to upload documents and converse with an AI-powered language model | 9
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,799
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478
neukg/techgpt-2.0 | An advanced language model designed to generate human-like responses in various domains and applications | 101
fengyuli-dev/multimedia-gpt | Enables OpenAI GPT to process multimedia inputs like images and audio with text output | 184
chidiwilliams/gpt-automator | A voice-controlled Mac assistant that uses natural language processing to automate desktop tasks | 232
robitx/gp.nvim | An extension for Neovim that integrates GPT models into the editor, enabling AI-powered text operations and speech-to-text capabilities | 928
mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,246
kejunmao/ai-anything | An open-source toolset for creating custom ChatGPT interfaces | 568
zcli-charlie/batgpt | A large language model designed to support long context conversations with improved efficiency and effectiveness | 38
360cvgroup/seechat | A multimodal chatbot with computer vision capabilities integrated into a single model | 99
pjlab-adg/gpt4v-ad-exploration | An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions | 288
laurentkneip/opengv | A collection of computer vision methods for solving geometric vision problems | 1,040
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 322