GPT4Tools

Conversational image interface

An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings.

GPT4Tools is a system that automatically decides which visual foundation models to use and controls them, so the user can interact with images during a conversation.

GitHub

760 stars
13 watching
58 forks
Language: Python
Last commit: 11 months ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506
vinhnx/inkchatgpt | An application that enables users to upload documents and converse with an AI-powered language model | 9
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,782
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477
neukg/techgpt-2.0 | An advanced language model designed to generate human-like responses in various domains and applications | 101
fengyuli-dev/multimedia-gpt | Enables OpenAI GPT to process multimedia inputs like images and audio with text output | 184
chidiwilliams/gpt-automator | A voice-controlled Mac assistant that uses natural language processing to automate desktop tasks | 230
robitx/gp.nvim | An extension for Neovim that integrates GPT models into the editor, enabling AI-powered text operations and speech-to-text capabilities | 869
mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,213
kejunmao/ai-anything | An open-source toolset for creating custom ChatGPT interfaces | 566
zcli-charlie/batgpt | A large language model designed to support long context conversations with improved efficiency and effectiveness | 38
360cvgroup/seechat | A multimodal chatbot with computer vision capabilities integrated into a single model | 98
pjlab-adg/gpt4v-ad-exploration | An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions | 287
laurentkneip/opengv | A collection of computer vision methods for solving geometric vision problems | 1,031
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315