GPT4Tools

Conversational image interface

An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings.

GPT4Tools is an intelligent system that automatically decides which visual foundation models to use, then controls and invokes them, allowing the user to interact with images during a conversation.

GitHub

762 stars

13 watching

59 forks

Language: Python

Last commit: over 2 years ago

Linked from 1 awesome list

gpt4tools.github.io

Backlinks from these awesome lists:

bradyfu/awesome-multimodal-large-language-models

Related projects:

Repository	Description	Stars
jshilong/gpt4roi	Training and deploying large language models on computer vision tasks using region-of-interest inputs	517
vinhnx/inkchatgpt	An application that enables users to upload documents and converse with an AI-powered language model	9
thu-coai/cdial-gpt	A large-scale Chinese conversation dataset and pre-trained dialog models for text generation	1,799
open-mmlab/multimodal-gpt	Trains a multimodal chatbot that combines visual and language instructions to generate responses	1,478
neukg/techgpt-2.0	An advanced language model designed to generate human-like responses in various domains and applications	101
fengyuli-dev/multimedia-gpt	Enables OpenAI GPT to process multimedia inputs like images and audio with text output	184
chidiwilliams/gpt-automator	A voice-controlled Mac assistant that uses natural language processing to automate desktop tasks	232
robitx/gp.nvim	An extension for Neovim that integrates GPT models into the editor, enabling AI-powered text operations and speech-to-text capabilities	928
mbzuai-oryx/video-chatgpt	A video conversation model that generates meaningful conversations about videos using large vision and language models	1,246
kejunmao/ai-anything	An open-source toolset for creating custom ChatGPT interfaces	568
zcli-charlie/batgpt	A large language model designed to support long context conversations with improved efficiency and effectiveness	38
360cvgroup/seechat	A multimodal chatbot with computer vision capabilities integrated into a single model	99
pjlab-adg/gpt4v-ad-exploration	An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions	288
laurentkneip/opengv	A collection of computer vision methods for solving geometric vision problems	1,040
ailab-cvc/seed-bench	A benchmark for evaluating large language models' ability to process multimodal input	322