GPT4Tools
Conversational image interface
GPT4Tools is an intelligent system that automatically decides which visual foundation models to use and controls them, so that the user can interact with images during a conversation.
760 stars
13 watching
58 forks
Language: Python
Last commit: 11 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
vinhnx/inkchatgpt | An application that enables users to upload documents and converse with an AI-powered language model | 9 |
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,782 |
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
neukg/techgpt-2.0 | An advanced language model designed to generate human-like responses in various domains and applications | 101 |
fengyuli-dev/multimedia-gpt | Enables OpenAI GPT to process multimedia inputs like images and audio with text output | 184 |
chidiwilliams/gpt-automator | A voice-controlled Mac assistant that uses natural language processing to automate desktop tasks | 230 |
robitx/gp.nvim | An extension for Neovim that integrates GPT models into the editor, enabling AI-powered text operations and speech-to-text capabilities | 869 |
mbzuai-oryx/video-chatgpt | A video conversation model that generates meaningful conversations about videos using large vision and language models | 1,213 |
kejunmao/ai-anything | An open-source toolset for creating custom ChatGPT interfaces | 566 |
zcli-charlie/batgpt | A large language model designed to support long context conversations with improved efficiency and effectiveness | 38 |
360cvgroup/seechat | A multimodal chatbot with computer vision capabilities integrated into a single model | 98 |
pjlab-adg/gpt4v-ad-exploration | An autonomous driving project exploring the capabilities of a visual-language model in understanding complex driving scenes and making decisions | 287 |
laurentkneip/opengv | A collection of computer vision methods for solving geometric vision problems | 1,031 |
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |