Video-ChatGPT
Video conversational model
A video conversation model that generates meaningful conversations about videos using large vision and language models
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
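As a rough illustration of what "adapted for spatiotemporal video representation" can mean, the sketch below pools per-frame patch features from an image encoder in two directions: averaging over patches gives one vector per frame (temporal features), and averaging over frames gives one vector per patch (spatial features). The function name `spatiotemporal_pool`, the nested-list layout, and the toy sizes are assumptions made for this example and are not taken from the repository. A real pipeline would also project the pooled vectors into the language model's embedding space.

```python
from statistics import fmean


def spatiotemporal_pool(frame_features):
    """Pool per-frame patch features into temporal and spatial tokens.

    frame_features: list of T frames, each a list of N patch vectors,
    each patch vector a list of D floats (hypothetical layout).
    Returns T temporal tokens followed by N spatial tokens, each of length D.
    """
    num_frames = len(frame_features)
    num_patches = len(frame_features[0])
    dim = len(frame_features[0][0])

    # Temporal tokens: average over the patches of each frame -> T vectors.
    temporal = [
        [fmean(frame[n][d] for n in range(num_patches)) for d in range(dim)]
        for frame in frame_features
    ]

    # Spatial tokens: average each patch position over all frames -> N vectors.
    spatial = [
        [fmean(frame_features[t][n][d] for t in range(num_frames)) for d in range(dim)]
        for n in range(num_patches)
    ]

    # Concatenate along the token axis: (T + N) vectors of length D.
    return temporal + spatial


# Toy input: 2 frames, 3 patches per frame, 2-dimensional features.
frames = [[[float(t), float(n)] for n in range(3)] for t in range(2)]
tokens = spatiotemporal_pool(frames)
print(len(tokens), len(tokens[0]))
```

Pooling in both directions keeps the token count at T + N rather than T × N, which is one way to fit a video into a language model's context budget.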
1k stars
15 watching
110 forks
Language: Python
Last commit: about 1 year ago
Topics: chatbot, clip, gpt-4, llama, llava, mulit-modal, vicuna, video-chatboat, video-conversation, vision-language, vision-language-pretraining
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| | A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary. | 62 |
| | An advanced language model designed to generate human-like responses in various domains and applications | 101 |
| | A commercially viable web application for conversational AI built with React and OpenAI's ChatGPT technology | 1,366 |
| | A large language model with 70 billion parameters designed for chatbot and conversational AI tasks | 29 |
| | A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths. | 314 |
| | A large language model designed to support long context conversations with improved efficiency and effectiveness | 38 |
| | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 |
| | A conversational AI demo powered by a large language model | 78 |
| | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 |
| | A Discord bot that enables interactive conversations with ChatGPT using a single command. | 291 |
| | A multimodal chatbot with computer vision capabilities integrated into a single model | 99 |
| | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
| | An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings. | 762 |
| | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications. | 545 |