Video-ChatGPT
Video conversational model
A video conversation model that generates meaningful conversations about videos using large vision and language models
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
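As a rough illustration of what a "spatiotemporal video representation" can look like, the sketch below pools per-frame patch features along two axes. Averaging over patches gives one token per frame (temporal), and averaging over frames gives one token per patch position (spatial). It assumes a visual encoder has already produced the features. The function name `spatiotemporal_pool` and the toy values are hypothetical and are not taken from the repository.

```python
from statistics import fmean


def spatiotemporal_pool(frame_features):
    """Pool features shaped [T][N][D] into T + N tokens of size D.

    Assumption: this illustrates the general idea of a spatiotemporal
    representation; it is not the repository's actual implementation.
    """
    num_frames = len(frame_features)
    num_patches = len(frame_features[0])
    dim = len(frame_features[0][0])

    # Temporal tokens: average over patches within each frame -> T vectors.
    temporal = [
        [fmean(frame[p][d] for p in range(num_patches)) for d in range(dim)]
        for frame in frame_features
    ]
    # Spatial tokens: average over frames at each patch position -> N vectors.
    spatial = [
        [fmean(frame_features[t][p][d] for t in range(num_frames)) for d in range(dim)]
        for p in range(num_patches)
    ]
    # Concatenate into a single token sequence for a downstream language model.
    return temporal + spatial


# Toy input: T=2 frames, N=3 patches per frame, D=2 feature dimensions.
features = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]
tokens = spatiotemporal_pool(features)
print(len(tokens))  # 2 temporal + 3 spatial tokens
```

In a real system these pooled tokens would typically pass through a learned projection before reaching the language model; that step is omitted here.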
1k stars
15 watching
110 forks
Language: Python
Last commit: 5 months ago
Topics: chatbot, clip, gpt-4, llama, llava, mulit-modal, vicuna, video-chatboat, video-conversation, vision-language, vision-language-pretraining
Related projects:
Repository | Description | Stars |
---|---|---|
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
abbey4799/cutegpt | A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary. | 62 |
neukg/techgpt-2.0 | An advanced language model designed to generate human-like responses in various domains and applications | 101 |
79e/chatgpt-web | A commercially viable web application for conversational AI built with React and OpenAI's ChatGPT technology | 1,366 |
kendryte/toucan-llm | A large language model with 70 billion parameters designed for chatbot and conversational AI tasks | 29 |
renshuhuai-andy/timechat | A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths. | 314 |
zcli-charlie/batgpt | A large language model designed to support long context conversations with improved efficiency and effectiveness | 38 |
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 |
nagi-ovo/crag-ollama-chat | A conversational AI demo powered by a large language model | 78 |
opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 |
m1guelpf/chatgpt-discord | A Discord bot that enables interactive conversations with ChatGPT using a single command. | 291 |
360cvgroup/seechat | A multimodal chatbot with computer vision capabilities integrated into a single model | 99 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 302 |
ailab-cvc/gpt4tools | An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings. | 762 |
showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications. | 545 |