Video-ChatGPT
Video conversational model
A video conversation model that generates meaningful conversations about videos using large vision and language models
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
1k stars
15 watching
108 forks
Language: Python
last commit: 3 months ago
Topics: chatbot, clip, gpt-4, llama, llava, mulit-modal, vicuna, video-chatboat, video-conversation, vision-language, vision-language-pretraining
Related projects:
Repository | Description | Stars |
---|---|---|
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781 |
abbey4799/cutegpt | A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary. | 62 |
neukg/techgpt-2.0 | An advanced language model designed to generate human-like responses in various domains and applications | 101 |
79e/chatgpt-web | A commercially viable web application for conversational AI built with React and OpenAI's ChatGPT technology | 1,360 |
kendryte/toucan-llm | A large language model with 70 billion parameters designed for chatbot and conversational AI tasks | 29 |
renshuhuai-andy/timechat | A large language model designed to understand and process long videos with temporal information | 286 |
zcli-charlie/batgpt | A large language model designed to support long context conversations with improved efficiency and effectiveness | 38 |
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
nagi-ovo/crag-ollama-chat | A conversational AI demo powered by a large language model | 76 |
opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 467 |
m1guelpf/chatgpt-discord | A Discord bot that enables interactive conversations with ChatGPT using a single command. | 291 |
360cvgroup/seechat | A multimodal chatbot with computer vision capabilities integrated into a single model | 98 |
wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
ailab-cvc/gpt4tools | An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings. | 760 |
showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications. | 538 |