Video-ChatGPT

Video conversational model

A video conversation model that generates meaningful conversations about videos using large vision and language models

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
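One way to picture a visual encoder "adapted for spatiotemporal video representation" is to average per-frame patch features along space and along time, then hand the resulting tokens to the LLM. The sketch below illustrates that idea under this assumption and is not taken from the repository; the function name `spatiotemporal_pool` and the toy shapes are hypothetical.

```python
from statistics import fmean


def spatiotemporal_pool(frame_features):
    """Pool per-frame patch features into one token sequence.

    frame_features: list of T frames, each a list of N patch vectors,
    each patch vector a list of D floats (shape T x N x D).
    Returns T temporal tokens followed by N spatial tokens, each of size D.
    """
    num_frames = len(frame_features)
    num_patches = len(frame_features[0])
    dim = len(frame_features[0][0])

    # Temporal tokens: average over the patches of each frame -> T x D.
    temporal = [
        [fmean(frame[p][d] for p in range(num_patches)) for d in range(dim)]
        for frame in frame_features
    ]
    # Spatial tokens: average each patch position over all frames -> N x D.
    spatial = [
        [fmean(frame_features[t][p][d] for t in range(num_frames)) for d in range(dim)]
        for p in range(num_patches)
    ]
    # Concatenate into a single (T + N) x D sequence of video tokens.
    return temporal + spatial


# Toy input: 2 frames, 3 patches per frame, 2-dimensional features.
video = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]
tokens = spatiotemporal_pool(video)
print(len(tokens))  # 2 temporal tokens + 3 spatial tokens
```

In a full model, these pooled tokens would typically pass through a learned projection into the language model's embedding space before being combined with the text prompt; that step is omitted here.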

GitHub

1k stars
15 watching
108 forks
Language: Python
last commit: 3 months ago
Topics: chatbot, clip, gpt-4, llama, llava, mulit-modal, vicuna, video-chatboat, video-conversation, vision-language, vision-language-pretraining

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks | 781 |
| abbey4799/cutegpt | A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary | 62 |
| neukg/techgpt-2.0 | An advanced language model designed to generate human-like responses in various domains and applications | 101 |
| 79e/chatgpt-web | A commercially viable web application for conversational AI built with React and OpenAI's ChatGPT technology | 1,360 |
| kendryte/toucan-llm | A large language model with 70 billion parameters designed for chatbot and conversational AI tasks | 29 |
| renshuhuai-andy/timechat | A large language model designed to understand and process long videos with temporal information | 286 |
| zcli-charlie/batgpt | A large language model designed to support long context conversations with improved efficiency and effectiveness | 38 |
| open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
| nagi-ovo/crag-ollama-chat | A conversational AI demo powered by a large language model | 76 |
| opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 467 |
| m1guelpf/chatgpt-discord | A Discord bot that enables interactive conversations with ChatGPT using a single command | 291 |
| 360cvgroup/seechat | A multimodal chatbot with computer vision capabilities integrated into a single model | 98 |
| wisconsinaivision/vip-llava | A system designed to enable large multimodal models to understand arbitrary visual prompts | 294 |
| ailab-cvc/gpt4tools | An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings | 760 |
| showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications | 538 |