Video-ChatGPT

Video conversational model

A video conversation model that generates meaningful conversations about videos using large vision and language models

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
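The phrase "adapted for spatiotemporal video representation" can be made concrete with a small sketch. Suppose a CLIP-style image encoder produces N patch features, each D-dimensional, for each of T sampled frames. One simple adaptation, which broadly follows the pooling scheme described in the Video-ChatGPT paper, works in three steps. First, it average-pools over patches to get T temporal tokens. Second, it average-pools over frames to get N spatial tokens. Finally, it concatenates the two sets and linearly projects the result into the LLM's embedding space. This is a sketch, not the repository's code. The function names, the nested-list representation, the token order, and the projection weights are all illustrative assumptions.

```python
from typing import List

Vector = List[float]


def spatiotemporal_pool(frames: List[List[Vector]]) -> List[Vector]:
    """Pool a (T frames x N patches x D) feature grid into T + N tokens.

    Temporal tokens: one per frame, averaged over that frame's N patches.
    Spatial tokens: one per patch position, averaged over the T frames.
    """
    num_frames = len(frames)
    num_patches = len(frames[0])
    dim = len(frames[0][0])
    temporal = [
        [sum(frames[t][n][d] for n in range(num_patches)) / num_patches for d in range(dim)]
        for t in range(num_frames)
    ]
    spatial = [
        [sum(frames[t][n][d] for t in range(num_frames)) / num_frames for d in range(dim)]
        for n in range(num_patches)
    ]
    # Concatenate along the token axis (temporal-first order is an assumption here).
    return temporal + spatial


def linear_project(tokens: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Hypothetical D -> E linear projection (no bias) into the LLM embedding space."""
    out_dim = len(weight[0])
    return [
        [sum(x[d] * weight[d][e] for d in range(len(x))) for e in range(out_dim)]
        for x in tokens
    ]


# Usage: 2 frames, 3 patches per frame, 2-dimensional features.
frames = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
]
pooled = spatiotemporal_pool(frames)
visual_tokens = linear_project(pooled, [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(len(pooled), pooled)
```

Pooling in both directions keeps the number of visual tokens at T + N rather than T × N. This keeps the LLM input short even when many frames are sampled.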


1k stars

15 watching

110 forks

Language: Python

last commit: almost 2 years ago

Topics: chatbot, clip, gpt-4, llama, llava, mulit-modal, vicuna, video-chatboat, video-conversation, vision-language, vision-language-pretraining


mbzuai-oryx.github.io/Video-ChatGPT

Related projects:

Repository	Description	Stars
mbzuai-oryx/groundinglmm	An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations	797
abbey4799/cutegpt	A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary.	62
neukg/techgpt-2.0	An advanced language model designed to generate human-like responses in various domains and applications	101
79e/chatgpt-web	A commercially viable web application for conversational AI built with React and OpenAI's ChatGPT technology	1,366
kendryte/toucan-llm	A large language model with 70 billion parameters designed for chatbot and conversational AI tasks	29
renshuhuai-andy/timechat	A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths.	314
zcli-charlie/batgpt	A large language model designed to support long context conversations with improved efficiency and effectiveness	38
open-mmlab/multimodal-gpt	Trains a multimodal chatbot that combines visual and language instructions to generate responses	1,478
nagi-ovo/crag-ollama-chat	A conversational AI demo powered by a large language model	78
opengvlab/multi-modality-arena	An evaluation platform for comparing multi-modality models on visual question-answering tasks	478
m1guelpf/chatgpt-discord	A Discord bot that enables interactive conversations with ChatGPT using a single command.	291
360cvgroup/seechat	A multimodal chatbot with computer vision capabilities integrated into a single model	99
wisconsinaivision/vip-llava	A system designed to enable large multimodal models to understand arbitrary visual prompts	302
ailab-cvc/gpt4tools	An intelligent system that enables automatic control and utilization of visual foundation models to interact with images in conversational settings.	762
showlab/vlog	Transforms video content into a long document containing visual and audio information that can be used for chat or other applications.	545