Valley

Video Assistant

An offline video assistant system powered by large language models and computer vision techniques.

The official repository of "Video assistant towards large language model makes everything easy".

GitHub

Stars: 210
Watching: 4
Forks: 14
Language: Python
Last commit: 11 months ago

Related projects:

Repository | Description | Stars
liuzhao1225/youdub-webui | A web-based video processing tool that uses AI to facilitate cultural and linguistic tasks such as transcription, translation, and audio synthesis. | 1,980
damo-nlp-sg/videollama2 | An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing. | 957
vision-cair/longvu | An artificial intelligence system designed to understand and describe long-form video content. | 329
aliaksandrsiarohin/video-preprocessing | Tools for preprocessing videos for various datasets, including video cropping and annotation. | 522
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities. | 121
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos. | 748
nus-hpc-ai-lab/videosys | A comprehensive toolkit for high-performance video generation and processing. | 1,819
li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets. | 28
renshuhuai-andy/timechat | A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths. | 314
kylejginavan/youtube_it | A Ruby wrapper for accessing YouTube's video API and managing video content. | 595
webpilot-ai/webpilot | An extension for Google Chrome that enables users to engage in conversations with web pages or argue with other users. | 1,796
ozmartian/vidcutter | A video editing and management application with cross-platform support. | 1,821
opengvlab/internvideo | Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning. | 1,467
nkasmanoff/pi-card | An AI-powered conversational assistant built on top of a Raspberry Pi. | 747
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs. | 270