Valley
Video Assistant
An offline video assistant system powered by large language models and computer vision techniques.
The official repository of "Video assistant towards large language model makes everything easy"
210 stars
4 watching
14 forks
Language: Python
Last commit: 11 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
liuzhao1225/youdub-webui | A web-based video processing tool that uses AI for tasks such as transcription, translation, and audio synthesis. | 1,980 |
damo-nlp-sg/videollama2 | An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing. | 957 |
vision-cair/longvu | An artificial intelligence system designed to understand and describe long-form video content | 329 |
aliaksandrsiarohin/video-preprocessing | Tools for preprocessing videos for various datasets, including video cropping and annotation. | 522 |
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 121 |
dvlab-research/llama-vid | A vision-language model that encodes images and video frames into compact visual and text-guided tokens so that large language models can process long videos | 748 |
nus-hpc-ai-lab/videosys | A comprehensive toolkit for high-performance video generation and processing | 1,819 |
li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets | 28 |
renshuhuai-andy/timechat | A large language model designed to understand long videos by binding visual content with timestamps and producing video token sequences of varying lengths. | 314 |
kylejginavan/youtube_it | A Ruby wrapper for accessing YouTube's video API and managing video content | 595 |
webpilot-ai/webpilot | An extension for Google Chrome that enables users to engage in conversations with web pages or argue with other users. | 1,796 |
ozmartian/vidcutter | A video editing and management application with cross-platform support | 1,821 |
opengvlab/internvideo | Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning. | 1,467 |
nkasmanoff/pi-card | An AI-powered conversational assistant built on top of a Raspberry Pi. | 747 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |