Valley

Video Assistant

An offline video assistant system powered by large language models and computer vision techniques.

The official repository of "Video assistant towards large language model makes everything easy"

GitHub

211 stars
4 watching
14 forks
Language: Python
Last commit: 9 months ago

Related projects:

Repository | Description | Stars
liuzhao1225/youdub-webui | A web-based video processing tool that uses AI to facilitate cultural and linguistic tasks such as transcription, translation, and audio synthesis | 1,940
damo-nlp-sg/videollama2 | An audio-visual language model designed to understand and generate video content | 871
vision-cair/longvu | An artificial intelligence system designed to understand and describe long-form video content | 270
aliaksandrsiarohin/video-preprocessing | Tools for preprocessing videos for various datasets, including video cropping and annotation | 518
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733
nus-hpc-ai-lab/videosys | A toolkit for high-performance video generation and processing using deep learning techniques | 1,773
li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets | 28
renshuhuai-andy/timechat | A large language model designed to understand and process long videos with temporal information | 286
kylejginavan/youtube_it | A Ruby wrapper for accessing YouTube's video API and managing video content | 595
webpilot-ai/webpilot | An extension that enables automatic text interaction with web pages | 1,784
ozmartian/vidcutter | A video editing and management application with cross-platform support | 1,807
opengvlab/internvideo | Video foundation models and datasets for multimodal understanding and applications | 1,413
nkasmanoff/pi-card | An offline voice assistant built on Raspberry Pi using AI and natural language processing | 736
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269