Valley
Video Assistant
An offline video assistant system powered by large language models and computer vision techniques.
The official repository of "Video assistant towards large language model makes everything easy"
211 stars
4 watching
14 forks
Language: Python
last commit: 9 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
liuzhao1225/youdub-webui | A web-based video processing tool that uses AI to facilitate cultural and linguistic tasks such as transcription, translation, and audio synthesis | 1,940 |
damo-nlp-sg/videollama2 | An audio-visual language model designed to understand and generate video content | 871 |
vision-cair/longvu | An artificial intelligence system designed to understand and describe long-form video content | 270 |
aliaksandrsiarohin/video-preprocessing | Tools for preprocessing videos for various datasets, including video cropping and annotation | 518 |
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117 |
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
nus-hpc-ai-lab/videosys | A toolkit for high-performance video generation and processing using deep learning techniques | 1,773 |
li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets | 28 |
renshuhuai-andy/timechat | A large language model designed to understand and process long videos with temporal information | 286 |
kylejginavan/youtube_it | A Ruby wrapper for accessing YouTube's video API and managing video content | 595 |
webpilot-ai/webpilot | An extension that enables automatic text interaction with web pages | 1,784 |
ozmartian/vidcutter | A video editing and management application with cross-platform support | 1,807 |
opengvlab/internvideo | Developing video foundation models and datasets for multimodal understanding and applications | 1,413 |
nkasmanoff/pi-card | An offline voice assistant built on Raspberry Pi using AI and natural language processing | 736 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |