LongVU
Video describer
An artificial intelligence system designed to understand and describe long-form video content
270 stars
6 watching
19 forks
Language: Python
Last commit: 16 days ago

Related projects:
Repository | Description | Stars |
---|---|---|
vision-cair/chatcaptioner | Enables automatic generation of descriptive text from images and videos based on user input. | 452 |
gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 97 |
li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets. | 28 |
cvondrick/vatic | Tools for efficiently scaling up video annotation using crowdsourced marketplaces. | 607 |
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos. | 733 |
gabeur/mmt | Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text. | 258 |
microsoft/vision-longformer | An implementation of a vision transformer architecture designed for high-resolution image encoding with multiple efficient attention mechanisms. | 241 |
nus-hpc-ai-lab/videosys | A toolkit for high-performance video generation and processing using deep learning techniques. | 1,773 |
rupertluo/valley | An offline video assistant system powered by large language models and computer vision techniques. | 211 |
longwei/qmlvideo | A video player that uses VLC as the decoder and renders QML components on OpenGL textures. | 34 |
liuzhao1225/youdub-webui | A web-based video processing tool that uses AI to facilitate cultural and linguistic tasks such as transcription, translation, and audio synthesis. | 1,940 |
rese1f/moviechat | A deep learning model designed to efficiently process and analyze long videos using large language models. | 525 |
aliaksandrsiarohin/video-preprocessing | Tools for preprocessing videos for various datasets, including video cropping and annotation. | 518 |
xiadingz/video-caption.pytorch | PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 401 |
damo-nlp-sg/videollama2 | An audio-visual language model designed to understand and generate video content | 871 |