LongVU

Video describer

An artificial intelligence system designed to understand and describe long-form video content

GitHub

270 stars
6 watching
19 forks
Language: Python
Last commit: 16 days ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| vision-cair/chatcaptioner | Enables automatic generation of descriptive text from images and videos based on user input | 452 |
| gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens | 97 |
| li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets | 28 |
| cvondrick/vatic | Tools for efficiently scaling up video annotation using crowdsourced marketplaces | 607 |
| dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
| gabeur/mmt | Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text | 258 |
| microsoft/vision-longformer | An implementation of a vision transformer architecture designed for high-resolution image encoding with multiple efficient attention mechanisms | 241 |
| nus-hpc-ai-lab/videosys | A toolkit for high-performance video generation and processing using deep learning techniques | 1,773 |
| rupertluo/valley | An offline video assistant system powered by large language models and computer vision techniques | 211 |
| longwei/qmlvideo | A video player that uses VLC as the decoder and renders QML components on OpenGL textures | 34 |
| liuzhao1225/youdub-webui | A web-based video processing tool that uses AI to facilitate cultural and linguistic tasks such as transcription, translation, and audio synthesis | 1,940 |
| rese1f/moviechat | A deep learning model designed to efficiently process and analyze long videos using large language models | 525 |
| aliaksandrsiarohin/video-preprocessing | Tools for preprocessing videos for various datasets, including video cropping and annotation | 518 |
| xiadingz/video-caption.pytorch | PyTorch implementation of video captioning, combining deep learning and computer vision techniques | 401 |
| damo-nlp-sg/videollama2 | An audio-visual language model designed to understand and generate video content | 871 |