LongVU

Video describer

An artificial intelligence system designed to understand and describe long-form video content

329 stars

5 watching

22 forks

Language: Python

Last commit: over 1 year ago

vision-cair.github.io/LongVU

Related projects:

Repository	Description	Stars
vision-cair/chatcaptioner	Enables automatic generation of descriptive text from images and videos based on user input.	457
gordonhu608/mqt-llava	A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens.	101
li-xirong/w2vvpp	A deep learning-based video search system using pre-trained models and datasets.	28
cvondrick/vatic	Tools for efficiently scaling up video annotation using crowdsourced marketplaces.	609
dvlab-research/llama-vid	An image-based language model that uses large language models to generate visual and text features from videos.	748
gabeur/mmt	Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text.	259
microsoft/vision-longformer	An implementation of a vision transformer architecture designed for high-resolution image encoding with multiple efficient attention mechanisms.	243
nus-hpc-ai-lab/videosys	A comprehensive toolkit for high-performance video generation and processing.	1,819
rupertluo/valley	An offline video assistant system powered by large language models and computer vision techniques.	210
longwei/qmlvideo	A video player that uses VLC as the decoder and renders QML components on OpenGL textures.	33
liuzhao1225/youdub-webui	A web-based video processing tool that uses AI for language tasks such as transcription, translation, and audio synthesis.	1,980
rese1f/moviechat	Develops a method for long video understanding by optimizing memory usage.	550
aliaksandrsiarohin/video-preprocessing	Tools for preprocessing videos for various datasets, including video cropping and annotation.	522
xiadingz/video-caption.pytorch	PyTorch implementation of video captioning, combining deep learning and computer vision techniques.	402
damo-nlp-sg/videollama2	An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing.	957