LongVU
Video describer
An artificial intelligence system designed to understand and describe long-form video content
329 stars
5 watching
22 forks
Language: Python
last commit: 4 months ago Related projects:
Repository | Description | Stars |
---|---|---|
| Enables automatic generation of descriptive text from images and videos based on user input. | 457 |
| A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens. | 101 |
| A deep learning-based video search system using pre-trained models and datasets | 28 |
| Tools for efficiently scaling up video annotation using crowdsourced marketplaces. | 609 |
| An image-based language model that uses large language models to generate visual and text features from videos | 748 |
| Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text | 259 |
| An implementation of a vision transformer architecture designed for high-resolution image encoding with multiple efficient attention mechanisms | 243 |
| A comprehensive toolkit for high-performance video generation and processing | 1,819 |
| An offline video assistant system powered by large language models and computer vision techniques. | 210 |
| A video player that uses VLC as the decoder and renders QML components on OpenGL textures. | 33 |
| A web-based video processing tool that uses AI to facilitate cultural and linguistic tasks such as transcription, translation, and audio synthesis. | 1,980 |
| Develops a method for long video understanding by optimizing memory usage | 550 |
| Tools for preprocessing videos for various datasets, including video cropping and annotation. | 522 |
| PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 402 |
| An audio-visual language model designed to advance spatial-temporal modeling and audio understanding in video processing. | 957 |