mmt
Video retriever
Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text
Multi-Modal Transformer for Video Retrieval
258 stars
10 watching
41 forks
Language: Python
last commit: about 1 month ago
Linked from 1 awesome list
fusion, language, multimodal, nlp, video, vision
Related projects:
Repository | Description | Stars |
---|---|---|
cshizhe/hgr_v2t | An implementation of a video-text retrieval model using hierarchical graph reasoning with PyTorch. | 209 |
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
danieljf24/hybrid_space | Develops a deep learning framework for video retrieval using text and computer vision | 87 |
mltframework/mlt | A multimedia framework designed for video editing, providing tools and libraries for audio and video processing. | 1,506 |
krassowski/jupyter-manim | Enables display of video output from the Manim animation engine in Jupyter notebooks | 196 |
jvt038/metatube | A Python-based tool to download YouTube videos and add metadata from various providers. | 325 |
mdhiggins/sickbeard_mp4_automator | Automates video file conversion and metadata tagging to create a uniform media library | 1,530 |
danieljf24/dual_encoding | A deep learning project that provides a video-text retrieval model and tools for training and evaluating it on the MSR-VTT dataset | 155 |
vision-cair/longvu | An artificial intelligence system designed to understand and describe long-form video content | 270 |
rese1f/moviechat | A deep learning model designed to efficiently process and analyze long videos using large language models | 525 |
antoine77340/mixture-of-embedding-experts | An open-source PyTorch implementation of the Mixture-of-Embeddings-Experts model for video-text retrieval tasks. | 118 |
dyne/frei0r | A collection of reusable video processing components | 443 |
lettier/movie-monad | A lightweight video player written in Haskell with support for various media formats and playback controls. | 424 |
gamrix/cs231n_proj | This project focuses on manipulating 3D views using deep learning techniques. | 6 |
nickvisionapps/parabolic | Downloads videos from the web and provides an interface for managing downloads | 1,035 |