mmt

Video retriever

Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text

Multi-Modal Transformer for Video Retrieval

GitHub

259 stars
10 watching
40 forks
Language: Python
Last commit: 6 months ago
Linked from 1 awesome list

Tags: fusion, language, multimodal, nlp, video, vision

Related projects:

Repository | Description | Stars
cshizhe/hgr_v2t | An implementation of a video-text retrieval model using hierarchical graph reasoning with PyTorch. | 210
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478
danieljf24/hybrid_space | Develops a deep learning framework for video retrieval using text and computer vision | 87
mltframework/mlt | A multimedia framework designed for video editing, providing tools and libraries for audio and video processing. | 1,522
krassowski/jupyter-manim | Enables display of video output from 3D animation software in Jupyter notebooks | 196
jvt038/metatube | A Python-based tool to download YouTube videos and add metadata from various providers. | 328
mdhiggins/sickbeard_mp4_automator | Automates video file conversion and metadata tagging to create a uniform media library | 1,536
danieljf24/dual_encoding | A deep learning project that provides a video-text retrieval model and tools for training and evaluating it on the MSR-VTT dataset | 154
vision-cair/longvu | An artificial intelligence system designed to understand and describe long-form video content | 329
rese1f/moviechat | Develops a method for long video understanding by optimizing memory usage | 550
antoine77340/mixture-of-embedding-experts | An open-source implementation of the Mixture-of-Embeddings-Experts model in PyTorch for video-text retrieval tasks. | 118
dyne/frei0r | A collection of reusable video processing components | 452
lettier/movie-monad | A lightweight video player written in Haskell with support for various media formats and playback controls. | 423
gamrix/cs231n_proj | This project focuses on manipulating 3D views using deep learning techniques. | 6
nickvisionapps/parabolic | Downloads videos from the web and provides an interface for managing downloads | 1,108