mmt

Video retriever

Develops a cross-modal architecture for video retrieval by combining multiple types of features from videos and text

Multi-Modal Transformer for Video Retrieval

258 stars
10 watching
41 forks
Language: Python
Last commit: about 1 month ago
Linked from 1 awesome list

Topics: fusion, language, multimodal, nlp, video, vision

Related projects:

Repository | Description | Stars
cshizhe/hgr_v2t | An implementation of a video-text retrieval model using hierarchical graph reasoning with PyTorch | 209
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477
danieljf24/hybrid_space | Develops a deep learning framework for video retrieval using text and computer vision | 87
mltframework/mlt | A multimedia framework designed for video editing, providing tools and libraries for audio and video processing | 1,506
krassowski/jupyter-manim | Enables display of video output from 3D animation software in Jupyter notebooks | 196
jvt038/metatube | A Python-based tool to download YouTube videos and add metadata from various providers | 325
mdhiggins/sickbeard_mp4_automator | Automates video file conversion and metadata tagging to create a uniform media library | 1,530
danieljf24/dual_encoding | A deep learning project that provides a video-text retrieval model and tools for training and evaluating it on the MSR-VTT dataset | 155
vision-cair/longvu | An artificial intelligence system designed to understand and describe long-form video content | 270
rese1f/moviechat | A deep learning model designed to efficiently process and analyze long videos using large language models | 525
antoine77340/mixture-of-embedding-experts | An open-source implementation of the Mixture-of-Embeddings-Experts model in PyTorch for video-text retrieval tasks | 118
dyne/frei0r | A collection of reusable video processing components | 443
lettier/movie-monad | A lightweight video player written in Haskell with support for various media formats and playback controls | 424
gamrix/cs231n_proj | This project focuses on manipulating 3D views using deep learning techniques | 6
nickvisionapps/parabolic | Downloads videos from the web and provides an interface for managing downloads | 1,035