InternVideo

Video foundation models

Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning.

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

GitHub

1k stars

27 watching

91 forks

Language: Python

Last commit: 8 months ago

Topics: action-recognition, benchmark, contrastive-learning, foundation-models, instruction-tuning, masked-autoencoder, multimodal, open-set-recognition, self-supervised, spatio-temporal-action-localization, temporal-action-localization, video-clip, video-data, video-dataset, video-question-answering, video-retrieval, video-understanding, vision-transformer, zero-shot-classification, zero-shot-retrieval

Related projects:

Repository	Description	Stars
gsig/pyvideoresearch	A collection of video analysis methods and datasets for research and development	533
pku-yuangroup/video-bench	Evaluates and benchmarks large language models' video understanding capabilities	121
0voice/ffmpeg_develop_doc	A collection of resources and tutorials on using FFmpeg for video processing and playback	1,969
danieljf24/hybrid_space	Develops a deep learning framework for video retrieval using text and computer vision	87
laurentkneip/opengv	A collection of computer vision methods for solving geometric vision problems	1,040
opengvlab/visionllm	A large language model designed to process and generate visual information	956
dvlab-research/llama-vid	An image-based language model that uses large language models to generate visual and text features from videos	748
gitcvfb/cvr	Reconstructs high-quality video frames from two adjacent rolling shutter camera frames	31
nvlabs/edm	Provides a set of tools and techniques to design and improve diffusion-based generative models	1,447
opengvlab/all-seeing	A research project that develops tools and models for understanding visual data in the open world, enabling applications such as image-text retrieval and relation comprehension	466
showlab/vlog	Transforms video content into a long document containing visual and audio information that can be used for chat or other applications	545
aravisproject/aravis	A software library for video acquisition and processing using Genicam cameras	918
li-xirong/w2vvpp	A deep learning-based video search system using pre-trained models and datasets	28
allyourcodebase/ffmpeg	FFmpeg packaged for Zig	185