InternVideo

Developing video foundation models and datasets for multimodal understanding and applications

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

1k stars
29 watching
85 forks
Language: Python
Last commit: about 2 months ago
Topics: action-recognition, benchmark, contrastive-learning, foundation-models, instruction-tuning, masked-autoencoder, multimodal, open-set-recognition, self-supervised, spatio-temporal-action-localization, temporal-action-localization, video-clip, video-data, video-dataset, video-question-answering, video-retrieval, video-understanding, vision-transformer, zero-shot-classification, zero-shot-retrieval

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| gsig/pyvideoresearch | A collection of video analysis methods and datasets for research and development | 533 |
| pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 117 |
| 0voice/ffmpeg_develop_doc | A repository aggregating online ffmpeg learning resources and documentation for developing multimedia software | 1,945 |
| danieljf24/hybrid_space | Develops a deep learning framework for video retrieval using text and computer vision | 87 |
| laurentkneip/opengv | A collection of computer vision methods for solving geometric vision problems | 1,031 |
| opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
| dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 733 |
| gitcvfb/cvr | Reconstructs high-quality video frames from two adjacent rolling shutter camera frames | 31 |
| nvlabs/edm | This project provides a set of tools and techniques to design and improve diffusion-based generative models | 1,399 |
| opengvlab/all-seeing | A research project that develops tools and models for understanding visual data in the open world, enabling applications such as image-text retrieval and relation comprehension | 459 |
| showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications | 538 |
| aravisproject/aravis | A software library for video acquisition and processing using Genicam cameras | 897 |
| li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets | 28 |
| allyourcodebase/ffmpeg | FFmpeg packaged for Zig | 178 |