InternVideo
Video foundation models
Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
1k stars
27 watching
91 forks
Language: Python
Last commit: about 1 month ago
Topics: action-recognition, benchmark, contrastive-learning, foundation-models, instruction-tuning, masked-autoencoder, multimodal, open-set-recognition, self-supervised, spatio-temporal-action-localization, temporal-action-localization, video-clip, video-data, video-dataset, video-question-answering, video-retrieval, video-understanding, vision-transformer, zero-shot-classification, zero-shot-retrieval
Related projects:
Repository | Description | Stars |
---|---|---|
gsig/pyvideoresearch | A collection of video analysis methods and datasets for research and development | 533 |
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 121 |
0voice/ffmpeg_develop_doc | A collection of resources and tutorials on using FFmpeg for video processing and playback | 1,969 |
danieljf24/hybrid_space | Develops a deep learning framework for video retrieval using text and computer vision | 87 |
laurentkneip/opengv | A collection of computer vision methods for solving geometric vision problems | 1,040 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 956 |
dvlab-research/llama-vid | An image-based language model that uses large language models to generate visual and text features from videos | 748 |
gitcvfb/cvr | Reconstructs high-quality video frames from two adjacent rolling shutter camera frames | 31 |
nvlabs/edm | A set of tools and techniques for designing and improving diffusion-based generative models | 1,447 |
opengvlab/all-seeing | A research project that develops tools and models for understanding visual data in the open world, enabling applications such as image-text retrieval and relation comprehension | 466 |
showlab/vlog | Transforms video content into a long document containing visual and audio information that can be used for chat or other applications | 545 |
aravisproject/aravis | A software library for video acquisition and processing using Genicam cameras | 918 |
li-xirong/w2vvpp | A deep learning-based video search system using pre-trained models and datasets | 28 |
allyourcodebase/ffmpeg | FFmpeg packaged for Zig | 185 |