InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

GitHub

1k stars
28 watching
84 forks
Language: Python
last commit: 13 days ago
Topics: action-recognition, benchmark, contrastive-learning, foundation-models, instruction-tuning, masked-autoencoder, multimodal, open-set-recognition, self-supervised, spatio-temporal-action-localization, temporal-action-localization, video-clip, video-data, video-dataset, video-question-answering, video-retrieval, video-understanding, vision-transformer, zero-shot-classification, zero-shot-retrieval