Open-Sora-Dataset

Video dataset

A large video dataset collected from various open-source websites for use in computer vision and multimedia applications.

GitHub

94 stars
8 watching
6 forks
Language: Python
Last commit: 7 months ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
pku-yuangroup/video-bench | Evaluates and benchmarks large language models' video understanding capabilities | 121
pku-yuangroup/magictime | Generates time-lapse videos from text inputs using deep learning models | 1,312
pku-yuangroup/chronomagic-bench | Provides a benchmarking framework for evaluating the quality of text-to-video generation models | 191
gsig/pyvideoresearch | A collection of video analysis methods and datasets for research and development | 533
google-research/cad-estate | A large dataset of 3D object and room layout annotations on RGB videos, designed to test automatic scene understanding methods | 106
openarabic/ocr_gs_data | A collection of double-checked gold standard data for training and testing OCR engines | 13
jxshin/mzdata | A comprehensive dataset of Mozilla issue tracking history, providing multiple extracts and levels for analysis | 7
ubisoft/ubisoft-laforge-animation-dataset | An animation dataset for studying human motion and developing computer vision algorithms | 1,042
openearth/videomap | Tools for processing and exporting video map data | 2
pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data | 895
littleyuyu/stackoverflow-question-code-dataset | A collection of mined question-code pairs from Stack Overflow used for training and testing AI models | 166
opengvlab/internvideo | Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning | 1,467
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 8
pharo-ai/datasets | A Smalltalk library for loading and managing datasets as data frames | 9
pythainlp/prachathai-67k | An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification | 16