Open-Sora-Dataset
Video dataset
A large video dataset collected from various open-source websites for use in computer vision and multimedia applications.
94 stars
8 watching
6 forks
Language: Python
last commit: 7 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
pku-yuangroup/video-bench | Evaluates and benchmarks the video understanding capabilities of large language models. | 121 |
pku-yuangroup/magictime | Generates time-lapse videos from text inputs using deep learning models. | 1,312 |
pku-yuangroup/chronomagic-bench | Provides a benchmarking framework for evaluating the quality of text-to-video generation models. | 191 |
gsig/pyvideoresearch | A collection of video analysis methods and datasets for research and development. | 533 |
google-research/cad-estate | A large dataset of 3D object and room layout annotations on RGB videos, designed to test automatic scene understanding methods. | 106 |
openarabic/ocr_gs_data | A collection of double-checked gold standard data for training and testing OCR engines. | 13 |
jxshin/mzdata | A comprehensive dataset of Mozilla issue tracking history, providing multiple extracts and levels for analysis. | 7 |
ubisoft/ubisoft-laforge-animation-dataset | An animation dataset for studying human motion and developing computer vision algorithms. | 1,042 |
openearth/videomap | Tools for processing and exporting video map data. | 2 |
pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data. | 895 |
littleyuyu/stackoverflow-question-code-dataset | A collection of question-code pairs mined from Stack Overflow, used for training and testing AI models. | 166 |
opengvlab/internvideo | Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning. | 1,467 |
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks. | 8 |
pharo-ai/datasets | A Smalltalk library for loading and managing datasets as data frames. | 9 |
pythainlp/prachathai-67k | An article classification dataset built from news articles scraped from Prachathai.com, with multiple benchmark models for multi-label classification. | 16 |