Open-Sora-Dataset

Video dataset

A large video dataset collected from various open-source websites for use in computer vision and multimedia applications.

94 stars

8 watching

6 forks

Language: Python

Last commit: about 2 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

amrzv/awesome-colab-notebooks

Related projects:

Repository	Description	Stars
pku-yuangroup/video-bench	Evaluates and benchmarks large language models' video understanding capabilities	121
pku-yuangroup/magictime	Generates time-lapse videos from text inputs using deep learning models	1,312
pku-yuangroup/chronomagic-bench	Provides a benchmarking framework for evaluating the quality of text-to-video generation models	191
gsig/pyvideoresearch	A collection of video analysis methods and datasets for research and development	533
google-research/cad-estate	A large dataset of 3D object and room layout annotations on RGB videos, designed to test automatic scene understanding methods	106
openarabic/ocr_gs_data	A collection of double-checked gold standard data for training and testing OCR engines	13
jxshin/mzdata	A comprehensive dataset of Mozilla issue tracking history, providing multiple extracts and levels for analysis	7
ubisoft/ubisoft-laforge-animation-dataset	An animation dataset for studying human motion and developing computer vision algorithms	1,042
openearth/videomap	Tools for processing and exporting video map data	2
pku-yuangroup/chat-univi	A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data	895
littleyuyu/stackoverflow-question-code-dataset	A collection of mined question-code pairs from Stack Overflow used for training and testing AI models	166
opengvlab/internvideo	Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning	1,467
nytud/hulu	A collection of linguistic datasets and benchmarks for natural language understanding tasks	8
pharo-ai/datasets	A Smalltalk library for loading and managing datasets as data frames	9
pythainlp/prachathai-67k	An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification	16