datasets

ML data loader

A tool providing efficient data manipulation and loading for machine learning models

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

GitHub

19k stars
277 watching
3k forks
Language: Python
Last commit: 4 days ago
Linked from 1 awesome list

computer-vision, datasets, deep-learning, hacktoberfest, machine-learning, natural-language-processing, nlp, numpy, pandas, pytorch, speech, tensorflow

Related projects:

Repository | Description | Stars
huggingface/transformers | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects. | 136,357
huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,103
huggingface/tokenizers | A toolkit providing optimized tokenizers for natural language processing tasks in various programming languages. | 9,156
huggingface/diffusion-models-class | A comprehensive course teaching the theory and hands-on implementation of diffusion models in image and audio generation using PyTorch. | 3,722
switchablenorms/celebamask-hq | A large-scale face image dataset for training and evaluating algorithms in face parsing, recognition, generation, and editing. | 2,136
huggingface/transformers.js | An open-source JavaScript library for running machine learning models in the browser without a server. | 12,363
huggingface/lerobot | A platform providing pre-trained models, datasets, and tools for robotics with focus on imitation learning and reinforcement learning. | 7,874
pandas-dev/pandas | A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis. | 44,052
ayush1997/visualize_ml | A Python package for data analysis and visualization in machine learning | 198
huggingface/peft | An efficient method for fine-tuning large pre-trained models by adapting only a small fraction of their parameters | 16,699
iigroup/mm-celeba-hq-dataset | A large-scale dataset for training and evaluating algorithms for text-driven face generation and understanding tasks. | 223
huggingface/diffusers | A PyTorch-based library for training and using state-of-the-art diffusion models to generate images, audio, and 3D structures | 26,676
dotnet/machinelearning-samples | A collection of samples and examples demonstrating the usage of ML.NET for machine learning tasks in .NET applications. | 4,508
rosejn/torch-datasets | A collection of pre-processed machine learning datasets for use with the Torch7 deep learning framework. | 37
huggingface/alignment-handbook | Provides recipes and guidelines for training language models to align with human preferences and AI goals | 4,800