bad-data-guide

Data problem solver

An exhaustive guide to common problems in real-world data and suggestions on how to resolve them.

GitHub

4k stars
215 watching
405 forks
last commit: about 3 years ago
Linked from 1 awesome list

data, documentation, guide, qz-things

Related projects:

Repository | Description | Stars
adibro/data-science-resources | A collection of resources and cheatsheets for learning and practicing data science | 63
cleanlab/cleanlab | Automates data quality checks and model training with AI-driven methods to improve machine learning performance | 9,820
xlaszlo/datascience-fails | Collects and documents common pitfalls and failure reasons in data science projects | 460
meteoswiss/publication-opendata | Provides access to standardized meteorological and climatological data from MeteoSwiss | 70
pedrobarcha/old-books-dataset | A collection of scanned book pages with ground truth annotations for OCR research and text analysis | 12
hadley/stats337 | An educational resource providing discussions on applied data science topics in R, with a focus on practical applications and real-world examples | 1,617
cuemacro/findatapy | A unified Python API to download market data from various sources | 1,716
ghiggi/gpm_api | Provides a Python interface to download and analyze GPM data from NASA's Precipitation Processing System | 60
geocryology/globsim | Automates downloading and processing of global reanalyses to generate meteorological time series | 19
gdsbook/book | An interactive introduction to geospatial data analysis using Python and Jupyter Notebook | 339
serpro69/kotlin-faker | A library for generating realistic fake data for testing and development purposes | 475
capitalone/dataprofiler | A Python library to analyze and profile datasets, detecting sensitive data and generating reports | 1,442
dataforgoodfr/quotaclimat | A tool to quantify media coverage of climate crises by collecting and analyzing radio and TV data from the Mediatree API | 29
getredash/redash | Enables users to connect to various data sources, visualize and share their data, making it easy to explore insights and drive business decisions | 26,572
jphall663/gwu_data_mining | Materials and lecture notes for a data science and machine learning course at GWU | 237