ydata-profiling

Data Profiler

An exploratory data analysis tool for Pandas and Spark DataFrames

One line of code for data quality profiling and exploratory data analysis of Pandas and Spark DataFrames.
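As a rough illustration of the "one line" claim, here is a minimal sketch that assumes the `ydata-profiling` and `pandas` packages are installed. The sample data, the column names, and the helper name `write_profile` are hypothetical and chosen only for this example. `ProfileReport` and its `to_file` method are the library's entry points.

```python
# Hypothetical sample data; the column names are illustrative only.
SAMPLE = {
    "age": [34, 51, None, 29],
    "city": ["Lisbon", "Porto", "Lisbon", None],
    "income": [42000.0, 58500.0, 39900.0, 61250.0],
}


def write_profile(data, path="report.html"):
    """Build a DataFrame from `data` and write an HTML profile to `path`."""
    # Imported inside the function so this module can be loaded
    # even when the optional dependencies are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.DataFrame(data)
    # The "one line": construct the report directly from the DataFrame.
    report = ProfileReport(df, title="Sample profile")
    # Write the report as a standalone HTML file.
    report.to_file(path)
    return path
```

Calling `write_profile(SAMPLE)` should then produce `report.html` in the working directory. Generating the report can be slow on wide or large DataFrames.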

GitHub

13k stars
152 watching
2k forks
Language: Python
Last commit: 8 days ago
Linked from 6 awesome lists

Topics: big-data-analytics, data-analysis, data-exploration, data-profiling, data-quality, data-science, deep-learning, eda, exploration, exploratory-data-analysis, hacktoberfest, html-report, jupyter, jupyter-notebook, machine-learning, pandas, pandas-dataframe, pandas-profiling, python, statistics

Related projects:

Repository | Description | Stars
pandas-dev/pandas | A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis | 43,807
pydata/pandas-datareader | Extracts data from various internet sources into a pandas DataFrame | 2,948
jvns/pandas-cookbook | A comprehensive guide to getting started with Python's pandas library using real-world data examples | 6,664
pydantic/pydantic | A Python library for validating data using type hints and JSON Schema | 21,145
wesm/pydata-book | Materials and IPython notebooks for data analysis with Python | 22,248
panda-re/panda | An open-source platform for analyzing and debugging complex software systems | 2,489
twopirllc/pandas-ta | A Python package providing an extensive collection of technical analysis indicators and utility functions for financial data analysis | 5,432
adamerose/pandasgui | A GUI tool for visualizing and analyzing Pandas DataFrames | 3,194
kanaries/pygwalker | A Python library that enables interactive data analysis and visualization using an open-source alternative to Tableau | 13,382
databricks/koalas | A Python package that allows users to work with pandas DataFrames on top of Apache Spark | 3,336
lux-org/lux | A Python library that automates data exploration by recommending visualizations and suggesting next steps based on user interest | 5,210
unionai-oss/pandera | A lightweight library for validating and processing statistical data in Python | 3,393
sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 361
ydataai/ydata-synthetic | An educational package providing generative models for synthetic data generation | 1,441
sinaptik-ai/pandas-ai | Makes data analysis conversational using LLMs and natural language | 13,516