ydata-profiling

Data Profiler

An exploratory data analysis tool for Pandas and Spark DataFrames

One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
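As a rough illustration of the "one line" claim, the sketch below builds a report from a small Pandas DataFrame using the package's `ProfileReport` class and renders it to an HTML string. The sample data and the helper name `profile_to_html` are hypothetical and exist only for this example. `minimal=True` is used here to keep the run cheap. The sketch assumes `pandas` and `ydata-profiling` are installed.

```python
import pandas as pd


def profile_to_html(df: pd.DataFrame, title: str = "Profiling report") -> str:
    """Return the profiling report for `df` as an HTML string.

    Hypothetical helper for this example, not part of ydata-profiling.
    """
    # Imported lazily so the rest of the script still loads when
    # ydata-profiling is not installed.
    from ydata_profiling import ProfileReport

    # The "one line": build the report object from the DataFrame.
    # minimal=True switches off the more expensive computations.
    report = ProfileReport(df, title=title, minimal=True)
    return report.to_html()


# Hypothetical sample data with one missing value per column.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, None, 29],
        "city": ["Lisbon", "Porto", "Lisbon", "Braga", None],
    }
)

if __name__ == "__main__":
    try:
        html = profile_to_html(df, title="Sample report")
        print(f"Generated {len(html)} characters of HTML")
    except ImportError:
        print("ydata-profiling is not installed; run: pip install ydata-profiling")
```

To write the report to disk instead of returning a string, the report object's `to_file("report.html")` method can be used in place of `to_html()`.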

GitHub

Stars: 13k
Watching: 152
Forks: 2k
Language: Python
Last commit: 1 day ago
Linked from 6 awesome lists

Topics: big-data-analytics, data-analysis, data-exploration, data-profiling, data-quality, data-science, deep-learning, eda, exploration, exploratory-data-analysis, hacktoberfest, html-report, jupyter, jupyter-notebook, machine-learning, pandas, pandas-dataframe, pandas-profiling, python, statistics


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| pandas-dev/pandas | A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis. | 44,052 |
| pydata/pandas-datareader | Extracts data from various internet sources into a pandas DataFrame | 2,982 |
| jvns/pandas-cookbook | A comprehensive guide to getting started with Python's pandas library using real-world data examples | 6,697 |
| pydantic/pydantic | A Python library for validating data using type hints and JSON Schema. | 21,677 |
| wesm/pydata-book | Materials and IPython notebooks for data analysis with Python | 22,389 |
| panda-re/panda | An open-source platform for analyzing and debugging complex software systems | 2,507 |
| twopirllc/pandas-ta | A Python package providing an extensive collection of technical analysis indicators and utility functions for financial data analysis. | 5,545 |
| adamerose/pandasgui | A GUI tool for visualizing and analyzing Pandas DataFrames | 3,204 |
| kanaries/pygwalker | A Python library that enables interactive data analysis and visualization using an open-source alternative to Tableau. | 13,533 |
| databricks/koalas | A Python package that allows users to work with pandas DataFrames on top of Apache Spark | 3,343 |
| lux-org/lux | A Python library that automates data exploration by recommending visualizations and suggesting next steps based on user interest | 5,226 |
| unionai-oss/pandera | A lightweight library for validating and processing statistical data in Python | 3,472 |
| sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 362 |
| ydataai/ydata-synthetic | An educational package providing generative models for synthetic data generation. | 1,456 |
| sinaptik-ai/pandas-ai | Makes data analysis conversational using LLMs and natural language | 13,714 |