deequ
Data inspector
A library for testing data quality in large datasets
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
3k stars
81 watching
539 forks
Language: Scala
last commit: about 1 month ago
Linked from 3 awesome lists
Topics: dataquality, scala, spark, unit-testing
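The "unit tests for data" idea in the description can be illustrated with a minimal sketch in plain Python. This is not Deequ's API: Deequ itself is a Scala library that runs on Spark, and every name below (`Check`, `is_complete`, `is_unique`, `is_non_negative`, `run_checks`) is hypothetical, as is the small in-memory dataset. The sketch only shows the general pattern of declaring constraints on columns and then evaluating them against the data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A row is a plain dict; a real system would operate on a distributed DataFrame.
Row = Dict[str, Optional[object]]


@dataclass
class Check:
    """A named constraint evaluated over the whole dataset (hypothetical type)."""
    name: str
    predicate: Callable[[List[Row]], bool]


def is_complete(column: str) -> Check:
    # Passes when no row has a missing (None) value in the column.
    return Check(
        f"is_complete({column})",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def is_unique(column: str) -> Check:
    # Passes when every value in the column occurs exactly once.
    def predicate(rows: List[Row]) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))

    return Check(f"is_unique({column})", predicate)


def is_non_negative(column: str) -> Check:
    # Passes when every present value is >= 0; missing values are left
    # to a separate completeness check.
    return Check(
        f"is_non_negative({column})",
        lambda rows: all(r.get(column) is None or r.get(column) >= 0 for r in rows),
    )


def run_checks(rows: List[Row], checks: List[Check]) -> Dict[str, bool]:
    # Evaluate each check and report pass/fail by check name.
    return {check.name: check.predicate(rows) for check in checks}


# Made-up example data: the second row is missing its name.
rows: List[Row] = [
    {"id": 1, "name": "Thingy A", "views": 0},
    {"id": 2, "name": None, "views": 12},
    {"id": 3, "name": "Thingy C", "views": 5},
]

results = run_checks(
    rows,
    [is_complete("id"), is_unique("id"), is_complete("name"), is_non_negative("views")],
)
failed = [name for name, passed in results.items() if not passed]
print(failed)  # ['is_complete(name)']
```

As in ordinary unit testing, a failed check would typically stop the dataset from being published to downstream consumers until the problem is fixed.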
Related projects:
| Repository | Description | Stars |
|---|---|---|
| awslabs/python-deequ | A Python API for defining unit tests for data quality in large datasets | 730 |
| databricks/koalas | A Python package that allows users to work with pandas DataFrames on top of Apache Spark | 3,336 |
| dmmiller612/sparktorch | A PyTorch implementation on Apache Spark for distributed deep learning model training and inference | 339 |
| databricks/learning-spark | Examples and tutorials for learning Spark using Java and Scala | 3,890 |
| spiritlab/spark | A research-focused implementation of Apache Spark with homomorphic encryption support | 3 |
| spark-notebook/spark-notebook | An interactive web-based editor for exploring and analyzing large datasets using Scala, Apache Spark, and other data science tools | 3,151 |
| mrpowers-io/spark-fast-tests | A testing helper library for Apache Spark applications | 436 |
| johnsnowlabs/spark-nlp | Provides a set of pre-trained models and libraries for natural language processing tasks on top of Apache Spark | 3,871 |
| apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
| dotnet/spark | Provides high-performance APIs for using Apache Spark with .NET | 2,023 |
| datastax/spark-cassandra-connector | A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis | 1,943 |
| tofgarion/spark-by-example | An adaptation of ACSL by Example for SPARK 2014 to verify Ada programs with formal methods | 152 |
| yaooqinn/itachi | A library that brings useful functions from various modern database management systems to Apache Spark | 56 |
| databricks/spark-corenlp | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422 |
| apache/jmeter | A tool used to simulate heavy loads on servers and measure their performance under different conditions | 8,413 |