datafu

Data analytics framework

A collection of libraries for data mining and statistics in large-scale Hadoop environments

Mirror of Apache DataFu

119 stars
18 watching
64 forks
Language: Java
Last commit: 9 days ago
Linked from 1 awesome list

Related projects:

| Repository | Description | Stars |
|---|---|---|
| linkedinattic/datafu | A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions | 583 |
| apache/atlas | A framework providing governance services for big data ecosystems, enabling enterprises to manage compliance and security | 1,850 |
| apache/accumulo | A distributed key/value store with robust data storage and retrieval capabilities | 1,075 |
| apache/pig | Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks | 682 |
| jsdf/flux-coffee | An implementation of Facebook's Flux pattern for managing and observing datastores in a client-side web application | 16 |
| apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,548 |
| topepo/fes | Provides code and data sets to support the analysis of feature engineering and selection in predictive models | 726 |
| wu-lang/wu | A language and data processing framework designed to balance control, readability, and scalability with a syntax inspired by Rust | 473 |
| apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 817 |
| apache/datafusion-python | A Python library that provides a data processing and querying framework using the Apache Arrow in-memory query engine | 385 |
| guigarage/datafx | A JavaFX UI control data management framework that simplifies common data tasks such as population, sorting, filtering and editing | 116 |
| digipolisantwerp/dataaccess_aspnetcore_deprecated | A repository and unit-of-work framework for ASP.NET Core data access using Entity Framework | 140 |
| ashleyyakeley/truth | A framework for composing typed interfaces to information and an interpreted language for structuring personal data and creating user interfaces | 32 |
| apache/systemds | An end-to-end data science platform that integrates data integration, machine learning model training, and deployment | 1,038 |