gobblin

Data integrator

A distributed data integration framework for managing structured and byte-oriented data in heterogeneous ecosystems

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

GitHub

2k stars
165 watching
751 forks
Language: Java
last commit: 7 days ago
Linked from 6 awesome lists

apache data ingestion management replication

Related projects:

Repository | Description | Stars
sonalgoyal/hiho | A tool for integrating data from various sources into a centralized repository on Hadoop | 91
usc-isi-i2/web-karma | An information integration tool for data modeling and transformation from various data sources into standardized RDF format. | 588
smooks/smooks | A Java framework for building flexible data integration pipelines | 397
379547990/tdengine_hivemq | A system that integrates data from various sources through MQTT and TDengine | 0
rudderlabs/airbyte | A platform for replicating data from various sources to different destinations, enabling flexible and customizable data integration. | 3
jsxlei/scalex | A tool for integrating single-cell data from heterogeneous sources into a common cell-space using deep learning techniques. | 72
apache/flink-connector-hbase | A connector for integrating HBase with the Apache Flink stream processing framework | 29
linkedpipes/etl | An ETL tool for integrating data from various sources into a centralized knowledge graph using RDF | 147
anishkny/integrify | Enforces referential and data integrity in Cloud Firestore using triggers to automate tasks such as attribute replication, reference deletion, and counting updates. | 109
cipher387/maltego-transforms-list | A curated list of tools that provide data processing and integration capabilities for the Maltego graphical analysis tool. | 226
okgrow/analytics | An integration package for Meteor that automatically records and sends user data and page view events to various analytics services. | 213
jazzband/django-analytical | An application that integrates multiple analytics services into Django projects in a generic and customizable way. | 1,201
googlecloudplatform/healthcare-data-harmonization | A mapping language and engine for converting complex data from one schema to another, applicable across domains. | 213
mravi/kafka-connect-hbase | A tool that enables real-time data integration between Apache Kafka and HBase using Java | 43
unipop-graph/unipop | An integration platform for graph data models using Gremlin and Tinkerpop | 205