awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Awesome Big Data / RDBMS

MySQL The world's most popular open source database
PostgreSQL The world's most advanced open source database
Oracle Database object-relational database management system
Teradata high-performance MPP data warehouse platform

Awesome Big Data / Frameworks

Bistro 1,033 over 1 year ago general-purpose data processing engine for both batch and stream analytics. It is based on a novel data model, which represents data via functions and processes data via column operations as opposed to having only set operations in conventional approaches like MapReduce or SQL
IBM Streams platform for distributed processing and real-time analytics. Integrates with many of the popular technologies in the Big Data ecosystem (Kafka, HDFS, Spark, etc.)
Apache Hadoop framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system)
Tigon 284 over 7 years ago High Throughput Real-time Stream Processing Framework
Pachyderm Pachyderm is a data storage platform built on Docker and Kubernetes to provide reproducible data processing and analysis
Polyaxon 3,555 7 days ago A platform for reproducible and scalable machine learning and deep learning
Smooks 395 6 days ago An extensible Java framework for building XML and non-XML (CSV, EDI, Java, etc...) streaming applications

Awesome Big Data / Distributed Programming

AddThis Hydra 435 over 4 years ago distributed data processing and storage system originally developed at AddThis
AMPLab SIMR run Spark on Hadoop MapReduce v1
Apache APEX a unified, enterprise platform for big data stream and batch processing
Apache Beam a unified model and set of language-specific SDKs for defining and executing data processing workflows
Apache Crunch a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce
Apache DataFu collection of user-defined functions for Hadoop and Pig developed by LinkedIn
Apache Flink high-performance runtime with automatic program optimization
Apache Gearpump real-time big data streaming engine based on Akka
Apache Gora framework for in-memory data model and persistence
Apache Hama BSP (Bulk Synchronous Parallel) computing framework
Apache MapReduce programming model for processing large data sets with a parallel, distributed algorithm on a cluster
Apache Pig high-level language to express data analysis programs for Hadoop
Apache REEF retainable evaluator execution framework to simplify and unify the lower layers of big data systems
Apache S4 framework for stream processing, implementation of S4
Apache Spark framework for in-memory cluster computing
Apache Spark Streaming framework for stream processing, part of Spark
Apache Storm framework for stream processing by Twitter, also runs on YARN
Apache Samza stream processing framework, based on Kafka and YARN
Apache Tez application framework for executing a complex DAG (directed acyclic graph) of tasks, built on YARN
Apache Twill abstraction over YARN that reduces the complexity of developing distributed applications
Baidu Bigflow an interface for writing distributed computing programs, providing simple, flexible and powerful APIs to handle data of any scale
Cascalog data processing and querying library
Cheetah High Performance, Custom Data Warehouse on Top of MapReduce
Concurrent Cascading framework for data management/analytics on Hadoop
Damballa Parkour 257 over 8 years ago MapReduce library for Clojure
Datasalt Pangool 57 over 2 years ago alternative MapReduce paradigm
DataTorrent StrAM real-time engine designed to enable distributed, asynchronous, real-time in-memory big-data computations in as unblocked a way as possible, with minimal overhead and impact on performance
Facebook Corona Hadoop enhancement which removes the single point of failure
Facebook Peregrine Map Reduce framework
Facebook Scuba distributed in-memory datastore
Google Dataflow create data pipelines to ingest, transform and analyze data
Google MapReduce MapReduce framework
Google MillWheel fault-tolerant stream processing framework
IBM Streams platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, etc. out of the box
JAQL declarative programming language for working with structured, semi-structured and unstructured data
Kite is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem
Metamarkets Druid framework for real-time analysis of large datasets
Netflix PigPen 566 over 1 year ago map-reduce for Clojure which compiles to Apache Pig
Nokia Disco MapReduce framework developed by Nokia
Onyx Distributed computation for the cloud
Pinterest Pinlater asynchronous job execution system
Pydoop Python MapReduce and HDFS API for Hadoop
Ray 33,240 5 days ago A fast and simple framework for building and running distributed applications
Rackerlabs Blueflood multi-tenant distributed metric processing system
Skale 399 over 3 years ago High performance distributed data processing in NodeJS
Stratosphere general purpose cluster computing framework
Streamdrill useful for counting activities of event streams over different time windows and finding the most active one
streamsx.topology 29 about 2 years ago Libraries to enable building IBM Streams applications in Java, Python or Scala
Tuktu 60 over 6 years ago Easy-to-use platform for batch and streaming computation, built using Scala, Akka and Play!
Twitter Heron 3,645 over 1 year ago Heron is a realtime, distributed, fault-tolerant stream processing engine from Twitter replacing Storm
Twitter Scalding 3,495 over 1 year ago Scala library for MapReduce jobs, built on Cascading
Twitter Summingbird 2,140 over 2 years ago Streaming MapReduce with Scalding and Storm, by Twitter
Twitter TSAR TimeSeries AggregatoR by Twitter
Wallaroo The ultrafast and elastic data processing engine. Big or fast data - no fuss, no Java needed
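Many entries above (Apache MapReduce, Google MapReduce, Pydoop, Disco, Scalding) build on the MapReduce programming model. The following is a minimal single-process Python sketch of its three phases (map, shuffle, reduce) for a word count. It does not use any listed framework's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every whitespace-separated token.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every emitted value under its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the quick fox", "the lazy dog"]))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

On a real cluster, the map and reduce calls run on many machines, and the shuffle moves intermediate pairs across the network. Providing that distribution, along with fault tolerance, is the job of the frameworks listed above.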

Awesome Big Data / Distributed Filesystem

Ambry 1,741 5 days ago a distributed object store that supports storage of trillions of small immutable objects as well as billions of large objects
Apache HDFS a way to store large files across multiple machines
Apache Kudu Hadoop's storage layer to enable fast analytics on fast data
BeeGFS formerly FhGFS, parallel distributed file system
Ceph Filesystem software storage platform
Disco DDFS distributed filesystem
Facebook Haystack object storage system
Google GFS distributed filesystem
Google Megastore scalable, highly available storage
GridGain GGFS, Hadoop compliant in-memory file system
Lustre file system high-performance distributed filesystem
Microsoft Azure Data Lake Store HDFS-compatible storage in Azure cloud
Quantcast File System QFS open-source distributed file system
Red Hat GlusterFS scale-out network-attached storage file system
Seaweed-FS 22,426 5 days ago simple and highly scalable distributed file system
Alluxio reliable file sharing at memory speed across cluster frameworks
Tahoe-LAFS decentralized cloud storage system
Baidu File System 2,854 almost 6 years ago distributed filesystem

Awesome Big Data / Distributed Index

Pilosa 2,529 8 months ago Open source distributed bitmap index that dramatically accelerates queries across multiple, massive data sets
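Pilosa is described above as a distributed bitmap index. The core idea can be sketched on a single node. Each (field, value) pair owns a bitmap whose bit i is set when row i has that value, so multi-condition queries reduce to bitwise AND/OR. This toy version uses Python integers as bitsets. It is not Pilosa's API, and the class and method names are hypothetical.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy single-node bitmap index using Python ints as bitsets (hypothetical)."""

    def __init__(self) -> None:
        self.bitmaps: dict[tuple[str, str], int] = defaultdict(int)

    def add(self, row_id: int, field: str, value: str) -> None:
        # Set bit `row_id` in the bitmap for (field, value).
        self.bitmaps[(field, value)] |= 1 << row_id

    def rows(self, field: str, value: str) -> int:
        # Unknown values yield an empty bitmap (0) without creating an entry.
        return self.bitmaps.get((field, value), 0)

    @staticmethod
    def to_ids(bitmap: int) -> list[int]:
        # Decode set bits back into row ids, lowest first.
        ids = []
        row = 0
        while bitmap:
            if bitmap & 1:
                ids.append(row)
            bitmap >>= 1
            row += 1
        return ids


index = BitmapIndex()
records = [("red", "L"), ("blue", "L"), ("red", "S"), ("red", "L")]
for row_id, (color, size) in enumerate(records):
    index.add(row_id, "color", color)
    index.add(row_id, "size", size)

# "color = red AND size = L" becomes a single bitwise AND.
red_and_large = index.rows("color", "red") & index.rows("size", "L")
print(BitmapIndex.to_ids(red_and_large))  # [0, 3]
```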

Awesome Big Data / Document Data Model

Actian Versant commercial object-oriented database management system
Crate Data is an open source massively scalable data store. It requires zero administration
Facebook Apollo Facebook’s Paxos-like NoSQL database
jumboDB document oriented datastore over Hadoop
LinkedIn Espresso horizontally scalable document-oriented NoSQL data store
MarkLogic Schema-agnostic Enterprise NoSQL database technology
Microsoft Azure DocumentDB NoSQL cloud database service with protocol support for MongoDB
MongoDB Document-oriented database system
RavenDB A transactional, open-source Document Database
RethinkDB document database that supports queries like table joins and group by

Awesome Big Data / Key Map Data Model

Apache Accumulo distributed key/value store, built on Hadoop
Apache Cassandra column-oriented distributed datastore, inspired by BigTable
Apache HBase column-oriented distributed datastore, inspired by BigTable
Baidu Tera 1,886 4 months ago an Internet-scale database, inspired by BigTable
Facebook HydraBase evolution of HBase made by Facebook
Google BigTable column-oriented distributed datastore
Google Cloud Datastore is a fully managed, schemaless database for storing non-relational data over BigTable
Hypertable column-oriented distributed datastore, inspired by BigTable
InfiniDB 250 almost 7 years ago is accessed through a MySQL interface and uses massively parallel processing to parallelize queries
Tephra 158 22 days ago Transactions for HBase
Twitter Manhattan real-time, multi-tenant distributed database for Twitter scale
ScyllaDB column-oriented distributed datastore written in C++, totally compatible with Apache Cassandra

Awesome Big Data / Key-value Data Model

Aerospike NoSQL flash-optimized, in-memory. Open source and "Server code in 'C' (not Java or Erlang) precisely tuned to avoid context switching and memory copies."
Amazon DynamoDB distributed key/value store, implementation of Dynamo paper
Badger a fast, simple, efficient, and persistent key-value store written natively in Go
Bolt 14,174 over 6 years ago an embedded key-value database for Go
BTDB 136 about 1 month ago Key Value Database in .Net with Object DB Layer, RPC, dynamic IL and much more
BuntDB 4,541 25 days ago a fast, embeddable, in-memory key/value database for Go with custom indexing and geospatial support
Edis 468 about 9 years ago a protocol-compatible server replacement for Redis
ElephantDB 558 over 10 years ago Distributed database specialized in exporting data from Hadoop
EventStore distributed time series database
GhostDB 750 over 3 years ago a distributed, in-memory, general purpose key-value data store that delivers microsecond performance at any scale
Graviton 419 over 2 years ago a simple, fast, versioned, authenticated, embeddable key-value store database in pure Go(lang)
GridDB 2,371 about 2 months ago suitable for sensor data stored in a timeseries
HyperDex 1,394 5 months ago a scalable, next generation key-value and document store with a wide array of features, including consistency, fault tolerance and high performance
Ignite is an in-memory key-value data store providing full SQL-compliant data access that can optionally be backed by disk storage
LinkedIn Krati 26 about 12 years ago is a simple persistent data store with very low latency and high throughput
Linkedin Voldemort distributed key/value storage system
Oracle NoSQL Database distributed key-value database by Oracle Corporation
Redis in-memory key-value datastore
Riak 3,941 5 months ago a decentralized datastore
Storehaus 464 about 4 years ago library to work with asynchronous key value stores, by Twitter
SummitDB 1,409 over 2 years ago an in-memory, NoSQL key/value database, with disk persistence and using the Raft consensus algorithm
Tarantool 3,396 7 days ago an efficient NoSQL database and a Lua application server
TiKV 15,081 5 days ago a distributed key-value database powered by Rust and inspired by Google Spanner and HBase
Tile38 9,087 5 days ago a geolocation data store, spatial index, and realtime geofence, supporting a variety of object types including latitude/longitude points, bounding boxes, XYZ tiles, Geohashes, and GeoJSON
TreodeDB 177 almost 9 years ago key-value store that's replicated and sharded and provides atomic multirow writes
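The Dynamo paper mentioned in the DynamoDB entry spreads keys across nodes with consistent hashing. Under that scheme, adding or removing a node moves only a small share of the keys. The Python sketch below is a generic hash ring with virtual nodes. The node names, the 64-virtual-node default and the use of MD5 as a stable hash are illustrative assumptions. None of this is the actual implementation of DynamoDB or of any other store listed here.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Map a string to a point on the ring (MD5 used only as a stable hash).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative, hypothetical API)."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._points: list[tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            self._points.append((ring_position(f"{node}#{i}"), node))
        self._points.sort()

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point at or after the key's position, wrapping past the end.
        idx = bisect.bisect_left(self._points, (ring_position(key),))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-c")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), "keys moved; all from node-c:", all(before[k] == "node-c" for k in moved))
```

Removing a node deletes only that node's points, so the remaining keys still resolve to the same point as before. This is why only the departed node's keys are reassigned.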

Awesome Big Data / Graph Data Model

AgensGraph a new generation multi-model graph database for the modern complex data environment
Apache Giraph implementation of Pregel, based on Hadoop
Apache Spark Bagel implementation of Pregel, part of Spark
ArangoDB multi model distributed database
DGraph 20,345 5 days ago A scalable, distributed, low latency, high throughput graph database aimed at providing Google production level scale and throughput, with low enough latency to be serving real time user queries, over terabytes of structured data
EliasDB 998 about 2 years ago a lightweight graph based database that does not require any third-party libraries
Facebook TAO TAO is the distributed data store that is widely used at facebook to store and serve the social graph
GCHQ Gaffer 1,766 3 days ago Gaffer by GCHQ is a framework that makes it easy to store large-scale graphs in which the nodes and edges have statistics
Google Cayley 14,842 3 months ago open-source graph database
Google Pregel graph processing framework
GraphLab PowerGraph a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
GraphX resilient Distributed Graph System on Spark
Gremlin 1,948 about 3 years ago graph traversal language
Infovore 148 almost 3 years ago RDF-centric Map/Reduce framework
Intel GraphBuilder tools to construct large-scale graphs on top of Hadoop
JanusGraph open-source, distributed graph database with multiple options for storage backends (Bigtable, HBase, Cassandra, etc.) and indexing backends (Elasticsearch, Solr, Lucene)
MapGraph Massively Parallel Graph processing on GPUs
Microsoft Graph Engine 2,198 10 months ago a distributed in-memory data processing engine, underpinned by a strongly-typed in-memory key-value store and a general distributed computation engine
Neo4j graph database written entirely in Java
OrientDB document and graph database
Phoebus 383 over 12 years ago framework for large scale graph processing
Titan distributed graph database, built over Cassandra
Twitter FlockDB 3,337 over 7 years ago distributed graph database
NodeXL A free, open-source template for Microsoft® Excel® 2007, 2010, 2013 and 2016 that makes it easy to explore network graphs
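Pregel and its open-source relatives (Giraph above, Hama's BSP model in the frameworks list) run graph algorithms as a sequence of supersteps. In each superstep, every vertex reads the messages sent to it in the previous one, updates its state and sends messages along its edges, with a barrier between supersteps. The single-process Python sketch below propagates the maximum vertex value this way, a common introductory example. The function name and the graph are hypothetical, and the code uses no framework's API.

```python
def pregel_max(values: dict[str, int], edges: dict[str, list[str]]) -> dict[str, int]:
    """Propagate the maximum value through a directed graph in BSP supersteps.

    Assumes every edge target also appears as a key in `values`.
    """
    state = dict(values)
    # Superstep 0: every vertex sends its value to its out-neighbours.
    inbox: dict[str, list[int]] = {v: [] for v in state}
    for v in state:
        for dst in edges.get(v, []):
            inbox[dst].append(state[v])

    # Later supersteps: only vertices that received messages do any work.
    while any(inbox.values()):
        outbox: dict[str, list[int]] = {v: [] for v in state}
        for v, messages in inbox.items():
            if not messages:
                continue  # no messages: the vertex stays halted
            best = max(messages)
            if best > state[v]:
                state[v] = best
                for dst in edges.get(v, []):
                    outbox[dst].append(best)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return state


graph = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
print(pregel_max({"a": 3, "b": 6, "c": 2, "d": 1}, graph))
# {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

The loop ends when a superstep produces no messages, which mirrors Pregel's rule that a computation halts once all vertices are inactive and no messages are in flight.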

Awesome Big Data / Columnar Databases

Columnar Storage an explanation of what columnar storage is and when you might want it
Actian Vector column-oriented analytic database
ClickHouse an open-source column-oriented database management system that allows generating analytical data reports in real time
EventQL a distributed, column-oriented database built for large-scale event collection and analytics
MonetDB column store database
Parquet columnar storage format for Hadoop
Pivotal Greenplum purpose-built, dedicated analytic data warehouse that offers a columnar engine as well as a traditional row-based one
Vertica is designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses
SQream DB A GPU powered big data database, designed for analytics and data warehousing, with ANSI-92 compliant SQL, suitable for data sets from 10TB to 1PB
Google BigQuery Google's cloud offering backed by their pioneering work on Dremel
Amazon Redshift Amazon's cloud offering, also based on a columnar datastore backend
IndexR 453 almost 2 years ago an open-source columnar storage format for fast, real-time analytics with big data
LocustDB 1,614 about 2 months ago an experimental analytics database aiming to set a new standard for query performance on commodity hardware
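The following small Python sketch, which is not tied to any product listed above, makes the row-versus-column distinction concrete. It pivots records into one list per column. An aggregate over one column then touches only that column's values, and repeated values within a column compress well with run-length encoding, one of the encodings columnar formats such as Parquet support. The sample data and function names are hypothetical.

```python
from itertools import groupby


def to_columns(rows: list[dict]) -> dict[str, list]:
    # Pivot row-oriented records into one list per column.
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(column: list) -> list[tuple[object, int]]:
    # Collapse runs of identical consecutive values into (value, run_length) pairs.
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


rows = [
    {"country": "DE", "year": 2021, "revenue": 120},
    {"country": "DE", "year": 2022, "revenue": 80},
    {"country": "FR", "year": 2021, "revenue": 200},
    {"country": "FR", "year": 2022, "revenue": 50},
]
columns = to_columns(rows)

# The aggregate reads only the "revenue" column, not whole rows.
print(sum(columns["revenue"]))                # 450
print(run_length_encode(columns["country"]))  # [('DE', 2), ('FR', 2)]
```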

Awesome Big Data / NewSQL Databases

Actian Ingres commercially supported, open-source SQL relational database management system
ActorDB 1,893 almost 2 years ago a distributed SQL database with the scalability of a KV store, while keeping the query capabilities of a relational database
Amazon Redshift data warehouse service, based on PostgreSQL
BayesDB 889 about 9 years ago statistics-oriented SQL database
Bedrock a simple, modular, networked and distributed transaction layer built atop SQLite
CitusDB scales out PostgreSQL through sharding and replication
Cockroach 29,954 5 days ago Scalable, Geo-Replicated, Transactional Datastore
Comdb2 1,364 5 days ago a clustered RDBMS built on optimistic concurrency control techniques
Datomic distributed database designed to enable scalable, flexible and intelligent applications
FoundationDB distributed database, inspired by F1
Google F1 distributed SQL database built on Spanner
Google Spanner globally distributed semi-relational database
H-Store is an experimental main-memory, parallel database management system that is optimized for on-line transaction processing (OLTP) applications
Haeinsa 158 over 7 years ago linearly scalable multi-row, multi-table transaction library for HBase based on Percolator
HandlerSocket NoSQL plugin for MySQL/MariaDB
InfiniSQL infinitely scalable RDBMS
KarelDB 393 8 days ago a relational database backed by Apache Kafka
Map-D GPU in-memory database, big data analysis and visualization platform
MemSQL in-memory SQL database with optimized columnar storage on flash
NuoDB SQL/ACID compliant distributed database
Oracle TimesTen In-Memory Database in-memory relational database management system with persistence and recoverability
Pivotal GemFire XD Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS
SAP HANA is an in-memory, column-oriented, relational database management system
SenseiDB distributed, realtime, semi-structured database
Sky database used for flexible, high performance analysis of behavioral data
SymmetricDS open source software for both file and database synchronization
TiDB 36,985 5 days ago TiDB is a distributed SQL database. Inspired by the design of Google F1
VoltDB claims to be fastest in-memory database
yugabyteDB 8,901 5 days ago open source, high-performance, distributed SQL database compatible with PostgreSQL

Awesome Big Data / Time-Series Databases

Axibase Time Series Database Integrated time series database on top of HBase with built-in visualization, rule-engine and SQL support
Chronix a time series storage built to store time series in highly compressed form with fast access times
Cube uses MongoDB to store time series data
Heroic is a scalable time series database based on Cassandra and Elasticsearch
InfluxDB a time series database with optimised IO and queries, supports pgsql and influx wire protocols
QuestDB high-performance, open-source SQL database for applications in financial services, IoT, machine learning, DevOps and observability
IronDB scalable, general-purpose time series database
Kairosdb 1,738 5 months ago similar to OpenTSDB but can use Cassandra as its storage backend
M3DB a distributed time series database that can be used for storing real-time metrics with long retention
Newts a time series database based on Apache Cassandra
TDengine 23,269 5 days ago a time series database in C utilizing unique features of IoT to improve read/write throughput and reduce space needed to store data
OpenTSDB distributed time series database on top of HBase
Prometheus a time series database and service monitoring system
Beringei 3,173 about 6 years ago Facebook's in-memory time-series database
TrailDB an efficient tool for storing and querying series of events
Druid 13,429 5 days ago Column oriented distributed data store ideal for powering interactive applications
Riak-TS Riak TS is the only enterprise-grade NoSQL time series database optimized specifically for IoT and Time Series data
Akumuli 836 about 2 years ago Akumuli is a numeric time-series database. It can be used to capture, store and process time-series data in real time. The word "akumuli" can be translated from Esperanto as "accumulate"
Rhombus A time-series object store for Cassandra that handles all the complexity of building wide row indexes
Dalmatiner DB 695 over 5 years ago Fast distributed metrics database
Blueflood 597 about 2 months ago A distributed system designed to ingest and process time series data
Timely 377 3 months ago Timely is a time series database application that provides secure access to time series data based on Accumulo and Grafana
SiriDB 503 11 days ago Highly-scalable, robust and fast, open source time series database with cluster functionality
Thanos 13,014 8 days ago Thanos is a set of components to create a highly available metric system with unlimited storage capacity using multiple (existing) Prometheus deployments
VictoriaMetrics 12,018 5 days ago fast, scalable and resource-effective open-source TSDB compatible with Prometheus. Single-node and cluster versions included
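Several databases above advertise compact storage (Chronix, for example, stores time series "highly compressed"). A widely used trick for regularly spaced timestamps is delta-of-delta encoding. It stores the first timestamp, then the first gap, then only the change in gap, so a steady scrape interval turns into runs of zeros. The Python sketch below illustrates the idea with hypothetical function names. Real engines would additionally pack these small integers into variable-length bit codes, which this sketch omits.

```python
def encode_timestamps(ts: list[int]) -> list[int]:
    """Delta-of-delta encode a sorted list of integer timestamps."""
    if not ts:
        return []
    out = [ts[0]]  # first timestamp stored as-is
    prev_delta = 0
    for prev, cur in zip(ts, ts[1:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # change in gap; 0 for a steady interval
        prev_delta = delta
    return out


def decode_timestamps(encoded: list[int]) -> list[int]:
    """Invert encode_timestamps by re-accumulating gaps."""
    if not encoded:
        return []
    ts = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts


samples = [1000, 1060, 1120, 1180, 1241]  # 60 s interval with one second of jitter
print(encode_timestamps(samples))  # [1000, 60, 0, 0, 1]
```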

Awesome Big Data / SQL-like processing

Actian SQL for Hadoop high performance interactive SQL access to all Hadoop data
Apache Drill framework for interactive analysis, inspired by Dremel
Apache HCatalog table and storage management layer for Hadoop
Apache Hive SQL-like data warehouse system for Hadoop
Apache Calcite framework that allows efficient translation of queries involving heterogeneous and federated data
Apache Phoenix SQL skin over HBase
Aster Database SQL-like analytic processing for MapReduce
Cloudera Impala framework for interactive analysis, inspired by Dremel
Concurrent Lingual SQL-like query language for Cascading
Datasalt Splout SQL full SQL query engine for big datasets
Dremio an open-source, SQL-like Data-as-a-Service Platform based on Apache Arrow
Facebook PrestoDB distributed SQL query engine
Google BigQuery framework for interactive analysis, implementation of Dremel
Materialize 5,732 5 days ago is a streaming database for real-time applications using SQL for queries and supporting a large fraction of PostgreSQL
Invantive SQL SQL engine for online and on-premise use with integrated local data replication and 70+ connectors
PipelineDB an open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables
Pivotal HDB SQL-like data warehouse system for Hadoop
RainstorDB database for storing petabyte-scale volumes of structured and semi-structured data
Spark Catalyst 39,387 5 days ago is a Query Optimization Framework for Spark and Shark
SparkSQL Manipulating Structured Data Using Spark
Splice Machine a full-featured SQL-on-Hadoop RDBMS with ACID transactions
Stinger interactive query for Hive
Tajo distributed data warehouse system on Hadoop
Trafodion enterprise-class SQL-on-HBase solution targeting big data transactional or operational workloads

Awesome Big Data / Data Ingestion

redpanda A Kafka® replacement for mission-critical systems; 10x faster. Written in C++
Amazon Kinesis real-time processing of streaming data at massive scale
Amazon Web Services Glue serverless fully managed extract, transform, and load (ETL) service
Census A reverse ETL product that lets you sync data from your data warehouse to SaaS applications. No engineering favors required: just SQL
Apache Chukwa data collection system
Apache Flume service to manage large amounts of log data
Apache Kafka distributed publish-subscribe messaging system
Apache NiFi Apache NiFi is an integrated data logistics platform for automating the movement of data between disparate systems
Apache Pulsar 14,141 12 days ago a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API
Apache Sqoop tool to transfer data between Hadoop and a structured datastore
Embulk open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services
Facebook Scribe 3,923 about 4 years ago streamed log data aggregator
Fluentd tool to collect events and logs
Gazette 709 11 days ago Distributed streaming infrastructure built on cloud storage which makes it easy to mix and match batch and streaming paradigms
Google Photon geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency
Heka 3,390 9 months ago open source stream processing software system
HIHO 91 over 11 years ago framework for connecting disparate data sources with Hadoop
Kestrel distributed message queue system
LinkedIn Databus stream of change capture events for a database
LinkedIn Kamikaze 22 over 10 years ago utility package for compressing sorted integer arrays
LinkedIn White Elephant 191 almost 11 years ago log aggregator and dashboard
Logstash a tool for managing events and logs
Netflix Suro 794 over 1 year ago log aggregator like Storm and Samza, based on Chukwa
Pinterest Secor 1,845 8 days ago is a service implementing Kafka log persistence
Linkedin Gobblin 2,216 16 days ago LinkedIn's universal data ingestion framework
Skizze 771 over 8 years ago sketch data store to deal with all problems around counting and sketching using probabilistic data-structures
StreamSets Data Collector continuous big data ingest infrastructure with a simple-to-use IDE
Alooma data pipeline as a service for moving data from sources such as MySQL into data warehouses
RudderStack 4,056 5 days ago an open source customer data infrastructure (Segment, mParticle alternative) written in Go
Zilla 531 5 days ago An API gateway built for event-driven architectures and streaming that supports standard protocols such as HTTP, SSE, gRPC, MQTT and the native Kafka protocol
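Skizze, listed above, is a store for probabilistic sketches used for counting. One well-known structure of that kind is the Count-Min sketch, which estimates per-item frequencies in fixed memory and never under-counts. The Python sketch below is a generic implementation, not Skizze's API. The width and depth defaults, the SHA-256-based row hashing and the class name are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter in fixed memory (hypothetical class)."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        # `depth` rows, each with `width` counters.
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item: str):
        # Derive one hash per row by prefixing the row number before hashing.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate counters, so the row minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(item))


sketch = CountMinSketch()
for event in ["click"] * 5 + ["view"] * 3 + ["buy"]:
    sketch.add(event)
print(sketch.estimate("click"), sketch.estimate("view"), sketch.estimate("buy"))
```

Memory stays at width × depth counters however many distinct items arrive. The trade-off is that estimates may exceed the true count when items collide in every row.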

Awesome Big Data / Service Programming

Akka Toolkit runtime for distributed, fault-tolerant, event-driven applications on the JVM
Apache Avro data serialization system
Apache Curator Java libraries for Apache ZooKeeper
Apache Karaf OSGi runtime that runs on top of any OSGi framework
Apache Thrift framework to build binary protocols
Apache Zookeeper centralized service for process management
Google Chubby a lock service for loosely-coupled distributed systems
Hydrosphere Mist 326 almost 4 years ago a service for exposing Apache Spark analytics jobs and machine learning models as realtime, batch or reactive web services
Linkedin Norbert cluster manager
Mara 2,072 10 months ago A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
OpenMPI message passing framework
Serf decentralized solution for service discovery and orchestration
Spotify Luigi 17,746 11 days ago a Python package for building complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more
Spring XD 477 over 2 years ago distributed and extensible system for data ingestion, real time analytics, batch processing, and data export
Twitter Elephant Bird 1,139 over 1 year ago libraries for working with LZOP-compressed data
Twitter Finagle asynchronous network stack for the JVM

Awesome Big Data / Scheduling

Apache Airflow 36,519 4 days ago a platform to programmatically author, schedule and monitor workflows
Apache Aurora is a service scheduler that runs on top of Apache Mesos
Apache Falcon data management framework
Apache Oozie workflow job scheduler
Azure Data Factory cloud-based pipeline orchestration for on-prem, cloud and HDInsight
Chronos distributed and fault-tolerant scheduler
Cronicle 3,726 10 days ago Distributed, easy to install, NodeJS based, task scheduler
Dagster 11,237 5 days ago a data orchestrator for machine learning, analytics, and ETL
Linkedin Azkaban batch workflow job scheduler
Schedoscope 96 almost 5 years ago Scala DSL for agile scheduling of Hadoop jobs
Sparrow 319 about 4 years ago scheduling platform
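Workflow schedulers such as Apache Airflow model a pipeline as tasks with dependencies, and each task may start only after all of its upstream tasks have finished. That core ordering step can be sketched with Python's standard-library graphlib module (Python 3.9+). The task names and the run_pipeline helper are hypothetical, and none of the listed schedulers' APIs are used.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(dependencies: dict[str, set[str]], run_task: Callable[[str], None]) -> list[str]:
    """Run tasks in dependency order; `dependencies` maps task -> upstream tasks."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow contains a cycle
    order: list[str] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose upstreams are done
        for task in ready:  # a real scheduler could run this batch in parallel
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


pipeline = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}
print(run_pipeline(pipeline, run_task=lambda name: None))
# ['extract_orders', 'extract_users', 'transform', 'load_warehouse', 'refresh_dashboard']
```

Processing tasks in "ready" batches, rather than as one flat order, shows where a scheduler can run independent tasks concurrently, such as the two extract steps here.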

Awesome Big Data / Machine Learning

Azure ML Studio Cloud-based AzureML, R, Python Machine Learning platform
brain 8,006 about 4 years ago Neural networks in JavaScript
Oryx 1,786 about 3 years ago Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Concurrent Pattern machine learning library for Cascading
convnetjs 10,855 over 1 year ago Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser
DataVec A vectorization and data preprocessing library for deep learning in Java and Scala. Part of the Deeplearning4j ecosystem
Deeplearning4j Fast, open deep learning for the JVM (Java, Scala, Clojure). A neural network configuration layer powered by a C++ library. Uses Spark and Hadoop to train nets on multiple GPUs and CPUs
Decider 383 over 7 years ago Flexible and Extensible Machine Learning in Ruby
ENCOG machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data
etcML text classification with machine learning
Etsy Conjecture 361 over 6 years ago scalable Machine Learning in Scalding
Feast 5,514 5 days ago A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving
GraphLab Create A machine learning platform in Python with a broad collection of ML toolkits, data engineering, and deployment tools
H2O 6,873 5 days ago statistical, machine learning and math runtime with Hadoop. R and Python
Karate Club 2,142 3 months ago An unsupervised machine learning library for graph structured data. Python
Keras 61,683 5 days ago An intuitive neural net API inspired by Torch that runs atop Theano and TensorFlow
Lambdo 1 about 6 years ago Lambdo is a workflow engine which significantly simplifies the analysis process by unifying feature engineering and machine learning operations
Little Ball of Fur 701 8 months ago A subsampling library for graph structured data. Python
Mahout An Apache-backed machine learning library for Hadoop
MLbase distributed machine learning libraries for the BDAS stack
MLPNeuralNet 900 about 8 years ago Fast multilayer perceptron neural network library for iOS and Mac OS X
ML Workspace 3,405 2 months ago All-in-one web-based IDE specialized for machine learning and data science
MOA MOA performs big data stream mining in real time, and large scale machine learning
MonkeyLearn Text mining made easy. Extract and classify data from text
ND4J A matrix library for the JVM. Numpy for Java
nupic 6,335 about 1 year ago Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms
PredictionIO machine learning server built on Hadoop, Mahout and Cascading
PyTorch Geometric Temporal 2,628 4 months ago a temporal extension library for PyTorch Geometric
RL4J Reinforcement learning for Java and Scala. Includes Deep-Q learning and A3C algorithms, and integrates with OpenAI's Gym. Runs in the Deeplearning4j ecosystem
SAMOA distributed streaming machine learning framework
scikit-learn 59,578 5 days ago scikit-learn: machine learning in Python
Shapley 218 over 1 year ago A data-driven framework to quantify the value of classifiers in a machine learning ensemble
Spark MLlib a Spark implementation of some common machine learning (ML) functionality
Sibyl System for Large Scale Machine Learning at Google
TensorFlow 185,782 5 days ago Library from Google for machine learning using data flow graphs
Theano A Python-focused machine learning library supported by the University of Montreal
Torch A deep learning library with a Lua API, supported by NYU and Facebook
Velox 110 over 7 years ago System for serving machine learning predictions
Vowpal Wabbit 8,468 2 months ago learning system sponsored by Microsoft and Yahoo!
WEKA suite of machine learning software
BidMach 916 about 2 years ago CPU and GPU-accelerated Machine Learning Library

Awesome Big Data / Benchmarking

Apache Hadoop Benchmarking micro-benchmarks for testing Hadoop performance
Berkeley SWIM Benchmark real-world big data workload benchmark
Intel HiBench 1,447 7 months ago a Hadoop benchmark suite
PUMA Benchmarking benchmark suite for MapReduce applications
Yahoo Gridmix3 Hadoop cluster benchmarking from the Yahoo engineering team
Deeplearning4j Benchmarks
UCSB 49 about 1 year ago extended Yahoo Cloud Serving Benchmark for NoSQL databases

Awesome Big Data / Security

Apache Ranger Central security admin & fine-grained authorization for Hadoop
Apache Eagle real time monitoring solution
Apache Knox Gateway single point of secure access for Hadoop clusters
Apache Sentry security module for data stored in Hadoop
BDA 104 over 4 years ago The vulnerability detector for Hadoop and Spark

Awesome Big Data / System Deployment

Apache Ambari operational framework for Hadoop management
Apache Bigtop system deployment framework for the Hadoop ecosystem
Apache Helix cluster management framework
Apache Mesos cluster manager
Apache Slider 79 almost 6 years ago is a YARN application to deploy existing distributed applications on YARN
Apache Whirr set of libraries for running cloud services
Apache YARN Cluster manager
Brooklyn library that simplifies application deployment and management
Buildoop similar to Apache Bigtop, based on the Groovy language
Cloudera HUE web application for interacting with Hadoop
Facebook Prism multi-datacenter replication system
Google Borg job scheduling and monitoring system
Google Omega job scheduling and monitoring system
Hortonworks HOYA application that can deploy HBase clusters on YARN
Kubernetes a system for automating deployment, scaling, and management of containerized applications
Marathon 4,066 about 2 years ago Mesos framework for long-running services
Linkis 3,298 8 days ago Linkis helps easily connect to various back-end computation/storage engines

Awesome Big Data / Applications

411 971 over 1 year ago a web application for managing alerts generated by scheduled searches against Elasticsearch
Adobe spindle 331 over 9 years ago Next-generation web analytics processing with Scala, Spark, and Parquet
Apache Metron a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis
Apache Nutch open source web crawler
Apache OODT capturing, processing and sharing of data for NASA's scientific archives
Apache Tika content analysis toolkit
Argus 505 over 2 years ago Time series monitoring and alerting platform
AthenaX 1,222 over 4 years ago a streaming analytics platform that enables users to run production-quality, large scale streaming analytics using Structured Query Language (SQL)
Atlas 3,439 8 days ago a backend for managing dimensional time series data
Countly open source mobile and web analytics platform, based on Node.js & MongoDB
Domino Run, scale, share, and deploy models, without managing any infrastructure
Eclipse BIRT Eclipse-based reporting system
ElastAlert 7,991 about 2 months ago ElastAlert is a simple framework for alerting on anomalies, spikes, or other patterns of interest in data stored in Elasticsearch
Eventhub 1,335 over 2 years ago open source event analytics platform
HASH open source simulation and visualization platform
Hermes 811 8 days ago asynchronous message broker built on top of Kafka
Hunk Splunk analytics for Hadoop
Imhotep large-scale analytics platform by Indeed
Indicative Web & mobile analytics tool, with data warehouse (AWS, BigQuery) integration
Jupyter Notebook and project application for interactive data science and scientific computing across all programming languages
MADlib library for in-database analytics: runs data processing and machine learning directly inside an RDBMS (e.g. PostgreSQL)
Kapacitor 2,310 9 days ago an open source framework for processing, monitoring, and alerting on time series data
Kylin open source Distributed Analytics Engine from eBay
PivotalR 125 almost 2 years ago R on Pivotal HD / HAWQ and PostgreSQL
Rakam 798 almost 3 years ago open-source real-time custom analytics platform powered by PostgreSQL, Kinesis and PrestoDB
Qubole auto-scaling Hadoop cluster, built-in data connectors
SnappyData 1,039 almost 2 years ago a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) built on Spark in a single integrated cluster
Snowplow 6,823 about 1 month ago enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres
SparkR R frontend for Spark
Splunk analyzer for machine-generated data
Sumo Logic cloud based analyzer for machine-generated data
Substation 321 7 days ago Substation is a cloud native data pipeline and transformation toolkit written in Go
Talend unified open source environment for YARN, Hadoop, HBase, Hive, HCatalog & Pig

Awesome Big Data / Search engine and framework

Apache Lucene Search engine library
Apache Solr Search platform for Apache Lucene
Elassandra 1,714 6 months ago is a fork of Elasticsearch modified to run on top of Apache Cassandra in a scalable and resilient peer-to-peer architecture
ElasticSearch Search and analytics engine based on Apache Lucene
Enigma.io – Freemium robust web application for exploring, filtering, analyzing, searching and exporting massive datasets scraped from across the Web
Google Caffeine continuous indexing system
Google Percolator continuous indexing system
HBase Coprocessor implementation of Percolator, part of HBase
Lily HBase Indexer quickly and easily search for any content stored in HBase
LinkedIn Bobo is a Faceted Search implementation written purely in Java, an extension to Apache Lucene
LinkedIn Cleo 563 almost 11 years ago is a flexible software library for enabling rapid development of partial, out-of-order and real-time typeahead search
LinkedIn Galene search architecture at LinkedIn
LinkedIn Zoie 367 almost 2 years ago is a realtime search/indexing system written in Java
MG4J MG4J (Managing Gigabytes for Java) is a full-text search engine for large document collections written in Java. It is highly customisable, high-performance and provides state-of-the-art features and new research algorithms
Sphinx Search Server fulltext search engine
Vespa is an engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time
Facebook Faiss 30,701 9 days ago is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy
Annoy 13,140 2 months ago is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data
Weaviate 10,967 5 days ago Weaviate is a GraphQL-based semantic search engine with built-in (word) embeddings
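Faiss and Annoy both answer the same question: which stored vectors are closest to a query vector? As a rough sketch of that problem (not of either library's API or algorithm), the exact brute-force search below compares the query with every stored vector under Euclidean distance. Annoy approximates this result with a forest of random-projection trees, and Faiss offers both exact and approximate indexes, trading a little accuracy for much faster queries on large collections. The function names `l2_distance` and `knn_search` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(
    index: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return the k (id, distance) pairs closest to `query`, nearest first.

    Exact brute force: every stored vector is compared with the query,
    so each query costs O(n * d) for n vectors of dimension d.
    """
    # Score as (distance, id) so ties are broken by the smaller id.
    scored = ((l2_distance(vec, query), i) for i, vec in enumerate(index))
    best = heapq.nsmallest(k, scored)
    return [(i, d) for d, i in best]
```

For example, `knn_search([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], [0.9, 0.1], 2)` returns ids 1 and 0, in that order. The linear scan is exactly the cost that approximate indexes are designed to avoid once the collection no longer fits comfortably in a single pass per query.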

Awesome Big Data / MySQL forks and evolutions

Amazon RDS MySQL databases in Amazon's cloud
Drizzle evolution of MySQL 6.0
Google Cloud SQL MySQL databases in Google's cloud
MariaDB enhanced, drop-in replacement for MySQL
MySQL Cluster MySQL implementation using NDB Cluster storage engine
Percona Server enhanced, drop-in replacement for MySQL
ProxySQL 25 almost 7 years ago High Performance Proxy for MySQL
TokuDB TokuDB is a storage engine for MySQL and MariaDB
WebScaleSQL is a collaboration among engineers from several companies that face similar challenges in running MySQL at scale

Awesome Big Data / PostgreSQL forks and evolutions

HadoopDB hybrid of MapReduce and DBMS
IBM Netezza high-performance data warehouse appliances
Postgres-XL Scalable Open Source PostgreSQL-based Database Cluster
RecDB Open Source Recommendation Engine Built Entirely Inside PostgreSQL
Stado open source MPP database system solely targeted at data warehousing and data mart applications
Yahoo Everest multi-petabyte MPP database derived from PostgreSQL
TimescaleDB An open-source time-series database optimized for fast ingest and complex queries
PipelineDB The Streaming SQL Database. An open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables
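PipelineDB's description (SQL queries that run continuously on streams and incrementally store their results in tables) can be illustrated with a small in-memory analogue. The sketch below is hypothetical and does not use PipelineDB. It mimics a query like `SELECT key, count(*), avg(value) FROM events GROUP BY key` by keeping, per group key, only the running count and sum, so each arriving event updates the stored result in constant time instead of triggering a re-scan of the stream.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ContinuousAggregate:
    """Hypothetical in-memory analogue of a continuous GROUP BY query.

    Events are folded into per-key state as they arrive; the raw
    stream is never stored.
    """

    def __init__(self) -> None:
        # key -> [row count, running sum of values]
        self._state: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])

    def ingest(self, key: str, value: float) -> None:
        """Incrementally update the aggregate for one incoming event."""
        row = self._state[key]
        row[0] += 1
        row[1] += value

    def result(self) -> Dict[str, Tuple[int, float]]:
        """Current result table: key -> (count, average)."""
        return {k: (c, s / c) for k, (c, s) in self._state.items()}
```

Storing sum and count rather than the average itself is the usual design choice here: both combine by simple addition, so the average can always be derived exactly from the incremental state.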

Awesome Big Data / Memcached forks and evolutions

Facebook McDipper key/value cache for flash storage
Facebook Memcached fork of Memcache
Twemproxy 12,130 6 months ago A fast, light-weight proxy for memcached and redis
Twitter Fatcache 1,301 almost 3 years ago key/value cache for flash storage
Twitter Twemcache 929 almost 3 years ago fork of Memcache

Awesome Big Data / Embedded Databases

Actian PSQL ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications
BerkeleyDB a software library that provides a high-performance embedded database for key/value data
HanoiDB 306 about 8 years ago Erlang LSM BTree Storage
LevelDB 36,273 about 1 month ago a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values
LMDB ultra-fast, ultra-compact key-value embedded data store developed by Symas
RocksDB embeddable persistent key-value store for fast storage based on LevelDB
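LevelDB (and RocksDB, which started from it) keeps keys in sorted order, which is what makes range and prefix scans cheap. The toy class below sketches only that ordered-mapping data model in memory, using a sorted key list and binary search. It is a hypothetical illustration, not the LevelDB or RocksDB API, and it ignores persistence, the log-structured merge-tree storage layout, and custom comparators.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class OrderedKV:
    """Toy in-memory ordered key-value map (hypothetical, not LevelDB's API).

    Keys are kept sorted so that range scans return entries in key order.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []          # sorted list of keys
        self._values: Dict[str, str] = {}   # key -> value

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._values[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def delete(self, key: str) -> None:
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]
```

A scan such as `scan("user:", "user;")` emulates a prefix query: `;` is the character immediately after `:`, so the half-open range covers every key that starts with `user:`.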

Awesome Big Data / Business Intelligence

BIME Analytics business intelligence platform in the cloud
Blazer 4,512 8 days ago business intelligence made simple
Chartio lean business intelligence platform to visualize and explore your data
Count notebook-based analytics and visualisation platform using SQL or drag-and-drop
datapine self-service business intelligence tool in the cloud
Dekart Large scale geospatial analytics for Google BigQuery based on Kepler.gl
GoodData platform for data products and embedded analytics
Jaspersoft powerful business intelligence suite
Jedox Palo customisable Business Intelligence platform
Jethrodata Interactive Big Data Analytics
intermix.io Performance Monitoring for Amazon Redshift
Metabase 38,274 5 days ago The simplest, fastest way to get business intelligence and analytics to everyone in your company
Microsoft business intelligence software and platform
Microstrategy software platforms for business intelligence, mobile intelligence, and network applications
Numeracy Fast, clean SQL client and business intelligence
Pentaho business intelligence platform
Qlik business intelligence and analytics platform
Redash Open source business intelligence platform, supporting multiple data sources and scheduled queries
Saiku Analytics Open source analytics platform
Knowage open source business intelligence platform. (former )
SparklineData SNAP modern B.I platform powered by Apache Spark
Tableau business intelligence platform
Zoomdata Big Data Analytics

Awesome Big Data / Data Visualization

Airpal 2,760 over 3 years ago Web UI for PrestoDB
AnyChart fast, simple and flexible JavaScript (HTML5) charting library featuring pure JS API
Arbor 2,662 over 4 years ago graph visualization library using web workers and jQuery
Banana 668 about 2 months ago visualize logs and time-stamped data stored in Solr. Port of Kibana
Bloomery 17 over 7 years ago Web UI for Impala
Bokeh A powerful Python interactive visualization library that targets modern web browsers for presentation, with the goal of providing elegant, concise construction of novel graphics in the style of D3.js, but also delivering this capability with high-performance interactivity over very large or streaming datasets
C3 D3-based reusable chart library
CartoDB 2,748 4 months ago open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API
chartd responsive, retina-compatible charts with just an img tag
Chart.js open source HTML5 Charts visualizations
Chartist.js 70 5 months ago another open source HTML5 Charts visualization
Crossfilter JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js
Cubism 4,943 over 1 year ago JavaScript library for time series visualization
Cytoscape JavaScript library for visualizing complex networks
DC.js Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3
D3 JavaScript library for manipulating documents based on data
D3.compose 698 almost 2 years ago Compose complex, data-driven visualizations from reusable charts and components
D3Plus A fairly robust set of reusable charts and styles for d3.js
Dash 21,250 15 days ago Analytical Web Apps for Python, R, Julia, and Jupyter. Built on top of plotly, no JS required
Dekart Large scale geospatial analytics for Google BigQuery based on Kepler.gl
DevExtreme React Chart High-performance plugin-based React chart for Bootstrap and Material Design
Echarts 60,265 5 days ago Baidu's enterprise charts
Envisionjs 1,563 over 4 years ago dynamic HTML5 visualization
FnordMetric write SQL queries that return SVG charts rather than tables
Frappe Charts GitHub-inspired simple and modern SVG charts for the web with zero dependencies
Freeboard 6,438 about 1 year ago open source real-time dashboard builder for IoT and other web mashups
Gephi 5,880 27 days ago An award-winning open-source platform for visualizing and manipulating large graphs and network connections. It's like Photoshop, but for graphs. Available for Windows and Mac OS X
Google Charts simple charting API
Grafana Graphite dashboard frontend, editor and graph composer
Graphite scalable real-time graphing
Highcharts simple and flexible charting API
IPython provides a rich architecture for interactive computing
Kibana visualize logs and time-stamped data
Lumify open source big data analysis and visualization platform
Matplotlib 20,048 6 days ago plotting with Python
MetricsGraphics.js a library built on top of D3 that is optimized for time-series data
NVD3 chart components for d3.js
Peity 4,218 6 months ago Progressive SVG bar, line and pie charts
Plot.ly Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly's online spreadsheet. Fork others' plots
Plotly.js 16,922 8 days ago The open source javascript graphing library that powers plotly
Recline 2,194 15 days ago simple but powerful library for building data applications in pure Javascript and HTML
Redash 26,051 16 days ago open-source platform to query and visualize data
ReCharts A composable charting library built on React components
Shiny a web application framework for R
Sigma.js 11,231 5 days ago JavaScript library dedicated to graph drawing
Superset 62,043 3 days ago a data exploration platform designed to be visual, intuitive and interactive, making it easy to slice, dice and visualize data and perform analytics at the speed of thought
Vega 11,180 10 days ago a visualization grammar
Zeppelin 412 about 7 years ago a notebook-style tool for collaborative data analysis
Zing Charts JavaScript charting library for big data
DataSphere Studio 3,054 3 months ago one-stop data application development management portal

Awesome Big Data / Internet of things and sensor data

Apache Edgent (Incubating) a programming model and micro-kernel style runtime that can be embedded in gateways and small footprint edge devices enabling local, real-time, analytics on the edge devices
Azure IoT Hub Cloud-based bi-directional monitoring and messaging hub
TempoIQ Cloud-based sensor analytics
2lemetry Platform for Internet of things
Pubnub Data stream network
ThingWorx Rapid development and connection of intelligent systems
IFTTT If this then that
Evrything Making products smart
NetLytics 9 over 6 years ago Analytics platform to process network data on Spark
Ably Pub/sub messaging platform for IoT

Awesome Big Data / Interesting Readings

Big Data Benchmark Benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez
NoSQL Comparison Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison
Monitoring Kafka performance Guide to monitoring Apache Kafka, including native methods for metrics collection
Monitoring Hadoop performance Guide to monitoring Hadoop, with an overview of Hadoop architecture, and native methods for metrics collection
Monitoring Cassandra performance Guide to monitoring Cassandra, including native methods for metrics collection

Awesome Big Data / Interesting Papers / 2015 - 2016

2015 - One Trillion Edges: Graph Processing at Facebook-Scale

Awesome Big Data / Interesting Papers / 2013 - 2014

2014 - Mining of Massive Datasets
2013 - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
2013 - MLbase: A Distributed Machine-learning System
2013 - Shark: SQL and Rich Analytics at Scale
2013 - GraphX: A Resilient Distributed Graph System on Spark
2013 - HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm
2013 - Scalable Progressive Analytics on Big Data in the Cloud
2013 - Druid: A Real-time Analytical Data Store
2013 - Online, Asynchronous Schema Change in F1
2013 - F1: A Distributed SQL Database That Scales
2013 - MillWheel: Fault-Tolerant Stream Processing at Internet Scale
2013 - Scuba: Diving into Data at Facebook
2013 - Unicorn: A System for Searching the Social Graph
2013 - Scaling Memcache at Facebook

Awesome Big Data / Interesting Papers / 2011 - 2012

2012 - The Unified Logging Infrastructure for Data Analytics at Twitter
2012 - Blink and It’s Done: Interactive Queries on Very Large Data
2012 - Fast and Interactive Analytics over Hadoop Data with Spark
2012 - Shark: Fast Data Analysis Using Coarse-grained Distributed Memory
2012 - Paxos Replicated State Machines as the Basis of a High-Performance Data Store
2012 - Paxos Made Parallel
2012 - BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data
2012 - Processing a trillion cells per mouse click
2012 - Spanner: Google’s Globally-Distributed Database
2011 - Scarlett: Coping with Skewed Popularity Content in MapReduce Clusters
2011 - Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
2011 - Megastore: Providing Scalable, Highly Available Storage for Interactive Services

Awesome Big Data / Interesting Papers / 2001 - 2010

2010 - Finding a needle in Haystack: Facebook’s photo storage
2010 - Spark: Cluster Computing with Working Sets
2010 - Pregel: A System for Large-Scale Graph Processing
2010 - Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine)
2010 - Dremel: Interactive Analysis of Web-Scale Datasets
2010 - S4: Distributed Stream Computing Platform
2009 - HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
2008 - Chukwa: A large-scale monitoring system
2007 - Dynamo: Amazon’s Highly Available Key-value Store
2006 - The Chubby lock service for loosely-coupled distributed systems
2006 - Bigtable: A Distributed Storage System for Structured Data
2004 - MapReduce: Simplified Data Processing on Large Clusters
2003 - The Google File System

Awesome Big Data / Videos

Spark in Motion Spark in Motion teaches you how to use Spark for batch and streaming data analytics
Machine Learning, Data Science and Deep Learning with Python LiveVideo tutorial that covers machine learning, Tensorflow, artificial intelligence, and neural networks
Data warehouse schema design - dimensional modeling and star schema Introduction to schema design for data warehouse using the star schema method
Elasticsearch 7 and Elastic Stack LiveVideo tutorial that covers searching, analyzing, and visualizing big data on a cluster with Elasticsearch, Logstash, Beats, Kibana, and more

Awesome Big Data / Books

Data Science at Scale with Python and Dask Data Science at Scale with Python and Dask teaches you how to build distributed data projects that can handle huge amounts of data
Streaming Data Streaming Data introduces the concepts and requirements of streaming and real-time data systems
Storm Applied Storm Applied is a practical guide to using Apache Storm for the real-world tasks associated with processing and analyzing real-time data streams
Fundamentals of Stream Processing: Application Design, Systems, and Analytics This comprehensive, hands-on guide combining the fundamental building blocks and emerging research in stream processing is ideal for application designers, system builders, analytic developers, as well as students and researchers in the field
Stream Data Processing: A Quality of Service Perspective Presents a new paradigm suitable for stream and complex event processing
Unified Log Processing Unified Log Processing is a practical guide to implementing a unified log of event streams (Kafka or Kinesis) in your business
Kafka Streams in Action Kafka Streams in Action teaches you everything you need to know to implement stream processing on data flowing into your Kafka platform, allowing you to focus on getting more from your data without sacrificing time or effort
Big Data Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data
Spark in Action Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0
Kafka in Action Kafka in Action is a fast-paced introduction to every aspect of working with Kafka you need to really reap its benefits
Fusion in Action Fusion in Action teaches you to build a full-featured data analytics pipeline, including document and data search and distributed data clustering
Reactive Data Handling Reactive Data Handling is a collection of five hand-picked chapters, selected by Manuel Bernhardt, that introduce you to building reactive applications capable of real-time processing under large data loads. Free eBook!
Azure Data Engineering A book about data engineering in general and the Azure platform specifically
Grokking Streaming Systems Grokking Streaming Systems helps you unravel what streaming systems are, how they work, and whether they’re right for your business. Written to be tool-agnostic, you’ll be able to apply what you learn no matter which framework you choose
Distributed Systems for fun and profit – Theory of distributed systems. Includes parts about time and ordering, replication and impossibility results
Graph-Powered Machine Learning Alessandro Negro. Combine graph theory and models to improve machine learning projects

Awesome Big Data / Books / Data Visualization

The beauty of data visualization
Designing Data Visualizations with Noah Iliinsky
Hans Rosling's 200 Countries, 200 Years, 4 Minutes
Ice Bucket Challenge Data Visualization

Other Awesome Lists

awesome-awesomeness 31,706 4 months ago Other awesome lists
awesome 327,194 26 days ago Even more lists
list 9,927 5 days ago Another list?
awesome-awesome-awesome 1,915 11 months ago WTF!
awesome-analytics 3,908 5 months ago Analytics
awesome-public-datasets 60,356 29 days ago Public Datasets
awesome-graph-classification 4,741 over 1 year ago Graph Classification
awesome-network-embedding 2,584 almost 4 years ago Network Embedding
awesome-community-detection 2,322 7 months ago Community Detection
awesome-decision-tree-papers 2,369 7 months ago Decision Tree Papers
awesome-fraud-detection-papers 1,605 7 months ago Fraud Detection Papers
awesome-gradient-boosting-papers 997 7 months ago Gradient Boosting Papers
awesome-monte-carlo-tree-search-papers 632 7 months ago Monte Carlo Tree Search Papers
awesome-kafka 204 8 months ago Kafka
Google Bigtable 49 about 2 years ago

Backlinks from these awesome lists: