dumbo

Hadoop tool

Python module that provides an API for easily writing and running Hadoop programs.

GitHub

1k stars
62 watching
146 forks
Language: Python
Last commit: about 7 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| bwhite/hadoopy | A Python MapReduce library written in Cython for efficient data processing on Hadoop clusters. | 243 |
| swyxio/swyxio | A Python project focused on GitHub and DevRel, with the goal of providing resources and support for developers. | 111 |
| damballa/parkour | A Clojure-based library for writing efficient MapReduce programs on the Hadoop platform. | 257 |
| helgeho/hadoopconcatgz | Provides a custom input format for handling concatenated GZIP files in distributed processing systems like Hadoop. | 9 |
| bbva/kapow | An HTTP microframework allowing developers to easily expose scripts as APIs and restrict execution. | 614 |
| swaroopch/byte-of-python | A beginner's guide to the Python programming language. | 2,322 |
| jhamrick/nbflow | Tool that supports reproducible workflows with Jupyter Notebooks and SCons. | 160 |
| mzero/haskell-amuse-bouche | A collection of Haskell code examples and resources illustrating the language's features and programming techniques. | 114 |
| pawegio/kandroid | A Kotlin library that provides useful extensions to eliminate boilerplate code in Android development. | 894 |
| netflix-skunkworks/cloudaux | Provides a unified interface to various cloud providers. | 76 |
| harisekhon/devops-python-tools | Tools for managing and automating DevOps tasks, data processing, and cloud infrastructure using Python. | 783 |
| kwpolska/pkgbuilder | An AUR helper and library that automates the process of building and installing Arch Linux packages from source. | 71 |
| halcy/mastodon.py | A Python wrapper for the Mastodon API allowing developers to interact with the social media platform's public and private APIs. | 889 |
| jedie/django-kippo | An integration layer for the kippo SSH honeypot with Django's administrative interface. | 12 |
| clusto/clusto | Tool for managing infrastructure clusters by tracking inventory, connections, and abstracting interactions with infrastructure elements. | 291 |