ThunderKittens

GPU kernel framework

A framework for writing fast deep learning kernels on NVIDIA GPUs

Tile primitives for speedy kernels

GitHub

2k stars
29 watching
70 forks
Language: CUDA
Last commit: 6 days ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| xtra-computing/thundergbm | Accelerates machine learning algorithms on GPUs to improve performance and efficiency. | 693 |
| komputeproject/kompute | A flexible GPU compute framework providing low-level access to Vulkan for optimized and parallel processing on various graphics cards. | 1,997 |
| hannes-brt/hebel | A Python library for GPU-accelerated deep learning. | 1,169 |
| cg-tuwien/auto-vk-toolkit | A C++ framework for creating Vulkan-based graphics applications with built-in support for various features and tools. | 406 |
| fsole/brokkr | A Vulkan framework for building Windows-based graphics applications using C++. | 88 |
| can-lehmann/owlkettle | A declarative user interface framework built on top of GTK 4. | 383 |
| glavnokoman/vuh | A Vulkan-based framework for accelerating computations on graphics processing units. | 347 |
| coolbutuseless/devoutpdf | A custom PDF graphics device for R, providing fine-grained control over output and serving as a learning tool for graphics device implementation. | 8 |
| coreylowman/dfdx | A deep learning library for Rust with GPU acceleration and an ergonomic API. | 1,737 |
| jgbit/vuda | Provides a Vulkan-based interface to CUDA's runtime API for GPU-accelerated applications. | 864 |
| kwotsin/tensorflow-xception | An implementation of a deep learning model for computer vision tasks using TensorFlow. | 207 |
| cg-tuwien/vulkanlaunchpad | A Vulkan-based framework for beginners to learn and develop 3D graphics applications. | 62 |
| denizyuret/knet.jl | A deep learning framework implemented in Julia for automatic differentiation and GPU operation. | 1,431 |
| michaldrobot/shaderfastlibs | Optimized shader libraries for fast operations on AMD GCN architecture. | 358 |
| keras-team/keras | A high-level deep learning framework for building and training neural networks on multiple backend engines. | 62,115 |