marlin
Matrix multiplier
An optimized FP16xINT4 matrix multiplication kernel for large language models
An FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
655 stars
15 watching
52 forks
Language: Python
last commit: 6 months ago
Tags: 4bit, kernel, llm, quantization
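To make the "FP16xINT4" in the description concrete, here is a minimal NumPy sketch of what such a product computes: FP16 activations multiplied by weights stored as packed 4-bit integers with per-group scales. This is a reference illustration only, not Marlin's API or its CUDA kernel. The function names (`quantize_int4`, `pack_int4`, `unpack_int4`, `matmul_fp16_int4`), the symmetric quantization scheme, the nibble layout, and the group size of 128 are all assumptions made for this example.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Symmetric per-group 4-bit quantization along the input (K) dimension.

    Hypothetical helper for illustration; K must be a multiple of group_size.
    """
    k, n = w.shape
    groups = w.reshape(k // group_size, group_size, n).astype(np.float32)
    # One scale per (group, output column): the largest magnitude maps to 7.
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales = np.maximum(scales, 1e-8)  # avoid division by zero
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q.reshape(k, n), scales.squeeze(1).astype(np.float16)

def pack_int4(q):
    """Pack two signed 4-bit values per byte (even rows in the low nibble)."""
    u = (q + 8).astype(np.uint8)  # shift -8..7 into 0..15
    return u[0::2] | (u[1::2] << 4)

def unpack_int4(packed):
    """Inverse of pack_int4: recover signed 4-bit values as int8."""
    k2, n = packed.shape
    u = np.empty((k2 * 2, n), dtype=np.uint8)
    u[0::2] = packed & 0x0F
    u[1::2] = packed >> 4
    return u.astype(np.int8) - 8

def matmul_fp16_int4(a, packed, scales, group_size=128):
    """Dequantize the INT4 weights, then multiply with FP16 activations."""
    q = unpack_int4(packed)
    w = q.astype(np.float32) * np.repeat(scales.astype(np.float32), group_size, axis=0)
    # Accumulate in float32, return FP16 like the activations.
    return (a.astype(np.float32) @ w).astype(np.float16)

rng = np.random.default_rng(0)
a = rng.standard_normal((16, 256)).astype(np.float16)  # a batch of 16 tokens
w = rng.standard_normal((256, 64)).astype(np.float16)  # FP16 reference weights

q, scales = quantize_int4(w)
packed = pack_int4(q)
out = matmul_fp16_int4(a, packed, scales)
print(out.shape, w.nbytes, packed.nbytes)
```

Ignoring the small overhead of the scales, the packed weights occupy a quarter of the bytes of the FP16 weights. When a matrix multiplication at small batch sizes is limited by how fast weights can be read from memory, that 4x reduction is a plausible source of the ~4x upper bound quoted above.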
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Python library designed to accelerate model inference with high throughput and low latency | 1,924 |
| | A small matrix handling module with linear algebra operations for MicroPython (Python 3) | 32 |
| | A MATLAB implementation of a two-layer perceptron to recognize handwritten digits from the MNIST dataset | 60 |
| | A small C++ library for low-precision matrix multiplication | 1,782 |
| | A Python driver for an 8x8 LED matrix display using I2C communication | 15 |
| | A library to support efficient mixed-precision matrix multiplications on GPUs for deep learning model deployment | 445 |
| | A high-performance computing library providing optimized primitives for tensor and matrix operations | 281 |
| | A NumPy-like library in Swift for multi-dimensional array and matrix operations | 135 |
| | A library providing basic matrix arithmetic operations and functions for the MicroPython language | 15 |
| | Tools and techniques for optimizing large language models on various frameworks and hardware platforms | 2,257 |
| | A fast and efficient matrix library for Elixir/Erlang with a C implementation using CBLAS | 479 |
| | A linear algebra library with type-based static size checking for matrix operations | 88 |
| | A C++ library for compact data structures and algorithms optimized for memory efficiency and high performance | 413 |
| | A Clojure library providing optimized native libraries for fast matrix and linear algebra computations on CPU and GPU | 1,079 |
| | A toolbox implementing arithmetic operations for HODLR and HSS matrices in MATLAB | 44 |