marlin
Matrix multiplier
An optimized FP16xINT4 matrix multiplication kernel for large language models
An FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
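A rough way to see where the ideal ~4x figure comes from: INT4 weights occupy a quarter of the bits of FP16 weights, so if loading weights dominates the runtime (an assumption that is usually made for small batch sizes), the best-case speedup is 16/4 = 4. The following is a plain-Python reference sketch of weight-only 4-bit matrix-vector multiplication. It is not Marlin's CUDA kernel or its API; the function names and the symmetric per-column quantization scheme are hypothetical and chosen only for illustration.

```python
# Reference sketch only: plain Python, not the Marlin CUDA kernel.
# All names and the per-column symmetric scheme are illustrative assumptions.

def quantize_int4(column):
    """Symmetric 4-bit quantization of one weight column; returns (codes, scale)."""
    max_abs = max(abs(w) for w in column) or 1.0
    scale = max_abs / 7.0  # map [-max_abs, max_abs] onto [-7, 7]
    # Clamp to the signed 4-bit range, then shift by 8 to store as 0..15.
    codes = [max(-8, min(7, round(w / scale))) + 8 for w in column]
    return codes, scale

def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [8]  # pad with the code that represents zero
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))

def unpack_nibbles(packed, n):
    """Recover the first n 4-bit codes from packed bytes."""
    codes = []
    for b in packed:
        codes.append(b & 0xF)
        codes.append(b >> 4)
    return codes[:n]

def matvec_w4(x, packed_columns, scales, k):
    """Compute y[j] = sum_i x[i] * W[i][j], dequantizing W on the fly."""
    out = []
    for packed, scale in zip(packed_columns, scales):
        codes = unpack_nibbles(packed, k)
        out.append(sum(xi * (c - 8) * scale for xi, c in zip(x, codes)))
    return out

# Toy weight matrix with K=3 rows and N=2 columns.
weights = [[0.5, -1.0], [0.25, 0.75], [-0.5, 0.1]]
columns = [list(col) for col in zip(*weights)]
quantized = [quantize_int4(col) for col in columns]
packed_columns = [pack_nibbles(codes) for codes, _ in quantized]
scales = [scale for _, scale in quantized]

x = [1.0, 2.0, 3.0]
y_quant = matvec_w4(x, packed_columns, scales, k=3)
y_ref = [sum(xi * w for xi, w in zip(x, col)) for col in columns]

# Best-case speedup when runtime is dominated by reading weights.
bits_fp16, bits_int4 = 16, 4
ideal_speedup = bits_fp16 / bits_int4
```

Comparing `y_quant` with `y_ref` shows the rounding error introduced by 4-bit weights, while `ideal_speedup` captures only the memory-traffic argument; a real kernel also has to hide the unpacking and dequantization cost to get close to that bound.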
624 stars
15 watching
47 forks
Language: Python
last commit: 3 months ago
Tags: 4bit, kernel, llm, quantization
Related projects:
Repository | Description | Stars |
---|---|---|
microsoft/deepspeed-mii | A Python library designed to accelerate model inference with high throughput and low latency | 1,898 |
jalawson/ulinalg | A small matrix-handling module with linear algebra operations for MicroPython (Python3) | 32 |
davidstutz/matlab-mnist-two-layer-perceptron | A Matlab implementation of a two-layer perceptron to recognize handwritten digits from the MNIST dataset. | 60 |
google/gemmlowp | A small C++ library for low-precision matrix multiplication | 1,779 |
janbednarik/micropython-matrix8x8 | A Python driver for an 8x8 LED Matrix display using I2C communication | 15 |
microsoft/bitblas | A library for efficient mixed-precision matrix multiplications on GPUs for deep learning models | 420 |
mratsim/laser | A high-performance computing library providing optimized primitives for tensor and matrix operations | 278 |
jjjkkkjjj/matft | A Numpy-like library in Swift for multi-dimensional array and matrix operations | 133 |
iyassou/umatrix | A library providing basic matrix arithmetic operations and functions for the MicroPython language. | 15 |
intel/neural-compressor | Tools and techniques for optimizing large language models on various frameworks and hardware platforms. | 2,226 |
versilov/matrex | A fast and efficient matrix library for Elixir/Erlang with C implementation using CBLAS. | 478 |
akabe/slap | A linear algebra library with type-based static size checking for matrix operations. | 88 |
tlk00/bitmagic | A C++ library for compact data structures and algorithms optimized for memory efficiency and high performance | 412 |
uncomplicate/neanderthal | A Clojure library providing optimized native libraries for fast matrix and linear algebra computations on CPU and GPU. | 1,076 |
numpi/hm-toolbox | A toolbox implementing arithmetic operations for HODLR and HSS matrices in MATLAB. | 43 |