laser
HPCLib
A high-performance computing library providing optimized primitives for tensor and matrix operations
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
278 stars
14 watching
15 forks
Language: Nim
last commit: 11 months ago
Linked from 1 awesome list
assembler, blas, compiler-optimization, convolution, deep-learning, gemm, high-performance-computing, jit, matrix-multiplication, openmp, parallel, runtime-cpu-detection, simd, tensor
Related projects:
Repository | Description | Stars |
---|---|---|
mratsim/arraymancer | A fast and ergonomic tensor library with automatic differentiation support for deep learning on multiple platforms. | 1,338 |
mratsim/constantine | A high-performance cryptography library for cryptographic primitives and protocols used in blockchain and zero-knowledge proof systems. | 408 |
clmathlibraries/clblas | A software library that enables developers to tap into the performance benefits of heterogeneous computing by providing an OpenCL interface for BLAS functions. | 843 |
laurentmazare/ocaml-torch | Bindings for PyTorch's tensor library in OCaml for GPU acceleration and automatic differentiation. | 413 |
biddata/bidmat | A high-performance matrix library with CPU and GPU acceleration for data mining applications. | 265 |
blas-lapack-rs/accelerate-src | Provides optimized linear algebra and numerical computing capabilities via the Accelerate framework | 17 |
akabe/slap | A linear algebra library with type-based static size checking for matrix operations. | 88 |
mmottl/lacaml | An OCaml interface to widely used linear algebra libraries for high-performance numerical computations. | 128 |
numpi/hm-toolbox | A toolbox implementing arithmetic operations for HODLR and HSS matrices in MATLAB. | 43 |
nvidia/matx | A C++17 GPU-accelerated numerical computing library with Python-like syntax. | 1,220 |
open-mmlab/mmengine | Provides a flexible and configurable framework for training deep learning models with PyTorch. | 1,179 |
versilov/matrex | A fast and efficient matrix library for Elixir/Erlang with C implementation using CBLAS. | 478 |
hkust-knowcomp/r-net | An implementation of R-Net, a machine reading comprehension model using TensorFlow. | 578 |
uncomplicate/neanderthal | A Clojure library providing optimized native libraries for fast matrix and linear algebra computations on CPU and GPU. | 1,076 |
ist-daslab/marlin | An optimized FP16xINT4 matrix multiplication kernel for large language models. | 624 |