laser

HPCLib

A high-performance computing library providing optimized primitives for tensor and matrix operations

The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers

GitHub

278 stars
14 watching
15 forks
Language: Nim
Last commit: 11 months ago
Linked from 1 awesome list

Topics: assembler, blas, compiler-optimization, convolution, deep-learning, gemm, high-performance-computing, jit, matrix-multiplication, openmp, parallel, runtime-cpu-detection, simd, tensor


Related projects:

| Repository | Description | Stars |
|---|---|---|
| mratsim/arraymancer | A fast and ergonomic tensor library with automatic differentiation support for deep learning on multiple platforms. | 1,338 |
| mratsim/constantine | A high-performance cryptography library for cryptographic primitives and protocols used in blockchain and zero-knowledge proof systems. | 408 |
| clmathlibraries/clblas | A software library that enables developers to tap into the performance benefits of heterogeneous computing by providing an OpenCL interface for BLAS functions. | 843 |
| laurentmazare/ocaml-torch | Bindings for PyTorch's tensor library in OCaml for GPU acceleration and automatic differentiation. | 413 |
| biddata/bidmat | A high-performance matrix library with CPU and GPU acceleration for data mining applications. | 265 |
| blas-lapack-rs/accelerate-src | Provides optimized linear algebra and numerical computing capabilities via the Accelerate framework. | 17 |
| akabe/slap | A linear algebra library with type-based static size checking for matrix operations. | 88 |
| mmottl/lacaml | An OCaml interface to widely used linear algebra libraries for high-performance numerical computations. | 128 |
| numpi/hm-toolbox | A toolbox implementing arithmetic operations for HODLR and HSS matrices in MATLAB. | 43 |
| nvidia/matx | A C++17 GPU-accelerated numerical computing library with Python-like syntax. | 1,220 |
| open-mmlab/mmengine | Provides a flexible and configurable framework for training deep learning models with PyTorch. | 1,179 |
| versilov/matrex | A fast and efficient matrix library for Elixir/Erlang with C implementation using CBLAS. | 478 |
| hkust-knowcomp/r-net | An implementation of R-Net, a machine reading comprehension model using TensorFlow. | 578 |
| uncomplicate/neanderthal | A Clojure library providing optimized native libraries for fast matrix and linear algebra computations on CPU and GPU. | 1,076 |
| ist-daslab/marlin | An optimized FP16xINT4 matrix multiplication kernel for large language models. | 624 |