laser

HPCLib

A high-performance computing library providing optimized primitives for tensor and matrix operations

The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers

GitHub

281 stars
14 watching
15 forks
Language: Nim
Last commit: about 1 year ago
Linked from 1 awesome list

Tags: assembler, blas, compiler-optimization, convolution, deep-learning, gemm, high-performance-computing, jit, matrix-multiplication, openmp, parallel, runtime-cpu-detection, simd, tensor

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mratsim/arraymancer | A fast and ergonomic tensor library with automatic differentiation support for deep learning on multiple platforms. | 1,342 |
| mratsim/constantine | A high-performance cryptography library for cryptographic primitives and protocols used in blockchain and zero-knowledge proof systems | 417 |
| clmathlibraries/clblas | A software library that enables developers to tap into the performance benefits of heterogeneous computing by providing an OpenCL interface for BLAS functions. | 845 |
| laurentmazare/ocaml-torch | Bindings for PyTorch's tensor library in OCaml for GPU acceleration and automatic differentiation | 416 |
| biddata/bidmat | A high-performance matrix library with CPU and GPU acceleration for data mining applications. | 265 |
| blas-lapack-rs/accelerate-src | Provides optimized linear algebra and numerical computing capabilities via the Accelerate framework | 17 |
| akabe/slap | A linear algebra library with type-based static size checking for matrix operations. | 88 |
| mmottl/lacaml | An OCaml interface to high-performance linear algebra libraries (BLAS/LAPACK) for numerical computations. | 128 |
| numpi/hm-toolbox | A toolbox implementing arithmetic operations for HODLR and HSS matrices in MATLAB. | 44 |
| nvidia/matx | A C++17 GPU-accelerated numerical computing library with Python-like syntax | 1,229 |
| open-mmlab/mmengine | Provides a flexible and configurable framework for training deep learning models with PyTorch. | 1,196 |
| versilov/matrex | A fast and efficient matrix library for Elixir/Erlang with C implementation using CBLAS. | 479 |
| hkust-knowcomp/r-net | An implementation of R-NET, a machine reading comprehension model using scaled multiplicative attention and variational dropout. | 578 |
| uncomplicate/neanderthal | A Clojure library providing optimized native libraries for fast matrix and linear algebra computations on CPU and GPU. | 1,079 |
| ist-daslab/marlin | An optimized FP16xINT4 matrix multiplication kernel for large language models | 655 |