Native ROCm C++ kernels for Strix Halo (gfx1151): ternary BitNet GEMV, RMSNorm, RoPE, split-KV Flash-Decoding attention. Zero hipBLAS, zero Python.
amd hip gpu-computing rocm cpp20 bitnet llm-inference flash-decoding strix-halo gfx1151 1-58-bit ternary-llm
-
Updated
Apr 29, 2026 - C++