Achieved native F16 (half-precision) weight storage running on the GPU via OpenCL through HAT, working around a Babylon codegen bug in F16.f16ToFloat(). The fix is a single-line change that extracts the ...