CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua… ...
Abstract: The Non-Maximum Suppression (NMS) algorithm is an important post-processing step in object detection networks for various applications [1]. The standard NMS procedure suffers from poor time ...
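For readers unfamiliar with the standard NMS procedure the abstract refers to, the following is a minimal sketch of the usual greedy formulation. It is not the method proposed in the paper. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 >= x1, y2 >= y1.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept, in descending score order."""
    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    example_scores = [0.9, 0.8, 0.7]
    print(greedy_nms(example_boxes, example_scores))
```

Each candidate is compared against every box already kept, so the number of IoU evaluations grows quadratically with the number of detections in the worst case, and each decision depends on the ones before it.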
That's it. The skill file teaches the agent how to install the binary and use all commands. Technical details: OfficeCLI ships with a SKILL.md (239 lines, ~8K tokens) that covers command syntax, ...