`gl.nvidia.blackwell.tma.async_scatter` functions, respectively. TMA gather and scatter operations support only 2D tensor descriptors, and the first dimension of the block shape must be 1. Gather ...
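The two shape rules for gather and scatter can be written as a plain Python check that runs before a descriptor is built. This is a minimal sketch: `check_tma_gather_scatter_block_shape` is a hypothetical helper, not part of the Gluon API. It encodes only the two rules stated here (a 2D descriptor, with a leading block dimension of 1) and does not model any other TMA requirement.

```python
from typing import Sequence


def check_tma_gather_scatter_block_shape(block_shape: Sequence[int]) -> None:
    """Hypothetical helper (not a Gluon API): raise ValueError if block_shape
    breaks the gather/scatter rules described in the text."""
    # Rule 1: gather/scatter support only 2D tensor descriptors.
    if len(block_shape) != 2:
        raise ValueError(
            f"TMA gather/scatter requires a 2D block shape, got {len(block_shape)}D"
        )
    # Rule 2: the first dimension of the block shape must be 1.
    if block_shape[0] != 1:
        raise ValueError(
            f"first block dimension must be 1 for gather/scatter, got {block_shape[0]}"
        )


# A [1, 128] block passes; [4, 64] (leading dim != 1) and [1, 2, 3] (3D) raise ValueError.
check_tma_gather_scatter_block_shape([1, 128])
```

Failing early on the host side like this surfaces a shape mistake as a readable Python error instead of a later compilation or launch failure.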
you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 ...