AMD engineer Haocong Wang presents the ROCm Composable Kernel library, which lets developers write high-performance machine learning kernels in HIP C++.