In this video, we look at the basics of inline PTX in CUDA kernels by reading the SM ID for each thread block!
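As a rough sketch of the idea (not necessarily the exact code from the video), the SM ID can be read from the PTX special register `%smid` with an inline `asm` statement. The helper name `get_smid`, the kernel name `record_smid`, and the launch sizes below are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read the %smid special register using inline PTX.
// "=r" binds the output to a 32-bit register; %% escapes the literal %.
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Hypothetical kernel: one thread per block records the SM it ran on.
__global__ void record_smid(unsigned int *sm_ids) {
    if (threadIdx.x == 0) {
        sm_ids[blockIdx.x] = get_smid();
    }
}

int main() {
    // Launch sizes are arbitrary values picked for this sketch.
    const int num_blocks = 16;
    const int threads_per_block = 64;

    unsigned int *d_sm_ids = nullptr;
    cudaMalloc(&d_sm_ids, num_blocks * sizeof(unsigned int));

    record_smid<<<num_blocks, threads_per_block>>>(d_sm_ids);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    unsigned int h_sm_ids[num_blocks];
    cudaError_t err = cudaMemcpy(h_sm_ids, d_sm_ids,
                                 num_blocks * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_sm_ids);
        return 1;
    }

    for (int i = 0; i < num_blocks; i++) {
        printf("Block %d ran on SM %u\n", i, h_sm_ids[i]);
    }

    cudaFree(d_sm_ids);
    return 0;
}
```

Compile with something like `nvcc smid.cu -o smid`. The block-to-SM mapping printed depends on the GPU and the hardware scheduler, so the output will differ across devices and possibly across runs. The PTX documentation also describes `%smid` as volatile, so treat the value as a snapshot taken at the moment of the read.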
TB Scheduler Reverse Engineering: https://users.oden.utexas.edu/~sreepai/fermi-tbs/
Application of TB Scheduler: http://casl.gatech.edu/wp-content/uploads/2016/04/wang-isca2016.pdf
For code samples: http://github.com/coffeebeforearch
For live content: http://twitch.tv/CoffeeBeforeArch