GPU L5: Computation
00:04:50.467,00:04:53.467 Mohit Singla cs17b113: id = threadIdx.x + blockDim.x * blockIdx.x
00:05:02.739,00:05:05.739 Banavath Tarun cs17b102: __global__ void kernel(int *gpu) { unsigned idx = blockIdx.x * blockDim.x + threadIdx.x; gpu[idx] = idx; }
00:10:14.802,00:10:17.802 ee20s136 Rahual G S: 15
00:10:17.065,00:10:20.065 Banavath Tarun cs17b102: 15?
00:10:18.045,00:10:21.045 KASHYAPI SHUBHAM BHALCHANDRA mm16b027: m*n/2
00:10:18.385,00:10:21.385 Shubham Mohan Randive cs20m064: 15
00:10:19.615,00:10:22.615 Nishant Prabhu me17b084: MN/2
00:10:20.245,00:10:23.245 Rigved Sah cs20m053: 15
00:10:22.044,00:10:25.044 Patlolla Bharath Simha Reddy cs18b034: 15
00:12:10.703,00:12:13.703 KASHYAPI SHUBHAM BHALCHANDRA mm16b027: Process multiple items in a single thread
00:12:14.463,00:12:17.463 ee20s136 Rahual G S: if (tid < number of computations required)
00:12:40.794,00:12:43.794 Shagnik Pal ee17b147: some processes will have to wait
00:14:38.466,00:14:41.466 Shouvick Mondal: Scan-based approach?
00:16:59.655,00:17:02.655 Shouvick Mondal: scanf() would be an issue on Colab. I have posted on Moodle how to fix the issue.
00:17:44.273,00:17:47.273 KASHYAPI SHUBHAM BHALCHANDRA mm16b027: Is there a limit on the number of blocks?
00:19:05.038,00:19:08.038 Akash Haridas ae17b020: check?
00:19:14.017,00:19:17.017 ee20s136 Rahual G S: memcpy?
00:20:25.919,00:20:28.919 ee20s136 Rahual G S: no problem sir
00:20:34.164,00:20:37.164 SHETH DEV YASHPAL CS17B106: Pass the value of N to kernel?
00:22:15.966,00:22:18.966 Arihant Samar cs18b052: integer division
00:27:29.343,00:27:32.343 Shagnik Pal ee17b147: Can we call another kernel from a kernel?
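[Editor's note] The replies at 00:05:02, 00:12:14, 00:20:34 and 00:22:15 seem to point at one pattern: compute a global thread index, pass N to the kernel, guard the write with a bounds check, and round the block count up because plain integer division truncates. The sketch below is only one possible reading of that exchange, not the code shown in the lecture. The names `fillIds`, `blocksFor` and `fillWithGlobalIds`, and the default of 256 threads per block, are assumptions made for illustration. Error checking on the CUDA calls is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread stores its own global index, as in the kernel posted at 00:05:02,
// but N is passed in so that surplus threads in the last block do nothing.
__global__ void fillIds(int *gpu, int n) {
    unsigned idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < (unsigned)n) {
        gpu[idx] = idx;
    }
}

// Hypothetical helper: blocks needed to cover n items.
// n / threadsPerBlock alone would truncate (integer division), so round up.
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical host wrapper: allocate, launch, copy the result back.
std::vector<int> fillWithGlobalIds(int n, int threadsPerBlock = 256) {
    assert(n > 0 && threadsPerBlock > 0);
    int *gpu = nullptr;
    cudaMalloc(&gpu, n * sizeof(int));
    fillIds<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(gpu, n);
    std::vector<int> host(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), gpu, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(gpu);
    return host;
}
```

Calling `fillWithGlobalIds(1000)` from a `main` launches 4 blocks of 256 threads; the last 24 threads fail the bounds check and write nothing.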
00:31:33.453,00:31:36.453 Akshat Singh cs18b001: can you explain how to find the 2 points again
00:33:13.046,00:33:16.046 Rupesh Nasre.: id = blockIdx.x * blockDim.x + threadIdx.x;
00:33:35.293,00:33:38.293 Rupesh Nasre.: x1 = id / N;
00:33:44.859,00:33:47.859 Rupesh Nasre.: x2 = id % N;
00:34:46.123,00:34:49.123 Akash Haridas ae17b020: so basically x1 = blockIdx.x and x2 = threadIdx.x?
00:34:47.516,00:34:50.516 Akshat Singh cs18b001: yes sir
00:36:10.701,00:36:13.701 Akash Haridas ae17b020: ok sir
00:37:24.392,00:37:27.392 NIKAM ASHUTOSH SHASHIKANT ee16b143: For maximum, pass another double pointer to kernel
00:37:32.003,00:37:35.003 Aditya Agrawal CS20S026: We can allocate a variable on the device; each time a thread gets a distance, it checks it against that variable, and once all the distances have been computed by the threads we copy the variable back to the host
00:37:52.656,00:37:55.656 K Sampreeth Prem cs17b013: Can we do a cudaMalloc and initialise it to 0, update the maximum each time we compute a distance, and finally do a cudaMemcpy?
00:39:13.040,00:39:16.040 Mohit Singla cs17b113: with locks
00:41:34.710,00:41:37.710 Vignesh S me17b078: so the issue we might face here is that multiple threads might update the max variable at the same time?
00:41:51.086,00:41:54.086 KASHYAPI SHUBHAM BHALCHANDRA mm16b027: We can recursively launch threads from a thread? Each thread will find the max among its child threads. The child threads will work in parallel.
00:43:37.231,00:43:40.231 Rupesh Nasre.: Do not feed the cat.
00:44:47.153,00:44:50.153 P Sai Venkat Kushal ee17b141: concurrency issue?
00:44:50.653,00:44:53.653 K Sampreeth Prem cs17b013: race condition
00:44:54.482,00:44:57.482 Aditya Agrawal CS20S026: Race Condition?
00:44:58.847,00:45:01.847 Mr Rahul Mastram Verma CS20S038: Reader writer problem
00:45:49.949,00:45:52.949 Shouvick Mondal: Anything can happen anytime.
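[Editor's note] The second half of the chat decodes a pair of points from one thread index (`x1 = id / N`, `x2 = id % N`) and then runs into a race when every thread updates a shared maximum. The chat does not record how the lecture resolved the race. The sketch below swaps in `atomicMax` as one common way to do it, which is an assumption and not necessarily the instructor's approach. To keep `atomicMax` on its integer overload, it also assumes small integer coordinates and compares squared distances. The names `maxSqDist` and `maxSquaredDistance` are hypothetical, and error checking is omitted.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Thread `id` handles the pair of points (x1, x2), decoded as in the chat.
__global__ void maxSqDist(const int *px, const int *py, int n, int *maxOut) {
    unsigned id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= (unsigned)n * (unsigned)n) return;  // surplus threads do nothing
    int x1 = id / n;  // index of the first point
    int x2 = id % n;  // index of the second point
    int dx = px[x1] - px[x2];
    int dy = py[x1] - py[x2];
    // A plain "if (d > *maxOut) *maxOut = d;" is the race discussed in the chat:
    // two threads can both read the old value and one update gets lost.
    // atomicMax performs the read-compare-write as one indivisible step.
    atomicMax(maxOut, dx * dx + dy * dy);
}

// Hypothetical host wrapper following the cudaMalloc / initialise to 0 /
// cudaMemcpy outline proposed at 00:37:52.
int maxSquaredDistance(const std::vector<int> &xs, const std::vector<int> &ys) {
    int n = (int)xs.size();
    assert(n > 0 && ys.size() == xs.size());
    size_t bytes = n * sizeof(int);
    int *px = nullptr, *py = nullptr, *dMax = nullptr;
    cudaMalloc(&px, bytes);
    cudaMalloc(&py, bytes);
    cudaMalloc(&dMax, sizeof(int));
    cudaMemcpy(px, xs.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(py, ys.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dMax, 0, sizeof(int));  // squared distances are never negative
    int threads = 256;
    int blocks = (n * n + threads - 1) / threads;  // round up, one thread per pair
    maxSqDist<<<blocks, threads>>>(px, py, n, dMax);
    int result = 0;
    cudaMemcpy(&result, dMax, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(px);
    cudaFree(py);
    cudaFree(dMax);
    return result;
}
```

For the three points (0,0), (3,4) and (1,1), `maxSquaredDistance({0, 3, 1}, {0, 4, 1})` returns 25. The observation at 00:34:46 (x1 = blockIdx.x, x2 = threadIdx.x) holds only in the special case where the kernel is launched with N blocks of N threads; the division and modulo form works for any launch shape.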