GPU: L4 Part 1: CUDA Topics

Uploaded: Dec 14, 2023
Duration: 4019 s

Channel: HPC Education

00:00:08.942,00:00:11.942 CHEKKALA SANDEEP REDDY: grid.sync() is not working
00:02:52.920,00:02:55.920 Soumik Basu: But shouldn't this be taken care of automatically by the NVIDIA architecture? Upon a grid sync, all the blocks should be executed completely, even by context switching among the blocks.
00:04:21.809,00:04:24.809 CHEKKALA SANDEEP REDDY: NO
00:04:26.831,00:04:29.831 Darpan Gaur: what is the difference between grid.sync() & __syncthreads()?
00:06:57.713,00:07:00.713 Chinmay Rajesh Ingle: some optimizations are only for GPU, I guess?
00:07:44.871,00:07:47.871 Soumik Basu: __host__ __device__ = __global__, right?
00:11:27.119,00:11:30.119 Soumik Basu: __host__ __device__ means without threading properties, but callable from both host and device
00:11:29.329,00:11:32.329 CHEKKALA SANDEEP REDDY: yes
00:11:37.643,00:11:40.643 srikakolapu bhagavan: yes I think
00:11:38.724,00:11:41.724 Chinmay Rajesh Ingle: yes
00:11:41.307,00:11:44.307 Soumik Basu: yes
00:11:45.764,00:11:48.764 Binong Kiri Bey: Yes
00:12:21.417,00:12:24.417 Soumik Basu: dkernel-to-dhfun
00:12:22.503,00:12:25.503 srikakolapu bhagavan: host function calling dkernel
00:12:32.183,00:12:35.183 Soumik Basu: host-to-dkernel
00:12:42.799,00:12:45.799 Ponnampalam Pirapuraj: main-to-dfun
00:13:37.782,00:13:40.782 Soumik Basu: dhfun-to-hostfun? possible?
00:13:42.705,00:13:45.705 CHEKKALA SANDEEP REDDY: NO
00:14:03.536,00:14:06.536 Soumik Basu: NO
00:14:24.814,00:14:27.814 Chinmay Rajesh Ingle: with thread size specified, yes?
00:15:20.271,00:15:23.271 CHEKKALA SANDEEP REDDY: YES
00:15:20.398,00:15:23.398 srikakolapu bhagavan: yes
00:15:30.913,00:15:33.913 Sayan Dey: possible when it runs on host
00:15:33.325,00:15:36.325 R Sowmeya Lakshmi: yes
00:21:00.072,00:21:03.072 Shyam Murthy: Is there a size limitation for HostAlloc'ed memory?
00:21:54.109,00:21:57.109 Soumik Basu: Is it passed based on page faults?
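The opening questions about grid.sync() come down to how the kernel is launched. These include why it "is not working" and whether the hardware should context-switch blocks at a grid barrier. Cooperative-groups grid.sync() is only valid when the kernel is launched with cudaLaunchCooperativeKernel and every block of the grid is resident on the GPU at the same time. Blocks waiting at the barrier are not swapped out to make room for blocks that have not started yet. __syncthreads(), by contrast, only synchronizes the threads of a single block. The sketch below is a minimal example under assumptions of my own. The kernel name twoPhase and the sizes are hypothetical. The grid is sized with the occupancy API so that all blocks fit. Older CUDA toolkits may additionally require compiling with -rdc=true.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical kernel: phase 1 fills data[], grid.sync() waits for every
// block, phase 2 reads elements that other blocks may have written.
__global__ void twoPhase(int *data, int *out, int n) {
    cg::grid_group grid = cg::this_grid();
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;   // grid-stride loop: any grid size covers n

    for (int i = tid; i < n; i += stride)
        data[i] = i;

    grid.sync();                            // barrier across ALL blocks (cooperative launch only)

    for (int i = tid; i < n; i += stride)
        out[i] = data[n - 1 - i];           // correct only because of grid.sync()
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    int dev = 0;
    cudaSetDevice(dev);

    int coop = 0;
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) { printf("cooperative launch not supported\n"); return 1; }

    // Size the grid so that every block can be resident at the same time.
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, twoPhase, threads, 0);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    int blocks = blocksPerSM * prop.multiProcessorCount;

    int *data = nullptr, *out = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));

    int nArg = n;
    void *args[] = { &data, &out, &nArg };  // kernel arguments are passed by address
    cudaError_t err = cudaLaunchCooperativeKernel((void *)twoPhase, dim3(blocks), dim3(threads),
                                                  args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) { printf("error: %s\n", cudaGetErrorString(err)); return 1; }

    int first = -1;
    cudaMemcpy(&first, out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("out[0] = %d\n", first);         // data[n - 1] = n - 1 = 1048575

    cudaFree(data);
    cudaFree(out);
    return 0;
}
```

This can be built with something like `nvcc -arch=sm_70 grid_sync.cu`, where any compute capability of 6.0 or newer should do and the file name is hypothetical. If the grid is made larger than blocksPerSM × multiProcessorCount, the cooperative launch is expected to return an error instead of deadlocking. A plain `<<<...>>>` launch gives no such guarantee.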
00:22:56.662,00:22:59.662 R Sowmeya Lakshmi: Every time a change happens to the variable, is it passed from GPU to CPU and CPU to GPU, or only when the function ends?
00:23:37.992,00:23:40.992 Abhishek u: sir, are pinned pages cached?
00:25:55.832,00:25:58.832 CHEKKALA SANDEEP REDDY: kernel invocation
00:25:56.655,00:25:59.655 R Sowmeya Lakshmi: cudaDeviceSynchronize()
00:25:59.112,00:26:02.112 Chinmay Rajesh Ingle: global barrier
00:31:31.604,00:31:34.604 Chinmay Rajesh Ingle: deadlock? if we wait for a call from main
00:31:35.873,00:31:38.873 CHEKKALA SANDEEP REDDY: host cannot syncthreads
00:31:36.880,00:31:39.880 VIPIN PATEL: __syncthreads is only supported on GPU
00:32:46.294,00:32:49.294 CHEKKALA SANDEEP REDDY: *counter = 0 in main() is not correct
00:33:46.429,00:33:49.429 Abhishek u: counter cannot be accessed on GPU
00:35:17.097,00:35:20.097 Soumik Basu: No, this is not in pinned mem
00:35:27.383,00:35:30.383 Abhishek u: variable declaration issue
00:35:33.985,00:35:36.985 CHEKKALA SANDEEP REDDY: is there any problem in __host__ __device__ int counter?
00:36:45.817,00:36:48.817 CHEKKALA SANDEEP REDDY: counter cannot be the same on GPU and CPU; I think that is the reason for the invalid declaration
00:37:22.266,00:37:25.266 Chinmay Rajesh Ingle: consistency issues or too many syncs between CPU and GPU?
00:39:22.668,00:39:25.668 Sayan Dey: race condition?
00:39:24.331,00:39:27.331 CHEKKALA SANDEEP REDDY: problem with fun() accessing a GPU variable
00:40:55.768,00:40:58.768 CHEKKALA SANDEEP REDDY: if there is a fun() call from main(), will there be a runtime error?
00:41:09.351,00:41:12.351 CHEKKALA SANDEEP REDDY: ok
00:42:46.361,00:42:49.361 Soumik Basu: __host__ __device__ incr(*arr, i) {arr[i] = arr[i] + 1}, now call this function from a GPU kernel for some parts and call it on the host in loops
00:46:50.256,00:46:53.256 Shyam Murthy: is there a way to know whether I am running on the CPU or the GPU? Something that can differentiate the function
00:55:40.599,00:55:43.599 Sayan Dey: sir, can you explain the difference between host_vector and vector?
00:56:35.434,00:56:38.434 Ankit Saha: are the writes to the device_vector happening on the GPU?
00:56:35.810,00:56:38.810 Abhishek u: updates to D are by what threads?
00:56:43.254,00:56:46.254 Sayan Dey: what's the advantage of host_vector over vector here?
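Two threads of the later discussion can be shown together. The first is Soumik Basu's incr(arr, i), called both from a GPU kernel and from a host loop. The second is Shyam Murthy's question about telling the CPU and GPU apart inside one function. A common approach, assumed here rather than taken from the session, is the __CUDA_ARCH__ macro. nvcc defines it only while compiling the device side of a __host__ __device__ function, so the preprocessor can select a different body for each side. In this sketch, the int * signature for incr is my assumption. The helper runningOnDevice and the kernel incrKernel are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled twice by nvcc: once for the host, once for the device.
__host__ __device__ void incr(int *arr, int i) {
    arr[i] = arr[i] + 1;
}

// Hypothetical helper: __CUDA_ARCH__ is defined only in the device pass.
__host__ __device__ int runningOnDevice() {
#ifdef __CUDA_ARCH__
    return 1;
#else
    return 0;
#endif
}

// Hypothetical kernel: each thread calls the same incr() on the device.
__global__ void incrKernel(int *arr, int n, int *flag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) incr(arr, i);
    if (i == 0) *flag = runningOnDevice();
}

int main() {
    const int n = 8;
    int host[n] = {0};
    for (int i = 0; i < n; ++i) incr(host, i);        // host-side calls: every element becomes 1

    int *dev = nullptr, *dFlag = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMalloc(&dFlag, sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    incrKernel<<<1, n>>>(dev, n, dFlag);              // device-side calls: every element becomes 2

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    int devFlag = -1;
    cudaMemcpy(&devFlag, dFlag, sizeof(int), cudaMemcpyDeviceToHost);

    // prints: host=0 device=1 arr[0]=2 arr[7]=2
    printf("host=%d device=%d arr[0]=%d arr[7]=%d\n",
           runningOnDevice(), devFlag, host[0], host[n - 1]);

    cudaFree(dev);
    cudaFree(dFlag);
    return 0;
}
```

__host__ and __device__ here are function execution-space specifiers, and __host__ does not apply to variables. That is consistent with the invalid `__host__ __device__ int counter` declaration raised in the chat. For a variable that both sides must see, the usual options are a __device__ variable copied with cudaMemcpyToSymbol/cudaMemcpyFromSymbol, or a __managed__ variable.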