GPU: L4 Part 2: CUDA Topics
00:13:28.220,00:13:31.220 CHEKKALA SANDEEP REDDY: do we have to use saxpy as a subroutine??
00:15:45.866,00:15:48.866 Chinmay Rajesh Ingle: int count; __host__ __device__ func(int x){ if x==req count++ } i am not sure about the syntax
00:15:50.107,00:15:53.107 CHEKKALA SANDEEP REDDY: what do transform and reduce do? can you re-explain?
00:19:40.927,00:19:43.927 Chinmay Rajesh Ingle: in reduce, do we directly add all the return values?
00:34:14.093,00:34:17.093 Sayan Dey: Sir, can you explain the syntax once more?
00:35:49.859,00:35:52.859 Sayan Dey: no sir...understood
00:35:49.873,00:35:52.873 R Sowmeya Lakshmi: So we can decide which stream we want to do the transfer on
00:35:54.655,00:35:57.655 R Sowmeya Lakshmi: ??
00:36:34.430,00:36:37.430 R Sowmeya Lakshmi: If we schedule two transfers on the same stream, won't they be queued and happen one after the other? why will overlap happen?
00:36:38.720,00:36:41.720 Abhishek u: cudaMalloc does not affect the correctness
00:36:43.320,00:36:46.320 CHEKKALA SANDEEP REDDY: If we use cudaMemcpy(), copying is not queued into a stream.
00:37:18.356,00:37:21.356 R Sowmeya Lakshmi: So overlap happens between streams
00:37:55.111,00:37:58.111 R Sowmeya Lakshmi: okay sir
00:38:35.337,00:38:38.337 CHEKKALA SANDEEP REDDY: Is a stream really required in cudaMemcpy()?
00:39:26.535,00:39:29.535 Chinmay Rajesh Ingle: cudaMemcpy is blocking, it waits till we transfer all the data, so we won't move to the next cudaMemcpy
00:39:29.534,00:39:32.534 CHEKKALA SANDEEP REDDY: There is no parallelism if cudaMemcpy() is used.
00:44:21.757,00:44:24.757 Sayan Dey: for the next prog, all K1, all K2, all K3 will be printed together but overlap among the three clusters
00:56:41.444,00:56:44.444 R Sowmeya Lakshmi: in kernel K followed by inside callback streamno
00:57:22.072,00:57:25.072 Chinmay Rajesh Ingle: interleaved print between different kernels is possible, right?
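The early questions in the chat (SAXPY as a subroutine, what transform and reduce do, how to count elements equal to some value) can be illustrated with a short sketch. It assumes the transform and reduce discussed are Thrust's algorithms; the functor and helper names (`saxpy_functor`, `equals_value`, `match_to_one`, `count_with_count_if`, `count_with_transform_reduce`) are hypothetical. SAXPY is written as a functor passed to `thrust::transform`, which applies it element by element; `thrust::reduce` then combines all values with a binary operator (with `thrust::plus`, that means adding them all up). A hand-written `count++` executed by many GPU threads would be a data race unless made atomic (for example with `atomicAdd`), so the count is done here with `thrust::count_if`, and again as transform (map each element to 0 or 1) followed by reduce (sum).

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/count.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <cassert>
#include <cstdio>
#include <vector>

// SAXPY as a functor: thrust::transform calls it once per pair (x[i], y[i]).
struct saxpy_functor {
    const float a;
    explicit saxpy_functor(float a_) : a(a_) {}
    __host__ __device__ float operator()(float x, float y) const { return a * x + y; }
};

// Predicate for count_if: true when x equals the target value.
struct equals_value {
    const int target;
    explicit equals_value(int t) : target(t) {}
    __host__ __device__ bool operator()(int x) const { return x == target; }
};

// Unary op for transform_reduce: 1 for a match, 0 otherwise.
struct match_to_one {
    const int target;
    explicit match_to_one(int t) : target(t) {}
    __host__ __device__ int operator()(int x) const { return x == target ? 1 : 0; }
};

// y <- a*x + y, computed on the device.
void saxpy(float a, const thrust::device_vector<float>& x, thrust::device_vector<float>& y) {
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy_functor(a));
}

// Counting with a ready-made algorithm: no shared counter, so no race.
int count_with_count_if(const std::vector<int>& h, int target) {
    thrust::device_vector<int> d(h.begin(), h.end());
    return static_cast<int>(thrust::count_if(d.begin(), d.end(), equals_value(target)));
}

// Same count as transform (map to 0/1) followed by reduce (sum).
int count_with_transform_reduce(const std::vector<int>& h, int target) {
    thrust::device_vector<int> d(h.begin(), h.end());
    return thrust::transform_reduce(d.begin(), d.end(), match_to_one(target), 0,
                                    thrust::plus<int>());
}

int main() {
    std::vector<float> hx = {1.f, 2.f, 3.f, 4.f};
    std::vector<float> hy = {10.f, 20.f, 30.f, 40.f};
    thrust::device_vector<float> x(hx.begin(), hx.end());
    thrust::device_vector<float> y(hy.begin(), hy.end());

    saxpy(2.0f, x, y);  // y becomes {12, 24, 36, 48}
    float sum = thrust::reduce(y.begin(), y.end(), 0.0f, thrust::plus<float>());
    assert(sum == 120.0f);
    printf("sum of y = %.1f\n", sum);  // prints "sum of y = 120.0"

    std::vector<int> data = {3, 7, 3, 1, 3, 9};
    printf("count_if: %d, transform_reduce: %d\n",
           count_with_count_if(data, 3), count_with_transform_reduce(data, 3));
    return 0;
}
```

Whether SAXPY lives in a separate host function or is written inline at the `thrust::transform` call is a style choice; the functor is what actually runs on the device.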
00:57:55.958,00:57:58.958 R Sowmeya Lakshmi: for loop is sequential
00:59:06.370,00:59:09.370 CHEKKALA SANDEEP REDDY: YES
00:59:27.626,00:59:30.626 R Sowmeya Lakshmi: no, as only after the callback for i=0 is executed is the next kernel launched
00:59:34.193,00:59:37.193 CHEKKALA SANDEEP REDDY: In the next slide!!
00:59:58.533,01:00:01.533 Chinmay Rajesh Ingle: oh i didn't sync
01:00:01.714,01:00:04.714 Chinmay Rajesh Ingle: see *
01:03:59.202,01:04:02.202 Chinmay Rajesh Ingle: callbacks here are just to add order to the stream, right?
01:04:21.799,01:04:24.799 Chinmay Rajesh Ingle: as in, adding kernels, I mean
01:04:54.516,01:04:57.516 Chinmay Rajesh Ingle: oh okay got it
01:11:27.577,01:11:30.577 R Sowmeya Lakshmi: The call waits for whatever is in the stream when we record the event?
01:17:17.116,01:17:20.116 Sayan Dey: Thank you very much for the nice lectures, Sir
01:18:03.638,01:18:06.638 Ashwina Kumar: 10.1 i am using
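The stream questions in the chat (do two transfers in one stream overlap, why plain `cudaMemcpy` gives no parallelism, what callbacks and event records add) can be illustrated with a small sketch. It is not the lecture's program: the kernel `scale`, the helper `run_streams_demo`, and the sizes are hypothetical. Each stream works on its own chunk: an async copy in, a kernel, an async copy out, a host callback registered with `cudaStreamAddCallback`, and an event record. Operations in the same stream run strictly in order; only work in different streams may overlap. `cudaMemcpyAsync` needs pinned host memory (`cudaMallocHost`) to overlap with other work. `cudaEventRecord` itself does not block: the event completes once all work enqueued in that stream before the record has finished, and `cudaEventSynchronize` makes the host wait for that point.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel: multiply each element of one chunk by a factor.
__global__ void scale(float* d, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= factor;
}

// Host callback: runs only after all earlier work in its stream has finished,
// and later work in that stream waits for it. It must not call CUDA APIs.
void CUDART_CB on_stream_done(cudaStream_t, cudaError_t status, void* user) {
    int id = *static_cast<int*>(user);
    printf("callback: stream %d done (status %d)\n", id, static_cast<int>(status));
}

// Runs copy-in -> kernel -> copy-out -> callback -> event record in each of
// num_streams streams, each on its own chunk of n floats.
// Returns true if every chunk was scaled by (stream index + 2).
bool run_streams_demo(int num_streams, int n) {
    assert(num_streams > 0 && n > 0);
    const size_t total = static_cast<size_t>(num_streams) * n;
    float* h = nullptr;
    float* d = nullptr;
    // Pinned host memory: required for cudaMemcpyAsync to overlap with other work.
    CHECK(cudaMallocHost(reinterpret_cast<void**>(&h), total * sizeof(float)));
    CHECK(cudaMalloc(reinterpret_cast<void**>(&d), total * sizeof(float)));
    for (size_t i = 0; i < total; ++i) h[i] = 1.0f;

    std::vector<cudaStream_t> streams(num_streams);
    std::vector<cudaEvent_t> done(num_streams);
    std::vector<int> ids(num_streams);
    for (int s = 0; s < num_streams; ++s) {
        ids[s] = s;
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaEventCreate(&done[s]));
    }

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    for (int s = 0; s < num_streams; ++s) {
        float* hs = h + static_cast<size_t>(s) * n;
        float* ds = d + static_cast<size_t>(s) * n;
        // Same stream: these steps run strictly in order.
        // Different streams: their steps may overlap with each other.
        CHECK(cudaMemcpyAsync(ds, hs, bytes, cudaMemcpyHostToDevice, streams[s]));
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(ds, n, static_cast<float>(s + 2));
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hs, ds, bytes, cudaMemcpyDeviceToHost, streams[s]));
        CHECK(cudaStreamAddCallback(streams[s], on_stream_done, &ids[s], 0));
        // Non-blocking: marks the point after all work enqueued so far in streams[s].
        CHECK(cudaEventRecord(done[s], streams[s]));
    }

    bool ok = true;
    for (int s = 0; s < num_streams; ++s) {
        CHECK(cudaEventSynchronize(done[s]));  // host waits for the marked work
        const float expected = static_cast<float>(s + 2);
        for (int i = 0; i < n; ++i)
            ok = ok && (h[static_cast<size_t>(s) * n + i] == expected);
    }

    for (int s = 0; s < num_streams; ++s) {
        CHECK(cudaEventDestroy(done[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return ok;
}

int main() {
    bool ok = run_streams_demo(3, 1 << 20);
    printf("%s\n", ok ? "all streams verified" : "mismatch");
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Replacing the `cudaMemcpyAsync` calls with `cudaMemcpy` would make each copy block the host and go through the default stream, which removes the chance to overlap copies across streams. From CUDA 10.0 onward, `cudaLaunchHostFunc` is the newer alternative to `cudaStreamAddCallback` for enqueuing a host function in a stream.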