Channel: CUDA 5.0 context management with single application thread in multiple GPU environment - Stack Overflow

CUDA 5.0 context management with single application thread in multiple GPU environment


It seems that most tutorials, guides, books and Q&A on the web refer to CUDA 3 and 4.x, which is why I'm asking specifically about CUDA 5.0. On to the question...

I would like to program for an environment with two CUDA devices but use only one host thread, to keep the design simple (especially because it is a prototype). I want to know if the following code is valid:

float *x[2];
float *dev_x[2];

for(int d = 0; d < 2; d++) {
    cudaSetDevice(d);
    cudaMalloc(&dev_x[d], 1024);
}

for(int repeats = 0; repeats < 100; repeats++) {
    for(int d = 0; d < 2; d++) {
        cudaSetDevice(d);
        cudaMemcpy(dev_x[d],x[d],1024,cudaMemcpyHostToDevice);
        some_kernel<<<...>>>(dev_x[d]);
        cudaMemcpy(x[d],dev_x[d],1024,cudaMemcpyDeviceToHost);
    }
    cudaStreamSynchronize(0);
}

Specifically, I would like to know whether the allocations made by cudaMalloc(...) before the test loop persist while cudaSetDevice() switches back and forth between devices in the same thread. I would also like to know whether the same holds for context-dependent objects such as cudaEvent_t and cudaStream_t.
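To make the stream and event part of the question concrete, here is a minimal sketch of the pattern I have in mind. It assumes, as I understand the CUDA runtime documentation, that allocations, streams and events belong to the device that was current when they were created, and that they stay valid when the thread later switches to another device and back. The kernel add_one, the function run_multi_device and the launch configuration are hypothetical stand-ins, and error checks are left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for some_kernel: adds 1 to every element.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Drives up to two devices from one host thread.
// Returns the number of wrong elements, or -1 if no device is usable.
int run_multi_device(int repeats) {
    const int n = 256;                        // 256 floats = 1024 bytes
    const size_t bytes = n * sizeof(float);
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return -1;
    const int num = count < 2 ? count : 2;

    float *x[2] = {nullptr, nullptr};
    float *dev_x[2] = {nullptr, nullptr};
    cudaStream_t stream[2];
    cudaEvent_t done[2];

    // Create every per-device object while its device is current.
    for (int d = 0; d < num; d++) {
        cudaSetDevice(d);
        cudaMallocHost(&x[d], bytes);         // pinned host buffer for async copies
        cudaMalloc(&dev_x[d], bytes);
        cudaStreamCreate(&stream[d]);
        cudaEventCreate(&done[d]);
        for (int i = 0; i < n; i++) x[d][i] = 0.0f;
    }

    for (int r = 0; r < repeats; r++) {
        // Queue work on each device; the objects created above are reused
        // after every switch of the current device.
        for (int d = 0; d < num; d++) {
            cudaSetDevice(d);
            cudaMemcpyAsync(dev_x[d], x[d], bytes, cudaMemcpyHostToDevice, stream[d]);
            add_one<<<(n + 127) / 128, 128, 0, stream[d]>>>(dev_x[d], n);
            cudaMemcpyAsync(x[d], dev_x[d], bytes, cudaMemcpyDeviceToHost, stream[d]);
            cudaEventRecord(done[d], stream[d]);
        }
        // Wait for each device separately before the next round.
        for (int d = 0; d < num; d++) {
            cudaSetDevice(d);
            cudaEventSynchronize(done[d]);
        }
    }

    // Each element started at 0 and was incremented once per round.
    int wrong = 0;
    for (int d = 0; d < num; d++)
        for (int i = 0; i < n; i++)
            if (x[d][i] != (float)repeats) wrong++;

    for (int d = 0; d < num; d++) {
        cudaSetDevice(d);
        cudaEventDestroy(done[d]);
        cudaStreamDestroy(stream[d]);
        cudaFree(dev_x[d]);
        cudaFreeHost(x[d]);
    }
    return wrong;
}

int main() {
    printf("wrong elements: %d\n", run_multi_device(100));
    return 0;
}
```

The sketch waits on one event per device rather than calling cudaStreamSynchronize(0) once after the inner loop, because that call only waits on the default stream of whichever device is current at that point.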

I am asking because I have an application written in this style that keeps failing with a mapping error, and I can't tell whether the cause is a memory leak I have missed or incorrect API usage.

Note: In my original code, I do check the result of every single CUDA call. I left the checks out here for readability.
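For reference, the checks have roughly the following shape. This is a minimal sketch rather than the original code: cuda_ok, CUDA_CHECK and probe_devices are hypothetical names introduced for illustration, and the 1024-byte allocation simply mirrors the snippet above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Returns true on success; otherwise prints where the call failed and why.
inline bool cuda_ok(cudaError_t err, const char *expr, const char *file, int line) {
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, expr, cudaGetErrorString(err));
    return false;
}

// Wraps a runtime call and aborts the program on the first failure.
#define CUDA_CHECK(call)                                        \
    do {                                                        \
        if (!cuda_ok((call), #call, __FILE__, __LINE__))        \
            exit(EXIT_FAILURE);                                 \
    } while (0)

// Allocates and frees a small buffer on every visible device,
// checking each call. Returns the number of devices visited.
int probe_devices() {
    int count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&count));
    for (int d = 0; d < count; d++) {
        float *buf = nullptr;
        CUDA_CHECK(cudaSetDevice(d));
        CUDA_CHECK(cudaMalloc(&buf, 1024));
        CUDA_CHECK(cudaFree(buf));
    }
    return count;
}

int main() {
    printf("checked %d device(s)\n", probe_devices());
    return 0;
}
```

Kernel launches do not return an error code themselves, so a launch would be followed by CUDA_CHECK(cudaGetLastError()) to catch launch failures.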

