In ECE459, we mainly do this in the context of CUDA (on Nvidia GPUs).

Programming Model

  1. Write the code for parallel computation (kernel) separately from the main code
  2. Transfer the data to the GPU co-processor
  3. Wait
  4. Transfer results back

It makes sense to hand work over to the GPU because there are many cores across which we can distribute the work. There is significant runtime overhead for the data transfer, but it is really fast once it starts (like driving vs. flying).
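The four steps above can be sketched in a minimal CUDA program. This is a hypothetical vector-add example, not anything specific to the course, and error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Step 1: the kernel is written separately from the host code.
// Each GPU thread handles one element of the arrays.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main(void) {
    const int N = 1 << 20;
    size_t size = N * sizeof(float);

    float *h_a = (float *) malloc(size);
    float *h_b = (float *) malloc(size);
    float *h_c = (float *) malloc(size);
    for (int i = 0; i < N; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);

    // Step 2: transfer the input data to the GPU.
    cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice);

    // Launch the kernel: one thread per element, 256 threads per block.
    vector_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);

    // Step 3: wait for the GPU to finish.
    cudaDeviceSynchronize();

    // Step 4: transfer the results back to the host.
    cudaMemcpy(h_c, d_c, size, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Note that the two `cudaMemcpy` calls are where the data-transfer overhead mentioned above is paid; the kernel launch itself is asynchronous, which is why the explicit synchronization in step 3 is needed before reading results back.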