In ECE459, we mainly do this in the context of CUDA (on Nvidia GPUs).

Programming Model

  1. Write the code for parallel computation (kernel) separately from the main code
  2. Transfer the data to the GPU co-processor
  3. Wait
  4. Transfer results back

It makes sense to hand work over to the GPU because there are many cores across which we can distribute the work. There is significant runtime overhead for the data transfer, but it is really fast once it starts (like driving vs. flying).
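The four steps above can be sketched in a minimal CUDA program. This is a hypothetical vector-add example, not anything specific to the course, and error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Step 1: the kernel is written separately from the host code.
// Each GPU thread handles one element of the arrays.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main(void) {
    const int N = 1 << 20;
    size_t size = N * sizeof(float);

    float *h_a = (float *) malloc(size);
    float *h_b = (float *) malloc(size);
    float *h_c = (float *) malloc(size);
    for (int i = 0; i < N; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);

    // Step 2: transfer the input data to the GPU.
    cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice);

    // Launch the kernel: one thread per element, 256 threads per block.
    vector_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);

    // Step 3: wait for the GPU to finish.
    cudaDeviceSynchronize();

    // Step 4: transfer the results back to the host.
    cudaMemcpy(h_c, d_c, size, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Note that the two `cudaMemcpy` calls are where the data-transfer overhead mentioned above is paid; the kernel launch itself is asynchronous, which is why the explicit synchronization in step 3 is needed before reading results back.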