In ECE459, we mainly do this with CUDA (on Nvidia GPUs).
Programming Model
- Write the code for parallel computation (kernel) separately from the main code
- Transfer the data to the GPU co-processor
- Wait
- Transfer results back

It makes sense to hand work over to the GPU because it has many cores across which we can distribute the computation. The data transfer carries significant runtime overhead, but once the computation starts it is very fast (like driving to the airport versus flying).
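The four steps above map directly onto CUDA runtime calls. A minimal sketch (vector addition, assuming a CUDA toolchain and an available device; names like `add` are illustrative):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Kernel: the parallel computation, written separately from the main code.
// Each GPU thread handles one element of the arrays.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_a = malloc(bytes), *h_b = malloc(bytes), *h_c = malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 2.0f * i; }

    // Step 2: transfer the data to the GPU co-processor.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch the kernel: one thread per element, 256 threads per block.
    add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // Step 3: wait for the GPU to finish.
    cudaDeviceSynchronize();

    // Step 4: transfer results back to the host.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Note that every `cudaMemcpy` crosses the PCIe bus, which is the overhead mentioned above; the usual advice is to batch data into few large transfers rather than many small ones.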