Compute Unified Device Architecture (CUDA)
The CUDA API lets you use most C++ features in device code, whereas OpenCL (which is meant to be cross-platform) is more restrictive about the language subset you can write kernels in.
CUDA supports both task parallelism and data parallelism. Data parallelism is the central feature here: we want to evaluate a function (a kernel) over a set of points, with one thread per point.
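A minimal sketch of this data-parallel pattern: one thread per point, each evaluating the same function. The names (`square`, `n`) and the squaring function are illustrative, not from the notes.

```cuda
// Evaluate f(x) = x*x over n points, one CUDA thread per point.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void square(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard: grid may overshoot n
        out[i] = in[i] * in[i];
}

int main() {
    const int n = 1024;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    int block = 256;                      // threads per block
    int grid  = (n + block - 1) / block;  // round up to cover all n points
    square<<<grid, block>>>(d_in, d_out, n);

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", h_out[3]);  // squares of the inputs come back on the host
    cudaFree(d_in);
    cudaFree(d_out);
}
```

Each thread computes its own global index from its block and thread coordinates; the bounds check is needed because the grid is rounded up to a whole number of blocks.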
A thread (what OpenCL calls a work-item) is the fundamental unit of work in CUDA. Threads live on an N-dimensional grid, where N is up to 3. Threads are grouped into blocks; threads within a block may share memory, though they execute independently. You choose the block size yourself when launching a kernel (the CUDA runtime can suggest one via its occupancy API). To make the best use of the hardware, pick a multiple of the warp size (the hardware's unit of execution, 32 threads on NVIDIA GPUs).
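A sketch of choosing launch dimensions, both by hand and via the runtime's occupancy helper. The kernel here is a placeholder; the point is the configuration, and the specific numbers (256 threads, 2^20 items) are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: does nothing unless given a real buffer.
__global__ void noop(float *p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (p) p[i] = 0.f;
}

int main() {
    const int n = 1 << 20;  // total work items (CUDA threads) we need

    // Hand-picked: 256 threads per block, a multiple of the 32-thread warp,
    // so no warp in a block is left partially filled.
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);  // enough blocks to cover n

    // Or ask the runtime for a block size that maximizes occupancy:
    int minGrid = 0, suggested = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGrid, &suggested, noop, 0, 0);
    printf("suggested block size: %d\n", suggested);

    noop<<<grid, block>>>(nullptr);
    cudaDeviceSynchronize();
}
```

`cudaOccupancyMaxPotentialBlockSize` is how you "let the system decide" in CUDA: unlike OpenCL, the launch itself always requires an explicit block size.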