Parallelism increases bandwidth (throughput), often at the cost of increased latency. Some problems are very easy to parallelize, but there is usually some sort of overhead. Other problems are “inherently sequential”: each iteration’s execution depends directly on the previous one. There is also always some cognitive overhead in writing parallel code (reasoning about partial orderings of events, etc.).
Parallelism can also lead to data races, which are solved with synchronization primitives; but when synchronization is too constrained, it can lead to deadlock.
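A minimal Python sketch of both sides (names are illustrative, assuming CPython’s `threading` module): an unsynchronized counter that races, and a lock-protected version using a synchronization primitive.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # Data race: `counter += 1` is a read-modify-write sequence, so
    # concurrent threads can interleave and lose updates.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    # The lock serializes the critical section. The trade-off: too much
    # locking serializes the whole program, and acquiring multiple locks
    # in inconsistent orders is how deadlocks arise.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; swap in unsafe_increment and updates may be lost
```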
Amdahl’s Law
In 1967, Gene Amdahl argued that improving the design of single processors would be more effective than designing multi-processor systems.
Assumptions:
- Problem size is fixed
- The program or the underlying implementation behaves the same on 1 vs N processors
- We can accurately measure runtimes, and overhead is negligible
The law: $T_N = T_1 \left( S + \frac{P}{N} \right)$

Where:
- $T_N$: Parallel time on $N$ processors
- $T_1$: Total time in a single-processor system
- $S$: Serial part
- $P$: Parallelizable part (with $S + P = 1$)

Here, as $N$ increases, $T_N$ is dominated by $S$, limiting potential speedups.
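A short sketch of how $T_N$ behaves, with hypothetical numbers (100 seconds of single-processor work, 10% serial):

```python
def parallel_time(t1, s, n):
    """Amdahl's law: T_N = T_1 * (S + P/N), with P = 1 - S."""
    p = 1.0 - s
    return t1 * (s + p / n)

# T_1 = 100 s, S = 0.10 (both made-up for illustration)
for n in (1, 2, 8, 64, 1024):
    print(n, round(parallel_time(100.0, 0.10, n), 2))
# As N grows, T_N approaches T_1 * S = 10 s: the serial part dominates.
```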
We can also define max speedup as $\lim_{N \to \infty} \frac{T_1}{T_N} = \frac{1}{S}$. As $P$ increases (and $S$ shrinks), this ceiling rises; in the example here, up to an 18x speedup. We can use this information to figure out our runtimes relative to how many processors we have.
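A quick numeric check of that ceiling; the serial fraction $S = 1/18$ below is an assumption chosen so that $1/S$ matches the 18x figure:

```python
def speedup(s, n):
    """Speedup(N) = 1 / (S + (1 - S)/N); as N -> infinity this tends to 1/S."""
    return 1.0 / (s + (1.0 - s) / n)

s = 1 / 18  # hypothetical serial fraction giving a 1/S = 18x ceiling
print(speedup(s, 4))       # ~3.43x
print(speedup(s, 10_000))  # ~17.97x, approaching 18x
```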
To empirically estimate parallel speedup: $\text{Speedup}(N) = \frac{T_1}{T_N}$, using measured runtimes.
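One way to take that measurement, a sketch using Python’s `concurrent.futures` with a made-up CPU-bound workload (real measurements also pick up process start-up and scheduling overhead, which the assumptions above ignore):

```python
import time
from concurrent.futures import ProcessPoolExecutor

def work(chunk):
    # Stand-in CPU-bound task; any fixed-size computation works.
    return sum(i * i for i in range(chunk))

def measure(n_workers, total=8_000_000):
    # Split a fixed problem size across workers (Amdahl's fixed-size assumption).
    chunks = [total // n_workers] * n_workers
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        list(ex.map(work, chunks))
    return time.perf_counter() - start

if __name__ == "__main__":
    t1, t4 = measure(1), measure(4)
    print(f"Empirical speedup T_1/T_4 = {t1 / t4:.2f}x")
```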
If we generalize Amdahl’s law, let:
- $f_n$: the fraction of time spent in part $n$
- $s_n$: the speedup for part $n$

Then: $\text{Speedup} = \frac{1}{\sum_n \frac{f_n}{s_n}}$.
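A small sketch of the generalized formula, with made-up fractions and per-part speedups:

```python
def generalized_speedup(parts):
    """parts: (f_n, s_n) pairs, where the f_n sum to 1.
    Speedup = 1 / sum(f_n / s_n)."""
    return 1.0 / sum(f / s for f, s in parts)

# Hypothetical program: 20% untouched, 50% sped up 8x, 30% sped up 2x.
print(generalized_speedup([(0.2, 1), (0.5, 8), (0.3, 2)]))  # ~2.42x
```

Setting `parts = [(S, 1), (P, N)]` recovers the basic form above.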