Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture
Abstract
Modern microprocessor architectures have gradually incorporated
support for parallelism.
In the past the degree of parallelism was rather small, and as such it
could be best modeled as a constant speedup over the traditional RAM model.
However, as a consequence of continued growth, this assumption may no longer hold.
For example, with the introduction of 32- and 64-bit architectures, bit-level parallelism
became significant. This led to the introduction of the transdichotomous RAM model, for which many
algorithms that are faster in both theory and practice have been developed.
Similarly, over the last five years, major microprocessor manufacturers have released
road maps for the next decade predicting a rapidly increasing number of
cores, with upwards of 64 cores per microprocessor by 2015.
In such a setting a sequential RAM computer no longer accurately reflects
the architecture on which algorithms are being executed.
In this paper we propose a model of low degree parallelism (LoPRAM) which builds
upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures.
The model aims to combine fidelity to the underlying hardware
with ease of programming and analysis. Where necessary, we make tradeoffs
between what is achievable in hardware and what is understandable and
programmable by humans and compilers.
The proposed model supports a high level abstraction that
simplifies the design and analysis of parallel programs.
We then show that in many instances this model, though in many ways similar to
the classic PRAM, naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms.
This is in contrast to the PRAM model, in which the design,
analysis, and implementation of work-optimal algorithms proved to be one of the biggest
practical obstacles to its adoption.