Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture



Modern microprocessor architectures have gradually incorporated support for parallelism. In the past the degree of parallelism was rather small and as such it could be best modeled as a constant speedup over the traditional RAM model, however, as a consequence of continued growth this assumption might no longer hold. For example, with the introduction of 32- and 64-bit architectures, bit-level parallelism became significant. This led to the introduction of the transdichotomous RAM model, for which many algorithms which are faster in theory and practice have been developed. Similarly, over the last five years, major microprocessor manufacturers have released road maps for the next decade predicting a rapidly increasing number of cores, with upwards of 64 cores per microprocessor by 2015. In such a setting a sequential RAM computer no longer accurately reflects the architecture on which algorithms are being executed. In this paper we propose a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures. The model has as a goal a combination of fidelity of the underlying hardware and ease of programming and analysis. When necessary we make tradeoffs between what is achievable in hardware and what is understandable and programmable by humans/compilers. The proposed model supports a high level abstraction that simplifies the design and analysis of parallel programs. Then we show that in many instances this model, though in many ways similar to the classic PRAM, it naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms. This is in contrast to the PRAM model, in which the design and analysis and implementation of work-optimal algorithms proved to be one of the biggest challenges in practice for their adoption.

Bibtex Entry