An Empirical Study of the Performance of Scalable Locality on a Distributed Shared Memory System

Date
2011
Journal Title
Journal ISSN
Volume Title
Publisher
Producer
Director
Performer
Choreographer
Costume Designer
Music
Videographer
Lighting Designer
Set Designer
Crew Member
Funder
Rehearsal Director
Concert Coordinator
Moderator
Panelist
Alternative Title
Department
Haverford College. Department of Computer Science
Type
Thesis
Original Format
Running Time
File Format
Place of Publication
Date Span
Copyright Date
Award
Language
eng
Note
Table of Contents
Terms of Use
Rights Holder
Access Restrictions
Open Access
Tripod URL
Identifier
Abstract
We run an empirical study to determine the performance attainable using scalable locality on a distributed shared memory system. We utilize the PLUTO automatic parallelizer and locality optimizer on a one-dimensional Jacobi stencil with a distributed shared memory system provided by Intel’s Cluster OpenMP on commodity hardware. Unlike earlier works which base performance concerns on tile size, we discover tile sizes are less crucial, and instead threading issues and cache interference need particular attention. On our 56 processor cluster of 14 quad-core machines, with preliminary single-level tiling we are able to realize 121.3 GFLOPS, or 63% of linear speedup. Our data suggest that adding further cores would increase performance linearly.
Description
Citation
Collections