An Adaptive OpenMP Loop Scheduler for Hyperthreaded SMPs
Yun Zhang, Mihai Burcea, Victor Cheng, Ron Ho and Michael Voss
Department of Electrical and Computer Engineering, University of Toronto
10 King’s College Road, Toronto, ON, M5S 3G4, Canada
Hyperthreaded (HT) and simultaneous multithreaded (SMT) processors are now available in
commodity workstations and servers. This technology is designed to
increase throughput by executing multiple concurrent threads on a
single physical processor. These multiple threads share the processor’s
functional units and on-chip memory hierarchy in an attempt to make
better use of idle resources. This work focuses on tuning the behavior
of OpenMP applications executing on SMPs with SMT processors. We
propose a self-tuning OpenMP loop scheduler designed to react to
behavior caused by inter-thread data locality, instruction mix and
SMT-related load imbalance. This adaptive loop scheduler automatically
selects the number of threads that should be used for each parallel
loop and a good scheduling policy for the iterations. It is shown that
this scheduler outperforms all other OpenMP schedulers, and because it
can dynamically select the number of threads to use for each region, it
even outperforms the best combination of runtime schedulers for any
fixed number of threads.
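As an illustration only, and not the scheduler proposed in this paper, the two per-loop decisions described above (how many threads to use and which policy distributes the iterations) can be expressed with standard OpenMP controls. The sketch below assumes an OpenMP 3.0 or later runtime for omp_set_schedule; the names loop_config, loop_policy and sum_with_config are hypothetical and chosen for this example. When compiled without OpenMP support, the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical per-loop configuration; not a data structure from the paper. */
typedef enum { POLICY_STATIC, POLICY_DYNAMIC, POLICY_GUIDED } loop_policy;

typedef struct {
    int num_threads;     /* number of threads to use for this loop */
    loop_policy policy;  /* how iterations are handed out to threads */
    int chunk;           /* chunk size; 0 asks the runtime for its default */
} loop_config;

/* Sum a[0..n-1] using the thread count and schedule given in cfg.
   Without OpenMP the pragma is ignored and the loop runs on one thread. */
double sum_with_config(const double *a, int n, loop_config cfg)
{
    assert(a != NULL && n >= 0 && cfg.num_threads >= 1);
    double total = 0.0;
#ifdef _OPENMP
    omp_sched_t kind = omp_sched_static;
    if (cfg.policy == POLICY_DYNAMIC) kind = omp_sched_dynamic;
    else if (cfg.policy == POLICY_GUIDED) kind = omp_sched_guided;
    /* The schedule set here is picked up by schedule(runtime) below. */
    omp_set_schedule(kind, cfg.chunk);
#endif
    #pragma omp parallel for num_threads(cfg.num_threads) schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

A self-tuning runtime would choose the loop_config for each parallel loop from measurements taken during execution; here the caller supplies it by hand, for example {4, POLICY_GUIDED, 8} for one loop and {2, POLICY_STATIC, 0} for another.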
Hyperthreaded (HT) and simultaneous multithreaded (SMT) processors are
now available in commodity systems. SMTs allow multiple threads to
execute concurrently on a single physical processor. Intel states that
HT yields performance gains as large as 25-40% for less than a 5%
increase in die size. The cost is minimal because most resources are
shared by the threads on an SMT, including the functional units and the
on-chip memory hierarchy.
In this work, we focus on tuning the behavior of OpenMP applications
executing on Hyperthreaded SMPs. OpenMP has emerged over the last few
years as the dominant programming interface for expressing loop-level
parallelism for shared-