EVALUATING AND IMPROVING PERFORMANCE OF MULTIMEDIA APPLICATIONS
ON SIMULTANEOUS MULTI-THREADING
Yen-Kuang Chen, Eric Debes, Rainer Lienhart, Matthew Holliman, and Minerva Yeung
Microprocessor Research Labs, Intel Corporation
ABSTRACT
This paper presents the study and results of running
several core multimedia applications on a simultaneous
multithreading (SMT) architecture, including detailed
analysis ranging from memory-bound kernels
to compute-bound functions. A performance
metric to evaluate effective SMT performance gain is
introduced, and compared to similar metrics on symmetric
multiprocessor (SMP) systems. In addition, we analyze
and compare SMT versus SMP systems, and highlight the
advantages in the studied applications. The results indicate
that sharing the cache in SMT processors can provide
better cache locality and thus better performance, although
sharing the cache can introduce cache conflicts and reduce
the actual cache size available for each logical processor.
We also propose “mutual prefetching” -- a technique to
schedule threads so that they prefetch data for each other
in order to reduce cache miss penalty.
1. INTRODUCTION
While processors nowadays are much faster than they
used to be, the rapidly growing complexity of processor
designs also makes achieving significant additional gains
more difficult. Consequently, processors and systems that can
run multiple software threads have received increasing
attention as a means of boosting overall performance. In
this paper, we first characterize the workloads of video
decoding, encoding, watermarking, and machine learning
on current superscalar architectures, and then we
characterize the same workloads on simultaneous multi-threading
(SMT) architectures. Specifically, we use Intel
Xeon processors with Hyper-Threading Technology,
which is one implementation of the SMT architecture. Our
goal is to provide a better understanding of performance
improvements on SMT processors.
Figure 1 shows a high-