Building a Java MapReduce Framework
for Multi-core Architectures
George Kovoor, Jeremy Singer and Mikel Luján
Advanced Processor Technologies Group
The University of Manchester, UK
Abstract. MapReduce is a programming pattern that has proved to be a simple
abstraction on which efficient platforms for large-scale data processing in
distributed environments can be built, as in the implementations from Google
and Hadoop. With this pattern, application logic is expressed using sequential
map and reduce functions. Because these functions are pure (free of side
effects), a runtime system can execute them concurrently. The runtime
framework also takes care of the low-level parallelisation and scheduling
details. The success of the MapReduce pattern has led to several
implementations for various scenarios. This paper introduces MR-J, a
MapReduce Java framework for multi-core architectures, and reports the
scalability results from the first experiments.
Keywords: MapReduce, parallel software framework.
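As an illustration of the pattern summarised in the abstract, the following is a minimal word-count sketch in plain Java. It is not the MR-J API: the class MapReduceSketch and the methods map, reduce and wordCount are hypothetical names chosen for this example, and the fixed thread pool merely stands in for a runtime system. The user-supplied map and reduce functions are sequential and pure, which is what allows the surrounding code to run them concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; these names are not part of MR-J.
public class MapReduceSketch {

    // Map: a pure, sequential function from one input line to (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: a pure, sequential function folding all values emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the runtime: schedules map and reduce calls on a thread pool.
    static Map<String, Integer> wordCount(List<String> lines) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // Map phase: one task per input line, safe to run concurrently
            // because map has no side effects.
            List<Future<List<Map.Entry<String, Integer>>>> mapped = new ArrayList<>();
            for (String line : lines) {
                mapped.add(pool.submit(() -> map(line)));
            }

            // Grouping: collect the emitted values by key (done sequentially here).
            Map<String, List<Integer>> groups = new TreeMap<>();
            for (Future<List<Map.Entry<String, Integer>>> f : mapped) {
                for (Map.Entry<String, Integer> pair : f.get()) {
                    groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                          .add(pair.getValue());
                }
            }

            // Reduce phase: one task per key, again independent of each other.
            Map<String, Future<Integer>> reduced = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
                reduced.put(g.getKey(), pool.submit(() -> reduce(g.getKey(), g.getValue())));
            }

            Map<String, Integer> result = new TreeMap<>();
            for (Map.Entry<String, Future<Integer>> r : reduced.entrySet()) {
                result.put(r.getKey(), r.getValue().get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("map or reduce task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the quick brown fox", "the lazy dog", "the fox")));
    }
}
```

In this sketch the application programmer would write only map and reduce; the thread pool, task submission and grouping are the kind of low-level detail that a MapReduce runtime is meant to hide.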
The MapReduce programming pattern was by no means invented by Google. Its
roots can be traced back to functional programming. Nonetheless, it has attracted a
fair amount of attention from industry, academia and open-source projects (Hadoop),
since Google made public that, in their experience, this pattern was easy to use
and provided a highly effective means of attaining massive parallelism in large
data-centers. The pattern is not a silver bullet that can be applied to any general-purpose
application (some consider it a step backwards), but it covers an important part of
the application spectrum. Our objective is to investigate the MapReduce pattern in the
context of multi-core architectures rather than within data-centers, where it is commonly
used by Amazon, Facebook, Google and Yahoo, to name a few.
We have selected Java as the programming language because the main open-source
implementation of MapReduce, i.e. Hadoop, is also developed in this language. Thus,