Wide-area cooperative storage with CFS
Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica
MIT Laboratory for Computer Science
chord@lcs.mit.edu
http://pdos.lcs.mit.edu/chord/
Abstract
The Cooperative File System (CFS) is a new peer-to-peer read-only
storage system that provides provable guarantees for the efficiency,
robustness, and load-balance of file storage and retrieval.
CFS does this with a completely decentralized architecture that can
scale to large systems. CFS servers provide a distributed hash table
(DHash) for block storage. CFS clients interpret DHash blocks as
a file system. DHash distributes and caches blocks at a fine granularity
to achieve load balance, uses replication for robustness, and
decreases latency with server selection. DHash finds blocks using
the Chord location protocol, which operates in time logarithmic in
the number of servers.
CFS is implemented using the SFS file system toolkit and runs
on Linux, OpenBSD, and FreeBSD. Experience on a globally deployed
prototype shows that CFS delivers data to clients as fast
as FTP. Controlled tests show that CFS is scalable: with 4,096
servers, looking up a block of data involves contacting only seven
servers. The tests also demonstrate nearly perfect robustness and
unimpaired performance even when as many as half the servers
fail.
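The logarithmic lookup cost claimed above can be illustrated with a small simulation. The following is a minimal sketch, not the CFS or Chord implementation: it assumes a 32-bit identifier ring (M), SHA-1 truncated to that width, and a globally visible sorted list of server identifiers in place of per-server finger tables. All function names (ident, successor, between, closest_preceding_finger, lookup) are hypothetical and chosen for illustration only.

```python
import hashlib
from bisect import bisect_left

M = 32            # identifier bits; an assumption chosen for readability
RING = 1 << M     # size of the identifier ring


def ident(data: bytes) -> int:
    """Hash bytes onto the identifier ring (SHA-1 truncated to M bits)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % RING


def successor(servers, key):
    """First server ID at or after key, wrapping around the ring.

    servers must be a sorted list of distinct identifiers.
    """
    return servers[bisect_left(servers, key) % len(servers)]


def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past zero (or a == b: whole ring)


def closest_preceding_finger(servers, node, key):
    """Farthest finger of node that still precedes key on the ring.

    Finger i of a node is the successor of node + 2**i. A real server
    would store these; here they are recomputed from the global list.
    """
    for i in reversed(range(M)):
        finger = successor(servers, (node + (1 << i)) % RING)
        if between(finger, node, key):
            return finger
    return node


def lookup(servers, start, key):
    """Return (responsible server, number of times the query was forwarded)."""
    node, hops = start, 0
    while True:
        succ = successor(servers, (node + 1) % RING)
        if between(key, node, succ, inclusive_right=True):
            return succ, hops
        node = closest_preceding_finger(servers, node, key)
        hops += 1


if __name__ == "__main__":
    # 4,096 simulated servers with hypothetical names.
    servers = sorted({ident(f"server-{i}".encode()) for i in range(4096)})
    key = ident(b"contents of some block")
    owner, hops = lookup(servers, servers[0], key)
    print(f"block {key:#010x} is stored at server {owner:#010x}; forwarded {hops} times")
```

Because each forwarding step uses a strictly smaller finger than the previous one, a lookup in this sketch is forwarded at most M times; with uniformly hashed server identifiers the typical count grows with the logarithm of the number of servers.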
1. Introduction
Existing peer-to-peer systems (such as Napster [20], Gnutella [11],
and Freenet [6]) demonstrate the benefits of cooperative
storage and serving: fault tolerance, load balance, and the ability to
harness idle storage and network resources. Accompanying these
benefits are a number of design challenges. A peer-to-peer architecture
should be symmetric and decentralized, and should operate
well with unmanaged volunteer participants. Finding desired data
in a large system must be fast; servers must be able to join and leave
the system frequently without affecting its robustness or efficiency;
and load must be balanced across the available servers. While the
peer-to-peer sys