Clustering Techniques
A Technical Whitepaper
By Lorinda Visnick
14 Oak Park
Bedford, MA 01730 USA
Phone: +1-781-280-4000
www.objectstore.net
Copyright © 2002-2003 Progress Software Corporation. All rights reserved.
Introduction
The manner in which data is loaded can greatly affect the performance of a database. This holds
regardless of when the data is loaded: before the application(s) begin accessing the data, or
concurrently while the application(s) are accessing it. This paper presents various strategies for
locating data as it is loaded into the database and details the performance implications of those strategies.
Data Clustering, Working Sets, and Performance
With ObjectStore®, access to persistent data can perform at in-memory speeds. Achieving in-memory
speeds requires cache affinity. Cache affinity is the generic term for the degree to which data
accessed within a program overlaps with data already retrieved on behalf of a previous request.
Effective data clustering allows for better, if not optimal, cache affinity.
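As a rough illustration of the definition above, cache affinity can be expressed as the fraction of pages touched by a request that are already in the client cache. This is only a sketch: the function name cache_affinity and the use of integer page numbers are assumptions made for the example and are not part of ObjectStore.

```python
def cache_affinity(cached_pages, requested_pages):
    """Fraction of the requested pages that are already in the cache.

    Hypothetical helper for illustration only; not an ObjectStore API.
    """
    requested = set(requested_pages)
    if not requested:
        return 1.0  # nothing requested, so nothing has to be fetched
    return len(requested & set(cached_pages)) / len(requested)


# Pages retrieved on behalf of a previous request.
cached = {1, 2, 3, 4}

print(cache_affinity(cached, [2, 3, 4, 5]))  # 0.75
print(cache_affinity(cached, [7, 8, 9]))     # 0.0
```

A value near 1.0 means most of the request is served from pages already in the cache; a value near 0.0 means nearly every page must be fetched.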
Data density is defined as the proportion of objects within a given storage block that are accessed by a
client during some scope of activation. Clustering is a technique for achieving high data density. The
working set is defined as the set of database pages a client needs at a given time. ObjectStore has a
page-based architecture (see the ObjectStore Architecture Introductory technical paper at
www.objectstore.net for a detailed review of the ObjectStore architecture), which performs best when the
following goals are met:
• Minimize the number of pages transferred between the client and server
• Maximize the use of pages already in the cache
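The definitions of data density and working set, and their bearing on the two goals above, can be sketched with a toy model. Everything in it is an assumption made for illustration: objects are strings, a layout is a dictionary mapping each object to a page number, and the helpers working_set and data_density are hypothetical names, not ObjectStore functions.

```python
from collections import defaultdict


def working_set(layout, accessed):
    """Set of pages the client needs in order to reach every accessed object."""
    return {layout[obj] for obj in accessed}


def data_density(layout, accessed):
    """For each page in the working set, the fraction of its objects accessed."""
    accessed = set(accessed)
    stored = defaultdict(int)  # objects stored on each page
    used = defaultdict(int)    # objects on each page that the client accesses
    for obj, page in layout.items():
        stored[page] += 1
        if obj in accessed:
            used[page] += 1
    return {page: used[page] / stored[page] for page in stored if used[page]}


# The client accesses objects a, b, c and d; each page holds four objects.
accessed = ["a", "b", "c", "d"]

# Clustered layout: the accessed objects share one page.
clustered = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1, "g": 1, "h": 1}

# Scattered layout: the accessed objects are spread over two pages.
scattered = {"a": 0, "b": 0, "e": 0, "f": 0, "c": 1, "d": 1, "g": 1, "h": 1}

print(sorted(working_set(clustered, accessed)))  # [0]
print(data_density(clustered, accessed))         # {0: 1.0}
print(sorted(working_set(scattered, accessed)))  # [0, 1]
print(data_density(scattered, accessed))         # {0: 0.5, 1: 0.5}
```

In this toy model the clustered layout halves the number of pages the client needs for the same four objects, and every object on the page it does need is one the client uses.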
In order to achieve these goals, the working set of the application should be optimal. The way to achieve
an optimal working set is via data clus