EURASIP Journal on Applied Signal Processing 2005:8, 1159–1173
© 2005 Hindawi Publishing Corporation
Clustering Time Series Gene Expression Data Based on Sum-of-Exponentials Fitting
Ciprian Doru Giurcăneanu
Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland
Email: ciprian.giurcaneanu@tut.fi
Ioan Tăbuş
Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland
Email: ioan.tabus@tut.fi
Jaakko Astola
Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland
Email: jaakko.astola@tut.fi
Received 8 June 2004; Revised 26 October 2004; Recommended for Publication by Xiaodong Wang
This paper presents a method for clustering time series of gene expression data that is based on fitting a sum-of-exponentials model to the nonuniformly sampled data. The structure of the model is estimated by using a new form of the minimum description length (MDL) principle for nonlinear regression, which incorporates a normalized maximum-likelihood (NML) model for a subset of the parameters. The performance of the structure estimation method is studied using simulated data, and the superiority of the new selection criterion over earlier criteria is demonstrated. The accuracy of the nonlinear estimates of the model parameters is analyzed with respect to the Cramér-Rao lower bounds. Clustering examples of gene expression data sets from a developmental biology application are presented, revealing a grouping of genes into clusters according to functional classes.
Keywords and phrases: nonuniformly sampled data, sum-of-exponentials model, normalized maximum likelihood, time series
clustering, gene expression data, developmental biology.
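As a minimal sketch of why sum-of-exponentials fitting handles nonuniform sampling naturally, assume the common form y(t) = Σ_k a_k exp(−λ_k t) with real decay rates λ_k (the paper's exact parameterization, its nonlinear estimator, and its NML-based structure selection are not reproduced here). For fixed rates the amplitudes a_k enter linearly, so they can be obtained by ordinary least squares at arbitrary sampling times. The rates can then be searched over; a brute-force grid is used below only for illustration. The function names `design_matrix`, `fit_amplitudes`, and `fit_one_term`, and all sampling times and parameter values, are hypothetical.

```python
import numpy as np

def design_matrix(t, rates):
    """Columns exp(-rate_k * t), one per assumed decay rate (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-np.outer(t, rates))

def fit_amplitudes(t, y, rates):
    """For fixed rates, solve the linear least-squares problem for the amplitudes.

    Returns (amplitudes, residual sum of squares).
    """
    X = design_matrix(t, rates)
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ a
    return a, float(resid @ resid)

def fit_one_term(t, y, rate_grid):
    """Illustrative one-term fit: pick the grid rate with the smallest RSS."""
    best = min(rate_grid, key=lambda r: fit_amplitudes(t, y, [r])[1])
    amps, rss = fit_amplitudes(t, y, [best])
    return best, amps, rss

# Nonuniform sampling times (hypothetical values, denser early on)
t = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])

# Noiseless two-term signal with known rates: the amplitudes are recovered exactly
true_rates = np.array([0.2, 1.5])
true_amps = np.array([3.0, -1.0])
y = design_matrix(t, true_rates) @ true_amps
amps, rss = fit_amplitudes(t, y, true_rates)

# Noiseless one-term signal: the grid search recovers the decay rate
y1 = 2.0 * np.exp(-0.5 * t)
best_rate, amps1, rss1 = fit_one_term(t, y1, np.linspace(0.1, 2.0, 20))
print(amps, rss, best_rate)
```

Splitting the parameters this way, into a linear part solved in closed form and a nonlinear part searched numerically, is a standard device for separable nonlinear least squares. Real expression data would add noise and require choosing the number of terms, which is where a model-selection criterion such as the paper's MDL/NML approach comes in.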
1. INTRODUCTION
Gene expression time profiles are a rich source of information about the dynamics of the underlying genomic network. The measurements are often taken at nonuniform time points, suggested by the biologist's intuition about the time scale of the important changes