daily volume, and Bouchaud et al (2004) discover non-trivial serial corre-
lation in volume and price data.
The publicly available data sets lack reliable classification of individual
trades as buyer- or seller-initiated. Even more significantly, each transaction
exists in isolation; there is no information on sequences of trades that form
part of a large transaction. Some academic studies have used limited data sets
made available by asset managers that do have this information, where the
date but not the time duration of the trade is known (Chan & Lakonishok,
1995; Holthausen, Leftwich & Mayers, 1990; and Keim & Madhavan, 1996).
The transaction cost model embedded in our analysis is based on the
model presented by Almgren & Chriss (2000) with non-linear extensions
from Almgren (2003). The essential features of this model, as described
below, are that it explicitly divides market impact costs into a permanent
component associated with information, and a temporary component aris-
ing from the liquidity demands made by execution in a short time.
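To make the permanent/temporary split concrete, the following is a minimal sketch and not the estimated model itself. It assumes a linear permanent impact, as in Almgren & Chriss (2000), and a power-law temporary impact as one possible non-linear extension, applied to an order traded at a constant rate. The names ImpactParams, gamma, eta and beta, and any numerical values passed to them, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactParams:
    """Hypothetical impact coefficients, for illustration only."""
    gamma: float  # permanent impact: price move per share traded
    eta: float    # temporary impact coefficient
    beta: float   # temporary impact exponent (beta = 1 is the linear case)


def permanent_impact(shares: float, p: ImpactParams) -> float:
    """Lasting price displacement after trading `shares` (signed: buy > 0)."""
    return p.gamma * shares


def temporary_impact(rate: float, p: ImpactParams) -> float:
    """Transient price concession while trading at `rate` shares per unit time."""
    sign = 1.0 if rate >= 0 else -1.0
    return sign * p.eta * abs(rate) ** p.beta


def cost_per_share(shares: float, duration: float, p: ImpactParams) -> float:
    """Average price concession for an order traded at a constant rate.

    With linear permanent impact the price drifts linearly during the trade,
    so the order pays on average half of the final permanent displacement,
    plus the full temporary impact at the (constant) trading rate.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    rate = shares / duration
    return 0.5 * permanent_impact(shares, p) + temporary_impact(rate, p)
```

In this sketch, lengthening the duration lowers only the temporary term, while the permanent term depends on the total size alone; that asymmetry is the reason for treating the two components separately.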
Data
The data set on which we base our analysis contains, before filtering, al-
most 700,000 US stock trade orders executed by Citigroup equity trading
desks for the 19-month period from December 2001 to June 2003. (The
model actually used within the BECS software is estimated on an ongoing
basis, to reflect changes in the trading environment.) We now briefly de-
scribe and characterise the raw data, and then the particular quantities of
interest that we extract from it.
■ Description and filters. Each order is broken into one or more trans-
actions, each of which may generate one or more executions. For each
order, we have the following information:
■ The stock symbol, requested order size (number of shares) and sign
(buy or sell) of the entire order. Client identification is removed.
■ The times and methods by which transactions were submitted by the
Citigroup trader to the market. We take the time t0 of the first transaction
to be the start of the order. Some of the