### About Global Documents

**Global Documents provides you with documents from around the globe on a variety of topics for your enjoyment. **

**Global Documents utilizes edocr for all its document needs due to edocr's wonderful content features. Thousands of professionals and businesses around the globe publish marketing, sales, operations, customer service and financial documents making it easier for prospects and customers to find content. **

Dijkstra’s Algorithm On–Line:

An Empirical Case Study from

Public Railroad Transport1

Frank Schulz

and

Dorothea Wagner

and

Karsten Weihe

University of Konstanz, Department of Computer & Information Science

Traffic information systems are among the most prominent real–world applications of Dijkstra’s

algorithm for shortest paths. We consider the scenario of a central information server in the

realm of public railroad transport on wide–area networks. Such a system has to process a large

number of on–line queries for optimal travel connections in real time. In practice, this problem is

usually solved by heuristic variations of Dijkstra’s algorithm, which do not guarantee an optimal

result. We report results from a pilot study, in which we focused on the travel time as the only

optimization criterion. In this study, various speed–up techniques for Dijkstra’s algorithm were

analysed empirically. This analysis was based on the timetable data of all German trains and on

a “snapshot” of half a million customer queries.2

1. INTRODUCTION

1.1 Problem.

From a theoretical viewpoint, the problem of finding a shortest path from one

node to another one in a graph with edge lengths is satisfactorily solved. In fact,

the Fibonacci–heap implementation of Dijkstra’s algorithm requires O(m+n log n)

time, where n is the number of nodes and m the number of edges [6]. However,

various practical application scenarios impose restrictions that make this sort of

algorithm impractical. For instance, many scenarios impose a strict limitation on

space consumption.3

1 c©ACM, 2000. This is the author’s version of the work. It is posted here by permission of ACM

for your personal use. Not for redistribution. The definitive version was published in Journal of

Experimental Algorithmics, Volume 5(12), 2000, http://www.jea.acm.org

2With special courtesy of the TLC Transport-, Informatik- und Logistik–Consulting GmbH/EVA–

Fahrplanzentrum, a subsidiary of the Deutsche Bahn AG.

3To give a concrete example: if a traffic information system is to be distributed on CD–Rom or to

be run on an embedded system, a naive implementation of Dijkstra’s algorithm would typically

Address: University of Konstanz, Box D188, 78457 Konstanz, Germany

{schulz,wagner}@fmi.uni–konstanz.de, weihek@acm.org

http://www.fmi.uni-konstanz.de/˜{schulz,wagner,weihe}

2

·

F. Schulz and D. Wagner and K. Weihe

In this paper, we consider a different scenario: space consumption is not an issue,

but the system has to answer a potentially infinite number of customer queries for

optimal travel connections on–line. The real–time restrictions are soft, which basi-

cally means that the average response time is more important than the maximum

response time. The concrete scenario we have in mind is a central server for public

railroad transport, which has to process a large number of queries (e.g. a server

that is directly accessible by customers through terminals in the train stations or

through a web interface).

Assumption 1.1. Our computational study is based on two further simplifying

assumptions:

(1) The time tables are identical every day.

(2) The length of a travel connection is exclusively defined as the travel time (to

be specified in greater detail in Definition 1.2 below).

To achieve Assumption 1.1(1), we simply unite the time tables of all days. Due

to this assumption, a customer query need not specify a date.

Definition 1.2. A customer query (or simply query for short) is a triple con-

sisting of two stations, A and B, and an earliest departure time t. Such a query

asks for a connection from A to B that does not start before t and arrives B as

early as possible.

Algorithmic problems of this kind are usually approached heuristically in practice,

which means that an optimal result is not guaranteed.

Definition 1.3. An algorithm for a shortest–path problem is called distance–

preserving if it produces an optimal connection for every input instance.

Distance–preserving algorithms are not in wide use in traffic information sys-

tems, because the average response time of such an algorithm was perceived to be

unacceptable. In a new long–term project, we plan to investigate the question to

what extent distance–preserving variants of Dijkstra’s algorithm have become com-

petitive on contemporary computer technology. Here we give an experience report

from a pilot study, in which we focused on the most fundamental kind of query,

namely the one specified in Definition 1.2 (subject to Assumption 1.1(1)).

This scenario is an example of a general problem in the design of practical algo-

rithms, which we discussed in [22]: computational studies based on artificial (e.g.

random) data do not make much sense, because the characteristics of the real–world

data are crucial for the success or failure of an algorithmic approach in a concrete

use scenario. Hence, experiments on real–world data are the method of choice.

1.2 Related work.

Various textbooks address speed–up techniques for Dijkstra’s algorithm but have

no concrete applications in mind, notably [2] and [14]. In [21], Chapter 4, a brief,

introductory survey of selected techniques is given (with a strong bias towards

the use scenario discussed here). Most work from the scientific side addresses the

single–source variant, where a spanning tree of shortest paths from a designated

exceed the available space.

Dijkstra’s Algorithm On–Line

·

3

root to all other nodes is to be found. Moreover, the main aspect addressed in work

like [5] is the choice of the data structure for the priority queue. In Section 3 below

(paragraph on the “search horizon”), we will see that the scenario considered in

this paper requires algorithmic approaches that are fundamentally different, and

Section 4 will show that the choice of the priority queue is indeed a marginal aspect

in view of the goal of our project.

On the other hand, most application–oriented work in this field is commercial,

not scientific, and there is only a small number of publications. In fact, we are not

aware of any publication especially about algorithms for wide–area railroad traffic

information systems.

Some scientific work has been done on local public transport. For example, [17]

gives some insights into the state of the art. However, local public transport is quite

different from wide–area public transport, because the timetables are very regular,

and the most powerful speed–up techniques are based on the strict periodicity of

the trains, buses, ferries, etc. In contrast, our experience is that the timetables of

the national European train companies are not regular enough to gain a significant

profit from these techniques.4

On the other hand, private transport has been extensively investigated in view of

wide–area networks. Roughly speaking, this means “routing planners” for cars on

city and country maps [9; 1; 4; 20; 12; 18]. This problem is different to ours in that it

is two–dimensional, whereas train timetables induce time as a third dimension: due

to the lack of periodicity, the earliest departure time is significant in our scenario.

In contrast, temporal aspects do not play any role in the work quoted above.5 So

it is not surprising that the research on routing planners has focused on purely

geometric techniques.

Some of the application–oriented work quoted above is based on hierarchical

approaches. This is also one of our strategies (see Section 3.2, strategy “selections of

stations”). For example, in [19] this approach is viewed from a database perspective

without any specific scenario in mind (though with a bias towards road maps).

For completeness, we mention the work on variants of Dijkstra’s algorithm that

are intended to efficiently cope with large data in secondary memory. Chapter 9

of [3] gives an introduction to theoretical and practical aspects. As mentioned above

(“problem” section in the Introduction), the problems caused by the slow access to

secondary memory are beyond the scope of our paper.

Since our main contribution is a computational study, the work on the method-

ology of this sort of study (e.g. [7; 8; 11; 10; 13; 15; 16]) is also related. We will

come back to methodological issues in Section 2.

1.3 Contribution of the paper.

We implemented and evaluated various distance–preserving speed–up techniques for

Dijkstra’s algorithm. The study is based on all train data (winter period 1996/97)

of the Deutsche Bahn AG, the national railroad and train company of Germany.

4This experience is supported by personal communications with people from the industry.

5In principle, temporal aspects would also be relevant for the private transport, for example, the

distinction between “rush hours” and other times of the day. However, this sort of temporal aspect

is quite different from ours and, to our knowledge, not scientifically invesitigated either.

4

·

F. Schulz and D. Wagner and K. Weihe

The processed queries are a “snapshot” of the central Hafas6 server of the Deutsche

Bahn AG, in which all queries of customers of all ticket offices in Germany were

recorded over several hours.

In particular, an input instance consists of a fixed and a variable part:

—Fixed : all input instances include the same set of time tables.

—Variable: the query, that is, departure station, arrival station, and earliest de-

parture time, varies from input instance to input instance.

The result of this snapshot comprises more than half a million queries, which

might suffice for a representative analysis of this particular service (assuming that

the typical query profile of customers does not vary dramatically from day to day).

Due to the above–mentioned insight that the periodicity of the timetables is not

a promising base for algorithmic approaches, the question is particularly interesting

whether geometric techniques like those in routing planners are successful, although

the scenario has geometric and temporal characteristics. We will see that this

question can indeed be answered in the affirmative.

1.4 Overview.

In Section 2 we will discuss the general rationale of our experimental set–up. Then,

in Section 3, we will briefly summarize the algorithmic techniques that we evaluated.

Finally, in Section 4, we will present and discuss the results of the computational

study.

2. EMPIRICAL METHODOLOGY

As mentioned in the Introduction, the question to be answered by our computa-

tional study is very specific:

Are distance–preserving variants of Dijkstra’s algorithm (in the sense

of Definition 1.3) competitive on contemporary computer technology in

the use scenario addressed in this paper?

The notion “competitive” is here used in a very informal sense, which is probably

the sense of interest from an applied viewpoint. Roughly speaking, we regard an

algorithm as competitive if it would probably not become a bottleneck operation in a

typical central server. Such a fuzzy notion does not allow far–reaching conclusions.

We believe that our conclusions (which we will give in detail in Section 4 and

in summary in Section 5) are modest enough to be safely based on our usage of

competitiveness.

The question addressed in our paper is quite different from the questions typically

addressed in methodological work such as [7; 8; 11; 10; 13; 15; 16]. In fact, to

our knowledge, the general focus in the literature is on systematic performance

comparisons of different algorithms without or with only loose relation to concrete

applications. The overwhelming majority of computational studies in theoretical

computer science primarily addresses this kind of question. The method of choice

here is the evaluation of the individual algorithms on artificial (mainly random)

6Hafas is a trademark of the Hacon Ingenieurgesellschaft mbH, Hannover, Germany.

Dijkstra’s Algorithm On–Line

·

5

input data, which is a good base for the systematic variation of parameters, and a

statistical analysis of the computational results.

In contrast, the question we address in this paper is tightly bound to a concrete

application scenario. The profile of the input data is very specific, but a sufficient

specification of this profile in terms of parameters such as the number of nodes/edges

or the average/maximum node degree does not seem to be in our reach.7 Thus,

there is no base for the creation of artificial, yet realistic data. On the other hand,

a snapshot of half a million real customer queries, which is the base of our study,

is certainly sufficient for a profound answer to the above question. Hence, we will

restrict our attention to real–world data.

Note that the question addressed in this paper is binary (yes/no). As mentioned

in the Introduction, the real–time constraints are rather soft. This means that the

overall average response time, taken over the total set of all queries, is the most

important piece of information to decide upon yes or no. However, beyond this

very specific question we also regard an estimation of the scope of the result as

important. For instance, such a large body of response times, if well analysed,

should allow a reliable prediction how the average response time changes if the

average geographic distance of departure and arrival station changes. To give a

concrete example: the percentage of long–distance queries is significantly higher

for the web service than for the ticket office service, from which our snapshot was

taken.

To estimate the effect of variations like this, we will also consider the interdepen-

dence between response times and two parameters, the Euclidean distance between

departure and arrival station and the minimal total travel time computed in re-

sponse to the respective query. In Section 4 we will see that this dependency is

surprisingly strong, which means that we may assume a high level of confidence for

predictions.

As mentioned in the Introduction, we implemented and evaluated various speed–

up techniques for Dijkstra’s algorithm. One notable exception is the bidirected–

search technique. This technique does not really apply in our scenario, because the

target node of the search depends on the arrival time and is thus not known in

advance (see Section 3, paragraph on “bidirected search,” for details). However,

one can imagine a variation of the bidirected–search technique, which does not rely

on the exact arrival time but on an upper bound. To answer the question whether

this class of modified bidirected–search techniques is potentially competitive at all,

we implemented and evaluated an idealized variant, which indeed does rely on the

exact arrival time and thus yields a lower bound on the potential performance of

this class of speed–up techniques.

7Nonetheless, a comparison of the algorithms from Section 3 based on random data and a conven-

tional statistical analysis as advocated by some of the above–mentioned papers may be of interest

in its own right. However, such an analysis does not contribute to our specific question and is

thus beyond the scope of this paper.

6

·

F. Schulz and D. Wagner and K. Weihe

3. ALGORITHMS

3.1 Fundamental Algorithmic Approach

3.1.1 Train graph.. From Assumption 1.1 recall our restriction to time tables

that do not vary from day to day. Due to this restriction, we only need a train

network of 24 hours.

The arrival or departure of a train at a station will be called an event. The

train graph contains one node for every event. Two events v and w are connected

by a directed edge (v, w) if v represents the departure of a train at some station

and w represents the very next arrival of this train at some other station. On the

other hand, two successive events at the same station are connected by an edge (in

positive time direction), which means that every station is represented in the graph

by a cycle through all of its events (the cycle is closed by a turn–around edge at

midnight).

In each case, the length of an edge is the time difference (in minutes) of the

events represented by its endnodes. Obviously, a query then amounts to finding a

shortest path from the earliest event at the departure station not before the earliest

departure time to an arbitrary arrival event at the arrival station.

The data contains 933, 066 events on 6, 961 stations. Consequently, there are

933, 066 · 3/2 = 1, 399, 599 edges in the graph.

3.1.2 Priority queue.. Dijkstra’s algorithm relies on a priority queue, which man-

ages the events on the current “frontier line” of the search. As mentioned above, the

best general worst–case bound, O(m+ n log n), is obtained from Fibonacci heaps,

where n is again the number of events and m the number of edges. We do not use

a Fibonacci heap but a normal heap (also often called a 2–heap), which yields an

O((n +m) log n) bound [6]. Since m ∈ O(n) in train graphs, both bounds reduce

to O(n log n).

As an alternative to heaps, we also implemented the Dial variant as described

in [2]. Basically, this means that the priority queue is realized by an array of buckets

with a cyclically moving array index. The events of the frontier line are distributed

among the buckets, and it is guaranteed that the very next non–empty bucket after

the current array index always contains the candidates to be processed next. The

asymptotic worst–case bound of the Dial implementation of Dijkstra’s algorithm is

O(m+ nL), where L is the largest length of an edge.

3.1.3 Search horizon.. Of course, we deviate from the “textbook version” of Di-

jkstra’s algorithm in that we do not compute the distance of every event from

the source event but terminate the algorithm immediately once the first (and thus

optimal) event at the arrival station is processed. The most fundamental distance–

preserving speed–up technique for our scenario is then a reduction of the search to

a (hopefully) small part of the graph, which contains all relevant events. Figs. 1–

4 demonstrate that such a reduction is crucial. In fact, they reveal that for the

majority of all queries only small fractions of the total area and time horizon are

relevant.

To our knowledge, some commercial implementations remove events and edges

from the graph (more or less heuristically, i.e. losing optimality of distances) before

the search itself takes place. In contrast, we aim at an evaluation of distance–

Dijkstra’s Algorithm On–Line

·

7

0

5000

10000

15000

20000

25000

30000

35000

0

100

200

300

400

500

600

700

800

Fig. 1. The frequency distribution histogram of the queries from the snapshot according to

the Euclidean distance between the departure station and the arrival station (granularity: 15

kilometers).

0

5000

10000

15000

20000

25000

30000

0

100

200

300

400

500

600

700

800

Fig. 2. Like Fig. 1 except that the abscissa now denotes the minimal travel time in minutes

(granularity: 20 minutes).

preserving strategies, so our approach is quite different. First of all, we apply the

amortization technique discussed in [21] to obtain a sublinear expected run time

per query.8 The only obstacle to sublinearity is the initialization of all events with

infinite distance labels in the beginning of the textbook algorithm; in fact, Figs. 1–4

8The term “sublinear expected run time” is here used in a rather informal, experience–oriented

way and not to be understood in a formal probabilistic sense.

8

·

F. Schulz and D. Wagner and K. Weihe

0

5000

10000

15000

20000

25000

0

50

100

150

200

Fig. 3. Like Fig. 1 except that the abscissa now denotes the number of edges in the shortest path

(granularity: 10 edges).

0

50

100

150

200

250

300

350

400

450

500

0

200

400

600

800

1000

1200

Fig. 4. For each minimal travel time (granularity: 20 minutes) the total CPU times of all queries

yielding this value of the travel time are summed up to reveal which range of travel times con-

tributes most of the total CPU time required for all queries.

strongly suggest that on average the main loop of the algorithm only processes a

very small fraction of the graph until the arrival station is seen and the algorithm

terminates.

As described in [21], every event is given an additional time stamp, which stores

the number of the query in which it was reached in the main loop. Whenever

an event is reached, its time stamp is updated accordingly. This update properly

increases the time stamp if and only if this is the first time that the node is seen

Dijkstra’s Algorithm On–Line

·

9

in the current search. This is exactly the situation in which the current distance

label should be infinity. Hence, if the time stamp is increased, the distance label

is regarded as infinite, otherwise the value of the distance label is taken as is.

Consequently, there is no need for an expensive initialization phase, and no event

outside the “search horizon” of the main loop is hit at all.

It might be worth mentioning an equivalent strategy to avoid the initialization

step: An additional data structure, e.g. a linked list, is used to store the nodes

with non-infinite distance labels. Then, all non-infinite distance labels can be reset

to infinity after each query.

3.2 Algorithmic Variations

The good performance of the following additional techniques relies on the general

design decisions introduced in Section 3.1. All of these techniques are independent

of each other in the sense that an arbitrary selection of them may be applied

simultaneously.

3.2.1 Angle restriction.. This technique additionally relies on the coordinates

associated with the individual stations. In a preprocessing step, we apply Dijkstra’s

algorithm to each event to compute shortest paths from this event to all other

stations.9 The results are not stored (this would require too much space) but

only used to compute two values α(v, w) and β(v, w) for each edge (v, w) that

corresponds to a train hop. These 2 ·n/2 values (n again the number of events) are

stored and then used in the on–line system.

More specifically, these values are to be interpreted as angles in the plane. See

Fig. 5. Let (v, w) be an edge from some event v to some other event w, and let

sv and sw be the stations to which events v and w belong, respectively. Then the

values α(v, w) and β(v, w) stored for edge (v, w) span a circle sector with center sv

in the plane, where, say, α(v, w) is the right leg. The values α(v, w) and β(v, w) are

determined such that this circle sector has the following meaning: if the shortest

path from event v to some event u at some station s′ contains (v, w), then s′

has to be in this circle sector. Subject to this condition, the difference of the

angles, (β(v, w)−α(v, w)) mod 2π, is minimum. Clearly, the brute–force application

of Dijkstra’s algorithm in the preprocessing allows one to compute this specific,

narrowest possible circle sector for each edge.

Consequently, edge (v, w) may be ignored by the search if the arrival station is

not in the circle sector of this edge. The restriction of the search to edges whose

circle sectors contain the arrival station is the strategy that we will call “angle

restriction” in the following.

Proposition 3.1. Due to the specific construction of the values α(·, ·) and β(·, ·),

the strategy “angle restriction” is distance–preserving in the sense of Definition 1.3.

3.2.2 Selection of stations.. The basic idea behind this technique is similarly

implemented in various routing planners for the private transport [9; 1; 4; 12].

9This preprocessing takes several hours, which is absolutely acceptable in the practical setting

from which this research originated, namely generating time tables once for every winter/summer

period. Hence, there is no need to optimize the preprocessing.

10

·

F. Schulz and D. Wagner and K. Weihe

( , ) α ( , )

β

s’

u

v w

at

v w

at s

v

1

at s

w

2

Fig. 5. The circle sector spanned by α(v, w) and β(v, w) contains all stations s′ such that there

is an event u at s′ whose shortest path from v starts with (v, w).

0

20000

40000

60000

80000

100000

120000

140000

160000

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

Fig. 6. The relation of the number of edges hit without (abscissa) and with (ordinate) strategy

angle selection (granularity: 500 edges).

In general, a certain small set of nodes (i.e. events) is selected, which means

events in our setting. However, we will only select sets of nodes such that either

every node from a station s is selected, or none of the nodes from station s. More

specifically, in the data available to us, every station is assigned an “importance

number,” which is intended to rank its degree of “centrality” in the railroad network.

For our computational study, we select the 225 stations in the highest categories

(see Fig. 10). These stations induce 95, 423 events in total, which constitute our

selection of events.

Let G = (V,E) denote the (directed) train graph. In particular, V is the set of all

events. Moreover, let Ṽ be the set of selected nodes. The strategy is based on what

we will call replaceable paths in the following. Let v, w ∈ V be two events, which

may or may not belong to Ṽ . A path from v to w in G will be called replaceable, if

Dijkstra’s Algorithm On–Line

·

11

no node of the path (except for the endnodes, v and w, themselves) belongs to Ṽ .

Based on this notion, we construct an auxiliary directed graph G′ = (V,E′) on

V . For v, w ∈ V , there is a directed edge (v, w) ∈ E ′ if, and only if, there is a

replaceable path from v to w in G. The length `(v, w) of an edge (v, w) ∈ E ′ in G′

is then defined as the minimum length of a replaceable path from v to w in G.

The auxiliary graph G′ and these edge weights are also constructed once and for

all in a preprocessing step (which only takes a few minutes). Each query is then

answered by the computation of a shortest path in the auxiliary graph G′, and this

path corresponds to a shortest path in the train graph G.

We implement this general approach as follows. First of all, note that it is

not necessary to reconstruct the path in G that corresponds to the shortest path

computed in G′. In fact, what we really want to have from the computation is a

sequence of trains and the stations where to change train. This data can be attached

to the edges of the auxiliary graph in an even more compact (less redundant) and

thus more efficiently evaluable fashion than to the edges of the train graph.

As mentioned above, we select a set of stations, and the events of these stations

are the selected events. Clearly, there is a trade–off. Roughly speaking, the smaller

the number of selected stations is, the larger the resulting connected components

are and, even worse, the larger the number of selected stations neighboring to a

component. Since the number of edges depends on the latter number in a quadratic

fashion, an improvement of performance due to a rigorous reduction of stations is

soon outweighed by the tremendous increase in the number of edges.

It has turned out that in our setting, a minor refinement of this strategy is

sufficient to overcome this trade–off. For this, let u, v, and w be three selected

events such that edges (u, v), (v, w), and (u,w) exist in the auxiliary graph. If

`(u, v)+`(v, w) ≤ `(u,w), then edge (u,w) is dropped in the auxiliary graph. Again,

the optimal distance is preserved. The number of edges grows only moderately after

this modification, so quite a small set of selected stations becomes feasible.

The selected stations induce 95, 423 events in total, which means that the number

of events is approximately reduced by a factor of 10, and the number of stations

is reduced by a factor of 31. This discrepancy between these two factors is not

surprising, because central stations are typically met by more trains than other

stations.

Proposition 3.2. The strategy “selection of stations” is distance–preserving in

the sense of Definition 1.3.

3.2.3 Goal–directed search.. This strategy is based on one real number b(s) for

each station s, which must be a lower bound on the minimal travel time from

s to the arrival station. Let (v, w) ∈ E be an edge, and let sv and sw be the

stations of v and w, respectively. The values b(sv) and b(sw) must additionally

satisfy b(sv)− b(sw) ≤ `(v, w). The length `(v, w) of (v, w) is then replaced by the

modified length `′(v, w) := `(v, w)− b(sv) + b(sw) ≥ 0.

It can be shown that the shortest paths with respect to the length function `′(·, ·)

are identical with the shortest paths with respect to the original length function

`(·, ·). Hence, we may alternately compute a shortest path with respect to `′(·, ·).

The hope is that the “frontier” of the search horizon approaches the arrival station

faster when the modified edge lengths are used. See [14] for a more systematic

12

·

F. Schulz and D. Wagner and K. Weihe

discussion.

Roughly speaking, the tighter the lower bounds s(·) are, the larger the speed–up

effect will be. The base of our lower bounds is the maximum speed of all trains as

a transformation factor. The lower bound for an arbitrary station with respect to a

given target station is the Euclidean distance of these two stations divided by this

factor. These values are computable “on the fly” during the search.

Proposition 3.3. The strategy “goal–directed search” is distance–preserving in

the sense of Definition 1.3.

3.2.4 Bidirected search.. This strategy “merges” two applications of Dijkstra’s

algorithm into one loop. More specifically, one search starts from the source event

and goes forward as usual, whereas the other search starts from the target event

and traverses every edge in backward direction. The algorithm is finished once an

event was labeled permanently by both searches. A shortest path from the source

event to the target event is then easily constructed from the results of the two

searches up to this moment. We refer to [2; 14] for details.

This strategy is not directly applicable in our scenario, because the target event

is not known until the search is finished. Nonetheless, one can imagine variations of

this strategy, in which the backward search indeed starts from the arrival station,

however, not from the optimal arrival time but from an upper bound on this value.

To evaluate the general potential of this strategy in view of our application sce-

nario, we implemented an idealized variant. More specifically, we indeed use the

exact optimal arrival time as a start point for the backward search. Clearly, this

gives a lower bound on the performance of all of these variations of bidirected

search.10 Since we only consider an idealized strategy here, it does not make sense

to evaluate the CPU times achieved by this strategy. Hence, we will restrict the

evaluation of this strategy to operation counts (see Section 4).

3.2.5 Combination of strategies.. In principle, all of these strategies can be freely

combined. In that, the strategy “selection of stations” can be combined with the

strategy “angle restrictions” in two ways, namely the angle restrictions can be

computed for the auxiliary graph, or they can be computed for the train graph and

simply taken over for the auxiliary graph.

4. ANALYSIS OF THE ALGORITHMIC PERFORMANCE

The experiments were performed on a SUN Sparc Enterprise 5000, and the code

was written in C++ using the GNU compiler.

Tables 1 and 2 present a summarizing comparison of all combinations of strate-

gies. Since every node in the train graph has a degree of three, the total number of

algorithmic steps is asymptotically dominated by the number of operations inside

the priority queue. In other words, these operations are representative operation

counts in the sense of [2], Sect. 18. More specifically, for a heap the number of

exchange operations is representative, whereas for the Dial variant the number

of cyclic increments of the moving array index plus the number of the bucket-

operations “insert” and “remove” is representative. The average total number per

10Except for pathological cases that are imagineable but might hardly occur in reality.

Dijkstra’s Algorithm On–Line

·

13

query of these operations are listed in the last part of Tables 1 and 2. As men-

tioned in Section 2, we regard these average numbers as the most important piece

of information in view of our central yes/no question.

For both implementations of the priority queue, the CPU times imply the same

strong ranking of combinations of strategies. Figs. 9–12 might give a visual impres-

sion why this ranking is so unambiguous. Moreover, the discrepancy between the

heap and the Dial implementation in terms of CPU times also decreases roughly

from row to row.

Our experience with several versions of the code is that the exact CPU times are

strongly sensitive to the details of the implementation, but the general tendency

was maintained and seems to be reliable. In particular, the main question raised

in this paper (whether distance–preserving techniques in the sense of Definition 1.3

are competitive) can be safely answered in the affirmative at least for the restriction

of the problem to the total travel time as the only optimization criterion.

However, a detailed look at the results is more insightful (and necessary to con-

clude predictions for scenarios with different profiles as discussed in Section 2).

Fig. 6 shows that there is a very strong correlation between the number of edges

hit with/without strategy “angle restriction.” On the other hand, Fig. 7 shows a

detailed analysis of one particular, exemplary combination of strategies. A com-

parison of the diagrams in Fig. 7 reveals an interesting effect, which is also found

in the analogous diagrams of the other combinations: the CPU times of both the

heap and the Dial implementation “look” linear in the number of nodes hit by the

search.11

This correlation is so strong and the variance is so small that corresponding

diagrams in Fig. 7 look almost identical. Figure 8 reveals the cause.

In other

words, in both cases the operations on the priority queue come close to constant

time on average, even when the average is taken over each query separately.

Finally, the bidirected–search technique required, on average, 5606 nodes, 9497

edges, an operation count of 7531 for the Dial implementation, and an operation

count of 45826 for the heap implementation. As mentioned in the paragraph on

bidirected search in Section 3, it does not make sense to take CPU times here,

because an idealized algorithm was implemented.

Since one can safely assume that the above results for this idealized algorithm

are a lower bound on the real performance of this technique in our scenario, they

give some evidence that bidirected search would be a reasonable choice (assuming

that tight upper bounds on the exact arrival times can be found efficiently), but

might not be the favorite choice.

In Section 2, we raised the question whether the results allow reliable predictions

for similar scenarios such as a web server, where a larger fraction of all queries

asks for long–distance connections. The small variance encourages us to conclude

(with caution) that our results are scalable in an approximately linear fashion to

11From the work on this study we gained the experience that the difference between Θ(n) and

Θ(n log n) may be very small and hidden by the “noise” of the data. Hence, the claim that one

can observe a linear behavior does not really contradict the claim that the asymptotic complexity

is Θ(n logn). In other words, the above claim that the run time be linear is to be taken with a

pinch of salt.

14

·

F. Schulz and D. Wagner and K. Weihe

Selection of stations

Angle restriction

CPU heap

CPU Dial

no

no

0.310

0.103

no

yes

0.036

0.018

yes

no

0.027

0.012

yes

yes (train)

0.007

0.005

yes

yes (auxiliary)

0.005

0.003

Selection of stations

Angle restriction

Nodes

Edges

no

no

17576

31820

no

yes

4744

10026

yes

no

2114

3684

yes

yes (train)

1140

2737

yes

yes (auxiliary)

993

2378

Selection of stations

Angle restriction

Ops. heap

Ops. Dial

no

no

246255

23866

no

yes

24526

3334

yes

no

26304

3660

yes

yes (train)

4973

1197

yes

yes (auxiliary)

3191

932

Table 1. A summary of all computational results for the individual combinations of the techniques

“selection of stations” and “angle restrictions.” The entries “train” and “auxiliary” in column #2

refer to the graph in which the angles were computed (see the last paragraph in Section 3). The

columns #3–#4 give the average over all queries of the snapshot. More specifically, the first table

gives the average raw CPU times, the second table the average number of nodes and edges hit by

the search, and the third table the average operation counts.

scenarios like that.

As announced in the Introduction, the choice of the data structure for the priority

queue seems to be a detail of minor relevance. The difference between the heap and

the Dial implementation looks significant at first glance, however, this seems to be

one of the aspects which are not stable but highly dependent on implementation

details.

5. CONCLUSION AND OUTLOOK

The outcome of this study suggests that geometric speed–up techniques are a good

basis for the computation of provably optimal connections in railroad traffic infor-

mation systems. The question raised in this paper is answered for the total travel

time: the best combinations of strategies are by far faster than is currently needed

in practice. The success of geometry–based strategies is a bit surprising, because

the underlying data is not purely geometric in nature.

Another surprising outcome of the study is that both the normal heap and

the Dial data structure only require an (amortized) constant time per operation,

whereas the worst–case bound is logarithmic for heaps and even linear for Dials.

Note that no amortization over a set of queries must be applied to obtain a con-

stant time per operation; the variance is small enough that the average run time

per operation within a single query can essentially be regarded as bounded by a

Dijkstra’s Algorithm On–Line

·

15

Selection of stations

Angle restriction

CPU heap

CPU Dial

no

no

0.204

0.072

no

yes

0.022

0.013

yes

no

0.019

0.010

yes

yes (train)

0.006

0.004

yes

yes (auxiliary)

0.005

0.003

Selection of stations

Angle restriction

Nodes

Edges

no

no

11072

20046

no

yes

3415

7120

yes

no

1437

2465

yes

yes (train)

856

2002

yes

yes (auxiliary)

754

1765

Selection of stations

Angle restriction

Ops. heap

Ops. Dial

no

no

151274

15274

no

yes

16990

2430

yes

no

18123

2595

yes

yes (train)

3502

961

yes

yes (auxiliary)

2282

751

Table 2. Like Table 1 except that now the strategy “goal–directed search” is additionally applied

in all cases.

constant. Due to the fact that the variance is negligible, a conventional statistical

analysis would not make any sense.

The choice of algorithms (and of minor details of their implementation) is justified

a posteriori in view of our main question. Implementing and comparing further

speed–up strategies is certainly an interesting topic in its own right. However, the

goal of our research effort is different: to go far beyond the restricted scenario of

Assumption 1.1 and to see which algorithms will stand this ultimate test.

The minimal travel time is certainly an empirical research topic in its own right,

not only because it is the most fundamental objective in practice. However, a

practical algorithm must consider further criteria and restrictions. For example,

the ticket costs and the number of train changes are also important objectives.

Moreover, certain trains do not operate every day (contrasting Assumption 1.1(1)),

and certain kinds of tickets are not valid for all trains, so it should be possible

to exclude train connections in a query. A satisfactory compromise must be found

between the speed of the algorithm and the quality of the result. Thus, the problem

is not purely technical anymore but also involves “business rules,” which are usually

very informal.

In the future, an extensive requirements analysis will be necessary, which means

that the work will be no longer purely “algorithmical” in nature. Such an analysis

must be very detailed because otherwise there is no hope to match the real problem.

Unfortunately, there is high evidence that the general problems addressed in [22]

will become virulent here: a sufficiently simple formal model that captures all

relevant details does not seem to be in our reach, and many details are “volatile”

16

·

F. Schulz and D. Wagner and K. Weihe

in the sense that they may change time and again in unforeseen ways. Future

research will show whether these conflicting criteria can be simultaneously fulfilled

satisfactorily.

REFERENCES

[1] R. Agrawal and H. Jagadish. Algorithms for searching massive graphs. IEEE Transact.

Knowledge and Data Eng., 6:225–238, 1994.

[2] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows. Prentice–Hall, 1993.

[3] S. Albers, A. Crauser, and K. Mehlhorn. Algorithmen für sehr große Datenmengen. MPI

Saarbrücken, 1997. 12

[4] A. Car and A. Frank. Modelling a hierarchy of space applied to large road networks. In Proc.

Int. Worksh. Adv. Research Geogr. Inform. Syst. (IGIS ’94), pages 15–24, 1994.

[5] B.V. Cherkassky, A.V. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and

experimental evaluation. Mathematical Programming, 73:129–174, 1996.

[6] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT Press and

McGraw–Hill, 1994.

[7] J. Hooker. Needed: An empirical science of algorithms. Operations Research, 42:201–212,

1994.

[8] J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33–42, 1995.

[9] K. Ishikawa, M. Ogawa, S. Azume, and T. Ito. Map navigation software of the electro multivi-

sion of the ’91 toyota soarer. In IEEE Int. Conf. Vehicle Navig. Inform. Syst. (VNIS ’91),

pages 463–473, 1991.

[10] H.F. Jackson, P.T. Boggs, S.G. Nash, and S. Powell. Guidelines for reporting results of com-

putational experiments—report of the ad hoc committee. Mathematical Programming,

49:413–425, 1991.

[11] D.S. Johnson. A theoretician’s guide to the experimental analysis of algorithms, 1996.

http://www.research.att.com/~dsj/papers/exper.ps.

[12] S. Jung and S. Pramanik. Hiti graph model of topographical road maps in navigation systems.

In Proc. 12th IEEE Int. Conf. Data Eng., pages 76–84, 1996.

[13] C.Y. Lee, J. Bard, M.L. Pinedo, and W. Wilhelm. Guidelines for reporting computational

results in iie transactions. IEE Transactions, 25:121–123, 1993.

[14] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout. Wiley, 1990.

[15] C. McGeoch. Analyzing algorithms by simulation: Variance reduction techniques and simu-

lation speedups. ACM Computing Surveys, 24(2):195–212, 1992.

[16] C. McGeoch. Toward an experimental method of algorithm simulation. INFORMS Journal

on Computing, 8(1):1–15, 1996.

[17] T. Preuss and J.-H. Syrbe. An integrated traffic information system. In Proc. 6th Int. Conf.

Appl. Computer Networking in Architecture, Construction, Design, Civil Eng., and Ur-

ban Planning (europIA ’97), 1997.

[18] J. Shapiro, J. Waxman, and D. Nir. Level graphs and approximate shortest path algorithms.

Network, 22:691–717, 1992.

[19] S. Shekhar, A. Fetterer, and G. Goyal. Materialization trade–offs in hierarchical shortest

path algorithms. In Proc. Int. Symp. Large Spatial Databases, Springer Lecture Notes

in Computer Science 1262, pages 94–111, 1997.

[20] S. Shekhar, A. Kohli, and M. Coyle. Path computation algorithms for advanced traveler

information system (atis). In Proc. 9th IEEE Int. Conf. Data Eng., pages 31–39, 1993.

[21] K. Weihe. Reuse of algorithms — still a challenge to object–oriented programming. In Proc.

12th ACM Symp. Object–Oriented Programming, Systems, Languages, and Applications

(OOPSLA ’97), pages 34–48, 1997.

12Available from http://www.mpi-sb.mpg.de/units/ag1/extcomp.html. The lecture notes are in

German, however, the chapter on shortest paths (Chapter 9) is in English.

Dijkstra’s Algorithm On–Line

·

17

0

20000

40000

60000

80000

100000

120000

140000

160000

180000

0

5000

10000

15000

20000

25000

30000

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

0

200

400

600

800

1000

1200

0

50000

100000

150000

200000

250000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0

200

400

600

800

1000

1200

0

50000

100000

150000

200000

250000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0

200

400

600

800

1000

1200

Fig. 7. An exemplary sequence of diagrams for one particular combination of strategies: “angle

restriction” is applied, but the other strategies are not. First column: the frequency distribution

histogram of all queries in the snapshot according to (a) the number of nodes met by the search

(granularity: 500 nodes), (b) the CPU time for the heap implementation and (c) for the Dial

implementation (granularity: 10 milliseconds). Second column: the average of (a) the number of

nodes met and (b/c) the CPU times for the heap/Dial variant taken over all queries with roughly

the same resulting total travel times (granularity: 20 minutes). The strong resemblance of the

diagrams in each row nicely demonstrates the linear behavior of both priority–queue implemen-

tations.

[22] K. Weihe, U. Brandes, A. Liebers, M. Müller-Hannemann, D. Wagner, and T. Willhalm.

Empirical design of geometric algorithms. In Proc. 15th ACM Symp. Comp. Geometry

(SCG ’99), pages 86–94, 1999.

18

·

F. Schulz and D. Wagner and K. Weihe

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

Fig. 8. The relation between the number of nodes hit by the search (abscissa) and the cpu time

in seconds for the heap (left) and the Dial (right) implementation. The first row shows one point

for every query, the second row condenses this information to histograms (granularity: 500 nodes).

Dijkstra’s Algorithm On–Line

·

19

Fig. 9. The left picture shows the edges hit by Dijkstra’s algorithm from Berlin Main East Station

until the arrival station Frankfurt/Main Main Station is reached. In the right picture, the strategy

“angle selection” was applied to the same query.

20

·

F. Schulz and D. Wagner and K. Weihe

Fig. 10. The left picture shows the 225 stations selected for the study on strategy “selection of

stations,” and the right picture shows the edges hit by this strategy for the same query as in

Figure 9. In the right picture, the original train graph is shown in the background, whereas the

highlighted edges are the edges of the auxiliary graph hit by the search.

Dijkstra’s Algorithm On–Line

·

21

Fig. 11. Both pictures refer to the same query as in Fig. 9. The strategies “selection of stations”

and “angle restriction” are applied simultaneously with angles computed from the train graph

(left), and with angles computed from the auxiliary graph itself (right). Like in the previous

picture, the original train graph is shown in the background, and the highlighted edges are the

edges of the auxiliary graph hit by the search.

22

·

F. Schulz and D. Wagner and K. Weihe

Fig. 12. Both pictures refer to the same query as in Fig. 9. They show the results of the strategy

“goal–directed search” and “bidirected search,” respectively.

An Empirical Case Study from

Public Railroad Transport1

Frank Schulz

and

Dorothea Wagner

and

Karsten Weihe

University of Konstanz, Department of Computer & Information Science

Traffic information systems are among the most prominent real–world applications of Dijkstra’s

algorithm for shortest paths. We consider the scenario of a central information server in the

realm of public railroad transport on wide–area networks. Such a system has to process a large

number of on–line queries for optimal travel connections in real time. In practice, this problem is

usually solved by heuristic variations of Dijkstra’s algorithm, which do not guarantee an optimal

result. We report results from a pilot study, in which we focused on the travel time as the only

optimization criterion. In this study, various speed–up techniques for Dijkstra’s algorithm were

analysed empirically. This analysis was based on the timetable data of all German trains and on

a “snapshot” of half a million customer queries.2

1. INTRODUCTION

1.1 Problem.

From a theoretical viewpoint, the problem of finding a shortest path from one

node to another one in a graph with edge lengths is satisfactorily solved. In fact,

the Fibonacci–heap implementation of Dijkstra’s algorithm requires O(m+n log n)

time, where n is the number of nodes and m the number of edges [6]. However,

various practical application scenarios impose restrictions that make this sort of

algorithm impractical. For instance, many scenarios impose a strict limitation on

space consumption.3

1 c©ACM, 2000. This is the author’s version of the work. It is posted here by permission of ACM

for your personal use. Not for redistribution. The definitive version was published in Journal of

Experimental Algorithmics, Volume 5(12), 2000, http://www.jea.acm.org

2With special courtesy of the TLC Transport-, Informatik- und Logistik–Consulting GmbH/EVA–

Fahrplanzentrum, a subsidiary of the Deutsche Bahn AG.

3To give a concrete example: if a traffic information system is to be distributed on CD–Rom or to

be run on an embedded system, a naive implementation of Dijkstra’s algorithm would typically

Address: University of Konstanz, Box D188, 78457 Konstanz, Germany

{schulz,wagner}@fmi.uni–konstanz.de, weihek@acm.org

http://www.fmi.uni-konstanz.de/˜{schulz,wagner,weihe}

2

·

F. Schulz and D. Wagner and K. Weihe

In this paper, we consider a different scenario: space consumption is not an issue,

but the system has to answer a potentially infinite number of customer queries for

optimal travel connections on–line. The real–time restrictions are soft, which basi-

cally means that the average response time is more important than the maximum

response time. The concrete scenario we have in mind is a central server for public

railroad transport, which has to process a large number of queries (e.g. a server

that is directly accessible by customers through terminals in the train stations or

through a web interface).

Assumption 1.1. Our computational study is based on two further simplifying

assumptions:

(1) The time tables are identical every day.

(2) The length of a travel connection is exclusively defined as the travel time (to

be specified in greater detail in Definition 1.2 below).

To achieve Assumption 1.1(1), we simply unite the time tables of all days. Due

to this assumption, a customer query need not specify a date.

Definition 1.2. A customer query (or simply query for short) is a triple con-

sisting of two stations, A and B, and an earliest departure time t. Such a query

asks for a connection from A to B that does not start before t and arrives B as

early as possible.

Algorithmic problems of this kind are usually approached heuristically in practice,

which means that an optimal result is not guaranteed.

Definition 1.3. An algorithm for a shortest–path problem is called distance–

preserving if it produces an optimal connection for every input instance.

Distance–preserving algorithms are not in wide use in traffic information sys-

tems, because the average response time of such an algorithm was perceived to be

unacceptable. In a new long–term project, we plan to investigate the question to

what extent distance–preserving variants of Dijkstra’s algorithm have become com-

petitive on contemporary computer technology. Here we give an experience report

from a pilot study, in which we focused on the most fundamental kind of query,

namely the one specified in Definition 1.2 (subject to Assumption 1.1(1)).

This scenario is an example of a general problem in the design of practical algo-

rithms, which we discussed in [22]: computational studies based on artificial (e.g.

random) data do not make much sense, because the characteristics of the real–world

data are crucial for the success or failure of an algorithmic approach in a concrete

use scenario. Hence, experiments on real–world data are the method of choice.

1.2 Related work.

Various textbooks address speed–up techniques for Dijkstra’s algorithm but have

no concrete applications in mind, notably [2] and [14]. In [21], Chapter 4, a brief,

introductory survey of selected techniques is given (with a strong bias towards

the use scenario discussed here). Most work from the scientific side addresses the

single–source variant, where a spanning tree of shortest paths from a designated

exceed the available space.

Dijkstra’s Algorithm On–Line

·

3

root to all other nodes is to be found. Moreover, the main aspect addressed in work

like [5] is the choice of the data structure for the priority queue. In Section 3 below

(paragraph on the “search horizon”), we will see that the scenario considered in

this paper requires algorithmic approaches that are fundamentally different, and

Section 4 will show that the choice of the priority queue is indeed a marginal aspect

in view of the goal of our project.

On the other hand, most application–oriented work in this field is commercial,

not scientific, and there is only a small number of publications. In fact, we are not

aware of any publication especially about algorithms for wide–area railroad traffic

information systems.

Some scientific work has been done on local public transport. For example, [17]

gives some insights into the state of the art. However, local public transport is quite

different from wide–area public transport, because the timetables are very regular,

and the most powerful speed–up techniques are based on the strict periodicity of

the trains, buses, ferries, etc. In contrast, our experience is that the timetables of

the national European train companies are not regular enough to gain a significant

profit from these techniques.4

On the other hand, private transport has been extensively investigated in view of

wide–area networks. Roughly speaking, this means “routing planners” for cars on

city and country maps [9; 1; 4; 20; 12; 18]. This problem is different to ours in that it

is two–dimensional, whereas train timetables induce time as a third dimension: due

to the lack of periodicity, the earliest departure time is significant in our scenario.

In contrast, temporal aspects do not play any role in the work quoted above.5 So

it is not surprising that the research on routing planners has focused on purely

geometric techniques.

Some of the application–oriented work quoted above is based on hierarchical

approaches. This is also one of our strategies (see Section 3.2, strategy “selections of

stations”). For example, in [19] this approach is viewed from a database perspective

without any specific scenario in mind (though with a bias towards road maps).

For completeness, we mention the work on variants of Dijkstra’s algorithm that

are intended to efficiently cope with large data in secondary memory. Chapter 9

of [3] gives an introduction to theoretical and practical aspects. As mentioned above

(“problem” section in the Introduction), the problems caused by the slow access to

secondary memory are beyond the scope of our paper.

Since our main contribution is a computational study, the work on the method-

ology of this sort of study (e.g. [7; 8; 11; 10; 13; 15; 16]) is also related. We will

come back to methodological issues in Section 2.

1.3 Contribution of the paper.

We implemented and evaluated various distance–preserving speed–up techniques for

Dijkstra’s algorithm. The study is based on all train data (winter period 1996/97)

of the Deutsche Bahn AG, the national railroad and train company of Germany.

4This experience is supported by personal communications with people from the industry.

5In principle, temporal aspects would also be relevant for the private transport, for example, the

distinction between “rush hours” and other times of the day. However, this sort of temporal aspect

is quite different from ours and, to our knowledge, not scientifically invesitigated either.

4

·

F. Schulz and D. Wagner and K. Weihe

The processed queries are a “snapshot” of the central Hafas6 server of the Deutsche

Bahn AG, in which all queries of customers of all ticket offices in Germany were

recorded over several hours.

In particular, an input instance consists of a fixed and a variable part:

—Fixed : all input instances include the same set of time tables.

—Variable: the query, that is, departure station, arrival station, and earliest de-

parture time, varies from input instance to input instance.

The result of this snapshot comprises more than half a million queries, which

might suffice for a representative analysis of this particular service (assuming that

the typical query profile of customers does not vary dramatically from day to day).

Due to the above–mentioned insight that the periodicity of the timetables is not

a promising base for algorithmic approaches, the question is particularly interesting

whether geometric techniques like those in routing planners are successful, although

the scenario has geometric and temporal characteristics. We will see that this

question can indeed be answered in the affirmative.

1.4 Overview.

In Section 2 we will discuss the general rationale of our experimental set–up. Then,

in Section 3, we will briefly summarize the algorithmic techniques that we evaluated.

Finally, in Section 4, we will present and discuss the results of the computational

study.

2. EMPIRICAL METHODOLOGY

As mentioned in the Introduction, the question to be answered by our computa-

tional study is very specific:

Are distance–preserving variants of Dijkstra’s algorithm (in the sense

of Definition 1.3) competitive on contemporary computer technology in

the use scenario addressed in this paper?

The notion “competitive” is here used in a very informal sense, which is probably

the sense of interest from an applied viewpoint. Roughly speaking, we regard an

algorithm as competitive if it would probably not become a bottleneck operation in a

typical central server. Such a fuzzy notion does not allow far–reaching conclusions.

We believe that our conclusions (which we will give in detail in Section 4 and

in summary in Section 5) are modest enough to be safely based on our usage of

competitiveness.

The question addressed in our paper is quite different from the questions typically

addressed in methodological work such as [7; 8; 11; 10; 13; 15; 16]. In fact, to

our knowledge, the general focus in the literature is on systematic performance

comparisons of different algorithms without or with only loose relation to concrete

applications. The overwhelming majority of computational studies in theoretical

computer science primarily addresses this kind of question. The method of choice

here is the evaluation of the individual algorithms on artificial (mainly random)

6Hafas is a trademark of the Hacon Ingenieurgesellschaft mbH, Hannover, Germany.

Dijkstra’s Algorithm On–Line

·

5

input data, which is a good base for the systematic variation of parameters, and a

statistical analysis of the computational results.

In contrast, the question we address in this paper is tightly bound to a concrete

application scenario. The profile of the input data is very specific, but a sufficient

specification of this profile in terms of parameters such as the number of nodes/edges

or the average/maximum node degree does not seem to be in our reach.7 Thus,

there is no base for the creation of artificial, yet realistic data. On the other hand,

a snapshot of half a million real customer queries, which is the base of our study,

is certainly sufficient for a profound answer to the above question. Hence, we will

restrict our attention to real–world data.

Note that the question addressed in this paper is binary (yes/no). As mentioned

in the Introduction, the real–time constraints are rather soft. This means that the

overall average response time, taken over the total set of all queries, is the most

important piece of information to decide upon yes or no. However, beyond this

very specific question we also regard an estimation of the scope of the result as

important. For instance, such a large body of response times, if well analysed,

should allow a reliable prediction how the average response time changes if the

average geographic distance of departure and arrival station changes. To give a

concrete example: the percentage of long–distance queries is significantly higher

for the web service than for the ticket office service, from which our snapshot was

taken.

To estimate the effect of variations like this, we will also consider the interdepen-

dence between response times and two parameters, the Euclidean distance between

departure and arrival station and the minimal total travel time computed in re-

sponse to the respective query. In Section 4 we will see that this dependency is

surprisingly strong, which means that we may assume a high level of confidence for

predictions.

As mentioned in the Introduction, we implemented and evaluated various speed–

up techniques for Dijkstra’s algorithm. One notable exception is the bidirected–

search technique. This technique does not really apply in our scenario, because the

target node of the search depends on the arrival time and is thus not known in

advance (see Section 3, paragraph on “bidirected search,” for details). However,

one can imagine a variation of the bidirected–search technique, which does not rely

on the exact arrival time but on an upper bound. To answer the question whether

this class of modified bidirected–search techniques is potentially competitive at all,

we implemented and evaluated an idealized variant, which indeed does rely on the

exact arrival time and thus yields a lower bound on the potential performance of

this class of speed–up techniques.

7Nonetheless, a comparison of the algorithms from Section 3 based on random data and a conven-

tional statistical analysis as advocated by some of the above–mentioned papers may be of interest

in its own right. However, such an analysis does not contribute to our specific question and is

thus beyond the scope of this paper.

6

·

F. Schulz and D. Wagner and K. Weihe

3. ALGORITHMS

3.1 Fundamental Algorithmic Approach

3.1.1 Train graph.. From Assumption 1.1 recall our restriction to time tables

that do not vary from day to day. Due to this restriction, we only need a train

network of 24 hours.

The arrival or departure of a train at a station will be called an event. The

train graph contains one node for every event. Two events v and w are connected

by a directed edge (v, w) if v represents the departure of a train at some station

and w represents the very next arrival of this train at some other station. On the

other hand, two successive events at the same station are connected by an edge (in

positive time direction), which means that every station is represented in the graph

by a cycle through all of its events (the cycle is closed by a turn–around edge at

midnight).

In each case, the length of an edge is the time difference (in minutes) of the

events represented by its endnodes. Obviously, a query then amounts to finding a

shortest path from the earliest event at the departure station not before the earliest

departure time to an arbitrary arrival event at the arrival station.

The data contains 933, 066 events on 6, 961 stations. Consequently, there are

933, 066 · 3/2 = 1, 399, 599 edges in the graph.

3.1.2 Priority queue.. Dijkstra’s algorithm relies on a priority queue, which man-

ages the events on the current “frontier line” of the search. As mentioned above, the

best general worst–case bound, O(m+ n log n), is obtained from Fibonacci heaps,

where n is again the number of events and m the number of edges. We do not use

a Fibonacci heap but a normal heap (also often called a 2–heap), which yields an

O((n +m) log n) bound [6]. Since m ∈ O(n) in train graphs, both bounds reduce

to O(n log n).

As an alternative to heaps, we also implemented the Dial variant as described

in [2]. Basically, this means that the priority queue is realized by an array of buckets

with a cyclically moving array index. The events of the frontier line are distributed

among the buckets, and it is guaranteed that the very next non–empty bucket after

the current array index always contains the candidates to be processed next. The

asymptotic worst–case bound of the Dial implementation of Dijkstra’s algorithm is

O(m+ nL), where L is the largest length of an edge.

3.1.3 Search horizon.. Of course, we deviate from the “textbook version” of Di-

jkstra’s algorithm in that we do not compute the distance of every event from

the source event but terminate the algorithm immediately once the first (and thus

optimal) event at the arrival station is processed. The most fundamental distance–

preserving speed–up technique for our scenario is then a reduction of the search to

a (hopefully) small part of the graph, which contains all relevant events. Figs. 1–

4 demonstrate that such a reduction is crucial. In fact, they reveal that for the

majority of all queries only small fractions of the total area and time horizon are

relevant.

To our knowledge, some commercial implementations remove events and edges

from the graph (more or less heuristically, i.e. losing optimality of distances) before

the search itself takes place. In contrast, we aim at an evaluation of distance–

Dijkstra’s Algorithm On–Line

·

7

0

5000

10000

15000

20000

25000

30000

35000

0

100

200

300

400

500

600

700

800

Fig. 1. The frequency distribution histogram of the queries from the snapshot according to

the Euclidean distance between the departure station and the arrival station (granularity: 15

kilometers).

0

5000

10000

15000

20000

25000

30000

0

100

200

300

400

500

600

700

800

Fig. 2. Like Fig. 1 except that the abscissa now denotes the minimal travel time in minutes

(granularity: 20 minutes).

preserving strategies, so our approach is quite different. First of all, we apply the

amortization technique discussed in [21] to obtain a sublinear expected run time

per query.8 The only obstacle to sublinearity is the initialization of all events with

infinite distance labels in the beginning of the textbook algorithm; in fact, Figs. 1–4

8The term “sublinear expected run time” is here used in a rather informal, experience–oriented

way and not to be understood in a formal probabilistic sense.

8

·

F. Schulz and D. Wagner and K. Weihe

0

5000

10000

15000

20000

25000

0

50

100

150

200

Fig. 3. Like Fig. 1 except that the abscissa now denotes the number of edges in the shortest path

(granularity: 10 edges).

0

50

100

150

200

250

300

350

400

450

500

0

200

400

600

800

1000

1200

Fig. 4. For each minimal travel time (granularity: 20 minutes) the total CPU times of all queries

yielding this value of the travel time are summed up to reveal which range of travel times con-

tributes most of the total CPU time required for all queries.

strongly suggest that on average the main loop of the algorithm only processes a

very small fraction of the graph until the arrival station is seen and the algorithm

terminates.

As described in [21], every event is given an additional time stamp, which stores

the number of the query in which it was reached in the main loop. Whenever

an event is reached, its time stamp is updated accordingly. This update properly

increases the time stamp if and only if this is the first time that the node is seen

Dijkstra’s Algorithm On–Line

·

9

in the current search. This is exactly the situation in which the current distance

label should be infinity. Hence, if the time stamp is increased, the distance label

is regarded as infinite, otherwise the value of the distance label is taken as is.

Consequently, there is no need for an expensive initialization phase, and no event

outside the “search horizon” of the main loop is hit at all.

It might be worth mentioning an equivalent strategy to avoid the initialization

step: An additional data structure, e.g. a linked list, is used to store the nodes

with non-infinite distance labels. Then, all non-infinite distance labels can be reset

to infinity after each query.

3.2 Algorithmic Variations

The good performance of the following additional techniques relies on the general

design decisions introduced in Section 3.1. All of these techniques are independent

of each other in the sense that an arbitrary selection of them may be applied

simultaneously.

3.2.1 Angle restriction.. This technique additionally relies on the coordinates

associated with the individual stations. In a preprocessing step, we apply Dijkstra’s

algorithm to each event to compute shortest paths from this event to all other

stations.9 The results are not stored (this would require too much space) but

only used to compute two values α(v, w) and β(v, w) for each edge (v, w) that

corresponds to a train hop. These 2 ·n/2 values (n again the number of events) are

stored and then used in the on–line system.

More specifically, these values are to be interpreted as angles in the plane. See

Fig. 5. Let (v, w) be an edge from some event v to some other event w, and let

sv and sw be the stations to which events v and w belong, respectively. Then the

values α(v, w) and β(v, w) stored for edge (v, w) span a circle sector with center sv

in the plane, where, say, α(v, w) is the right leg. The values α(v, w) and β(v, w) are

determined such that this circle sector has the following meaning: if the shortest

path from event v to some event u at some station s′ contains (v, w), then s′

has to be in this circle sector. Subject to this condition, the difference of the

angles, (β(v, w)−α(v, w)) mod 2π, is minimum. Clearly, the brute–force application

of Dijkstra’s algorithm in the preprocessing allows one to compute this specific,

narrowest possible circle sector for each edge.

Consequently, edge (v, w) may be ignored by the search if the arrival station is

not in the circle sector of this edge. The restriction of the search to edges whose

circle sectors contain the arrival station is the strategy that we will call “angle

restriction” in the following.

Proposition 3.1. Due to the specific construction of the values α(·, ·) and β(·, ·),

the strategy “angle restriction” is distance–preserving in the sense of Definition 1.3.

3.2.2 Selection of stations.. The basic idea behind this technique is similarly

implemented in various routing planners for the private transport [9; 1; 4; 12].

9This preprocessing takes several hours, which is absolutely acceptable in the practical setting

from which this research originated, namely generating time tables once for every winter/summer

period. Hence, there is no need to optimize the preprocessing.

10

·

F. Schulz and D. Wagner and K. Weihe

( , ) α ( , )

β

s’

u

v w

at

v w

at s

v

1

at s

w

2

Fig. 5. The circle sector spanned by α(v, w) and β(v, w) contains all stations s′ such that there

is an event u at s′ whose shortest path from v starts with (v, w).

0

20000

40000

60000

80000

100000

120000

140000

160000

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

Fig. 6. The relation of the number of edges hit without (abscissa) and with (ordinate) strategy

angle selection (granularity: 500 edges).

In general, a certain small set of nodes (i.e. events) is selected, which means

events in our setting. However, we will only select sets of nodes such that either

every node from a station s is selected, or none of the nodes from station s. More

specifically, in the data available to us, every station is assigned an “importance

number,” which is intended to rank its degree of “centrality” in the railroad network.

For our computational study, we select the 225 stations in the highest categories

(see Fig. 10). These stations induce 95, 423 events in total, which constitute our

selection of events.

Let G = (V,E) denote the (directed) train graph. In particular, V is the set of all

events. Moreover, let Ṽ be the set of selected nodes. The strategy is based on what

we will call replaceable paths in the following. Let v, w ∈ V be two events, which

may or may not belong to Ṽ . A path from v to w in G will be called replaceable, if

Dijkstra’s Algorithm On–Line

·

11

no node of the path (except for the endnodes, v and w, themselves) belongs to Ṽ .

Based on this notion, we construct an auxiliary directed graph G′ = (V,E′) on

V . For v, w ∈ V , there is a directed edge (v, w) ∈ E ′ if, and only if, there is a

replaceable path from v to w in G. The length `(v, w) of an edge (v, w) ∈ E ′ in G′

is then defined as the minimum length of a replaceable path from v to w in G.

The auxiliary graph G′ and these edge weights are also constructed once and for

all in a preprocessing step (which only takes a few minutes). Each query is then

answered by the computation of a shortest path in the auxiliary graph G′, and this

path corresponds to a shortest path in the train graph G.

We implement this general approach as follows. First of all, note that it is

not necessary to reconstruct the path in G that corresponds to the shortest path

computed in G′. In fact, what we really want to have from the computation is a

sequence of trains and the stations where to change train. This data can be attached

to the edges of the auxiliary graph in an even more compact (less redundant) and

thus more efficiently evaluable fashion than to the edges of the train graph.

As mentioned above, we select a set of stations, and the events of these stations

are the selected events. Clearly, there is a trade–off. Roughly speaking, the smaller

the number of selected stations is, the larger the resulting connected components

are and, even worse, the larger the number of selected stations neighboring to a

component. Since the number of edges depends on the latter number in a quadratic

fashion, an improvement of performance due to a rigorous reduction of stations is

soon outweighed by the tremendous increase in the number of edges.

It has turned out that in our setting, a minor refinement of this strategy is

sufficient to overcome this trade–off. For this, let u, v, and w be three selected

events such that edges (u, v), (v, w), and (u,w) exist in the auxiliary graph. If

`(u, v)+`(v, w) ≤ `(u,w), then edge (u,w) is dropped in the auxiliary graph. Again,

the optimal distance is preserved. The number of edges grows only moderately after

this modification, so quite a small set of selected stations becomes feasible.

The selected stations induce 95, 423 events in total, which means that the number

of events is approximately reduced by a factor of 10, and the number of stations

is reduced by a factor of 31. This discrepancy between these two factors is not

surprising, because central stations are typically met by more trains than other

stations.

Proposition 3.2. The strategy “selection of stations” is distance–preserving in

the sense of Definition 1.3.

3.2.3 Goal–directed search.. This strategy is based on one real number b(s) for

each station s, which must be a lower bound on the minimal travel time from

s to the arrival station. Let (v, w) ∈ E be an edge, and let sv and sw be the

stations of v and w, respectively. The values b(sv) and b(sw) must additionally

satisfy b(sv)− b(sw) ≤ `(v, w). The length `(v, w) of (v, w) is then replaced by the

modified length `′(v, w) := `(v, w)− b(sv) + b(sw) ≥ 0.

It can be shown that the shortest paths with respect to the length function `′(·, ·)

are identical with the shortest paths with respect to the original length function

`(·, ·). Hence, we may alternately compute a shortest path with respect to `′(·, ·).

The hope is that the “frontier” of the search horizon approaches the arrival station

faster when the modified edge lengths are used. See [14] for a more systematic

12

·

F. Schulz and D. Wagner and K. Weihe

discussion.

Roughly speaking, the tighter the lower bounds s(·) are, the larger the speed–up

effect will be. The base of our lower bounds is the maximum speed of all trains as

a transformation factor. The lower bound for an arbitrary station with respect to a

given target station is the Euclidean distance of these two stations divided by this

factor. These values are computable “on the fly” during the search.

Proposition 3.3. The strategy “goal–directed search” is distance–preserving in

the sense of Definition 1.3.

3.2.4 Bidirected search.. This strategy “merges” two applications of Dijkstra’s

algorithm into one loop. More specifically, one search starts from the source event

and goes forward as usual, whereas the other search starts from the target event

and traverses every edge in backward direction. The algorithm is finished once an

event was labeled permanently by both searches. A shortest path from the source

event to the target event is then easily constructed from the results of the two

searches up to this moment. We refer to [2; 14] for details.

This strategy is not directly applicable in our scenario, because the target event

is not known until the search is finished. Nonetheless, one can imagine variations of

this strategy, in which the backward search indeed starts from the arrival station,

however, not from the optimal arrival time but from an upper bound on this value.

To evaluate the general potential of this strategy in view of our application sce-

nario, we implemented an idealized variant. More specifically, we indeed use the

exact optimal arrival time as a start point for the backward search. Clearly, this

gives a lower bound on the performance of all of these variations of bidirected

search.10 Since we only consider an idealized strategy here, it does not make sense

to evaluate the CPU times achieved by this strategy. Hence, we will restrict the

evaluation of this strategy to operation counts (see Section 4).

3.2.5 Combination of strategies.. In principle, all of these strategies can be freely

combined. In that, the strategy “selection of stations” can be combined with the

strategy “angle restrictions” in two ways, namely the angle restrictions can be

computed for the auxiliary graph, or they can be computed for the train graph and

simply taken over for the auxiliary graph.

4. ANALYSIS OF THE ALGORITHMIC PERFORMANCE

The experiments were performed on a SUN Sparc Enterprise 5000, and the code

was written in C++ using the GNU compiler.

Tables 1 and 2 present a summarizing comparison of all combinations of strate-

gies. Since every node in the train graph has a degree of three, the total number of

algorithmic steps is asymptotically dominated by the number of operations inside

the priority queue. In other words, these operations are representative operation

counts in the sense of [2], Sect. 18. More specifically, for a heap the number of

exchange operations is representative, whereas for the Dial variant the number

of cyclic increments of the moving array index plus the number of the bucket-

operations “insert” and “remove” is representative. The average total number per

10Except for pathological cases that are imagineable but might hardly occur in reality.

Dijkstra’s Algorithm On–Line

·

13

query of these operations are listed in the last part of Tables 1 and 2. As men-

tioned in Section 2, we regard these average numbers as the most important piece

of information in view of our central yes/no question.

For both implementations of the priority queue, the CPU times imply the same

strong ranking of combinations of strategies. Figs. 9–12 might give a visual impres-

sion why this ranking is so unambiguous. Moreover, the discrepancy between the

heap and the Dial implementation in terms of CPU times also decreases roughly

from row to row.

Our experience with several versions of the code is that the exact CPU times are

strongly sensitive to the details of the implementation, but the general tendency

was maintained and seems to be reliable. In particular, the main question raised

in this paper (whether distance–preserving techniques in the sense of Definition 1.3

are competitive) can be safely answered in the affirmative at least for the restriction

of the problem to the total travel time as the only optimization criterion.

However, a detailed look at the results is more insightful (and necessary to con-

clude predictions for scenarios with different profiles as discussed in Section 2).

Fig. 6 shows that there is a very strong correlation between the number of edges

hit with/without strategy “angle restriction.” On the other hand, Fig. 7 shows a

detailed analysis of one particular, exemplary combination of strategies. A com-

parison of the diagrams in Fig. 7 reveals an interesting effect, which is also found

in the analogous diagrams of the other combinations: the CPU times of both the

heap and the Dial implementation “look” linear in the number of nodes hit by the

search.11

This correlation is so strong and the variance is so small that corresponding

diagrams in Fig. 7 look almost identical. Figure 8 reveals the cause.

In other

words, in both cases the operations on the priority queue come close to constant

time on average, even when the average is taken over each query separately.

Finally, the bidirected–search technique required, on average, 5606 nodes, 9497

edges, an operation count of 7531 for the Dial implementation, and an operation

count of 45826 for the heap implementation. As mentioned in the paragraph on

bidirected search in Section 3, it does not make sense to take CPU times here,

because an idealized algorithm was implemented.

Since one can safely assume that the above results for this idealized algorithm

are a lower bound on the real performance of this technique in our scenario, they

give some evidence that bidirected search would be a reasonable choice (assuming

that tight upper bounds on the exact arrival times can be found efficiently), but

might not be the favorite choice.

In Section 2, we raised the question whether the results allow reliable predictions

for similar scenarios such as a web server, where a larger fraction of all queries

asks for long–distance connections. The small variance encourages us to conclude

(with caution) that our results are scalable in an approximately linear fashion to

11From the work on this study we gained the experience that the difference between Θ(n) and

Θ(n log n) may be very small and hidden by the “noise” of the data. Hence, the claim that one

can observe a linear behavior does not really contradict the claim that the asymptotic complexity

is Θ(n logn). In other words, the above claim that the run time be linear is to be taken with a

pinch of salt.

14

·

F. Schulz and D. Wagner and K. Weihe

Selection of stations

Angle restriction

CPU heap

CPU Dial

no

no

0.310

0.103

no

yes

0.036

0.018

yes

no

0.027

0.012

yes

yes (train)

0.007

0.005

yes

yes (auxiliary)

0.005

0.003

Selection of stations

Angle restriction

Nodes

Edges

no

no

17576

31820

no

yes

4744

10026

yes

no

2114

3684

yes

yes (train)

1140

2737

yes

yes (auxiliary)

993

2378

Selection of stations

Angle restriction

Ops. heap

Ops. Dial

no

no

246255

23866

no

yes

24526

3334

yes

no

26304

3660

yes

yes (train)

4973

1197

yes

yes (auxiliary)

3191

932

Table 1. A summary of all computational results for the individual combinations of the techniques

“selection of stations” and “angle restrictions.” The entries “train” and “auxiliary” in column #2

refer to the graph in which the angles were computed (see the last paragraph in Section 3). The

columns #3–#4 give the average over all queries of the snapshot. More specifically, the first table

gives the average raw CPU times, the second table the average number of nodes and edges hit by

the search, and the third table the average operation counts.

scenarios like that.

As announced in the Introduction, the choice of the data structure for the priority

queue seems to be a detail of minor relevance. The difference between the heap and

the Dial implementation looks significant at first glance, however, this seems to be

one of the aspects which are not stable but highly dependent on implementation

details.

5. CONCLUSION AND OUTLOOK

The outcome of this study suggests that geometric speed–up techniques are a good

basis for the computation of provably optimal connections in railroad traffic infor-

mation systems. The question raised in this paper is answered for the total travel

time: the best combinations of strategies are by far faster than is currently needed

in practice. The success of geometry–based strategies is a bit surprising, because

the underlying data is not purely geometric in nature.

Another surprising outcome of the study is that both the normal heap and

the Dial data structure only require an (amortized) constant time per operation,

whereas the worst–case bound is logarithmic for heaps and even linear for Dials.

Note that no amortization over a set of queries must be applied to obtain a con-

stant time per operation; the variance is small enough that the average run time

per operation within a single query can essentially be regarded as bounded by a

Dijkstra’s Algorithm On–Line

·

15

Selection of stations

Angle restriction

CPU heap

CPU Dial

no

no

0.204

0.072

no

yes

0.022

0.013

yes

no

0.019

0.010

yes

yes (train)

0.006

0.004

yes

yes (auxiliary)

0.005

0.003

Selection of stations

Angle restriction

Nodes

Edges

no

no

11072

20046

no

yes

3415

7120

yes

no

1437

2465

yes

yes (train)

856

2002

yes

yes (auxiliary)

754

1765

Selection of stations

Angle restriction

Ops. heap

Ops. Dial

no

no

151274

15274

no

yes

16990

2430

yes

no

18123

2595

yes

yes (train)

3502

961

yes

yes (auxiliary)

2282

751

Table 2. Like Table 1 except that now the strategy “goal–directed search” is additionally applied

in all cases.

constant. Due to the fact that the variance is negligible, a conventional statistical

analysis would not make any sense.

The choice of algorithms (and of minor details of their implementation) is justified

a posteriori in view of our main question. Implementing and comparing further

speed–up strategies is certainly an interesting topic in its own right. However, the

goal of our research effort is different: to go far beyond the restricted scenario of

Assumption 1.1 and to see which algorithms will stand this ultimate test.

The minimal travel time is certainly an empirical research topic in its own right,

not only because it is the most fundamental objective in practice. However, a

practical algorithm must consider further criteria and restrictions. For example,

the ticket costs and the number of train changes are also important objectives.

Moreover, certain trains do not operate every day (contrasting Assumption 1.1(1)),

and certain kinds of tickets are not valid for all trains, so it should be possible

to exclude train connections in a query. A satisfactory compromise must be found

between the speed of the algorithm and the quality of the result. Thus, the problem

is not purely technical anymore but also involves “business rules,” which are usually

very informal.

In the future, an extensive requirements analysis will be necessary, which means

that the work will be no longer purely “algorithmical” in nature. Such an analysis

must be very detailed because otherwise there is no hope to match the real problem.

Unfortunately, there is high evidence that the general problems addressed in [22]

will become virulent here: a sufficiently simple formal model that captures all

relevant details does not seem to be in our reach, and many details are “volatile”

16

·

F. Schulz and D. Wagner and K. Weihe

in the sense that they may change time and again in unforeseen ways. Future

research will show whether these conflicting criteria can be simultaneously fulfilled

satisfactorily.

REFERENCES

[1] R. Agrawal and H. Jagadish. Algorithms for searching massive graphs. IEEE Transact.

Knowledge and Data Eng., 6:225–238, 1994.

[2] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows. Prentice–Hall, 1993.

[3] S. Albers, A. Crauser, and K. Mehlhorn. Algorithmen für sehr große Datenmengen. MPI

Saarbrücken, 1997. 12

[4] A. Car and A. Frank. Modelling a hierarchy of space applied to large road networks. In Proc.

Int. Worksh. Adv. Research Geogr. Inform. Syst. (IGIS ’94), pages 15–24, 1994.

[5] B.V. Cherkassky, A.V. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and

experimental evaluation. Mathematical Programming, 73:129–174, 1996.

[6] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT Press and

McGraw–Hill, 1994.

[7] J. Hooker. Needed: An empirical science of algorithms. Operations Research, 42:201–212,

1994.

[8] J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33–42, 1995.

[9] K. Ishikawa, M. Ogawa, S. Azume, and T. Ito. Map navigation software of the electro multivi-

sion of the ’91 toyota soarer. In IEEE Int. Conf. Vehicle Navig. Inform. Syst. (VNIS ’91),

pages 463–473, 1991.

[10] H.F. Jackson, P.T. Boggs, S.G. Nash, and S. Powell. Guidelines for reporting results of com-

putational experiments—report of the ad hoc committee. Mathematical Programming,

49:413–425, 1991.

[11] D.S. Johnson. A theoretician’s guide to the experimental analysis of algorithms, 1996.

http://www.research.att.com/~dsj/papers/exper.ps.

[12] S. Jung and S. Pramanik. Hiti graph model of topographical road maps in navigation systems.

In Proc. 12th IEEE Int. Conf. Data Eng., pages 76–84, 1996.

[13] C.Y. Lee, J. Bard, M.L. Pinedo, and W. Wilhelm. Guidelines for reporting computational

results in iie transactions. IEE Transactions, 25:121–123, 1993.

[14] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout. Wiley, 1990.

[15] C. McGeoch. Analyzing algorithms by simulation: Variance reduction techniques and simu-

lation speedups. ACM Computing Surveys, 24(2):195–212, 1992.

[16] C. McGeoch. Toward an experimental method of algorithm simulation. INFORMS Journal

on Computing, 8(1):1–15, 1996.

[17] T. Preuss and J.-H. Syrbe. An integrated traffic information system. In Proc. 6th Int. Conf.

Appl. Computer Networking in Architecture, Construction, Design, Civil Eng., and Ur-

ban Planning (europIA ’97), 1997.

[18] J. Shapiro, J. Waxman, and D. Nir. Level graphs and approximate shortest path algorithms.

Network, 22:691–717, 1992.

[19] S. Shekhar, A. Fetterer, and G. Goyal. Materialization trade–offs in hierarchical shortest

path algorithms. In Proc. Int. Symp. Large Spatial Databases, Springer Lecture Notes

in Computer Science 1262, pages 94–111, 1997.

[20] S. Shekhar, A. Kohli, and M. Coyle. Path computation algorithms for advanced traveler

information system (atis). In Proc. 9th IEEE Int. Conf. Data Eng., pages 31–39, 1993.

[21] K. Weihe. Reuse of algorithms — still a challenge to object–oriented programming. In Proc.

12th ACM Symp. Object–Oriented Programming, Systems, Languages, and Applications

(OOPSLA ’97), pages 34–48, 1997.

12Available from http://www.mpi-sb.mpg.de/units/ag1/extcomp.html. The lecture notes are in

German, however, the chapter on shortest paths (Chapter 9) is in English.

Dijkstra’s Algorithm On–Line

·

17

0

20000

40000

60000

80000

100000

120000

140000

160000

180000

0

5000

10000

15000

20000

25000

30000

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

0

200

400

600

800

1000

1200

0

50000

100000

150000

200000

250000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0

200

400

600

800

1000

1200

0

50000

100000

150000

200000

250000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0

200

400

600

800

1000

1200

Fig. 7. An exemplary sequence of diagrams for one particular combination of strategies: “angle

restriction” is applied, but the other strategies are not. First column: the frequency distribution

histogram of all queries in the snapshot according to (a) the number of nodes met by the search

(granularity: 500 nodes), (b) the CPU time for the heap implementation and (c) for the Dial

implementation (granularity: 10 milliseconds). Second column: the average of (a) the number of

nodes met and (b/c) the CPU times for the heap/Dial variant taken over all queries with roughly

the same resulting total travel times (granularity: 20 minutes). The strong resemblance of the

diagrams in each row nicely demonstrates the linear behavior of both priority–queue implemen-

tations.

[22] K. Weihe, U. Brandes, A. Liebers, M. Müller-Hannemann, D. Wagner, and T. Willhalm.

Empirical design of geometric algorithms. In Proc. 15th ACM Symp. Comp. Geometry

(SCG ’99), pages 86–94, 1999.

18

·

F. Schulz and D. Wagner and K. Weihe

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

Fig. 8. The relation between the number of nodes hit by the search (abscissa) and the cpu time

in seconds for the heap (left) and the Dial (right) implementation. The first row shows one point

for every query, the second row condenses this information to histograms (granularity: 500 nodes).

Dijkstra’s Algorithm On–Line

·

19

Fig. 9. The left picture shows the edges hit by Dijkstra’s algorithm from Berlin Main East Station

until the arrival station Frankfurt/Main Main Station is reached. In the right picture, the strategy

“angle selection” was applied to the same query.

20

·

F. Schulz and D. Wagner and K. Weihe

Fig. 10. The left picture shows the 225 stations selected for the study on strategy “selection of

stations,” and the right picture shows the edges hit by this strategy for the same query as in

Figure 9. In the right picture, the original train graph is shown in the background, whereas the

highlighted edges are the edges of the auxiliary graph hit by the search.

Dijkstra’s Algorithm On–Line

·

21

Fig. 11. Both pictures refer to the same query as in Fig. 9. The strategies “selection of stations”

and “angle restriction” are applied simultaneously with angles computed from the train graph

(left), and with angles computed from the auxiliary graph itself (right). Like in the previous

picture, the original train graph is shown in the background, and the highlighted edges are the

edges of the auxiliary graph hit by the search.

22

·

F. Schulz and D. Wagner and K. Weihe

Fig. 12. Both pictures refer to the same query as in Fig. 9. They show the results of the strategy

“goal–directed search” and “bidirected search,” respectively.