ALGORITHMS
ROBERT SEDGEWICK
BROWN UNIVERSITY
ADDISON-WESLEY PUBLISHING COMPANY
Reading, Massachusetts · Menlo Park, California
London · Amsterdam · Don Mills, Ontario · Sydney
To Adam, Brett, Robbie
and especially Linda
This book is in the
Addison-Wesley Series in Computer Science
Consulting Editor
Michael A. Harrison
Sponsoring Editor
James T. DeWolfe
Library of Congress Cataloging in Publication Data
Sedgewick, Robert, 1946-
Algorithms.
1. Algorithms. I. Title.
QA76.6.S435 1983
ISBN 0-201-06672-6
519.4 82-11672
Reproduced by Addison-Wesley from camera-ready copy supplied by the author.
Reprinted with corrections, August 1984
Copyright © 1983 by Addison-Wesley Publishing Company, Inc.
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without prior written permission
of the publisher. Printed in the United States of America.
ISBN 0-201-06672-6
FGHIJ-HA-8987654
Preface
This book is intended to survey the most important algorithms in use on
computers today and to teach fundamental techniques to the growing number
of people who are interested in becoming serious computer users. It is appropriate
for use as a textbook for a second, third or fourth course in computer
science: after students have acquired some programming skills and familiarity
with computer systems, but before they have specialized courses in advanced
areas of computer science or computer applications. Additionally, the book
may be useful as a reference for those who already have some familiarity with
the material, since it contains a number of computer implementations of useful
algorithms.
The book consists of forty chapters which are grouped into seven major
parts: mathematical algorithms, sorting, searching, string processing, geometric
algorithms, graph algorithms and advanced topics. A major goal in the
development of this book has been to bring together the fundamental methods
from these diverse areas, in order to provide access to the best methods
that we know for solving problems by computer for as many people as possible.
The treatment of sorting, searching and string processing (which may
not be covered in other courses) is somewhat more complete than the treatment
of mathematical algorithms (which may be covered in more depth in
applied mathematics or engineering courses), or geometric and graph algorithms
(which may be covered in more depth in advanced computer science
courses). Some of the chapters involve introductory treatment of advanced
material. It is hoped that the descriptions here can provide students with
some understanding of the basic properties of fundamental algorithms such
as the FFT or the simplex method, while at the same time preparing them
to better appreciate the methods when they learn them in advanced courses.
The orientation of the book is towards algorithms that are likely to be
of practical use. The emphasis is on teaching students the tools of their
trade to the point that they can confidently implement, run and debug useful
algorithms. Full implementations of the methods discussed (in an actual
programming language) are included in the text, along with descriptions of
the operations of these programs on a consistent set of examples. Though not
emphasized, connections to theoretical computer science and the analysis of
algorithms are not ignored. When appropriate, analytic results are discussed
to illustrate why certain algorithms are preferred. When interesting, the
relationship of the practical algorithms being discussed to purely theoretical
results is described. More information on the orientation and coverage of the
material in the book may be found in the Introduction which follows.
One or two previous courses in computer science are recommended for
students to be able to appreciate the material in this book: one course in
programming in a high-level language such as Pascal, and perhaps another
course which teaches fundamental concepts of programming systems. In short,
students should be conversant with a modern programming language and
have a comfortable understanding of the basic features of modern computer
systems. There is some mathematical material which requires knowledge of
calculus, but this is isolated within a few chapters and could be skipped.
There is a great deal of flexibility in the way that the material in the
book can be taught. To a large extent, the individual chapters in the book
can each be read independently of the others. The material can be adapted
for use for various courses by selecting perhaps thirty of the forty chapters.
An elementary course on "data structures and algorithms" might omit some
of the mathematical algorithms and some of the advanced graph algorithms
and other advanced topics, then emphasize the ways in which various data
structures are used in the implementation. An intermediate course on "design
and analysis of algorithms" might omit some of the more practically-oriented
sections, then emphasize the identification and study of the ways in which
good algorithms achieve good asymptotic performance. A course on "software
tools" might omit the mathematical and advanced algorithmic material, then
emphasize means by which the implementations given here can be integrated
for use into large programs or systems. Some supplementary material might be
required for each of these examples to reflect their particular orientation (on
elementary data structures for "data structures and algorithms," on mathematical
analysis for "design and analysis of algorithms," and on software
engineering techniques for "software tools"); in this book, the emphasis is on
the algorithms themselves.
At Brown University, we've used preliminary versions of this book in our
third course in computer science, which is prerequisite to all later courses.
Typically, about one-hundred students take the course, perhaps half of whom
are majors. Our experience has been that the breadth of coverage of material
in this book provides an "introduction to computer science" for our majors
which can be expanded upon in later courses on analysis of algorithms,
systems programming and theoretical computer science, while at the same
time providing all the students with a large set of techniques that they can
immediately put to good use.
The programming language used throughout the book is Pascal. The
advantage of using Pascal is that it is widely available and widely known;
the disadvantage is that it lacks many features needed by sophisticated algorithms.
The programs are easily translatable to other modern programming
languages, since relatively few Pascal constructs are used. Some of the programs
can be simplified by using more advanced language features (some not
available in Pascal), but this is true less often than one might think. A goal of
this book is to present the algorithms in as simple and direct a form as possible.
The programs are not intended to be read by themselves, but as part of the
surrounding text. This style was chosen as an alternative, for example, to
having inline comments. Consistency in style is used whenever possible, so
that programs which are similar look similar. There are 400 exercises, ten
following each chapter, which generally divide into one of two types. Most
of the exercises are intended to test students' understanding of material in
the text, and ask students to work through an example or apply concepts
described in the text. A few of the exercises at the end of each chapter involve
implementing and putting together some of the algorithms, perhaps running
empirical studies to learn their properties.
Acknowledgments
Many people, too numerous to mention here, have provided me with helpful
feedback on earlier drafts of this book. In particular, students and teaching
assistants at Brown have suffered through preliminary versions of the material
in this book over the past three years. Thanks are due to Trina Avery, Tom
Freeman and Janet Incerpi, all of whom carefully read the last two drafts
of the book. Janet provided extensive detailed comments and suggestions
which helped me fix innumerable technical errors and omissions; Tom ran
and checked the programs; and Trina's copy editing helped me make the text
clearer and more nearly correct.
Much of what I've written in this book I've learned from the teaching and
writings of Don Knuth, my thesis advisor at Stanford. Though Don had no
direct influence at all on this work, his presence may be felt in the book, for
it was he who put the study of algorithms on a scientific footing that makes
a work such as this possible.
Special thanks are due to Janet Incerpi who initially converted the book
into TeX format, added the thousands of changes I made after the "last draft,"
guided the files through various systems to produce printed pages and even
wrote the scan conversion routine for TeX that we used to produce draft
manuscripts, among many other things.
The text for the book was typeset at the American Mathematical Society;
the drawings were done with pen-and-ink by Linda Sedgewick; and the final
assembly and printing were done by Addison-Wesley under the guidance of
Jim DeWolf. The help of all the people involved is gratefully acknowledged.
Finally, I am very thankful for the support of Brown University and
INRIA where I did most of the work on the book, and the Institute for Defense
Analyses and the Xerox Palo Alto Research Center, where I did some work
on the book while visiting.
Robert Sedgewick
Marly-le-Roi, France
February, 1985
Contents
Introduction
Algorithms, Outline of Topics
1. Preview
Pascal, Euclid's Algorithm, Recursion, Analysis of Algorithms,
Implementing Algorithms
MATHEMATICAL ALGORITHMS
2. Arithmetic
Polynomials, Matrices, Data Structures
3. Random Numbers
Applications, Linear Congruential Method, Additive
Congruential Method, Testing Randomness, Implementation Notes
4. Polynomials
Evaluation, Interpolation, Multiplication, Divide-and-conquer
Recurrences, Matrix Multiplication
5. Gaussian Elimination
A Simple Example, Outline of the Method, Variations and Extensions
6. Curve Fitting
Polynomial Interpolation, Spline Interpolation, Method of Least Squares
7. Integration
Symbolic Integration, Simple Quadrature Methods, Compound Methods,
Adaptive Quadrature
SORTING
8. Elementary Sorting Methods
Rules of the Game, Selection Sort, Insertion Sort, Shellsort,
Bubble Sort, Distribution Counting, Non-Random Files
9. Quicksort
The Basic Algorithm, Removing Recursion, Small Subfiles,
Median-of-Three Partitioning
10. Radix Sorting
Radix Exchange Sort, Straight Radix Sort, A Linear Sort
11. Priority Queues
Elementary Implementations, Heap Data Structure, Algorithms
on Heaps, Heapsort, Indirect Heaps, Advanced Implementations
12. Selection and Merging
Selection, Merging, Recursion Revisited
13. External Sorting
Sort-Merge, Balanced Multiway Merging, Replacement Selection,
Practical Considerations, Polyphase Merging, An Easier Way
SEARCHING
14. Elementary Searching Methods
Sequential Searching, Sequential List Searching, Binary Search,
Binary Tree Search, Indirect Binary Search Trees
15. Balanced Trees
Top-Down 2-3-4 Trees, Red-Black Trees, Other Algorithms
16. Hashing
Hash Functions, Separate Chaining, Open Addressing, Analytic Results
17. Radix Searching
Digital Search Trees, Radix Search Tries, Multiway Radix Searching,
Patricia
18. External Searching
Indexed Sequential Access, B-Trees, Extendible Hashing, Virtual Memory
STRING PROCESSING
19. String Searching
A Short History, Brute-Force Algorithm, Knuth-Morris-Pratt Algorithm,
Boyer-Moore Algorithm, Rabin-Karp Algorithm, Multiple Searches
20. Pattern Matching
Describing Patterns, Pattern Matching Machines, Representing
the Machine, Simulating the Machine
21. Parsing
Context-Free Grammars, Top-Down Parsing, Bottom-Up Parsing,
Compilers, Compiler-Compilers
22. File Compression
Run-Length Encoding, Variable-Length Encoding
23. Cryptology
Rules of the Game, Simple Methods, Encryption/Decryption
Machines, Public-Key Cryptosystems
GEOMETRIC ALGORITHMS
24. Elementary Geometric Methods
Points, Lines, and Polygons, Line Intersection, Simple
Closed Path, Inclusion in a Polygon, Perspective
25. Finding the Convex Hull
Rules of the Game, Package Wrapping, The Graham Scan,
Hull Selection, Performance Issues
26. Range Searching
Elementary Methods, Grid Method, 2D Trees,
Multidimensional Range Searching
27. Geometric Intersection
Horizontal and Vertical Lines, General Line Intersection
28. Closest Point Problems
Closest Pair, Voronoi Diagrams
GRAPH ALGORITHMS
29. Elementary Graph Algorithms
Glossary, Representation, Depth-First Search, Mazes, Perspective
30. Connectivity
Biconnectivity, Graph Traversal Algorithms, Union-Find Algorithms
31. Weighted Graphs
Minimum Spanning Tree, Shortest Path, Dense Graphs, Geometric Problems
32. Directed Graphs
Depth-First Search, Transitive Closure, Topological Sorting,
Strongly Connected Components
33. Network Flow
The Network Flow Problem, Ford-Fulkerson Method, Network Searching
34. Matching
Bipartite Graphs, Stable Marriage Problem, Advanced Algorithms
ADVANCED TOPICS
35. Algorithm Machines
General Approaches, Perfect Shuffles, Systolic Arrays
36. The Fast Fourier Transform
Evaluate, Multiply, Interpolate, Complex Roots of Unity, Evaluation
at the Roots of Unity, Interpolation at the Roots of Unity, Implementation
37. Dynamic Programming
Knapsack Problem, Matrix Chain Product, Optimal Binary Search Trees,
Shortest Paths, Time and Space Requirements
38. Linear Programming
Linear Programs, Geometric Interpretation, The Simplex Method,
Implementation
39. Exhaustive Search
Exhaustive Search in Graphs, Backtracking, Permutation Generation,
Approximation Algorithms
40. NP-complete Problems
Deterministic and Nondeterministic Polynomial-Time Algorithms,
NP-Completeness, Cook's Theorem, Some NP-Complete Problems
Introduction
The objective of this book is to study a broad variety of important and
useful algorithms: methods for solving problems which are suited for
computer implementation. We'll deal with many different areas of application,
always trying to concentrate on "fundamental" algorithms which are
important to know and interesting to study. Because of the large number of
areas and algorithms to be covered, we won't have room to study many of
the methods in great depth. However, we will try to spend enough time on
each algorithm to understand its essential characteristics and to respect its
subtleties. In short, our goal is to learn a large number of the most important
algorithms used on computers today, well enough to be able to use and
appreciate them.
To learn an algorithm well, one must implement it. Accordingly, the
best strategy for understanding the programs presented in this book is to
implement and test them, experiment with variants, and try them out on
real problems. We will use the Pascal programming language to discuss and
implement most of the algorithms; since, however, we use a relatively small
subset of the language, our programs are easily translatable to most modern
programming languages.
Readers of this book are expected to have at least a year's experience
in programming in high- and low-level languages. Also, they should have
some familiarity with elementary algorithms on simple data structures such
as arrays, stacks, queues, and trees. (We'll review some of this material but
within the context of their use to solve particular problems.) Some elementary
acquaintance with machine organization and computer architecture is also
assumed. A few of the applications areas that we'll deal with will require
knowledge of elementary calculus. We'll also be using some very basic material
involving linear algebra, geometry, and discrete mathematics, but previous
knowledge of these topics is not necessary.
This book is divided into forty chapters which are organized into seven
major parts. The chapters are written so that they can be read independently,
to as great an extent as possible. Generally, the first chapter of each part
gives the basic definitions and the "ground rules" for the chapters in that
part; otherwise specific references make it clear when material from an earlier
chapter is required.
Algorithms
When one writes a computer program, one is generally implementing a method
of solving a problem which has been previously devised. This method is often
independent of the particular computer to be used: it's likely to be equally
appropriate for many computers. In any case, it is the method, not the
computer program itself, which must be studied to learn how the problem
is being attacked. The term algorithm is universally used in computer science
to describe problem-solving methods suitable for implementation as computer
programs. Algorithms are the "stuff" of computer science: they are central
objects of study in many, if not most, areas of the field.
Most algorithms of interest involve complicated methods of organizing
the data involved in the computation. Objects created in this way are called
data structures, and they are also central objects of study in computer science.
Thus algorithms and data structures go hand in hand: in this book we will
take the view that data structures exist as the byproducts or endproducts of
algorithms, and thus need to be studied in order to understand the algorithms.
Simple algorithms can give rise to complicated data structures and, conversely,
complicated algorithms can use simple data structures.
When a very large computer program is to be developed, a great deal
of effort must go into understanding and defining the problem to be solved,
managing its complexity, and decomposing it into smaller subtasks which can
be easily implemented. It is often true that many of the algorithms required
after the decomposition are trivial to implement. However, in most cases
there are a few algorithms the choice of which is critical since most of the
system resources will be spent running those algorithms. In this book, we will
study a variety of fundamental algorithms basic to large programs in many
applications areas.
The sharing of programs in computer systems is becoming more widespread,
so that while it is true that a serious computer user will use a large
fraction of the algorithms in this book, he may need to implement only a
somewhat smaller fraction of them. However, implementing simple versions
of basic algorithms helps us to understand them better and thus use advanced
versions more effectively in the future. Also, mechanisms for sharing software
on many computer systems often make it difficult to tailor standard programs
to perform effectively on specific tasks, so that the opportunity to reimplement
basic algorithms frequently arises.
Computer programs are often overoptimized. It may be worthwhile to
take pains to ensure that an implementation is the most efficient possible only
if an algorithm is to be used for a very large task or is to be used many times.
In most situations, a careful, relatively simple implementation will suffice: the
programmer can have some confidence that it will work, and it is likely to
run only five or ten times slower than the best possible version, which means
that it may run for perhaps an extra fraction of a second. By contrast, the
proper choice of algorithm in the first place can make a difference of a factor
of a hundred or a thousand or more, which translates to minutes, hours, days
or more in running time. In this book, we will concentrate on the simplest
reasonable implementations of the best algorithms.
Often several different algorithms (or implementations) are available to
solve the same problem. The choice of the very best algorithm for a particular
task can be a very complicated process, often involving sophisticated mathematical
analysis. The branch of computer science where such questions are
studied is called analysis of algorithms. Many of the algorithms that we will
study have been shown to have very good performance through analysis, while
others are simply known to work well through experience. We will not dwell
on comparative performance issues: our goal is to learn some reasonable algorithms
for important tasks. But we will try to be aware of roughly how well
these algorithms might be expected to perform.
Outline of Topics
Below are brief descriptions of the major parts of the book, which give some of
the specific topics covered as well as some indication of the general orientation
towards the material described. This set of topics is intended to allow us
to cover as many fundamental algorithms as possible. Some of the areas
covered are "core" computer science areas which we'll study in some depth
to learn basic algorithms of wide applicability. We'll also touch on other
disciplines and advanced fields of study within computer science (such as
numerical analysis, operations research, compiler construction, and the theory
of algorithms): in these cases our treatment will serve as an introduction to
these fields of study through examination of some basic methods.
MATHEMATICAL ALGORITHMS include fundamental methods from
arithmetic and numerical analysis. We study methods for addition and multiplication
of integers, polynomials, and matrices as well as algorithms for
solving a variety of mathematical problems which arise in many contexts:
random number generation, solution of simultaneous equations, data fitting,
and integration. The emphasis is on algorithmic aspects of the methods, not
the mathematical basis. Of course we can't do justice to advanced topics
with this kind of treatment, but the simple methods given here may serve to
introduce the reader to some advanced fields of study.
SORTING methods for rearranging files into order are covered in some
depth, due to their fundamental importance. A variety of methods are developed,
described, and compared. Algorithms for several related problems are
treated, including priority queues, selection, and merging. Some of these
algorithms are used as the basis for other algorithms later in the book.
SEARCHING methods for finding things in files are also of fundamental
importance. We discuss basic and advanced methods for searching using trees
and digital key transformations, including binary search trees, balanced trees,
hashing, digital search trees and tries, and methods appropriate for very large
files. These methods are related to each other and similarities to sorting
methods are discussed.
STRING PROCESSING algorithms include a range of methods for dealing
with (long) sequences of characters. String searching leads to pattern
matching which leads to parsing. File compression techniques and cryptology
are also considered. Again, an introduction to advanced topics is given
through treatment of some elementary problems which are important in their
own right.
GEOMETRIC ALGORITHMS comprise a collection of methods for solving
problems involving points and lines (and other simple geometric objects)
which have only recently come into use. We consider algorithms for finding
the convex hull of a set of points, for finding intersections among geometric
objects, for solving closest point problems, and for multidimensional searching.
Many of these methods nicely complement more elementary sorting and
searching methods.
GRAPH ALGORITHMS are useful for a variety of difficult and important
problems. A general strategy for searching in graphs is developed and
applied to fundamental connectivity problems, including shortest-path, minimal
spanning tree, network flow, and matching. Again, this is merely an
introduction to quite an advanced field of study, but several useful and interesting
algorithms are considered.
ADVANCED TOPICS are discussed for the purpose of relating the material
in the book to several other advanced fields of study. Special-purpose hardware,
dynamic programming, linear programming, exhaustive search, and NPcompleteness
are surveyed from an elementary viewpoint to give the reader
some appreciation for the interesting advanced fields of study that are suggested
by the elementary problems confronted in this book.
The study of algorithms is interesting because it is a new field (almost
all of the algorithms we will study are less than twenty-five years old) with
a rich tradition (a few algorithms have been known for thousands of years).
New discoveries are constantly being made, and few algorithms are completely
understood. In this book we will consider intricate, complicated, and difficult
algorithms as well as elegant, simple, and easy algorithms. Our challenge is
to understand the former and appreciate the latter in the context of many
different potential application areas. In doing so, we will explore a variety of
useful tools and develop a way of "algorithmic thinking" that will serve us
well in computational challenges to come.
1. Preview
To introduce the general approach that we'll be taking to studying
algorithms, we'll examine a classic elementary problem: "Reduce a given
fraction to lowest terms." We want to write 2/3, not 4/6, 200/300, or 178468/267702.
Solving this problem is equivalent to finding the greatest common
divisor (gcd) of the numerator and the denominator: the largest integer which
divides them both. A fraction is reduced to lowest terms by dividing both
numerator and denominator by their greatest common divisor.
Pascal
A concise description of the Pascal language is given in the Wirth and Jensen
Pascal User Manual and Report that serves as the definition for the language.
Our purpose here is not to repeat information from that book but rather to
examine the implementation of a few simple algorithms which illustrate some
of the basic features of the language and the style that we'll be using.
Pascal has a rigorous high-level syntax which allows easy identification of
the main features of the program. The variables (var) and functions (function)
used by the program are declared first, followed by the body of the program.
(Other major program parts not used in the program below, but also declared
before the program body, are constants and types.) Functions have the same
format as the main program except that they return a value, which is set by
assigning something to the function name within the body of the function.
(Functions that return no value are called procedures.)
The built-in function readln reads a line from the input and assigns the
values found to the variables given as arguments; writeln is similar. A standard
built-in predicate, eof, is set to true when there is no more input. (Input and
output within a line are possible with read, write, and eoln.) The declaration
of input and output in the program statement indicates that the program is
using the "standard" input and output streams.
To begin, we'll consider a Pascal program which is essentially a translation
of the definition of the concept of the greatest common divisor into a
programming language.
program example(input, output);
var x, y: integer;
function gcd(u, v: integer): integer;
  var t: integer;
  begin
  if u<v then t:=u else t:=v;
  while (u mod t<>0) or (v mod t<>0) do t:=t-1;
  gcd:=t
  end;
begin
while not eof do
  begin
  readln(x, y);
  writeln(x, y, gcd(abs(x), abs(y)))
  end
end.
The body of the program above is trivial: it reads two numbers from the
input, then writes them and their greatest common divisor on the output.
The gcd function implements a "brute-force" method: start at the smaller of
the two inputs and test every integer (decreasing by one until 1 is reached)
until an integer is found that divides both of the inputs. The built-in function
abs is used to ensure that gcd is called with positive arguments. (The mod
function is used to test whether two numbers divide: u mod v is the remainder
when u is divided by v, so a result of 0 indicates that v divides u.)
Many other similar examples are given in the Pascal User Manual and
Report. The reader is encouraged to scan the manual, implement and test
some simple programs and then read the manual carefully to become reasonably
comfortable with most of the features of Pascal.
Euclid's Algorithm
A much more efficient method for finding the greatest common divisor than
that above was discovered by Euclid over two thousand years ago. Euclid's
method is based on the fact that if u is greater than v then the greatest
common divisor of u and v is the same as the greatest common divisor of v
and u - v. Applying this rule successively, we can continue to subtract off
multiples of v from u until we get a number less than v. But this number is
exactly the same as the remainder left after dividing u by v, which is what
the mod function computes: the greatest common divisor of u and v is the
same as the greatest common divisor of v and u mod v. If u mod v is 0, then v
divides u exactly and is itself their greatest common divisor, so we are done.
This mathematical description explains how to compute the greatest
common divisor of two numbers by computing the greatest common divisor
of two smaller numbers. We can implement this method directly in Pascal
simply by having the gcd function call itself with smaller arguments:
function gcd(u, v: integer): integer;
  begin
  if v=0 then gcd:=u
  else gcd:=gcd(v, u mod v)
  end;
(Note that if u is less than v, then u mod v is just u, and the recursive call
just exchanges u and v so things work as described the next time around.)
If the two inputs are 461952 and 116298, then the following table shows the
values of u and v each time gcd is invoked:
(461952, 116298)
(116298, 113058)
(113058, 3240)
(3240, 2898)
(2898, 342)
(342, 162)
(162, 18)
(18, 0)
It turns out that this algorithm always uses a relatively small number of
steps: we'll discuss that fact in some more detail below.
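A trace like the one above is easy to produce by temporarily instrumenting the recursive function with an output statement. The variant below is a hypothetical sketch of our own (the writeln and its formatting are illustrative choices, not part of the program given earlier); the extra line would be removed once the trace is no longer needed.

function gcd(u, v: integer): integer;
  begin
  writeln('(', u:1, ',', v:1, ')');   { show the arguments of each invocation }
  if v=0 then gcd:=u
  else gcd:=gcd(v, u mod v)
  end;

Called as gcd(461952, 116298), this prints the eight pairs shown in the table above, ending with (18, 0).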
Recursion
A fundamental technique in the design of efficient algorithms is recursion:
solving a problem by solving smaller versions of the same problem, as in the
program above. We'll see this general approach used throughout this book,
and we will encounter recursion many times. It is important, therefore, for us
to take a close look at the features of the above elementary recursive program.
An essential feature is that a recursive program must have a termination
condition. It can't always call itself; there must be some way for it to do
something else. This seems an obvious point when stated, but it's probably
the most common mistake in recursive programming. For similar reasons, one
shouldn't make a recursive call for a larger problem, since that might lead to
a loop in which the program attempts to solve larger and larger problems.
Not all programming environments support a general-purpose recursion
facility because of intrinsic difficulties involved. Furthermore, when recursion
is provided and used, it can be a source of unacceptable inefficiency. For these
reasons, we often consider ways of removing recursion. This is quite easy to
do when there is only one recursive call involved, as in the function above. We
simply replace the recursive call with a goto to the beginning, after inserting
some assignment statements to reset the values of the parameters as directed
by the recursive call. After cleaning up the program left by these mechanical
transformations, we have the following implementation of Euclid's algorithm:
function gcd(u, v: integer): integer;
  var t: integer;
  begin
  while v<>0 do
    begin t:=u mod v; u:=v; v:=t end;
  gcd:=u
  end;
Recursion removal is much more complicated when there is more than
one recursive call. The algorithm produced is sometimes not recognizable, and
indeed is very often useful as a different way of looking at a fundamental algorithm.
Removing recursion almost always gives a more efficient implementation.
We'll see many examples of this later on in the book.
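To make the mechanical transformation described above concrete, the following sketch shows what the function might look like just after the recursive call has been replaced by parameter assignments and a goto, before the cleanup that yields the while-loop version. This intermediate form is our own illustration, not a program from the text; standard Pascal requires the label declaration shown.

function gcd(u, v: integer): integer;
  label 1;
  var t: integer;
  begin
  1: if v=0 then gcd:=u
  else
    begin
    t:=u mod v;      { the recursive call gcd(v, u mod v) becomes... }
    u:=v; v:=t;      { ...resetting the parameters and starting over }
    goto 1
    end
  end;

Cleaning up this version (replacing the label and goto by a while loop) gives exactly the implementation above.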
Analysis of Algorithms
In this short chapter we've already seen three different algorithms for the same
problem; for most problems there are many different available algorithms.
How is one to choose the best implementation from all those available?
This is actually a well developed area of study in computer science.
Frequently, we'll have occasion to call on research results describing the performance
of fundamental algorithms. However, comparing algorithms can be
challenging indeed, and certain general guidelines will be useful.
Usually the problems that we solve have a natural "size" (usually the
amount of data to be processed; in the above example the magnitude of
the numbers) which we'll normally call N. We would like to know the
resources used (most often the amount of time taken) as a function of N.
We're interested in the average case, the amount of time a program might be
expected to take on "typical" input data, and in the worst case, the amount
of time a program would take on the worst possible input configuration.
Many of the algorithms in this book are very well understood, to the point
that accurate mathematical formulas are known for the average- and worst-case
running time. Such formulas are developed first by carefully studying
the program, to find the running time in terms of fundamental mathematical
quantities and then doing a mathematical analysis of the quantities involved.
For some algorithms, it is easy to figure out the running time. For example,
the brute-force algorithm above obviously requires min(u, v) - gcd(u, v)
iterations of the while loop, and this quantity dominates the running time if
the inputs are not small, since all the other statements are executed either
0 or 1 times. For other algorithms, a substantial amount of analysis is involved.
For example, the running time of the recursive Euclidean algorithm
obviously depends on the "overhead" required for each recursive call (which
can be determined only through detailed knowledge of the programming environment
being used) as well as the number of such calls made (which can
be determined only through extremely sophisticated mathematical analysis).
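Short of carrying out such an analysis, one can always measure. The sketch below is a hypothetical experiment of our own (the program name and the counter variable steps are illustrative, not from the text): it counts how many times gcd is invoked for each pair of inputs, giving empirical data that can be compared against analytic results such as the one quoted later in this section.

program countcalls(input, output);
var x, y, steps: integer;

function gcd(u, v: integer): integer;
  begin
  steps:=steps+1;               { count one invocation per call }
  if v=0 then gcd:=u
  else gcd:=gcd(v, u mod v)
  end;

begin
while not eof do
  begin
  readln(x, y);
  steps:=0;
  writeln(x, y, gcd(abs(x), abs(y)));
  writeln('invocations: ', steps:1)
  end
end.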
Several important factors go into this analysis which are somewhat outside
a given programmer's domain of influence. First, Pascal programs are
translated into machine code for a given computer, and it can be a challenging
task to figure out exactly how long even one Pascal statement might take to
execute (especially in an environment where resources are being shared, so
that even the same program could have varying performance characteristics).
Second, many programs are extremely sensitive to their input data, and performance
might fluctuate wildly depending on the input. The average case
might be a mathematical fiction that is not representative of the actual data
on which the program is being used, and the worst case might be a bizarre
construction that would never occur in practice. Third, many programs of
interest are not well understood, and specific mathematical results may not
be available. Finally, it is often the case that programs are not comparable at
all: one runs much more efficiently on one particular kind of input, the other
runs efficiently under other circumstances.
With these caveats in mind, we'll use rough estimates for the running
time of our programs for purposes of classification, secure in the knowledge
that a fuller analysis can be done for important programs when necessary.
Such rough estimates are quite often easy to obtain via the old programming
saw "90% of the time is spent in 10% of the code." (This has been quoted in
the past for many different values of "go%.")
The first step in getting a rough estimate of the running time of a program
is to identify the inner loop. Which instructions in the program are executed
most often? Generally, it is only a few instructions, nested deep within the
control structure of a program, that absorb all of the machine cycles. It is
always worthwhile for the programmer to be aware of the inner loop, just to
be sure that unnecessary expensive instructions are not put there.
Second, some analysis is necessary to estimate how many times the inner
loop is iterated. It would be beyond the scope of this book to describe the
mathematical mechanisms which are used in such analyses, but fortunately
the running times of many programs fall into one of a few distinct classes. When
possible, we'll give a rough description of the analysis of the programs, but it
will often be necessary merely to refer to the literature. (Specific references
are given at the end of each major section of the book.) For example, the
results of a sophisticated mathematical argument show that the number of
recursive steps in Euclid's algorithm when u is chosen at random less than v is
approximately ((12 ln 2)/π²) ln v. Often, the results of a mathematical analysis
are not exact, but approximate in a precise technical sense: the result might
be an expression consisting of a sequence of decreasing terms. Just as we are
most concerned with the inner loop of a program, we are most concerned with
the leading term (the largest term) of a mathematical expression.
As mentioned above, most algorithms have a primary parameter N,
usually the number of data items to be processed, which affects the running
time most significantly. The parameter N might be the degree of a polynomial,
the size of a file to be sorted or searched, the number of nodes in a
graph, etc. Virtually all of the algorithms in this book have running time
proportional to one of the following functions:
1 Most instructions of most programs are executed once or at most
only a few times. If all the instructions of a program have this
property, we say that its running time is constant. This is obviously
the situation to strive for in algorithm design.
log N When the running time of a program is logarithmic, the program
gets slightly slower as N grows. This running time commonly occurs
in programs which solve a big problem by transforming it into a
smaller problem by cutting the size by some constant fraction. For
our range of interest, the running time can be considered to be less
than a "large" constant. The base of the logarithm changes the
constant, but not by much: when N is a thousand, log N is 3 if the
base is 10, 10 if the base is 2; when N is a million, log N is twice
as great. Whenever N doubles, log N increases by a constant, but
log N doesn't double until N increases to N².
N When the running time of a program is linear, it generally is the case
that a small amount of processing is done on each input element.
When N is a million, then so is the running time. Whenever N
doubles, then so does the running time. This is the optimal situation
for an algorithm that must process N inputs (or produce N outputs).
N log N This running time arises in algorithms which solve a problem by
breaking it up into smaller subproblems, solving them independently,
and then combining the solutions. For lack of a better adjective
(linearithmic?), we'll say that the running time of such an algorithm
is "N log N." When N is a million, N log N is perhaps twenty
million. When N doubles, the running time more than doubles (but
not much more).
N² When the running time of an algorithm is quadratic, it is practical
for use only on relatively small problems. Quadratic running times
typically arise in algorithms which process all pairs of data items
(perhaps in a double nested loop). When N is a thousand, the
running time is a million. Whenever N doubles, the running time
increases fourfold.
N³ Similarly, an algorithm which processes triples of data items (perhaps
in a triple-nested loop) has a cubic running time and is practical for
use only on small problems. When N is a hundred, the running
time is a million. Whenever N doubles, the running time increases
eightfold.
2^N Few algorithms with exponential running time are likely to be appropriate
for practical use, though such algorithms arise naturally as
"brute-force" solutions to problems. When N is twenty, the running
time is a million. Whenever N doubles, the running time squares!
The running time of a particular program is likely to be some constant
times one of these terms (the "leading term") plus some smaller terms. The
values of the constant coefficient and the terms included depend on the results
of the analysis and on implementation details. Roughly, the coefficient of the
leading term has to do with the number of instructions in the inner loop:
at any level of algorithm design it's prudent to limit the number of such
instructions. For large N the effect of the leading term dominates; for small
N or for carefully engineered algorithms, more terms may contribute and
comparisons of algorithms are more difficult. In most cases, we'll simply refer
to the running time of programs as "linear," "N log N," "cubic," etc., with
the implicit understanding that more detailed analysis or empirical studies
must be done in cases where efficiency is very important.
A few other functions do arise. For example, an algorithm with N²
inputs that has a running time that is cubic in N is more properly classed
as an N^(3/2) algorithm. Also some algorithms have two stages of subproblem
decomposition, which leads to a running time proportional to N(log N)². Both
of these functions should be considered to be much closer to N log N than to
N² for large N.
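To get a feeling for these growth rates, it can help to tabulate a few of the functions for a range of N. The short program below is a sketch of our own (the output format and the choice of functions are arbitrary); it uses Pascal's built-in ln function, computing lg N as ln(N)/ln(2), and works with real values to avoid integer overflow for large N.

program growth(output);
var i: integer;
    N, lgN: real;
begin
N:=1.0;
for i:=1 to 6 do
  begin
  N:=N*10.0;                      { N = 10, 100, ..., 1000000 }
  lgN:=ln(N)/ln(2.0);             { binary logarithm of N }
  writeln(N:10:0, lgN:10:1, N*lgN:14:0, N*N:16:0)
  end
end.

For N equal to a million, the columns show roughly 20 for lg N, about twenty million for N lg N, and a million million for N², in line with the rough characterizations above.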
One further note on the "log" function. As mentioned above, the base
of the logarithm changes things only by a constant factor. Since we usually
deal with analytic results only to within a constant factor, it doesn't matter
much what the base is, so we refer to "logN," etc. On the other hand,
it is sometimes the case that concepts can be explained more clearly when
some specific base is used. In mathematics, the natural logarithm (base e =
2.718281828...) arises so frequently that a special abbreviation is commonly
used: logₑ N = ln N. In computer science, the binary logarithm (base 2) arises
so frequently that the abbreviation log₂ N = lg N is commonly used. For
example, lg N rounded up to the nearest integer is the number of bits required
to represent N in binary.
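As a small illustration of this last point (a sketch of our own, not a program from the text), the number of bits in the binary representation of a positive integer can be computed by repeated halving:

function bits(N: integer): integer;
  var b: integer;
  begin
  b:=0;
  while N>0 do
    begin b:=b+1; N:=N div 2 end;   { one bit for each halving }
  bits:=b
  end;

For example, bits(1000) is 10, which is lg 1000 (about 9.97) rounded up; a closely related quantity is the subject of exercises 7 and 8 at the end of this chapter.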
Implementing Algorithms
The algorithms that we will discuss in this book are quite well understood,
but for the most part we'll avoid excessively detailed comparisons. Our goal
will be to try to identify those algorithms which are likely to perform best for
a given type of input in a given application.
The most common mistake made in the selection of an algorithm is to
ignore performance characteristics. Faster algorithms are often more complicated,
and implementors are often willing to accept a slower algorithm to
avoid having to deal with added complexity. But it is often the case that
a faster algorithm is really not much more complicated, and dealing with
slight added complexity is a small price to pay to avoid dealing with a slow
algorithm. Users of a surprising number of computer systems lose substantial
time waiting for simple quadratic algorithms to finish when only slightly more
complicated N log N algorithms are available which could run in a fraction of
the time.
The second most common mistake made in the selection of an algorithm
is to pay too much attention to performance characteristics. An N log N
algorithm might be only slightly more complicated than a quadratic algorithm
for the same problem, but a better N log N algorithm might give rise to a
substantial increase in complexity (and might actually be faster only for very
large values of N). Also, many programs are really run only a few times:
the time required to implement and debug an optimized algorithm might be
substantially more than the time required simply to run a slightly slower one.
The programs in this book use only basic features of Pascal, rather than
taking advantage of more advanced capabilities that are available in Pascal
and other programming environments. Our purpose is to study algorithms,
not systems programming nor advanced features of programming languages.
It is hoped that the essential features of the algorithms are best exposed
through simple direct implementations in a near-universal language. For the
same reason, the programming style is somewhat terse, using short variable
names and few comments, so that the control structures stand out. The
"documentation" of the algorithms is the accompanying text. It is expected
that readers who use these programs in actual applications will flesh them out
somewhat in adapting them for a particular use.
Exercises
1. Solve our initial problem by writing a Pascal program to reduce a given
fraction x/y to lowest terms.
2. Check what values your Pascal system computes for u mod v when u and
v are not necessarily positive. Which versions of the gcd work properly
when one or both of the arguments are 0?
3. Would our original gcd program ever be faster than the nonrecursive
version of Euclid's algorithm?
4. Give the values of u and v each time the recursive gcd is invoked after
the initial call gcd(12345, 56789).
5. Exactly how many Pascal statements are executed in each of the three
gcd implementations for the call in the previous exercise?
6. Would it be more efficient to test for u>v in the recursive implementation
of Euclid's algorithm?
7. Write a recursive program to compute the largest integer less than log₂ N
based on the fact that the value of this function for N div 2 is one greater
than for N if N > 1.
8. Write an iterative program for the problem in the previous exercise. Also,
write a program that does the computation using Pascal library subroutines.
If possible on your computer system, compare the performance
of these three programs.
9. Write a program to compute the greatest common divisor of three integers
u, v, and w.
10. For what values of N is 10N lg N > 2N²? (Thus a quadratic algorithm
is not necessarily slower than an N log N one.)
SOURCES for background material
A reader interested in learning more about Pascal will find a large number
of introductory textbooks available, for example, the ones by Clancy and
Cooper or Holt and Hume. Someone with experience programming in other
languages can learn Pascal effectively directly from the manual by Wirth and
Jensen. Of course, the most important thing to do to learn about the language
is to implement and debug as many programs as possible.
Many introductory Pascal textbooks contain some material on data structures.
Though it doesn't use Pascal, an important reference for further information
on basic data structures is volume one of D.E. Knuth's series on The
Art of Computer Programming. Not only does this book provide encyclopedic
coverage, but also it and later books in the series are primary references for
much of the material that we'll be covering in this book. For example, anyone
interested in learning more about Euclid's algorithm will find about fifty pages
devoted to it in Knuth's volume two.
Another reason to study Knuth's volume one is that it covers in detail
the mathematical techniques needed for the analysis of algorithms. A reader
with little mathematical background should be warned that a substantial
amount of discrete mathematics is required to properly analyze many algorithms;
a mathematically inclined reader will find much of this material ably
summarized in Knuth's first book and applied to many of the methods we'll
be studying in later books.
M. Clancy and D. Cooper, Oh! Pascal, W. W. Norton & Company, New York,
1982.
R. Holt and J. P. Hume, Programming Standard Pascal, Reston (Prentice-Hall),
Reston, Virginia, 1980.
D. E. Knuth, The Art of Computer Programming. Volume 1: Fundamental
Algorithms, Addison-Wesley, Reading, MA, 1968.
D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical
Algorithms, Addison-Wesley, Reading, MA, Second edition, 1981.
K. Jensen and N. Wirth, Pascal User Manual and Report, Springer-Verlag,
New York, 1974.
MATHEMATICAL ALGORITHMS
2. Arithmetic
Algorithms for doing elementary arithmetic operations such as addition,
multiplication, and division have a very long history, dating back to
the origins of algorithm studies in the work of the Arabic mathematician
al-Khowârizmî, with roots going even further back to the Greeks and the
Babylonians.
Though the situation is beginning to change, the raison d'être of many
computer systems is their capability for doing fast, accurate numerical calculations.
Computers have built-in capabilities to perform arithmetic on integers
and floating-point representations of real numbers; for example, Pascal
allows numbers to be of type integer or real, with all of the normal arithmetic
operations defined on both types. Algorithms come into play when the operations
must be performed on more complicated mathematical objects, such as
polynomials or matrices.
In this section, we'll look at Pascal implementations of some simple
algorithms for addition and multiplication of polynomials and matrices. The
algorithms themselves are well-known and straightforward; we'll be examining
sophisticated algorithms for these problems in Chapter 4. Our main purpose
in this section is to get used to treating these mathematical objects as objects
for manipulation by Pascal programs. This translation from abstract data to
something which can be processed by a computer is fundamental in algorithm
design. We'll see many examples throughout this book in which a proper
representation can lead to an efficient algorithm and vice versa. In this
chapter, we'll use two fundamental ways of structuring data, the array and
the linked list. These data structures are used by many of the algorithms in
this book; in later sections we'll study some more advanced data structures.
Polynomials
Suppose that we wish to write a program that adds two polynomials: we would
like it to perform calculations like
    (1 + 2x - 3x^3) + (2 - x) = 3 + x - 3x^3.
In general, suppose we wish our program to be able to compute r(x) = p(x) +
q(x), where p and q are polynomials with N coefficients. The following
program is a straightforward implementation of polynomial addition:
program polyadd(input, output);
const maxN=100;
var p, q, r: array [0..maxN] of real;
    N, i: integer;
begin
readln(N);
for i:=0 to N-1 do read(p[i]);
for i:=0 to N-1 do read(q[i]);
for i:=0 to N-1 do r[i]:=p[i]+q[i];
for i:=0 to N-1 do write(r[i]);
writeln
end.
In this program, the polynomial p(x) = p_0 + p_1 x + ... + p_{N-1} x^{N-1} is
represented by the array p[0..N-1] with p[j] = p_j, etc. A polynomial of degree
N-l is defined by N coefficients. The input is assumed to be N, followed by
the p coefficients, followed by the q coefficients. In Pascal, we must decide
ahead of time how large N might get; this program will handle polynomials
up to degree 100. Obviously, maxN should be set to the maximum degree
anticipated. This is inconvenient if the program is to be used at different
times for various sizes from a wide range: many programming environments
allow "dynamic arrays" which, in this case, could be set to the size N. We'll
see another technique for handling this situation below.
The program above shows that addition is quite trivial once this representation
for polynomials has been chosen; other operations are also easily
coded. For example, to multiply we can replace the third for loop by
for i:=0 to 2*(N-1) do r[i]:=0;
for i:=0 to N-1 do
  for j:=0 to N-1 do
    r[i+j]:=r[i+j]+p[i]*q[j];

Also, the declaration of r has to be suitably changed to accommodate twice as
many coefficients for the product. Each of the N coefficients of p is multiplied
by each of the N coefficients of q, so this is clearly a quadratic algorithm.
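To use this multiplication loop, the declarations would have to be adjusted along the following lines (a minimal sketch; the names are the same as in the program above, with an extra index j):

var p, q: array [0..maxN] of real;
    r: array [0..2*maxN] of real;    { room for the 2N-1 coefficients of the product }
    N, i, j: integer;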
An advantage of representing a polynomial by an array containing its
coefficients is that it's easy to reference any coefficient directly; a disadvantage
is that space may have to be saved for more numbers than necessary. For
example, the program above couldn't reasonably be used to multiply
    (1 + x^10000)(1 + 2x^10000) = 1 + 3x^10000 + 2x^20000,

even though the input involves only four coefficients and the output only three.
An alternate way to represent a polynomial is to use a linked list. This
involves storing items in noncontiguous memory locations, with each item
containing the address of the next. The Pascal mechanisms for linked lists are
somewhat more complicated than for arrays. For example, the following program
computes the sum of two polynomials using a linked list representation
(the bodies of the readlist and add functions and the writelist procedure are
given in the text following):
program polyadd(input, output);
type link = ^node;
     node = record c: real; next: link end;
var N: integer; z: link;
function readlist(N: integer): link;
procedure writelist(r: link);
function add(p, q: link): link;
begin
readln(N); new(z);
writelist(add(readlist(N), readlist(N)))
end.
The polynomials are represented by linked lists which are built by the
readlist procedure. The format of these is described in the type statement:
the lists are made up of nodes, each node containing a coefficient and a link
to the next node on the list. If we have a link to the first node on a list, then
we can examine the coefficients in order, by following links. The last node
on each list contains a link to a special (dummy) node called z: if we reach z
when scanning through a list, we know we're at the end. (It is possible to get
by without such dummy nodes, but they do make certain manipulations on
the lists somewhat simpler.) The type statement only describes the formats
of the nodes; nodes can be created only when the builtin procedure new is
called. For example, the call new(z) creates a new node, putting a pointer to
it in z. (The other nodes on the lists processed by this program are created
in the readlist and add routines.)
The procedure to write out what's on a list is the simplest. It simply
steps through the list, writing out the value of the coefficient in each node
encountered, until z is found:
procedure writelist(r: link);
begin
while r<>z do
  begin write(r^.c); r:=r^.next end;
writeln
end;
The output of this program will be indistinguishable from that of the
program above which uses the simple array representation.
Building a list involves first calling new to create a node, then filling in
the coefficient, and then linking the node to the end of the partial list built so
far. The following function reads in N coefficients, assuming the same format
as before, and constructs the linked list which represents the corresponding
polynomial:
function readlist(N: integer): link;
var i: integer; t: link;
begin
t:=z;
for i:=0 to N-1 do
  begin new(t^.next); t:=t^.next; read(t^.c) end;
t^.next:=z; readlist:=z^.next; z^.next:=z
end;
The dummy node z is used here to hold the link which points to the first node
on the list while the list is being constructed. After this list is built, z is set
to link to itself. This ensures that once we reach the end of a list, we stay
there. Another convention, which is sometimes convenient, would be to leave z
pointing to the beginning, to provide a way to get from the back to the front.
Finally, the program which adds two polynomials constructs a new list
in a manner similar to readlist, calculating the coefficients for the result
by stepping through the argument lists and adding together corresponding
coefficients:
function add(p, q: link): link;
var t: link;
begin
t:=z;
repeat
  new(t^.next); t:=t^.next;
  t^.c:=p^.c+q^.c;
  p:=p^.next; q:=q^.next
until (p=z) and (q=z);
t^.next:=z; add:=z^.next
end;
Employing linked lists in this way, we use only as many nodes as are
required by our program. As N gets larger, we simply make more calls on new.
By itself, this might not be reason enough to use linked lists for this program,
because it does seem quite clumsy compared to the array implementation
above. For example, it uses twice as much space, since a link must be stored
along with each coefficient. However, as suggested by the example above, we
can take advantage of the possibility that many of the coefficients may be zero.
We can have list nodes represent only the nonzero terms of the polynomial by
also including the degree of the term represented within the list node, so that
each list node contains values of c and j to represent cx^j. It is then convenient
to separate out the function of creating a node and adding it to a list, as
follows:
type link = ^node;
     node = record c: real; j: integer; next: link end;

function listadd(t: link; c: real; j: integer): link;
begin
new(t^.next); t:=t^.next;
t^.c:=c; t^.j:=j;
listadd:=t
end;
The listadd function creates a new node, gives it the specified fields, and links
it into a list after node t. Now the readlist routine can be changed either to
accept the same input format as above (and create list nodes only for nonzero
coefficients) or to input the coefficient and exponent directly for terms with
nonzero coefficient. Of course, the writelist function also has to be changed
suitably. To make it possible to process the polynomials in an organized
way, the list nodes might be kept in increasing order of degree of the term
represented.
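As one possible illustration of the changes involved, a writelist for this representation might simply print each nonzero term as a coefficient-exponent pair; this is only a sketch, assuming the type declaration just given and the same dummy node z as before:

procedure writelist(r: link);
begin
while r<>z do
  begin write(r^.c, ' ', r^.j); r:=r^.next end;   { one coefficient-exponent pair per term }
writeln
end;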
Now the add function becomes more interesting, since it has to perform
an addition only for terms whose degrees match, and then make sure that no
term with coefficient 0 is output:
function add(p, q: link): link;
var t: link;
begin
t:=z; z^.j:=N+1;
repeat
  if (p^.j=q^.j) and (p^.c+q^.c<>0.0) then
    begin
    t:=listadd(t, p^.c+q^.c, p^.j);
    p:=p^.next; q:=q^.next
    end
  else if p^.j<q^.j then
    begin t:=listadd(t, p^.c, p^.j); p:=p^.next end
  else if q^.j<p^.j then
    begin t:=listadd(t, q^.c, q^.j); q:=q^.next end
until (p=z) and (q=z);
t^.next:=z; add:=z^.next
end;
These complications are worthwhile for processing "sparse" polynomials
with many zero coefficients, but the array representation is better if there are
only a few terms with zero coefficients. Similar savings are available for other
operations on polynomials, for example multiplication.
Matrices
We can proceed in a similar manner to implement basic operations on two-dimensional
matrices, though the programs become more complicated. Suppose
that we want to compute the sum of two N-by-N matrices. This is term-by-term
addition, just as for polynomials, so the addition program
is a straightforward extension of our program for polynomials:
program matrixadd(input, output);
const maxN=10;
var p, q, r: array [0..maxN, 0..maxN] of real;
    N, i, j: integer;
begin
readln(N);
for i:=0 to N-1 do for j:=0 to N-1 do read(p[i, j]);
for i:=0 to N-1 do for j:=0 to N-1 do read(q[i, j]);
for i:=0 to N-1 do for j:=0 to N-1 do r[i, j]:=p[i, j]+q[i, j];
for i:=0 to N-1 do for j:=0 to N do
  if j=N then writeln else write(r[i, j]);
end.
Matrix multiplication is a more complicated operation. Element r[i, j] of
the product is the dot product of the ith row of p with the jth column
of q: the sum of the N term-by-term multiplications
p[i, 0]*q[0, j]+p[i, 1]*q[1, j]+...+p[i, N-1]*q[N-1, j], as in the following
program:

for i:=0 to N-1 do
  for j:=0 to N-1 do
    begin
    t:=0.0;
    for k:=0 to N-1 do t:=t+p[i, k]*q[k, j];
    r[i, j]:=t
    end;
Each of the N^2 elements in the result matrix is computed with N multiplications,
so about N^3 operations are required to multiply two N-by-N
matrices together. (As noted in the previous chapter, this is not really a cubic
algorithm, since the number of data items in this case is about N^2, not N.)
As with polynomials, sparse matrices (those with many zero elements) can
be processed in a much more efficient manner using a linked list representation.
To keep the two-dimensional structure intact, each nonzero matrix element
is represented by a list node containing a value and two links: one pointing
to the next nonzero element in the same row and the other pointing to the
next nonzero element in the same column. Implementing addition for sparse
matrices represented in this way is similar to our implementation for sparse
polynomials, but is complicated by the fact that each node appears on two
lists.
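A node type along these lines might be declared as in the following sketch (the field names are illustrative only):

type mlink = ^mnode;
     mnode = record
       row, col: integer;              { position of this nonzero element }
       val: real;                      { its value }
       nextinrow, nextincol: mlink     { next nonzero element in the same row and column }
     end;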
Data Structures
Even if there are no terms with zero coefficients in a polynomial or no zero
elements in a matrix, an advantage of the linked list representation is that we
don't need to know in advance how big the objects that we'll be processing
are. This is a significant advantage that makes linked structures preferable
in many situations. On the other hand, the links themselves can consume a
significant part of the available space, a disadvantage in some situations. Also,
access to individual elements in linked structures is much more restricted than
in arrays.
We'll see examples of the use of these data structures in various algorithms,
and we'll see more complicated data structures that involve more
constraints on the elements in an array or more pointers in a linked representation.
For example, multidimensional arrays can be defined which use
multiple indices to access individual items. Similarly, we'll encounter many
"multidimensional" linked structures with more than one pointer per node.
The tradeoffs between competing structures are usually complicated, and
different structures turn out to be appropriate for different situations.
When possible it is wise to think of the data and the specific operations
to be performed on it as an abstract data structure which can be realized in
several ways. For example, the abstract data structure for polynomials in the
examples above is the set of coefficients: a user providing input to one of the
programs above need not know whether a linked list or an array is being used.
Modern programming systems have sophisticated mechanisms which make
it possible to change representations easily, even in large, tightly integrated
systems.
Exercises
1. Another way to represent polynomials is to write them in the form
r_0(x - r_1)(x - r_2)...(x - r_N). How would you multiply two polynomials in
this representation?
2. How would you add two polynomials represented as in Exercise 1?
3. Write a Pascal program that multiplies two polynomials, using a linked
list representation with a list node for each term.
4. Write a Pascal program that multiplies sparse polynomials, using a linked
list representation with no nodes for terms with 0 coefficients.
5. Write a Pascal function that returns the value of the element in the ith
row and jth column of a sparse matrix, assuming that the matrix is
represented using a linked list representation with no nodes for 0 entries.
6. Write a Pascal procedure that sets the value of the element in the ith
row and jth column of a sparse matrix to v, assuming that the matrix is
represented using a linked list representation with no nodes for 0 entries.
7. What is the running time of matrix multiplication in terms of the number
of data items?
8. Does the running time of the polynomial addition programs for nonsparse
input depend on the value of any of the coefficients?
9. Run an experiment to determine which of the polynomial addition programs
runs fastest on your computer system, for relatively large N.
10. Give a counterexample to the assertion that the user of an abstract data
structure need not know what representation is being used.
3. Random Numbers
Our next set of algorithms will be methods for using a computer to
generate random numbers. We will find many uses for random numbers
later on; let's begin by trying to get a better idea of exactly what they are.
Often, in conversation, people use the term random when they really
mean arbitrary. When one asks for an arbitrary number, one is saying that
one doesn't really care what number one gets: almost any number will do.
By contrast, a random number is a precisely defined mathematical concept:
every number should be equally likely to occur. A random number will satisfy
someone who needs an arbitrary number, but not the other way around.
For "every number to be equally likely to occur" to make sense, we must
restrict the numbers to be used to some finite domain. You can't have a
random integer, only a random integer in some range; you can't have a random
real number, only a random fraction in some range to some fixed precision.
It is almost always the case that not just one random number, but a
sequence of random numbers is needed (otherwise an arbitrary number might
do). Here's where the mathematics comes in: it's possible to prove many facts
about properties of sequences of random numbers. For example, we can expect
to see each value about the same number of times in a very long sequence
of random numbers from a small domain. Random sequences model many
natural situations, and a great deal is known about their properties. To be
consistent with current usage, we'll refer to numbers from random sequences
as random numbers.
There's no way to produce true random numbers on a computer (or any
deterministic device). Once the program is written, the numbers that it will
produce can be deduced, so how could they be random? The best we can hope
to do is to write programs which produce sequences of numbers having many of
the same properties as random numbers. Such numbers are commonly called
pseudo-random numbers: they're not really random, but they can be useful
as approximations to random numbers, in much the same way that floating-point
numbers are useful as approximations to real numbers. (Sometimes it's
convenient to make a further distinction: in some situations, a few properties
of random numbers are of crucial interest while others are irrelevant. In
such situations, one can generate quasi-random numbers, which are sure to
have the properties of interest but are unlikely to have other properties of
random numbers. For some applications, quasi-random numbers are provably
preferable to pseudo-random numbers.)
It's easy to see that approximating the property "each number is equally
likely to occur" in a long sequence is not enough. For example, each number in
the range [1,100] appears once in the sequence (1, 2, ..., 100), but that sequence
is unlikely to be useful as an approximation to a random sequence. In fact,
in a random sequence of length 100 of numbers in the range [1,100], it is
likely that a few numbers will appear more than once and a few will not
appear at all. If this doesn't happen in a sequence of pseudo-random numbers,
then there is something wrong with the random number generator. Many
sophisticated tests based on specific observations like this have been devised
for random number generators, testing whether a long sequence of pseudo-
random numbers has some property that random numbers would. The random
number generators that we will study do very well in such tests.
We have been (and will be) talking exclusively about uniform random
numbers, with each value equally likely. It is also common to deal with random
numbers which obey some other distribution in which some values are more
likely than others. Pseudo-random numbers with non-uniform distributions
are usually obtained by performing some operations on uniformly distributed
ones. Most of the applications that we will be studying use uniform random
numbers.
Applications
Later in the book we will meet many applications in which random numbers
will be useful. A few of them are outlined here. One obvious application is in
cryptography, where the major goal is to encode a message so that it can't be
read by anyone but the intended recipient. As we will see in Chapter 23, one
way to do this is to make the message look random using a pseudo-random
sequence to encode the message, in such a way that the recipient can use the
same pseudorandom sequence to decode it.
Another area in which random numbers have been widely used is in
simulation. A typical simulation involves a large program which models some
aspect of the real world: random numbers are natural for the input to such
programs. Even if true random numbers are not needed, simulations typically
need many arbitrary numbers for input, and these are conveniently provided
by a random number generator.
When a very large amount of data is to be analyzed, it is sometimes
sufficient to process only a very small amount of the data, chosen according
to random sampling. Such applications are widespread, the most prominent
being national political opinion polls.
Often it is necessary to make a choice when all factors under consideration
seem to be equal. The national draft lottery of the 70's or the mechanisms
used on college campuses to decide which students get the choice dormitory
rooms are examples of using random numbers for decision making. In this
way, the responsibility for the decision is given to "fate" (or the computer).
Readers of this book will find themselves using random numbers extensively
for simulation: to provide random or arbitrary inputs to programs.
Also, we will see examples of algorithms which gain efficiency by using random
numbers to do sampling or to aid in decision making.
Linear Congruential Method
The most well-known method for generating random numbers, which has been
used almost exclusively since it was introduced by D. Lehmer in 1951, is the
so-called linear congruential method. If a[1] contains some arbitrary number,
then the following statement fills up an array with N random numbers using
this method:

for i:=2 to N do
  a[i]:=(a[i-1]*b+1) mod m

That is, to get a new random number, take the previous one, multiply
it by a constant b, add 1 and take the remainder when divided by a second
constant m. The result is always an integer between 0 and m-1. This is
attractive for use on computers because the mod function is usually trivial to
implement: if we ignore overflow on the arithmetic operations, then most computer
hardware will throw away the bits that overflowed and thus effectively
perform a mod operation with m equal to one more than the largest integer
that can be represented in the computer word.
Simple as it may seem, the linear congruential random number generator
has been the subject of volumes of detailed and difficult mathematical analysis.
This work gives us some guidance in choosing the constants b and m. Some
"common-sense" principles apply, but in this case common sense isn't enough
to ensure good random numbers. First, m should be large: it can be the
computer word size, as mentioned above, but it needn't be quite that large
if that's inconvenient (see the implementation below). It will normally be
convenient to make m a power of 10 or 2. Second, b shouldn't be too large or
too small: a safe choice is to use a number with one digit less than m. Third,
b should be an arbitrary constant with no particular pattern in its digits,
except that it should end with ...x21, with x even: this last requirement is
admittedly peculiar, but it prevents the occurrence of some bad cases that
have been uncovered by the mathematical analysis.
The rules described above were developed by D.E.Knuth, whose textbook
covers the subject in some detail. Knuth shows that these choices will make
the linear congruential method produce good random numbers which pass
several sophisticated statistical tests. The most serious potential problem,
which can become quickly apparent, is that the generator could get caught
in a cycle and produce numbers it has already produced much sooner than
it should. For example, the choice b=l9, m=381, with a[ I] =O, produces the
sequence 0,1,20,0,1,20 ,..., a notrvery-random sequence of integers between 0
and 380.
Any initial value can be used to get the random number generator started
with no particular effect except of course that different initial values will give
rise to different random sequences. Often, it is not necessary to store the
whole sequence as in the program above. Rather, we simply maintain a global
variable a, initialized with some value, then updated by the computation
a:=(a*b+1) mod m.
In Pascal (and many other programming languages) we're still one step
away from a working implementation because we're not allowed to ignore
overflow: it's defined to be an error condition that can lead to unpredictable
results. Suppose that we have a computer with a 32-bit word, and we choose
m=100000000, b=31415821, and, initially, a=1234567. All of these values are
comfortably less than the largest integer that can be represented, but the first
a*b+1 operation causes overflow. The part of the product that causes the
overflow is not relevant to our computation; we're only interested in the last
eight digits. The trick is to avoid overflow by breaking the multiplication up
into pieces. To multiply p by q, we write p = 10^4 p1 + p0 and q = 10^4 q1 + q0,
so the product is

    pq = (10^4 p1 + p0)(10^4 q1 + q0)
       = 10^8 p1q1 + 10^4 (p1q0 + p0q1) + p0q0.
Now, we're only interested in eight digits for the result, so we can ignore
the first term and the first four digits of the second term. This leads to the
following program:
program random(input, output);
const m=100000000; m1=10000; b=31415821;
var i, a, N: integer;
function mult(p, q: integer): integer;
var p1, p0, q1, q0: integer;
begin
p1:=p div m1; p0:=p mod m1;
q1:=q div m1; q0:=q mod m1;
mult:=(((p0*q1+p1*q0) mod m1)*m1+p0*q0) mod m
end;
function random: integer;
begin
a:=(mult(a, b)+1) mod m;
random:=a
end;
begin
read(N, a);
for i:=1 to N do writeln(random)
end.
The function mult in this program computes p*q mod m, with no overflow
as long as m is less than half the largest integer that can be represented. The
technique obviously can be applied with m=m1*m1 for other values of m1.
Here are the ten numbers produced by this program with the input N =
10 and a = 1234567:
35884508
80001069
63512650
43635651
1034472
87181513
6917174
209855
67115956
59939877
There is some obvious non-randomness in these numbers: for example,
the last digits cycle through the digits 0-9. It is easy to prove from the
formula that this will happen. Generally speaking, the digits on the right are
not particularly random. This leads to a common and serious mistake in the
use of linear congruential random number generators: the following is a bad
program for producing random numbers in the range [0, r-1]:

function randombad(r: integer): integer;
begin
a:=(mult(b, a)+1) mod m;
randombad:=a mod r
end;
The non-random digits on the right are the only digits that are used,
so the resulting sequence has few of the desired properties. This problem is
easily fixed by using the digits on the left. We want to compute a number
between 0 and r-1 by computing a*r div m, but, again, overflow must be
circumvented, as in the following implementation:

function randomint(r: integer): integer;
begin
a:=(mult(a, b)+1) mod m;
randomint:=((a div m1)*r) div m1
end;
Another common technique is to generate random real numbers between
0 and 1 by treating the above numbers as fractions with the decimal point
to the left. This can be implemented by simply returning the real value a/m
rather than the integer a. Then a user could get an integer in the range [0, r)
by simply multiplying this value by r and truncating to the nearest integer.
Or, a random real number between 0 and 1 might be exactly what is needed.
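For example, such a function might be written as in the following minimal sketch, which assumes the same global variable a and the constants and mult function of the program above (the name randomreal is illustrative):

function randomreal: real;
begin
a:=(mult(a, b)+1) mod m;   { same linear congruential step as before }
randomreal:=a/m            { return the result as a fraction between 0 and 1 }
end;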
Additive Congruential Method
Another method for generating random numbers is based on linear feedback
shift registers which were used for early cryptographic encryption machines.
The idea is to start with a register filled with some arbitrary pattern, then
shift it right (say) a step at a time, filling in vacated positions from the left
with a bit determined by the contents of the register. The diagram below
shows a simple 4-bit register, with the new bit taken as the "exclusive or" of
the two rightmost bits.
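One step of this register can be simulated directly with integer arithmetic, as in the following sketch (only an illustration, with the 4-bit register contents kept in an integer between 0 and 15):

function shiftstep(r: integer): integer;
var newbit: integer;
begin
newbit:=((r div 2)+r) mod 2;   { "exclusive or" of the two rightmost bits }
shiftstep:=(r div 2)+8*newbit  { shift right and insert the new bit at the left }
end;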
Below are listed the contents of the register for the first sixteen steps of
the process:
0 1 2 3 4 5 6 7
1011 0101 1010 1101 1110 1111 0111 0011
8 9 10 11 12 13 14 15
0001 1000 0100 0010 1001 1100 0110 1011
Notice that all possible nonzero bit patterns occur; the starting value
repeats after 15 steps. As with the linear congruential method, the mathematics
of the properties of these registers has been studied extensively. For
example, much is known about the choices of "tap" positions (the bits used
for feedback) which lead to the generation of all bit patterns for registers of
various sizes.
Another interesting fact is that the calculation can be done a word at a
time, rather than a bit at a time, according to the same recursion formula.
In our example, if we take the bitwise "exclusive or" of two successive words,
we get the word which appears three places later in the list. This leads
us to a random number generator suitable for easy implementation on a
general-purpose computer. Using a feedback register with bits b and c tapped
corresponds to using the recursion a[k]:=(a[k-b]+a[k-c]) mod m. To keep
the correspondence with the shift register model, the "+" in this recursion
should be a bitwise "exclusive or." However, it has been shown that good
random numbers are likely to be produced even if normal integer addition is
used. This is termed the additive congruential method.
To implement this method, we need to keep a table of size c which always
has the c most recently generated numbers. The computation proceeds by
replacing one of the numbers in the table by the sum of two of the other
numbers in the table. Initially, the table should be filled with numbers that
are not too small and not too large. (One easy way to get these numbers
is to use a simple linear congruential generator!) Knuth recommends the
choices b=31, c=55, which will work well for most applications and lead to the
implementation below.
procedure randinit(s: integer);
begin
a[0]:=s; j:=0;
repeat j:=j+1; a[j]:=(mult(b, a[j-1])+1) mod m until j=54
end;
function randomint(r: integer): integer;
begin
j:=(j+1) mod 55;
a[j]:=(a[(j+23) mod 55]+a[(j+54) mod 55]) mod m;
randomint:=((a[j] div m1)*r) div m1
end;
The program maintains the 55 most recently generated numbers, with the last
generated pointed to by j. Thus, the global variable a has been replaced by
a full table plus a pointer (j) into it. This large amount of "global state" is a
disadvantage of this generator in some applications, but it is also an advantage
because it leads to an extremely long cycle even if the modulus m is small.
The function randomint returns a random integer between 0 and r-1. Of
course, it can easily be changed, just as above, to a function which returns a
random real number between 0 and 1 (a[j]/m).
Testing Randomness
One can easily detect numbers that are not random, but certifying that a
sequence of numbers is random is a difficult task indeed. As mentioned above,
no sequence produced by a computer can be random, but we want a sequence
that exhibits many of the properties of random numbers. Unfortunately, it is
often not possible to articulate exactly which properties of random numbers
are important for a particular application.
On the other hand, it is always a good idea to perform some kind of test
on a random number generator to be sure that no degenerate situations have
turned up. Random number generators can be very, very good, but when
they are bad they are horrid.
Many tests have been developed for determining whether a sequence
shares various properties with a truly random sequence. Most of these tests
have a substantial basis in mathematics, and it would definitely be beyond the
scope of this book to examine them in detail. However, one statistical test,
the χ² (chi-square) test, is fundamental in nature, quite easy to implement,
and useful in several applications, so we'll examine it more carefully.
The idea of the χ² test is to check whether or not the numbers produced
are spread out reasonably. If we generate N positive numbers less than r, then
we'd expect to get about N/r numbers of each value. (But the frequencies of
occurrence of all the values should not be exactly the same: that wouldn't be
random!) It turns out that calculating whether or not a sequence of numbers
is distributed as well as a random sequence is very simple, as in the following
program:
function chisquare(N, r, s: integer): real;
var i, t: integer;
    f: array [0..rmax] of integer;
begin
randinit(s);
for i:=0 to rmax do f[i]:=0;
for i:=1 to N do
  begin
  t:=randomint(r);
  f[t]:=f[t]+1
  end;
t:=0; for i:=0 to r-1 do t:=t+f[i]*f[i];
chisquare:=((r*t/N) - N)
end;
We simply calculate the sum of the squares of the frequencies of occurrence
of each value, scaled by the expected frequency, then subtract off the
size of the sequence. This number is called the "χ² statistic," which may be
expressed mathematically as

    χ² = Σ_{0<=i<r} (f_i - N/r)² / (N/r).
If the χ² statistic is close to r, then the numbers are random; if it is too far
away, then they are not. The notions of "close" and "far away" can be more
precisely defined: tables exist which tell exactly how to relate the statistic to
properties of random sequences. For the simple test that we're performing,
the statistic should be within 2√r of r. This is valid if N is bigger than about
10r, and to be sure, the test should be tried a few times, since it could be
wrong about one out of ten times.
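A driver for the test might apply the criterion as in the following fragment, a sketch which assumes the chisquare function above and suitable values for N, r, and s:

if abs(chisquare(N, r, s)-r) <= 2*sqrt(r)
  then writeln('chi-square statistic within range')
  else writeln('chi-square statistic out of range: suspect the generator');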
This test is so simple to implement that it probably should be included
with every random number generator, just to ensure that nothing unexpected
can cause serious problems. All the "good generators" that we have discussed
pass this test; the "bad ones" do not. Using the above generators to generate
a thousand numbers less than 100, we get a χ² statistic of 100.8 for the
linear congruential method and 105.4 for the additive congruential method,
both certainly well within 20 of 100. But for the "bad" generator which uses
the right-hand bits from the linear congruential generator the statistic is 0
(why?) and for a linear congruential method with a bad multiplier (101011)
the statistic is 77.8, which is significantly out of range.
Implementation Notes
There are a number of facilities commonly added to make a random number
generator useful for a variety of applications. Usually, it is desirable to set
up the generator as a function that is initialized and then called repeatedly,
returning a different random number each time. Another possibility is to call
the random number generator once, having it fill up an array with all the
random numbers that will be needed for a particular computation. In either
case, it is desirable that the generator produce the same sequence on successive
calls (for initial debugging or comparison of programs on the same inputs) and
produce an arbitrary sequence (for later debugging). These facilities all involve
manipulating the "state" retained by the random number generator between
calls. This can be very inconvenient in some programming environments. The
additive generator has the disadvantage that it has a relatively large state (the
array of recently produced words), but it has the advantage of having such a
long cycle that it is probably not necessary for each user to initialize it.
A conservative way to protect against eccentricities in a random number
generator is to combine two generators. (The use of a linear congruential
generator to initialize the table for an additive congruential generator is
an elementary example of this.) An easy way to implement a combination
generator is to have the first generator fill a table and the second choose
random table positions to fetch numbers to output (and store new numbers
from the first generator).
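The following sketch shows one way such a combination generator might be organized; it assumes two independent generators randomint1 and randomint2 of the kind developed above (each returning a nonnegative integer less than its argument) and the modulus m used earlier:

const tablesize=97;
var shuffle: array [0..96] of integer;

procedure combinit;
var i: integer;
begin
for i:=0 to tablesize-1 do shuffle[i]:=randomint1(m)   { fill the table from the first generator }
end;

function combined: integer;
var k: integer;
begin
k:=randomint2(tablesize);     { the second generator picks a table position }
combined:=shuffle[k];
shuffle[k]:=randomint1(m)     { store a new number from the first generator }
end;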
When debugging a program that uses a random number generator, it is
usually a good idea to use a trivial or degenerate generator at first, such as
one which always returns 0 or one which returns numbers in order.
As a rule, random number generators are fragile and need to be treated
with respect. It's difficult to be sure that a particular generator is good
without investing an enormous amount of effort in doing the various statistical
tests that have been devised. The moral is: do your best to use a good
generator, based on the mathematical analysis and the experience of others;
just to be sure, examine the numbers to make sure that they "look" random;
if anything goes wrong, blame the random number generator!
Exercises
1. Write a program to generate random four-letter words (collections of
letters). Estimate how many words your program will generate before
a word is repeated.
2. How would you simulate generating random numbers by throwing two
dice and taking their sum, with the added complication that the dice are
nonstandard (say, painted with the numbers 1,2,3,5,8, and 13)?
3. What is wrong with the following linear feedback shift register?
4. Why wouldn't the "or" or "and" function (instead of the "exclusive or"
function) work for linear feedback shift registers?
5. Write a program to produce a random two-dimensional image. (Example:
generate random bits, write a "*" when 1 is generated, " " when 0 is
generated. Another example: use random numbers as coordinates in a
two-dimensional Cartesian system, write a "*" at addressed points.)
6. Use an additive congruential random number generator to generate 1000
positive integers less than 1000. Design a test to determine whether or
not they're random and apply the test.
7. Use a linear congruential generator with parameters of your own choosing
to generate 1000 positive integers less than 1000. Design a test to
determine whether or not they're random and apply the test.
8. Why would it be unwise to use, for example, b=3 and c=6 in the additive
congruential generator?
9. What is the value of the χ² statistic for a degenerate generator which
always returns the same number?
10. Describe how you would generate random numbers with m bigger than
the computer word size.
4. Polynomials
The methods for doing arithmetic operations given in Chapter 2 are
simple and straightforward solutions to familiar problems. As such, they
provide an excellent basis for applying algorithmic thinking to produce more
sophisticated methods which are substantially more efficient. As we'll see, it
is one thing to write down a formula which implies a particular mathematical
calculation; it is quite another thing to write a computer program which
performs the calculation efficiently.
Operations on mathematical objects are far too diverse to be catalogued
here; we'll concentrate on a variety of algorithms for manipulating polynomials.
The principal method that we'll study in this section is a polynomial
multiplication scheme which is of no particular practical importance but
which illustrates a basic design paradigm called divide-and-conquer which is
pervasive in algorithm design. We'll see in this section how it applies to matrix
multiplication as well as polynomial multiplication; in later sections we'll see
it applied to most of the problems that we encounter in this book.
Evaluation
A first problem which arises naturally is to compute the value of a given
polynomial at a given point. For example, to evaluate
    p(x) = x^4 + 3x^3 - 6x^2 + 2x + 1

for any given x, one could compute x^4, then compute and add 3x^3, etc. This
method requires recomputation of the powers of x; an alternate method, which
requires extra storage, would save the powers of x as they are computed.
A simple method which avoids recomputation and uses no extra space
is known as Horner's rule: by alternating the multiplication and addition
operations appropriately, a degree-N polynomial can be evaluated using only
N multiplications and N additions. The parenthesization
    p(x) = x(x(x(x + 3) - 6) + 2) + 1

makes the order of computation obvious:

y:=p[N];
for i:=N-1 downto 0 do y:=x*y+p[i];
This program (and the others in this section) assume the array representation
for polynomials that we discussed in Chapter 2.
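As a small usage example, the following complete program (a sketch, with the coefficients written into the array directly) evaluates p(x) = x^4 + 3x^3 - 6x^2 + 2x + 1 at a value of x read from the input:

program horner(input, output);
const N=4;
var p: array [0..N] of real;
    x, y: real;
    i: integer;
begin
p[0]:=1; p[1]:=2; p[2]:=-6; p[3]:=3; p[4]:=1;   { coefficients of x^0 through x^4 }
read(x);
y:=p[N];
for i:=N-1 downto 0 do y:=x*y+p[i];
writeln(y)
end.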
A more complicated problem is to evaluate a given polynomial at many
different points. Different algorithms are appropriate depending on how many
evaluations are to be done and whether or not they are to be done simultaneously.
If a very large number of evaluations is to be done, it may be
worthwhile to do some "precomputing" which can slightly reduce the cost
for later evaluations. Note that using Horner's method would require about
N^2 multiplications to evaluate a degree-N polynomial at N different points.
Much more sophisticated methods have been designed which can solve the
problem in N(log N)^2 steps, and in Chapter 36 we'll see a method that uses
only N log N multiplications for a specific set of N points of interest.
If the given polynomial has only one term, then the polynomial evaluation
problem reduces to the exponentiation problem: compute x^N. Horner's
rule in this case degenerates to the trivial algorithm which requires N - 1
multiplications. For an easy example of how we can do much better, consider
the following sequence for computing x^32:
    x, x^2, x^4, x^8, x^16, x^32.

Each term is obtained by squaring the previous term, so only five multiplications
are required (not 31).
The "successive squaring" method can easily be extended to general N
if computed values are saved. For example, x^55 can be computed from the
above values with four more multiplications:

    x^55 = x^32 * x^16 * x^4 * x^2 * x.

In general, the binary representation of N can be used to choose which
computed values to use. (In the example, since 55 = (110111)_2, all but x^8
are used.) The successive squares can be computed and the bits of N tested
within the same loop. Two methods are available to implement this using only
one "accumulator," like Horner's method. One algorithm involves scanning
the binary representation of N from left to right, starting with 1 in the
accumulator. At each step, square the accumulator and also multiply by x
when there is a 1 in the binary representation of N. The following sequence
of values is computed by this method for N = 55:

    1, 1, x, x^2, x^3, x^6, x^12, x^13, x^26, x^27, x^54, x^55.

Another well-known algorithm works similarly, but scans N from right to
left. This problem is a standard introductory programming exercise, but it is
hardly of practical interest.
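For reference, the left-to-right method might be coded as in the following sketch, which assumes N > 0 and finds the leading bit of N with a simple loop:

function power(x: real; N: integer): real;
var y: real; bit: integer;
begin
bit:=1;
while bit<=N div 2 do bit:=bit*2;          { highest power of 2 not exceeding N }
y:=1.0;
while bit>0 do
  begin
  y:=y*y;                                  { square the accumulator at every step }
  if (N div bit) mod 2 = 1 then y:=y*x;    { multiply by x when the current bit of N is 1 }
  bit:=bit div 2
  end;
power:=y
end;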
Interpolation
The "inverse" problem to the problem of evaluating a polynomial of degree N
at N points simultaneously is the problem of polynomial interpolation: given
a set of N points x_1, x_2, ..., x_N and associated values y_1, y_2, ..., y_N, find the
unique polynomial of degree N - 1 which has

    p(x_1) = y_1, p(x_2) = y_2, ..., p(x_N) = y_N.
The interpolation problem is to find the polynomial, given a set of points and
values. The evaluation problem is to find the values, given the polynomial
and the points. (The problem of finding the points, given the polynomial and
the values, is root-finding.)
The classic solution to the interpolation problem is given by Lagrange's
interpolation formula, which is often used as a proof that a polynomial of
degree N - 1 is completely determined by N points:

    p(x) = Σ_{1<=k<=N} y_k Π_{j≠k} (x - x_j)/(x_k - x_j).
This formula seems formidable at first but is actually quite simple. For
example, the polynomial of degree 2 which has p(1) = 3, p(2) = 7, and p(3) =
13 is given by

    p(x) = 3 (x-2)(x-3)/((1-2)(1-3)) + 7 (x-1)(x-3)/((2-1)(2-3)) + 13 (x-1)(x-2)/((3-1)(3-2)),

which simplifies to

    x^2 + x + 1.
For x from x_1, x_2, ..., x_N, the formula is constructed so that p(x_k) = y_k for
1 <= k <= N, since the product evaluates to 0 unless j = k, when it evaluates
to 1. In the example, the last two terms are 0 when x = 1, the first and last
terms are 0 when x = 2, and the first two terms are 0 when x = 3.
To convert a polynomial from the form described by Lagrange's formula
to our standard coefficient representation is not at all straightforward. At
least N^2 operations seem to be required, since there are N terms in the sum,
each consisting of a product with N factors. Actually, it takes some cleverness
to achieve a quadratic algorithm, since the factors are not just numbers, but
polynomials of degree N. On the other hand, each term is very similar to
the previous one. The reader might be interested to discover how to take
advantage of this to achieve a quadratic algorithm. This exercise leaves one
with an appreciation for the non-trivial nature of writing an efficient program
to perform the calculation implied by a mathematical formula.
As with polynomial evaluation, there are more sophisticated methods
which can solve the problem in N(log N)^2 steps, and in Chapter 36 we'll see
a method that uses only N log N multiplications for a specific set of N points
of interest.
Multiplication
Our first sophisticated arithmetic algorithm is for the problem of polynomial
multiplication: given two polynomials p(x) and q(x), compute their product
p(x)q(x). As noted in Chapter 2, polynomials of degree N - 1 could have
N terms (including the constant) and the product has degree 2N - 2 and as
many as 2N - 1 terms. For example,

    (1 + x + 3x^2 - 4x^3)(1 + 2x - 5x^2 - 3x^3) = 1 + 3x - 6x^3 - 26x^4 + 11x^5 + 12x^6.

The naive algorithm for this problem that we implemented in Chapter 2
requires N^2 multiplications for polynomials of degree N - 1: each of the N
terms of p(x) must be multiplied by each of the N terms of q(x).
To improve on the naive algorithm, we'll use a powerful technique for
algorithm design called divide-and-conquer: split the problem into smaller
parts, solve them (recursively), then put the results back together in some
way. Many of our best algorithms are designed according to this principle.
In this section we'll see how divide-and-conquer applies in particular to the
polynomial multiplication problem. In the following section we'll look at some
analysis which gives a good estimate of how much is saved.
One way to split a polynomial in two is to divide the coefficients in half:
given a polynomial of degree N-l (with N coefficients) we can split it into two
polynomials with N/2 coefficients (assume that N is even), by using the N/2
low-order coefficients for one polynomial and the N/2 high-order coefficients
for the other. For p(x) = p_0 + p_1 x + ... + p_{N-1} x^{N-1}, define

    pl(x) = p_0 + p_1 x + ... + p_{N/2-1} x^{N/2-1},
    ph(x) = p_{N/2} + p_{N/2+1} x + ... + p_{N-1} x^{N/2-1}.

Then, splitting q(x) in the same way, we have

    p(x) = pl(x) + x^{N/2} ph(x),
    q(x) = ql(x) + x^{N/2} qh(x).

Now, in terms of the smaller polynomials, the product is given by

    p(x)q(x) = pl(x)ql(x) + (pl(x)qh(x) + ql(x)ph(x)) x^{N/2} + ph(x)qh(x) x^N.
(We used this same split in the previous chapter to avoid overflow.) What's
interesting is that only three multiplications are necessary to compute these
products, because if we compute rl(x) = pl(x)ql(x), rh(x) = ph(x)qh(x), and
rm(x) = (pl(x) + ph(x))(ql(x) + qh(x)), we can get the product p(x)q(x) by
computing

    p(x)q(x) = rl(x) + (rm(x) - rl(x) - rh(x)) x^{N/2} + rh(x) x^N.
Polynomial addition requires a linear algorithm, and the straightforward polynomial
multiplication algorithm of Chapter 2 is quadratic, so it's worthwhile
to do a few (easy) additions to save one (difficult) multiplication. Below we'll
look more closely at the savings achieved by this method.
For the example given above, with p(x) = 1 + x + 3x^2 - 4x^3 and q(x) =
1 + 2x - 5x^2 - 3x^3, we have

    rl(x) = (1 + x)(1 + 2x) = 1 + 3x + 2x^2,
    rh(x) = (3 - 4x)(-5 - 3x) = -15 + 11x + 12x^2,
    rm(x) = (4 - 3x)(-4 - x) = -16 + 8x + 3x^2.

Thus, rm(x) - rl(x) - rh(x) = -2 - 6x - 11x^2, and the product is computed as

    p(x)q(x) = (1 + 3x + 2x^2) + (-2 - 6x - 11x^2)x^2 + (-15 + 11x + 12x^2)x^4
             = 1 + 3x - 6x^3 - 26x^4 + 11x^5 + 12x^6.
This divide-and-conquer approach solves a polynomial multiplication problem
of size N by solving three subproblems of size N/2, using some polynomial
addition to set up the subproblems and to combine their solutions. Thus, this
procedure is easily described as a recursive program:
function mult(p, q: array[0..N-1] of real;
              N: integer): array[0..2*N-2] of real;
var pl, ql, ph, qh, t1, t2: array [0..(N div 2)-1] of real;
    rl, rm, rh: array [0..N-1] of real;
    i, N2: integer;
begin
if N=1 then mult[0]:=p[0]*q[0]
else
  begin
  N2:=N div 2;
  for i:=0 to N2-1 do
    begin pl[i]:=p[i]; ql[i]:=q[i] end;
  for i:=N2 to N-1 do
    begin ph[i-N2]:=p[i]; qh[i-N2]:=q[i] end;
  for i:=0 to N2-1 do t1[i]:=pl[i]+ph[i];
  for i:=0 to N2-1 do t2[i]:=ql[i]+qh[i];
  rm:=mult(t1, t2, N2);
  rl:=mult(pl, ql, N2);
  rh:=mult(ph, qh, N2);
  for i:=0 to N-2 do mult[i]:=rl[i];
  mult[N-1]:=0;
  for i:=0 to N-2 do mult[N+i]:=rh[i];
  for i:=0 to N-2 do
    mult[N2+i]:=mult[N2+i]+rm[i]-(rl[i]+rh[i]);
  end;
end.
Although the above code is a succinct description of this method, it is (unfortunately)
not a legal Pascal program because functions can't dynamically declare
arrays. This problem could be handled in Pascal by representing the polync+
mials as linked lists, as we did in Chapter 2. This program assumes that N is a
power of two, though the details for general N can be worked out easily. The
main complications are to make sure that the recursion terminates properly
and that the polynomials are divided properly when N is odd.
The same method can be used for multiplying integers, though care must
be taken to treat "carries" properly during the subtractions after the recursive
calls.
As with polynomial evaluation and interpolation, there are sophisticated
methods for polynomial multiplication, and in Chapter 36 we'll see a method
that works in time proportional to N log N.
Divide-and-conquer Recurrences
Why is the divide-and-conquer method given above an improvement? In this
section, we'll look at a few simple recurrence formulas that can be used to
measure the savings achieved by a divide-and-conquer algorithm.
From the recursive program, it is clear that the number of integer multiplications
required to multiply two polynomials of size N is the same as the
number of multiplications to multiply three pairs of polynomials of size N/2.
(Note that, for example, no multiplications are required to compute rh(x)x^N,
just data movement.) If M(N) is the number of multiplications required to
multiply two polynomials of size N, we have
M(N) = 3M(N/2)
for N > 1 with M(1) = 1. Thus M(2) = 3, M(4) = 9, M(8) = 27, etc. In
general, if we take N = 2^n, then we can repeatedly apply the recurrence to
itself to find the solution:

    M(2^n) = 3M(2^(n-1)) = 3^2 M(2^(n-2)) = 3^3 M(2^(n-3)) = ... = 3^n M(1) = 3^n.

If N = 2^n, then 3^n = 2^((lg 3)n) = 2^(n lg 3) = N^(lg 3). Although this solution is exact
only for N = 2^n, it works out in general that

    M(N) ≈ N^(lg 3) ≈ N^1.58,

which is a substantial savings over the N^2 naive method. Note that if we were
to have used all four multiplications in the simple divide-and-conquer method,
the recurrence would be M(N) = 4M(N/2) with the solution M(2^n) = 4^n = N^2.
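To get a feeling for the difference, the following small program (a sketch; the limit of 128 is chosen only to keep the numbers within a small integer range) tabulates the solution of M(N)=3M(N/2) against N^2 for N a power of two:

program recurrences(output);
var n, m3, i: integer;
begin
n:=1; m3:=1;                      { n = 2^i, m3 = 3^i = M(n) }
for i:=0 to 7 do
  begin
  writeln(n:6, m3:8, n*n:10);
  n:=n*2; m3:=m3*3
  end
end.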
The method described in the previous section nicely illustrates the divide-and-conquer
technique, but it is seldom used in practice because a much better
divide-and-conquer method is known, which we'll study in Chapter 36. This
method gets by with dividing the original into only two subproblems, with
a little extra processing. The recurrence describing the number of multiplications
required is
M(N) = 2M(N/2) + N.
Though we don't want to dwell on the mathematics of solving such recurrences,
formulas of this particular form arise so frequently that it will be
worthwhile to examine the development of an approximate solution. First, as
above, we write N = 2^n:

    M(2^n) = 2M(2^(n-1)) + 2^n.
The trick to making it simple to apply this same recursive formula to itself is
to divide both sides by 2^n:

    M(2^n)/2^n = M(2^(n-1))/2^(n-1) + 1.

Now, applying this same formula to itself n times ends up simply giving n
copies of the "1," from which it follows immediately that M(2^n) = n·2^n. Again,
it turns out that this holds true (roughly) for all N, and we have the solution

    M(N) ≈ N lg N.
We'll see several algorithms from different application areas whose performance
characteristics are described by recurrences of this type. Fortunately,
many of the recurrences that come up are so similar to those above that the
same techniques can be used.
For another example, consider the situation when an algorithm divides
the problem to be solved in half, then is able to ignore one half and (recursively)
solve the other. The running time of such an algorithm might be described
by the recurrence
M(N) = M(N/2) + 1.
This is easier to solve than the one in the previous paragraph. We immediately
have M(2^n) = n and, again, it turns out that M(N) ≈ lg N.
Of course, it's not always possible to get by with such trivial manipulations.
For a slightly more difficult example, consider an algorithm of the type
described in the previous paragraph which must somehow examine each element
before or after the recursive step. The running time of such an algorithm
is described by the recurrence
M(N) = M(N/2) + N.
Substituting N = 2^n and applying the same recurrence to itself n times now
gives

    M(2^n) = 2^n + 2^(n-1) + 2^(n-2) + ... + 2 + 1.

This must be evaluated to get the result M(2^n) = 2^(n+1) - 1, which translates
to M(N) ≈ 2N for general N.
To summarize, many of the most interesting algorithms that we will
encounter are based on the divide-and-conquer technique of combining the
solutions of recursively solved smaller subproblems. The running time of such
algorithms can usually be described by recurrence relationships which are a
direct mathematical translation of the structure of the algorithm. Though
such relationships can be challenging to solve precisely, they are often easy to
solve for some particular values of N to get solutions which give reasonable
estimates for all values of N. Our purpose in this discussion is to gain some
intuitive feeling for how divide-and-conquer algorithms achieve efficiency, not
to do detailed analysis of the algorithms. Indeed, the particular recurrences
that we've just solved are sufficient to describe the performance of most of
the algorithms that we'll be studying, and we'll simply be referring back to
them.
Matrix Multiplication
The most famous application of the divide-and-conquer technique to an arithmetic
problem is Strassen's method for matrix multiplication. We won't go
into the details here, but we can sketch the method, since it is very similar to
the polynomial multiplication method that we have just studied.
The straightforward method for multiplying two N-by-N matrices requires
N^3 scalar multiplications, since each of the N^2 elements in the product
matrix is obtained by N multiplications.
Strassen's method is to divide the size of the problem in half; this corresponds
to dividing each of the matrices into quarters, each N/2 by N/2.
The remaining problem is equivalent to multiplying 2-by-2 matrices. Just as
we were able to reduce the number of multiplications required from four to
three by combining terms in the polynomial multiplication problem, Strassen
was able to find a way to combine terms to reduce the number of multiplications
required for the 2-by-2 matrix multiplication problem from 8 to 7. The
rearrangement and the terms required are quite complicated.
The number of multiplications required for matrix multiplication using
Strassen's method is therefore defined by the divide-and-conquer recurrence
M(N) = 7M(N/2)
which has the solution
    M(N) ≈ N^(lg 7) ≈ N^2.81.
This result was quite surprising when it first appeared, since it had previously
been thought that N3 multiplications were absolutely necessary for matrix
multiplication. The problem has been studied very intensively in recent years,
and slightly better methods than Strassen's have been found. The "best"
algorithm for matrix multiplication has still not been found, and this is one
of the most famous outstanding problems of computer science.
It is important to note that we have been counting multiplications only.
Before choosing an algorithm for a practical application, the costs of the
extra additions and subtractions for combining terms and the costs of the
recursive calls must be considered. These costs may depend heavily on the
particular implementation or computer used. But certainly, this overhead
makes Strassen's method less efficient than the standard method for small
matrices. Even for large matrices, in terms of the number of data items input,
Strassen's method really represents an improvement only from N^1.5 to N^1.41.
This improvement is hard to notice except for very large N. For example, N
would have to be more than a million for Strassen's method to use one-fourth
as many multiplications as the standard method, even though the overhead per
multiplication is likely to be four times as large. Thus the algorithm is a
theoretical, not practical, contribution.
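To spell out where the exponents 1.5 and 1.41 come from (a short aside): each N-by-N matrix contains N^2 entries, so the number of inputs is proportional to N^2, and

$$N^3 = (N^2)^{1.5}, \qquad N^{\lg 7} = (N^2)^{(\lg 7)/2} \approx (N^2)^{1.41}.$$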
This illustrates a general tradeoff which appears in all applications (though
the effect is not always so dramatic): simple algorithms work best for small
problems, but sophisticated algorithms can reap tremendous savings for large
problems.
Exercises
1. Give a method for evaluating a polynomial with known roots r1, r2, . . . ,
rN, and compare your method with Horner's method.
2. Write a program to evaluate polynomials using Horner's method, where
a linked list representation is used for the polynomials. Be sure that your
program works efficiently for sparse polynomials.
3. Write an N2 program to do Lagrangian interpolation.
4. Suppose that we know that a polynomial to be interpolated is sparse (has
few non-zero coefficients). Describe how you would modify Lagrangian
interpolation to run in time proportional to N times the number of nonzero
coefficients.
5. Write out all of the polynomial multiplications performed when the divide-and-conquer
polynomial multiplication method described in the text is
used to square 1 + x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8.
6. The polynomial multiplication routine mult could be made more efficient
for sparse polynomials by returning 0 if all coefficients of either input are
0. About how many multiplications (to within a constant factor) would
such a program use to square 1 + x^N?
7. Can x^32 be computed with less than five multiplications? If so, say which
ones; if not, say why not.
8. Can x^55 be computed with less than nine multiplications? If so, say which
ones; if not, say why not.
9. Describe exactly how you would modify mult to multiply a polynomial of
degree N by another of degree M, with N > M.
10. Give the representation that you would use for programs to add and
multiply multivariate polynomials such as xy^2z + 3x^7y^2z^2 + w. Give
the single most important reason for choosing this representation.
5. Gaussian Elimination
Certainly one of the most fundam.ental scientific computations is the
solution of systems of simultaneous equations. The basic algorithm for
solving systems of equations, Gaussian elimination, is relatively simple and
has changed little in the 150 years since it was invented. This algorithm has
come to be well understood, especially in the past twenty years, so that it can
be used with some confidence that it will efficiently produce accurate results.
This is an example of an algorithm that will surely be available in most
computer installations; indeed, it is a primitive in several computer languages,
notably APL and Basic. However, the basic algorithm is easy to understand
and implement, and special situations do arise where it might be desirable
to implement a modified version of the algorithm rather than work with a
standard subroutine. Also, the method deserves to be learned as one of the
most important numeric methods in use today.
As with the other mathematical material that we have studied so far, our
treatment of the method will highlight only the basic principles and will be
self-contained. Familiarity with linear algebra is not required to understand
the basic method. We'll develop a simple Pascal implementation that might
be easier to use than a library subroutine for simple applications. However,
we'll also see examples of problems which could arise. Certainly for a large or
important application, the use of an expertly tuned implementation is called
for, as well as some familiarity with the underlying mathematics.
A Simple Example
Suppose that we have three variables x, y and z and the following three
equations:
x + 3y - 4z = 8,
x + y - 2z = 2,
-x - 2y + 5z = -1.
Our goal is to compute the values of the variables which simultaneously
satisfy the equations. Depending on the particular equations there may not
always be a solution to this problem (for example, if two of the equations are
contradictory, such as x + y = 1, x + y = 2) or there may be many solutions
(for example, if two equations are the same, or there are more variables than
equations). We'll assume that the number of equations and variables is the
same, and we'll look at an algorithm that will find a unique solution if one
exists.
To make it easier to extend the formulas to cover more than just three
points, we'll begin by renaming the variables, using subscripts:
x1 + 3x2 - 4x3 = 8,
x1 + x2 - 2x3 = 2,
-x1 - 2x2 + 5x3 = -1.
To avoid writing down variables repeatedly, it is convenient to use matrix
notation to express the simultaneous equations. The above equations are
exactly equivalent to the matrix equation

    ( 1  3 -4 ) (x1)   ( 8 )
    ( 1  1 -2 ) (x2) = ( 2 )
    (-1 -2  5 ) (x3)   (-1 )
There are several operations which can be performed on such equations which
will not alter the solution:
Interchange equations: Clearly, the order in which the equations are
written down doesn't affect the solution. In the matrix representation,
this operation corresponds to interchanging rows in the matrix (and
the vector on the right hand side).
Rename variables: This corresponds to interchanging columns in the
matrix representation. (If columns i and j are switched, then variables
xi and xj must also be considered switched.)
Multiply equations by a constant: Again, in the matrix representation,
this corresponds to multiplying a row in the matrix (and the corresponding
element in the vector on the right-hand side) by a constant.
Add two equations and replace one of them by the sum. (It takes a
little thought to convince oneself that this will not affect the solution.)
For example, we can get a system of equations equivalent to the one above
by replacing the second equation by the difference between the first two:

    ( 1  3 -4 ) (x1)   ( 8 )
    ( 0  2 -2 ) (x2) = ( 6 )
    (-1 -2  5 ) (x3)   (-1 )

Notice that this eliminates x1 from the second equation. In a similar manner,
we can eliminate x1 from the third equation by replacing the third equation
by the sum of the first and third:

    ( 1  3 -4 ) (x1)   ( 8 )
    ( 0  2 -2 ) (x2) = ( 6 )
    ( 0  1  1 ) (x3)   ( 7 )

Now the variable x1 is eliminated from all but the first equation. By systematically
proceeding in this way, we can transform the original system of
equations into a system with the same solution that is much easier to solve.
For the example, this requires only one more step which combines two of the
operations above: replacing the third equation by the difference between the
second and twice the third. This makes all of the elements below the main
diagonal 0: systems of equations of this form are particularly easy to solve.
The simultaneous equations which result in our example are:
x1 + 3x2 - 4x3 = 8,
2x2 - 2x3 = 6,
-4x3 = -8.
Now the third equation can be solved immediately: x3 = 2. If we substitute
this value into the second equation, we can compute the value of x2:
2x2 - 4 = 6,
x2 = 5.
Similarly, substituting these two values in the first equation allows the value
of xi to be computed:
x1 + 15 - 8 = 8,
x1 = 1,
which completes the solution of the equations.
This example illustrates the two basic phases of Gaussian elimination.
The first is the forward elimination phase, where the original system is transformed,
by systematically eliminating variables from equations, into a system
with all zeros below the diagonal. This process is sometimes called triangulation.
The second phase is the backward substitution phase, where the values
of the variables are computed using the triangulated matrix produced by the
first phase.
Outline of the Method
In general, we want to solve a system of N equations in N unknowns:
a11 x1 + a12 x2 + ... + a1N xN = b1,
a21 x1 + a22 x2 + ... + a2N xN = b2,
 ...
aN1 x1 + aN2 x2 + ... + aNN xN = bN.
In matrix form, these equations are written as a single matrix equation:

    ( a11 a12 ... a1N ) ( x1 )   ( b1 )
    ( a21 a22 ... a2N ) ( x2 ) = ( b2 )
    ( ...             ) ( .. )   ( .. )
    ( aN1 aN2 ... aNN ) ( xN )   ( bN )

or simply Ax = b, where A represents the matrix, x represents the variables,
and b represents the right-hand sides of the equations. Since the rows of A
are manipulated along with the elements of b, it is convenient to regard b as
the (N + 1)st column of A and use an N-by-(N + 1) array to hold both.
Now the forward elimination phase can be summarized as follows: first
eliminate the first variable in all but the first equation by adding the appropriate
multiple of the first equation to each of the other equations, then
eliminate the second variable in all but the first two equations by adding the
appropriate multiple of the second equation to each of the third through the
Nth equations, then eliminate the third variable in all but the first three
equations, etc. To eliminate the ith variable in the jth equation (for j between
i+1 and N) we multiply the ith equation by aji/aii and subtract it
from the jth equation. This process is perhaps more succinctly described by
the following program, which reads in N followed by an N-by-(N+1) matrix,
performs the forward elimination, and writes out the triangulated result. In
the input and in the output, the ith line contains the ith row of the matrix
followed by bi.
program gauss(input, output);
const maxN=50;
var a: array[1..maxN, 1..maxN+1] of real;
    i, j, k, N: integer;
begin
readln(N);
for j:=1 to N do
  begin for k:=1 to N+1 do read(a[j, k]); readln end;
for i:=1 to N do
  for j:=i+1 to N do
    for k:=N+1 downto i do
      a[j, k]:=a[j, k]-a[i, k]*a[j, i]/a[i, i];
for j:=1 to N do
  begin for k:=1 to N+1 do write(a[j, k]); writeln end;
end.
(As we found with polynomials, if we want to have a program that takes N
as input, it is necessary in Pascal to first decide how large a value of N will
be "legal," and declare the array suitably.) Note that the code consists of
three nested loops, so that the total running time is essentially proportional
to N3. The third loop goes backwards so as to avoid destroying a[j, i] before
it is needed to adjust the values of other elements in the same row.
The program in the above paragraph is too simple to be quite right: a[i, i]
might be zero, so division by zero could occur. This is easily fixed, because
we can exchange any row (from i+1 to N) with the ith row to make a[i, i]
non-zero in the outer loop. If no such row can be found, then the matrix is
singular: there is no unique solution.
In fact, it is advisable to do slightly more than just find a row with a
non-zero entry in the ith column. It's best to use the row (from i+1 to N)
whose entry in the ith column is the largest in absolute value. The reason for
this is that severe computational errors can arise if the a[i, i] value which is
used to scale a row is very small. If a[i, i] is very small, then the scaling factor
a[j, i]/a[i, i] which is used to eliminate the ith variable from the jth equation
(for j from i+1 to N) will be very large. In fact, it could get so large as to
dwarf the actual coefficients a[i, k], to the point where the a[j, k] value gets
distorted by "round-off error."
Put simply, numbers which differ greatly in magnitude can't be accurately
added or subtracted in the floating point number system commonly used to
represent real numbers, but using a small a[i, i] value greatly increases the
likelihood that such operations will have to be performed. Using the largest
value in the ith column from rows i+1 to N will ensure that the scaling factor
is always less than 1, and will prevent the occurrence of this type of error. One
might contemplate looking beyond the ith column to find a large element, but
it has been shown that accurate answers can be obtained without resorting to
this extra complication.
The following code for the forward elimination phase of Gaussian elimination
is a straightforward implementation of this process. For each i from 1 to
N, we scan down the ith column to find the largest element (in rows past the
ith). The row containing this element is exchanged with the ith, then the ith
variable is eliminated in the equations i+1 to N exactly as before:
procedure eliminate;
var i, j, k, max: integer;
    t: real;
begin
for i:=1 to N do
  begin
  max:=i;
  for j:=i+1 to N do
    if abs(a[j, i])>abs(a[max, i]) then max:=j;
  for k:=i to N+1 do
    begin t:=a[i, k]; a[i, k]:=a[max, k]; a[max, k]:=t end;
  for j:=i+1 to N do
    for k:=N+1 downto i do
      a[j, k]:=a[j, k]-a[i, k]*a[j, i]/a[i, i];
  end
end;
(A call to eliminate should replace the three nested for loops in the program
gauss given above.) There are some algorithms where it is required that the
pivot a[i, i] be used to eliminate the ith variable from every equation but the
ith (not just the (i+l)st through the Nth). This process is called full pivoting;
for forward elimination we only do part of this work, hence the process is called
partial pivoting.
After the forward elimination phase has completed, the array a has
all zeros below the diagonal, and the backward substitution phase can be
executed. The code for this is even more straightforward:
procedure substitute;
var j, k: integer;
    t: real;
begin
for j:=N downto 1 do
  begin
  t:=0.0;
  for k:=j+1 to N do t:=t+a[j, k]*x[k];
  x[j]:=(a[j, N+1]-t)/a[j, j]
  end
end;
A call to eliminate followed by a call to substitute computes the solution in
the N-element array x. Division by 0 could still occur for singular matrices.
Obviously a "library" routine would check for this explicitly.
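To make the pieces concrete, here is one way a complete program might be assembled around eliminate and substitute. This is only a sketch: the global declarations mirror those of the gauss program above (with an added array x for the solution), and a production version would also detect the singular case rather than dividing by zero.

program solve(input, output);
const maxN=50;
var a: array[1..maxN, 1..maxN+1] of real;
    x: array[1..maxN] of real;
    j, k, N: integer;
{ the procedures eliminate and substitute given in the text go here }
begin
readln(N);
for j:=1 to N do
  begin for k:=1 to N+1 do read(a[j, k]); readln end;
eliminate;
substitute;
for j:=1 to N do write(x[j]); writeln
end.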
An alternate way to proceed after forward elimination has created all
zeros below the diagonal is to use precisely the same method to produce all
zeros above the diagonal: first make the last column zero except for a[N, N]
by adding the appropriate multiple of a[N, N], then do the same for the next-to-last
column, etc. That is, we do "partial pivoting" again, but on the other
"part" of each column, working backwards through the columns. After this
process, called Gauss-Jordan reduction, is complete, only diagonal elements
are non-zero, which yields a trivial solution.
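A sketch of the backward sweep just described, in the style of the forward elimination code (it assumes the same global a, x, and N, and that forward elimination has already been run; the simple version below does no pivoting of its own):

procedure jordan;
var i, j, k: integer;
begin
for i:=N downto 1 do
  for j:=i-1 downto 1 do
    for k:=N+1 downto i do
      a[j, k]:=a[j, k]-a[i, k]*a[j, i]/a[i, i];
for i:=1 to N do x[i]:=a[i, N+1]/a[i, i]
end;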
Computational errors are a prime source of concern in Gaussian elimination.
As mentioned above, we should be wary of situations when the magnitudes
of the coefficients vastly differ. Using the largest available element
in the column for partial pivoting ensures that large coefficients won't be arbitrarily
created in the pivoting process, but it is not always possible to avoid
severe errors. For example, very small coefficients turn up when two different
equations have coefficients which are quite close to one another. It is actually
possible to determine in advance whether such problems will cause inaccurate
answers in the solution. Each matrix has an associated numerical quantity
called the condition number which can be used to estimate the accuracy of
the computed answer. A good library subroutine for Gaussian elimination
will compute the condition number of the matrix as well as the solution, so
that the accuracy of the solution can be known. Full treatment of the issues
involved would be beyond the scope of this book.
Gaussian elimination with partial pivoting using the largest available
pivot is "guaranteed" to produce results with very small computational errors.
There are quite carefully worked out mathematical results which show that the
calculated answer is quite accurate, except for ill-conditioned matrices (which
might be more indicative of problems in the system of equations than in the
method of solution). The algorithm has been the subject of fairly detailed
theoretical studies, and can be recommended as a computational procedure
of very wide applicability.
Variations and Extensions
The method just described is most appropriate for N-by-N matrices with
most of the N2 elements non-zero. As we've seen for other problems, special
techniques are appropriate for sparse matrices where most of the elements are
0. This situation corresponds to systems of equations in which each equation
has only a few terms.
If the non-zero elements have no particular structure, then the linked
list representation discussed in Chapter 2 is appropriate, with one node for
each non-zero matrix element, linked together by both row and column. The
standard method can be implemented for this representation, with the usual
extra complications due to the need to create and destroy non-zero elements.
This technique is not likely to be worthwhile if one can afford the memory to
hold the whole matrix, since it is much more complicated than the standard
method. Also, sparse matrices become substantially less sparse during the
Gaussian elimination process.
Some matrices not only have just a few non-zero elements but also have
a simple structure, so that linked lists are not necessary. The most common
example of this is a "band" matrix, where the non-zero elements all fall very
close to the diagonal. In such cases, the inner loops of the Gaussian elimination
algorithms need only be iterated a few times, so that the total running time
(and storage requirement) is proportional to N, not N3.
An interesting special case of a band matrix is a "tridiagonal" matrix,
where only elements directly on, directly above, or directly below the diagonal
are non-zero. For example, below is the general form of a tridiagonal matrix
for N = 5:

    ( a11 a12  0   0   0  )
    ( a21 a22 a23  0   0  )
    (  0  a32 a33 a34  0  )
    (  0   0  a43 a44 a45 )
    (  0   0   0  a54 a55 )
For such matrices, forward elimination and backward substitution each reduce
to a single for loop:
for i:=1 to N-1 do
  begin
  a[i+1, N+1]:=a[i+1, N+1]-a[i, N+1]*a[i+1, i]/a[i, i];
  a[i+1, i+1]:=a[i+1, i+1]-a[i, i+1]*a[i+1, i]/a[i, i]
  end;
for j:=N downto 1 do
  x[j]:=(a[j, N+1]-a[j, j+1]*x[j+1])/a[j, j];
For forward elimination, only the case j=i+1 and k=i+1 needs to be included,
since a[i, k]=0 for k>i+1. (The case k=i can be skipped since it sets to 0
an array element which is never examined again - this same change could be
made to straight Gaussian elimination.) Of course, a two-dimensional array
of size N2 wouldn't be used for a tridiagonal matrix. The storage required for
the above program can be reduced to be linear in N by maintaining four arrays
instead of the a matrix: one for each of the three nonzero diagonals and one
for the (N + 1)st column. Note that this program doesn't necessarily pivot on
the largest available element, so there is no insurance against division by zero
or the accumulation of computational errors. For some types of tridiagonal
matrices which arise commonly, it can be proven that this is not a reason for
concern.
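To make the four-array idea concrete, here is a sketch of the linear-storage variant. The names sub, diag, super, and rhs are introduced just for this sketch (holding a[i+1, i], a[i, i], a[i, i+1], and a[i, N+1] respectively), and the last equation is handled separately so that no out-of-range element is referenced:

for i:=1 to N-1 do
  begin
  rhs[i+1]:=rhs[i+1]-rhs[i]*sub[i]/diag[i];
  diag[i+1]:=diag[i+1]-super[i]*sub[i]/diag[i]
  end;
x[N]:=rhs[N]/diag[N];
for j:=N-1 downto 1 do
  x[j]:=(rhs[j]-super[j]*x[j+1])/diag[j];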
Gauss-Jordan reduction can be implemented with full pivoting to replace
a matrix by its inverse in one sweep through it. The inverse of a matrix
A, written A^-1, has the property that a system of equations Ax = b could
be solved just by performing the matrix multiplication x = A^-1 b. Still, N3
operations are required to compute x given b. However, there is a way to
preprocess a matrix and "decompose" it into component parts which make
it possible to solve the corresponding system of equations with any given
right-hand side in time proportional to N^2, a savings of a factor of N over
using Gaussian elimination each time. Roughly, this involves remembering
the operations that are performed on the (N + 1)st column during the forward
elimination phase, so that the result of forward elimination on a new (N + 1)st
column can be computed efficiently and then back-substitution performed as
usual.
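One common way to carry out this kind of preprocessing is to save the scaling factors themselves in the lower triangle of a as they are computed, so that they can be replayed on any new right-hand side. The following is only a sketch of that idea (it ignores pivoting; the array b holding the new right-hand side is an assumption of the sketch, and the loop variables and t are as in the procedures above):

{ decompose (N^3 work): forward elimination, remembering the multipliers }
for i:=1 to N do
  for j:=i+1 to N do
    begin
    a[j, i]:=a[j, i]/a[i, i];
    for k:=i+1 to N do a[j, k]:=a[j, k]-a[j, i]*a[i, k]
    end;
{ for each new right-hand side b (N^2 work): replay, then back-substitute }
for i:=1 to N do
  for j:=i+1 to N do b[j]:=b[j]-a[j, i]*b[i];
for j:=N downto 1 do
  begin
  t:=b[j];
  for k:=j+1 to N do t:=t-a[j, k]*x[k];
  x[j]:=t/a[j, j]
  end;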
Solving systems of linear equations has been shown to be computationally
equivalent to multiplying matrices, so there exist algorithms (for example,
Strassen's matrix multiplication algorithm) which can solve systems of N
equations in N variables in time proportional to N^2.81.... As with matrix
multiplication, it would not be worthwhile to use such a method unless very
large systems of equations were to be processed routinely. As before, the
actual running time of Gaussian elimination in terms of the number of inputs
is N^(3/2), which is difficult to improve upon in practice.
Exercises
1. Give the matrix produced by the forward elimination phase of Gaussian
   elimination (gauss, with eliminate) when used to solve the equations x +
   y + z = 6, 2x + y + 3z = 12, and 3x + y + 3z = 14.
2. Give a system of three equations in three unknowns for which gauss as is
   (without eliminate) fails, even though there is a solution.
3. What is the storage requirement for Gaussian elimination on an N-by-N
   matrix with only 3N nonzero elements?
4. Describe what happens when eliminate is used on a matrix with a row of
   all 0's.
5. Describe what happens when eliminate then substitute are used on a
   matrix with a column of all 0's.
6. Which uses more arithmetic operations: Gauss-Jordan reduction or back
   substitution?
7. If we interchange columns in a matrix, what is the effect on the corresponding
   simultaneous equations?
8. How would you test for contradictory or identical equations when using
   eliminate.
9. Of what use would Gaussian elimination be if we were presented with a
   system of M equations in N unknowns, with M < N? What if M > N?
10. Give an example showing the need for pivoting on the largest available
    element, using a mythical primitive computer where numbers can be
    represented with only two significant digits (all numbers must be of the
    form x.y × 10^z for single-digit integers x, y, and z).
6. Curve Fitting
The term curve fitting (or data fitting) is used to describe the general
problem of finding a function which matches a set of observed values at
a set of given points. Specifically, given the points

x1, x2, ..., xN

and the corresponding values

y1, y2, ..., yN,

the goal is to find a function (perhaps of a specified type) such that

f(x1) = y1, f(x2) = y2, ..., f(xN) = yN
and such that f(x) assumes "reasonable" values at other data points. It could
be that the x's and y's are related by some unknown function, and our goal
is to find that function, but, in general, the definition of what is "reasonable"
depends upon the application. We'll see that it is often easy to identify
"unreasonable" functions.
Curve fitting has obvious application in the analysis of experimental data,
and it has many other uses. For example, it can be used in computer graphics
to produce curves that "look nice" without the overhead of storing a large
number of points to be plotted. A related application is the use of curve fitting
to provide a fast algorithm for computing the value of a known function at
an arbitrary point: keep a short table of exact values, curve fit to find other
values.
Two principal methods are used to approach this problem. The first is
interpolation: a smooth function is to be found which exactly matches the
given values at the given points. The second method, least squares data fitting,
is used when the given values may not be exact, and a function is sought which
matches them as well as possible.
Polynomial Interpolation
We've already seen one method for solving the data-fitting problem: if f is
known to be a polynomial of degree N - 1, then we have the polynomial interpolation
problem of Chapter 4. Even if we have no particular knowledge about
f, we could solve the data-fitting problem by letting f(x) be the interpolating
polynomial of degree N - 1 for the given points and values. This could be
computed using methods outlined elsewhere in this book, but there are many
reasons not to use polynomial interpolation for data fitting. For one thing,
a fair amount of computation is involved (advanced N(log N)2 methods are
available, but elementary techniques are quadratic). Computing a polynomial
of degree 100 (for example) seems overkill for interpolating a curve through
100 points.
The main problem with polynomial interpolation is that high-degree
polynomials are relatively complicated functions which may have unexpected
properties not well suited to the function being fitted. A result from classical
mathematics (the Weierstrass approximation theorem) tells us that it is possible
to approximate any reasonable function with a polynomial (of sufficiently
high degree). Unfortunately, polynomials of very high degree tend to fluctuate
wildly. It turns out that, even though most functions are closely approximated
almost everywhere on a closed interval by an interpolation polynomial, there
are always some places where the approximation is terrible. Furthermore,
this theory assumes that the data values are exact values from some unknown
function when it is often the case that the given data values are only approximate.
If the y's were approximate values from some unknown low-degree
polynomial, we would hope that the coefficients for the high-degree terms in
the interpolating polynomial would be 0. It doesn't usually work out this
way; instead the interpolating polynomial tries to use the high-degree terms
to help achieve an exact fit. These effects make interpolating polynomials
inappropriate for many curve-fitting applications.
Spline Interpolation
Still, low-degree polynomials are simple curves which are easy to work with
analytically, and they are widely used for curve fitting. The trick is to abandon
the idea of trying to make one polynomial go through all the points and instead
use different polynomials to connect adjacent points, piecing them together
smoothly. An elegant special case of this, which also involves relatively
straightforward computation, is called spline interpolation.
A "spline" is a mechanical device used by draftsmen to draw aesthetically
pleasing curves: the draftsman fixes a set of points (knots) on his drawing, then
bends a flexible strip of plastic or wood (the spline) around them and traces
it to produce the curve. Spline interpolation is the mathematical equivalent
of this process and results in the same curve.
It can be shown from elementary mechanics that the shape assumed by
the spline between two adjacent knots is a third-degree (cubic) polynomial.
Translated to our data-fitting problem, this means that we should consider
the curve to be N - 1 different cubic polynomials
si(x) = ai x^3 + bi x^2 + ci x + di,    i = 1, 2, ..., N-1,
with si(x) defined to be the cubic polynomial to be used in the interval between
xi and xi+1, as shown in the following diagram:

(diagram: the spline segments s1, ..., sN-1 joining successive knots (xi, yi))
The spline can be represented in the obvious way as four one-dimensional
arrays (or a 4-by-(N - 1) two-dimensional array). Creating a spline consists
of computing the necessary a, b, c, d coefficients from the given x points and
y values. The physical constraints on the spline correspond to simultaneous
equations which can be solved to yield the coefficients.
For example, we obviously must have si(xi) = yi and si(xi+1) = yi+1 for
i = 1, 2, ..., N - 1 because the spline must touch the knots. Not only does the
spline touch the knots, but also it curves smoothly around them with no sharp
bends or kinks. Mathematically, this means that the first derivatives of the
spline polynomials must be equal at the knots (s'i-1(xi) = s'i(xi) for i = 2, 3, ...,
N - 1). In fact, it turns out that the second derivatives of the polynomials
must be equal at the knots. These conditions give a total of 4N - 6 equations
in the 4(N-1) unknown coefficients. Two more conditions need to be specified
to describe the situation at the endpoints of the spline. Several options are
available; we'll use the so-called "natural" spline which derives from s''1(x1) =
0 and s''N-1(xN) = 0. These conditions give a full system of 4N - 4 equations
in 4N - 4 unknowns, which could be solved using Gaussian elimination to
calculate all the coefficients that describe the spline.
The same spline can be computed somewhat more efficiently because
there are actually only N - 2 "unknowns": most of the spline conditions are
redundant. For example, suppose that pi is the value of the second derivative
of the spline at xi, so that s''i-1(xi) = s''i(xi) = pi for i = 2, ..., N - 1, with
p1 = pN = 0. If the values of p1, ..., pN are known, then all of the a, b, c, d
coefficients can be computed for the spline segments, since we have four
equations in four unknowns for each spline segment: for i = 1,2,. . . , N - 1,
we must have
si(xi) = yi
si(xi+1) = yi+1
s''i(xi) = pi
s''i(xi+1) = pi+1.
Thus, to fully determine the spline, we need only compute the values of
p2, ..., pN-1. But this discussion hasn't even considered the conditions that
the first derivatives must match. These N - 2 conditions provide exactly
the N - 2 equations needed to solve for the N - 2 unknowns, the pi second
derivative values.
To express the a, b, c, and d coefficients in terms of the p second derivative
values, then substitute those expressions into the four equations listed above
for each spline segment, leads to some unnecessarily complicated expressions.
Instead it is convenient to express the equations for the spline segments in a
certain canonical form that involves fewer unknown coefficients. If we change
variables to t = (x - xi)/(xi+1 - xi) then the spline can be expressed in the
following way:

si(t) = t yi+1 + (1-t) yi + (xi+1 - xi)^2 [(t^3 - t) pi+1 + ((1-t)^3 - (1-t)) pi].
Now each spline is defined on the interval [0,1]. This equation is less formidable
than it looks because we're mainly interested in the endpoints 0 and 1,
and either t or (1 - t) is 0 at these points. It's trivial to check that the spline
interpolates and is continuous because si-1(1) = si(0) = yi for i = 2, ..., N-1,
and it's only slightly more difficult to verify that the second derivative is continuous
because s''i(1) = s''i+1(0) = pi+1. These are cubic polynomials which
satisfy the requisite conditions at the endpoints, so they are equivalent to the
spline segments described above. If we were to substitute for t and find the
coefficient of x3, etc., then we would get the same expressions for the a's, b's,
c's, and d's in terms of the x's, y's, and p's as if we were to use the method
described in the previous paragraph. But there's no reason to do so, because
we've checked that these spline segments satisfy the end conditions, and we
can evaluate each at any point in its interval by computing t and using the
above formula (once we know the p's).
To solve for the p's we need to set the first derivatives of the spline
segments equal at the endpoints. The first derivative (with respect to x) of
the above equation is
s'i(t) = zi + (xi+1 - xi)[(3t^2 - 1) pi+1 - (3(1-t)^2 - 1) pi]
where zi = (yi+1 - yi)/(xi+1 - xi). Now, setting s'i-1(1) = s'i(0) for i = 2, ..., N -
1 gives our system of N - 2 equations to solve:

(xi - xi-1) pi-1 + 2(xi+1 - xi-1) pi + (xi+1 - xi) pi+1 = zi - zi-1.
This system of equations is a simple "tridiagonal" form which is easily solved
with a degenerate version of Gaussian elimination as we saw in Chapter 5. If
we let ui = xi+1 - xi, di = 2(xi+1 - xi-1), and wi = zi - zi-1, we have, for
example, the following simultaneous equations for N = 7:

    ( d2 u2  0  0  0 ) (p2)   (w2)
    ( u2 d3 u3  0  0 ) (p3)   (w3)
    (  0 u3 d4 u4  0 ) (p4) = (w4)
    (  0  0 u4 d5 u5 ) (p5)   (w5)
    (  0  0  0 u5 d6 ) (p6)   (w6)
In fact, this is a symmetric tridiagonal system, with the diagonal below the
main diagonal equal to the diagonal above the main diagonal. It turns out that
pivoting on the largest available element is not necessary to get an accurate
solution for this system of equations.
The method described in the above paragraph for computing a cubic
spline translates very easily into Pascal:
procedure makespline;
var i: integer;
begin
readln(N);
for i:=1 to N do readln(x[i], y[i]);
for i:=2 to N-1 do d[i]:=2*(x[i+1]-x[i-1]);
for i:=1 to N-1 do u[i]:=x[i+1]-x[i];
for i:=2 to N-1 do
  w[i]:=(y[i+1]-y[i])/u[i]-(y[i]-y[i-1])/u[i-1];
p[1]:=0.0; p[N]:=0.0;
for i:=2 to N-2 do
  begin
  w[i+1]:=w[i+1]-w[i]*u[i]/d[i];
  d[i+1]:=d[i+1]-u[i]*u[i]/d[i]
  end;
for i:=N-1 downto 2 do
  p[i]:=(w[i]-u[i]*p[i+1])/d[i];
end;
The arrays d and u are the representation of the tridiagonal matrix that is
solved using the program in Chapter 5. We use d[i] where a[i, i] is used in
that program, u[i] where a[i+1, i] or a[i, i+1] is used, and w[i] where a[i, N+1]
is used.
For an example of the construction of a cubic spline, consider fitting a
spline to the six data points
(1.0, 2.0), (2.0, 1.5), (4.0, 1.25), (5.0, 1.2), (8.0, 1.125), (10.0, 1.1).
(These come from the function 1 + 1/x.) The spline parameters are found by
solving the system of equations

    6.0 p2 +  2.0 p3                     = 0.375
    2.0 p2 +  6.0 p3 + 1.0 p4            = 0.075
              1.0 p3 + 8.0 p4 +  3.0 p5  = 0.025
                       3.0 p4 + 10.0 p5  = 0.0125

with the result p2 = 0.06590, p3 = -0.01021, p4 = 0.00443, p5 = -0.00008.
To evaluate the spline for any value of x in the range [x1, xN], we simply
find the interval [xi, xi+1] containing x, then compute t and use the formula
above for si(x) (which, in turn, uses the computed values for pi and pi+1).
function eval(v: real): real;
var t: real; i: integer;
function f(x: real): real;
  begin f:=x*x*x-x end;
begin
i:=0; repeat i:=i+1 until v<=x[i+1];
t:=(v-x[i])/u[i];
eval:=t*y[i+1]+(1-t)*y[i]+u[i]*u[i]*(f(t)*p[i+1]+f(1-t)*p[i])
end;
This program does not check for the error condition when v is not between
x[1] and x[N]. If there are a large number of spline segments (that is, if N
is large), then there are more efficient "searching" methods for finding the
interval containing v, which we'll study in Chapter 14.
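One way the two routines might be put together, as a sketch (the program name, the array bound maxN, and the sample arguments are arbitrary choices for illustration; the arguments assume input data spanning the interval [1, 10] as in the example above):

program splinetest(input, output);
const maxN=50;
var x, y, u, d, w, p: array[1..maxN] of real;
    N: integer;
{ the routines makespline and eval given in the text go here }
begin
makespline;
writeln(eval(1.5):12:7, eval(3.0):12:7, eval(6.0):12:7, eval(9.0):12:7)
end.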
There are many variations on the idea of curve fitting by piecing together
polynomials in a "smooth" way: the computation of splines is a quite well-developed
field of study. Other types of splines involve other types of smoothness
criteria as well as changes such as relaxing the condition that the spline
must exactly touch each data point. Computationally, they involve exactly
the same steps of determining the coefficients for each of the spline pieces by
solving the system of linear equations derived from imposing constraints on
how they are joined.
Method of Least Squares
A very common experimental situation is that, while the data values that we
have are not exact, we do have some idea of the form of the function which
is to fit the data. The function might depend on some parameters
and the curve fitting procedure is to find the choice of parameters that "best"
matches the observed values at the given points. If the function were a polynomial
(with the parameters being the coefficients) and the values were exact,
then this would be interpolation. But now we are considering more general
functions and inaccurate data. To simplify the discussion, we'll concentrate
on fitting to functions which are expressed as a linear combination of simpler
functions, with the unknown parameters being the coefficients:
f(x) = c1 f1(x) + c2 f2(x) + ... + cM fM(x).
This includes most of the functions that we'll be interested in. After studying
this case, we'll consider more general functions.
A common way of measuring how well a function fits is the least-squares
criterion: the error is calculated by adding up the squares of the errors at
each of the observation points:
E = sum over 1≤j≤N of (f(xj) - yj)^2.
This is a very natural measure: the squaring is done to stop cancellations
among errors with different signs. Obviously, it is most desirable to find the
choice of parameters that minimizes E. It turns out that this choice can be
computed efficiently: this is the so-called method of least squares.
The method follows quite directly from the definition. To simplify the
derivation, we'll do the case M = 2, N = 3, but the general method will follow
directly. Suppose that we have three points x1, x2, x3 and corresponding values
y1, y2, y3 which are to be fitted to a function of the form f(x) = c1 f1(x) +
c2 f2(x). Our job is to find the choice of the coefficients c1, c2 which minimizes
the least-squares error

E = (c1 f1(x1) + c2 f2(x1) - y1)^2
  + (c1 f1(x2) + c2 f2(x2) - y2)^2
  + (c1 f1(x3) + c2 f2(x3) - y3)^2.
To find the choices of c1 and c2 which minimize this error, we simply need to
set the derivatives dE/dc1 and dE/dc2 to zero. For c1 we have:

dE/dc1 = 2(c1 f1(x1) + c2 f2(x1) - y1) f1(x1)
       + 2(c1 f1(x2) + c2 f2(x2) - y2) f1(x2)
       + 2(c1 f1(x3) + c2 f2(x3) - y3) f1(x3).

Setting the derivative equal to zero leaves an equation which the variables c1
and c2 must satisfy (f1(x1), etc. are all "constants" with known values):

c1[f1(x1)f1(x1) + f1(x2)f1(x2) + f1(x3)f1(x3)]
  + c2[f2(x1)f1(x1) + f2(x2)f1(x2) + f2(x3)f1(x3)]
  = y1 f1(x1) + y2 f1(x2) + y3 f1(x3).
We get a similar equation when we set the derivative dE/dc2 to zero.
These rather formidable-looking equations can be greatly simplified using
vector notation and the "dot product" operation that we encountered briefly
in Chapter 2. If we define the vectors x = (x1, x2, x3) and y = (y1, y2, y3),
then the dot product of x and y is the real number defined by

x·y = x1y1 + x2y2 + x3y3.

Now, if we define the vectors f1 = (f1(x1), f1(x2), f1(x3)) and f2 = (f2(x1),
f2(x2), f2(x3)), then our equations for the coefficients c1 and c2 can be very
simply expressed:

c1 f1·f1 + c2 f1·f2 = y·f1
c1 f2·f1 + c2 f2·f2 = y·f2.
These can be solved with Gaussian elimination to find the desired coefficients.
For example, suppose that we know that the data points
(1.0, 2.05), (2.0, 1.53), (4.0, 1.26), (5.0, 1.21), (8.0, 1.13), (10.0, 1.1)
should be fit by a function of the form c1 + c2/x. (These data points are
slightly perturbed from the exact values for 1 + 1/x.) In this case, we have
f1 = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0) and f2 = (1.0, 0.5, 0.25, 0.2, 0.125, 0.1), so we have
to solve the system of equations

    6.0 c1 + 2.175 c2 = 8.28
    2.175 c1 + 1.378125 c2 = 3.62325
with the result c1 = 0.998 and c2 = 1.054 (both close to 1, as expected).
The method outlined above easily generalizes to find more than two
coefficients. To find the constants c1, c2, ..., cM in

f(x) = c1 f1(x) + c2 f2(x) + ... + cM fM(x)

which minimize the least squares error for the point and observation vectors

x = (x1, x2, ..., xN),   y = (y1, y2, ..., yN),

first compute the function component vectors

f1 = (f1(x1), f1(x2), ..., f1(xN)),
f2 = (f2(x1), f2(x2), ..., f2(xN)),
 ...
fM = (fM(x1), fM(x2), ..., fM(xN)).

Then make up an M-by-M linear system of equations Ac = b with

aij = fi · fj,
bj = fj · y.
The solution to this system of simultaneous equations yields the required
coefficients.
This method is easily implemented by maintaining a two-dimensional
array for the f vectors, considering y as the (M + 1)st vector. Then an array
a[1..M, 1..M+1] can be filled as follows:
for i:=1 to M do
  for j:=1 to M+1 do
    begin
    t:=0.0;
    for k:=1 to N do t:=t+f[i, k]*f[j, k];
    a[i, j]:=t;
    end;
and then solved using the Gaussian elimination procedure from Chapter 5.
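As an illustration of how the pieces fit together, here is a sketch of a complete program for the example above, fitting c1 + c2/x to six data points. It borrows eliminate and substitute from Chapter 5 (acting on the global a, x, and N declared there), and the names Npts, xs, and ys are introduced just for this sketch:

program leastsq(input, output);
const maxN=50;
var a: array[1..maxN, 1..maxN+1] of real;
    x: array[1..maxN] of real;
    f: array[1..maxN, 1..maxN] of real;
    xs, ys: array[1..maxN] of real;
    i, j, k, N, M, Npts: integer;
    t: real;
{ the procedures eliminate and substitute from Chapter 5 go here }
begin
M:=2; Npts:=6;
for k:=1 to Npts do readln(xs[k], ys[k]);
for k:=1 to Npts do
  begin f[1, k]:=1.0; f[2, k]:=1.0/xs[k]; f[3, k]:=ys[k] end;
for i:=1 to M do
  for j:=1 to M+1 do
    begin
    t:=0.0;
    for k:=1 to Npts do t:=t+f[i, k]*f[j, k];
    a[i, j]:=t
    end;
N:=M; eliminate; substitute;
writeln(x[1]:8:3, x[2]:8:3)
end.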
The method of least squares can be extended to handle nonlinear functions
(for example a function such as f(x) = c1 e^(-c2 x) sin(c3 x)), and it is often
used for this type of application. The idea is fundamentally the same; the
problem is that the derivatives may not be easy to compute. What is used
is an iterative method: use some estimate for the coefficients, then use these
within the method of least squares to compute the derivatives, thus producing
a better estimate for the coefficients. This basic method, which is widely used
today, was outlined by Gauss in the 1820s.
Exercises
1. Approximate the function lgx with a degree 4 interpolating polynomial
at the points 1, 2, 3, 4, and 5. Estimate the quality of the fit by computing
the sum of the squares of the errors at 1.5, 2.5, 3.5, and 4.5.
2. Solve the previous problem for the function sinx. Plot the function and
the approximation, if that's possible on your computer system.
3. Solve the previous problems using a cubic spline instead of an interpolating
polynomial.
4. Approximate the function lgx with a cubic spline with knots at 2N for
N between 1 and 10. Experiment with different placements of knots in
the same range to try to obtain a better fit.
5. What would happen in least squares data fitting if one of the functions
was the function fi(x) = 0 for some i?
6. What would happen in least squares data-fitting if all the observed values
were 0?
7. What values of a, b, c minimize the least-squares error in using the function
f(x) = ax log x + bx + c to approximate the observations f(1) = 0, f(4) =
13, f(8) = 41?
8. Excluding the Gaussian elimination phase, how many multiplications are
involved in using the method of least squares to find M coefficients based
on N observations?
9. Under what circumstances would the matrix which arises in least-squares
curve fitting be singular?
10. Does the least-squares method work if two different observations are included
for the same point?
7. Integration
Computing the integral is a fundamental analytic operation often performed
on functions being processed on computers. One of two completely
different approaches can be used, depending on the way the function is
represented. If an explicit representation of the function is available, then it
may be possible to do symbolic integration to compute a similar representation
for the integral. At the other extreme, the function may be defined by a table,
so that function values are known for only a few points. The most common
situation is between these: the function to be integrated is represented in such
a way that its value at any particular point can be computed. In this case,
the goal is to compute a reasonable approximation to the integral of the function,
without performing an excessive number of function evaluations. This
computation is often called quadrature by numerical analysts.
Symbolic Integration
If full information is available about a function, then it may be worthwhile
to consider using a method which involves manipulating some representation
of the function rather than working with numeric values. The goal is to
transform a representation of the function into a representation of the integral,
in much the same way that indefinite integration is done by hand.
A simple example of this is the integration of polynomials. In Chapters 2
and 4 we examined methods for "symbolically" computing sums and products
of polynomials, with programs that worked on a particular representation for
the polynomials and produced the representation for the answers from the representation
for the inputs. The operation of integration (and differentiation)
of polynomials can also be done in this way. If a polynomial
is represented simply by keeping the values of the coefficients in an array p
then the integral can be easily computed as follows:
for i:=N downto 1 do p[i]:=p[i-1]/i;
p[0]:=0;
This is a direct implementation of the well-known symbolic integration
rule that the integral of t^(i-1) dt from 0 to x is x^i/i for i > 0.
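The companion operation of symbolic differentiation is just as easy under the same representation; a sketch, again assuming the coefficient of x^i is kept in p[i]:

for i:=0 to N-1 do p[i]:=(i+1)*p[i+1];
p[N]:=0;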
Obviously a wider class of functions than just polynomials can be handled
by adding more symbolic rules. The addition of composite rules such as
integration by parts,
∫ u dv = uv - ∫ v du,
can greatly expand the set of functions which can be handled. (Integration
by parts requires a differentiation capability. Symbolic differentiation is somewhat
easier than symbolic integration, since a reasonable set of elementary
rules plus the composite chain rule will suffice for most common functions.)
The large number of rules available to be applied to a particular function
makes symbolic integration a difficult task. Indeed, it has only recently been
shown that there is an algorithm for this task: a procedure which either
returns the integral of any given function or says that the answer cannot be
expressed in terms of elementary functions. A description of this algorithm
in its full generality would be beyond the scope of this book. However,
when the functions being processed are from a small restricted class, symbolic
integration can be a powerful tool.
Of course, symbolic techniques have the fundamental limitation that
there are a great many integrals (many of which occur in practice) which can't
be evaluated symbolically. Next, we'll examine some techniques which have
been developed to compute approximations to the values of real integrals.
Simple Quadrature Methods
Perhaps the most obvious way to approximate the value of an integral is the
rectangle method: evaluating an integral is the same as computing the area
under a curve, and we can estimate the area under a curve by summing the
areas of small rectangles which nearly fit under the curve, as diagrammed
below.
To be precise, suppose that we are to compute ∫a^b f(x) dx, and that the
interval [a, b] over which the integral is to be computed is divided into N
parts, delimited by the points x1, x2, ..., xN+1. Then we have N rectangles,
with the width of the ith rectangle (1 ≤ i ≤ N) given by xi+1 - xi. For the
height of the ith rectangle, we could use f(xi) or f(xi+1), but it would seem
that the result would be more accurate if the value of f at the midpoint of
the interval (f((xi + xi+1)/2)) is used, as in the above diagram. This leads to
the quadrature formula

r = sum over 1≤i≤N of (xi+1 - xi) f((xi + xi+1)/2)

which estimates the value of the integral of f(x) over the interval from a = x1
to b = xN+1. In the common case where all the intervals are to be the same
size, say xi+1 - xi = w, the midpoints are (xi + xi+1)/2 = a + (i - 1/2)w, so the
approximation r to the integral is easily computed.
function intrect(a, b: real; N: integer): real;
var i: integer; w, r: real;
begin
r:=0; w:=(b-a)/N;
for i:=1 to N do r:=r+w*f(a-w/2+i*w);
intrect:=r;
end;
Of course, as N gets larger, the answer becomes more accurate. For
example, the following table shows the estimate produced by this function for
∫1^2 dx/x (which we know to be ln 2 = 0.6931471805599...) when invoked with
the call intrect(1.0, 2.0, N) for N = 10, 100, 1000:
10 0.6928353604100
100 0.6931440556283
1000 0.6931471493100
When N = 1000, our answer is accurate to about seven decimal places.
More sophisticated quadrature methods can achieve better accuracy with
much less work.
It is not difficult to derive an analytic expression for the error made in
the rectangle method by expanding f(x) in a Taylor series about the midpoint
of each interval, integrating, then summing over all intervals. We won't go
through the details of this calculation: our purpose is not to derive detailed
error bounds, but rather to show error estimates for simple methods and how
these estimates suggest more accurate methods. This can be appreciated even
by a reader not familiar with Taylor series. It turns out that
∫a^b f(x) dx = r + w^3 e3 + w^5 e5 + ...

where w is the interval width ((b - a)/N) and e3 depends on the value of
the third derivative of f at the interval midpoints, etc. (Normally, this is
a good approximation because most "reasonable" functions have small highorder
derivatives, though this is not always true.) For example, if we choose
to make w = .01 (which would correspond to N = 200 in the example above),
this formula says the integral computed by the procedure above should be
accurate to about six places.
Another way to approximate the integral is to divide the area under the
curve into trapezoids, as diagrammed below.
This trapezoid method leads to the quadrature formula
t = sum over 1≤i≤N of (xi+1 - xi) (f(xi) + f(xi+1))/2.
(Recall that the area of a trapezoid is one-half the product of the height and
the sum of the lengths of the two bases.) The error for this method can be
derived in a similar way as for the rectangle method. It turns out that
∫a^b f(x) dx = t - 2w^3 e3 - 4w^5 e5 + ....
Thus the rectangle method is twice as accurate as the trapezoid method.
This is borne out by our example. The following procedure implements the
trapezoid method in the common case where all the intervals are the same
width:
function inttrap(a, b: real; N: integer): real;
var i: integer; w, t: real;
begin
t:=0; w:=(b-a)/N;
for i:=1 to N do t:=t+w*(f(a+(i-1)*w)+f(a+i*w))/2;
inttrap:=t;
end;
This procedure produces the following estimates for ∫1^2 dx/x:
10 0.6937714031754
100 0.6931534304818
1000 0.6931472430599
It may seem surprising at first that the rectangle method is more accurate
than the trapezoid method: the rectangles tend to fall partly under the curve,
partly over (so that the error can cancel out within an interval), while the
trapezoids tend to fall either completely under or completely over the curve.
Another perfectly reasonable method is spline quadrature: spline interpolation
is performed using methods we have discussed and then the integral
is computed by piecewise application of the trivial symbolic polynomial integration
technique described above. Below, we'll see how this relates to the
other methods.
Compound Methods
Examination of the formulas given above for the error of the rectangle and
trapezoid methods leads to a simple method with much greater accuracy,
called Simpson's method. The idea is to eliminate the leading term in the error
by combining the two methods. Multiplying the formula for the rectangle
method by 2, adding the formula for the trapezoid method then dividing by
3 gives the equation
∫a^b f(x) dx = (1/3)(2r + t - 2w^5 e5 + ...).
The w^3 term has disappeared, so this formula tells us that we can get a method
that is accurate to within w^5 by combining the quadrature formulas in the
same way:

s = sum over 1≤i≤N of (xi+1 - xi) (f(xi) + 4 f((xi + xi+1)/2) + f(xi+1))/6.
If an interval size of .01 is used for Simpson's rule, then the integral can
be computed to about ten-place accuracy. Again, this is borne out in our
example. The implementation of Simpson's method is only slightly more
complicated than the others (again, we consider the case where the intervals
are the same width):
function intsimp(a, b: real; N: integer): real;
var i: integer; w, s: real;
begin
s:=0; w:=(b-a)/N;
for i:=1 to N do
  s:=s+w*(f(a+(i-1)*w)+4*f(a-w/2+i*w)+f(a+i*w))/6;
intsimp:=s;
end;
This program requires three "function evaluations" (rather than two) in the
inner loop, but it produces far more accurate results than do the previous two
methods.
10 0.6931473746651
100 0.6931471805795
1000 0.6931471805599
More complicated quadrature methods have been devised which gain
accuracy by combining simpler methods with similar errors. The most well-known
is Romberg integration, which uses two different sets of subintervals
for its two "methods."
It turns out that Simpson's method is exactly equivalent to interpolating
the data to a piecewise quadratic function, then integrating. It is interesting
to note that the four methods we have discussed all can be cast as piecewise
interpolation methods: the rectangle rule interpolates to a constant (degree-O
polynomial); the trapezoid rule to a line (degree-l polynomial); Simpson's rule
to a quadratic polynomial; and spline quadrature to a cubic polynomial.
Adaptive Quadrature
A major flaw in the methods that we have discussed so far is that the errors
involved depend not only upon the subinterval size used, but also upon the
value of the high-order derivatives of the function being integrated. This
implies that the methods will not work well at all for certain functions (those
with large high-order derivatives). But few functions have large high-order
derivatives everywhere. It is reasonable to use small intervals where the
derivatives are large and large intervals where the derivatives are small. A
method which does this in a systematic way is called an adaptive quadrature
routine.
The general approach in adaptive quadrature is to use two different
quadrature methods for each subinterval, compare the results, and subdivide
the interval further if the difference is too great. Of course some care should
be exercised, since if two equally bad methods are used, they might agree quite
closely on a bad result. One way to avoid this is to ensure that one method
always overestimates the result and that the other always underestimates the
result. Another way to avoid this is to ensure that one method is more accurate
than the other. A method of this type is described next.
There is significant overhead involved in recursively subdividing the interval,
so it pays to use a good method for estimating the integrals, as in the
following implementation:
function adapt(a, b: real): real;
begin
if abs(intsimp(a, b, 10)-intsimp(a, b, 5))<tolerance
  then adapt:=intsimp(a, b, 10)
  else adapt:=adapt(a, (a+b)/2)+adapt((a+b)/2, b);
end;
Both estimates for the integral are derived from Simpson's method, one
using twice as many subdivisions as the other. Essentially, this amounts to
checking the accuracy of Simpson's method over the interval in question and
then subdividing if it is not good enough.
Unlike our other methods, where we decide how much work we want
to do and then take whatever accuracy results, in adaptive quadrature we do
however much work is necessary to achieve a degree of accuracy that we decide
upon ahead of time. This means that tolerance must be chosen carefully,
so that the routine doesn't loop indefinitely to achieve an impossibly high
tolerance. The number of steps required depends very much on the nature of
the function being integrated. A function which fluctuates wildly will require
a large number of steps, but such a function would lead to a very inaccurate
answer for the "fixed interval" methods. A smooth function such as our
example can be handled with a reasonable number of steps. The following
table gives, for various values of t, the value produced and the number of
recursive calls required by the above routine to compute ∫1^2 dx/x:
0.00001000000 0.6931473746651 1
0.00000010000 0.6931471829695 5
0.00000000100 0.6931471806413 13
0.00000000001 0.6931471805623 33
The above program can be improved in several ways. First, there's
certainly no need to call intsimp(a, b, 10) twice. In fact, the function values
for this call can be shared by intsimp(a, b, 5). Second, the tolerance bound
can be related to the accuracy of the answer more closely if the tolerance is
scaled by the ratio of the size of the current interval to the size of the full
interval. Also, a better routine can obviously be developed by using an even
better quadrature rule than Simpson's (but it is a basic law of recursion that
another adaptive routine wouldn't be a good idea). A sophisticated adaptive
quadrature routine can provide very accurate results for problems which can't
be handled any other way, but careful attention must be paid to the types of
functions to be processed.
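A sketch of the second improvement mentioned above, scaling the tolerance by the fraction of the whole interval being processed (fullsize is an assumed global set to the length of the original interval before the first call; the other suggestions, such as sharing function evaluations, are not shown):

function adapt2(a, b: real): real;
begin
if abs(intsimp(a, b, 10)-intsimp(a, b, 5))<tolerance*(b-a)/fullsize
  then adapt2:=intsimp(a, b, 10)
  else adapt2:=adapt2(a, (a+b)/2)+adapt2((a+b)/2, b)
end;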
We will be seeing several algorithms that have the same recursive structure
as the adaptive quadrature method given above. The general technique
of adapting simple methods to work hard only on difficult parts of complex
problems can be a powerful one in algorithm design.
Exercises
1. Write a program to symbolically integrate (and differentiate) polynomials
   in x and ln x. Use a recursive implementation based on integration by
   parts.
2. Which quadrature method is likely to produce the best answer for integrating
   the following functions: f(x) = 5x, f(x) = (3 - x)(4 + x), f(x) =
   sin(x)?
3. Give the result of using each of the four elementary quadrature methods
   (rectangle, trapezoid, Simpson's, spline) to integrate y = 1/x in the interval
   [.1, 10].
4. Answer the previous question for the function y = sin x.
5. Discuss what happens if adaptive quadrature is used to integrate the
   function y = 1/x in the interval [-1, 2].
6. Answer the previous question for the elementary quadrature methods.
7. Give the points of evaluation when adaptive quadrature is used to integrate
   the function y = 1/x in the interval [.1, 10] with a tolerance of
   .1.
8. Compare the accuracy of an adaptive quadrature based on Simpson's
   method to an adaptive quadrature based on the rectangle method for the
   integral given in the previous problem.
9. Answer the previous question for the function y = sin x.
10. Give a specific example of a function for which adaptive quadrature would
    be likely to give a drastically more accurate result than the other methods.
SOURCES for Mathematical Algorithms
Much of the material in this section falls within the domain of numerical
analysis, and several excellent textbooks are available. One which pays
particular attention to computational issues is the 1977 book by Forsythe,
Malcolm and Moler. In particular, much of the material given here in Chapters
5, 6, and 7 is based on the presentation given in that book.
The second major reference for this section is the second volume of D. E.
Knuth's comprehensive treatment of "The Art of Computer Programming."
Knuth uses the term "seminumerical" to describe algorithms which lie at
the interface between numerical and symbolic computation, such as random
number generation and polynomial arithmetic. Among many other topics,
Knuth's volume 2 covers in great depth the material given here in Chapters
1, 3, and 4. The 1975 book by Borodin and Munro is an additional reference
for Strassen's matrix multiplication method and related topics. Many of
the algorithms that we've considered (and many others, principally symbolic
methods as mentioned in Chapter 7) are embodied in a computer system called
MACSYMA, which is regularly used for serious mathematical work.
Certainly, a reader seeking more information on mathematical algorithms
should expect to find the topics treated at a much more advanced mathematical
level in the references than the material we've considered here.
Chapter 2 is concerned with elementary data structures, as well as polynomials.
Beyond the references mentioned in the previous part, a reader interested
in learning more about this subject might study how elementary data
structures are handled in modern programming languages such as Ada, which
have facilities for building abstract data structures.
A. Borodin and I. Munro, The Computational Complexity of Algebraic and
Numerical Problems, American Elsevier, New York, 1975.
G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for
Mathematical Computations, Prentice-Hall, Englewood Cliffs, NJ, 1977.
D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical
Algorithms, Addison-Wesley, Reading, MA (second edition), 1981.
MIT Mathlab Group, MACSYMA Reference Manual, Laboratory for Computer
Science, Massachusetts Institute of Technology, 1977.
P. Wegner, Programming with Ada: An Introduction by Means of Graduated
Examples, Prentice-Hall, Englewood Cliffs, NJ, 1980.
SORTING
8. Elementary Sorting Methods
As our first excursion into the area of sorting algorithms, we'll study
some "elementary" methods which are appropriate for small files or
files with some special structure. There are several reasons for studying these
simple sorting algorithms in some detail. First, they provide a relatively
painless way to learn terminology and basic mechanisms for sorting algorithms
so that we get an adequate background for studying the more sophisticated
algorithms. Second, there are a great many applications of sorting where it's
better to use these simple methods than the more powerful general-purpose
methods. Finally, some of the simple methods extend to better general-purpose
methods or can be used to improve the efficiency of more powerful
methods. The most prominent example of this is seen in recursive sorts
which "divide and conquer" big files into many small ones. Obviously, it is
advantageous to know the best way to deal with small files in such situations.
As mentioned above, there are several sorting applications in which a
relatively simple algorithm may be the method of choice. Sorting programs
are often used only once (or only a few times). If the number of items to be
sorted is not too large (say, less than five hundred elements), it may well be
more efficient just to run a simple method than to implement and debug a
complicated method. Elementary methods are always suitable for small files
(say, less than fifty elements); it is unlikely that a sophisticated algorithm
would be justified for a small file, unless a very large number of such files are to
be sorted. Other types of files that are relatively easy to sort are ones that are
already almost sorted (or already sorted!) or ones that contain large numbers
of equal keys. Simple methods can do much better on such well-structured
files than general-purpose methods.
As a rule, the elementary methods that we'll be discussing take about
N2 steps to sort N randomly arranged items. If N is small enough, this may
not be a problem, and if the items are not randomly arranged, some of the
methods might run much faster than more sophisticated ones. However, it
must be emphasized that these methods (with one notable exception) should
not be used for large, randomly arranged files.
Rules of the Game
Before considering some specific algorithms, it will be useful to discuss some
general terminology and basic assumptions for sorting algorithms. We'll be
considering methods of sorting files of records containing keys. The keys,
which are only part of the records (often a small part), are used to control the
sort. The objective of the sorting method is to rearrange the records so that
their keys are in order according to some well-defined ordering rule (usually
numerical or alphabetical order).
If the file to be sorted will fit into memory (or, in our context, if it will
fit into a Pascal array), then the sorting method is called internal. Sorting
files from tape or disk is called external sorting. The main difference between
the two is that any record can easily be accessed in an internal sort, while
an external sort must access records sequentially, or at least in large blocks.
We'll look at a few external sorts in Chapter 13, but most of the algorithms
that we'll consider are internal sorts.
As usual, the main performance parameter that we'll be interested in is
the running time of our sorting algorithms. As mentioned above, the elementary
methods that we'll examine in this chapter require time proportional
to N2 to sort N items, while more advanced methods can sort N items in
time proportional to N log N. It can be shown that no sorting algorithm
can use less than N log N comparisons between keys, but we'll see that there
are methods that use digital properties of keys to get a total running time
proportional to N.
The amount of extra memory used by a sorting algorithm is the second
important factor we'll be considering. Basically, the methods divide into three
types: those that sort in place and use no extra memory except perhaps for
a small stack or table; those that use a linked-list representation and so use
N extra words of memory for list pointers; and those that need enough extra
memory to hold another copy of the array to be sorted.
A characteristic of sorting methods which is sometimes important in
practice is stability: a sorting method is called stable if it preserves the relative
order of equal keys in the file. For example, if an alphabetized class list is
sorted by grade, then a stable method will produce a list in which students
with the same grade are still in alphabetical order, but a non-stable method is
likely to produce a list with no evidence of the original alphabetic order. Most
of the simple methods are stable, but most of the well-known sophisticated
algorithms are not. If stability is vital, it can be forced by appending a
small index to each key before sorting or by lengthening the sort key in some
other way. It is easy to take stability for granted: people often react to the
unpleasant effects of instability with disbelief. Actually there are few methods
which achieve stability without using significant extra time or space.
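For instance, if the keys are small nonnegative integers, stability can be forced by sorting lengthened keys that carry the original position along (a sketch, not part of the programs in this chapter; it assumes that a[i]*(N+1)+i does not overflow an integer):

  for i:=1 to N do a[i]:=a[i]*(N+1)+i;   { append the index to each key }
  selection;                             { any sorting method may now be used }
  for i:=1 to N do a[i]:=a[i] div (N+1); { recover the original keys }

Since equal keys are made distinct in a way that preserves their original order, any sorting method applied to the lengthened keys produces a stable result for the original ones.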
The following program, for sorting three records, is intended to illustrate
the general conventions that we'll be using. (In particular, the main program is
a peculiar way to exercise a program that is known to work only for N = 3: the
point is that most of the sorting programs we'll consider could be substituted
for sort3 in this "driver" program.)
program threesort(input, output);
  const maxN=100;
  var a: array [1..maxN] of integer;
      N, i: integer;
  procedure sort3;
    var t: integer;
    begin
    if a[1]>a[2] then
      begin t:=a[1]; a[1]:=a[2]; a[2]:=t end;
    if a[1]>a[3] then
      begin t:=a[1]; a[1]:=a[3]; a[3]:=t end;
    if a[2]>a[3] then
      begin t:=a[2]; a[2]:=a[3]; a[3]:=t end;
    end;
  begin
  readln(N);
  for i:=1 to N do read(a[i]);
  if N=3 then sort3;
  for i:=1 to N do write(a[i]);
  writeln
  end.
The three assignment statements following each if actually implement an
"exchange" operation. We'll write out the code for such exchanges rather than
use a procedure call because they're fundamental to many sorting programs
and often fall in the inner loop.
In order to concentrate on algorithmic issues, we'll work with algorithms
that simply sort arrays of integers into numerical order. It is generally straightforward
to adapt such algorithms for use in a practical application involving
large keys or records. Basically, sorting programs access records in one of two
ways: either keys are accessed for comparison, or entire records are accessed
to be moved. Most of the algorithms that we will study can be recast in terms
of performing these two operations on arbitrary records. If the records to be
sorted are large, it is normally wise to do an "indirect sort": here the records
themselves are not necessarily rearranged, but rather an array of pointers (or
indices) is rearranged so that the first pointer points to the smallest record,
etc. The keys can be kept either with the records (if they are large) or with
the pointers (if they are small).
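For instance, an indirect version of selection sort might be coded as follows (a sketch; p[1..N] is a hypothetical global index array, and the records in a are never moved):

  procedure indirectselection;
    var i, j, min, t: integer;
    begin
    for i:=1 to N do p[i]:=i;             { start with the identity permutation }
    for i:=1 to N do
      begin
      min:=i;
      for j:=i+1 to N do
        if a[p[j]]<a[p[min]] then min:=j; { compare through the pointers }
      t:=p[min]; p[min]:=p[i]; p[i]:=t    { exchange pointers, not records }
      end;
    end;

Afterwards a[p[1]], a[p[2]], ..., a[p[N]] is the file in sorted order.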
By using programs which simply operate on a global array, we're ignoring
"packaging problems" that can be troublesome in some programming environments.
Should the array be passed to the sorting routine as a parameter?
Can the same sorting routine be used to sort arrays of integers and arrays
of reals (and arrays of arbitrarily complex records)? Even with our simple
assumptions, we must (as usual) circumvent the lack of dynamic array sizes
in Pascal by predeclaring a maximum. Such concerns will be easier to deal
with in programming environments of the future than in those of the past
and present. For example, some modern languages have quite well-developed
facilities for packaging together programs into large systems. On the other
hand, such mechanisms are not truly required for many applications: small
programs which work directly on global arrays have many uses; and some
operating systems make it quite easy to put together simple programs like
the one above, which serve as "filters" between their input and their output.
Obviously, these comments apply to many of the other algorithms that we'll
be examining, though their effects are perhaps most acutely felt for sorting
algorithms.
Some of the programs use a few other global variables. Declarations
which are not obvious will be included with the program code. Also, we'll
sometimes assume that the array bounds go to 0 or N+1, to hold special keys
used by some of the algorithms. We'll frequently use letters from the alphabet
rather than numbers for examples: these are handled in the obvious way using
Pascal's ord and chr "transfer functions" between integers and characters.
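For example (a sketch, assuming one-letter keys held in a char variable c and an integer key k), the conversions might be written

  k:=ord(c)-ord('A')+1;   { 'A' becomes 1, 'B' becomes 2, ... }
  c:=chr(k+ord('A')-1);   { the inverse transformation }

so that the letters in the examples correspond directly to the small integers actually being sorted.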
The sort3 program above uses an even more constrained access to the file:
it is three instructions of the form "compare two records and exchange them
if necessary to put the one with the smaller key first." Programs which use
only this type of instruction are interesting because they are well suited for
hardware implementation. We'll study this issue in more detail in Chapter
35.
Selection Sort
One of the simplest sorting algorithms works as follows: first find the smallest
element in the array and exchange it with the element in the first position,
then find the second smallest element and exchange it with the element in
the second position, continuing in this way until the entire array is sorted.
This method is called selection sort because it works by repeatedly "selecting"
the smallest remaining element. The following program sorts a[1..N] into
numerical order:
procedure selection;
  var i, j, min, t: integer;
  begin
  for i:=1 to N do
    begin
    min:=i;
    for j:=i+1 to N do
      if a[j]<a[min] then min:=j;
    t:=a[min]; a[min]:=a[i]; a[i]:=t
    end;
  end;
This is among the simplest of sorting methods, and it will work very well for
small files. Its running time is proportional to N2: the number of comparisons
between array elements is about N2/2 since the outer loop (on i) is executed N
times and the inner loop (on j) is executed about N/2 times on the average. It
turns out that the statement min:=j is executed only on the order of N log N
times, so it is not part of the inner loop.
Despite its simplicity, selection sort has a quite important application:
it is the method of choice for sorting files with very large records and small
keys. If the records are M words long (but the keys are only a few words long),
then the exchange takes time proportional to M, so the total running time
is proportional to N2 (for the comparisons) plus NM (for the exchanges). If
M is proportional to N then the running time is linear in the amount of data
input, which is difficult to beat even with an advanced method. Of course if
it is not absolutely required that the records be actually rearranged, then an
"indirect sort" can be used to avoid the NM term entirely, so a method which
uses fewer comparisons would be justified. Still, selection sort is quite attractive
for sorting (say) a thousand 1000-word records on one-word keys.
Insertion Sort
An algorithm almost as simple as selection sort but perhaps more flexible is
insertion sort. This is the method often used by people to sort bridge hands:
consider the elements one at a time, inserting each in its proper place among
those already considered (keeping them sorted). The element being considered
is inserted merely by moving larger elements one position to the right, then
inserting the element into the vacated position. The code for this algorithm
is straightforward:
procedure insertion;
  var i, j, v: integer;
  begin
  for i:=2 to N do
    begin
    v:=a[i]; j:=i;
    while a[j-1]>v do
      begin a[j]:=a[j-1]; j:=j-1 end;
    a[j]:=v
    end;
  end;
As is, this code doesn't work, because the while will run past the left end
of the array if v is the smallest element in the array. One way to fix this is
to put a "sentinel" key in a[O], making it at least as small as the smallest
element in the array. Using sentinels in situations like this is common in
sorting programs to avoid including a test (in this case j>l) which almost
always succeeds within the inner loop. If for some reason it is inconvenient to
use a sentinel and the array really must have the bounds [1..N], then standard
Pascal does not allow a clean alternative, since it does not have a "conditional"
and instruction: the test while (j>1) and (a[j-1]>v) won't work because
even when j=1, the second part of the and will be evaluated and will cause
an out-of-bounds array access. A goto out of the loop seems to be required.
(Some programmers prefer to go to some lengths to avoid goto instructions,
for example by performing an action within the loop to ensure that the loop
terminates. In this case, such a solution seems hardly justified, since it makes
the program no clearer, and it adds extra overhead every time through the
loop to guard against a rare event.)
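For reference, the goto escape might be coded as follows (a sketch under the same conventions as above; the Shellsort program below uses the same device):

  procedure insertionnosentinel;
    label 0;
    var i, j, v: integer;
    begin
    for i:=2 to N do
      begin
      v:=a[i]; j:=i;
      while a[j-1]>v do
        begin
        a[j]:=a[j-1]; j:=j-1;
        if j=1 then goto 0    { left end reached: stop the scan }
        end;
    0: a[j]:=v
      end;
    end;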
On the average, the inner loop of insertion sort is executed about N2/2
times: The "average" insertion goes about halfway into a subfile of size N/2.
This is inherent in the method. The point of insertion can be found more
efficiently using the searching techniques in Chapter 14, but N2/2 moves (to
make room for each element being inserted) are still required; or the number
of moves can be lowered by using a linked list instead of an array, but then
the methods of Chapter 14 don't apply and N2/2 comparisons are required
(to find each insertion point).
Shellsort
Insertion sort is slow because it exchanges only adjacent elements. For example,
if the smallest element happens to be at the end of the array, it takes
N steps to get it where it belongs. Shellsort is a simple extension of insertion
sort which gets around this problem by allowing exchanges of elements that
are far apart.
If we replace every occurrence of "1" by "h" (and "2" by "h+l") in
insertion sort, the resulting program rearranges a file to give it the property
that taking every hth element (starting anywhere) yields a sorted file. Such a
file is said to be h-sorted. Put another way, an h-sorted file is h independent
sorted files, interleaved together. By h-sorting for some large values of h, we
can move elements in the array long distances and thus make it easier to h-sort
for smaller values of h. Using such a procedure for any sequence of values of
h which ends in 1 will produce a sorted file: this is Shellsort.
The following example shows how a sample file of fifteen elements is
sorted using the increments 13, 4, 1:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A S O R T I N G E X A M P L E
13
A E O R T I N G E X A M P L S
4
A E A G E I N M P L O R T X S
1
A A E E G I L M N O P R S T X
In the first pass, the A in position 1 is compared to the L in position 14, then
the S in position 2 is compared (and exchanged) with the E in position 15. In
the second pass, the A T E P in positions 1, 5, 9, and 13 are rearranged to
put A E P T in those positions, and similarly for positions 2, 6, 10, and 14,
etc. The last pass is just insertion sort, but no element has to move very far.
The above description of how Shellsort gains efficiency is necessarily
imprecise because no one has been able to analyze the algorithm. Some
sequences of values of h work better than others, but no explanation for this
has been discovered. A sequence which has been shown empirically to do well
is . . . ,1093,364,121,40,13,4,1, as in the following program:
procedure shellsort;
  label 0;
  var i, j, h, v: integer;
  begin
  h:=1; repeat h:=3*h+1 until h>N;
  repeat
    h:=h div 3;
    for i:=h+1 to N do
      begin
      v:=a[i]; j:=i;
      while a[j-h]>v do
        begin
        a[j]:=a[j-h]; j:=j-h;
        if j<=h then goto 0
        end;
      0: a[j]:=v
      end;
  until h=1;
  end;
Note that sentinels are not used because there would have to be h of them,
for the largest value of h used.
The increment sequence in this program is easy to use and leads to an
efficient sort. There are many other increment sequences which lead to a
more efficient sort (the reader might be amused to try to discover one), but it
is difficult to beat the above program by more than 20% even for relatively
large N. (The possibility that much better increment sequences exist is still,
however, quite real.) On the other hand, there are some bad increment
sequences. Shellsort is sometimes implemented by starting at h=N (instead
of initializing so as to ensure the same sequence is always used as above). This
virtually ensures that a bad sequence will turn up for some N.
Comparing Shellsort with other methods analytically is difficult because
the functional form of the running time for Shellsort is not even known (and
depends on the increment sequence). For the above program, two conjectures
are N(log N)^2 and N^1.25. The running time is not particularly sensitive to
the initial ordering of the file, especially in contrast to, say, insertion sort,
which is linear for a file already in order but quadratic for a file in reverse
order.
Shellsort is the method of choice for many sorting applications because it
has acceptable running time even for moderately large files (say, five thousand
elements) and requires only a very small amount of code, which is easy to get
working. We'll see methods that are more efficient in the next few chapters,
but they're perhaps only twice as fast (if that much) except for large N, and
they're significantly more complicated. In short, if you have a sorting problem,
use the above program, then determine whether the extra effort required to
replace it with a sophisticated method will be worthwhile. (On the other
hand, the Quicksort algorithm of the next chapter is not that much more
difficult to implement. . . )
Digression: Bubble Sort
An elementary sorting method that is often taught in introductory classes is
bubble sort: keep passing through the file, exchanging adjacent elements, if
necessary; when no exchanges are required on some pass, the file is sorted.
An implementation of this method is given below.
procedure bubblesort;
  var j, t: integer;
  begin
  repeat
    t:=a[1];
    for j:=2 to N do
      if a[j-1]>a[j] then
        begin t:=a[j-1]; a[j-1]:=a[j]; a[j]:=t end
  until t=a[1];
  end;
It takes a moment's reflection to convince oneself first that this works at all,
second that the running time is quadratic. It is not clear why this method
is so often taught, since insertion sort seems simpler and more efficient by
almost any measure. The inner loop of bubble sort has about twice as many
instructions as either insertion sort or selection sort.
Distribution Counting
A very special situation for which there is a simple sorting algorithm is the
following: "sort a file of N records whose keys are distinct integers between 1
and N." The algorithm for this problem is
for i:=1 to N do t[a[i]]:=a[i];
for i:=1 to N do a[i]:=t[i];
This algorithm uses a temporary array t. It is possible (but much more
complicated) to solve this problem without an auxiliary array.
A more realistic problem solved by an algorithm in the same spirit is:
"sort a file of N records whose keys are integers between 0 and M - 1." If M
is not too large, an algorithm called distribution counting can be used to solve
this problem. The idea is to count the number of keys with each value, then
use the counts to move the records into position on a second pass through the
file, as in the following code:
for j:=0 to M-1 do count[j]:=0;
for i:=1 to N do
  count[a[i]]:=count[a[i]]+1;
for j:=1 to M-1 do
  count[j]:=count[j-1]+count[j];
for i:=N downto 1 do
  begin
  t[count[a[i]]]:=a[i];
  count[a[i]]:=count[a[i]]-1
  end;
for i:=1 to N do a[i]:=t[i];
To see how this code works, consider the following sample file of integers:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
3 1 1 0 3 7 5 5 2 4 2 1 0 2 6 4
The first for loop initializes the counts to 0; the second produces the counts
0 1 2 3 4 5 6 7
2 3 3 2 2 2 1 1
This says that there are two 0's, three 1's, etc. The third for loop adds these
numbers to produce
0 1 2 3 4 5 6 7
2 5 8 10 12 14 15 16
That is, there are two numbers less than 1, five numbers less than 2, etc.
Now, these can be used as addresses to sort the array:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 1 1 1 2 2 2 3 3 4 4 5 5 6 7
For example, when the 4 at the end of the file is encountered, it's put into
location 12, since count[4] says that there are 12 keys less than or equal to
4. Then count[4] is decremented, since there's now one less key less than or
equal to 4. The inner loop goes from N down to 1 so that the sort will be
stable. (The reader may wish to check this.)
This method will work very well for the type of files postulated. Furthermore,
it can be extended to produce a much more powerful method that we'll
examine in Chapter 10.
Non-Random Files
We usually think of sorting files that are in some arbitrarily scrambled order.
However, it is quite often the case that we have a lot of information about a
file to be sorted. For example, one often wants to add a few elements to a
sorted file and thus produce a larger sorted file. One way to do so is to simply
append the new elements to the end of the file, then call a sorting algorithm.
General-purpose sorts are commonly misused for such applications; actually,
elementary methods can take advantage of the order present in the file.
For example, consider the operation of insertion sort on a file which is
already sorted. Each element is immediately determined to be in its proper place in
the file, and the total running time is linear. The same is true for bubble sort,
but selection sort is still quadratic. (The leading term in the running time of
selection sort does not depend on the order in the file to be sorted.)
Even if a file is not completely sorted, insertion sort can be quite useful
because the running time of insertion sort depends quite heavily on the order
present in the file. The running time depends on the number of inversions: for
each element count up the number of elements to its left which are greater.
This is the distance the elements have to move when inserted into the file
during insertion sort. A file which has some order in it will have fewer
inversions in it than one which is arbitrarily scrambled.
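The count itself is easy to compute directly (a sketch, using a brute-force double loop; it is meant only to make the definition concrete):

  function inversions: integer;
    var i, j, c: integer;
    begin
    c:=0;
    for i:=2 to N do
      for j:=1 to i-1 do
        if a[j]>a[i] then c:=c+1;  { a[j] is to the left of a[i] but greater }
    inversions:=c
    end;

The running time of insertion sort is proportional to N plus this count.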
The example cited above of a file formed by tacking a few new elements
onto a large sorted file is clearly a case where the number of inversions
is low: a file which has only a constant number of elements out of place will
have only a linear number of inversions. Another example is a file where each
element is only a constant distance from its final position. Files like this can
be created in the initial stages of some advanced sorting methods: at a certain
point it is worthwhile to switch over to insertion sort.
In short, insertion sort is the method of choice for "almost sorted" files
with few inversions: for such files, it will outperform even the sophisticated
methods in the next few chapters.
Exercises
1. Give a sequence of "compare-exchange" operations for sorting four records.
2. Which of the three elementary methods runs fastest for a file which is already sorted?
3. Which of the three elementary methods runs fastest for a file in reverse order?
4. Test the hypothesis that selection sort is the fastest of the three elementary methods, then insertion sort, then bubble sort.
5. Give a good reason why it might be inconvenient to use a sentinel key for insertion sort (aside from the one that comes up in the implementation of Shellsort).
6. How many comparisons are used by Shellsort to 7-sort, then 3-sort the keys EASYQUESTION?
7. Give an example to show why 8,4,2,1 would not be a good way to finish off a Shellsort increment sequence.
8. Is selection sort stable? How about insertion sort and bubble sort?
9. Give a specialized version of distribution counting for sorting files where elements have only one of two values (x or y).
10. Experiment with different increment sequences for Shellsort: find one that runs faster than the one given for a random file of 1000 elements.
9. Quicksort
In this chapter, we'll study the sorting algorithm which is probably
more widely used than any other, Quicksort. The basic algorithm was
invented in 1960 by C. A. R. Hoare, and it has been studied by many people
since that time. Quicksort is popular because it's not difficult to implement,
it's a good "general-purpose" sort (works well in a variety of situations), and
it consumes less resources than any other sorting method in many situations.
The desirable features of the Quicksort algorithm are that it is in-place
(uses only a small auxiliary stack), requires only about N log N operations
on the average to sort N items, and has an extremely short inner loop.
The drawbacks of the algorithm are that it is recursive (implementation is
complicated if recursion is not available), has a worst case where it takes
about N2 operations, and is fragile: a simple mistake in the implementation
might go unnoticed and could cause it to perform badly for some files.
The performance of Quicksort is very well understood. It has been
subjected to a thorough mathematical analysis and very precise statements
can be made about performance issues. The analysis has been verified by
extensive empirical experience, and the algorithm has been refined to the
point where it is the method of choice in a broad variety of practical sorting
applications. This makes it worthwhile to look somewhat more carefully at
ways of efficiently implementing Quicksort than we have for other algorithms.
Similar implementation techniques are appropriate for other algorithms; with
Quicksort we can use them with confidence because the performance is so well
understood.
It is tempting to try to develop ways to improve Quicksort: a faster
sorting algorithm is computer science's "better mousetrap." Almost from the
moment Hoare first published the algorithm, "improved" versions have been
appearing in the literature. Many ideas have been tried and analyzed, but
it is easy to be deceived, because the algorithm is so well balanced that the
effects of improvements in one part of the program can be more than offset by
the effects of bad performance in another part of the program. We'll examine
in some detail three modifications which do improve Quicksort substantially.
A carefully tuned version of Quicksort is likely to run significantly faster
than any other sorting method on most computers. However, it must be
cautioned that tuning any algorithm can make it more fragile, leading to
undesirable and unexpected effects for some inputs. Once a version has been
developed which seems free of such effects, this is likely to be the program to
use for a library sort utility or for a serious sorting application. But if one is
not willing to invest the effort to be sure that a Quicksort implementation is
not flawed, Shellsort is a much safer choice and will perform adequately for
significantly less implementation effort.
The Basic Algorithm
Quicksort is a "divide-and-conquer" method for sorting. It works by partitioning
a file into two parts, then sorting the parts independently. As we will see,
the exact position of the partition depends on the file, so the algorithm has
the following recursive structure:
procedure quicksort(l, r: integer);
  var i: integer;
  begin
  if r>l then
    begin
    i:=partition(l, r);
    quicksort(l, i-1);
    quicksort(i+1, r)
    end
  end;
The parameters l and r delimit the subfile within the original file that is to
be sorted: the call quicksort(1, N) sorts the whole file.
The crux of the method is the partition procedure, which must rearrange
the array to make the following three conditions hold:
(i) the element a[i] is in its final place in the array for some i,
(ii) all the elements in a[l], ..., a[i-1] are less than or equal to a[i],
(iii) all the elements in a[i+1], ..., a[r] are greater than or equal to a[i].
This can be simply and easily implemented through the following general
strategy. First, arbitrarily choose a[r] to be the element that will go into
its final position. Next, scan from the left end of the array until finding
an element greater than a[r] and scan from the right end of the array until
finding an element less than a[r]. The two elements which stopped the scans
are obviously out of place in the final partitioned array, so exchange them.
(Actually, it turns out, for reasons described below, to be best to also stop the
scans for elements equal to a[r], even though this might seem to involve some
unnecessary exchanges.) Continuing in this way ensures that all array elements
to the left of the left pointer are less than a[r], and array elements to the right
of the right pointer are greater than a[r]. When the scan pointers cross, the
partitioning process is nearly complete: all that remains is to exchange a[r]
with the leftmost element of the right subfile.
The following table shows how our sample file of keys is partitioned using
this method:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
 A  A  O  R  T  I  N  G  E  X  S  M  P  L  E
 A  A  E  R  T  I  N  G  O  X  S  M  P  L  E
 A  A  E  E  T  I  N  G  O  X  S  M  P  L  R
The rightmost element, E, is chosen as the partitioning element. First
the scan from the left stops at the S, then the scan from the right stops at the
A, then these two are exchanged, as shown on the second line of the table.
Next the scan from the left stops at the 0, then the scan from the right stops
at the E, then these two are exchanged, as shown on the third line of the
table. Next the pointers cross. The scan from the left stops at the R, and
the scan from the right stops at the E. The proper move at this point is to
exchange the E at the right with the R, leaving the partitioned file shown on
the last line of the table. The sort is finished by sorting the two subfiles on
either side of the partitioning element (recursively).
The following program gives a full implementation of the method.
procedure quicksort(l, r: integer);
  var v, t, i, j: integer;
  begin
  if r>l then
    begin
    v:=a[r]; i:=l-1; j:=r;
    repeat
      repeat i:=i+1 until a[i]>=v;
      repeat j:=j-1 until a[j]<=v;
      t:=a[i]; a[i]:=a[j]; a[j]:=t;
    until j<=i;
    a[j]:=a[i]; a[i]:=a[r]; a[r]:=t;
    quicksort(l, i-1);
    quicksort(i+1, r)
    end
  end;
In this implementation, the variable v holds the current value of the "partitioning
element" a[r], and i and j are the left and right scan pointers, respectively.
An extra exchange of a[i] with a[j] is done with j<i just after the pointers cross
but before the crossing is detected and the outer repeat loop exited. (This
could be avoided with a goto.) The three assignment statements following that
loop implement the exchanges a[i] with a[j] (to undo the extra exchange) and
a[i] with a[r] (to put the partitioning element into position).
As in insertion sort, a sentinel key is needed to stop the scan in the
case that the partitioning element is the smallest element in the file. In this
implementation, no sentinel is needed to stop the scan when the partitioning
element is the largest element in the file, because the partitioning element
itself is at the right end of the file to stop the scan. We'll shortly see an easy
way to avoid having either sentinel key.
The "inner loop" of Quicksort consists simply of incrementing a pointer
and comparing an array element against a fixed value. This is really what
makes Quicksort quick: it's hard to imagine a simpler inner loop.
Now the two subfiles are sorted recursively, finishing the sort. The
following table traces through these recursive calls. Each line depicts the result
of partitioning the displayed subfile, using the partitioning element (marked with an asterisk).
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
 A  A  E  E* T  I  N  G  O  X  S  M  P  L  R
 A  A  E*
 A* A
    A
             L  I  N  G  O  P  M  R* X  T  S
             L  I  G  M* O  P  N
             G* I  L
                I  L*
                I
                         N* P  O
                            O* P
                               P
                                     S* T  X
                                        T  X*
                                        T
 A  A  E  E  G  I  L  M  N  O  P  R  S  T  X
Note that every element is (eventually) put into place by being used as a
partitioning element.
The most disturbing feature of the program above is that it runs very
inefficiently on simple files. For example, if it is called with a file that is already
sorted, the partitions will be degenerate, and the program will call itself N
times, only knocking off one element for each call. This means not only
that the time required will be about N2/2, but also that the space required
to handle the recursion will be about N (see below), which is unacceptable.
Fortunately, there are relatively easy ways to ensure that this worst case
doesn't occur in actual applications of the program.
When equal keys are present in the file, two subtleties become apparent.
First, there is the question of whether to have both pointers stop on keys
equal to the partitioning element, or to have one pointer stop and the other
scan over them, or to have both pointers scan over them. This question has
actually been studied in some detail mathematically, with the result that
it's best to have both pointers stop. This tends to balance the partitions in
the presence of many equal keys. Second, there is the question of properly
handling the pointer crossing in the presence of equal keys. Actually, the
program above can be slightly improved by terminating the scans when j<i,
then using quicksort(l, j) for the first recursive call. This is an improvement
because when j=i we can put two elements into position with the partitioning,
by letting the loop iterate one more time. (This case occurs, for example, if R
were E in the example above.) It is probably worth making this change because
the program given leaves a record with a key equal to the partitioning key in
a[r], which makes the first partition in the call quicksort(i+1, r) degenerate
because its rightmost key is its smallest. The implementation of partitioning
given above is a bit easier to understand, so we'll leave it as is in the discussions
below, with the understanding that this change should be made when large
numbers of equal keys are present.
The best thing that could happen would be for each partitioning stage to
divide the file exactly in half. This would make the number of comparisons
used by Quicksort satisfy the divide-and-conquer recurrence
C(N) = 2C(N/2) + N.
(The 2C(N/2) covers the cost of doing the two subfiles; the N is the cost of
examining each element, using one partitioning pointer or the other.) From
Chapter 4, we know this recurrence has the solution
C(N) ≈ N lg N.
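(One quick way to check this, assuming C(1) = 0 and N a power of two: dividing both sides of the recurrence by N gives C(N)/N = C(N/2)/(N/2) + 1, so C(N)/N grows by exactly 1 each time N doubles, and after lg N doublings C(N)/N = lg N.)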
Though things don't always go this well, it is true that the partition falls
in the middle on the average. Taking the precise probability of each partition
position into account makes the recurrence more complicated, and more
difficult to solve, but the final result is similar. It turns out that
C(N) ≈ 2N ln N ≈ 1.39N lg N,
which implies that the total running time of Quicksort is proportional to
N log N (on the average).
Thus, the implementation above will perform very well for many applications
and is a very reasonable general-purpose sort. However, if the sort is to
be used a great many times, or if it is to be used to sort a very large file, then
it might be worthwhile to implement several of the improvements discussed
below which can ensure that the worst case won't occur, reduce the average
running time by 20-30%, and easily eliminate the need for a sentinel key.
Removing Recursion
In Chapter 1 we saw that the recursive call could be removed from Euclid's
algorithm to yield a non-recursive program controlled by a simple loop. This
can be done for other programs with one recursive call, but the situation
is more complicated when two or more recursive calls are involved, as in
Quicksort. Before dealing with one recursive call, enough information must
be saved to allow processing of later recursive calls.
The Pascal programming environment uses a pushdown stack to manage
this. Each time a procedure call is made, the values of all the variables are
pushed onto the stack (saved). Each time a procedure returns, the stack is
popped: the information that was most recently put on it is removed.
A stack may be represented as a linked list, in which case a push is
implemented by linking a new node onto the front of the list and a pop
by removing the first node on the list, or as an array, in which case a
pointer is maintained which points to the top of the stack, so that a push
is implemented by storing the information and incrementing the pointer, and
a pop by decrementing the pointer and retrieving the information.
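In the array representation, the two operations might be coded as follows (a sketch; the array stack and the pointer p are assumed to be declared globally, as in the nonrecursive program below):

  procedure push(x: integer);
    begin stack[p]:=x; p:=p+1 end;     { store the information, advance the pointer }

  function pop: integer;
    begin p:=p-1; pop:=stack[p] end;   { back up the pointer, retrieve the information }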
There is a companion data structure called a queue, where items are
returned in the order they were added. In a linked list implementation of
a queue new items are added at the end, not the beginning. The array
implementation of queues is slightly more complicated. Later in this book
we'll see other examples of data structures which support the twin operations
of inserting new items and deleting items according to a prescribed rule (most
notably in Chapters 11 and 20).
When we use recursive calls, the values of all variables are saved on an
implicit stack by the programming environment; when we want an improved
program, we use an explicit stack and save only necessary information. It
is usually possible to determine which variables must be saved by examining
the program carefully; another approach is to rework the algorithm based on
using an explicit stack rather than explicit recursion.
This second approach is particularly appropriate for Quicksort and many
similar algorithms. We think of the stack as containing "work to be done,"
in the form of subfiles to be sorted. Any time we need a subfile to process,
we pop the stack. When we partition, we create two subfiles to be processed,
which can be pushed on the stack. This leads to the following non-recursive
implementation of Quicksort:
procedure quicksort;
  var t, i, l, r: integer;
      stack: array[0..M] of integer; p: integer;
  begin
  l:=1; r:=N; p:=2;
  repeat
    if r>l then
      begin
      i:=partition(l, r);
      if (i-l)>(r-i)
        then begin stack[p]:=l; stack[p+1]:=i-1; l:=i+1 end
        else begin stack[p]:=i+1; stack[p+1]:=r; r:=i-1 end;
      p:=p+2
      end
    else
      begin p:=p-2; l:=stack[p]; r:=stack[p+1] end;
  until p=0
  end;
This program differs from the description above in two important ways. First,
rather than simply putting two subfiles on the stack in some arbitrary order,
their sizes are checked and the larger of the two is put on the stack first.
Second, the smaller of the two subfiles is not put on the stack at all; the values
of the parameters are simply reset, just as we did for Euclid's algorithm. This
technique, called "end-recursion removal," can be applied to any procedure
whose last action is a recursive call. For Quicksort, the combination of end-recursion
removal and a policy of processing the smaller of the two subfiles
first turns out to ensure that the stack need only contain room for about lg N
entries, since each entry on the stack after the top one must represent a subfile
less than half the size of the previous entry.
This is in sharp contrast to the size of the stack in the worst case in the
recursive implementation, which could be as large as N (for example, in the
case that the file is already sorted). This is a subtle but real difficulty with
a recursive implementation of Quicksort: there's always an underlying stack,
and a degenerate case on a large file could cause the program to terminate
abnormally because of lack of memory. This behavior is obviously undesirable
for a library sorting routine. Below we'll see ways to make degenerate cases
extremely unlikely, but there's no way to avoid this problem completely in
a recursive implementation (even switching the order in which subfiles are
processed doesn't help, without end-recursion removal).
Of course the non-recursive method processes the same subfiles as the
recursive method for our example; it just does them in a different order, as
shown in the following table:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
 A  A  E  E* T  I  N  G  O  X  S  M  P  L  R
 A  A  E*
 A* A
    A
             L  I  N  G  O  P  M  R* X  T  S
                                     S* T  X
                                        T  X*
                                        T
             L  I  G  M* O  P  N
             G* I  L
                I  L*
                I
                         N* P  O
                            O* P
                               P
 A  A  E  E  G  I  L  M  N  O  P  R  S  T  X
The simple use of an explicit stack above leads to a far more efficient
program than the direct recursive implementation, but there is still overhead
that could be removed. The problem is that, if both subfiles have only one
element, entries with r=l are put on the stack only to be immediately taken
off and discarded. It is straightforward to change the program to simply not
put any such files on the stack. This change is more important when the next
improvement is included, which involves ignoring small subfiles in the same
way.
Small Subfiles
The second improvement stems from the observation that a recursive program
is guaranteed to call itself for many small subfiles, so it should be changed to
use a better method when small subfiles are encountered. One obvious way to
do this is to change the test at the beginning of the recursive routine from "if
r>l then" to a call on insertion sort (modified to accept parameters defining
the subfile to be sorted), that is "if r-l <= M then insertion(l, r)." Here M
is some parameter whose exact value depends upon the implementation. The
value chosen for M need not be the best possible: the algorithm works about
the same for M in the range from about 5 to about 25. The reduction in the
running time is on the order of 20% for most applications.
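This version of the recursive program might read as follows (a sketch; insertion is assumed to have been modified to take the subfile bounds l and r as parameters, and M is a global constant):

  procedure quicksortM(l, r: integer);
    var i: integer;
    begin
    if r-l<=M then insertion(l, r)   { small subfile: use the simple method }
    else
      begin
      i:=partition(l, r);
      quicksortM(l, i-1);
      quicksortM(i+1, r)
      end
    end;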
A slightly easier method, which is also slightly more efficient, is to just
change the test at the beginning to "if r-l > M then": that is, simply ignore
small subfiles during partitioning. In the non-recursive implementation, this
would be done by not putting any files of fewer than M elements on the stack. After
partitioning, what is left is a file that is almost sorted. As mentioned in the
previous chapter, insertion sort is the method of choice for such files. That
is, insertion sort will work about as well for such a file as for the collection of
little files that it would get if it were being used directly. This method should
be used with caution, because the insertion sort is likely always to sort even
if the Quicksort has a bug which causes it not to work at all. The excessive
cost may be the only sign that something went wrong.
Median-of-Three Partitioning
The third improvement is to use a better partitioning element. There are
several possibilities here. The safest thing to do to avoid the worst case would
be to use a random element from the array for a partitioning element. Then
the worst case will happen with negligibly small probability. This is a simple
example of a "probabilistic algorithm," which uses randomness to achieve
good performance almost always, regardless of the arrangement of the input.
This can be a useful tool in algorithm design, especially if some bias in the
input is suspected. However, for Quicksort it is probably overkill to put a full
random-number generator in just for this purpose: an arbitrary number will
do just as well.
A more useful improvement is to take three elements from the file, then
use the median of the three for the partilioning element. If the three elements
chosen are from the left, middle, and right of the array, then the use of
sentinels can be avoided as follows: sort the three elements (using the three-exchange
method in the last chapter), then exchange the one in the middle
with a[r-1], then run the partitioning algorithm on a[l+1..r-2]. This
improvement is called the median-of-three partitioning method.
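The preparation before partitioning might be coded as follows (a sketch; m and t are local variables, and the subfile is assumed to contain at least three elements):

  procedure medianofthree(l, r: integer);
    var m, t: integer;
    begin
    m:=(l+r) div 2;
    if a[l]>a[m] then begin t:=a[l]; a[l]:=a[m]; a[m]:=t end;
    if a[l]>a[r] then begin t:=a[l]; a[l]:=a[r]; a[r]:=t end;
    if a[m]>a[r] then begin t:=a[m]; a[m]:=a[r]; a[r]:=t end;
    t:=a[m]; a[m]:=a[r-1]; a[r-1]:=t   { the median goes next to the right end }
    end;

After the call, a[r-1] holds the median of the three sampled elements and can be used as the partitioning element, with a[l] and a[r] serving as sentinels for the two scans.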
The median-of-three method helps Quicksort in three ways. First, it
makes the worst case much more unlikely to occur in any actual sort. In order
for the sort to take N2 time, two out of the three elements examined must be
among the largest or among the smallest elements in the file, and this must
happen consistently through most of the partitions. Second, it eliminates the
need for a sentinel key for partitioning, since this function is served by the
three elements examined before partitioning. Third, it actually reduces the
total running time of the algorithm by about 5%.
The combination of a nonrecursive implementation of the median-of-three
method with a cutoff for small subfiles can improve the running time of
Quicksort from the naive recursive implementation by 25% to 30%. Further
algorithmic improvements are possible (for example the median of five or more
elements could be used), but the amount of time saved will be marginal. More
significant time savings can be realized (with less effort) by coding the inner
loops (or the whole program) in assembly or machine language. Neither path
is recommended except for experts with serious sorting applications.
Exercises
1. Implement a recursive Quicksort with a cutoff to insertion sort for subfiles with less than M elements and empirically determine the value of M for which it runs fastest on a random file of 1000 elements.
2. Solve the previous problem for a nonrecursive implementation.
3. Solve the previous problem also incorporating the median-of-three improvement.
4. About how long will Quicksort take to sort a file of N equal elements?
5. What is the maximum number of times that the largest element could be moved during the execution of Quicksort?
6. Show how the file ABABABA is partitioned, using the two methods suggested in the text.
7. How many comparisons does Quicksort use to sort the keys EASYQUESTION?
8. How many "sentinel" keys are needed if insertion sort is called directly from within Quicksort?
9. Would it be reasonable to use a queue instead of a stack for a non-recursive implementation of Quicksort? Why or why not?
10. Use a least squares curvefitter to find values of a and b that give the best formula of the form aN ln N + bN for describing the total number of instructions executed when Quicksort is run on a random file.
10. Radix Sorting
The "keys" used to define the order of the records in files for many
sorting applications can be very complicated. (For example, consider
the ordering function used in the telephone book or a library catalogue.)
Because of this, it is reasonable to define sorting methods in terms of the
basic operations of "comparing" two keys and "exchanging" two records.
Most of the methods we have studied can be described in terms of these two
fundamental operations. For many applications, however, it is possible to
take advantage of the fact that the keys can be thought of as numbers from
some restricted range. Sorting methods which take advantage of the digital
properties of these numbers are called radix sorts. These methods do not just
compare keys: they process and compare pieces of keys.
Radix sorting algorithms treat the keys as numbers represented in a
base-M number system, for different values of M (the radix) and work with
individual digits of the numbers. For example, consider an imaginary problem
where a clerk must sort a pile of cards with three-digit numbers printed on
them. One reasonable way for him to proceed is to make ten piles: one for
the numbers less than 100, one for the numbers between 100 and 199, etc.,
place the cards in the piles, then deal with the piles individually, either by
using the same method on the next digit or by using some simpler method
if there are only a few cards. This is a simple example of a radix sort with
M = 10. We'll examine this and some other methods in detail in this chapter.
Of course, with most computers it's more convenient to work with M = 2 (or
some power of 2) rather than M = 10.
Anything that's represented inside a digital computer can be treated
as a binary number, so many sorting applications can be recast to make
feasible the use of radix sorts operating on keys which are binary numbers.
Unfortunately, Pascal and many other languages intentionally make it difficult
to write a program that depends on the binary representation of numbers.
(The reason is that Pascal is intended to be a language for expressing programs
in a machine-independent manner, and different computers may use different
representations for the same numbers.) This philosophy eliminates many types
of "bit-flicking" techniques in situations better handled by fundamental Pascal
constructs such as records and sets, but radix sorting seems to be a casualty of
this progressive philosophy. Fortunately, it's not too difficult to use arithmetic
operations to simulate the operations needed, and so we'll be able to write
(inefficient) Pascal programs to describe the algorithms that can be easily
translated to efficient programs in programming languages that support bit
operations on binary numbers.
Given a (key represented as a) binary number, the fundamental operation
needed for radix sorts is extracting a contiguous set of bits from the number.
Suppose we are to process keys which we know to be integers between 0 and
1000. We may assume that these are represented by ten-bit binary numbers.
In machine language, bits are extracted from binary numbers by using bitwise
"and" operations and shifts. For example, the leading two bits of a ten-bit
number are extracted by shifting right eight bit positions, then doing a bitwise
"and" with the mask 0000000011. In Pascal, these operations can be simulated
with div and mod. For example, the leading two bits of a ten-bit number x
are given by (x div 256) mod 4. In general, "shift x right k bit positions"
can be simulated by computing x div 2^k, and "zero all but the j rightmost
bits of x" can be simulated by computing x mod 2^j. In our description of
the radix sort algorithms, we'll assume the existence of a function bits(x, k, j:
integer): integer which combines these operations to return the j bits which
appear k bits from the right in x by computing (x div 2^k) mod 2^j. For
example, the rightmost bit of x is returned by the call bits(x, 0, 1). This
function can be made efficient by precomputing (or defining as constants)
the powers of 2. Note that a program which uses only this function will
do radix sorting whatever the representation of the numbers, though we can
hope for much improved efficiency if the representation is binary and the
compiler is clever enough to notice that the computation can actually be
done with machine language "shift" and "and" instructions. Many Pascal
implementations have extensions to the language which allow these operations
to be specified somewhat more directly.
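In Pascal the function might be written as follows (a sketch; as noted above, an efficient version would precompute the powers of two or use machine-level shift and "and" operations):

  function bits(x, k, j: integer): integer;
    var i, p: integer;
    begin
    p:=1;
    for i:=1 to k do p:=p*2;   { p is now 2 to the k }
    x:=x div p;                { shift x right k bit positions }
    p:=1;
    for i:=1 to j do p:=p*2;   { p is now 2 to the j }
    bits:=x mod p              { keep only the j rightmost bits }
    end;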
Armed with this basic tool, we'll consider two different types of radix
sorts which differ in the order in which they examine the bits of the keys. We
assume that the keys are not short, so that it is worthwhile to go to the effort
of extracting their bits. If the keys are short, then the distribution counting
method in Chapter 8 can be used. Recall that this method can sort N keys
known to be integers between 0 and M - 1 in linear time, using one auxiliary
table of size M for counts and another of size N for rearranging records.
Thus, if we can afford a table of size 2^b, then b-bit keys can easily be sorted
in linear time. Radix sorting comes into play if the keys are sufficiently long
(say b = 32) that this is not possible.
The first basic method for radix sorting that we'll consider examines the
bits in the keys from left to right. It is based on the fact that the outcome of
"comparisons" between two keys depend: only on the value of the bits at the
first position at which they differ (reading from left to right). Thus, all keys
with leading bit 0 appear before all keys with leading bit 1 in the sorted file;
among the keys with leading bit 1, all keys with second bit 0 appear before
all keys with second bit 1, and so forth. The left-to-right radix sort, which
is called radix exchange sort, sorts by systematically dividing up the keys in
this way.
The second basic method that we'll consider, called straight radix sort,
examines the bits in the keys from right to left. It is based on an interesting
principle that reduces a sort on b-bit keys to b sorts on 1-bit keys. We'll see
how this can be combined with distribution counting to produce a sort that
runs in linear time under quite generous assumptions.
The running times of both basic radix sorts for sorting N records with b
bit keys is essentially Nb. On the one hand, one can think of this running time
as being essentially the same as N log N, since if the numbers are all different,
b must be at least log N. On the other hand, both methods usually use
many fewer than Nb operations: the left-to-right method because it can stop
once differences between keys have been found; and the right-to-left method,
because it can process many bits at once.
Radix Exchange Sort
Suppose we can rearrange the records of a file so that all those whose keys
begin with a 0 bit come before all those whose keys begin with a 1 bit. This
immediately defines a recursive sorting method: if the two subfiles are sorted
independently, then the whole file is sorted. The rearrangement (of the file)
is done very much like the partitioning in Quicksort: scan from the left to
find a key which starts with a 1 bit, scan from the right to find a key which
starts with a 0 bit, exchange, and continue the process until the scanning
pointers cross. This leads to a recursive sorting procedure that is very similar
to Quicksort:
procedure radixexchange(l, r, b: integer);
  var t, i, j: integer;
  begin
  if (r>l) and (b>=0) then
    begin
    i:=l; j:=r;
    repeat
      while (bits(a[i], b, 1)=0) and (i<j) do i:=i+1;
      while (bits(a[j], b, 1)=1) and (i<j) do j:=j-1;
      t:=a[i]; a[i]:=a[j]; a[j]:=t;
    until j=i;
    if bits(a[r], b, 1)=0 then j:=j+1;
    radixexchange(l, j-1, b-1);
    radixexchange(j, r, b-1)
    end
  end;
For simplicity, assume that a[1..N] contains positive integers less than 2^31
(that is, they could be represented as 31-bit binary numbers). Then the call
radixexchange(1, N, 30) will sort the array. The variable b keeps track of
the bit being examined, ranging from 30 (leftmost) down to 0 (rightmost).
(It is normally possible to adapt the implementation of bits to the machine
representation of negative numbers so that negative numbers are handled in
a uniform way in the sort.)
This implementation is obviously quite similar to the recursive implementation
of Quicksort in the previous chapter. Essentially, the partitioning in
radix exchange sort is like partitioning in Quicksort except that the number
2^b is used instead of some number from the file as the partitioning element.
Since 2b may not be in the file, there can be no guarantee that an element
is put into its final place during partitioning. Also, since only one bit is being
examined, we can't rely on sentinels to stop the pointer scans; therefore
the tests (i<j) are included in the scanning loops. As with Quicksort, an
extra exchange is done for the case j=i, but it is not necessary to undo this
exchange outside the loop because the "exchange" is a[i] with itself. Also as
with Quicksort, some care is necessary in this algorithm to ensure that
nothing ever "falls between the cracks" when the recursive calls are made.
The partitioning stops with j=i and all elements to the right of a[i] having 1
bits in the bth position and all elements to the left of a[i] having 0 bits in the
bth position. The element a[i] itself will have a 1 bit unless all keys in the
file have a 0 in position b. The implementation above has an extra test just
after the partitioning loop to cover this case.
The following table shows how our sample file of keys is partitioned and
sorted by this method. This table can be compared with the table given in
Chapter 9 for Quicksort, though the operation of the partitioning method is
completely opaque without the binary representation of the keys.
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
 A  E  O  L  M  I  N  G  E  A  X  T  P  R  S
 A  E  A  E  G  I  N  M  L  O
 A  A  E  E  G
 A  A
 A  A
       E  E  G
       E  E
                I  N  M  L  O
                   L  M  N  O
                   L  M
                         N  O
                              S  T  P  R  X
                              S  R  P  T
                              P  R  S
                                 R  S
 A  A  E  E  G  I  L  M  N  O  P  R  S  T  X
The binary representation of the keys used for this example is a simple
five-bit code with the ith letter in the alphabet represented by the binary
representation of the number i. This is a simplified version of real character
codes, which use more bits (seven or eight) and represent more characters
(upper/lower case letters, numbers, special symbols). By translating the keys
in this table to this five-bit character code, compressing the table so that the
subfile partitioning is shown "in parallel" rather than one per line, and then
transposing rows and columns, we can see how the leading bits of the keys
control partitioning:
A 00001   A 00001   A 00001   A 00001   A 00001   A 00001
S 10011   E 00101   E 00101   A 00001   A 00001   A 00001
O 01111   O 01111   A 00001   E 00101   E 00101   E 00101
R 10010   L 01100   E 00101   E 00101   E 00101   E 00101
T 10100   M 01101   G 00111   G 00111   G 00111
I 01001   I 01001   I 01001   I 01001
N 01110   N 01110   N 01110   N 01110   L 01100   L 01100
G 00111   G 00111   M 01101   M 01101   M 01101   M 01101
E 00101   E 00101   L 01100   L 01100   N 01110   N 01110
X 11000   A 00001   O 01111   O 01111   O 01111   O 01111
A 00001   X 11000   S 10011   S 10011   P 10000
M 01101   T 10100   T 10100   R 10010   R 10010   R 10010
P 10000   P 10000   P 10000   P 10000   S 10011   S 10011
L 01100   R 10010   R 10010   T 10100
E 00101   S 10011   X 11000
One serious potential problem for radix sort not brought out in this
example is that degenerate partitions (with all keys having the same value for
the bit being used) can happen frequently. For example, this arises commonly
in real files when small numbers (with many leading zeros) are being sorted.
It also occurs for characters: for example, suppose that 32-bit keys are made
up from four characters by encoding each in a standard eight-bit code and then
putting them together. Then degenerate partitions are likely to happen at the
beginning of each character position, since, for example, lower case letters all
begin with the same bits in most character codes. Many other similar effects
are obviously of concern when sorting encoded data.
From the example, it can be seen that once a key is distinguished from
all the other keys by its left bits, no further bits are examined. This is a
distinct advantage in some situations, a disadvantage in others. When the
keys are truly random bits, each key should differ from the others after about
lg N bits, which could be many fewer than the number of bits in the keys.
This is because, in a random situation, we expect each partition to divide the
subfile in half. For example, sorting a file with 1000 records might involve
only examining about ten or eleven bits from each key (even if the keys
are, say, 32-bit keys). On the other hand, notice that all the bits of equal
keys are examined. Radix sorting simply does not work well on files which
contain many equal keys. Radix exchange sort is actually slightly faster than
Quicksort if the keys to be sorted are comprised of truly random bits, but
Quicksort can adapt better to less random situations.
Straight Radix Sort
An alternative radix sorting method is to examine the bits from right to left.
This is the method used by old computer-card-sorting machines: a deck of
cards was run through the machine 80 times, once for each column, proceeding
from right to left. The following example shows how a right-to-left bit-by-bit
radix sort works on our file of sample keys.
A 00001   R 10010   T 10100   X 11000   P 10000   A 00001
S 10011   T 10100   X 11000   P 10000   A 00001   A 00001
O 01111   N 01110   P 10000   A 00001   A 00001   E 00101
R 10010   X 11000   L 01100   I 01001   R 10010   E 00101
T 10100   P 10000   A 00001   A 00001   S 10011   G 00111
I 01001   L 01100   I 01001   R 10010   T 10100   I 01001
N 01110   A 00001   E 00101   S 10011   E 00101   L 01100
G 00111   S 10011   A 00001   T 10100   E 00101   M 01101
E 00101   O 01111   M 01101   L 01100   G 00111   N 01110
X 11000   I 01001   E 00101   E 00101   X 11000   O 01111
A 00001   G 00111   R 10010   M 01101   I 01001   P 10000
M 01101   E 00101   N 01110   E 00101   L 01100   R 10010
P 10000   A 00001   S 10011   N 01110   M 01101   S 10011
L 01100   M 01101   O 01111   O 01111   N 01110   T 10100
E 00101   E 00101   G 00111   G 00111   O 01111   X 11000
The ith column in this table is sorted on the trailing i bits of the keys.
The ith column is derived from the (i-1)st column by extracting all the keys
with a 0 in the ith bit, then all the keys with a 1 in the ith bit.
It's not easy to be convinced that the method works; in fact it doesn't
work at all unless the one-bit partitioning process is stable. Once stability
has been identified as being important, a trivial proof that the method works
can be found: after putting keys with ith bit 0 before those with ith bit 1
(in a stable manner) we know that any two keys appear in proper order (on
the basis of the bits so far examined) in the file either because their ith bits
are different, in which case partitioning puts them in the proper order, or
because their ith bits are the same, in which case they're in proper order
because of stability. The requirement of stability means, for example, that
the partitioning method used in the radix exchange sort can't be used for this
right-to-left sort.
The partitioning is like sorting a file with only two values, and the distribution
counting sort that we looked at in Chapter 8 is entirely appropriate
for this. If we assume that M = 2 in the distribution counting program and
replace a[i] by bits(a[i], k, 1), then that program becomes a method for sorting
the elements of the array a on the bit k positions from the right and putting
the result in a temporary array t. But there's no reason to use M = 2; in
fact we should make M as large as possible, realizing that we need a table of
M counts. This corresponds to using m bits at a time during the sort, with
M = 2^m. Thus, straight radix sort becomes little more than a generalization
of distribution counting sort, as in the following implementation for sorting
a[1..N] on the b rightmost bits:
procedure straightradix(b: integer);
  var i, j, pass: integer;
  begin
  for pass:=0 to (b div m)-1 do
    begin
    for j:=0 to M-1 do count[j]:=0;
    for i:=1 to N do
      count[bits(a[i], pass*m, m)]:=count[bits(a[i], pass*m, m)]+1;
    for j:=1 to M-1 do
      count[j]:=count[j-1]+count[j];
    for i:=N downto 1 do
      begin
      t[count[bits(a[i], pass*m, m)]]:=a[i];
      count[bits(a[i], pass*m, m)]:=count[bits(a[i], pass*m, m)]-1
      end;
    for i:=1 to N do a[i]:=t[i];
    end
  end;
For clarity, this procedure uses two calls on bits to increment and decrement
count, when one would suffice. Also, the correspondence M = 2^m has been
preserved in the variable names, though some versions of Pascal can't tell
the difference between m and M.
The procedure above works properly only if b is a multiple of m. Normally,
this is not a particularly restrictive assumption for radix sort: it simply corresponds
to dividing the keys to be sorted into an integral number of equal
size pieces. When m=b we have distribution counting sort; when m=1 we
have straight radix sort, the right-to-left bit-by-bit radix sort described in the
example above.
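To make these conventions concrete, the global context that straightradix
assumes might be declared along the following lines (the particular values
m=4, M=16 and maxN=1000 are only for illustration and are not from the text):
const m=4; M=16; maxN=1000;           { M = 2^m }
var a: array[0..maxN] of integer;     { keys to be sorted are in a[1..N]; a[0] is left free for a sentinel }
    t: array[1..maxN] of integer;     { temporary array used by each distribution counting pass }
    count: array[0..15] of integer;   { one counter for each possible m-bit digit, 0..M-1 }
    N: integer;
With declarations along these lines, the call straightradix(12) would sort
a[1..N] on its rightmost 12 bits in 12 div 4 = 3 distribution counting passes.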
The implementation above moves the file from a to t during each distribution
counting phase, then back to a in a simple loop. This "array copy"
loop could be eliminated if desired by making two copies of the distribution
counting code, one to sort from a into t, the other to sort from t into a.
A Linear Sort
The straight radix sort implementation given in the previous section makes
b/m passes through the file. By making m large, we get a very efficient sorting
method, as long as we have M = 2^m words of memory available. A reasonable
choice is to make m about one-fourth the word size (b/4), so that the radix
sort is four distribution counting passes. The keys are treated as base-M
numbers, and each (base-M) digit of each key is examined, but there are
only four digits per key. (This directly corresponds with the architectural
organization of many computers: one typical organization is to have 32-bit
words, each consisting of four 8-bit bytes. The bits procedure then winds up
extracting particular bytes from words in this case, which obviously can be
done very efficiently on such computers.) Now, each distribution counting
pass is linear, and since there are only four of them, the entire sort is linear,
certainly the best performance we could hope for in a sort.
In fact, it turns out that we can get by with only two distribution counting
passes. (Even a careful reader is likely to have difficulty telling right from
left by this time, so some caution is called for in trying to understand this
method.) This can be achieved by taking advantage of the fact that the file
will be almost sorted if only the leading b/2 bits of the b-bit keys are used. As
with Quicksort, the sort can be completed efficiently by using insertion sort
on the whole file afterwards. This method is obviously a trivial modification
to the implementation above: to do a right-to-left sort using the leading half
of the keys, we simply start the outer loop at pass=b div (2*m) rather than
pass=0. Then a conventional insertion sort can be used on the nearly-ordered
file that results. To become convinced that a file sorted on its leading bits
is quite well-ordered, the reader should examine the first few columns of the
table for radix exchange sort above. For example, insertion sort run on the
file sorted on the first three bits would require only six exchanges.
Using two distribution counting passes (with m about one-fourth the word
size), then using insertion sort to finish the job will yield a sorting method
that is likely to run faster than any of the others that we've seen for large files
whose keys are random bits. Its main disadvantage is that it requires an extra
array of the same size as the array being sorted. It is possible to eliminate
the extra array using linked-list techniques, but extra space proportional to
N (for the links) is still required.
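A sketch of this two-pass arrangement, under the assumption that b is a
multiple of 2*m and that the globals a, t, count, N, m and M are declared as
for straightradix (the names radixleadinghalf and insertionfinish are
introduced here only for the illustration): the first procedure is simply
straightradix with its outer loop started halfway along, and the second is an
ordinary insertion sort used to finish the nearly-ordered file.
procedure radixleadinghalf(b: integer);
  var i, j, pass: integer;
  begin
  for pass:=b div (2*m) to (b div m)-1 do   { only the passes covering the leading b/2 bits }
    begin
    for j:=0 to M-1 do count[j]:=0;
    for i:=1 to N do
      count[bits(a[i], pass*m, m)]:=count[bits(a[i], pass*m, m)]+1;
    for j:=1 to M-1 do count[j]:=count[j-1]+count[j];
    for i:=N downto 1 do
      begin
      t[count[bits(a[i], pass*m, m)]]:=a[i];
      count[bits(a[i], pass*m, m)]:=count[bits(a[i], pass*m, m)]-1
      end;
    for i:=1 to N do a[i]:=t[i]
    end
  end;
procedure insertionfinish;
  var i, j, v: integer;
  begin
  a[0]:=-maxint;                 { sentinel in a[0], in the style of Chapter 8 }
  for i:=2 to N do
    begin
    v:=a[i]; j:=i;
    while a[j-1]>v do
      begin a[j]:=a[j-1]; j:=j-1 end;
    a[j]:=v
    end
  end;
With m about b/4, the call radixleadinghalf(b) followed by insertionfinish is
the method just described: two distribution counting passes and one insertion
sort pass.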
A linear sort is obviously desirable for many applications, but there are
reasons why it is not the panacea that it might seem. First, it really does
depend on the keys being random bits, randomly ordered. If this condition is
not satisfied, severely degraded performance is likely. Second, it requires extra
space proportional to the size of the array being sorted. Third, the "inner loop"
of the program actually contains quite a few instructions, so even though it's
linear, it won't be as much faster than Quicksort (say) as one might expect,
except for quite large files (at which point the extra array becomes a real
liability). The choice between Quicksort and radix sort is a difficult one
that is likely to depend not only on features of the application such as key,
record, and file size, but also on features of the programming and machine
environment that relate to the efficiency of access and use of individual bits.
Again, such tradeoffs need to be studied by an expert and this type of study
is likely to be worthwhile only for serious sorting applications.
Exercises
1. Compare the number of exchanges used by radix exchange sort with
the number of exchanges used by Quicksort for the file 001,011,101,110,
000,001,010,111,110,010.
2. Why is it not as important to remove the recursion from the radix exchange
sort as it was for Quicksort?
3. Modify radix exchange sort to skip leading bits which are identical on all
keys. In what situations would this be worthwhile?
4. True or false: the running time of straight radix sort does not depend on
the order of the keys in the input file. Explain your answer.
5. Which method is likely to be faster for a file of all equal keys: radix
exchange sort or straight radix sort?
6. True or false: both radix exchange sort and straight radix sort examine
all the bits of all the keys in the file. Explain your answer.
7. Aside from the extra memory requirement, what is the major disadvantage
to the strategy of doing straight radix sorting on the leading
bits of the keys, then cleaning up with insertion sort afterwards?
8. Exactly how much memory is required to do a 4-pass straight radix sort
of N b-bit keys?
9. What type of input file will make radix exchange sort run the most slowly
(for very large N)?
10. Empirically compare straight radix sort with radix exchange sort for a
random file of 1000 32-bit keys.
11. Priority Queues
In many applications, records with keys must be processed in order,
but not necessarily in full sorted order and not necessarily all at once.
Often a set of records must be collected, then the largest processed, then
perhaps more records collected, then the next largest processed, and so forth.
An appropriate data structure in such an environment is one which supports
the operations of inserting a new element and deleting the largest element.
This can be contrasted with queues (delete the oldest) and stacks (delete the
newest). Such a data structure is called a priority queue. In fact, the priority
queue might be thought of as a generalization of the stack and the queue (and
other simple data structures), since these data structures can be implemented
with priority queues, using appropriate priority assignments.
Applications of priority queues include simulation systems (where the
keys might correspond to "event times" which must be processed in order),
job scheduling in computer systems (where the keys might correspond to
"priorities" which indicate which users should be processed first), and numerical
computations (where the keys might be computational errors, so the largest
can be worked on first).
Later on in this book, we'll see how to use priority queues as basic
building blocks for more advanced algorithms. In Chapter 22, we'll develop a
file compression algorithm using routines from this chapter, and in Chapters
31 and 33, we'll see how priority queues can serve as the basis for several
fundamental graph searching algorithms. These are but a few examples of
the important role served by the priority queue as a basic tool in algorithm
design.
It is useful to be somewhat more precise about how a priority queue will
be manipulated, since there are several operations we may need to perform
on priority queues in order to maintain them and use them effectively for
applications such as those mentioned above. Indeed, the main reason that
priority queues are so useful is their flexibility in allowing a variety of different
operations to be efficiently performed on sets of records with keys. We want to
build and maintain a data structure containing records with numerical keys
(priorities), supporting some of the following operations:
Construct a priority queue from N given items.
Insert a new item.
Remove the largest item.
Replace the largest item with a new item (unless the new item is larger).
Change the priority of an item.
Delete an arbitrary specified item.
Join two priority queues into one large one.
(If records can have duplicate keys, we take "largest" to mean "any record
with the largest key value.")
The replace operation is almost equivalent to an insert followed by a
remove (the difference being that the insert/remove requires the priority queue
to grow temporarily by one element). Note that this is quite different from
doing a remove followed by an insert. This is included as a separate capability
because, as we will see, some implementations of priority queues can do the
replace operation quite efficiently. Similarly, the change operation could be
implemented as a delete followed by an insert and the construct could be implemented
with repeated uses of the insert operation, but these operations can be
directly implemented more efficiently for some choices of data structure. The
join operation requires quite advanced data structures for efficient implementation;
we'll concentrate instead on a "classical" data structure, called a heap,
which allows efficient implementations of the first five operations.
The priority queue as described above is an excellent example of an
abstract data structure: it is very well defined in terms of the operations
performed on it, independent of the way the data is organized and processed
in any particular implementation. The basic premise of an abstract data
structure is that nothing outside of the definitions of the data structure
and the algorithms operating on it should refer to anything inside, except
through function and procedure calls for the fundamental operations. The
main motivation for the development of abstract data structures has been
as a mechanism for organizing large programs. They provide a way to limit
the size and complexity of the interface between (potentially complicated)
algorithms and associated data structures and (a potentially large number
of) programs which use the algorithms and data structures. This makes it
easier to understand the large program, and makes it more convenient to
change or improve the fundamental algorithms. For example, in the present
context, there are several methods for implementing the various operations
listed above that can have quite different performance characteristics. Defining
priority queues in terms of operations on an abstract data structure provides
the flexibility necessary to allow experimentation with various alternatives.
Different implementations of priority queues involve different performance
characteristics for the various operations to be performed, leading to cost
tradeoffs. Indeed, performance differences are really the only differences allowed
by the abstract data structure concept. First, we'll illustrate this point
by examining a few elementary data structures for implementing priority
queues. Next, we'll examine a more advanced data structure, and then show
how the various operations can be implemented efficiently using this data
structure. Also, we'll examine an important sorting algorithm that follows
naturally from these implementations.
Elementary Implementations
One way to organize a priority queue is as an unordered list, simply keeping
the items in an array a[1..N] without paying attention to the keys. Thus
construct is a "no-op" for this organization. To insert simply increment N and
put the new item into a[N], a constant-time operation. But replace requires
scanning through the array to find the element with the largest key, which
takes linear time (all the elements in the array must be examined). Then
remove can be implemented by exchanging a[N] with the element with the
largest key and decrementing N.
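A minimal sketch of the insert and remove operations for this unordered-array
organization (the names arrayinsert and arrayremove are used here only to keep
them distinct from the heap routines given later in the chapter; a and N are
the global array and counter just described):
procedure arrayinsert(v: integer);
  begin
  N:=N+1; a[N]:=v               { constant time: just append the new item }
  end;
function arrayremove: integer;
  var j, x: integer;
  begin
  x:=1;
  for j:=2 to N do              { linear scan to find the largest key }
    if a[j]>a[x] then x:=j;
  arrayremove:=a[x];
  a[x]:=a[N]; N:=N-1            { fill the hole with the last item }
  end;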
Another organization is to use a sorted list, again using an array a[1..N]
but keeping the items in increasing order of their keys. Now remove simply
involves returning a[N] and decrementing N (constant time), but insert involves
moving larger elements in the array right one position, which could
take linear time.
Linked lists could also be used for the unordered list or the sorted list.
This wouldn't change the fundamental performance characteristics for insert,
remove, or replace, but it would make it possible to do delete and join in
constant time.
Any priority queue algorithm can be turned into a sorting algorithm by
successively using insert to build a priority queue containing all the items to be
sorted, then successively using remove to empty the priority queue, receiving
the items in reverse order. Using a priority queue represented as an unordered
list in this way corresponds to selection sort; using the sorted list corresponds
to insertion sort.
As usual, it is wise to keep these simple implementations in mind because
they can outperform more complicated methods in many practical situations.
For example, the first method might be appropriate in an application where
only a few "remove largest" operations are performed as opposed to a large
number of insertions, while the second method would be appropriate if the
items inserted always tended to be close to the largest element in the priority
queue. Implementations of methods similar to these for the searching problem
(find a record with a given key) are given in Chapter 14.
Heap Data Structure
The data structure that we'll use to support the priority queue operations
involves storing the records in an array in such a way that each key is
guaranteed to be larger than the keys at two other specific positions. In turn,
each of those keys must be larger than two more keys, and so forth. This
ordering is very easy to see if we draw the array in a two-dimensional "tree"
structure with lines down from each key to the two keys known to be smaller.
This structure is called a "complete binary tree": place one node (called the
root), then, proceeding down the page and from left to right, connect two nodes
beneath each node on the previous level until N nodes have been placed. The
nodes below each node are called its sons; the node above each node is called
its father. (We'll see other kinds of "binary trees" and "trees" in Chapter 14
and later chapters of this book.) Now, we want the keys in the tree to satisfy
the heap condition: the key in each node should be larger than (or equal to)
the keys in its sons (if it has any). Note that this implies in particular that
the largest key is in the root.
We can represent complete binary trees sequentially within an array by
simply putting the root at position 1, its sons at positions 2 and 3, the nodes at
the next level in positions 4, 5, 6 and 7, etc., as numbered in the diagram above.
For example, the array representation for the tree above is the following:
1 2 3 4 5 6 7 8 9 10 11 12
X T O G S M N A E R A I
This natural representation is useful because it is very easy to get from a
node to its father and sons. The father of the node in position j is in position
j div 2, and, conversely, the two sons of the node in position j are in position
2j and 2j + 1. This makes traversal of such a tree even easier than if the tree
were implemented with a standard linked representation (with each element
containing a pointer to its father and sons). The rigid structure of complete
binary trees represented as arrays does limit their utility as data structures,
but there is just enough flexibility to allow the implementation of efficient
priority queue algorithms. A heap is a complete binary tree, represented as
an array, in which every node satisfies the heap condition. In particular, the
largest key is always in the first position in the array.
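The index arithmetic just described is simple enough to be used inline, as the
programs below do, but for reference it can be written out as three one-line
functions (these helper names are introduced here and are not used elsewhere
in the code):
function father(j: integer): integer;
  begin father:=j div 2 end;
function leftson(j: integer): integer;
  begin leftson:=2*j end;
function rightson(j: integer): integer;
  begin rightson:=2*j+1 end;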
All of the algorithms operate along some path from the root to the bottom
of the heap (just moving from father to son or from son to father). It is easy
to see that, in a heap of N nodes, all paths have about lg N nodes on them.
(There are about N/2 nodes on the bottom, N/4 nodes with sons on the
bottom, N/8 nodes with grandsons on the bottom, etc. Each "generation"
has about half as many nodes as the next, which implies that there can be
at most lg N generations.) Thus all of the priority queue operations (except
join) can be done in logarithmic time using heaps.
Algorithms on Heaps
The priority queue algorithms on heaps all work by first making a simple
structural modification which could violate the heap condition, then traveling
through the heap modifying it to ensure that the heap condition is satisfied
everywhere. Some of the algorithms travel through the heap from bottom to
top, others from top to bottom. In all of the algorithms, we'll assume that
the records are one-word integer keys stored in an array a of some maximum
size, with the current size of the heap kept in an integer N. Note that N is as
much a part of the definition of the heap as the keys and records themselves.
To be able to build a heap, it is necessary first to implement the insert
operation. Since this operation will increase the size of the heap by one, N
must be incremented. Then the record to be inserted is put into a[N], but
this may violate the heap property. If the heap property is violated (the new
node is greater than its father), then the violation can be fixed by exchanging
the new node with its father. This may, in turn, cause a violation, and thus
can be fixed in the same way. For example, if P is to be inserted in the heap
above, it is first stored in a[N] as the right son of M. Then, since it is greater
than M, it is exchanged with M, and since it is greater than O, it is exchanged
with O, and the process terminates since it is less than X. The following heap
results:
The code for this method is straightforward. In the following implementation,
insert adds a new item to a[N], then calls upheap to fix the heap condition
violation at N:
procedure upheap(k: integer);
  var v: integer;
  begin
  v:=a[k]; a[0]:=maxint;
  while a[k div 2]<=v do
    begin a[k]:=a[k div 2]; k:=k div 2 end;
  a[k]:=v;
  end;
procedure insert(v: integer);
  begin
  N:=N+1; a[N]:=v;
  upheap(N)
  end;
As with insertion sort, it is not necessary to do a full exchange within the
loop, because v is always involved in the exchanges. A sentinel key must be
put in a[0] to stop the loop for the case that v is greater than all the keys in
the heap.
The replace operation involves replacing the key at the root with a new
key, then moving down the heap from top to bottom to restore the heap
condition. For example, if the X in the heap above is to be replaced with
C, the first step is to store C at the root. This violates the heap condition,
but the violation can be fixed by exchanging C with T, the larger of the two
sons of the root. This creates a violation at the next level, which can be fixed
again by exchanging C with the larger of its two sons (in this case S). The
process continues until the heap condition is no longer violated at the node
occupied by C. In the example, C makes it all the way to the bottom of the
heap, leaving:
The "remove the largest" operation involves almost the same process.
Since the heap will be one element smaller after the operation, it is necessary
to decrement N, leaving no place for the element that was stored in the last
position. But the largest element is to be removed, so the remove operation
amounts to a replace, using the element that was in a[N]. For example, the
following heap results from removing the T from the heap above:
The implementation of these procedures is centered around the operation
of fixing up a heap which satisfies the heap condition everywhere except
possibly at the root. The same operation can be used to fix up the heap
after the value in any position is lowered. It may be implemented as follows:
procedure downheap(k: integer);
  label 0;
  var i, j, v: integer;
  begin
  v:=a[k];
  while k<=N div 2 do
    begin
    j:=k+k;
    if j<N then if a[j]<a[j+1] then j:=j+1;
    if v>=a[j] then goto 0;
    a[k]:=a[j]; k:=j;
    end;
  0: a[k]:=v
  end;
This procedure moves down the heap (starting from position k), exchanging
the node at position k with the larger of its two sons if necessary, stopping when
the node at k is larger than both sons or the bottom is reached. As above, a full exchange
is not needed because v is always involved in the exchanges. The inner loop in
this program is an example of a loop which really has two distinct exits: one
for the case that the bottom of the heap is hit (as in the first example above),
and another for the case that the heap condition is satisfied somewhere in the
interior of the heap.
Now the implementation of the remove operation is simple:
function remove: integer;
  begin
  remove:=a[1];
  a[1]:=a[N]; N:=N-1;
  downheap(1);
  end;
The return value is set from a[1], then the element from a[N] is put into a[1]
and the size of the heap decremented, leaving only a call to downheap to fix
up the heap condition everywhere.
The implementation of the replace operation is only slightly more complicated:
function replace(v: integer): integer;
  begin
  a[0]:=v;
  downheap(0);
  replace:=a[0];
  end;
This code uses a[0] in an artificial way: its sons are 0 (itself) and 1, so if v is
larger than the largest element in the heap, the heap is not touched; otherwise
v is put into the heap and the old value of a[1] is returned.
The delete operation for an arbitrary element from the heap and the
change operation can also be implemented by using a simple combination of
the methods above. For example, if the priority of the element at position k
is raised, then upheap can be called, and if it is lowered then downheap
does the job. On the other hand, the join operation is far more difficult and
seems to require a much more sophisticated data structure.
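For example, a change operation along these lines might be coded as follows
(this is a sketch based on the description above, not code from the text); k is
the heap position of the item whose priority is to become v:
procedure change(k, v: integer);
  begin
  if v>a[k]
    then begin a[k]:=v; upheap(k) end     { priority raised: move it up if necessary }
    else begin a[k]:=v; downheap(k) end   { priority lowered (or unchanged): sift it down }
  end;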
All of the basic operations insert, remove, replace, (downheap and upheap),
delete, and change involve moving along a path between the root and the bottom
of the heap, which includes no more than about log N elements for a heap
of size N. Thus the running times of the above programs are logarithmic.
Heapsort
An elegant and efficient sorting method can be defined from the basic operations
on heaps outlined above. This method, called Heapsort, uses no extra
memory and is guaranteed to sort M elements in about M log M steps no
matter what the input. Unfortunately, its inner loop is quite a bit longer
than the inner loop of Quicksort, and it is about twice as slow as Quicksort
on the average.
The idea is simply to build a heap containing the elements to be sorted
and then to remove them all in order. In this section, N will continue to be
the size of the heap, so we will use M for the number of elements to be sorted.
One way to sort is to implement the construct operation by doing M insert
operations, as in the first two lines of the following code, then do M remove
operations, putting the element removed into the place just vacated by the
shrinking heap:
N:=0;
for k:=1 to M do insert(a[k]);
for k:=M downto 1 do a[k]:=remove;
This code breaks all the rules of abstract data structures by assuming a particular
representation for the priority queue (during each loop, the priority
queue resides in a[1],. . . ,a[k-1]), but it is reasonable to do this here because
we are implementing a sort, not a priority queue. The priority queue procedures
are being used only for descriptive purposes: in an actual implementation
of the sort, we would simply use the code from the procedures to avoid
doing so many unnecessary procedure calls.
It is actually a little better to build the heap by going backwards through
it, making little heaps from the bottom up. Note that every position in the
array is the root of a small heap, and downheap will work equally well for
such small heaps as for the big heap. Also, we've noted that remove can be
implemented by exchanging the first and last elements, decrementing N, and
calling downheap(1). This leads to the following implementation of Heapsort:
procedure heapsort;
  var k, t: integer;
  begin
  N:=M;
  for k:=M div 2 downto 1 do downheap(k);
  repeat
    t:=a[1]; a[1]:=a[N]; a[N]:=t;
    N:=N-1; downheap(1)
  until N<=1;
  end;
The first two lines of this code constitute an implementation of construct(M:
integer) to build a heap of M elements. (The keys in a[(M div 2)+1..M] each
form heaps of one element, so they trivially satisfy the heap condition and
don't need to be checked.) It is interesting to note that, though the loops in
this program seem to do very different things, they can be built around the
same fundamental procedure.
The following table shows the contents of each heap operated on by
downheap for our sorting example, just after downheap has made the heap
condition hold everywhere.
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
                    N                    L  E
                 P                 M  I
              X              T  A
           R           G  E
        P        O  N              M  I  L  E
     X     R  T        G  E  S  A
  X  T  P  R  S  O  N  G  E  A  A  M  I  L  E
  T  S  P  R  E  O  N  G  E  A  A  M  I  L
  S  R  P  L  E  O  N  G  E  A  A  M  I
  R  L  P  I  E  O  N  G  E  A  A  M
  P  L  O  I  E  M  N  G  E  A  A
  O  L  N  I  E  M  A  G  E  A
  N  L  M  I  E  A  A  G  E
  M  L  E  I  E  A  A  G
  L  I  E  G  E  A  A
  I  G  E  A  E  A
  G  E  E  A  A
  E  A  E  A
  E  A  A
  A  A
  A
  A  A  E  E  G  I  L  M  N  O  P  R  S  T  X
As mentioned above, the primary reason that Heapsort is of practical
interest is that the number of steps required to sort M elements is guaranteed
to be proportional to M log M, no matter what the input. Unlike the other
methods that we've seen, there is no "worst-case" input that will make Heapsort
run slower. The proof of this is simple: we make about 3M/2 calls to
downheap (about M/2 to construct the heap and M for the sort), each of
which examines less than log M heap elements, since the heap never has more
than M elements.
Actually, the above proof uses an overestimate. In fact, it can be proven
that the construction process takes linear time since so many small heaps are
processed. This is not of particular importance to Heapsort, since this time
is still dominated by the M log M time for sorting, but it is important for
other priority queue applications, where a linear time construct can lead to
a linear time algorithm. Note that constructing a heap with M successive
inserts requires M log M steps in the worst case (though it turns out to be
linear on the average).
Indirect Heaps
For many applications of priority queues, we don't want the records moved
around at all. Instead, we want the priority queue routine to tell us which
of the records is the largest, etc., instead of returning values. This is akin
to the "indirect sort" or the "pointer sort" concept described at the beginning
of Chapter 8. Modification of the above programs to work in this way
is straightforward, though sometimes confusing. It will be worthwhile to examine
this in more detail here because it is so convenient to use heaps in this
way.
Specifically, instead of rearranging the keys in the array a the priority
queue routines will work with an array heap of indices into the array a, such
that a[heap[k]] is the key of the kth element of the heap, for k between 1 and
N. Moreover, we want to maintain another array inv which keeps the heap
position of the kth array element. Thus the inv entry for the largest element
in the array is 1, etc. For example, if we wished to change the value of a[k]
we could find its heap position in inv[k], for use by upheap or downheap. The
following table gives the values in these arrays for our sample heap:
         k:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      a[k]:  A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
   heap[k]: 10  5 13  4  2  3  7  8  9  1 11 12  6 14 15
a[heap[k]]:  X  T  P  R  S  O  N  G  E  A  A  M  I  L  E
    inv[k]: 10  5  6  4  2 13  7  8  9  1 11 12  3 14 15
Note that heap[inv[k]]=inv[heap[k]]=k for all k from 1 to N.
We start with heap[k]=inv[k]=k for k from 1 to N, which indicates that
no rearrangement has been done. The code for heap construction looks much
the same as before:
procedure pqconstruct;
  var k: integer;
  begin
  N:=M;
  for k:=1 to N do
    begin heap[k]:=k; inv[k]:=k end;
  for k:=M div 2 downto 1 do pqdownheap(k);
  end;
We'll prefix implementations of priority queue routines based on indirect heaps
with "pq" for identification when they are used in later chapters.
Now, to modify downheap to work indirectly, we need only examine the
places where it references a. Where it did a comparison before, it must now
access a indirectly through heap. Where it did a move before, it must now
make the move in heap, not a, and it must modify inv accordingly. This leads
to the following implementation:
procedure pqdownheap(k: integer);
  label 0;
  var j, v: integer;
  begin
  v:=heap[k];
  while k<=N div 2 do
    begin
    j:=k+k;
    if j<N then if a[heap[j]]<a[heap[j+1]] then j:=j+1;
    if a[v]>=a[heap[j]] then goto 0;
    heap[k]:=heap[j]; inv[heap[j]]:=k; k:=j;
    end;
  0: heap[k]:=v; inv[v]:=k
  end;
The other procedures given above can be modified in a similar fashion to
implement "pqinsert," "pqchange," etc.
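For example, upheap and insert might be adapted as follows (a sketch in the
spirit of pqdownheap above, not necessarily the versions used in later
chapters). The argument to pqinsert is an index into a, and a[0] together with
heap[0] plays the role that the sentinel a[0] plays for upheap, so heap is
assumed to be declared with a zeroth position:
procedure pqupheap(k: integer);
  var v: integer;
  begin
  v:=heap[k];
  a[0]:=maxint; heap[0]:=0;                  { sentinel stops the loop at the root }
  while a[heap[k div 2]]<=a[v] do
    begin
    heap[k]:=heap[k div 2]; inv[heap[k]]:=k; { move the father's index down }
    k:=k div 2
    end;
  heap[k]:=v; inv[v]:=k
  end;
procedure pqinsert(j: integer);
  begin
  N:=N+1; heap[N]:=j; inv[j]:=N;   { j is the index in a of the new record }
  pqupheap(N)
  end;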
A similar indirect implementation can be developed based on maintaining
heap as an array of pointers to separately allocated records. In this case, a
little more work is required to implement the function of inv (find the heap
position, given the record).
Advanced Implementations
If the join operation must be done efficiently, then the implementations that
we have done so far are insufficient and more advanced techniques are needed.
Although we don't have space here to go into the details of such methods, we
can discuss some of the considerations that go into their design.
By "efficiently," we mean that a join should be done in about the same
time as the other operations. This immediately rules out the linkless representation
for heaps that we have been using, since two large heaps can be
joined only by moving all the elements in at least one of them to a large
array. It is easy to translate the algorithms we have been examining to use
linked representations; in fact, sometimes there are other reasons for doing
so (for example, it might be inconvenient to have a large contiguous array).
In a direct linked representation, links would have to be kept in each node
pointing to the father and both sons.
It turns out that the heap condition itself seems to be too strong to allow
efficient implementation of the join operation. The advanced data structures
designed to solve this problem all weaken either the heap or the balance
condition in order to gain the flexibility needed for the join. These structures
allow all the operations to be completed in logarithmic time.
Exercises
1. Draw the heap that results when the following operations are performed
on an initially empty heap: insert(10), insert(5), insert(2), replace(4),
insert(6), insert(8), remove, insert(7), insert(3).
2. Is a file in reverse sorted order a heap?
3. Give the heap constructed by successive application of insert on the keys
E A S Y Q U E S T I O N.
4. Which positions could be occupied by the 3rd largest key in a heap of
size 32? Which positions could not be occupied by the 3rd smallest key
in a heap of size 32?
5. Why not use a sentinel to avoid the j<N test in downheap?
6. Show how to obtain the functions of stacks and normal queues as special
cases of priority queues.
7. What is the minimum number of keys that must be moved during a
remove the largest operation in a heap? Draw a heap of size 15 for which
the minimum is achieved.
8. Write a program to delete the element at position d in a heap.
9. Empirically compare the two methods of heap construction described in
the text, by building heaps with 1000 random keys.
10. Give the contents of inv after pqconstruct is used on the keys E A S Y Q
U E S T I O N.
12. Selection and Merging
Sorting programs are often used for applications in which a full sort is
not necessary. Two important operations which are similar to sorting
but can be done much more efficiently are selection, finding the kth smallest
element (or finding the k smallest elements) in a file, and merging, combining
two sorted files to make one larger sorted file. Selection and merging are
intimately related to sorting, as we'll see, and they have wide applicability in
their own right.
An example of selection is the process of finding the median of a set of
numbers, say student test scores. An example of a situation where merging
might be useful is to find such a statistic for a large class where the scores are
divided up into a number of individually sorted sections.
Selection and merging are complementary operations in the sense that
selection splits a file into two independent files and merging joins two independent
files to make one file. The relationship between these operations also
becomes evident if one tries to apply the "divide-and-conquer" paradigm to
create a sorting method. The file can either be rearranged so that when two
parts are sorted the whole file is sorted, or broken into two parts to be sorted
and then combined to make the sorted whole file. We've already seen what
happens in the first instance: that's Quicksort, which consists basically of a
selection procedure followed by two recursive calls. Below, we'll look at mergesort,
Quicksort's complement in that it consists basically of two recursive
calls followed by a merging procedure.
Both selection and merging are easier than sorting in the sense that their
running time is essentially linear: the programs take time proportional to N
when operating on N items. But available methods are not perfect in either
case: the only known ways to merge in place (without using extra space)
are too complex to be reduced to practical programs, as are the only known
selection methods which are guaranteed to be linear even in the worst case.
Selection
Selection has many applications in the processing of experimental and other
data. The most prominent use is the special case mentioned above of finding
the median element of a file: that item which is greater than half the items
in the file and smaller than half the items in the file. The use of the median
and other order statistics to divide a file up into smaller percentile groups is
very common. Often only a small part of a large file is to be saved for further
processing; in such cases, a program which can select, say, the top ten percent
of the elements of the file might be more appropriate than a full sort.
An algorithm for selection must find the kth smallest item out of a file of
N items. Since an algorithm cannot guarantee that a particular item is the
kth smallest without having examined and identified the k-1 items which are
smaller and the N - k elements which are larger, most selection algorithms
can return all of the k smallest elements of a file without a great deal of extra
calculation.
We've already seen two algorithms which are suitable for direct adaptation
to selection methods. If k is very small, then selection sort will work very
well, requiring time proportional to Nk: first find the smallest element, then
find the second smallest by finding the smallest among the remaining items,
etc. For slightly larger k, priority queues provide a selection mechanism: first
insert k items, then replace the largest N - k times using the remaining items,
leaving the k smallest items in the priority queue. If heaps are used to implement
the priority queue, everything can be done in place, with an approximate
running time proportional to N log k.
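A sketch of this approach, using the insert and replace heap routines of
Chapter 11 (which maintain the heap in the global array a[1..N]). The keys to
be processed are assumed, only for this illustration, to be in a separate
array b[1..M]; when the procedure finishes, the k smallest of them are the
contents of a[1..k], with the kth smallest at the root a[1].
procedure smallestk(k: integer);
  var i, junk: integer;
  begin
  N:=0;
  for i:=1 to k do insert(b[i]);   { the heap holds the k smallest keys seen so far }
  for i:=k+1 to M do
    junk:=replace(b[i])            { ejects the current largest whenever b[i] is smaller }
  end;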
An interesting method which will run much faster on the average can be
formulated from the partitioning procedure used in Quicksort. Recall that
Quicksort's partitioning method rearranges an array a[l..N] and returns an
integer i such that a[l],. . . ,a[i-l] are less than or equal to a[i] and a[i+l],. . . ,
a[N] are greater than or equal to a[i]. If we're looking for the kth smallest
element in the file, and we're fortunate enough to have k=i, then we're done.
Otherwise, if k<i then we need to look for the kth smallest element in the
left subfile, and if k>i then we need to look for the (k-i)th smallest element
in the right subfile. This leads immediately to the recursive formulation:
procedure select(l, r, k: integer);
  var i: integer;
  begin
  if r>l then
    begin
    i:=partition(l, r);
    if i>l+k-1 then select(l, i-1, k);
    if i<l+k-1 then select(i+1, r, k-i);
    end
  end;
This procedure rearranges the array so that a[l],. . . ,a[k-1] are less than or
equal to a[k] and a[k+1],. . . ,a[r] are greater than or equal to a[k]. For
example, the call select(1, N, (N+1) div 2) partitions the array on its median
value. For the keys in our sorting example, this program uses only three
recursive calls to find the median, as shown in the following table:
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
  A  A  E  E  T  I  N  G  O  X  S  M  P  L  R
              L  I  N  G  O  P  M  R  X  T  S
              L  I  G  M  O  P  N
                       M
The file is rearranged so that the median is in place with all smaller elements
to the left and all larger elements to the right (and equal elements on either
side), but it is not fully sorted.
Since the select procedure always ends with only one call on itself, it is not
really recursive in the sense that no stack is needed to remove the recursion:
when the time comes for the recursive call, we can simply reset the parameters
and go back to the beginning, since there is nothing more to do.
procedure select(k: integer);
  var v, t, i, j, l, r: integer;
  begin
  l:=1; r:=N;
  while r>l do
    begin
    v:=a[r]; i:=l-1; j:=r;
    repeat
      repeat i:=i+1 until a[i]>=v;
      repeat j:=j-1 until a[j]<=v;
      t:=a[i]; a[i]:=a[j]; a[j]:=t;
    until j<=i;
    a[j]:=a[i]; a[i]:=a[r]; a[r]:=t;
    if i>=k then r:=i-1;
    if i<=k then l:=i+1;
    end;
  end;
We use the same partitioning procedure as in Quicksort: as with Quicksort,
this could be changed slightly if many equal keys are expected. Note that in
this non-recursive program, we've eliminated the simple calculations involving
k.
This method has about the same worst case as Quicksort: using it to
find the smallest element in an already sorted file would result in a quadratic
running time. It is probably worthwhile to use an arbitrary or a random
partitioning element (but not the median-of-three: for example, if the smallest
element is sought, we probably don't want the file split near the middle). The
average running time is proportional to about N + k log(N/k), which is linear
for any allowed value of k.
It is possible to modify this Quicksort-based selection procedure so that its
running time is guaranteed to be linear. These modifications, while important
from a theoretical standpoint, are extremely complex and not at all practical.
Merging
It is common in many data processing environments to maintain a large
(sorted) data file to which new entries are regularly added. Typically, a
number of new entries are "batched," appended to the (much larger) main
file, and the whole thing resorted. This situation is tailor-made for merging: a
much better strategy is to sort the (small) batch of new entries, then merge it
with the large main file. Merging has many other similar applications which
make it worthwhile to study. Also, we'll examine a sorting method based on
merging.
In this chapter we'll concentrate on programs for two-way merging: programs
which combine two sorted input files to make one sorted output file. In
the next chapter, we'll look in more detail at multiway merging, when more
than two files are involved. (The most important application of multiway
merging is external sorting, the subject of that chapter.)
To begin, suppose that we have two sorted arrays a[1..M] and b[1..N] of
integers which we wish to merge into a third array c[1..M+N]. The following
is a direct implementation of the obvious method of successively choosing for
c the smallest remaining element from a and b:
i:=1; j:=1;
a[M+1]:=maxint; b[N+1]:=maxint;
for k:=1 to M+N do
  if a[i]<b[j]
    then begin c[k]:=a[i]; i:=i+1 end
    else begin c[k]:=b[j]; j:=j+1 end;
The implementation is simplified by making room in the a and b arrays for
sentinel keys with values larger than all the other keys. When the a (b) array
is exhausted, the loop simply moves the rest of the b (a) array into the c array.
The time taken by this method is obviously proportional to M+N.
The above implementation uses extra space proportional to the size of the
merge. It would be desirable to have an in-place method which uses c[1..M]
for one input and c[M+1..M+N] for the other. While such methods exist,
they are so complicated that an (N + M) log(N + M) in-place sort would be
more efficient for practical values of N and M.
Since extra space appears to be required for a practical implementation,
we might as well consider a linked-list implementation. In fact, this method is
very well suited to linked lists. A full implementation which illustrates all the
conventions we'll use is given below; note that the code for the actual merge
is just about as simple as the code above:
program listmerge(input, output);
  type link=^node;
       node=record k: integer; next: link end;
  var N, M: integer; z: link;
  function merge(a, b: link): link;
    var c: link;
    begin
    c:=z;
    repeat
      if a^.k<=b^.k
        then begin c^.next:=a; c:=a; a:=a^.next end
        else begin c^.next:=b; c:=b; b:=b^.next end
    until c^.k=maxint;
    merge:=z^.next; z^.next:=z
    end;
  begin
  readln(N, M);
  new(z); z^.k:=maxint; z^.next:=z;
  writelist(merge(readlist(N), readlist(M)))
  end.
This program merges the list pointed to by a with the list pointed to by b,
with the help of an auxiliary pointer c. The lists are initially built with the
readlist routine from Chapter 2. All lists are defined to end with the dummy
node z, which normally points to itself, and also serves as a sentinel. During
the merge, z points to the beginning of the newly merged list (in a manner
similar to the implementation of readlist), and c points to the end of the
newly merged list (the node whose link field must be changed to add a new
element to the list). After the merged list is built, the pointer to its first node
is retrieved from z and z is reset to point to itself.
The key comparison in merge includes equality so that the merge will be
stable, if the b list is considered to follow the a list. We'll see below how this
stability in the merge implies stability in the sorting programs which use this
merge.
Once we have a merging procedure, it's not difficult to use it as the basis
for a recursive sorting procedure. To sort a given file, divide it in half, sort the
two halves (recursively), then merge the two halves together. This involves
enough data movement that a linked list representation is probably the most
convenient. The following program is a direct recursive implementation of a
function which takes a pointer to an unsorted list as input and returns as its
value a pointer to the sorted version of the list. The program does this by
rearranging the nodes of the list: no temporary nodes or lists need be allocated.
(It is convenient to pass the list length as a parameter to the recursive program:
alternatively, it could be stored with the list or the program could scan the
list to find its length.)
function sort(c: link; N: integer): link;
  var a, b: link;
      i: integer;
  begin
  if c^.next=z then sort:=c else
    begin
    a:=c;
    for i:=2 to N div 2 do c:=c^.next;
    b:=c^.next; c^.next:=z;
    sort:=merge(sort(a, N div 2), sort(b, N-(N div 2)));
    end
  end;
This program sorts by splitting the list pointed to by c into two halves, pointed
to by a and b, sorting the two halves recursively, then using merge to produce
the final result. Again, this program adheres to the convention that all lists
end with z: the input list must end with z (and therefore so does the b list);
and the explicit instruction c^.next:=z puts z at the end of the a list. This
program is quite simple to understand in a recursive formulation even though
it actually is a rather sophisticated algorithm.
The running time of the program fits the standard "divide-and-conquer"
recurrence M(N) = 2M(N/2) + N. The program is thus guaranteed to run
in time proportional to NlogN. (See Chapter 4).
Our file of sample sorting keys is processed as shown in the following
table. Each line in the table shows the result of a call on merge. First we
merge O and S to get O S, then we merge this with A to get A O S. Eventually
we merge R T with I N to get I N R T, then this is merged with A O S to
get A I N O R S T, etc.:
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  A  S  O  R  T  I  N  G  E  X  A  M  P  L  E
     O  S
  A  O  S
           R  T
                 I  N
           I  N  R  T
  A  I  N  O  R  S  T
                       E  G
                             A  X
                       A  E  G  X
                                   M  P
                                         E  L
                                   E  L  M  P
                       A  E  E  G  L  M  P  X
  A  A  E  E  G  I  L  M  N  O  P  R  S  T  X
Thus, this method recursively builds up small sorted files into larger ones.
Another version of mergesort processes the files in a slightly different order:
first scan through the list performing 1-by-1 merges to produce sorted sublists
of size 2, then scan through the list performing 2-by-2 merges to produce
sorted sublists of size 4, then do 4-by-4 merges to get sorted sublists of size
8, etc., until the whole list is sorted. Our sample file is sorted in four passes
using this "bottom-up" mergesort:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A S O R T I N G E X A M P L E
A S|O R|I T|G N|E X|A M|L P|E
A O R S|G I N T|A E M X|E L P
A G I N O R S T|A E E L M P X
A A E E G I L M N O P R S T X
In general, log N passes are required to sort a file of N elements, since each
pass doubles the size of the sorted subfiles. A detailed implementation of this
idea is given below.
function mergesort(c: link): link;
  var a, b, head, todo, t: link;
      i, N: integer;
  begin
  N:=1; new(head); head^.next:=c;
  repeat
    todo:=head^.next; c:=head;
    repeat
      t:=todo;
      a:=t; for i:=1 to N-1 do t:=t^.next;
      b:=t^.next; t^.next:=z;
      t:=b; for i:=1 to N-1 do t:=t^.next;
      todo:=t^.next; t^.next:=z;
      c^.next:=merge(a, b);
      for i:=1 to N+N do c:=c^.next
    until todo=z;
    N:=N+N;
  until a=head^.next;
  mergesort:=head^.next
  end;
This program uses a "list header" node (pointed to by head) whose link field
points to the file being sorted. Each iteration of the outer repeat loop passes
through the file, producing a linked list comprised of sorted subfiles twice as
long as for the previous pass. This is done by maintaining two pointers, one
to the part of the list not yet seen (todoi and one to the end of the part of
the list for which the subfiles have already been merged (c). The inner repeat
loop merges the two subfiles of length N starting at the node pointed to by
todo producing a subfile of length N+N which is linked onto the c result list.
The actual merge is accomplished by saving a link to the first subfile to be
merged in a, then skipping N nodes (using the temporary link t), linking z
onto the end of a's list, then doing the same to get another list of N nodes
pointed to by b (updating todo with the link of the last node visited), then
calling merge. (Then c is updated by simply chasing down to the end of the
list just merged. This is a simpler (but slightly less efficient) method than
various alternatives which are available, such as having merge return pointers
to both the beginning and the end, or maintaining multiple pointers in each
list node.)
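A sketch of how this function might be driven, in the same style as the
listmerge program given earlier (readlist and writelist are again the list
routines of Chapter 2, and z is the same dummy node):
readln(N);
new(z); z^.k:=maxint; z^.next:=z;
writelist(mergesort(readlist(N)))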
Like Heapsort, mergesort has a guaranteed N log N running time; like
Quicksort, it has a short inner loop. Thus it combines the virtues of these
methods, and will perform well for all inputs (though it won't be as quick
as Quicksort for random files). The main advantage of mergesort over these
methods is that it is stable; the main disadvantage of mergesort over these
methods is that extra space proportional to N (for the links) is required. It
is also possible to develop a nonrecursive implementation of mergesort using
arrays, switching to a different array for each pass in the same way that we
discussed in Chapter 10 for straight radix sort.
Recursion Revisited
The programs of this chapter (together with Quicksort) are typical of implementations
of divide-and-conquer algorithms. We'll see several algorithms
with similar structure in later chapters, so it's worthwhile to take a more
detailed look at some basic characteristics of these implementations.
Quicksort is actually a "conquer-and-divide" algorithm: in a recursive
implementation, most of the work is done before the recursive calls. On the
other hand, the recursive mergesort is more in the spirit of divide-and-conquer:
first the file is divided into two parts, then each part is conquered individually.
The first problem for which mergesort does actual processing is a small one;
at the finish the largest subfile is processed. Quicksort starts with actual
processing on the largest subfile, finishes up with the small ones.
This difference manifests itself in the non-recursive implementations of
the two methods. Quicksort must maintain a stack, since it has to save large
subproblems which are divided up in a data-dependent manner. Mergesort
admits to a simple non-recursive version because the way in which it divides
the file is independent of the data, so the order in which it processes subproblems
can be rearranged somewhat to give a simpler program.
Another practical difference which manifests itself is that mergesort is
stable (if properly implemented); Quicksort is not (without going to extra
trouble). For mergesort, if we assume (inductively) that the subfiles have been
sorted stably, then we need only be sure that the merge is done in a stable
manner, which is easily arranged. But for Quicksort, no easy way of doing
the partitioning in a stable manner suggests itself, so the possibility of being
stable is foreclosed even before the recursion comes into play.
Many algorithms are quite simply expressed in a recursive formulation.
In modern programming environments, recursive programs implementing such
algorithms can be quite useful. However, it is always worthwhile to study the
nature of the recursive structure of the program and the possibility of removing
the recursion. If the result is not a simpler, more efficient implementation
of the algorithm, such study will at least lead to better understanding of the
method.
Exercises
1. For N = 1000, empirically determine the value of k for which the Quicksort-
based partitioning procedure becomes faster than using heaps to find
the kth smallest element in a random file.
2. Describe how you would rearrange an array of 4N elements so that the
N smallest keys fall in the first N positions, the next N keys fall in the
next N positions, the next N in the next N positions, and the N largest
in the last N positions.
3. Show the recursive calls made when select is used to find the median of
the keys EASYQUESTION.
4. Write a program to rearrange a file so that all the elements with keys
equal to the median are in place, with smaller elements to the left and
larger elements to the right.
5. What method would be best for an application that requires selection of
the kth largest element (for various arbitrary k) a large number of times
on the same file?
6. True or false: the running time of mergesort does not depend on the order
of the keys in the input file. Explain your answer.
7. What is the smallest number of steps mergesort could use (to within a
constant factor)?
8. Implement a bottom-up non-recursive mergesort that uses two arrays
instead of linked lists.
9. Show the contents of the linked lists passed as arguments to each call when
the recursive mergesort is used to sort the keys E A S Y Q U E S T I O N.
10. Show the contents of the linked list at each iteration when the nonrecursive
mergesort is used to sort the keys E A S Y Q U E S T I O N.
13. External Sorting
Many important sorting applications involve processing very large files,
much too large to fit into the primary memory of any computer. Methods
appropriate for such applications are called external methods, since they involve
a large amount of processing external to the central processing unit (as
opposed to the internal methods that we've been studying).
There are two major factors which make external algorithms quite different
from those we've seen until now. First, the cost of accessing an item
is orders of magnitude greater than any bookkeeping or calculating costs.
Second, even with this higher cost, there are severe restrictions on access,
depending on the external storage medium used: for example, items on a
magnetic tape can be accessed only in a sequential manner.
The wide variety of external storage device types and costs make the development
of external sorting methods very dependent on current technology.
The methods can be complicated, and many parameters affect their performance:
that a clever method might go unappreciated or unused because of a
simple change in the technology is a definite possibility in external sorting.
For this reason, we'll concentrate on general methods in this chapter rather
than on developing specific implementations.
In short, for external sorting, the "systems" aspect of the problem is certainly
as important as the "algorithms" aspect. Both areas must be carefully
considered if an effective external sort is to be developed. The primary costs in
external sorting are for input-output. A good exercise for someone planning
to implement an efficient program to sort a very large file is first to implement
an efficient program to copy a large file, then (if that was too easy) implement
an efficient program to reverse the order of the elements in a large file. The
systems problems that arise in trying to solve these problems efficiently are
similar to those that arise in external sorts. Permuting a large external file
in any non-trivial way is about as difficult as sorting it, even though no key
comparisons, etc. are required. In external sorting, we are mainly concerned
with limiting the number of times each piece of data is moved between the
external storage medium and the primary memory, and being sure that such
transfers are done as efficiently as allowed by the available hardware.
External sorting methods have been developed which are suitable for the
punched cards and paper tape of the past, the magnetic tapes and disks of
the present, and the bubble memories and videodisks of the future. The essential
differences among the various devices are the relative size and speed
of available storage and the types of data access restrictions. We'll concentrate
on basic methods for sorting on magnetic tape and disk because these
devices are likely to remain in widespread use and illustrate the two fundamentally
different modes of access that characterize many external storage systems.
Often, modern computer systems have a "storage hierarchy" of several
progressively slower, cheaper, and larger memories. Many of the algorithms
that we will consider can be adapted to run well in such an environment, but
we'll deal exclusively with "two-level" memory hierarchies consisting of main
memory and disk or tape.
Sort-Merge
Most external sorting methods use the following general strategy: make a first
pass through the file to be sorted, breaking it up into blocks about the size
of the internal memory, and sort these blocks. Then merge the sorted blocks
together, by making several passes through the file, making successively larger
sorted blocks until the whole file is sorted. The data is most often accessed in
a sequential manner, which makes this method appropriate for most external
devices. Algorithms for external sorting strive to reduce the number of passes
through the file and to reduce the cost of a single pass to be as close to the
cost of a copy as possible.
Since most of the cost of an external sorting method is for input-output,
we can get a rough measure of the cost of a sort-merge by counting the number
of times each word in the file is read or written, (the number of passes over all
the data). For many applications, the methods that we consider will involve
on the order of ten or less such passes. Note that this implies that we're
interested in methods that can eliminate even a single pass. Also, the running
time of the whole external sort can be easily estimated from the running time
of something like the "reverse file copy" exercise suggested above.
Balanced Multiway Merging
To begin, we'll trace through the various steps of the simplest sort-merge
procedure for an example file. Suppose that we have records with the keys A
S O R T I N G A N D M E R G I N G E X A M P L E on an input tape; these
are to be sorted and put onto an output tape. Using a "tape" simply means
that we're restricted to read the records sequentially: the second record can't
be read until the first has been, etc. Assume further that we have only enough
room for three records in our computer memory but that we have plenty of
tapes available.
The first step is to read in the file three records at a time, sort them to
make three-record blocks, and output the sorted blocks. Thus, first we read in
A S O and output the block A O S, next we read in R T I and output the block
I R T, and so forth. Now, in order for these blocks to be merged together,
they must be on different tapes. If we want to do a three-way merge, then we
would use three tapes, ending up with the following configuration after the
sorting pass:
Tape 1:  A O S   D M N   A E X
Tape 2:  I R T   E G R   L M P
Tape 3:  A G N   G I N   E
Now we're ready to merge the sorted blocks of size three together. We
read the first record off each input tape (there's just enough room in the
memory) and output the one with the smallest key. Then the next record
from the same tape as the record just output is read in and, again, the record
in memory with the smallest key is output. When the end of a three-word
block in the input is encountered, then that tape is ignored until the blocks
from the other two tapes have been processed, and nine records have been
output. Then the process is repeated to merge the second three-word block
on each tape into a nine-word block (which is output on a different tape, to get
ready for the next merge). Continuing, we get three long blocks configured
as follows:
Tape 4:  A A G I N O R S T
Tape 5:  D E G G I M N N R
Tape 6:  A E E L M P X
Now one more three-way merge completes the sort. If we had a much
longer file with many blocks of size 9 on each tape, then we would finish the
second pass with blocks of size 27 on tapes 1, 2, and 3, then a third pass
would produce blocks of size 81 on tapes 4, 5, and 6, and so forth. We need
six tapes to sort an arbitrarily large file: three for the input and three for the
output of each three-way merge. Actually, we could get by with just four
tapes: the output could be put on just one tape, then the blocks from that
tape distributed to the three input tapes in between merging passes.
This method is called the balanced multiway merge: it is a reasonable algorithm
for external sorting and a good starting point for the implementation
of an external sort. The more sophisticated algorithms below could make the
sort run perhaps 50% faster, but not much more. (On the other hand, when
execution times are measured in hours, which is not uncommon in external
sorting, even a small percentage decrease in running time can be helpful and
50% can be quite significant.)
Suppose that we have N words to be manipulated by the sort and an
internal memory of size M. Then the "sort" pass produces about N/M sorted
blocks. (This estimate assumes 1-word records: for larger records, the number
of sorted blocks is computed by multiplying further by the record size.) If
we do P-way merges on each subsequent pass, then the number of subsequent
passes is about log_P(N/M), since each pass reduces the number of sorted
blocks by a factor of P.
Though small examples can help one understand the details of the algorithm,
it is best to think in terms of very large files when working with
external sorts. For example, the formula above says that using a 4-way merge
to sort a 200-million-word file on a computer with 1 million words of memory
should take a total of about five passes. A very rough estimate of the running
time can be found by multiplying by five the running time for the reverse file
copy implementation suggested above.
Replacement Selection
It turns out that the details of the implementation can be developed in an
elegant and efficient way using priority queues. First, we'll see that priority
queues provide a natural way to implement a multiway merge. More important,
it turns out that we can use priority queues for the initial sorting pass
in such a way that they can produce sorted blocks much longer than could fit
into internal memory.
The basic operation needed to do P-way merging is to repeatedly output
the smallest of the smallest elements not yet output from each of the P blocks
to be merged. That element should be replaced with the next element from
the block from which it came. The replace operation on a priority queue
of size P is exactly what is needed. (Actually, the "indirect" versions of the
priority queue routines, as described in Chapter 11, are more appropriate for
this application.) Specifically, to do a P-way merge we begin by filling up a
priority queue of size P with the smallest element from each of the P inputs
using the pqinsert procedure from Chapter 11 (appropriately modified so that
the smallest element rather than the largest is at the top of the heap). Then,
using the pqreplace procedure from Chapter 11 (modified in the same way)
we output the smallest element and replace it in the priority queue with the
next element from its block.
For example, the following table shows the result of merging A O S with
I R T and A G N (the first merge from our example above):
   1  2  3
   A  I  A
   A  I  O
   G  I  O
   I  N  O
   N  R  O
   O  R
   R  S
   S  T
   T
The lines in the table represent the contents of a heap of size three used
in the merging process. We begin with the first key from each of the three
blocks. (The "heap condition" is that the first key must be smaller than the
second and third.) Then the first A is output and replaced with the O (the next
key in its block). This violates the heap condition, so the O is exchanged with the other
A. Then that A is output and replaced with the next key in its block, the G.
This does not violate the heap condition, so no further change is necessary.
Continuing in this way, we produce the sorted file (read down in the table
to see the keys in the order in which they appear in the first heap position
and are output). When a block is exhausted, a sentinel is put on the heap
and considered to be larger than all the other keys. When the heap consists
of all sentinels, the merge is completed. This way of using priority queues is
sometimes called replacement selection.
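To see the mechanism spelled out, the following sketch does a three-way merge of three sorted blocks held in arrays, using a small handmade heap of (key, block) pairs in place of the indirect priority queue routines of Chapter 11. The program, its names, and the numeric encoding of the letters (A = 1, B = 2, and so on) are assumptions for illustration, not code from the text. Run on the three blocks of the example (A O S, I R T, A G N), it prints the keys of A A G I N O R S T in numeric form.

program pwaymerge;
const P = 3;                            { the order of the merge }
      sentinel = maxint;
type block = array[1..4] of integer;    { three keys plus a sentinel }
var blocks: array[1..P] of block;       { the sorted input blocks }
    pos: array[1..P] of integer;        { next unread position in each block }
    heapkey: array[1..P] of integer;    { heap of the current smallest key of each block }
    heapsrc: array[1..P] of integer;    { which block each heap entry came from }
    i, j: integer;
procedure siftdown(k: integer);         { restore heap order (smallest key on top) }
  var j, t: integer;
      done: boolean;
  begin
  done := false;
  while (2*k <= P) and not done do
    begin
    j := 2*k;
    if (j < P) and (heapkey[j+1] < heapkey[j]) then j := j + 1;
    if heapkey[k] <= heapkey[j] then done := true
    else
      begin
      t := heapkey[k]; heapkey[k] := heapkey[j]; heapkey[j] := t;
      t := heapsrc[k]; heapsrc[k] := heapsrc[j]; heapsrc[j] := t;
      k := j
      end
    end
  end;
begin
  { the blocks A O S, I R T, A G N, with the ith letter stored as i }
  blocks[1][1] := 1;  blocks[1][2] := 15; blocks[1][3] := 19; blocks[1][4] := sentinel;
  blocks[2][1] := 9;  blocks[2][2] := 18; blocks[2][3] := 20; blocks[2][4] := sentinel;
  blocks[3][1] := 1;  blocks[3][2] := 7;  blocks[3][3] := 14; blocks[3][4] := sentinel;
  for i := 1 to P do
    begin heapkey[i] := blocks[i][1]; heapsrc[i] := i; pos[i] := 2 end;
  for i := P div 2 downto 1 do siftdown(i);
  while heapkey[1] <> sentinel do        { only sentinels left: merge complete }
    begin
    write(heapkey[1], ' ');              { output the smallest key not yet output }
    j := heapsrc[1];                     { and replace it on the heap with the }
    heapkey[1] := blocks[j][pos[j]];     { next key from the same block        }
    pos[j] := pos[j] + 1;
    siftdown(1)
    end;
  writeln
end.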
Thus to do a P-way merge, we can use replacement selection on a priority
queue of size P to find each element to be output in log P steps. This
performance difference is not of particular practical relevance, since a brute-force
implementation can find each element to output in P steps, and P is
normally so small that this cost is dwarfed by the cost of actually outputting
the element. The real importance of replacement selection is the way that
it can be used in the first part of the sort-merge process: to form the initial
sorted blocks which provide the basis for the merging passes.
The idea is to pass the (unordered) input through a large priority queue,
always writing out the smallest element on the priority queue as above, and
always replacing it with the next element from the input, with one additional
proviso: if the new element is smaller than the last one put out, then, since
it could not possibly become part of the current sorted block, it should be
marked as a member of the next block and treated as greater than all elements
in the current block. When a marked element makes it to the top of the
priority queue, the old block is ended and a new block started. Again, this
is easily implemented with pqinsert and pqreplace from Chapter 11, again
appropriately modified so that the smallest element is at the top of the heap,
and with pqreplace changed to treat marked elements as always greater than
unmarked elements.
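One convenient way to implement the "marking" is to keep a run number with each key and to order the heap by (run number, key): keys assigned to the next run then automatically compare greater than every key in the current run. The sketch below forms the initial runs for the example file this way; it is an assumption about how the details might be arranged, not code from the text, and it prints each run on its own line.

program runformation;
const M = 3;                               { priority queue of size M }
var s: string;                             { the unordered input keys }
    key: array[1..M] of char;              { heap ordered by (run number, key) }
    run: array[1..M] of integer;
    i, p, currentrun: integer;
    c: char;
function before(i, j: integer): boolean;   { should heap entry i sit above entry j? }
  begin
  before := (run[i] < run[j]) or ((run[i] = run[j]) and (key[i] < key[j]))
  end;
procedure siftdown(k: integer);            { restore the heap condition below position k }
  var j, t: integer;
      ch: char;
      done: boolean;
  begin
  done := false;
  while (2*k <= M) and not done do
    begin
    j := 2*k;
    if (j < M) and before(j+1, j) then j := j + 1;
    if not before(j, k) then done := true
    else
      begin
      ch := key[k]; key[k] := key[j]; key[j] := ch;
      t := run[k]; run[k] := run[j]; run[j] := t;
      k := j
      end
    end
  end;
begin
  s := 'ASORTINGANDMERGINGEXAMPLE';
  for i := 1 to M do begin key[i] := s[i]; run[i] := 1 end;
  for i := M div 2 downto 1 do siftdown(i);
  currentrun := 1; p := M + 1;             { p indexes the next input key }
  for i := 1 to length(s) do
    begin
    if run[1] > currentrun then            { a next-run key has reached the top, }
      begin currentrun := run[1]; writeln end;   { so the current run is complete }
    c := key[1]; write(c);                 { output the smallest key of the current run }
    if p <= length(s) then
      begin
      key[1] := s[p];                      { replace it with the next input key; a key }
      if s[p] < c then run[1] := currentrun + 1  { smaller than the one just output }
                  else run[1] := currentrun;     { must wait for the next run       }
      p := p + 1
      end
    else run[1] := length(s) + 1;          { input exhausted: treat as a sentinel }
    siftdown(1)
    end;
  writeln
end.

It prints the runs A O R S T, G I N N, A D E G I M N R X, A E G L M P, and E, matching the table that follows.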
Our example file clearly demonstrates the value of replacement selection.
With an internal memory capable of holding only three records, we can
produce sorted blocks of size 5, 4, 9, 6, and 1, as illustrated in the following
table. Each step in the table below shows the next key to be input (in the first
column) and the contents of the heap just before that key is input. (As before, the
order in which the keys occupy the first position in the heap is the order in
which they are output.) Asterisks are used to indicate which keys in the heap
belong to different blocks: an element marked the same way as the element at
the root belongs to the current sorted block, others belong to the next sorted
block. Always, the heap condition (first key less than the second and third) is
maintained, with elements in the next sorted block considered to be greater
than elements in the current sorted block.
   next key      heap
       R       A  S  O
       T       O  S  R
       I       R  S  T
       N       S  I* T
       G       T  I* N*
       A       G* I* N*
       N       I* A  N*
       D       N* A  N*
       M       N* A  D
       E       A  M  D
       R       D  M  E
       G       E  M  R
       I       G  M  R
       N       I  M  R
       G       M  N  R
       E       N  G* R
       X       R  G* E*
       A       X  G* E*
       M       A* G* E*
       P       E* G* M*
       L       G* P* M*
       E       L* P* M*
               M* P* E
               P* E
               E
For example, when pqreplace is called for M, it returns N for output (A and
D are considered greater) and then sifts down M to make the heap A M D.
It can be shown that, if the keys are random, the runs produced using
replacement selection are about twice the size of what could be produced using
an internal method. The practical effect of this is to save one merging pass:
rather than starting with sorted runs about the size of the internal memory
and then taking a merging pass to produce runs about twice the size of the
internal memory, we can start right of' with runs about twice the size of
the internal memory, by using replacement selection with a priority queue of
size M. If there is some order in the keys, then the runs will be much, much
longer. For example, if no key has more than M larger keys before it in the
file, the file will be completely sorted by the replacement selection pass, and
no merging will be necessary! This is the most important practical reason to
use the method.
In summary, the replacement selection technique can be used for both
the "sort" and the "merge" steps of a balanced multiway merge. To sort N
1-word records using an internal memory of size M and P + 1 tapes, first
use replacement selection with a priority queue of size M to produce initial
runs of size about 2M (in a random situation) or longer (if the file is partially
ordered), then use replacement selection with a priority queue of size P for
about log_P(N/2M) (or fewer) merge passes.
Practical Considerations
To complete an implementation of the sorting method outlined above, it is
necessary to implement the input-output functions which actually transfer
data between the processor and the external devices. These functions are
obviously the key to good performance for the external sort, and they just
as obviously require careful consideration of some systems (as opposed to
algorithm) issues. (Readers unfamiliar with computers at the "systems" level
may wish to skim the next few paragraphs.)
A major goal in the implementation should be to overlap reading, writing,
and computing as much as possible. Most large computer systems have
independent processing units for controlling the large-scale input/output (I/O)
devices which make this overlapping possible. The efficiency achievable by an
external sorting method depends on the number of such devices available.
For each file being read or written, there is a standard systems programming
technique called double-buffering which can be used to maximize the
overlap of I/O with computing. The idea is to maintain two "buffers," one
for use by the main processor, one for use by the I/O device (or the processor
which controls the I/O device). For input, the processor uses one buffer while
the input device is filling the other. When the processor has finished using
its buffer, it waits until the input device has filled its buffer, then the buffers
switch roles: the processor uses the new data in the just-filled buffer while
the input device refills the buffer with the data already used by the processor.
The same technique works for output, with the roles of the processor and
the device reversed. Usually the I/O time is far greater than the processing
time and so the effect of double-buffering is to overlap the computation time
entirely; thus the buffers should be as large as possible.
A difficulty with double-buffering is that it really uses only about half
the available memory space. This can lead to inefficiency if a large number
of buffers are involved, as is the case in P-way merging when P is not small.
This problem can be dealt with using a technique called forecasting, which
requires the use of only one extra buffer (not P) during the merging process.
Forecasting works as follows. Certainly the best way to overlap input with
computation during the replacement selection process is to overlap the input
of the buffer that needs to be filled next with the processing part of the
algorithm. But it is easy to determine which buffer this is: the next input
buffer to be emptied is the one whose last item is smallest. For example, when
merging A O S with I R T and A G N we know that the third buffer will be
the first to empty, then the first. A simple way to overlap processing with
input for multiway merging is therefore to keep one extra buffer which is filled
by the input device according to this rule. When the processor encounters an
empty buffer, it waits until the input buffer is filled (if it hasn't been filled
already), then switches to begin using that buffer and directs the input device
to begin filling the buffer just emptied according to the forecasting rule.
The most important decision to be made in the implementation of the
multiway merge is the choice of the value of P, the "order" of the merge. For
tape sorting, when only sequential access is allowed, this choice is easy: P
must be chosen to be one less than the number of tape units available: the
multiway merge uses P input tapes and one output tape. Obviously, there
should be at least two input tapes, so it doesn't make sense to try to do tape
sorting with less than three tapes.
For disk sorting, when access to arbitrary positions is allowed but is
somewhat more expensive than sequential access, it is also reasonable to
choose P to be one less than the number of disks available, to avoid the
higher cost of non-sequential access that would be involved, for example, if
two different input files were on the same disk. Another alternative commonly
used is to pick P large enough so that the sort will be complete in two merging
phases: it is usually unreasonable to try to do the sort in one pass, but a
two-pass sort can often be done with a reasonably small P. Since replacement
selection produces about N/2M runs and each merging pass divides the
number of runs by P, this means P should be chosen to be the smallest integer
with P^2 > N/2M. For our example of sorting a 200-million-word file on a
computer with a 1-million-word memory, this implies that P = 11 would be a
safe choice to ensure a two-pass sort. (The right value of P could be computed
exactly after the sort phase is completed.) The best choice between these two
alternatives of the lowest reasonable value of P and the highest reasonable
value of P is obviously very dependent on many systems parameters: both
alternatives (and some in between) should be considered.
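As a quick check of this rule of thumb, a few lines of code suffice to compute the smallest such P for the 200-million-word example; the program below is only a sketch of the arithmetic, and its names are made up. It reports 100 initial runs and P = 11, as claimed above.

program choosep;
var N, M, runs, P: longint;
begin
  N := 200000000;                   { words to be sorted }
  M := 1000000;                     { words of internal memory }
  runs := N div (2*M);              { replacement selection leaves about N/(2M) runs }
  P := 1;
  while P*P <= runs do P := P + 1;  { smallest P with P*P > N/(2M) }
  writeln(runs, ' initial runs; P = ', P, ' suffices for a two-pass merge')
end.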
Polyphase Merging
One problem with balanced multiway merging for tape sorting is that it
requires either an excessive number of tape units or excessive copying. For
P-way merging either we must use 2P tapes (P for input and P for output)
or we must copy almost all of the file from a single output tape to P input
tapes between merging passes, which effectively doubles the number of passes
to be about 2 log_P(N/2M). Several clever tape-sorting algorithms have been
invented which eliminate virtually all of this copying by changing the way in
which the small sorted blocks are merged together. The most prominent of
these methods is called polyphase merging.
The basic idea behind polyphase merging is to distribute the sorted blocks
produced by replacement selection somewhat unevenly among the available
tape units (leaving one empty) and then to apply a "merge until empty"
strategy, at which point one of the input tapes and the output tape switch
roles.
For example, suppose that we have just three tapes, and we start out
with the following initial configuration of sorted blocks on the tapes. (This
comes from applying replacement selection to our example file with an internal
memory that can only hold two records.)
Tape 1:  A O R S T   I N   A G N   D E M R   G I N
Tape 2:  E G X   A M P   E L
Tape 3:
After three 2-way merges from tapes 1 and 2 to tape 3, the second tape
becomes empty and we are left with the configuration:
Tape 1:  D E M R   G I N
Tape 2:
Tape 3:  A E G O R S T X   A I M N P   A E G L N
Then, after two 2-way merges from tapes 1 and 3 to tape 2, the first tape
becomes empty, leaving:
Tape 1:
Tape 2:  A D E E G M O R R S T X   A G I I M N N P
Tape 3:  A E G L N
The sort is completed in two more steps. First, a two-way merge from
tapes 2 and 3 to tape 1 leaves one file on tape 2, one file on tape 1. Then a
two-way merge from tapes 1 and 2 to tape 3 leaves the entire sorted file on
tape 3.
This "merge until empty" strategy can be extended to work for an arbitrary
number of tapes. For example, if we have four tape units T1, T2,
T3, and T4 and we start out with T1 being the output tape, T2 having 13
initial runs, T3 having 11 initial runs, and T4 having 7 initial runs, then after
running a 3-way "merge until empty," we have T4 empty, T1 with 7 (long)
runs, T2 with 6 runs, and T3 with 4 runs. At this point, we can rewind
T1 and make it an input tape, and rewind T4 and make it an output tape.
Continuing in this way, we eventually get the whole sorted file onto T1:
   T1  T2  T3  T4
    0  13  11   7
    7   6   4   0
    3   2   0   4
    1   0   2   2
    0   1   1   1
    1   0   0   0
The merge is broken up into many phases which don't involve all the data,
but no direct copying is involved.
The main difficulty in implementing a polyphase merge is to determine
how to distribute the initial runs. It is not difficult to see how to build the
table above by working backwards: take the largest number on each line, make
it zero, and add it to each of the other numbers to get the previous line. This
corresponds to defining the highest-order merge for the previous line which
could give the present line. This technique works for any number of tapes
(at least three): the numbers which arise are "generalized Fibonacci numbers"
which have many interesting properties. Of course, the number of initial runs
may not be known in advance, and it probably won't be exactly a generalized
Fibonacci number. Thus a number of "dummy" runs must be added to make
the number of initial runs exactly what is needed for the table.
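The backwards construction is easy to mechanize. The sketch below (an illustrative program, not code from the text) starts from the final state and repeatedly zeroes one entry and adds it to the others; it assumes that for the ideal generalized-Fibonacci distributions the tape being emptied simply rotates, which reproduces the table above in reverse order.

program polyphasetable;
const T = 4;                        { number of tape units }
      phases = 5;                   { how many earlier lines of the table to generate }
var count: array[1..T] of integer;
    i, k, t, v: integer;
procedure printline;
  var i: integer;
  begin
  for i := 1 to T do write(count[i]:5);
  writeln
  end;
begin
  count[1] := 1;                    { final state: the whole file as one run on T1 }
  for i := 2 to T do count[i] := 0;
  printline;
  for k := 1 to phases do
    begin
    t := ((k - 1) mod T) + 1;       { the emptied tape rotates }
    v := count[t];
    count[t] := 0;                  { zero the chosen entry ... }
    for i := 1 to T do
      if i <> t then count[i] := count[i] + v;   { ... and add it to the others }
    printline
    end
end.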
The analysis of polyphase merging is complicated, interesting, and yields
surprising results. For example, it turns out that the very best method for
distributing dummy runs among the tapes involves using extra phases and
more dummy runs than would seem to be needed. The reason for this is that
some runs are used in merges much more often than others.
There are many other factors to be taken into consideration in implementing
an efficient tape-sorting method. For example, a major factor which
we have not considered at all is the time that it takes to rewind a tape. This
subject has been studied extensively, and many fascinating methods have been
defined. However, as mentioned above, the savings achievable over the simple
multiway balanced merge are quite limited. Even polyphase merging is only
better than balanced merging for small P, and then not substantially. For
P > 8, balanced merging is likely to run faster than polyphase, and for smaller
P the effect of polyphase is basically to save two tapes (a balanced merge with
two extra tapes will run faster).
An Easier Way
Many modern computer systems provide a large virtual memory capability
which should not be overlooked in implementing a method for sorting very
large files. In a good virtual memory system, the programmer has the ability
to address a very large amount of data, leaving to the system the responsibility
of making sure that addressed data is transferred from external to internal
storage when needed. This strategy relies on the fact that many programs have
a relatively small "locality of reference": each reference to memory is likely to
be to an area of memory that is relatively close to other recently referenced
areas. This implies that transfers from external to internal storage are needed
infrequently. An internal sorting method with a small locality of reference can
work very well on a virtual memory system. (For example, Quicksort has two
"localities" : most references are near one of the two partitioning pointers.)
But check with your systems programmclr before trying it on a very large file:
a method such as radix sorting, which hE,s no locality of reference whatsoever,
would be disastrous on a virtual memory system, and even Quicksort could
cause problems, depending on how well the available virtual memory system
is implemented. On the other hand, the strategy of using a simple internal
sorting method for sorting disk files deserves serious consideration in a good
virtual memory environment.
Exercises
1. Describe how you would do external selection: find the kth largest in a
file of N elements, where N is much too large for the file to fit in main
memory.
2. Implement the replacement selection algorithm, then use it to test the
claim that the runs produced are about twice the internal memory size.
3. What is the worst that can happen when replacement selection is used to
produce initial runs in a file of N records, using a priority queue of size
M, with M < N?
4. How would you sort the contents of a disk if no other storage (except
main memory) were available for use?
5. How would you sort the contents of a disk if only one tape (and main
memory) were available for use?
6. Compare the 4-tape and 6-tape multiway balanced merge to polyphase
merge with the same number of tapes, for 31 initial runs.
7. How many phases does 5-tape polyphase merge use when started up with
four tapes containing 26, 15, 22, and 28 runs?
8. Suppose the 31 initial runs in a 4-tape polyphase merge are each one
record long (distributed 0, 13, 11, 7 initially). How many records are
there in each of the files involved in the last three-way merge?
9. How should small files be handled in a Quicksort implementation to be
run on a very large file within a virtual memory environment?
10. How would you organize an external priority queue? (Specifically, design
a way to support the insert and remove operations of Chapter 11, when
the number of elements in the priority queue could grow to be much too
large for the queue to fit in main memory.)
SOURCES for Sorting
The primary reference for this section is volume three of D. E. Knuth's
series on sorting and searching. Further information on virtually every topic
that we've touched upon can be found in that book. In particular, the results
that we've quoted on performance characteristics of the various algorithms
are backed up by complete mathematical analyses in Knuth's book.
There is a vast amount of literature on sorting. Knuth and Rivest's
1973 bibliography contains hundreds of entries, and this doesn't include the
treatment of sorting in countless books and articles on other subjects (not to
mention work since 1973).
For Quicksort, the best reference is Hoare's original 1962 paper, which
suggests all the important variants, including the use for selection discussed
in Chapter 12. Many more details on the mathematical analysis and the
practical effects of many of the modifications and embellishments which have
been suggested over the years may be found in this author's 1975 Ph.D. thesis.
A good example of an advanced priority queue structure, as mentioned in
Chapter 11, is J. Vuillemin's "binomial queues" as implemented and analyzed
by M. R. Brown. This data structure supports all of the priority queue
operations in an elegant and efficient manner.
To get an impression of the myriad details of reducing algorithms like
those we have discussed to general-purpose practical implementations, a reader
would be advised to study the reference material for his particular computer
system's sort utility. Such material necessarily deals primarily with formats of
keys, records and files as well as many other details, and it is often interesting
to identify how the algorithms themselves are brought into play.
M. R. Brown, "Implementation and analysis of binomial queue algorithms,"
SIAM Journal on Computing, 7, 3 (August, 1978).
C. A. R. Hoare, "Quicksort," Computer Journal, 5, 1 (1962).
D. E. Knuth, The Art of Computer Programming. Volume 3: Sorting and
Searching, Addison-Wesley, Reading, MA, second printing, 1975.
R. L. Rivest and D. E. Knuth, "Bibliography 26: Computer Sorting," Computing
Reviews, 13, 6 (June, 1972).
R. Sedgewick, Quicksort, Garland, New York, 1978. (Also appeared as the
author's Ph.D. dissertation, Stanford University, 1975).
SEARCHING
14. Elementary Searching Methods
A fundamental operation intrinsic to a great many computational tasks
is searching: retrieving some particular information from a large amount
of previously stored information. Normally we think of the information as
divided up into records, each record having a key for use in searching. The
goal of the search is to find all records with keys matching a given search key.
The purpose of the search is usually to access information within the record
(not merely the key) for processing.
Two common terms often used to describe data structures for searching
are dictionaries and symbol tables. For example, in an English language dictionary,
the "keys" are the words and the "records" the entries associated with
the words which contain the definition, pronunciation, and other associated information.
(One can prepare for learning and appreciating searching methods
by thinking about how one would implement a system allowing access to an
English language dictionary.) A symbol table is the dictionary for a program:
the "keys" a-e the symbolic names used in the program, and the "records"
contain information describing the objet t named.
In searching (as in sorting) we have programs which are in widespread
use on a very frequent basis, so that it will be worthwhile to study a variety
of methods in some detail. As with sorting, we'll begin by looking at some
elementary methods which are very useful for small tables and in other special
situations and illustrate fundamental techniques exploited by more advanced
methods. We'll look at methods which store records in arrays which are either
searched with key comparisons or indexed by key value, and we'll look at a
fundamental method which builds structures defined by the key values.
As with priority queues, it is best to think of search algorithms as belonging
to packages implementing a variety of generic operations which can be
separated from particular implementations, so that alternate implementations
could be substituted easily. The operations of interest include:
Initialize the data structure.
Search for a record (or records) having a given key.
Insert a new record.
Delete a specified record.
Join two dictionaries to make a large one.
Sort the dictionary; output all the records in sorted order.
As with priority queues, it is sometimes convenient to combine some of these
operations. For example, a search and insert operation is often included for
efficiency in situations where records with duplicate keys are not to be kept
within the data structure. In many methods, once it has been determined
that a key does not appear in the data structure, then the internal state of
the search procedure contains precisely the information needed to insert a new
record with the given key.
Records with duplicate keys can be handled in one of several ways,
depending on the application. First, we could insist that the primary searching
data structure contain only records with distinct keys. Then each "record" in
this data structure might contain, for example, a link to a list of all records
having that key. This is the most convenient arrangement from the point
of view of the design of searching algorithms, and it is convenient in some
applications since all records with a given search key are returned with one
search. The second possibility is to leave records with equal keys in the
primary searching data structure and return any record with the given key
for a search. This is simpler for applications that process one record at a
time, where the order in which records with duplicate keys are processed is
not important. It is inconvenient from the algorithm design point of view
because some mechanism for retrieving all records with a given key must still
be provided. A third possibility is to assume that each record has a unique
identifier (apart from the key), and require that a search find the record with
a given identifier, given the key. Or, some more complicated mechanism could
be used to distinguish among records with equal keys.
Each of the fundamental operations listed above has important applications,
and quite a large number of basic organizations have been suggested to
support efficient use of various combinations of the operations. In this and the
next few chapters, we'll concentrate on implementations of the fundamental
functions search and insert (and, of course, initialize), with some comment on
delete and sort when appropriate. As with priority queues, the join operation
normally requires advanced techniques which we won't be able to consider
here.
Sequential Searching
The simplest method for searching is simply to store the records in an array,
then look through the array sequentially each time a record is sought. The
following code shows an implementation of the basic functions using this
simple organization, and illustrates some of the conventions that we'll use
in implementing searching methods.
type node=record key, info: integer end;
var a: array [0..maxN] of node;
    N: integer;
procedure initialize;
  begin N:=0 end;
function seqsearch(v: integer; x: integer): integer;
  begin
  a[N+1].key:=v;
  if (x>=0) and (x<=N) then
    repeat x:=x+1 until v=a[x].key;
  seqsearch:=x
  end;
function seqinsert(v: integer): integer;
  begin
  N:=N+1; a[N].key:=v;
  seqinsert:=N
  end;
The code above processes records that have integer keys (key) and "associated
information" (info). As with sorting, it vrill be necessary in many applicat,ions
to extend the programs to handle more complicated records and keys, but
this won't fundamentally change the algorithms. For example, info could be
made into a pointer to an arbitrarily complicated record structure. In such
a case, this field can serve as the unique identifier for the record for use in
distinguishing among records with equal keys.
The search procedure takes two arguments in this implementation: the
key value being sought and an index (x) into the array. The index is included
to handle the case where several records have the same key value: by successively
executing t:=seqsearch(v, t) starting at t=0 we can successively set t to
the index of each record with key value v.
A sentinel record containing the key value being sought is used, which
ensures that the search will always terminate, and therefore involves only
one completion test within the inner loop. After the inner loop has finished,
testing whether the index returned is greater than N will tell whether the
search found the sentinel or a key from the table. This is analogous to our
use of a sentinel record containing the smallest or largest key value to simplify
the coding of the inner loop of various sorting algorithms.
This method takes about N steps for an unsuccessful search (every record
must be examined to decide that a record with any particular key is absent)
and about N/2 steps, on the average, for a successful search (a "random"
search for a record in the table will require examining about half the entries,
on the average).
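As a small illustration of these conventions, the following sketch (a made-up driver program, with sample keys chosen arbitrarily) uses seqinsert and seqsearch as given above to visit every record with a given key by restarting each search where the previous one left off:

program seqdemo;
const maxN = 100;
type node = record key, info: integer end;
var a: array[0..maxN] of node;
    N, t: integer;
function seqsearch(v: integer; x: integer): integer;
  begin
  a[N+1].key := v;                     { sentinel record stops the scan }
  if (x >= 0) and (x <= N) then
    repeat x := x + 1 until v = a[x].key;
  seqsearch := x
  end;
function seqinsert(v: integer): integer;
  begin
  N := N + 1; a[N].key := v;
  seqinsert := N
  end;
begin
  N := 0;
  t := seqinsert(3); t := seqinsert(1); t := seqinsert(4);
  t := seqinsert(1); t := seqinsert(5);
  t := 0;                              { visit every record with key 1 }
  repeat
    t := seqsearch(1, t);
    if t <= N then writeln('key 1 found at index ', t)
  until t > N
end.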
Sequential List Searching
The seqsearch program above uses purely sequential access to the records,
and thus can be naturally adapted to use a linked list representation for the
records. One advantage of doing so is that it becomes easy to keep the list
sorted, as shown in the following implementation:
type link=^node;
     node=record key, info: integer; next: link end;
var head, t, z: link;
    i: integer;
procedure initialize;
  begin
  new(z); z^.next:=z;
  new(head); head^.next:=z;
  end;
function listsearch(v: integer; t: link): link;
  begin
  z^.key:=v;
  repeat t:=t^.next until v<=t^.key;
  if v=t^.key then listsearch:=t
    else listsearch:=z
  end;
function listinsert(v: integer; t: link): link;
  var x: link;
  begin
  z^.key:=v;
  while t^.next^.key<v do t:=t^.next;
  new(x); x^.next:=t^.next; t^.next:=x;
  x^.key:=v;
  listinsert:=x;
  end;
With a sorted list, a search can be terminated unsuccessfully when a record
with a key larger than the search key is found. Thus only about half the
records (not all) need to be examined for an unsuccessful search. The sorted
order is easy to maintain because a new record can simply be inserted into the
list at the point at which the unsuccessful search terminates. As usual with
linked lists, a dummy header node head and a tail node z allow the code to
be substantially simpler than without them. Thus, the call listinsert(v, head)
will put a new node with key v into the list pointed to by the next field of the
head, and listsearch is similar. Repeated calls on listsearch using the links
returned will return records with duplicate keys. The tail node z is used as a
sentinel in the same way as above. If listsearch returns z, then the search was
unsuccessful.
If something is known about the relative frequency of access for various
records, then substantial savings can often be realized simply by ordering the
records intelligently. The "optimal" arrangement is to put the most frequently
accessed record at the beginning, the second most frequently accessed record
in the second position, etc. This technique can be very effective, especially if
only a small set of records is frequently accessed.
If information is not available about the frequency of access, then an
approximation to the optimal arrangement can be achieved with a "self-organizing"
search: each time a record is accessed, move it to the beginning
of the list. This method is more conveniently implemented when a linked-list
implementation is used. Of course the running time for the method depends
on the record access distributions, so it is difficult to predict how it will do in
general. However, it is well suited to the quite common situation when most
of the accesses to each record tend to happen close together.
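A sketch of the move-to-front heuristic on a linked list is given below. A self-organizing list is kept in access order rather than key order, so this sketch does not use the sorted-list search above; instead it keeps a trailing pointer so that the node found can be unlinked and relinked just after the header. The program and its names are assumptions for illustration, not an implementation from the text.

program movetofront;
type link = ^node;
     node = record key, info: integer; next: link end;
var head, z: link;
    i: integer;
procedure initialize;
  begin
  new(z); z^.next := z;
  new(head); head^.next := z
  end;
procedure insert(v: integer);          { put a new record at the front }
  var x: link;
  begin
  new(x); x^.key := v;
  x^.next := head^.next; head^.next := x
  end;
function mtfsearch(v: integer): link;  { search, then move the hit to the front }
  var t, x: link;
  begin
  z^.key := v;                         { sentinel guarantees termination }
  t := head;
  while t^.next^.key <> v do t := t^.next;
  x := t^.next;
  if x <> z then
    begin
    t^.next := x^.next;                { unlink the node found ... }
    x^.next := head^.next;             { ... and relink it just after the header }
    head^.next := x
    end;
  mtfsearch := x                       { z is returned for an unsuccessful search }
  end;
begin
  initialize;
  for i := 1 to 5 do insert(i);
  if mtfsearch(3) <> z then
    writeln('found 3; the key at the front is now ', head^.next^.key)
end.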
Binary Search
If the set of records is large, then the total search time can be significantly
reduced by using a search procedure based on applying the "divide-and-conquer"
paradigm: divide the set of records into two parts, determine which
of the two parts the key being sought belongs to, then concentrate on that
part. A reasonable way to divide the set of records into parts is to keep the
records sorted, then use indices into the sorted array to delimit the part of the
array being worked on. To find if a given key v is in the table, first compare
it with the element at the middle position of the table. If v is smaller, then
it must be in the first half of the table; if v is greater, then it must be in the
second half of the table. Then apply the method recursively. (Since only one
recursive call is involved, it is simpler to express the method iteratively.) This
brings us directly to the following implementation, which assumes that the
array a is sorted.
function binarysearch(v: integer): integer;
  var x, l, r: integer;
  begin
  l:=1; r:=N;
  repeat
    x:=(l+r) div 2;
    if v<a[x].key then r:=x-1 else l:=x+1
  until (v=a[x].key) or (l>r);
  if v=a[x].key then binarysearch:=x
    else binarysearch:=N+1
  end;
Like Quicksort and radix exchange sort, this method uses the pointers l and
r to delimit the subfile currently being worked on. Each time through the
loop, the variable x is set to point to the midpoint of the current interval, and
the loop terminates successfully, or the left pointer is changed to x+1, or the
right pointer is changed to x-1, depending on whether the search value v is
equal to, greater than, or less than the key value of the record stored at a[x].
The following table shows the subfiles examined by this method when
searching for S in a table built by inserting the keys A S E A R C H I N G E
X A M P L E :
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  A  A  A  C  E  E  E  G  H  I  L  M  N  P  R  S  X
                              I  L  M  N  P  R  S  X
                                          P  R  S  X
                                                S  X
The interval size is at least halved at each step, so the total number of
times through the loop is only about lg N. However, the time required to
insert new records is high: the array must be kept sorted, so records must be
moved to make room for new records. For example, if the new record has
a smaller key than any record in the table, then every entry must be moved
over one position. A random insertion requires that N/2 records be moved,
on the average. Thus, this method should not be used for applications which
involve many insertions.
Some care must be exercised to properly handle records with equal keys
for this algorithm: the index returned could fall in the middle of a block of
records with key v, so loops which scan in both directions from that index
should be used to pick up all the records. Of course, in this case the running
time for the search is proportional to lg N plus the number of records found.
The sequence of comparisons made by the binary search algorithm is
predetermined: the specific sequence used is based on the value of the key
being sought and the value of N. The comparison structure can be simply
described by a binary tree structure. The following binary tree describes the
comparison structure for our example set of keys:
In searching for the key S for instance, it is first compared to H. Since it is
greater, it is next compared to N; otherwise it would have been compared to
C, etc. Below we will see algorithms that use an explicitly constructed binary
tree structure to guide the search.
One improvement suggested for binary search is to try to guess more
precisely where the key being sought falls within the current interval of interest
(rather than blindly using the middle element at each step). This mimics the
way one looks up a number in the telephone directory, for example: if the
name sought begins with B, one looks near the beginning, but if it begins
with Y, one looks near the end. This method, called interpolation search,
requires only a simple modification to the program above. In the program
above, the new place to search (the midpoint of the interval) is computed
with the statement x:=(l+r) div 2. This is derived from the computation
x = l + (1/2)(r - l): the middle of the interval is computed by adding half the size
of the interval to the left endpoint. Interpolation search simply amounts to
replacing the 1/2 in this formula by an estimate of where the key might be based
on the values available: 1/2 would be appropriate if v were in the middle of the
interval between a[l].key and a[r].key, but we might have better luck trying
x:=l+(v-a[l].key)*(r-l) div (a[r].key-a[l].key). Of course, this assumes
numerical key values. Suppose in our example that the ith letter in the
alphabet is represented by the number i. Then, in a search for S, the first
table position examined would be x = 1 + (19 - 1)*(17 - 1)/(24 - 1) = 13. The
search is completed in just three steps:
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  A  A  A  C  E  E  E  G  H  I  L  M  N  P  R  S  X
                                          P  R  S  X
                                                S  X
Other search keys are found even more efficiently: for example X and A are
found in the first step.
Interpolation search manages to decrease the number of elements examined
to about log log N. This is a very slowly growing function which
can be thought of as a constant for practical purposes: if N is one billion,
lg lg N < 5. Thus, any record can be found using only a few accesses, a substantial
improvement over the conventional binary search method. But this
assumes that the keys are rather well distributed over the interval, and it
does require some computation: for small N, the log N cost of straight binary
search is close enough to log log N that the cost of interpolating is not likely
to be worthwhile. But interpolation search certainly should be considered for
large files, for applications where comparisons are particularly expensive, or
for external methods where very high access costs are involved.
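A sketch of the modified search is shown below, packaged as a small program on the example table. The guards for search keys falling outside the interval and for equal endpoint keys are assumptions about details the text leaves open; the return-value convention of binarysearch is kept. On the example it finds S at position 16 after probing positions 13 and 15, the three steps shown above.

program interpolationdemo;
const maxN = 17;
type node = record key, info: integer end;
var a: array[1..maxN] of node;
    N: integer;
function interpolationsearch(v: integer): integer;
  var x, l, r: integer;
      found: boolean;
  begin
  l := 1; r := N; found := false; x := 1;
  while (not found) and (l <= r) and (v >= a[l].key) and (v <= a[r].key) do
    begin
    if a[r].key = a[l].key then x := l      { all keys in a[l..r] are equal }
    else x := l + (v - a[l].key)*(r - l) div (a[r].key - a[l].key);
    if v = a[x].key then found := true
    else if v < a[x].key then r := x - 1
    else l := x + 1
    end;
  if found then interpolationsearch := x
  else interpolationsearch := N + 1         { unsuccessful, as in binarysearch }
  end;
begin
  { the sorted keys A A A C E E E G H I L M N P R S X, }
  { with the ith letter of the alphabet stored as i    }
  N := 17;
  a[1].key := 1;   a[2].key := 1;   a[3].key := 1;   a[4].key := 3;
  a[5].key := 5;   a[6].key := 5;   a[7].key := 5;   a[8].key := 7;
  a[9].key := 8;   a[10].key := 9;  a[11].key := 12; a[12].key := 13;
  a[13].key := 14; a[14].key := 16; a[15].key := 18; a[16].key := 19;
  a[17].key := 24;
  writeln('S (19) found at position ', interpolationsearch(19))
end.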
Binary Tree Search
Binary tree search is a simple, efficient dynamic searching method which
qualifies as one of the most fundamental algorithms in computer science. It's
classified here as an "elementary" method because it is so simple; but in fact
it is the method of choice in many situations.
The idea is to build up an explicit structure consisting of nodes, each
node consisting of a record containing a key and left and right links. The
left and right links are either null, or they point to nodes called the left son
and the right son. The sons are themselves the roots of trees, called the left
subtree and the right subtree respectively. For example, consider the following
diagram, where nodes are represented as encircled key values and the links by
lines connected to nodes:
(Diagram: a small binary search tree with E at the root. E's left son is A,
whose right son is C; E's right son is R, whose left son is H; I is H's right son.)
The links in this diagram all point down. Thus, for example, E's right link
points to R, but H's left link is null.
The defining property of a tree is that every node is pointed to by only
one other node called its father. (We assume the existence of an imaginary
node which points to the root.) The defining property of a binary tree is that
each node has left and right links. For searching, each node also has a record
with a key value; in a binary search tree we insist that all records with smaller
keys are in the left subtree and that all records in the right subtree have
larger (or equal) key values. We'll soon see that it is quite simple to ensure
that binary search trees built by successively inserting new nodes satisfy this
defining property.
A search procedure like binarysearch immediately suggests itself for this
structure. To find a record with a given key v, first compare it against the
root. If it is smaller, go to the left subtree; if it is equal, stop; and if it
is greater, go to the right subtree. Apply the method recursively. At each
step, we're guaranteed that no parts of the tree other than the current subtree
could contain records with key v, and, just as the size of the interval in binary
search shrinks, the "current subtree" always gets smaller. The procedure stops
either when a record with key v is found or, if there is no such record, when
the "current subtree" becomes empty. (The words "binary," "search," and
"tree" are admittedly somewhat overused at this point, and the reader should
be sure to understand the difference between the binarysearch function given
above and the binary search trees described here. Above, we used a binary
tree to describe the sequence of comparisons made by a function searching
in an array; here we actually construct a data structure of records connected
with links which is used for the search.)
type link=^node;
     node=record key, info: integer; l, r: link end;
var t, head, z: link;
function treesearch(v: integer; x: link): link;
  begin
  z^.key:=v;
  repeat
    if v<x^.key then x:=x^.l else x:=x^.r
  until v=x^.key;
  treesearch:=x
  end;
As with sequential list searching, the coding in this program is simplified
by the use of a "tail" node z. Similarly, the insertion code given below is
simplified by the use of a tree header node head whose right link points to the
root. To search for a record with key v we set x:= treesearch(v, head).
If a node has no left (right) subtree then its left (right) link is set to
point to z. As in sequential search, we put the value sought in z to stop
an unsuccessful search. Thus, the "current subtree" pointed to by x never
becomes empty and all searches are "successful": the calling program can
check whether the link returned points to z to determine whether the search
was successful. It is sometimes convenient to think of links which point to z as
pointing to imaginary external nodes with all unsuccessful searches ending at
external nodes. The normal nodes which contain our keys are called internal
nodes; by introducing external nodes we can say that every internal node
points to two other nodes in the tree, even though, in our implementation, all
of the external nodes are represented by the single node z.
For example, if D is sought in the tree above, first it is compared against
E, the key at the root. Since D is less, it is next compared against A, the key
in the left son of the node containing E. Continuing in this way, D is compared
next against the C to the right of that node. The links in the node containing
C are pointers to z so the search terminates with D being compared to itself
in z and the search is unsuccessful.
To insert a node into the tree, we just do an unsuccessful search for it,
then hook it on in place of z at the point where the search terminated, as in
the following code:
function treeinsert(v: integer; x: link): link;
  var f: link;
  begin
  repeat
    f:=x;
    if v<x^.key then x:=x^.l else x:=x^.r
  until x=z;
  new(x); x^.key:=v; x^.l:=z; x^.r:=z;
  if v<f^.key then f^.l:=x else f^.r:=x;
  treeinsert:=x
  end;
To insert a new key in a tree with a tree header node pointed to by head, we
call treeinsert(v, head). To be able to do the insertion, we must keep track of
the father f of x, as it proceeds down the tree. When the bottom of the tree
(x=z) is reached, f points to the node whose link must be changed to point to
the new node inserted. The function returns a link to the newly created node
so that the calling routine can fill in the info field as appropriate.
When a new node whose key is equal to some key already in the tree
is inserted, it will be inserted to the right of the node already in the tree.
All records with key equal to v can be processed by successively setting t to
search(v, t) as we did for sequential searching.
As mentioned above, it is convenient to use a tree header node head
whose right link points to the actual root node of the tree, and whose key is
smaller than all other key values (for simplicity, we use 0 assuming the keys
are all positive integers). The left link of head is not used. The empty tree is
represented by having the right link of head point to z, as constructed by the
following code:
procedure treeinitialize;
  begin
  new(z); new(head);
  head^.key:=0; head^.r:=z;
  end;
To see the need for head, consider what happens when the first node is inserted
into an empty tree constructed by treeinitialize.
The diagram below shows the tree constructed when our sample keys are
inserted into an initially empty tree.
The nodes in this tree are numbered in the order in which they were inserted.
Since new nodes are added at the bottom of the tree, the construction process
can be traced out from this diagram: the tree as it was after k records had been
inserted is simply the part of the tree consisting only of nodes with numbers
less than k (and keys from the first k letters of A S E A R C H I N G E X A
M P L E).
The sort function comes almost for free when binary search trees are
used, since a binary search tree represents a sorted file if you look at it the
right way. Consider the following recursive program:
procedure treeprint(x: link);
  begin
  if x<>z then
    begin
    treeprint(x^.l);
    printnode(x);
    treeprint(x^.r)
    end
  end;
The call treeprint(head^.r) will print out the keys of the tree in order. This
defines a sorting method which is remarkably similar to Quicksort, with the
node at the root of the tree playing a role similar to that of the partitioning
element in Quicksort. A major difference is that the tree-sorting method must
use extra memory for the links, while Quicksort sorts with only a little extra
memory.
The running times of algorithms on binary search trees are quite dependent
on the shapes of the trees. In the best case, the tree could be shaped like
that given above for describing the comparison structure for binary search,
with about lg N nodes between the root and each external node. We might,
roughly, expect logarithmic search times on the average because the first element
inserted becomes the root of the tree; if N keys are to be inserted at
random, then this element would divide the keys in half (on the average),
leading to logarithmic search times (using the same argument on the subtrees).
Indeed, were it not for the equal keys, it could happen that the tree given above
for describing the comparison structure for binary search would be built. This
would be the best case of the algorithm, with guaranteed logarithmic running
time for all searches. Actually, the root is equally likely to be any key in a
truly random situation, so such a perfectly balanced tree would be extremely
rare. But if random keys are inserted, it turns out that the trees are nicely
balanced. The average number of steps for a treesearch in a tree built by
successive insertion of N random keys is proportional to 2 In N.
On the other hand, binary tree searching is susceptible to the same worstcase
problems as Quicksort. For example, when the keys are inserted in order
(or in reverse order) the binary tree search method is no better than the
sequential search method that we saw at the beginning of this chapter. In the
next chapter, we'll examine a technique for eliminating this worst case and
making all trees look more like the best-case tree.
The implementations given above for the fundamental search, insert, and
sort functions using binary tree structures are quite straightforward. However,
binary trees also provide a good example of a recurrent theme in the study
of searching algorithms: the delete function is often quite cumbersome to
implement. To delete a node from a binary tree is easy if the node has no
sons, like L or P in the tree above (lop it off by making the appropriate link
in its father null); or if it has just one son, like G or R in the tree above
(move the link in the son to the appropriate father link); but what about
nodes with two sons, such as H or S in the tree above? Suppose that x is a
link to such a node. One way to delete the node pointed to by x is to first set
y to the node with the next highest key. By examining the treeprint routine,
one can become convinced that this node must have a null left link, and that
it can be found by y:=x^.r; while y^.l<>z do y:=y^.l. Now the deletion can
be accomplished by copying y^.key and y^.info into x^.key and x^.info, then
deleting the node pointed to by y. Thus, we delete H in the example above
by copying I into H, then deleting I; and we delete the E at node 3 by copying
the E at node 11 into node 3, then deleting node 11. A full implementation
of a treedelete procedure according to this description involves a fair amount
of code to cover all the cases: we'll forego the details because we'll be doing
similar, but more complicated manipulations on trees in the next chapter. It is
quite typical for searching algorithms to require significantly more complicated
implementations for deletion: the keys themselves tend to be integral to the
structure, and removal of a key can involve complicated repairs.
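The two-son case just described can be rendered roughly as follows. This is only a sketch (the procedure name twosondelete and the use of dispose are not from the text), and a full treedelete would also have to handle nodes with no sons or one son and fix the appropriate link in the father of x:

procedure twosondelete(x: link);      { x is assumed to have two sons }
  var y, f: link;
  begin
  f:=x; y:=x^.r;
  if y^.l=z then f^.r:=y^.r           { the successor is x's right son }
  else
    begin
    while y^.l<>z do begin f:=y; y:=y^.l end;
    f^.l:=y^.r                        { unlink the successor from its father }
    end;
  x^.key:=y^.key; x^.info:=y^.info;   { copy the successor into x }
  dispose(y)
  end;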
Indirect Binary Search Trees
As we saw with heaps in Chapter 11, for many applications we want a searching
structure to simply help us find records, without moving them around.
For example, we might have an array a[1..N] of records with keys, and we
would like the search routine to give us the index into that array of the record
matching a certain key. Or we might want to remove the record with a given
index from the searching structure, but still keep it in the array for some
other use.
To adapt binary search trees to such a situation, we simply make the
info field of the nodes the array index. Then we could eliminate the key field
by having the search routines access the keys in the records directly, e.g. via
an instruction like if v<a[x^.info] then ... . However, it is often better to
make a copy of the key, and use the code above just as it is given. We'll
use the function name bstinsert(v, info: integer; x: link) to refer to a function
just like treeinsert, except that it also sets the info field to the value given
in the argument. Similarly, a function bstdelete(v,info: integer;x: link) to
delete the node with key v and array index info from the binary search tree
rooted at x will refer to an implementation of the delete function as described
above. These functions use an extra copy of the keys (one in the array, one
in the tree), but this allows the same function to be used for more than one
array, or as we'll see in Chapter 27, for more than one key field in the same
array. (There are other ways to achieve this: for example, a procedure could
be associated with each tree which extracts keys from records.)
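A sketch of bstinsert along these lines (the body simply mirrors treeinsert from earlier in the chapter; only the assignment to the info field is new, and the surrounding declarations are assumed to include that field):

function bstinsert(v, info: integer; x: link): link;
  var f: link;
  begin
  repeat
    f:=x;
    if v<x^.key then x:=x^.l else x:=x^.r
  until x=z;
  new(x); x^.key:=v; x^.info:=info;   { record the array index along with the key }
  x^.l:=z; x^.r:=z;
  if v<f^.key then f^.l:=x else f^.r:=x;
  bstinsert:=x
  end;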
Another direct way to achieve "indirection" for binary search trees is to
simply do away with the linked implementation entirely. That is, all links just
become indices into an array a[1..N] of records which contain a key field and l
and r index fields. Then link references such as if v<x^.key then x:=x^.l else
... become array references such as if v<a[x].key then x:=a[x].l else ... . No
calls to new are used, since the tree exists within the record array: new(head)
becomes head:=0, new(z) becomes z:=N+1, and to insert the Mth node, we
would pass M, not v, to treeinsert, and then simply refer to a[M].key instead
of v and replace the line containing new(x) in treeinsert with x:=M. This
way of implementing binary search trees to aid in searching large arrays of
records is preferred for many applications, since it avoids the extra expense of
copying keys as in the previous paragraph, and it avoids the overhead of the
storage allocation mechanism implied by new. The disadvantage is that space
is reserved within the record array for links which may not be in use, which
could lead to problems with large arrays in a dynamic situation.
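A sketch of this array-based arrangement follows. The names arraytreeinitialize, arraytreeinsert, and arraytreesearch, the bound 1000, and the use of key 0 as a header sentinel (which assumes all real keys are positive) are assumptions of the sketch, not from the text:

const N=1000;                          { assumed maximum number of records }
type node=record key, l, r: integer end;
var a: array [0..1001] of node;        { indices 0..N+1: 0 is the head, N+1 is z }
    head, z: integer;

procedure arraytreeinitialize;
  begin
  head:=0; z:=N+1;
  a[head].key:=0; a[head].l:=z; a[head].r:=z
  end;

procedure arraytreeinsert(m: integer);  { insert record a[m], using its key field }
  var f, x: integer;
  begin
  x:=head;
  repeat
    f:=x;
    if a[m].key<a[x].key then x:=a[x].l else x:=a[x].r
  until x=z;
  a[m].l:=z; a[m].r:=z;
  if a[m].key<a[f].key then a[f].l:=m else a[f].r:=m
  end;

function arraytreesearch(v: integer): integer;
  var x: integer;
  begin
  a[z].key:=v;                          { sentinel trick, as in the linked version }
  x:=a[head].r;
  while v<>a[x].key do
    if v<a[x].key then x:=a[x].l else x:=a[x].r;
  arraytreesearch:=x                    { returns z if v is not in the tree }
  end;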
Exercises
1. Implement a sequential searching algorithm which averages about N/2
steps for both successful and unsuccessful search, keeping the records in
a sorted array.
2. Give the order of the keys after records with the keys E A S Y Q U E S T
I O N have been put into an initially empty table with search and insert
using the self-organizing search heuristic.
3. Give a recursive implementation of binary search.
4. Suppose a[i]=2i for 1 ≤ i ≤ N. How many table positions are examined
by interpolation search during the unsuccessful search for 2k-1?
5. Draw the binary search tree that results from inserting records with the
keys E A S Y Q U E S T I O N into an initially empty tree.
6. Write a recursive program to compute the height of a binary tree: the
longest distance from the root to an external node.
7. Suppose that we have an estimate ahead of time of how often search keys
are to be accessed in a binary tree. Should the keys be inserted into the
tree in increasing or decreasing order of likely frequency of access? Why?
8. Give a way to modify binary tree search so that it would keep equal keys
together in the tree. (If there are any other nodes in the tree with the
same key as any given node, then either its father or one of its sons should
have an equal key.)
9. Write a nonrecursive program to print out the keys from a binary search
tree in order.
10. Use a least-squares curvefitter to find values of a and b that give the best
formula of the form aN In N + bN for describing the total number of
instructions executed when a binary search tree is built from N random
keys.
15. Balanced Trees
The binary tree algorithms of the previous section work very well for
a wide variety of applications, but they do have the problem of bad
worst-case performance. What's more, as with Quicksort, it's embarrassingly
true that the bad worst case is one that's likely to occur in practice if the
person using the algorithm is not watching for it. Files already in order,
files in reverse order, files with alternating large and small keys, or files with
any large segment having a simple structure can cause the binary tree search
algorithm to perform very badly.
With Quicksort, our only recourse for improving the situation was to
resort to randomness: by choosing a random partitioning element, we could
rely on the laws of probability to save us from the worst case. Fortunately,
for binary tree searching, we can do much better: there is a general technique
that will enable us to guarantee that this worst case will not occur. This
technique, called balancing, has been used as the basis for several different
"balanced tree" algorithms. We'll look closely at one such algorithm and
discuss briefly how it relates to some of the other methods that are used.
As will become apparent below, the implementation of balanced tree
algorithms is certainly a case of "easier said than done." Often, the general
concept behind an algorithm is easily described, but an implementation is a
morass of special and symmetric cases. Not only is the program developed in
this chapter an important searching method, but also it is a nice illustration
of the relationship between a "high-level" algorithm description and a "low-level"
Pascal program to implement the algorithm.
Top-Down 2-3-4 Trees
To eliminate the worst case for binary search trees, we'll need some flexibility
in the data structures that we use. To get this flexibility, let's assume that we
can have nodes in our trees that can hold more than one key. Specifically, we'll
allow 3-nodes and 4-nodes, which can hold two and three keys respectively. A
3-node has three links coming out of it, one for all records with keys smaller
than both its keys, one for all records with keys in between its two keys, and
one for all records with keys larger than both its keys. Similarly, a 4-node
has four links coming out of it, one for each of the intervals defined by its
three keys. (The nodes in a standard binary search tree could thus be called
2-nodes: one key, two links.) We'll see below some efficient ways of defining
and implementing the basic operations on these extended nodes; for now, let's
assume we can manipulate them conveniently and see how they can be put
together to form trees.
For example, below is a 2-3-4 tree which contains some keys from our
searching example.
It is easy to see how to search in such a tree. For example, to search for
O in the tree above, we would follow the middle link from the root, since O
is between E and R, then terminate the unsuccessful search at the right link
from the node containing H and I.
To insert a new node in a 2-3-4 tree, we would like to do an unsuccessful
search and then hook the node on, as before. It is easy to see what to do if the
node at which the search terminates is a 2-node: just turn it into a 3-node.
Similarly, a 3-node can easily be turned into a 4-node. But what should we
do if we need to insert our new node into a 4-node? The answer is that we
should first split the 4-node into two 2-nodes and pass one of its keys further
up in the tree. To see exactly how to do this, let's consider what happens
when the keys from A S E A R C H I N G E X A M P L E are inserted into
an initially empty tree. We start out with a 2-node, then a 3-node, then a
4-node:
Now we need to put a second A into the 4-node. But notice that as far as
the search procedure is concerned, the 4-node at the right above is exactly
equivalent to the binary tree:
If our algorithm "splits" the 4-node to make this binary tree before trying to
insert the A, then there will be room for A at the bottom:
Now R, C, and the H can be inserted, but when it's time for I to be inserted,
there's no room in the 4-node at the right:
Again, this 4-node must be split into two 2-nodes to make room for the I, but
this time the extra key needs to be inserted into the father, changing it from
a 2-node to a 3-node. Then the N can be inserted with no splits, then the G
causes another split, turning the root into a 4-node:
But what if we were to need to split a 4-node whose father is also a 4-node?
One method would be to split the father also, but this could keep happening
all the way back up the tree. An easier way is to make sure that the father of
any node we see won't be a 4-node by splitting any 4-node we see on the way
down the tree. For example, when E is inserted, the tree above first becomes
This ensures that we could handle the situation at the bottom even if E were
to go into a 4-node (for example, if we were inserting another A instead).
Now, the insertion of E, X, A, M, P, L, and E finally leads to the tree:
The above example shows that we can easily insert new nodes into 2-3-
4 trees by doing a search and splitting 4-nodes on the way down the tree.
Specifically, every time we encounter a 2-node connected to a 4-node, we
should transform it into a 3-node connected to two 2-nodes:
and every time we encounter a 3-node connected to a 4-node, we should
transform it into a 4-node connected to two 2-nodes:
These transformations are purely "local": no part of the tree need be examined
or modified other than what is diagrammed. Each of the transformations
passes up one of the keys from a 4-node to its father in the tree, restructuring
links accordingly. Note that we don't have to worry explicitly about the father
being a 4-node since our transformations ensure that as we pass through each
node in the tree, we come out on a node that is not a 4-node. In particular,
when we come out the bottom of the tree, we are not on a 4-node, and we
can directly insert the new node either by transforming a 2-node to a 3-node
or a 3-node to a 4-node. Actually, it is convenient to treat the insertion as a
split of an imaginary 4-node at the bottom which passes up the new key to be
inserted. Whenever the root of the tree becomes a 4-node, we'll split it into
three 2-nodes, as we did for our first node split in the example above. This
(and only this) makes the tree grow one level "higher."
The algorithm sketched in the previous paragraph gives a way to do
searches and insertions in 2-3-4 trees; since the 4-nodes are split up on the
way from the top down, the trees are called top-down 2-3-4 trees. What's
interesting is that, even though we haven't been worrying about balancing at
all, the resulting trees are perfectly balanced! The distance from the root to
every external node is the same, which implies that the time required by a
search or an insertion is always proportional to log N. The proof that the trees
are always perfectly balanced is simple: the transformations that we perform
have no effect on the distance from any node to the root, except when we split
the root, and in this case the distance from all nodes to the root is increased
by one.
The description given above is sufficient to define an algorithm for searching
using binary trees which has guaranteed worst-case performance. However,
we are only halfway towards an actual implementation. While it would be
possible to write algorithms which actually perform transformations on distinct
data types representing 2-, 3-, and 4-nodes, most of the things that need
to be done are very inconvenient in this direct representation. (One can become
convinced of this by trying to implement even the simpler of the two
node transformations.) Furthermore, the overhead incurred in manipulating
the more complex node structures is likely to make the algorithms slower than
standard binary tree search. The primary purpose of balancing is to provide
"insurance" against a bad worst case, but it would be unfortunate to have
to pay the overhead cost for that insurance on every run of the algorithm.
Fortunately, as we'll see below, there is a relatively simple representation of
2-, 3-, and 4-nodes that allows the transformations to be done in a uniform
way with very little overhead beyond the costs incurred by standard binary
tree search.
Red-Black Trees
Remarkably, it is possible to represent 2-3-4 trees as standard binary trees
(2-nodes only) by using only one extra bit per node. The idea is to represent
3-nodes and 4-nodes as small binary trees bound together by "red" links
which contrast with the "black" links which bind the 2-3-4 tree together. The
representation is simple: 4-nodes are represented as three 2-nodes connected
by red links and 3-nodes are represented as two 2-nodes connected by a red
link (red links are drawn as double lines):
(Either orientation for a 3-node is legal.) The binary tree drawn below is one
way to represent the final tree from the example above. If we eliminate the
red links and collapse the nodes they connect together, the result is the 2-3-4
tree from above. The extra bit per node is used to store the color of the link
pointing to that node: we'll refer to 2-3-4 trees represented in this way as
red-black trees.
The "slant" of each 3-node is determined by the dynamics of the algorithm
to be described below. There are many red-black trees corresponding to each
2-3-4 tree. It would be possible to enforce a rule that 3-nodes all slant the
same way, but there is no reason to do so.
These trees have many structural properties that follow directly from the
way in which they are defined. For example, there are never two red links in
a row along any path from the root to an external node, and all such paths
have an equal number of black links. Note that it is possible that one path
(alternating black-red) be twice as long as another (all black), but that all
path lengths are still proportional to log N.
A striking feature of the tree above is the positioning of duplicate keys.
On reflection, it is clear that any balanced tree algorithm must allow records
with keys equal to a given node to fall on both sides of that node: otherwise,
severe imbalance could result from long strings of duplicate keys. This implies
that we can't find all nodes with a given key by repeated calls to the searching
procedure, as in the previous chapter. However, this does not present a real
problem, because all nodes in the subtree rooted at a given node with the
same key as that node can be found with a simple recursive procedure like
the treeprint procedure of the previous chapter. Or, the option of requiring
distinct keys in the data structure (with linked lists of records with duplicate
keys) could be used.
One very nice property of red-black trees is that the treesearch procedure
for standard binary tree search works without modification (except for the
problem with duplicate keys discussed in the previous paragraph). We'll
implement the link colors by adding a boolean field red to each node which is
true if the link pointing to the node is red, false if it is black; the treesearch
procedure simply never examines that field. That is, no "overhead" is added
by the balancing mechanism to the time taken by the fundamental searching
procedure. Each key is inserted just once, but might be searched for many
times in a typical application, so the end result is that we get improved search
times (because the trees are balanced) at relatively little cost (because no work
for balancing is done during the searches).
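The treesearch procedure needs no change, but the insertion code below does assume node declarations with the red field and an initialized head and z. A minimal sketch of these declarations (the name rbtreeinitialize and the sentinel key 0 for the header are assumptions, with all real keys taken to be positive) is:

type link=^node;
     node=record key, info: integer; l, r: link; red: boolean end;
var head, z: link;

procedure rbtreeinitialize;
  begin
  new(z); z^.l:=z; z^.r:=z; z^.red:=false;     { dummy external node }
  new(head); head^.key:=0;                      { header key smaller than all keys }
  head^.l:=z; head^.r:=z; head^.red:=false
  end;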
Moreover, the overhead for insertion is very small: we have to do something
different only when we see 4-nodes, and there aren't many 4-nodes in
the tree because we're always breaking them up. The inner loop needs only
one extra test (if a node has two red sons, it's a part of a 4-node), as shown
in the following implementation of the insert procedure:
function rbtreeinsert(v: integer; x: link): link;
  var gg, g, f: link;
  begin
  f:=x; g:=x;
  repeat
    gg:=g; g:=f; f:=x;
    if v<x^.key then x:=x^.l else x:=x^.r;
    if x^.l^.red and x^.r^.red then x:=split(v, gg, g, f, x);
  until x=z;
  new(x); x^.key:=v; x^.l:=z; x^.r:=z;
  if v<f^.key then f^.l:=x else f^.r:=x;
  rbtreeinsert:=x;
  x:=split(v, gg, g, f, x);
  end;
In this program, x moves down the tree as before, with gg, g, and f kept
pointing to x's great-grandfather, grandfather, and father in the tree. To see
why all these links are needed, consider the addition of Y to the tree above.
When the external node at the right of the 3-node containing S and X is
reached, gg is R, g is S, and f is X. Now, Y must be added to make a 4-node
containing S, X, and Y, resulting in the following tree:
We need a pointer to R (gg) because R's right link must be changed to point
to X, not S. To see exactly how this comes about, we need to look at the
operation of the split procedure.
To understand how to implement the split operation, let's consider the
red-black representation for the two transformations we must perform: if we
have a 2-node connected to a 4-node, then we should convert them into a
3-node connected to two 2-nodes; if we have a 3-node connected to a 4-node,
we should convert them into a 4-node connected to two 2-nodes. When a
new node is added at the bottom, it is considered to be the middle node of
an imaginary 4-node (that is, think of z as being red, though this never gets
explicitly tested).
The transformation required when we encounter a 2-node connected to a
4-node is easy:
This same transformation works if we have a 3-node connected to a 4-node in
the "right" way:
Thus, split will begin by marking x to be red and the sons of x to be black.
This leaves the two other situations that can arise if we encounter a 3-node
connected to a 4-node:
(Actually, there are four situations, since the mirror images of these two can
also occur for 3-nodes of the other orientation.) In these cases, the split-up of
the 4-node has left two red links in a row, an illegal situation which must be
corrected. This is easily tested for in the code: we just marked x red; if x's
father f is also red, we must take further action. The situation is not too bad
because we do have three nodes connected by red links: all we need to do is
transform the tree so that the red links point down from the same node.
Fortunately, there is a simple operation which achieves the desired effect.
Let's begin with the easier of the two, the third case, where the red links
are oriented the same way. The problem is that the 3-node was oriented the
wrong way: accordingly, we restructure the tree to switch the orientation of
the 3-node, thus reducing this case to be the same as the second, where the
color flip of x and its sons was sufficient. Restructuring the tree to reorient a
3-node involves changing three links, as shown in the example below:
In this diagram, T1 represents the tree containing all the records with keys
less than A, T2 contains all the records with keys between A and B, and so
forth. The transformation switches the orientation of the 3-node containing
A and B without disturbing the rest of the tree. Thus none of the keys in
T1, T2, T3, and T4 are touched. In this case, the transformation is effected by
the link changes s^.l:=gs^.r; gs^.r:=s; y^.l:=gs. Also, note carefully that the
colors of A and B are switched. There are three analogous cases: the 3-node
could be oriented the other way, or it could be on the right side of y (oriented
either way).
Disregarding the colors, this single rotation operation is defined on any
binary search tree and is the basis for several balanced tree algorithms. It is
important to note, however, that doing a single rotation doesn't necessarily
improve the balance of the tree. In the diagram above, the rotation brings
all the nodes in T1 one step closer to the root, but all the nodes in T3 are
lowered one step. If T3 were to have more nodes than T1, then the tree after
the rotation would become less balanced, not more balanced. Top-down 2-3-4
trees may be viewed as simply a convenient way to identify single rotations
which are likely to improve the balance.
Doing a single rotation involves structurally modifying the tree, something
that should be done with caution. A convenient way to handle the four
different cases outlined above is to use the search key v to "rediscover" the
relevant son (s) and grandson (gs) of the node y. (We know that we'll only be
reorienting a 3-node if the search took us to its bottom node.) This leads to
somewhat simpler code than the alternative of remembering during the search
not only the two links corresponding to s and gs but also whether they are
right or left links. We have the following function for reorienting a 3-node
along the search path for v whose father is y:
function rotate(v: integer; y: link): link;
  var s, gs: link;
  begin
  if v<y^.key then s:=y^.l else s:=y^.r;
  if v<s^.key
    then begin gs:=s^.l; s^.l:=gs^.r; gs^.r:=s end
    else begin gs:=s^.r; s^.r:=gs^.l; gs^.l:=s end;
  if v<y^.key then y^.l:=gs else y^.r:=gs;
  rotate:=gs
  end;
If s is the left link of y and gs is the left link of s, this makes exactly the link
transformations for the diagram above. The reader may wish to check the
other cases. This function returns the link to the top of the 3-node, but does
not do the color switch itself.
Thus, to handle the third case for split, we can make g red, then set x to
rotate(v,gg), then make x black. This reorients the 3-node consisting of the
two nodes pointed to by g and f and reduces this case to be the same as the
second case, when the 3-node was oriented the right way.
Finally, to handle the case when the two red links are oriented in different
directions, we simply set f to rotate(v, g). This reorients the "illegal" 3-node
consisting of the two nodes pointed to by f and x. These nodes are the same
color, so no color change is necessary, and we are immediately reduced to
the third case. Combining this and the rotation for the third case is called a
double rotation for obvious reasons.
This completes the description of the operations which must be performed
by split. It must switch the colors of x and its sons, do the bottom part of a
double rotation if necessary, then do the single rotation if necessary:
function split(v: integer; gg, g, f, x: link): link;
  begin
  x^.red:=true; x^.l^.red:=false; x^.r^.red:=false;
  if f^.red then
    begin
    g^.red:=true;
    if (v<g^.key)<>(v<f^.key) then f:=rotate(v, g);
    x:=rotate(v, gg);
    x^.red:=false
    end;
  head^.r^.red:=false;
  split:=x
  end;
This procedure takes care of fixing the colors after a rotation and also restarts
x high enough in the tree to ensure that the search doesn't get lost due
to all the link changes. The long argument list is included for clarity; this
procedure should more properly be declared local to rbtreeinsert, with access
to its variables.
If the root is a 4-node then the split procedure will make the root red,
corresponding to transforming it, along with the dummy node above it, into a
3-node. Of course, there is no reason to do this, so a statement is included at
the end of split to keep the root black.
Assembling the code fragments above gives a very efficient, relatively
simple algorithm for insertion using a binary tree structure that is guaranteed
to take a logarithmic number of steps for all searches and insertions. This
is one of the few searching algorithms with that property, and its use is
justified whenever bad worst-case performance simply cannot be tolerated.
Furthermore, this is achieved at very little cost. Searching is done just as
quickly as if the balanced tree were constructed by the elementary algorithm,
and insertion involves only one extra bit test and an occasional split. For
random keys the height of the tree seems to be quite close to lg N (and only
one or two splits are done for the average insertion) but no one has been able
to analyze this statistic for any balanced tree algorithm. Thus a key in a file
of, say, half a million records can be found by comparing it against only about
twenty other keys.
Other Algorithms
The "top-down 2-3-4 tree" implementation using the "red-black" framework
given in the previous section is one of several similar strategies that have
been proposed for implementing balanced binary trees. As we saw above, it
is actually the "rotate" operations that balance the trees: we've been looking
at a particular view of the trees that makes it easy to decide when to rotate.
Other views of the trees lead to other algorithms, a few of which we'll mention
briefly in this section.
The oldest and most well-known data structure for balanced trees is the
AVL tree. These trees have the property that the heights of the two subtrees
of each node differ by at most one. If this condition is violated because of
an insertion, it turns out that it can be reinstated using rotations. But this
requires an extra loop: the basic algorithm is to search for the value being
inserted, then proceed up the tree along the path just travelled adjusting the
heights of nodes using rotations. Also, it is necessary to know whether each
node has a height that is one less than, the same, or one greater than the
height of its brother. This requires two bits if encoded in a straightforward
way, though there is a way to get by with just one bit per node.
A second well-known balanced tree structure is the 2-3 tree, where only
2-nodes and 3-nodes are allowed. It is possible to implement insert using an
"extra loop" involving rotations as with AVL trees, but there is not quite
enough flexibility to give a convenient top-down version.
In Chapter 18, we'll study the most important type of balanced tree, an
extension of 2-3-4 trees called B-trees. These allow up to M keys per node for
large M, and are widely used for searching applications involving very large
files.
Exercises
1. Draw the top-down 2-3-4 tree that is built when the keys E A S Y Q U
E S T I O N are inserted into an initially empty tree (in that order).
2. Draw a red-black representation of the tree from the previous question.
3. Exactly what links are modified by split and rotate when Z is inserted
(after Y) into the example tree for this chapter?
4. Draw the red-black tree that results when the letters A to K are inserted
in order, and describe what happens in general when keys are inserted
into the trees in ascending order.
5. How many tree links actually must be changed for a double rotation, and
how many are actually changed in the given implementation?
6. Generate two random 32-node red-black trees, draw them (either by hand
or with a program), and compare them with the unbalanced binary search
trees built with the same keys.
7. Generate ten random 1000-node red-black trees. Compute the number of
rotations required to build the trees and the average distance from the
root to an external node for the trees that you generate. Discuss the
results.
8. With 1 bit per node for "color," we can represent 2-, 3-, and 4-nodes.
How many different types of nodes could we represent if we used 2 bits
per node for "color"?
9. Rotations are required in red-black trees when 3-nodes are made into 4-
nodes in an "unbalanced" way. Why not eliminate rotations by allowing
4-nodes to be represented as any three nodes connected by two red links
(perfectly balanced or not)?
10. Use a least-squares curvefitter to find values of a and b that give the
best formula of the form aN lg N + bN for describing the total number
of instructions executed when a red-black tree is built from N random
keys.
16. Hashing
A completely different approach to searching from the comparison-based
tree structures of the last section is provided by hashing: directly
referencing records in a table by doing arithmetic transformations on keys
into table addresses. If we were to know that the keys are distinct integers
from 1 to N, then we could store the record with key i in table position i,
ready for immediate access with the key value. Hashing is a generalization
of this trivial method for typical searching applications when we don't have
such specialized knowledge about the key values.
The first step in a search using hashing is to compute a hash function
which transforms the search key into a table address. No hash function is
perfect, and two or more different keys might hash to the same table address:
the second part of a hashing search is a collision resolution process which
deals with such keys. One of the collision resolution methods that we'll study
uses linked lists, and is appropriate in a highly dynamic situation where the
number of search keys can not be predicted in advance. The other two collision
resolution methods that we'll examine achieve fast search times on records
stored within a fixed array.
Hashing is a good example of a "time-space tradeoff." If there were no
memory limitation, then we could do any search with only one memory access
by simply using the key as a memory address. If there were no time limitation,
then we could get by with only a minimum amount of memory by using a
sequential search method. Hashing provides a way to use a reasonable amount
of memory and time to strike a balance between these two extremes. Efficient
use of available memory and fast access to the memory are prime concerns of
any hashing method.
Hashing is a "classical" computer science problem in the sense that the
various algorithms have been studied in some depth and are very widely used.
There is a great deal of empirical and analytic evidence to support the utility
of hashing for a broad variety of applications.
Hash Functions
The first problem we must address is the computation of the hash function
which transforms keys into table addresses. This is an arithmetic computation
with properties similar to the random number generators that we have studied.
What is needed is a function which transforms keys (usually integers or short
character strings) into integers in the range [0..M-1], where M is the amount
of memory available. An ideal hash function is one which is easy to compute
and approximates a "random" function: for each input, every output should
be "equally likely."
Since the methods that we will use are arithmetic, the first step is to
transform keys into numbers which can be operated on (and are as large as
possible). For example, this could involve removing bits from character strings
and packing them together in a machine word. From now on, we'll assume
that such an operation has been performed and that our keys are integers
which fit into a machine word.
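As an illustration of this packing step, the following sketch (not from the text) turns a key of five capital letters into a single integer, using five bits per letter; it assumes a hypothetical type string5 and integers large enough to hold 25 bits:

type string5 = packed array [1..5] of char;

function stringtokey(s: string5): integer;
  var i, k: integer;
  begin
  k:=0;
  for i:=1 to 5 do
    k:=32*k + (ord(s[i]) - ord('A') + 1);   { five bits per letter: A=1, ..., Z=26 }
  stringtokey:=k
  end;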
One commonly used method is to take M to be prime and, for any key
k, compute h(k) = k mod M. This is a straightforward method which is easy
to compute in many environments and spreads the key values out well.
A second commonly used method is similar to the linear congruential
random number generator: take M = 2^m and h(k) to be the leading m bits of
(bk mod w), where w is the word size of the computer and b is chosen as for
the random number generator. This can be more efficiently computed than
the method above on some computers, and it has the advantage that it can
spread out key values which are close to one another (e.g., temp1, temp2,
temp3). As we've noted before, languages like Pascal are not well-suited to
such operations.
Separate Chaining
The hash functions above will convert keys into table addresses: we still need
to decide how to handle the case when two keys hash to the same address. The
most straightforward method is to simply build a linked list, for each table
address, of the records whose keys hash to that address. Since the keys which
hash to the same table position are kept in a linked list, they might as well
be kept in order. This leads directly to a generalization of the elementary list
searching method that we discussed in Chapter 14. Rather than maintaining
a single list with a single list header node head as discussed there, we maintain
M lists with M list header nodes, initialized as follows:
type link=^node;
     node=record key, info: integer; next: link end;
var heads: array [0..M] of link;
    t, z: link;
procedure initialize;
  var i: integer;
  begin
  new(z); z^.next:=z;
  for i:=0 to M-1 do
    begin new(heads[i]); heads[i]^.next:=z end;
  end;
Now the procedures from Chapter 14 can be used as is, with a hash function
used to choose among the lists. For example, listinsert(v, heads[v mod M] )
can be used to add something to the table, t:=listsearch(v, heads[v mod M])
to find the first record with key v, and successively set t:=listsearch(v, t) until
t=z to find subsequent records with key v.
For example if the ith letter in the alphabet is represented with the
number i and we use the hash function h(k) = k mod M, then we get the
following hash values for our sample set of keys with M = 11:
Key: A S E A R C H I N G E X A M P L E
Hash: 1 8 5 1 7 3 8 9 3 7 5 2 1 2 5 1 5
If these keys are successively inserted into an initially empty table, the following
set of lists would result:
     0  1  2  3  4  5  6  7  8  9 10

        A  M  C     E     G  H  I
        A  X  N     E     R  S
        A           E
        L           P
Obviously, the amount of time required for a search depends on the length
of the lists (and the relative positions of the keys in them). The lists could be
left unordered: maintaining sorted lists is not as important for this application
as it was for the elementary sequential search because the lists are quite short.
For an "unsuccessful search" (for a record with a key not in the table) we
can assume that the hash function scrambles things enough so that each of
the M lists is equally likely to be searched and, as in sequential list searching,
that the list searched is only traversed halfway (on the average). The average
length of the list examined (not counting z) in this example for unsuccessful
search is (0+4+2+2+0+4+0+2+2+1+0)/11 ≈ 1.545. This would be the
average time for an unsuccessful search were the lists unordered; by keeping
them ordered we cut the time in half. For a "successful search" (for one of the
records in the table), we assume that each record is equally likely to be sought:
seven of the keys would be found as the first list item examined, six would be
found as the second item examined, etc., so the average is
(7·1 + 6·2 + 2·3 + 2·4)/17 ≈ 1.941. (This count assumes that equal keys are distinguished with
a unique identifier or some other mechanism, and the search routine modified
appropriately to be able to search for each individual key.)
If N, the number of keys in the table, is much larger than M then a good
approximation to the average length of the lists is N/M. As in Chapter 14,
unsuccessful and successful searches would be expected to go about halfway
down some list. Thus, hashing provides an easy way to cut down the time
required for sequential search by a factor of M, on the average.
In a separate chaining implementation, M is typically chosen to be relatively
small so as not to use up a large area of contiguous memory. But it's
probably best to choose M sufficiently large that the lists are short enough to
make sequential search the most efficient method for them: "hybrid" methods
(such as using binary trees instead of linked lists) are probably not worth the
trouble.
The implementation given above uses a hash table of links to headers
of the lists containing the actual keys. Maintaining M list header nodes is
somewhat wasteful of space; it is probably worthwhile to eliminate them and
make heads be a table of links to the first keys in the lists. This leads to
some complication in the algorithm. For example, adding a new record to the
beginning of a list becomes a different operation than adding a new record
anywhere else in a list, because it involves modifying an entry in the table of
links, not a field of a record. An alternate implementation is to put the first
key within the table. If space is at a premium, it is necessary to carefully
analyze the tradeoff between wasting space for a table of links and wasting
space for a key and a link for each empty list. If N is much bigger than M then
the alternate method is probably better, though M is usually small enough
that the extra convenience of using list header nodes is probably justified.
Open Addressing
If the number of elements to be put in the hash table can be estimated in
advance, then it is probably not worthwhile to use any links at all in the hash
table. Several methods have been devised which store N records in a table
of size M > N, relying on empty places in the table to help with collision
resolution. Such methods are called open-addressing hashing methods.
The simplest open-addressing method is called linear probing: when there
is a collision (when we hash to a place in the table which is already occupied
and whose key is not the same as the search key), then just probe the next
position in the table: that is, compare the key in the record there against
the search key. There are three possible outcomes of the probe: if the keys
match, then the search terminates successfully; if there's no record there,
then the search terminates unsuccessfully; otherwise probe the next position,
continuing until either the search key or an empty table position is found. If
a record containing the search key is to be inserted following an unsuccessful
search, then it can simply be put into the empty table space which terminated
the search. This method is easily implemented as follows:
type node=record key, info: integer end;
var a: array [0..M] of node;
function h(k: integer): integer;
  begin h:=k mod M end;
procedure hashinitialize;
  var i: integer;
  begin
  for i:=0 to M do a[i].key:=maxint;
  end;
function hashinsert(v: integer): integer;
  var x: integer;
  begin
  x:=h(v);
  while a[x].key<>maxint do x:=(x+1) mod M;
  a[x].key:=v;
  hashinsert:=x;
  end;
Linear probing requires a special key value to signal an empty spot in the
table: this program uses maxint for that purpose. The computation x:=(x+1)
mod M corresponds to examining the next position (wrapping back to the
beginning when the end of the table is reached). Note that this program does
not check for the table being filled to capacity. (What would happen in this
case?)
The implementation of hashsearch is similar to hashinsert: simply add
the condition "a [x] .key< >v" to the while loop, and delete the following
instruction which stores v. This leaves the calling routine with the task
of checking if the search was unsuccessful, by testing whether the table
position returned actually contains v (successful) or maxint (unsuccessful).
Other conventions could be used, for example hashsearch could return M
for unsuccessful search. For reasons that will become obvious below, open
addressing is not appropriate if large numbers of records with duplicate keys
are to be processed, but hashsearch can easily be adapted to handle equal
keys in the case where each record has a unique identifier.
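Rendering that description in code (the name hashsearch follows the text; the body itself is a sketch) gives:

function hashsearch(v: integer): integer;
  var x: integer;
  begin
  x:=h(v);
  { stop at an empty position (unsuccessful) or at the key itself (successful) }
  while (a[x].key<>maxint) and (a[x].key<>v) do x:=(x+1) mod M;
  hashsearch:=x
  end;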
For our example set of keys with M = 19, we might get the hash values:
Key: A S E A R C H I N G E X A M P L E
Hash: 1 0 5 1 18 3 8 9 14 7 5 5 1 13 16 12 5
The following table shows the steps of successively inserting these into an
initially empty hash table:
     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

A:      A
S:   S  A
E:   S  A           E
A:   S  A  A        E
R:   S  A  A        E                                      R
C:   S  A  A  C     E                                      R
H:   S  A  A  C     E        H                             R
I:   S  A  A  C     E        H  I                          R
N:   S  A  A  C     E        H  I              N           R
G:   S  A  A  C     E     G  H  I              N           R
E:   S  A  A  C     E  E  G  H  I              N           R
X:   S  A  A  C     E  E  G  H  I  X           N           R
A:   S  A  A  C  A  E  E  G  H  I  X           N           R
M:   S  A  A  C  A  E  E  G  H  I  X        M  N           R
P:   S  A  A  C  A  E  E  G  H  I  X        M  N     P     R
L:   S  A  A  C  A  E  E  G  H  I  X     L  M  N     P     R
E:   S  A  A  C  A  E  E  G  H  I  X  E  L  M  N     P     R
The table size is greater than before, since we must have M > N, but the
total amount of memory space used is less, since no links are used. The
average number of items that must be examined for a successful search for
this example is: 33/17 ≈ 1.941.
Linear probing (indeed, any hashing method) works because it guarantees
that, when searching for a particular key, we look at every key that hashes
to the same table address (in particular, the key itself if it's in the table).
Unfortunately, in linear probing, other keys are also examined, especially
when the table begins to fill up: in the example above, the search for X
involved looking at G, H, and I which did not have the same hash value.
What's worse, insertion of a key with one hash value can drastically increase
the search times for keys with other hash values: in the example, an insertion
at position 17 would cause greatly increased search times for position 16. This
phenomenon, called clustering, can make linear probing run very slowly for
nearly full tables.
Fortunately, there is an easy way to virtually eliminate the clustering
problem: double hashing. The basic strategy is the same; the only difference is
that, instead of examining each successive entry following a collided position,
we use a second hash function to get a fixed increment to use for the "probe"
sequence. This is easily implemented by inserting u:=h2(v) at the beginning
of the procedure and changing x:=(x+1) mod M to x:=(x+u) mod M within
the while loop. The second hash function h2 must be chosen with some care,
otherwise the program might not work at all.
First, we obviously don't want to have u=O, since that would lead to an
infinite loop on collision. Second, it is important that M and u be relatively
prime here, since otherwise some of the probe sequences could be very short
(for example, consider the case M=2u). This is easily enforced by making
M prime and u<M. Third, the second hash function should be "different"
from the first, otherwise a slightly more complicated clustering could occur.
A function such as h2(k) = M - 2 - k mod (M - 2) will produce a good range
of "second" hash values.
For our sample keys, we get the following hash values:
Key:     A  S  E  A  R  C  H  I  N  G  E  X  A  M  P  L  E
Hash 1:  1  0  5  1 18  3  8  9 14  7  5  5  1 13 16 12  5
Hash 2: 16 15 12 16 16 14  9  8  3 10 12 10 16  4  1  5 12
The following table is produced by successively inserting our sample keys
into an initially empty table using double hashing with these values.
     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

A:      A
S:   S  A
E:   S  A           E
A:   S  A           E                                 A
R:   S  A           E                                 A  R
C:   S  A     C     E                                 A  R
H:   S  A     C     E        H                        A  R
I:   S  A     C     E        H  I                     A  R
N:   S  A     C     E        H  I              N      A  R
G:   S  A     C     E     G  H  I              N      A  R
E:   S  A     C     E     G  H  I  E           N      A  R
X:   S  A     C     E     G  H  I  E           N  X   A  R
A:   S  A     C     E     G  H  I  E  A        N  X   A  R
M:   S  A     C     E     G  H  I  E  A     M  N  X   A  R
P:   S  A     C     E     G  H  I  E  A     M  N  X  P  A  R
L:   S  A     C     E     G  H  I  E  A  L  M  N  X  P  A  R
E:   S  A     C     E  E  G  H  I  E  A  L  M  N  X  P  A  R
This technique uses the same amount of space as linear probing but the
average number of items examined for successful search is slightly smaller:
32/17 ≈ 1.882. Still, such a full table is to be avoided as shown by the 9
probes used to insert the last E in this example.
Open addressing methods can be inconvenient in a dynamic situation,
when an unpredictable number of insertions and deletions might have to be
processed. First, how big should the table be? Some estimate must be made
of how many insertions are expected but performance degrades drastically as
the table starts to get full. A common solution to this problem is to rehash
everything into a larger table on a (very) infrequent basis. Second, a word of
caution is necessary about deletion: a record can't simply be removed from
a table built with linear probing or double hashing. The reason is that later
insertions into the table might have skipped over that record, and searches
for those records will terminate at the hole left by the deleted record. A
way to solve this problem is to have another special key which can serve
as a placeholder for searches but can be identified and remembered as an
empty position for insertions. Note that neither table size nor deletion are a
particular problem with separate chaining.
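One way to sketch the placeholder idea (the sentinel value deleted and the procedure names below are assumptions, with real keys taken to be positive; searches skip over placeholders, while insertions may reuse them):

const deleted=-maxint;                 { assumed sentinel, never a real key }

procedure hashdelete(x: integer);
  begin a[x].key:=deleted end;         { leaves the probe chain intact for later searches }

function hashinsertd(v: integer): integer;
  var x: integer;
  begin
  x:=h(v);
  { a deleted slot can be reused for insertion, but an occupied one cannot }
  while (a[x].key<>maxint) and (a[x].key<>deleted) do x:=(x+1) mod M;
  a[x].key:=v;
  hashinsertd:=x
  end;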
Analytic Results
The methods discussed above have been analyzed completely and it is possible
to compare their performance in some detail. The following formulas,
summarized from detailed analyses described by D. E. Knuth in his book on
sorting and searching, give the average number of items examined (probes) for
unsuccessful and successful searching using the methods we've studied. The
formulas are most conveniently expressed in terms of the "load factor" of the
hash table, α = N/M. Note that for separate chaining we can have α > 1,
but for the other methods we must have α < 1.
                     Unsuccessful              Successful

Separate Chaining:   1 + α/2                   (α + 1)/2
Linear Probing:      1/2 + 1/(2(1-α)^2)        1/2 + 1/(2(1-α))
Double Hashing:      1/(1-α)                   -ln(1-α)/α
For small α, it turns out that all the formulas reduce to the basic result that
unsuccessful search takes about 1 + N/M probes and successful search takes
about 1 + N/2M probes. (Except, as we've noted, the cost of an unsuccessful
search for separate chaining is reduced by about half by ordering the lists.)
The formulas indicate how badly performance degrades for open addressing
as α gets close to 1. For large M and N, with a table about 90% full, linear
probing will take about 50 probes for an unsuccessful search, compared to
10 for double hashing. Comparing linear probing and double hashing against
separate chaining is more complicated, because there is more memory available
in the open addressing methods (since there are no links). The value of α used
should be modified to take this into account, based on the relative size of keys
and links. This means that it is not normally justified to choose separate
chaining over double hashing on the basis of performance.
The choice of the very best hashing method for a particular application
can be very difficult. However, the very best method is rarely needed
for a given situation, and the various methods do have similar performance
characteristics as long as the memory resource is not being severely strained.
Generally, the best course of action is to use the simple separate chaining
method to reduce search times drastically when the number of records to be
processed is not known in advance (and a good storage allocator is available)
and to use double hashing to search among a set of keys whose size can be
roughly predicted ahead of time.
Many other hashing methods have been developed which have application
in some special situations. Although we can't go into details, we'll briefly
consider two examples to illustrate the nature of specially adapted hashing
methods. These and many other methods are fully described in Knuth's book.
The first, called ordered hashing, is a method for making use of ordering
within an open addressing table: in standard linear probing, we stop the
search when we find an empty table position or a record with a key equal
to the search key; in ordered hashing, we stop the search when we find a
record with a key greater than or equal to the search key (the table must be
cleverly constructed to make this work). This method turns out to reduce
the time for unsuccessful search to approximately that for successful search.
(This is the same kind of improvement that comes in separate chaining.) This
method is useful for applications where unsuccessful searching is frequently
used. For example, a text processing system might have an algorithm for
hyphenating words that works well for most words, but not for bizarre cases
(such as "bizarre"). The situation could be handled by looking up all words
in a relatively small exception dictionary of words which must be handled in
a special way, with most searches likely to be unsuccessful.
Similarly, there are methods for moving some records around during
unsuccessful search to make successful searching more efficient. In fact, R. P.
Brent developed a method for which the average time for a successful search
can be bounded by a constant, giving a very useful method for applications
with frequent successful searching in very large tables such as dictionaries.
These are only two examples of a large number of algorithmic improvements
which have been suggested for hashing. Many of these improvements
are interesting and have important applications. However, our usual cautions
must be raised against premature use of advanced methods except by experts
with serious searching applications, because separate chaining and double
hashing are simple, efficient, and quite acceptable for most applications.
Hashing is preferred to the binary tree structures of the previous two
chapters for many applications because it is somewhat simpler and it can
provide very fast (constant) searching times, if space is available for a large
enough table. Binary tree structures have the advantages that they are
dynamic (no advance information on the number of insertions is needed), they
can provide guaranteed worst-case performance (everything could hash to the
same place even in the best hashing method), and they support a wider range
of operations (most important, the sort function). When these factors are not
important, hashing is certainly the searching method of choice.
Exercises
1. Describe how you might implement a hash function by making use of a
good random number generator. Would it make sense to implement a
random number generator by making use of a hash function?
2. How long could it take in the worst case to insert N keys into an initially
empty table, using separate chaining with unordered lists? Answer the
same question for sorted lists.
3. Give the contents of the hash table that results when the keys E A S Y Q
U E S T I O N are inserted in that order into an initially empty table of
size 13 using linear probing. (Use h1(k) = k mod 13 for the hash function
for the kth letter of the alphabet.)
4. Give the contents of the hash table that results when the keys E A S Y
Q U E S T I O N are inserted in that order into an initially empty table
of size 13 using double hashing. (Use h1(k) from the previous question,
h2(k) = 1 + (k mod 11) for the second hash function.)
5. About how many probes are involved when double hashing is used to
build a table consisting of N equal keys?
6. Which hashing method would you use for an application in which many
equal keys are likely to be present?
7. Suppose that the number of items to be put into a hash table is known
in advance. Under what condition will separate chaining be preferable to
double hashing?
8. Suppose a programmer has a bug in his double hashing code so that one
of the hash functions always returns the same value (not 0). Describe
what happens in each situation (when the first one is wrong and when
the second one is wrong).
9. What hash function should be used if it is known in advance that the key
values fall into a relatively small range?
10. Criticize the following algorithm for deletion from a hash table built with
linear probing. Scan right from the element to be deleted (wrapping as
necessary) to find an empty position, then scan left to find an element
with the same hash value. Then replace the element to be deleted with
that element, leaving its table position empty.
17. Radix Searching
Several searching methods proceed by examining the search keys one
bit at a time (rather than using full comparisons between keys at each
step). These methods, called radix searching methods, work with the bits of
the keys themselves, as opposed to the transformed version of the keys used
in hashing. As with radix sorting methods, these methods can be useful when
the bits of the search keys are easily accessible and the values of the search
keys are well distributed.
The principal advantages of radix searching methods are that they provide
reasonable worst-case performance without the complication of balanced trees;
they provide an easy way to handle variable-length keys; some allow some savings
in space by storing part of the key within the search structure; and they
can provide very fast access to data, competitive with both binary search trees
and hashing. The disadvantages are that biased data can lead to degenerate
trees with bad performance (and data comprised of characters is biased) and
that some of the methods can make very inefficient use of space. Also, as
with radix sorting, these methods are designed to take advantage of particular
characteristics of the computer's architecture: since they use digital properties
of the keys, it's difficult or impossible to do efficient implementations in languages
such as Pascal.
We'll examine a series of methods, each one correcting a problem inherent
in the previous one, culminating in an important method which is quite useful
for searching applications where very long keys are involved. In addition, we'll
see the analogue to the "linear-time sort" of Chapter 10, a "constant-time"
search which is based on the same principle.
Digital Search Trees
The simplest radix search method is digital tree searching: the algorithm is
precisely the same as that for binary tree searching, except that rather than
branching in the tree based on the result of the comparison between the keys,
we branch according to the key's bits. At the first level the leading bit is
used, at the second level the second leading bit, and so on until an external
node is encountered. The code for this is virtually the same as the code
for binary tree search. The only difference is that the key comparisons are
replaced by calls on the bits function that we used in radix sorting. (Recall
from Chapter 10 that bits(x, k, j) is the j bits which appear k from the right
and can be efficiently implemented in machine language by shifting right k
bits then setting to 0 all but the rightmost j bits.)
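For reference, one way to express bits in Pascal is sketched below (the loop-based powers of two are an assumption of this sketch; as the text notes, a machine-language shift and mask is the efficient way to do this):

function bits(x, k, j: integer): integer;
  var i, p, q: integer;
  begin
  p:=1; for i:=1 to k do p:=p*2;    { p = 2^k: shifting right k bits is division by p }
  q:=1; for i:=1 to j do q:=q*2;    { q = 2^j: keeping the rightmost j bits is mod q }
  bits:=(x div p) mod q
  end;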
function digitalsearch(v: integer; x: link): link;
  var b: integer;
  begin
  z^.key:=v; b:=maxb;
  repeat
    if bits(v, b, 1)=0 then x:=x^.l else x:=x^.r;
    b:=b-1;
  until v=x^.key;
  digitalsearch:=x
  end;
The data structures for this program are the same as those that we used for
elementary binary search trees. The constant maxb is the number of bits in
the keys to be sorted. The program assumes that the first bit in each key
(the (maxb+1)st from the right) is 0 (perhaps the key is the result of a call to
bits with a third argument of maxb), so that searching is done by setting x:=
digitalsearch(v, head), where head is a link to a tree header node with 0 key
and a left link pointing to the search tree. Thus the initialization procedure
for this program is the same as for binary tree search, except that we begin
with head^.l:=z instead of head^.r:=z.
We saw in Chapter 10 that equal keys are anathema in radix sorting; the
same is true in radix searching, not in this particular algorithm, but in the
ones that we'll be examining later. Thus we'll assume in this chapter that all
the keys to appear in the data structure are distinct: if necessary, a linked list
could be maintained for each key value of the records whose keys have that
value. As in previous chapters, we'll assume that the ith letter of the alphabet
is represented by the five-bit binary representation of i. That is, we'll use the
following sample keys in this chapter:
A  00001
S  10011
E  00101
R  10010
C  00011
H  01000
I  01001
N  01110
G  00111
X  11000
M  01101
P  10000
L  01100
To be consistent with bits, we consider the bits to be numbered 0-4, from
right to left. Thus bit 0 is A's only nonzero bit and bit 4 is P's only nonzero
bit.
The insert procedure for digital search trees also derives directly from the
corresponding procedure for binary search trees:
function digitalinsert(v: integer; x: link): link;
  var f: link; b: integer;
  begin
  b:=maxb;
  repeat
    f:=x;
    if bits(v, b, 1)=0 then x:=x^.l else x:=x^.r;
    b:=b-1;
    until x=z;
  new(x); x^.key:=v; x^.l:=z; x^.r:=z;
  if bits(v, b+1, 1)=0 then f^.l:=x else f^.r:=x;
  digitalinsert:=x
  end;
To see how the algorithm works, consider what happens when a new key Z=
11010 is added to the tree below. We go right twice because the leading two
bits of Z are 1, then we go left, where we hit the external node at the left of
X, where Z would be inserted.
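As a usage sketch (assuming the node, link, z and head declarations used for
binary search trees, maxb=5 for our five-bit keys, and a variable t of type
link), a few of the sample keys and then Z could be inserted and looked up
like this:

new(z); z^.l:=z; z^.r:=z;                 { the external dummy node }
new(head); head^.key:=0; head^.l:=z; head^.r:=z;
t:=digitalinsert(1, head);                { A = 00001 }
t:=digitalinsert(19, head);               { S = 10011 }
t:=digitalinsert(5, head);                { E = 00101 }
t:=digitalinsert(18, head);               { R = 10010 }
t:=digitalinsert(26, head);               { Z = 11010 }
t:=digitalsearch(19, head);               { returns the node containing S }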
The worst case for trees built with digital searching will be much better
than for binary search trees. The length of the longest path in a digital
search tree is the length of the longest match in the leading bits between
any two keys in the tree, and this is likely to be relatively short. And it is
obvious that no path will ever be any longer than the number of bits in the
keys: for example, a digital search tree built from eight-character keys with,
say, six bits per character will have no path longer than 48, even if there
are hundreds of thousands of keys. For random keys, digital search trees
are nearly perfectly balanced (the height is about lg N). Thus, they provide
an attractive alternative to standard binary search trees, provided that bit
extraction can be done as easily as key comparison (which is not really the
case in Pascal).
Radix Search Tries
It is quite often the case that search keys are very long, perhaps consisting of
twenty characters or more. In such a situation, the cost of comparing a search
key for equality with a key from the data structure can be a dominant cost
which cannot be neglected. Digital tree searching uses such a comparison at
each tree node: in this section we'll see that it is possible to get by with only
one comparison per search in most cases.
The idea is to not store keys in tree nodes at all, but rather to put all
the keys in external nodes of the tree. That is, instead of using z for external
nodes of the structure, we put nodes which contain the search keys. Thus,
we have two types of nodes: internal nodes, which just contain links to other
nodes, and external nodes, which contain keys and no links. (E. Fredkin
named this method "trie" because it is useful for retrieval; in conversation it's
usually pronounced "try-ee" or just "try" for obvious reasons.) To search for
a key in such a structure, we just branch according to its bits, as above, but
we don't compare it to anything until we get to an external node. Each key
in the tree is stored in an external node on the path described by the leading
bit pattern of the key and each search key winds up at one external node, so
one full key comparison completes the search.
After an unsuccessful search, we can insert the key sought by replacing
the external node which terminated the search by an internal node which
will have the key sought and the key which terminated the search in external
nodes below it. Unfortunately, if these keys agree in more bit positions, it is
necessary to add some external nodes which do not correspond to any keys
in the tree (or put another way, some internal nodes which have an empty
external node as a son). The following is the (binary) radix search trie for our
sample keys:
Now inserting Z=11010 into this tree involves replacing X with a new internal
node whose left son is another new internal node whose sons are X and Z.
The implementation of this method in Pascal is actually relatively complicated
because of the necessity to maintain two types of nodes, both of
which could be pointed to by links in internal nodes. This is an example of
an algorithm for which a low-level implementation might be simpler than a
high-level implementation. We'll omit the code for this because we'll see an
improvement below which avoids this problem.
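Although the code is omitted, one plausible (hypothetical) way to declare the
two node types is as a Pascal variant record; even the declaration hints at the
case analysis that every search and insertion routine would have to carry out:

type kind = (internal, external);
     link = ^node;
     node = record
              case k: kind of
                internal: (l, r: link);    { links only, no key }
                external: (key: integer)   { key only, no links }
            end;

A search would follow l or r links while k=internal and then make its single
key comparison at the external node; the improvement described below does
away with the two-type distinction altogether.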
The left subtree of a binary radix search trie has all the keys which have
0 for the leading bit; the right subtree has all the keys which have 1 for the
leading bit. This leads to an immediate correspondence with radix sorting:
binary trie searching partitions the file in exactly the same way as radix
exchange sorting. (Compare the trie above with the partitioning diagram we
examined for radix exchange sorting, after noting that the keys are slightly
different.) This correspondence is analogous to that between binary tree
searching and Quicksort.
An annoying feature of radix tries is the "one-way" branching required for
keys with a large number of bits in common. For example, keys which differ
only in the last bit require a path whose length is equal to the key length, no
matter how many keys there are in the tree. The number of internal nodes can
be somewhat larger than the number of keys. The height of such trees is still
limited by the number of bits in the keys, but we would like to consider the
possibility of processing records with very long keys (say 1000 bits or more)
which perhaps have some uniformity, as might occur in character encoded
data. One way to shorten the paths in the trees is to use many more than
two links per node (though this exacerbates the "space" problem of using too
many nodes); another way is to "collapse" paths containing one-way branches
into single links. We'll discuss these methods in the next two sections.
Multiway Radix Searching
For radix sorting, we found that we could get a significant improvement in
speed by considering more than one bit at a time. The same is true for radix
searching: by examining m bits at a time, we can speed up the search by a
factor of 2^m. However, there's a catch which makes it necessary to be more
careful in applying this idea than was necessary for radix sorting. The problem
is that considering m bits at a time corresponds to using tree nodes with
M = 2^m links, which can lead to a considerable amount of wasted space for
unused links. For example, if M = 4 the following tree is formed for our
sample keys:
Note that there is some wasted space in this tree because of the large number
of unused external links. As M gets larger, this effect gets worse: it turns out
that the number of links used is about MN/ln M for random keys. On the
other hand, this provides a very efficient searching method: the running time
is about log_M N. A reasonable compromise can be struck between the time
efficiency of multiway tries and the space efficiency of other methods by using
a "hybrid" method with a large value of M at the top (say the first two levels)
and a small value of M (or some elementary method) at the bottom. Again,
efficient implementations of such methods can be quite complicated because
of multiple node types.
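Before turning to such hybrids, here is a sketch of the basic M-way search
loop (hypothetical declarations, with M=4, i.e. two bits per level; maxb is
assumed even, and every unused link is assumed to point to a sentinel external
node whose key matches no search key):

type link = ^node;
     node = record
              key: integer;               { meaningful only in external nodes }
              isleaf: boolean;
              next: array[0..3] of link   { M = 4 links, indexed by two key bits }
            end;

function multisearch(v: integer; x: link): link;
  var b: integer;
  begin
  b:=maxb-2;                              { start with the two leading bits }
  while not x^.isleaf do
    begin
    x:=x^.next[bits(v, b, 2)];            { branch on the next two bits }
    b:=b-2
    end;
  multisearch:=x                          { the caller checks whether x^.key=v }
  end;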
For example, a two-level 32-way tree will divide the keys into 1024 categories,
each accessible in two steps down the tree. This would be quite useful
for files of thousands of keys, because there are likely to be (only) a few keys
per category. On the other hand, a smaller M would be appropriate for files
of hundreds of keys, because otherwise most categories would be empty and
too much space would be wasted, and a larger M would be appropriate for
files with millions of keys, because otherwise most categories would have too
many keys and too much time would be wasted.
It is amusing to note that "hybrid" searching corresponds quite closely
to the way humans search for things, for example, names in a telephone
book. The first step is a multiway decision ("Let's see, it starts with 'A'"),
followed perhaps by some two-way decisions ("It's before 'Andrews', but after
'Aitken'"), followed by sequential search ("'Algonquin' ... 'Algren' ... No,
'Algorithms' isn't listed!"). Of course computers are likely to be somewhat
better than humans at multiway search, so two levels are appropriate. Also,
26-way branching (with even more levels) is a quite reasonable alternative
to consider for keys which are composed simply of letters (for example, in a
dictionary).
In the next chapter, we'll see a systematic way to adapt the structure to
take advantage of multiway radix searching for arbitrary file sizes.
Patricia
The radix trie searching method as outlined above has two annoying flaws:
there is "one-way branching" which leads to the creation of extra nodes in the
tree, and there are two different types of nodes in the tree, which complicates
the code somewhat (especially the insertion code). D. R. Morrison discovered
a way to avoid both of these problems in a method which he named Patricia
("Practical Algorithm To Retrieve Information Coded In Alphanumeric").
The algorithm given below is not in precisely the same form as presented
by Morrison, because he was interested in "string searching" applications of
the type that we'll see in Chapter 19. In the present context, Patricia allows
searching for N arbitrarily long keys in a tree with just N nodes, but requires
only one full key comparison per search.
One-way branching is avoided by a simple device: each node contains
the index of the bit to be tested to decide which path to take out of that
node. External nodes are avoided by replacing links to external nodes with
links that point upwards in the tree, back to our normal type of tree node
with a key and two links. But in Patricia, the keys in the nodes are not
used on the way down the tree to control the search; they are merely stored
there for reference when the bottom of the tree is reached. To see how Patricia
works, we'll first look at the search algorithm operating on a typical tree, then
we'll examine how the tree is constructed in the first place. For our example
keys, the following Patricia tree is constructed when the keys are successively
inserted.
To search in this tree, we start at the root and proceed down the tree, using
the bit index in each node to tell us which bit to examine in the search key,
going right if that bit is 1, left if it is 0. The keys in the nodes are not
examined at all on the way down the tree. Eventually, an upwards link is
encountered: each upward link points to the unique key in the tree that has
the bits that would cause a search to take that link. For example, S is the
only key in the tree that matches the bit pattern 10x11. Thus if the key at
the node pointed to by the first upward link encountered is equal to the search
key, then the search is successful, otherwise it is unsuccessful. For tries, all
searches terminate at external nodes, whereupon one full key comparison is
done to determine whether the search was successful or not; for Patricia all
searches terminate at upwards links, whereupon one full key comparison is
done to determine whether the search was successful or not. Furthermore, it's
easy to test whether a link points up, because the bit indices in the nodes (by
definition) decrease as we travel down the tree. This leads to the following
search code for Patricia, which is as simple as the code for radix tree or trie
searching:
type link=^node;
     node=record key, info, b: integer; l, r: link end;
var head: link;

function patriciasearch(v: integer; x: link): link;
  var f: link;
  begin
  repeat
    f:=x;
    if bits(v, x^.b, 1)=0 then x:=x^.l else x:=x^.r;
    until f^.b<=x^.b;
  patriciasearch:=x
  end;
This function returns a link to the unique node which could contain the record
with key v. The calling routine then can test whether the search was successful
or not. Thus to search for Z=11010 in the above tree we go right, then up at
the right link of X. The key there is not Z so the search is unsuccessful.
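In code, that test is a single comparison after the call (a sketch; v is the
search key and head the tree header described below):

t:=patriciasearch(v, head);
if t^.key=v
  then writeln('found, info field is ', t^.info)
  else writeln('not found');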
The following diagram shows the transformations made on the right
subtree of the tree above if Z, then T are added.
The search for Z=11010 ends at the node containing X=11000. By the defining
property of the tree, X is the only key in the tree for which a search would
terminate at that node. If Z is inserted, there would be two such nodes, so
the upward link that was followed into the node containing X should be made
to point to a new node containing Z, with a bit index corresponding to the
leftmost point where X and Z differ, and with two upward links: one pointing
to X and the other pointing to Z. This corresponds precisely to replacing the
external node containing X with a new internal node with X and Z as sons in
radix trie insertion, with one-way branching eliminated by including the bit
index.
The insertion of T=10100 illustrates a more complicated case. The search
for T ends at P=10000, indicating that P is the only key in the tree with the
pattern 10x0x. Now, T and P differ at bit 2, a position that was skipped
during the search. The requirement that the bit indices decrease as we go
down the tree dictates that T be inserted between X and P, with an upward
self pointer corresponding to its own bit 2. Note carefully that the fact that
bit 2 was skipped before the insertion of T implies that P and R have the
same bit 2 value.
The examples above illustrate the only two cases that arise in insertion
for Patricia. The following implementation gives the details:
function patriciainsert(v: integer; x: link): link;
  var t, f: link; i: integer;
  begin
  t:=patriciasearch(v, x);
  i:=maxb;
  while bits(v, i, 1)=bits(t^.key, i, 1) do i:=i-1;
  repeat
    f:=x;
    if bits(v, x^.b, 1)=0 then x:=x^.l else x:=x^.r;
    until (x^.b<=i) or (f^.b<=x^.b);
  new(t); t^.key:=v; t^.b:=i;
  if bits(v, t^.b, 1)=0
    then begin t^.l:=t; t^.r:=x end
    else begin t^.r:=t; t^.l:=x end;
  if bits(v, f^.b, 1)=0 then f^.l:=t else f^.r:=t;
  patriciainsert:=t
  end;
(This code assumes that head is initialized with a key field of 0, a bit index of
maxb, and both links upward self-pointers.) First, we do a search to find the
key which must be distinguished from v, then we determine the leftmost bit
position at which they differ, travel down the tree to that point, and insert a
new node containing v at that point.
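The initialization itself is short enough to sketch (hypothetical procedure
name; it merely builds the header node described in the parenthetical note
above):

procedure patriciainitialize;
  begin
  new(head);
  head^.key:=0; head^.b:=maxb;    { dummy key 0 and bit index maxb }
  head^.l:=head; head^.r:=head    { both links are upward self-pointers }
  end;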
Patricia is the quintessential radix searching method: it manages to
identify the bits which distinguish the search keys and build them into a
data structure (with no surplus nodes) that quickly leads from any search
key to the only key in the data structure that could be equal. Clearly, the
same technique as used in Patricia can be used in binary radix trie searching
to eliminate one-way branching, but this only exacerbates the multiple node
type problem.
Unlike standard binary tree search, the radix methods are insensitive to
the order in which keys are inserted; they depend only upon the structure of
the keys themselves. For Patricia the placement of the upwards links depends
on the order of insertion, but the tree structure depends only on the bits in
the keys, as for the other methods. Thus, even Patricia would have trouble
with a set of keys like 001, 0001, 00001, 000001, etc., but for normal key sets,
the tree should be relatively well-balanced so the number of bit inspections,
even for very long keys, will be roughly proportional to lg N when there are
N nodes in the tree.
The most useful feature of radix trie searching is that it can be done
efficiently with keys of varying length. In all of the other searching methods
we have seen the length of the key is "built into" the searching procedure in
some way, so that the running time is dependent on the length of the keys
as well as the number of keys. The specific savings available depends on the
method of bit access used. For example, suppose we have a computer which
can efficiently access 8-bit "bytes" of data, and we have to search among
hundreds of 1000-bit keys. Then Patricia would require access of only about
9 or 10 bytes of the search key for the search, plus one 125-byte equality
comparison, while hashing requires access of all 125 bytes of the search key for
computing the hash function plus a few equality comparisons, and comparison-
based methods require several long comparisons. This effect makes Patricia
(or radix trie searching with one-way branching removed) the search method
of choice when very long keys are involved.
Exercises
1. Draw the digital search tree that results when the keys E A S Y Q U E
   S T I O N are inserted into an initially empty tree (in that order).
2. Generate a 1000 node digital search tree and compare its height and the
   number of nodes at each level against a standard binary search tree and
   a red-black tree (Chapter 15) built from the same keys.
3. Find a set of 12 keys that make a particularly badly balanced digital
   search trie.
4. Draw the radix search trie that results when the keys E A S Y Q U E S
   T I O N are inserted into an initially empty tree (in that order).
5. A problem with 26-way multiway radix search tries is that some letters
   of the alphabet are very infrequently used. Suggest a way to fix this
   problem.
6. Describe how you would delete an element from a multiway radix search
   tree.
7. Draw the Patricia tree that results when the keys E A S Y Q U E S T I
   O N are inserted into an initially empty tree (in that order).
8. Find a set of 12 keys that make a particularly badly balanced Patricia
   tree.
9. Write a program that prints out all keys in a Patricia tree having the
   same initial t bits as a given search key.
10. Use a least-squares curvefitter to find values of a and b that give the best
    formula of the form aN lg N + bN for describing the total number of
    instructions executed when a Patricia tree is built from N random keys.
18. External Searching
Searching algorithms appropriate for accessing items from very large
files are of immense practical importance. Searching is the fundamental
operation on large data files, and certainly consumes a very significant fraction
of the resources used in many computer installations.
We'll be concerned mainly with methods for searching on large disk files,
since disk searching is of the most practical interest. With sequential devices
such as tapes, searching quickly degenerates to the trivially slow method: to
search a tape for an item, one can't do much better than to mount the tape
and read until the item is found. Remarkably, the methods that we'll study
can find an item from a disk as large as a billion words with only two or three
disk accesses.
As with external sorting, the "systems" aspect of using complex I/O
hardware is a primary factor in the performance of external searching methods
that we won't be able to study in detail. However, unlike sorting, where the
external methods are really quite different from the internal methods, we'll see
that external searching methods are logical extensions of the internal methods
that we've studied.
Searching is a fundamental operation for disk devices. Files are typically
organized to take advantage of particular device characteristics to make access
of information as efficient as possible. As we did with sorting, we'll work with
a rather simple and imprecise model of "disk" devices in order to explain the
principal characteristics of the fundamental methods. Determining the best
external searching method for a particular application is extremely complicated
and very dependent on characteristics of the hardware (and systems
software), and so it is quite beyond the scope of this book. However, we can
suggest some general approaches to use.
For many applications we would like to frequently change, add, delete
or (most important) quickly access small bits of information inside very, very
large files. In this chapter, we'll examine some methods for such dynamic
situations which offer the same kinds of advantages over the straightforward
methods that binary search trees and hashing offer over binary search and
sequential search.
A very large collection of information to be processed using a computer
is called a database. A great deal of study has gone into methods of building,
maintaining and using databases. However, large databases have very high
inertia: once a very large database has been built around a particular searching
strategy, it can be very expensive to rebuild it around another. For this reason,
the older, static methods are in widespread use and likely to remain so, though
the newer, dynamic methods are beginning to be used for new databases.
Database applications systems typically support much more complicated
operations than a simple search for an item based on a single key. Searches
are often based on criteria involving more than one key and are expected to
return a large number of records. In later chapters we'll see some examples
of algorithms which are appropriate for some search requests of this type,
but general search requests are sufficiently complicated that it is typical to do
a sequential search over the entire database, testing each record to see if it
meets the criteria.
The methods that we will discuss are of practical importance in the implementation
of large file systems in which every file has a unique identifier
and the purpose of the file system is to support efficient access, insertion and
deletion based on that identifier. Our model will consider the disk storage
to be divided up into pages, contiguous blocks of information that can be
efficiently accessed by the disk hardware. Each page will hold many records;
our task is to organize the records within the pages in such a way that any
record can be accessed by reading only a few pages. We assume that the
I/O time required to read a page completely dominates the processing time
required to do any computing involving that page. As mentioned above, this
is an oversimplified model for many reasons, but it retains enough characteristics
of actual external storage devices to allow us to consider some of the
fundamental methods which are used.
Indexed Sequential Access
Sequential disk searching is the natural extension of the elementary sequential
searching methods that we considered in Chapter 14: the records are stored
in increasing order of their keys, and searches are done by simply reading
in the records one after the other until one containing a key greater than or
equal to the search key is found. For example, if our search keys come from
E X T E R N A L S E A R C H I N G E X A M P L E and we have disks
capable of holding three pages of four records each, then we would have the
configuration:
Disk 1: A A A C E E E E E G H I
Disk 2: L L M N N P R R S T X X
As with external sorting, we must consider very small examples to understand
the algorithms but think about very large examples to appreciate their
performance. Obviously, pure sequential searching is unattractive because,
for example, searching for W in the example above would require reading all
the pages.
To vastly improve the speed of a search, we can keep, for each disk, an
"index" of which keys belong to which pages on that disk, as in the following
example:
Disk 1: * 1 c 2 e   A A A C   E E E E
Disk 2: e 1 i 2 n   E G H I   L L M N
Disk 3: n 1 r 2 x   N P R R   S T X X
The first page of each disk is its index: lower case letters indicate that
only the key value is stored, not the full record; numbers are page indices.
In the index, each page number is followed by the value of its last key and
preceded by the value of the last key on the previous page. (The "*" is a
sentinel key, smaller than all the others.) Thus, for example, the index for
disk 2 says that its first page contains records with keys between E and I
inclusive and its second page contains records with keys between I and N
inclusive. Normally, it should be possible to fit many more keys and page
indices on an index page than records on a "data" page; in fact, the index for
a whole disk should require only a few pages. These indices are coupled with
a "master index" which tells which keys are on which disk. For our example,
the master index would be "* 1 e 2 n 3 x," where boldface integers are disk
numbers. The master index is likely to be small enough that it can be kept
in memory, so that most records can be found with only two pages accessed,
one for the index on the appropriate disk and one for the page containing the
appropriate record. For example, a search for W would involve first reading
the index page from disk 3, then reading the second page (from disk 3) which
is the only one that could contain W. Searches for keys which appear in the
index require reading three pages: the index plus the two pages flanking the
key value in the index. If no duplicate keys are in the file, then the extra page
access can be avoided. On the other hand, if there are many equal keys in the
file, several page accesses might be called for (records with equal keys might
fill several pages).
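The two-level lookup can be sketched as follows (hypothetical names:
master[d], kept in memory, holds the last key on disk d; readindexpage reads
the index page of a disk and fills lastkey[p] with the last key on each of its
data pages; both arrays are assumed to end with a sentinel larger than any
key):

procedure findpage(v: integer; var d, p: integer);
  begin
  d:=1;
  while v>master[d] do d:=d+1;     { master index: choose the disk }
  readindexpage(d);                { first disk access: that disk's index page }
  p:=1;
  while v>lastkey[p] do p:=p+1;    { choose the data page on that disk }
  { a second disk access reads page p of disk d and scans it for v }
  end;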
Because it combines a sequential key organization with indexed access,
this organization is called indexed sequential. It is the method of choice for
applications where changes to the database are likely to be made infrequently.
The disadvantage of using indexed sequential access is that it is very inflexible.
For example, adding B to the configuration above requires that virtually the
whole database be rebuilt, with new positions for many of the keys and new
values for the indices.
B-Trees
A better way to handle searching in a dynamic situation is to use balanced
trees. In order to reduce the number of (relatively expensive) disk accesses, it
is reasonable to allow a large number of keys per node so that the nodes have
a large branching factor. Such trees were named B-trees by R. Bayer and
E. McCreight, who were the first to consider the use of multiway balanced
trees for external searching. (Many people reserve the term "B-tree" to
describe the exact data structure built by the algorithm suggested by Bayer
and McCreight; we'll use it as a generic term to mean "external balanced
trees.")
The top-down algorithm that we used for 2-3-4 trees extends readily to
handle more keys per node: assume that there are anywhere from 1 to M - 1
keys per node (and so anywhere from 2 to M links per node). Searching
proceeds in a way analogous to 2-3-4 trees: to move from one node to the
next, first find the proper interval for the search key in the current node and
then exit through the corresponding link to get to the next node. Continue
in this way until an external node is reached, then insert the new key into
the last internal node reached. As with top-down 2-3-4 trees, it is necessary
to "split" nodes that are "full" on the way down the tree: any time we see
a k-node attached to an M-node, we replace it by a (k + 1)-node attached
to two M/2-nodes. This guarantees that when the bottom is reached there
is room to insert the new node. The B-tree constructed for M = 4 and our
sample keys is diagrammed below:
This tree has 13 nodes, each corresponding to a disk page. Each node must
contain links as well as records. The choice M = 4 (even though it leaves us
with familiar 2-3-4 trees) is meant to emphasize this point: before we could
fit four records per page, now only three will fit, to leave room for the links.
The actual amount of space used up depends on the relative size of records
and links. We'll see a method below which avoids this mixing of records and
links.
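One plausible (hypothetical) declaration for such a node, with records and
links together as just described, is sketched below; j counts the keys actually
present, so a node is full when j=M-1:

const M=4;
type link = ^node;
     node = record
              j: integer;                       { number of keys in this node }
              key: array[1..M-1] of integer;    { keys, in increasing order }
              info: array[1..M-1] of integer;   { the rest of each record }
              next: array[0..M-1] of link       { next[i]: keys between key[i] and key[i+1] }
            end;

Searching moves from a node to next[i], where i is chosen so that the search
key falls in the interval between key[i] and key[i+1]; splitting a full node
moves its upper half to a new node and passes its middle key up to the parent.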
For example, the root node might be stored as "10 E 11 N 12", indicating
that the root of the subtree containing records with keys less than or equal
to E is on page 0 of disk 1, etc. Just as we kept the master index for indexed
sequential search in memory, it's reasonable to keep the root node of the B-tree
in memory. The other nodes for our example might be stored as follows:
Disk 1: 20 A 21 22 E 30 H 31 L 32 40 R 41 T 42
Disk 2: 0 A 0 0 A 0 C 0 E 0 0 E 0
Disk 3: 0 E 0 G 0 0 I 0 0 L 0 M 0
Disk 4: 0 N 0 P 0 R 0 0 S 0 0 X 0 X 0
The assignment of nodes to disk pages in this example is simply to proceed
down the tree, working from right to left at each level, assigning nodes to
disk 1, then disk 2, etc. In an actual application, other assignments might
be indicated. For example, it might be better to avoid having all searches
going through disk 1 by assigning first to page 0 of all the disks, etc. In
truth, more sophisticated strategies are needed because of the dynamics of
the tree construction (consider the difficulty of implementing a split routine
that respects either of the above strategies).
The nodes at the bottom level in the B-trees described above all contain
many 0 links which can be eliminated by marking such nodes in some way.
Furthermore, a much larger value of M can be used at the higher levels of the
tree if we store just keys (not full records) in the interior nodes as in indexed
sequential access. To see how to take advantage of these observations in our
example, suppose that we can fit up to seven keys and eight links on a page, so
that we can use M = 8 for the interior nodes and M = 5 for the bottom-level
nodes (not M = 4 because no space for links need be reserved at the bottom).
A bottom node splits when a fifth record is added to it; the split involves
"inserting" the key of the middle record into the tree above, which operates
as a normal B-tree for M = 8 (on stored keys, not records). This leads to
the following tree:
The effect for a typical application is likely to be much more dramatic since
the branching factor of the tree is increased by roughly the ratio of the record
size to key size, which is likely to be large. Also, with this type of organization,
the "index" (which contains keys and links) can be separated from the actual
records, as in indexed sequential search:
Disk 1: 11 112 20 a 21 e 22 e 30 h 31 32 n 40 r 41 s 42
Disk 2: A A A C E E E
Disk 3: E E G H I L L M
Disk 4: N N P R R S T X X
As before, the root node is kept in memory. Also the same issues as discussed
above regarding node placement on the disks arise.
Now we have two values of M, one for the interior nodes which determines
the branching factor of the tree (MI) and one for the bottom-level nodes which
determines the allocation of records to pages (MB). To minimize the number
of disk accesses, we want to make both MI and MB as large as possible, even
at the expense of some extra computation. On the other hand, we don't want
to make MI huge, because then most tree nodes would be largely empty and
space would be wasted and we don't want to make MB huge because this
would reduce to sequential search of the bottom-level nodes. Usually, it is
best to relate both MI and MB to the page size. The obvious choice for MB
is the number of records that can fit on a page: the goal of the search is to
find the page containing the record sought. If MI is taken to be the number
of keys that can fit on two to four pages, then the B-tree is likely to be only
three levels deep, even for very large files (a three-level tree with MI = 1024
can handle up to 1024^3, or over a billion, entries). But recall that the root
node of the tree, which is accessed for every operation on the tree, is kept in
memory, so that only two disk accesses are required to find any element in
the file.
As discussed in Chapter 15, a more complicated "bottom-up" insertion
method is commonly used for B-trees, though the distinction between topEXTERNAL.
SEARCHING 231
down and bottom up methods loses iml)ortance for three level trees. Other
variations of balanced trees are more iriportant for external searching. For
example, when a node becomes full, splitting (and the resultant half-empty
nodes) can be forestalled by dumping some of the contents of the node into
its "brother" node (if it's not too full). This leads to better space utilization
within the nodes, which is likely to be of central concern in a large-scale disk
searching application.
Extendible Hashing
An alternative to B-trees which extends digital searching algorithms to apply
to external searching was developed in 1978 by R. Fagin, J. Nievergelt, N.
Pippenger, and R. Strong. This method, called extendible hashing, guarantees
that no more than two disk accesses will be used for any search. As with B-
trees, our records are stored on pages which are split into two pieces when
they fill up; as with indexed sequential access, we maintain an index which
we access to find the page containing the records which match our search key.
Extendible hashing combines these approaches by using digital properties of
the search keys.
To see how extendible hashing works, we'll consider how it handles successive
insertions of keys from E X T E R N A L S E A R C H I N G E X A
M P L E, using pages with a capacity of up to four records.
We start with an "index" with just one entry, a pointer to the page
which is to hold the records. The first four records fit on the page, leaving
the following trivial structure:
00101 E
00101 E
10100 T
11000 X
Disk 1: 20
Disk 2: E E T X
The directory on disk 1 says that all records are on page 0 of disk 2, where
they are kept in sorted order of their keys. For reference, we also give the
binary value of the keys, using our standard encoding of the five-bit binary
representation of i for the ith letter of the alphabet. Now the page is full,
and must be split in order to add the key R=10010. The strategy is simple:
put records with keys that begin with 0 on one page and records with keys
that begin with 1 on another page. This necessitates doubling the size of the
directory, and moving half the keys from page 0 of disk 2 to a new page,
leaving the following structure:
0: 00101 E
   00101 E
1: 10010 R
   10100 T
   11000 X
Disk 1: 20 21
Disk 2: E E   R T X
Now N=OlllO and A=00001 can be added, but another split is needed
before L=OllOO can be added:
0: 00001 A
   00101 E
   00101 E
   01110 N
1: 10010 R
   10100 T
   11000 X
Disk 1: 20 21
Disk 2: A E E N   R T X
Recall our basic assumption that we do disk I/O in page units, and that
processing time is negligible compared to the time to input or output a page.
Thus, keeping the records in sorted order of their keys is not a real expense:
to add a record to a page, we must read the page into memory, modify it,
and write it back out. The extra time required to insert the new record to
maintain sorted order is not likely to be noticeable in the typical case when
the pages are small.
Proceeding in the same way as for the first split, we make room for L=
01100 by splitting the first page into two pieces, one for keys that begin with 00
and one for keys that begin with 01. What's not immediately clear is what to
do with the directory. One alternative would be to simply add another entry,
one pointer to each page. This is unattractive because it essentially reduces
to indexed sequential search (albeit a radix version): the directory has to be
scanned sequentially to find the proper page during a search. Alternatively,
we can just double the size of the directory again, giving the structure:
00: 00001 A
    00101 E
    00101 E
01: 01100 L
    01110 N
10: 10010 R
11: 10100 T
    11000 X
Disk 1: 20 21 22 22
Disk 2: A E E   L N   R T X
Now we can access any record by using the first two bits of its key to access
directly the directory entry that contains the address of the page containing
the record.
Continuing a little further, we can add S=10011 and E=00101 before
another split is necessary to add A=00001. This split also requires doubling
the directory, leaving the structure:
Disk 1: 20 21 22 22 30 30 30 30
Disk 2: A A   E E E   L N
Disk 3: R S T X
In general, the structure built by extendible hashing consists of a directory
of 2^d words (one for each d-bit pattern) and a set of leaf pages which contain
all records with keys beginning with a specific bit pattern (of less than or
equal to d bits). A search entails using the leading d bits of the key to index
into the directory, which contains pointers to leaf pages. Then the referenced
leaf page is accessed and searched (using any strategy) for the proper record.
A leaf page can be pointed to by more than one directory entry: to be precise,
if a leaf page contains all the records with keys that begin with a specific k
bits (those marked with a vertical line in the pages on the diagram above),
then it will have 2^(d-k) directory entries pointing to it. In the example above,
we have d = 3, and page 0 of disk 3 contains all the records with keys that
begin with a 1 bit, so there are four directory entries pointing to it.
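The search itself can be sketched as follows (hypothetical declarations: d,
dir and maxb as in the text, and readpage brings a leaf page into an array
page, holding m records kept in sorted key order; the page is assumed to be
nonempty):

function hashsearch(v: integer): boolean;
  var i: integer;
  begin
  readpage(dir[bits(v, maxb-d, d)]);   { leading d bits index the directory: one disk access }
  i:=1;
  while (i<m) and (page[i].key<v) do i:=i+1;
  hashsearch:=(page[i].key=v)          { one more key comparison decides }
  end;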
The directory contains only pointers to pages. These are likely to be
smaller than keys or records, so more directory entries will fit on each page.
For our example, we'll assume that we can fit twice as many directory entries
as records per page, though this ratio is likely to be much higher in practice.
When the directory spans more than one page, we keep a "root node" in
memory which tells where the directory pages are, using the same indexing
scheme. For example, if the directory spans two pages, the root node might
contain the two entries "10 11," indicating that the directory for all the records
with keys beginning with 0 are on page 0 of disk 1, and the directory for all
keys beginning with 1 are on page 1 of disk 1. For our example, this split
occurs when the E is inserted. Continuing up until the last E (see below), we
get the following disk storage structure:
Disk 1: 20 20 21 22 30 30 31 32 40 40 41 41 42 42 42 42
Disk 2: A A A C   E E E E   G
Disk 3: H I   L L M   N N
Disk 4: P R R   S T   X X
As illustrated in the above example, insertion into an extendible hashing
structure can involve one of three operations, after the leaf page which could
contain the search key is accessed. If there's room in the leaf page, the new
record is simply inserted there; otherwise the leaf page is split in two (half the
records are moved to a new page). If the directory has more than one entry
pointing to that leaf page, then the directory entries can be split as the page
is. If not, the size of the directory must be doubled.
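Of these operations, doubling the directory sounds drastic but is the simplest
to sketch (dirsize=2^d and d are hypothetical globals as above); each old
entry simply spreads over two adjacent new ones:

procedure doubledirectory;
  var i: integer;
  begin
  for i:=dirsize-1 downto 0 do                      { top down, so no entry is overwritten }
    begin dir[i+i]:=dir[i]; dir[i+i+1]:=dir[i] end;
  dirsize:=dirsize+dirsize; d:=d+1
  end;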
As described so far, this algorithm is very susceptible to a bad input
key distribution: the value of d is the largest number of bits required to
separate the keys into sets small enough to fit on leaf pages, and thus if
a large number of keys agree in a large number of leading bits, then the
directory could get unacceptably large. For actual large-scale applications,
this problem can be headed off by hashing the keys to make the leading
bits (pseudo-)random. To search for a record, we hash its key to get a bit
sequence which we use to access the directory, which tells us which page to
search for a record with the same key. From a hashing standpoint, we can
think of the algorithm as splitting nodes to take care of hash value collisions:
hence the name "extendible hashing." This method presents a very attractive
alternative to B-trees and indexed sequential access because it always uses
exactly two disk accesses for each search (like indexed sequential), while still
retaining the capability for efficient insertion (like B-trees).
Even with hashing, extraordinary steps must be taken if large numbers
of equal keys are present. They can make the directory artificially large; and
the algorithm breaks down entirely if there are more equal keys than fit in
one leaf page. (This actually occurs in our example, since we have five E's.) If
many equal keys are present then we could (for example) assume distinct keys
in the data structure and put pointers to linked lists of records containing
equal keys in the leaf pages. To see the complication involved, consider the
insertion of the last E into the structure above.
Virtual Memory
The "easier way" discussed at the end of Chapter 13 for external sorting
applies directly and trivially to the searching problem. A virtual memory
is actually nothing more than a general-purpose external searching method:
given an address (key), return the information associated with that address.
However, direct use of the virtual memory is not recommended as an easy
searching application. As mentioned in Chapter 13, virtual memories perform
best when most accesses are relatively close to previous accesses. Sorting
algorithms can be adapted to this, but the very nature of searching is that
requests are for information from arbitrary parts of the database.
Exercises
1. Give the contents of the B-tree that results when the keys E A S Y Q U
E S T I O N are inserted in that order into an initially empty tree, with
M = 5.
2. Give the contents of the B-tree that results when the keys E A S Y Q U
E S T I O N are inserted in that order into an initially empty tree, with
M = 6, using the variant of the method where all the records are kept in
external nodes.
3. Draw the B-tree that is built when sixteen equal keys are inserted into an
initially empty tree, with M = 5.
4. Suppose that one page from the database is destroyed. Describe how you
would handle this event for each of the B-tree structures described in the
text.
5. Give the contents of the extendible hashing table that results when the
keys E A S Y Q U E S T I O N are inserted in that order into an initially
empty table, with a page capacity of four records. (Following the example
in the text, don't hash, but use the five-bit binary representation of i as
the key for the ith letter.)
6. Give a sequence of as few distinct keys as possible which make an extendible
hashing directory grow to size 16, from an initially empty table,
with a page capacity of three records.
7. Outline a method for deleting an item from an extendible hashing table.
8. Why are "top-down" B-trees better than "bottom-up" B-trees for concurrent
access to data? (For example, suppose two programs are trying to
insert a new node at the same time.)
9. Implement search and insert for internal searching using the extendible
hashing method.
10. Discuss how the program of the previous exercise compares with double
hashing and radix trie searching for internal searching applications.
SOURCES for Searching
Again, the primary reference for this section is Knuth's volume three.
Most of the algorithms that we've studied are treated in great detail in
that book, including mathematical analyses and suggestions for practical
applications.
The material in Chapter 15 comes from Guibas and Sedgewick's 1978
paper, which shows how to fit many classical balanced tree algorithms into
the "red-black" framework, and which gives several other implementations.
There is actually quite a large literature on balanced trees. Comer's 1979
survey gives many references on the subject of B-trees.
The extendible hashing algorithm presented in Chapter 18 comes from
Fagin, Nievergelt, Pippenger and Strong's 1979 paper. This paper is a must
for anyone wishing further information on external searching methods: it ties
together material from our Chapters 16 and 17 to bring out the algorithm in
Chapter 18.
Trees and binary trees as purely mathematical objects have been studied
extensively, quite apart from computer science. A great deal is known about
the combinatorial properties of these objects. A reader interested in studying
this type of material might begin with Knuth's volume 1.
Many practical applications of the methods discussed here, especially
Chapter 18, arise within the context of database systems. An introduction to
this field is given in Ullman's 1980 book.
D. Comer, "The ubiquitous B-tree," Computing Surveys, 11 (1979).
R. Fagin, J. Nievergelt, N. Pippenger and H. R. Strong, "Extendible hashing -
a fast access method for dynamic files," ACM Transactions on Database
Systems, 4, 3 (September, 1979).
L. Guibas and R. Sedgewick, "A dichromatic framework for balanced trees,"
in 19th Annual Symposium on Foundations of Computer Science, IEEE, 1978.
Also in A Decade of Progress 1970-1980, Xerox PARC, Palo Alto, CA.
D. E. Knuth, The Art of Computer Programming. Volume 1: Fundamental
Algorithms, Addison-Wesley, Reading, MA, 1968.
D. E. Knuth, The Art of Computer Programming. Volume 3: Sorting and
Searching, Addison-Wesley, Reading, MA, second printing, 1975.
J. D. Ullman, Principles of Database Systems, Computer Science Press, Rockville,
MD, 1982.
19. String Searching
Data to be processed often does not decompose logically into independent
records with small identifiable pieces. This type of data is characterized
only by the fact that it can be written down as a string: a linear
(typically very long) sequence of characters.
Strings are obviously central in "word processing" systems, which provide
a variety of capabilities for the manipulation of text. Such systems process text
strings, which might be loosely defined as sequences of letters, numbers, and
special characters. These objects can be quite large (for example, this book
contains over a million characters), and efficient algorithms play an important
role in manipulating them.
Another type of string is the binary string, a simple sequence of 0 and 1
values. This is in a sense merely a special type of text string, but it is worth
making the distinction not only because different algorithms are appropriate
but also binary strings arise naturally in many applications. For example,
some computer graphics systems represent pictures as binary strings. (This
book was printed on such a system: the present page was represented at one
time as a binary string consisting of millions of bits.)
In one sense, text strings are quite different objects than binary strings,
since they are made up of characters from a large alphabet. In another, the two
types of strings are equivalent, since each text character can be represented
by (say) eight binary bits and a binary string can be viewed as a text string by
treating eight-bit chunks as characters. We'll see that the size of the alphabet
from which the characters are taken to form a string is an important factor
in the design of string processing algorithms.
A fundamental operation on strings is pattern matching: given a text
string of length N and a pattern of length M, find an occurrence of the
pattern within the text. (We will use the term "text" even when referring to
a sequence of 0-1 values or some other special type of string.) Most algorithms
for this problem can easily be extended to find all occurrences of the pattern
in the text, since they scan sequentially through the text and can be restarted
at the point directly after the beginning of a match to find the next match.
The pattern-matching problem can be characterized as a searching problem
with the pattern as the key, but the searching algorithms that we have
studied do not apply directly because the pattern can be long and because
it "lines up" with the text in an unknown way. It is an interesting problem
to study because several very different (and surprising) algorithms have only
recently been discovered which not only provide a spectrum of useful practical
methods but also illustrate some fundamental algorithm design techniques.
A Short History
The development of the algorithms that we'll be examining has an interesting
history: we'll summarize it here to place the various methods into perspective.
There is an obvious brute-force algorithm for string processing which is in
widespread use. While it has a worst-case running time proportional to MN,
the strings which arise in many applications lead to a running time which
is virtually always proportional to M + N. Furthermore, it is well suited to
good architectural features on most computer systems, so an optimized version
provides a "standard" which is difficult to beat with a clever algorithm.
In 1970, S. A. Cook proved a theoretical result about a particular type of
abstract machine implying that an algorithm exists which solves the pattern-matching
problem in time proportional to M + N in the worst case. D. E.
Knuth and V. R. Pratt laboriously followed through the construction Cook
used to prove his theorem (which was not intended at all to be practical)
to get an algorithm which they were then able to refine to be a relatively
simple practical algorithm. This seemed a rare and satisfying example of a
theoretical result having immediate (and unexpected) practical applicability.
But it turned out that J. H. Morris had discovered virtually the same algorithm
as a solution to an annoying practical problem that confronted him when
implementing a text editor (he didn't want to ever "back up" in the text
string). However, the fact that the same algorithm arose from two such
different approaches lends it credibility as a fundamental solution to the
problem.
Knuth, Morris, and Pratt didn't get around to publishing their algorithm
until 1976, and in the meantime R. S. Boyer and J. S. Moore (and, independently,
R. W. Gosper) discovered an algorithm which is much faster in many
applications, since it often examines only a fraction of the characters in the
text string. Many text editors use this algorithm to achieve a noticeable
decrease in response time for string searches.
Both the Knuth-Morris-Pratt and the Boyer-Moore algorithms require
some complicated preprocessing on the pattern that is difficult to understand
and has limited the extent to which they are used. (In fact, the story goes
that an unknown systems programmer found Morris' algorithm too difficult
to understand and replaced it with a brute-force implementation.)
In 1980, R. M. Karp and M. O. Rabin observed that the problem is not
as different from the standard searching problem as it had seemed, and came
up with an algorithm almost as simple as the brute-force algorithm which
virtually always runs in time proportional to M + N. Furthermore, their
algorithm extends easily to two-dimensional patterns and text, which makes
it more useful than the others for picture processing.
This story illustrates that the search for a "better algorithm" is still very
often justified: one suspects that there are still more developments on the
horizon even for this problem.
Brute-Force Algorithm
The obvious method for pattern matching that immediately comes to mind is
just to check, for each possible position in the text at which the pattern could
match, whether it does in fact match. The following program searches in this
way for the first occurrence of a pattern p[1..M] in a text string a[1..N]:
function brutesearch: integer;
var i, j: integer;
begin
i:=1; j:=1;
repeat
  if a[i]=p[j]
    then begin i:=i+1; j:=j+1 end
    else begin i:=i-j+2; j:=1 end;
until (j>M) or (i>N);
if j>M then brutesearch:=i-M else brutesearch:=i
end;
The program keeps one pointer (i) into the text, and another pointer (j) into
the pattern. As long as they point to matching characters, both pointers are
incremented. If the end of the pattern is reached (j>M), then a match has
been found. If i and j point to mismatching characters, then j is reset to point
to the beginning of the pattern and i is reset to correspond to moving the
pattern to the right one position for matching against the text. If the end
of the text is reached (i>N), then there is no match. If the pattern does not
occur in the text, the value N+1 is returned.
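To pin down the conventions that the program fragments in this chapter rely on,
here is a minimal complete program (a sketch, not part of the original text)
which supplies the global variables a, p, M, and N that brutesearch assumes and
runs it on the STING example discussed below; the array bounds and the use of
the string type available in most modern Pascal dialects are assumptions made
only for this illustration.

program brutedemo;
const Nmax=100; Mmax=20;
var a: array [1..Nmax] of char;    { text string }
    p: array [1..Mmax] of char;    { pattern }
    N, M, k: integer;
    t, q: string;

function brutesearch: integer;
  var i, j: integer;
  begin
  i:=1; j:=1;
  repeat
    if a[i]=p[j]
      then begin i:=i+1; j:=j+1 end
      else begin i:=i-j+2; j:=1 end;
  until (j>M) or (i>N);
  if j>M then brutesearch:=i-M else brutesearch:=i
  end;

begin
t:='A STRING SEARCHING EXAMPLE CONSISTING OF SIMPLE TEXT';
q:='STING';
N:=length(t); M:=length(q);
for k:=1 to N do a[k]:=t[k];
for k:=1 to M do p[k]:=q[k];
writeln('pattern found at position ', brutesearch)
{ prints 33: the STING inside CONSISTING }
end.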
In a text-editing application, the inner loop of this program is seldom
iterated, and the running time is very nearly proportional to the number of
text characters examined. For example, suppose that we are looking for the
pattern STING in the text string
A STRING SEARCHING EXAMPLE CONSISTING OF SIMPLE TEXT
Then the statement j:=j+l is executed only four times (once for each S,
but twice for the first ST) before the actual match is encountered. On the
other hand, this program can be very slow for some patterns. For example, if
the pattern is 00000001 and the text string is:
00000000000000000000000000000000000000000000000000001
then j is incremented 7*45 (315) times before the match is encountered. Such
degenerate strings are not likely in English (or Pascal) text, but the algorithm
does run more slowly when used on binary (two-character) text, as might occur
in picture processing and systems programming applications. The following
table shows what happens when the algorithm is used to search for 10010111
in the following binary string:
100111010010010010010111000111
1001
1
10
10010
10010
10010
10010111
There is one line in this table for each time the body of the repeat loop
is entered, and one character for each time j is incremented. These are the
"false starts" that occur when trying to find the pattern: an obvious goal is
to try to limit the number and length of these.
Knuth-Morris-Pratt Algorithm
The basic idea behind the algorithm discovered by Knuth, Morris, and Pratt
is this: when a mismatch is detected, our "false start" consists of characters
that we know in advance (since they're in the pattern). Somehow we should be
able to take advantage of this information instead of backing up the i pointer
over all those known characters.
For a simple example of this, suppose that the first character in the
pattern doesn't appear again in the pattern (say the pattern is 10000000).
Then, suppose we have a false start j characters long at some position in
the text. When the mismatch is detected, we know, by dint of the fact
that j characters have matched, that we don't have to "back up" the text
pointer i, since none of the previous j-1 characters in the text can match
the first character in the pattern. This change could be implemented by
replacing i:=i-j+2 in the program above by i:=i+1. The practical effect of
this change is limited because such a specialized pattern is not particularly
likely to occur, but the idea is worth thinking about because the Knuth-
Morris-Pratt algorithm is a generalization. Surprisingly, it is always possible
to arrange things so that the i pointer is never decremented.
Fully skipping past the pattern on detecting a mismatch as described
in the previous paragraph won't work when the pattern could match itself
at the point of the mismatch. For example, when searching for 10100111 in
1010100111 we first detect the mismatch at the fifth character, but we had
better back up to the third character to continue the search, since otherwise
we would miss the match. But we can figure out ahead of time exactly what
to do, because it depends only on the pattern, as shown by the following table:
j   p[1..j-1]   next[j]
2   1|          1
3   10|         1
4   10|1        2
5   10|10       3
The array next[1..M] will be used to determine how far to back up when a
mismatch is detected. In the table, imagine that we slide a copy of the first
j-1 characters of the pattern over itself, from left to right starting with the
first character of the copy over the second character of the pattern, stopping
when all overlapping characters match (or there are none). These overlapping
characters define the next possible place that the pattern could match, if a
mismatch is detected at p[j]. The distance to back up (next[j]) is exactly
one plus the number of the overlapping characters. Specifically, for j>1, the
value of next[j] is the maximum k<j for which the first k-1 characters of
the pattern match the last k-1 characters of the first j-1 characters of the
pattern. A vertical line is drawn just after p[j-next[j]] on each line of the
table. As we'll soon see, it is convenient to define next[1] to be 0.
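Continuing the same construction for the rest of the pattern 10100111 gives
the remaining entries (these rows are not in the table above, but follow from
the same rule and can be checked against the goto targets in the "wired-in"
program below):

j   p[1..j-1]   next[j]
6   10100|      1
7   10100|1     2
8   101001|1    2

so that, with next[1] defined to be 0, the full table is
next[1..8] = 0 1 1 2 3 1 2 2.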
This next array immediately gives a way to limit (in fact, as we'll see,
eliminate) the "backup" of the text pointer i: a generalization of the method
above. When i and j point to mismatching characters (testing for a pattern
match beginning at position i-j+1 in the text string), then the next possible
position for a pattern match begins at position i-next[j]+1. But by
definition of the next table, the first next[j]-1 characters at that position
match the first next[j]-1 characters of the pattern, so there's no need to
back up the i pointer that far: we can simply leave the i pointer unchanged
and set the j pointer to next[j], as in the following program:
function kmpsearch: integer;
var i, j: integer;
begin
i:=1; j:=1;
repeat
  if (j=0) or (a[i]=p[j])
    then begin i:=i+1; j:=j+1 end
    else begin j:=next[j] end;
until (j>M) or (i>N);
if j>M then kmpsearch:=i-M else kmpsearch:=i;
end;
When j=1 and a[i] does not match the pattern, there is no overlap, so we want
to increment i and set j to the beginning of the pattern. This is achieved by
defining next[1] to be 0, which results in j being set to 0; then i is incremented
and j set to 1 the next time through the loop. (For this trick to work, the pattern
array must be declared to start at 0, otherwise standard Pascal will complain
about a subscript out of range when j=0, even though it doesn't really have to
access p[0] to determine the truth of the or.) Functionally, this program is
the same as brutesearch, but it is likely to run faster for patterns which are
highly self-repetitive.
It remains to compute the next table. The program for this is short but
tricky: it is basically the same program as above, except that it is used to
match the pattern against itself.
procedure initnext;
var i, j: integer;
begin
i:=1; j:=0; next[1]:=0;
repeat
  if (j=0) or (p[i]=p[j])
    then begin i:=i+1; j:=j+1; next[i]:=j end
    else begin j:=next[j] end;
until i>M;
end;
Just after i and j are incremented, it has been determined that the first j-1
characters of the pattern match the characters in positions p[i-j+1..i-1], the
last j-1 characters in the first i-1 characters of the pattern. And this is the
largest j with this property, since otherwise a "possible match" of the pattern
with itself would have been missed. Thus, j is exactly the value to be assigned
to next[i].
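As a check, the following small complete program (a sketch, not from the
original text; the bound Mmax is an assumption, chosen larger than M because
initnext assigns next[M+1] on its last iteration) builds and prints the next
table for the sample pattern 10100111:

program nextdemo;
const M=8; Mmax=20;
var p: array [1..Mmax] of char;
    next: array [1..Mmax] of integer;
    k: integer;

procedure initnext;
  var i, j: integer;
  begin
  i:=1; j:=0; next[1]:=0;
  repeat
    if (j=0) or (p[i]=p[j])
      then begin i:=i+1; j:=j+1; next[i]:=j end
      else begin j:=next[j] end;
  until i>M;
  end;

begin
p[1]:='1'; p[2]:='0'; p[3]:='1'; p[4]:='0';
p[5]:='0'; p[6]:='1'; p[7]:='1'; p[8]:='1';
initnext;
for k:=1 to M do write(next[k]:2);
writeln    { prints  0 1 1 2 3 1 2 2 }
end.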
An interesting way to view this algorithm is to consider the pattern as
fixed, so that the next table can be "wired in" to the program. For example,
the following program is exactly equivalent to the program above for the
pattern that we've been considering, but it's likely to be much more efficient.
i:=0;
0: i:=i+1;
1: if a[i]<>'1' then goto 0; i:=i+1;
2: if a[i]<>'0' then goto 1; i:=i+1;
3: if a[i]<>'1' then goto 1; i:=i+1;
4: if a[i]<>'0' then goto 2; i:=i+1;
5: if a[i]<>'0' then goto 3; i:=i+1;
6: if a[i]<>'1' then goto 1; i:=i+1;
7: if a[i]<>'1' then goto 2; i:=i+1;
8: if a[i]<>'1' then goto 2; i:=i+1;
search:=i-8;
The goto labels in this program correspond precisely to the next table. In
fact, the initnext program above which computes the next table could easily
be modified to output this program! To avoid checking whether i>N each
time i is incremented, we assume that the pattern itself is stored at the end
of the text as a sentinel, in a[N+1..N+M]. (This optimization could also
be applied to the standard implementation.) This is a simple example of a
"string-searching compiler" : given a pattern, we can produce a very efficient
program which can scan for that pattern in an arbitrarily long text string.
We'll see generalizations of this concept in the next two chapters.
The program above uses just a few very basic operations to solve the
string searching problem. This means that it can easily be described in terms
of a very simple machine model, called a finite-state machine. The following
diagram shows the finite-state machine for the program above:
(Diagram: the finite-state machine for the program above.)
The machine consists of states (indicated by circled letters) and transitions
(indicated by arrows). Each state has two transitions leaving it: a match
transition (solid line) and a non-match transition (dotted line). The states
are where the machine executes instructions; the transitions are the goto instructions.
When in a state labeled x, the machine can perform just
one instruction: "if the current character is x then scan past it and take the
match transition, otherwise take the non-match transition." To "scan past"
a character means to take the next character in the string as the "current
character"; the machine scans past characters as it matches them. There
is one exception to this: the non-match transition in the first state (marked
with a double line) also requires that the machine scan to the next character.
(Essentially this corresponds to scanning for the first occurrence of the
first character in the pattern.) In the next chapter we'll see how to use a
similar (but more powerful) machine to help develop a much more powerful
pattern-matching algorithm.
The alert reader may have noticed that there's still some room for improvement
in this algorithm, because it doesn't take into account the character
which caused the mismatch. For example, suppose that we encounter 1011
when searching for our sample pattern 10100111. After matching 101, we
find a mismatch on the fourth character, at which point the next table says
to check the second character, since we already matched the 1 in the third
character. However, we could not have a match here: from the mismatch, we
know that the next character in the text is not 0, as required by the pattern.
Another way to see this is to look at the version of the program with the next
table "wired in": at label 4 we go to 2 if a[i] is not 0, but at label 2 we go
to 1 if a[i] is not 0. Why not just go to 1 directly? Fortunately, it is easy
to put this change into the algorithm. We need only replace the statement
next[i]:=j in the initnext program by
if p[j]<>p[i] then next[i]:=j else next[i]:=next[j];
With this change, we either increment j or reset it from the next table at most
once for each value of i, so the algorithm is clearly linear.
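Written out in full, the improved table-construction procedure is just initnext
with that one statement replaced (a sketch; note that, as with the pattern
array, next must be dimensioned at least to M+1, since i can reach M+1 on the
last iteration):

procedure initnext;
  var i, j: integer;
  begin
  i:=1; j:=0; next[1]:=0;
  repeat
    if (j=0) or (p[i]=p[j]) then
      begin
      i:=i+1; j:=j+1;
      { if p[i]=p[j], a mismatch at i would also mismatch at j,
        so go straight to next[j] }
      if p[j]<>p[i] then next[i]:=j else next[i]:=next[j]
      end
    else begin j:=next[j] end;
  until i>M;
  end;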
The Knuth-Morris-Pratt algorithm is not likely to be significantly faster
than the brute-force method in most actual applications, because few applications
involve searching for highly self-repetitive patterns in highly self-repetitive
text. However, the method does have a major virtue from a practical
point of view: it proceeds sequentially through the input and never "backs
up" in the input. This makes the method convenient for use on a large file
being read in from some external device. (Algorithms which require backup
require some complicated buffering in this situation.)
Boyer-Moore Algorithm
If "backing up" is not a problem, then a significantly faster string searching
method can be developed by scanning the pattern from right to left when
trying to match it against the text. When searching for our sample pattern
10100111, if we find matches on the eighth, seventh, and sixth character but
not on the fifth, then we can immediately slide the pattern seven positions to
the right, and check the fifteenth character next, because our partial match
found 111, which might appear elsewhere in the pattern. Of course, the
end of the pattern might appear elsewhere in it in general, so we need a next table
as above. For example, the following is a right-to-left version of the next table
for the pattern 10110101:
j   p[M-j+2..M]   p[M-next[j]+1..M]   next[j]
2   1             0101                4
3   01            0110101             7
4   101           01                  2
5   0101          10101               5
6   10101         10101               5
7   110101        10101               5
8   0110101       10101               5
The number at the right on the jth line of the table gives the maximum
number of character positions that the pattern can be shifted to the right
given that a mismatch in a right-to-left scan occurred on the jth character
from the right in the pattern. This is found in a similar manner as before, by
sliding a copy of the pattern over the last j-1 characters of itself from left
to right starting with the next-to-last character of the copy lined up with the
last character of the pattern, stopping when all overlapping characters match
(also taking into account the character which caused the mismatch).
This leads directly to a program which is quite similar to the above
implementation of the Knuth-Morris-Pratt method. We won't go into this
in more detail because there is a quite different way to skip over characters
with right-to-left pattern scanning which is much better in many cases.
The idea is to decide what to do next based on the character that caused
the mismatch in the text as well as the pattern. The simplest realization of
this leads immediately to a quite useful program. Consider the first example
that we studied, searching for the pattern STING in the text string
A STRING SEARCHING EXAMPLE CONSISTING OF SIMPLE TEXT
Proceeding from right to left to match the pattern, we first check the G
in the pattern against the R (the fifth character) in the text. Not only do
these not match, but also we can notice that R does not appear anywhere
in the pattern, so we might as well slide it all the way past the R. The next
comparison is of the G in the pattern against the fifth character following the
R (the S in SEARCHING). This time, we can slide the pattern to the right
until its S matches the S in the text. Then the G in the pattern is compared
against the C in SEARCHING, which doesn't appear in the pattern, so it can
be slid five more places to the right. After three more five-character skips,
we arrive at the T in CONSISTING, at which point we align the pattern
so that its T matches the T in the text and find the full match. This
method brings us right to the match position at a cost of examining only seven
characters in the text (and five more to verify the match)! If the alphabet
is not small and the pattern is not long, then this "mismatched character
algorithm" will find a pattern of length M in a text string of length N in
about N/M steps.
The mismatched character algorithm is quite easy to implement. It
simply improves a brute-force right-to-left pattern scan by using an array
skip which tells, for each character in the alphabet, how far to skip if that
character appears in the text and causes a mismatch:
function mischarsearch: integer;
var i, j: integer;
begin
i:=M; j:=M;
repeat
  if a[i]=p[j]
    then begin i:=i-1; j:=j-1 end
    else
      begin
      i:=i+M-j+1; j:=M;
      if skip[index(a[i])]>M-j+1 then
        i:=i+skip[index(a[i])]-(M-j+1);
      end;
until (j<1) or (i>N);
mischarsearch:=i+1
end;
The statement i:=i+M-j+1 resets i to the next position in the text string (as
the pattern moves from left-to-right across it); then j:=M resets the pattern
pointer to prepare for a right-to-left character-by-character match. The next
statement moves the pattern even further across the text, if warranted. For
simplicity, we assume that we have a function index(c: char): integer; that
returns 0 for blanks and i for the ith letter of the alphabet, and a procedure
initskip which initializes the skip array to M for characters not in the pattern
and then for j from 1 to M sets skip[index(p[j])] to M-j. For example, for
the pattern STING, the skip entry for G would be 0, the entry for N would be
1, the entry for I would be 2, the entry for T would be 3, the entry for S would
be 4, and the entries for all other letters would be 5. Thus, for example, when
an S is encountered during a right-to-left search, the i pointer is incremented
by 4 so that the end of the pattern is aligned four positions to the right of the
S (and consequently the S in the pattern lines up with the S in the text). If
there were more than one S in the pattern, we would want to use the rightmost
one for this calculation: hence the skip array is built by scanning from left to
right.
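The index function and the initskip procedure are assumed rather than given;
straightforward versions following the description above might read as follows
(a sketch which assumes an alphabet of the blank plus the 26 upper-case letters,
with skip declared as array [0..26] of integer):

function index(c: char): integer;
  begin
  if c=' ' then index:=0 else index:=ord(c)-ord('A')+1
  end;

procedure initskip;
  var j: integer;
  begin
  for j:=0 to 26 do skip[j]:=M;            { characters not in the pattern }
  for j:=1 to M do skip[index(p[j])]:=M-j  { rightmost occurrence prevails }
  end;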
Boyer and Moore suggested combining the two methods we have outlined
for right-to-left pattern scanning, choosing the larger of the two skips called
for.
The mismatched character algorithm obviously won't help much for binary
strings, because there are only two possibilities for characters which cause
the mismatch (and these are both likely to be in the pattern). However, the
bits can be grouped together to make "characters" which can be used exactly
as above. If we take b bits at a time, then we need a skip table with 2^b entries.
The value of b should be chosen small enough so that this table is not too
large, but large enough that most b-bit sections of the text are not likely to
be in the pattern. Specifically, there are M - b + 1 different b-bit sections in
the pattern (one starting at each bit position from 1 through M-b+1), so we
want M - b + 1 to be significantly less than 2^b. For example, if we take b to
be about lg(4M), then more than three-quarters of the entries in the skip table
will be M. Also b must be less than M/2, otherwise we could miss the
pattern entirely if it were split between two b-bit text sections.
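As a concrete illustration (worked out here, not in the original text), for a
pattern of M = 20 bits we might take b = 7, since 2^7 = 128 is comfortably larger
than 4M = 80: at most M-b+1 = 14 of the 128 skip-table entries can correspond to
b-bit sections that occur in the pattern, so at least 114 of them (about ninety
percent, well over three-quarters) are M; and b = 7 is safely less than M/2 = 10.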
Rabin-Karp Algorithm
A brute-force approach to string searching which we didn't examine above
would be to use a large memory to advantage by treating each possible M-character
section of the text as a key in a standard hash table. But it is
not necessary to keep a whole hash table, since the problem is set up so that
only one key is being sought: all that we need to do is to compute the hash
function for each of the possible M-character sections of the text and check if
it is equal to the hash function of the pattern. The problem with this method
is that it seems at first to be just as hard to compute the hash function for M
characters from the text as it is merely to check to see if they're equal to the
pattern. Rabin and Karp found an easy way to get around this problem for the
hash function h(k) = k mod q, where q (the table size) is a large prime. Their
method is based on computing the hash function for position i in the text
given its value for position i - 1. The method follows quite directly from the
mathematical formulation. Let's assume that we translate our M characters
to numbers by packing them together in a computer word, which we then
treat as an integer. This corresponds to writing the characters as numbers in
a base-d number system, where d is the number of possible characters. The
number corresponding to a[i..i+M-1] is thus
x = a[i]*d^(M-1) + a[i+1]*d^(M-2) + ... + a[i+M-1]
and we can assume that we know the value of h(x) = x mod q. But shifting
one position right in the text simply corresponds to replacing x by
(x - a[i]*d^(M-1))*d + a[i+M].
A fundamental property of the mod operation is that we can perform it at any
time during these operations and still get the same answer. Put another way,
if we take the remainder when divided by q after each arithmetic operation
(to keep the numbers that we're dealing with small) then we get the same
answer that we would if we were to perform all of the arithmetic operations,
then take the remainder when divided by q.
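As a small illustration (with numbers chosen only for this example), take d = 10
(so the "characters" are decimal digits), q = 13, and M = 3, and suppose the text
contains ...1415...: the hash value for the section 141 is 141 mod 13 = 11, and
the next section 415 has 415 mod 13 = 12. Computing the shift incrementally, with
the remainder taken after every operation, gives
((11 - 1*(100 mod 13))*10 + 5) mod 13 = ((11 - 9)*10 + 5) mod 13 = 25 mod 13 = 12,
the same value.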
This leads to the very simple pattern-matching algorithm implemented
below. The program assumes the same index function as above, but d=32 is
used for efficiency (the multiplications might be implemented as shifts).
function rksearch: integer;
const q=33554393; d=32;
var h1, h2, dM, i: integer;
begin
dM:=1; for i:=1 to M-1 do dM:=(d*dM) mod q;
h1:=0; for i:=1 to M do h1:=(h1*d+index(p[i])) mod q;
h2:=0; for i:=1 to M do h2:=(h2*d+index(a[i])) mod q;
i:=1;
while (h1<>h2) and (i<=N-M) do
  begin
  h2:=(h2+d*q-index(a[i])*dM) mod q;
  h2:=(h2*d+index(a[i+M])) mod q;
  i:=i+1;
  end;
rksearch:=i;
end;
The program first computes a hash value h1 for the pattern, then a hash
value h2 for the first M characters of the text. (Also it computes the value
of d^(M-1) mod q in the variable dM.) Then it proceeds through the text string,
using the technique above to compute the hash function for the M characters
starting at position i for each i, comparing each new hash value to h1. The
prime q is chosen to be as large as possible, but small enough that (d+1)*q
doesn't cause overflow: this requires fewer mod operations than if we used the
largest representable prime. (An extra d*q is added during the h2 calculation
to make sure that everything stays positive so that the mod operation works
as it should.)
This algorithm obviously takes time proportional to N + M. Note that
it really only finds a position in the text which has the same hash value as the
pattern, so, to be sure, we really should do a direct comparison of that text
with the pattern. However, the use of such a large value of q, made possible by
the mod computations and by the fact that we don't have to keep the actual
hash table around, makes it extremely unlikely that a collision will occur.
Theoretically, this algorithm could still take NM steps in the (unbelievably
unlikely) worst case, but in practice the algorithm can be relied upon to take about
N + M steps.
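One way to add the check mentioned above (a sketch; the function verify is not
part of the program given in the text) is to compare the text characters at the
returned position directly against the pattern before accepting it:

function verify(i: integer): boolean;  { does p[1..M] really occur at a[i..i+M-1]? }
  var j: integer;
  begin
  verify:=true;
  for j:=1 to M do
    if a[i+j-1]<>p[j] then verify:=false
  end;

A call such as i:=rksearch, followed by a check of verify(i), then settles the
matter at a cost of at most M extra comparisons.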
Multiple Searches
The algorithms that we've been discussing are all oriented towards a specific
string searching problem: find an occurrence of a given pattern in a given
text string. If the same text string is to be the object of many pattern
searches, then it will be worthwhile to do some processing on the string to
make subsequent searches efficient.
If there are a large number of searches, the string searching problem can
be viewed as a special case of the general searching problem that we studied
in the previous section. We simply treat the text string as N overlapping
"keys," the ith key defined to be a[l..N], the entire text string starting at
position i. Of course, we don't manipulate the keys themselves, but pointers
to them: when we need to compare keys i and j we do character-by-character
compares starting at positions i and j of the text string. (If we use a "sentinel"
character larger than all other characters at the end, then one of the keys
will always be greater than the other.) Then the hashing, binary tree, and
other algorithms of the previous section can be used directly. First, an entire
structure is built up from the text string, and then efficient searches can be
performed for particular patterns.
There are many details which need to be worked out in applying searching
algorithms to string searching in this way; our intent is to point this out as
a viable option for some string searching applications. Different methods will
be appropriate in different situations. For example, if the searches will always
be for patterns of the same length, a hash table constructed with a single scan
as in the Rabin-Karp method will yield constant search times on the average.
On the other hand, if the patterns are to be of varying length, then one of the
tree-based methods might be appropriate. (Patricia is especially adaptable to
such an application.)
Other variations in the problem can make it significantly more difficult
and lead to drastically different methods, as we'll discover in the next two
chapters.
Exercises
1. Implement a brute-force pattern matching algorithm that scans the pattern
from right to left.
2. Give the next table for the Knuth-Morris-Pratt algorithm for the pattern
AAIWUAA.
3. Give the next table for the Knuth-Morris-Pratt algorithm for the pattern
ABRACADABRA.
4. Draw a finite-state machine which can search for the pattern ABRACADABRA.
5. How would you search a text file for a string of 50 consecutive blanks?
6. Give the right-to-left skip table for the right-left scan for the pattern
ABRACADABRA.
7. Construct an example for which the right-to-left pattern scan with only
the mismatch heuristic performs badly.
8. How would you modify the Rabin-Karp algorithm to search for a given
pattern with the additional proviso that the middle character is a "wild
card" (any text character at all can match it)?
9. Implement a version of the Rabin-Karp algorithm that can find a given
two-dimensional pattern in a given two-dimensional text. Assume both
pattern and text are rectangles of characters.
10. Write programs to generate a random 1000-bit text string, then find all
occurrences of the last k bits elsewhere in the string, for k = 5, 10, 15.
(Different methods might be appropriate for different values of k.)
20. Pattern Matching
It is often desirable to do string searching with somewhat less than
complete information about the pattern to be found. For example, the
user of a text editor may wish to specify only part of his pattern, or he may
wish to specify a pattern which could match a few different words, or he might
wish to specify that any number of occurrences of some specific characters
should be ignored. In this chapter we'll consider how pattern matching of this
type can be done efficiently.
The algorithms in the previous chapter have a rather fundamental dependence
on complete specification of the pattern, so we have to consider different
methods. The basic mechanisms that we will consider make possible a very
powerful string searching facility which can match complicated M-character
patterns in N-character text strings in time proportional to MN.
First, we have to develop a way to describe the patterns: a "language"
that can be used to specify, in a rigorous way, the kinds of partial string
searching problems suggested above. This language will involve more powerful
primitive operations than the simple "check if the ith character of the text
string matches the jth character of the pattern" operation used in the previous
chapter. In this chapter, we consider three basic operations in terms of an
imaginary type of machine that has the capability for searching for patterns
in a text string. Our pattern-matching algorithm will be a way to simulate
the operation of this type of machine. In the next chapter, we'll see how to
translate from the pattern specification (which the user employs to describe
his string searching task) to the machine specification (which the algorithm
employs to actually carry out the search).
As we'll see, this solution to the pattern matching problem is intimately
related to fundamental processes in computer science. For example, the
method that we will use in our program to perform the string searching task
implied by a given pattern description is akin to the method used by the
Pascal system to perform the computational task implied by a given Pascal
program.
Describing Patterns
We'll consider pattern descriptions made up of symbols tied together with the
following three fundamental operations.
(i) Concatenation. This is the operation used in the last chapter. If two
characters are adjacent in the pattern, then there is a match if and only
if the same two characters are adjacent in the text. For example, AB
means A followed by B.
(ii) Or. This is the operation that allows us to specify alternatives in the
pattern. If we have an "or" between two characters, then there is a
match if and only if either of the characters occurs in the text. We'll
denote this operation by using the symbol + and use parentheses to
allow it to be combined with concatenation in arbitrarily complicated
ways. For example, A+B means "either A or B"; C(AC+B)D means
"either CACD or CBD"; and (A+C)((B+C)D) means "either ABD or
CBD or ACD or CCD."
(iii) Closure. This operation allows parts of the pattern to be repeated
arbitrarily. If we have the closure of a symbol, then there is a match if
and only if the symbol occurs any number of times (including 0). Closure
will be denoted by placing a * after the character or parenthesized group
to be repeated. For example, AB* matches strings consisting of an A
followed by any number of B's, while (AB)" matches strings consisting
of alternating A's and B's.
A string of symbols built up using these three operations is called a regular
expression. Each regular expression describes many specific text patterns.
Our goal is to develop an algorithm which will determine if any of the patterns
described by a given regular expression occur in a given text string.
We'll concentrate on concatenation, or, and closure in order to show
the basic principles in developing a regular-expression pattern matching algorithm.
Various additions are commonly made in actual systems for convenience.
For example, -A might mean "match any character except A."
This not operation is the same as an or involving all the characters except
A but is much easier to use. Similarly, "?" might mean "match any letter."
Again, this is obviously much more compact than a large or. Other examples
of additional symbols which might make specification of large patterns easier
are symbols which match the beginning or end of a line, any letter or any
number, etc.
These operations can be remarkably descriptive. For example, the pattern
description ?*(ie + ei)?* matches all words which have ie or ei in them (and so
are likely to be misspelled!); and (1 + 01)*(0 + 1) describes all strings of 0's and
1's which do not have two consecutive 0's. Obviously there are many different
pattern descriptions which describe the same strings: we must try to specify
succinct pattern descriptions just as we try to write efficient algorithms.
The pattern matching algorithm that we'll examine may be viewed as
a generalization of the brute force left-to-right string searching method (the
first method that we looked at in Chapter 19). The algorithm looks for the
leftmost substring in the text string which matches the pattern description by
scanning the text string from left to right, testing at each position whether
there is a substring beginning at that position which matches the pattern
description.
Pattern Matching Machines
Recall that we can view the Knuth-Morris-Pratt algorithm as a finite-state
machine constructed from the search pattern which scans the text. The
method we will use for regular-expression pattern matching is a generalization
of this.
The finite-state machine for the Knuth-Morris-Pratt algorithm changes
from state to state by looking at a character from the text string and then
changing to one state if there's a match, to another state if not. A mismatch
at any point means that the pattern couldn't occur in the text starting at that
point. The algorithm itself can be thought of as a simulation of the machine.
The characteristic of the machine that makes it easy to simulate is that it
is deterministic: each state transition is completely determined by the next
input character.
To handle regular expressions, it will be necessary to consider a more
powerful abstract machine. Because of the or operation, the machine can't
determine whether or not the pattern could occur at a given point by examining
just one character; in fact, because of closure, it can't even determine how
many characters might need to be examined before a mismatch is discovered.
The most natural way to overcome these problems is to endow the machine
with the power of nondeterminism: when faced with more than one way to
try to match the pattern, the machine should "guess" the right one! This
operation seems impossible to allow, but we will see that it is easy to write a
program to simulate the actions of such a machine.
For example, the following diagram shows a nondeterministic finite-state
machine that could be used to search for the pattern description (A*B+AC)D
in a text string.
As in the deterministic machine of the previous chapter, the machine can
travel from a state labeled with a character to the state "pointed to" by that
state by matching (and scanning past) that character in the text string. What
makes the machine nondeterministic is that there are some states (called null
states) which not only are not labeled, but also can "point to" two different
successor states. (Some null states, such as state 4 in the diagram, are "no-op"
states with one exit, which don't affect the operation of the machine,
but which make easier the implementation of the program which constructs
the machine, as we'll see. State 9 is a null state with no exits, which stops
the machine.) When in such a state, the machine can go to either successor
state regardless of what's in the input (without scanning past anything). The
machine has the power to guess which transition will lead to a match for the
given text string (if any will). Note that there are no "non-match" transitions
as in the previous chapter: the machine fails to find a match only if there is
no way even to guess a sequence of transitions that leads to a match.
The machine has a unique initial state (which is pointed to by a "free"
arrow) and a unique final state (which has no arrows going out). When started
out in the initial state, the machine should be able to "recognize" any string
described by the pattern by reading characters and changing state according
to its rules, ending up in the "final state." Because it has the power of
nondeterminism, the machine can guess the sequence of state changes that
can lead to the solution. (But when we try to simulate the machine on a
standard computer, we'll have to try all the possibilities.) For example, to
determine if its pattern description (A*B+AC)D can occur in the text string
CDAABCAAABDDACDAAC
the machine would immediately report failure if started on the first or second
character; it would work some to report failure on the next two characters; it
would immediately report failure on the fifth or sixth characters; and it would
guess the sequence of state transitions
5 2 1 2 1 2 1 2 3 4 8 9
to recognize AAABD if started on the seventh character.
We can construct the machine for a given regular expression by building
partial machines for parts of the expression and defining the ways in which
two partial machines can be composed into a larger machine for each of the
three operations: concatenation, or, and closure.
We start with the trivial machine to recognize a particular character. It's
convenient to write this as a two-state machine, with one initial state (which
also recognizes the character) and one final state, as below:
(Diagram: a two-state machine that recognizes a single character.)
Now to build the machine for the concatenation of two expressions from the
machines for the individual expressions, we simply merge the final state of
the first with the initial state of the second:
Similarly, the machine for the or operation is built by adding a new null state
pointing to the two initial states, and making one final state point to the
other, which becomes the final state of l,he combined machine.
Finally, the machine for the closure operation is built by making the final
state the initial state and making it point back to the old initial state and a
new final state.
A machine can be built which corresponds to any regular expression by
successively applying these rules. The numbers of the states for the example
machine above are in order of creation as the machine is built by scanning
the pattern from left to right, so the construction of the machine from the
rules above can be easily traced. Note that we have a 2-state trivial machine
for each letter in the regular expression, and each + and * causes one state to
be created (concatenation causes one to be deleted) so the number of states is
certainly less than twice the number of characters in the regular expression.
Representing the Machine
Our nondeterministic machines will all be constructed using only the three
composition rules outlined above, and we can take advantage of their simple
structure to manipulate them in a straightforward way. For example, no state
has more than two arrows leaving it. In fact, there are only two types of states:
those labeled by a character from the input alphabet (with one arrow leaving)
and unlabeled (null) states (with two or fewer arrows leaving). This means
that the machine can be represented with only a few pieces of information
per node. For example, the machine above might be represented as follows:
State   Character   Next 1   Next 2
  0                    5        -
  1         A          2        -
  2                    3        1
  3         B          4        -
  4                    8        8
  5                    6        2
  6         A          7        -
  7         C          8        -
  8         D          9        -
  9                    0        0
The rows in this table may be interpreted as instructions to the nondeterministic
machine of the form "If you are in State and you see Character then
scan the character and go to state Next 1 (or Next 2)." State 9 is the final
state in this example, and State 0 is a pseudo-initial state whose Next 1 entry
is the number of the actual initial state. (Note the special representation used
for null states with 0 or 1 exits.)
Since we often will want to access states just by number, the most suitable
organization for the machine is to use the array representation. We'll use the
three arrays
ch: array [0..Mmax] of char;
next1, next2: array [0..Mmax] of integer;
Here Mmax is the maximum number of states (twice the maximum pattern
length). It would be possible to get by with two-thirds this amount of space,
since each state really uses only two meaningful pieces of information, but
we'll forsake this improvement for the sake of clarity and also because pattern
descriptions are not likely to be particularly long.
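As a concrete illustration (a sketch, not from the original text), the example
machine in the table above could be loaded into these arrays as follows; the
blank character is used in ch to mark null states, which is the convention the
simulation program given below relies on, and next2 is simply set equal to next1
for states where the table shows a dash, since it is never consulted for them.

procedure buildexample;
  begin
  { states 0..9 of the machine for (A*B+AC)D }
  ch[0]:=' '; next1[0]:=5; next2[0]:=5;   { pseudo-initial state }
  ch[1]:='A'; next1[1]:=2; next2[1]:=2;
  ch[2]:=' '; next1[2]:=3; next2[2]:=1;   { null state with two successors }
  ch[3]:='B'; next1[3]:=4; next2[3]:=4;
  ch[4]:=' '; next1[4]:=8; next2[4]:=8;   { "no-op" null state }
  ch[5]:=' '; next1[5]:=6; next2[5]:=2;
  ch[6]:='A'; next1[6]:=7; next2[6]:=7;
  ch[7]:='C'; next1[7]:=8; next2[7]:=8;
  ch[8]:='D'; next1[8]:=9; next2[8]:=9;
  ch[9]:=' '; next1[9]:=0; next2[9]:=0    { final state }
  end;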
We've seen how to build up machines from regular expression pattern
descriptions and how such machines might be represented as arrays. However,
to write a program to do the translation from a regular expression to the
corresponding nondeterministic machine representation automatically is quite
another matter. In fact, even writing a program to determine if a given regular
expression is legal is challenging for the uninitiated. In the next chapter, we'll
study this operation, called parsing, in much more detail. For the moment,
we'll assume that this translation has been done, so that we have available
the ch, next1, and next2 arrays representing a particular nondeterministic
machine which corresponds to the regular expression pattern description of
interest.
Simulating the Machine
The last step in the development of a general regular-expression pattern-matching
algorithm is to write a program which somehow simulates the operation
of a nondeterministic pattern-matching machine. The idea of writing a
program which can "guess" the right answer seems ridiculous. However, in
this case it turns out that we can keep track of all possible matches in a
systematic way, so that we do eventually encounter the correct one.
One possibility would be to develop a recursive program which mimics
the nondeterministic machine (but tries all possibilities rather than guessing
the right one). Instead of using this approach, we'll look at a nonrecursive
implementation which exposes the basic operating principles of the method
by keeping the states under consideration in a rather peculiar data structure
called a deque, described in some detail below.
The idea is to keep track of all states that could possibly be encountered
while the machine is "looking at" the c:lrrent input character. Each of these
states is processed in turn: null states lead to two (or fewer) states, states for
characters which do not match the current input are eliminated, and states
for characters which do match the current input lead to new states for use
when the machine is looking at the next input character. Thus, we maintain
a list of all the states that the nondeterministic machine could possibly be in
at a particular point in the text: the problem is to design an appropriate data
structure for this list.
Processing null states seems to require a stack, since we are essentially
postponing one of two things to be done, just as when we removed the
recursion from Quicksort (so the new state should be put at the beginning
of the current list, lest it get postponed indefinitely). Processing the other
states seems to require a queue, since we don't want to examine states for the
next input character until we've finished with the current character (so the
new state should be put at the end of the current list). Rather than choosing
between these two data structures, we'll use both! Deques ("double-ended
queues") combine the features of stacks and queues: a deque is a list to which
items can be added at either end. (Actually, we use an "output-restricted
deque," since we always remove items from the beginning, not the end: that
would be "dealing from the bottom of the deck.")
A crucial property of the machine is that there are no "loops" consisting of
just null states: otherwise it could decide nondeterministically to loop forever.
It turns out that this implies that the number of states on the deque at any
time is less than the number of characters in the pattern description.
The program given below uses a deque to simulate the actions of a nondeterministic
pattern-matching machine as described above. While examining
a particular character in the input, the nondeterministic machine can be
in any one of several possible states: the program keeps track of these in
a deque dq. One pointer (head) to the head of the deque is maintained so
that items can be inserted or removed at the beginning, and another pointer
(tail) to the tail of the deque is maintained so that items can be inserted
at the end. If the pattern description has M characters the deque can be
implemented in a "circular" manner in an array of M integers. The contents
of the deque are the elements "between" head and tail (inclusive): if
head<=tail, the meaning is obvious; if head>tail we take the elements that
would fall between head and tail if the elements of dq were arranged in a
circle: dq[head], dq[head+l],. . .,dq[M-l],dq[O], dq[l], . . .,dq[tail]. This is
quite simply implemented by using head:=(head+1) mod M to increment head
and similarly for tail. Similarly, head:=(head+M-1) mod M refers to the element
before head in the array: this is the position at which an element should
be added to the beginning of the deque.
The main loop of the program removes a state from the deque (by
incrementing head mod M and then referring to dq[head]) and performs the
action required. If a character is to be matched, the input is checked for the
required character: if it is found, the state transition is effected by putting
the new state at the end of the deque (so that all states involving the current
character are processed before those involving the next one). If the state is
null, the two possible states to be simulated are put at the beginning of the
deque. The states involving the current input character are kept separated
from those involving the next by a marker scan=-1 in the deque: when
scan is encountered, the pointer into the input string is advanced. The loop
terminates when the end of the input is reached (no match found), state 0 is
reached (legal match found), or only one item, the scan marker, is left on the
deque (no match found). This leads directly to the following implementation:
function match(j: integer): integer;
const scan=-1;
var head, tail, n1, n2: integer;
    dq: array [0..Mmax] of integer;
  procedure addhead(x: integer);
    begin dq[head]:=x; head:=(head+M-1) mod M end;
  procedure addtail(x: integer);
    begin tail:=(tail+1) mod M; dq[tail]:=x end;
begin
head:=1; tail:=0;
addtail(next1[0]); addtail(scan);
match:=j-1;
repeat
  if dq[head]=scan then
    begin j:=j+1; addtail(scan) end
  else if ch[dq[head]]=a[j] then
    addtail(next1[dq[head]])
  else if ch[dq[head]]=' ' then
    begin
    n1:=next1[dq[head]]; n2:=next2[dq[head]];
    addhead(n1); if n1<>n2 then addhead(n2)
    end;
  head:=(head+1) mod M
until (j>N) or (dq[head]=0) or (head=tail);
if dq[head]=0 then match:=j-1;
end;
This function takes as its argument the position j in the text string a at which
it should start trying to match. It returns the index of the last character in
the match found (if any, otherwise it returns j-1).
The following table shows the contents of the deque each time a state is
removed when our sample machine is run with the text string AABD. (For
clarity, the details involving head, tail, and the maintenance of the circular
deque are suppressed in this table: each line shows those elements in the deque
between the head and tail pointers.) The characters appear in the lefthand
column in the table at the point when the program has finished scanning
them.
        5 scan
        2 6 scan
        1 3 6 scan
        3 6 scan 2
        6 scan 2
A       scan 2 7
        2 7 scan
        1 3 7 scan
        3 7 scan 2
        7 scan 2
A       scan 2
        2 scan
        1 3 scan
        3 scan
B       scan 4
        4 scan
        8 scan
D       scan 9
        9 scan
        0 scan
Thus, we start with State 5 while scanning the first character. First State 5
leads to States 2 and 6, then State 2 leads to States 1 and 3, all of which need
to scan the same character and are on the beginning of the deque. Then State
1 leads to State 2, but at the end of the deque (for the next input character).
State 3 only leads to another state while scanning a B, so it is ignored while
an A is being scanned. When the "scan" sentinel finally reaches the front of
the deque, we see that the machine could be either in State 2 or State 7 after
scanning an A. Continuing, the program eventually ends up in the final state,
after considering all transitions consistent with the text string.
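To find the leftmost occurrence of the pattern anywhere in the text, as described
at the beginning of this chapter, match can simply be called at successive
starting positions until it reports success. A minimal driver might look like
this (a sketch; the name matchall and the N+1 "not found" convention are
assumptions of this illustration):

function matchall: integer;  { start of leftmost match, N+1 if there is none }
  var j: integer;
  begin
  j:=1;
  while (j<=N) and (match(j)<j) do j:=j+1;
  matchall:=j
  end;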
The running time of this program obviously depends very heavily on
the pattern being matched. However, for each of the N input characters, it
processes at most M states of the machine, so the worst case running time
is proportional to MN. For sure, not all nondeterministic machines can be
simulated so efficiently, as discussed in more detail in Chapter 40, but the use
of a simple hypothetical pattern-matching machine in this application leads
to a quite reasonable algorithm for a quite difficult problem. However, to
complete the algorithm, we need a program which translates arbitrary regular
expressions into "machines" for interpretation by the above code. In the next
chapter, we'll look at the implementation of such a program in the context of
a more general discussion of compilers and parsing techniques.
Exercises
1. Give a regular expression for recognizing all occurrences of four or fewer
consecutive l's in a binary string.
2. Draw the nondeterministic pattern matching machine for the pattern
description (A+B)* +C.
3. Give the state transitions your machine from the previous exercise would
make to recognize ABBAC.
4. Explain how you would modify the nondeterministic machine to handle
the "not" function.
5. Explain how you would modify the nondeterministic machine to handle
"don't-care" characters.
6. What would happen if match were to try to simulate the following machine?
7. Modify match to handle regular expressions with the "not" function and
"don't-care" characters.
8. Show how to construct a pattern description of length M and a text
string of length N for which the running time of match is as large as
possible.
9. Why must the deque in match have only one "scan" sentinel in it?
10. Show the contents of the deque each time a state is removed when match
is used to simulate the example machine in the text with the text string
ACD.
21. Parsing
Several fundamental algorithms have been developed to recognize legal
computer programs and to decompose their structure into a form suitable
for further processing. This operation, called parsing, has application beyond
computer science, since it is directly related to the study of the structure
of language in general. For example, parsing plays an important role in systems
which try to "understand" natural (human) languages and in systems
for translating from one language to another. One particular case of interest
is translating from a "high-level" co.nputer language like Pascal (suitable
for human use) to a "low-level" assembly or machine language (suitable for
machine execution). A program for doing such a translation is called a compiler.
Two general approaches are used for parsing. Top-down methods look
for a legal program by first looking for parts of a legal program, then looking
for parts of parts, etc. until the pieces are small enough to match the input
directly. Bottom-up methods put pieces of the input together in a structured
way making bigger and bigger pieces until a legal program is constructed.
In general, top-down methods are recursive, bottom-up methods are iterative;
top-down methods are thought to be easier to implement, bottom-up methods
are thought to be more efficient.
A full treatment of the issues involved in parser and compiler construction
would clearly be beyond the scope of this book. However, by building a simple
"compiler" to complete the pattern-matching algorithm of the previous chapter,
we will be able to consider some of the fundamental concepts involved.
First we'll construct a top-down parser for a simple language for describing
regular expressions. Then we'll modify the parser to make a program which
translates regular expressions into pattern-matching machines for use by the
match procedure of the previous chapter.
Our intent in this chapter is to give some feeling for the basic principles
of parsing and compiling while at the same time developing a useful pattern
matching algorithm. Certainly we cannot treat the issues involved at the
level of depth that they deserve. The reader should be warned that subtle
difficulties are likely to arise in applying the same approach to similar problems,
and advised that compiler construction is a quite well-developed field
with a variety of advanced methods available for serious applications.
Context-Free Grammars
Before we can write a program to determine whether a program written in
a given language is legal, we need a description of exactly what constitutes
a legal program. This description is called a grammar: to appreciate the terminology,
think of the language as English and read "sentence" for "program"
in the previous sentence (except for the first occurrence!). Programming languages
are often described by a particular type of grammar called a context-free
grammar. For example, the context-free grammar which defines the set
of all legal regular expressions (as described in the previous chapter) is given
below.
(expression) ::= (term) | (term) + (expression)
(term) ::= (factor) | (factor)(term)
(factor) ::= ( (expression) ) | v | (factor)*
This grammar describes regular expressions like those that we used in the last
chapter, such as (1+01)*(0+1) or (A*B+AC)D. Each line in the grammar is
called a production or replacement rule. The productions consist of terminal
symbols (, ), + and * which are the symbols used in the language being
described (v, a special symbol, stands for any letter or digit); nonterminal
symbols (expression), (term), and (factor) which are internal to the grammar;
and metasymbols ::= and | which are used to describe the meaning of the
productions. The ::= symbol, which may be read "is a," defines the left-hand
side of the production in terms of the right-hand side; and the | symbol, which
may be read as "or," indicates alternative choices. The various productions,
though expressed in this concise symbolic notation, correspond in a simple
way to an intuitive description of the grammar. For example, the second
production in the example grammar might be read "a (term) is a (factor)
or a (factor) followed by a (term)." One nonterminal symbol, in this case
(expression), is distinguished in the sense that a string of terminal symbols is
in the language described by the grammar if and only if there is some way to
use the productions to derive that string from the distinguished nonterminal
by replacing (in any number of steps) a nonterminal symbol by any of the "or"
clauses on the right-hand side of a production for that nonterminal symbol.
One natural way to describe the result of this derivation process is called
a parse tree: a diagram of the complete grammatical structure of the string
being parsed. For example, the following parse tree shows that the string
(A*B+AC)D is in the language described by the above grammar.
The circled internal nodes labeled E, F, and T represent (expression), (factor),
and (term), respectively. Parse trees like this are sometimes used for English,
to break down a "sentence" into "subject," "verb," "object," etc.
The main function of a parser is to accept strings which can be so derived
and reject those that cannot, by attempting to construct a parse tree for
any given string. That is, the parser can recognize whether a string is in
the language described by the grammar by determining whether or not there
exists a parse tree for the string. Top-down parsers do so by building the
tree starting with the distinguished nonterminal at the top, working down
towards the string to be recognized at the bottom; bottom-up parsers do this
by starting with the string at the bottom, working backwards up towards the
distinguished nonterminal at the top.
As we'll see, if the strings being recognized also have meanings implying
further processing, then the parser can convert them into an internal representation
which can facilitate such processing.
Another example of a context-free grammar may be found in the appendix
of the Pascal User Manual and Report: it describes legal Pascal programs.
The principles considered in this section for recognizing and using legal expressions
apply directly to the complex job of compiling and executing Pascal
programs. For example, the following grammar describes a very small subset
of Pascal, arithmetic expressions involving addition and multiplication.
(expression) ::= (term) | (term) + (expression)
(term) ::= (factor) | (factor) * (term)
(factor) ::= ( (expression) ) | v
Again, v is a special symbol which stands for any letter, but in this grammar
the letters are likely to represent variables with numeric values. Examples of
legal strings for this grammar are A+(B*C) and (A+B*C)*D*(A+(B+C)).
As we have defined things, some strings are perfectly legal both as arithmetic
expressions and as regular expressions. For example, A*(B+C) might
mean "add B to C and multiply the result by A" or "take any number of A's
followed by either B or C." This points out the obvious fact that checking
whether a string is legally formed is one thing, but understanding what it
means is quite another. We'll return to this issue after we've seen how to
parse a string to check whether or not it is described by some grammar.
Each regular expression can itself be described by a context-free grammar:
any language which can be described by a regular expression can also be
described by a context-free grammar. The converse is not true: for example,
the concept of "balancing" parentheses can't be captured with regular expressions.
Other types of grammars can describe languages which can't be
described by context-free grammars. For example, context-sensitive grammars
are the same as those above except that the left-hand sides of productions
need not be single nonterminals. The differences between classes of languages
and a hierarchy of grammars for describing them have been very carefully
worked out and form a beautiful theory which lies at the heart of computer
science.
Top-Down Parsing
One parsing method uses recursion to recognize strings from the language
described exactly as specified by the grammar. Put simply, the grammar is
such a complete specification of the language that it can be turned directly
into a program!
Each production corresponds to a procedure with the name of the nonterminal
on the left-hand side. Nonterminals on the right-hand side of the
production correspond to (possibly recursive) procedure calls; terminals correspond
to scanning the input string. For example, the following procedure is part of
a top-down parser for our regular expression grammar:
procedure expression;
begin
term;
if p[j]='+' then
  begin j:=j+1; expression end
end;
An array p contains the regular expression being parsed, with an index j
pointing to the character currently being examined. To parse a given regular
expression, we put it in p[1..M] (with a sentinel character in p[M+1] which
is not used in the grammar), set j to 1, and call expression. If this results in
j being set to M+1, then the regular expression is in the language described
by the grammar. Otherwise, we'll see below how various error conditions are
handled.
The first thing that expression does is call term, which has a slightly more
complicated implementation:
procedure term;
begin
factor;
if (p[j]='(') or letter(p[j]) then term;
end;
A direct translation from the grammar would simply have term call factor
and then term. This obviously won't work because it leaves no way to
exit from term: this program would go into an infinite recursive loop if
called. (Such loops have particularly unpleasant effects in many systems.)
The implementation above gets around this by first checking the input to
decide whether term should be called. The first thing that term does is call
factor, which is the only one of the procedures that could detect a mismatch
in the input. From the grammar, we know that when factor is called, the
current input character must be either a "(" or an input letter (represented
by v). This process of checking the next character (without incrementing j)
to decide what to do is called lookahead. For some grammars, this is not
necessary; for others even more lookahead is required.
Now, the implementation of factor follows directly from the grammar. If
the input character being scanned is not a "(" or an input letter, a procedure
error is called to handle the error condition:
procedure factor;
begin
if p[j]='(' then
  begin
  j:=j+1; expression;
  if p[j]=')' then j:=j+1 else error
  end
else if letter(p[j]) then j:=j+1 else error;
if p[j]='*' then j:=j+1;
end;
Another error condition occurs when a ")" is missing.
These procedures are obviously recursive; in fact they are so intertwined
that they can't be compiled in Pascal without using the forward construct
to get around the rule that a procedure can't be used without first being
declared.
The parse tree for a given string gives the recursive call structure during
parsing. The reader may wish to refer to the tree above and trace through
the operation of the above three procedures when p contains (A*B+AC)D and
expression is called with j=1. This makes the origin of the "top-down" name
obvious. Such parsers are also often called recursive descent parsers because
they move down the parse tree recursively.
The top-down approach won't work for all possible context-free grammars.
For example, if we had the production (expression) ::= v | (expression)
+ (term) then we would have
procedure badexpression;
begin
if letter(p[j]) then j:=j+1 else
  begin
  badexpression;
  if p[j]<>'+' then error else
    begin j:=j+1; term end
  end
end;
If this procedure were called with p[j] a nonletter (as in our example, for
j=1) then it would go into an infinite recursive loop. Avoiding such loops is
a principal difficulty in the implementation of recursive descent parsers. For
term, we used lookahead to avoid such a loop; in this case the proper way to
get around the problem is to switch the grammar to say (term)+(expression).
The occurrence of a nonterminal as the first thing on the right hand side of
a replacement rule for itself is called left recursion. Actually, the problem
is more subtle, because the left recursion can arise indirectly: for example
if we were to have the productions (expression) ::= (term) and (term) ::=
v | (expression) + (term). Recursive descent parsers won't work for such
grammars: they have to be transformed to equivalent grammars without left
recursion, or some other parsing method has to be used. In general, there
is an intimate and very widely studied connection between parsers and the
grammars they recognize. The choice of a parsing technique is often dictated
by the characteristics of the grammar to be parsed.
Bottom-Up Parsing
Though there are several recursive calls in the programs above, it is an instructive
exercise to remove the recursion systematically. Recall from Chapter
9 (where we removed the recursion from Quicksort) that each procedure call
can be replaced by a stack push and each procedure return by a stack pop,
mimicking what the Pascal system does to implement recursion. A reason
for doing this is that many of the calls which seem recursive are not truly
recursive. When a procedure call is the last action of a procedure, then a
simple goto can be used. This turns expression and term into simple loops,
which can be incorporated together and combined with factor to produce a
single procedure with one true recursive call (the call to expression within
factor).
This view leads directly to a quite simple way to check whether regular
expressions are legal. Once all the procedure calls are removed, we see that
each terminal symbol is simply scanned as it is encountered. The only real
processing done is to check whether there is a right parenthesis to match each
left parenthesis and whether each "+" is followed by either a letter or a "(".
That is, checking whether a regular expression is legal is essentially equivalent
to checking for balanced parentheses. This can be simply implemented by
keeping a counter, initialized to 0, which is incremented when a left parenthesis
is encountered, decremented when a right parenthesis is encountered.
If the counter is zero when the end of the expression is reached, and each "+"
of the expression is followed by either a letter or a "(", then the expression
was legal.
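One way to implement this check is sketched below. This is just a sketch (not
the program that results from actually removing the recursion from the parser):
it assumes, as before, that the expression is in p[1..M] with a sentinel character
in p[M+1] and that the boolean function letter is available.

function legal: boolean;
var i, count: integer; ok: boolean;
begin
ok:=true; count:=0;
for i:=1 to M do
  begin
  if p[i]='(' then count:=count+1;
  if p[i]=')' then count:=count-1;
  if count<0 then ok:=false;    { a right parenthesis with no match }
  if (p[i]='+') and not ((p[i+1]='(') or letter(p[i+1])) then ok:=false
  end;
legal:=ok and (count=0)
end;

The sentinel in p[M+1] ensures that a "+" at the very end of the expression is
rejected, since the sentinel is neither a letter nor a "(".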
Of course, there is more to parsing than simply checking whether the
input string is legal: the main goal is to build the parse tree (even if in an
implicit way, as in the top-down parser) for other processing. It turns out to
be possible to do this with programs with the same essential structure as the
parenthesis checker described in the previous paragraph. One type of parser
which works in this way is the so-called shift-reduce parser. The idea is to
maintain a pushdown stack which holds terminal and nonterminal symbols.
Each step in the parse is either a shift step, in which the next input character
is simply pushed onto the stack, or a reduce step, in which the top characters
on the stack are matched to the right-hand side of some production in the
grammar and "reduced to" (replaced by) the nonterminal on the left side
of that production. Eventually all the input characters get shifted onto the
stack, and eventually the stack gets reduced to a single nonterminal symbol.
The main difficulty in building a shift-reduce parser is deciding when to
shift and when to reduce. This can be a complicated decision, depending
on the grammar. Various types of shift-reduce parsers have been studied in
great detail, an extensive literature has been developed on them, and they are
quite often preferred over recursive descent parsers because they tend to be
slightly more efficient and significantly more flexible. Certainly we don't have
space here to do justice to this field, and we'll forgo even the details of an
implementation for our example.
Compilers
A compiler may be thought of as a program which translates from one language
to another. For example, a Pascal compiler translates programs from
the Pascal language into the machine language of some particular computer.
We'll illustrate one way that this might be done by continuing with our
regular-expression pattern-matching example, where we wish to translate
from the language of regular expressions to a "language" for pattern-matching
machines, the ch, next1, and next2 arrays of the match program of the previous
chapter.
Essentially, the translation process is "one-to-one": for each character in
the pattern (with the exception of parentheses) we want to produce a state
for the pattern-matching machine (an entry in each of the arrays). The trick
is to keep track of the information necessary to fill in the next1 and next2
arrays. To do so, we'll convert each of the procedures in our recursive descent
parser into functions which create pattern-matching machines. Each function
will add new states as necessary onto the end of the ch, next1, and next2
arrays, and return the index of the initial state of the machine created (the
final state will always be the last entry in the arrays).
For example, the function given below for the (expression) production
creates the "or" states for the pattern matching machine.
function expression: integer;
var t1, t2: integer;
begin
t1:=term; expression:=t1;
if p[j]='+' then
  begin
  j:=j+1; state:=state+1;
  t2:=state; expression:=t2; state:=state+1;
  setstate(t2, ' ', expression, t1);
  setstate(t2-1, ' ', state, state);
  end;
end;
This function uses a procedure setstate which simply sets the ch, next1, and
next2 array entries indexed by the first argument to the values given in the
second, third, and fourth arguments, respectively. The index state keeps track
of the "current" state in the machine being built. Each time a new state is
created, state is simply incremented. Thus, the state indices for the machine
corresponding to a particular procedure call range between the value of state
on entry and the value of state on exit. The final state index is the value
of state on exit. (We don't actually "create" the final state by incrementing
state before exiting, since this makes it easy to "merge" the final state with
later initial states, as we'll see below.)
With this convention, it is easy to check (beware of the recursive call!)
that the above program implements the rule for composing two machines with
the "or" operation as diagramed in the previous chapter. First the machine
for the first part of the expression is built (recursively), then two new null
states are added and the second part of the expression built. The first null
state (with index t2-1) is the final state of the machine of the first part of
the expression which is made into a "no-op" state to skip to the final state for
the machine for the second part of the expression, as required. The second
null state (with index t2) is the initial state, so its index is the return value
for expression and its next1 and next2 entries are made to point to the initial
states of the two expressions. Note carefully that these are constructed in the
opposite order than one might expect, because the value of state for the no-op
state is not known until the recursive call to expression has been made.
The function for (term) first builds the machine for a (factor) then, if
necessary, merges the final state of that machine with the initial state of the
machine for another (term). This is easier done than said, since state is the
final state index of the call to factor. A call to term without incrementing
state does the trick:
function term;
var t: integer;
begin
term:=factor;
if (p[j]='(') or letter(p[j]) then t:=term
end;
(We have no use for the initial state index returned by the second call to
term, but Pascal requires us to put it somewhere, so we throw it away in a
temporary variable t.)
The function for (factor) uses similar techniques to handle its three cases:
a parenthesis calls for a recursive call on expression; a v calls for simple
concatenation of a new state; and a * calls for operations similar to those in
expression, according to the closure diagram from the previous section:
function factor;
var t1, t2: integer;
begin
t1:=state;
if p[j]='(' then
  begin
  j:=j+1; t2:=expression;
  if p[j]=')' then j:=j+1 else error
  end
else if letter(p[j]) then
  begin
  setstate(state, p[j], state+1, 0);
  t2:=state; j:=j+1; state:=state+1
  end
else error;
if p[j]<>'*' then factor:=t2 else
  begin
  setstate(state, ' ', state+1, t2);
  factor:=state; next1[t1-1]:=state;
  j:=j+1; state:=state+1;
  end;
end;
The reader may find it instructive to trace through the construction of
the machine for the pattern (A*B+AC)D given in the previous chapter.
The final step in the development of a general regular expression pattern
matching algorithm is to put these procedures together with the match
procedure, as follows:
j:=1; state:=1;
next1[0]:=expression;
setstate(state, ' ', 0, 0);
for i:=1 to N-1 do
  if match(i)>=i then writeln(i);
This program will print out all character positions in a text string a[1..N]
where a pattern p[1..M] leads to a match.
Compiler-Compilers
The program for general regular expression pattern matching that we have
developed in this and the previous chapter is efficient and quite useful. A
version of this program with a few added capabilities (for handling "don't-care"
characters and other amenities) is likely to be among the most heavily
used utilities on many computer systems.
It is interesting (some might say confusing) to reflect on this algorithm
from a more philosophical point of view. In this chapter, we have considered
parsers for unraveling the structure of regular expressions, based on a formal
description of regular expressions using a context-free grammar. Put another
way, we used the context-free grammar to specify a particular "pattern":
sequences of characters with legally balanced parentheses. The parser then
checks to see if the pattern occurs in the input (but only considers a match
legal if it covers the entire input string). Thus parsers, which check that an
input string is in the set of strings defined by some context-free grammar,
and pattern matchers, which check that an input string is in the set of
strings defined by some regular expression, are essentially performing the same
function! The principal difference is that context-free grammars are capable
of describing a much wider class of strings. For example, the set of all regular
expressions can't be described with regular expressions.
Another difference in the way we've implemented the programs is that the
context-free grammar is "built in" to the parser, while the match procedure
is "table-driven": the same program wol,ks for all regular expressions, once
they have been translated into the propel. format. It turns out to be possible
to build parsers which are table-driven In the same way, so that the same
program can be used to parse all language 3 which can be described by contextfree
grammars. A parser generator is a program which takes a grammar as
input and produces a parser for the language described by that grammar as
output. This can be carried one step further: it is possible to build compilers
which are table-driven in terms of both the input and the output languages. A
compiler-compiler is a program which takes two grammars (and some formal
specification of the relationships between them) as input and produces a
compiler which translates strings from one language to the other as output.
Parser generators and compiler-compilers are available for general use in
many computing environments, and are quite useful tools which can be used
to produce efficient and reliable parsers and compilers with a relatively small
amount of effort. On the other hand, top-down recursive descent parsers of the
type considered here are quite serviceable for simple grammars which arise in
many applications. Thus, as with many of the algorithms we have considered,
we have a straightforward method which can be used for applications where
a great deal of implementation effort might not be justified, and several advanced
methods which can lead to significant performance improvements for
large-scale applications. Of course, in this case, this is significantly understating
the point: we've only scratched the surface of this extensively researched field.
Exercises
1. How does the recursive descent parser find an error in a regular expression
   such as (A+B)*BC+ which is incomplete?
2. Give the parse tree for the regular expression ((A+B)+(C+D)*)*.
3. Extend the arithmetic expression grammar to include exponentiation, div
   and mod.
4. Give a context-free grammar to describe all strings with no more than
   two consecutive 1's.
5. How many procedure calls are used by the recursive descent parser to
   recognize a regular expression in terms of the number of concatenation,
   or, and closure operations and the number of parentheses?
6. Give the ch, next1 and next2 arrays that result from building the pattern
   matching machine for the pattern ((A+B)+(C+D)*)*.
7. Modify the regular expression grammar to handle the "not" function and
   "don't-care" characters.
8. Build a general regular expression pattern matcher based on the improved
   grammar in your answer to the previous question.
9. Remove the recursion from the recursive descent compiler, and simplify
   the resulting code as much as possible. Compare the running time of the
   nonrecursive and recursive methods.
10. Write a compiler for simple arithmetic expressions described by the grammar
    in the text. It should produce a list of "instructions" for a machine
    capable of three operations: push the value of a variable onto a stack;
    add the top two values on the stack, removing them from the stack, then
    putting the result there; and multiply the top two values on the stack, in
    the same way.
22. File Compression
For the most part, the algorithms that we have studied have been designed
primarily to use as little time as possible and only secondarily to
conserve space. In this section, we'll examine some algorithms with the opposite
orientation: methods designed primarily to reduce space consumption
without using up too much time. Ironically, the techniques that we'll examine
to save space are "coding" methods from information theory which were developed
to minimize the amount of information necessary in communications
systems and therefore originally intended to save time (not space).
In general, most files stored on computer systems have a great deal of
redundancy. The methods we will examine save space by taking advantage
of the fact that most files have a relatively low "information content." File
compression techniques are often used for text files (in which certain characters
appear much more often than others), "raster" files for encoding pictures
(which can have large homogeneous areas), and files for the digital representation
of sound and other analog signals (which can have large repeated
patterns).
We'll look at an elementary algorithm for the problem (which is still quite
useful) and an advanced "optimal" method. The amount of space saved by
these methods will vary depending on characteristics of the file. Savings of
20% to 50% are typical for text files, and savings of 50% to 90% might be
achieved for binary files. For some types of files, for example files consisting
of random bits, little can be gained. In fact, it is interesting to note that any
general-purpose compression method must make some files longer (otherwise
we could continually apply the method to produce an arbitrarily small file).
On one hand, one might argue that file compression techniques are less
important than they once were because the cost of computer storage devices
has dropped dramatically and far more storage is available to the typical user
than in the past. On the other hand, it can be argued that file compression
techniques are more important than ever because, since so much storage is in
use, the savings they make possible are greater. Compression techniques are
also appropriate for storage devices which allow extremely high-speed access
and are by nature relatively expensive (and therefore small).
Run-Length Encoding
The simplest type of redundancy in a file is long runs of repeated characters.
For example, consider the following string:
AAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD
This string can be encoded more compactly by replacing each repeated
string of characters by a single instance of the repeated character along with
a count of the number of times it was repeated. We would like to say that this
string consists of 4 A's followed by 3 B's followed by 2 A's followed by 5 B's,
etc. Compressing a string in this way is called run-length encoding. There
are several ways to proceed with this idea, depending on characteristics of the
application. (Do the runs tend to be relatively long? How many bits are used
to encode the characters being encoded?) We'll look at one particular method,
then discuss other options.
If we know that our string contains just letters, then we can encode
counts simply by interspersing digits with the letters; thus our string might
be encoded as follows:

4A3BAA5B8CDABCB3A4B3CD

Here "4A" means "four A's," and so forth. Note that it is not worthwhile
to encode runs of length one or two, since two characters are needed for the
encoding.
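A compression program for this simple scheme might be sketched as follows.
This is only a sketch: it assumes that the input consists of letters (blanks and
line boundaries are ignored) and that no run is longer than nine characters.

program compress(input, output);
var c, prev: char; count: integer;
procedure put(count: integer; c: char);
var i: integer;
begin
if count>2 then write(chr(count+ord('0')), c)  { encode runs of three or more }
else for i:=1 to count do write(c)             { copy shorter runs unchanged }
end;
begin
count:=0; prev:=' ';
while not eof do
  begin
  read(c);
  if c<>' ' then
    if c=prev then count:=count+1
    else begin put(count, prev); prev:=c; count:=1 end
  end;
put(count, prev); writeln
end.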
For binary files (containing solely 0's and 1's), a refined version of this
method is typically used to yield dramatic savings. The idea is simply to store
the run lengths, taking advantage of the fact that the runs alternate between
0 and 1 to avoid storing the 0's and 1's themselves. (This assumes that there
are few short runs, but no run-length encoding method will work very well
unless most of the runs are long.) For example, at the left in the figure
below is a "raster" representation of the letter "q" lying on its side, which is
representative of the type of information that might have to be processed by a
text formatting system (such as the one used to print this book); at the right
is a list of numbers which might be used to store the letter in a compressed
form.
000000000000000000000000000011111111111111000000000
000000000000000000000000001111111111111111110000000
000000000000000000000001111111111111111111111110000
000000000000000000000011111111111111111111111111000
000000000000000000001111111111111111111111111111110
000000000000000000011111110000000000000000001111111
000000000000000000011111000000000000000000000011111
000000000000000000011100000000000000000000000000111
000000000000000000011100000000000000000000000000111
000000000000000000011100000000000000000000000000111
000000000000000000011100000000000000000000000000111
000000000000000000001111000000000000000000000001110
000000000000000000000011100000000000000000000111000
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011000000000000000000000000000000000000000000000011
28 14 9
26 18 7
23 24 4
22 26 3
20 30 1
19 7 18 7
19 5 22 5
19 3 26 3
19 3 26 3
19 3 26 3
19 3 26 3
20 4 23 3 1
22 3 20 3 3
1 50
1 50
1 50
1 50
1 50
1 2 46 2
That is, the first line consists of 28 O's followed by 14 l's followed by 9 more
O's, etc. The 63 counts in this table plus the number of bits per line (51)
contain sufficient information to reconstruct the bit array (in particular, note
that no "end of line" indicator is needed). If six bits are used to represent each
count, then the entire file is represented with 384 bits, a substantial savings
over the 975 bits required to store it explicitly.
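A program to produce such a table of counts might be sketched as follows; it
assumes that the input is a file of lines of 0's and 1's, each beginning (as in the
example) with a run of 0's, so that a line beginning with a 1 would produce a
leading count of zero.

program countruns(input, output);
var c, prev: char; count: integer;
begin
while not eof do
  begin
  prev:='0'; count:=0;
  while not eoln do
    begin
    read(c);
    if c=prev then count:=count+1
    else begin write(count:4); prev:=c; count:=1 end
    end;
  write(count:4); writeln;   { the final run on the line }
  readln
  end
end.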
Run-length encoding requires a separate representation for the file to be
encoded and the encoded version of the file, so that it can't work for all files.
This can be quite inconvenient: for example, the character file compression
method suggested above won't work for character strings that contain digits.
If other characters are used to encode the counts, it won't work for strings
that contain those characters. To illustrate a way to encode any string from
a fixed alphabet of characters using only characters from that alphabet, we'll
assume that we only have the 26 letters of the alphabet (and spaces) to work
with.
How can we make some letters represent digits and others represent
parts of the string to be encoded? One solution is to use some character
which is likely to appear rarely in the text as a so-called escape character.
Each appearance of that character signals that the next two letters form a
(count,character) pair, with counts represented by having the ith letter of
the alphabet represent the number i. Thus our example string would be
represented as follows with Q as the escape character:
QDABBBAAQEBQHCDABCBAAAQDBCCCD
The combination of the escape character, the count, and the one copy
of the repeated character is called an escape sequence. Note that it's not
worthwhile to encode runs less than four characters long since at least three
characters are required to encode any run.
But what if the escape character itself happens to occur in the input?
We can't afford to simply ignore this possibility, because it might be difficult
to ensure that any particular character can't occur. (For example, someone
might try to encode a string that has already been encoded.) One solution to
this problem is to use an escape sequence with a count of zero to represent the
escape character. Thus, in our example, the space character could represent
zero, and the escape sequence "Q(space)" would be used to represent any
occurrence of Q in the input. It is interesting to note that files which contain
Q are the only files which are made longer by this compression method. If a
file which has already been compressed is compressed again, it grows by at
least the number of characters equal to the number of escape sequences used.
Very long runs can be encoded with multiple escape sequences. For
example, a run of 51 A's would be encoded as QZAQYA using the conventions
above. If many very long runs are expected, it would be worthwhile to reserve
more than one character to encode the counts.
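For reference, the matching expansion program might be sketched as follows;
it assumes that the compressed text follows exactly the conventions just
described: Q is the escape character, the ith letter of the alphabet represents
the count i, and "Q " (Q followed by a blank) stands for a literal Q.

program expand(input, output);
var c, cnt: char; i, n: integer;
begin
while not eof do
  begin
  read(c);
  if c<>'Q' then write(c)        { an ordinary character }
  else
    begin
    read(cnt);
    if cnt=' ' then write('Q')   { a count of zero: a literal Q }
    else
      begin
      read(c); n:=ord(cnt)-ord('A')+1;
      for i:=1 to n do write(c)
      end
    end
  end;
writeln
end.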
In practice, it is advisable to make both the compression and expansion
programs somewhat sensitive to errors. This can be done by including a small
amount of redundancy in the compressed file so that the expansion program
can be tolerant of an accidental minor change to the file between compression
and expansion. For example, it probably is worthwhile to put "end-of-line"
characters in the compressed version of the letter "q" above, so that the
expansion program can resynchronize itself in case of an error.
Run-length encoding is not particularly effective for text files because the
only character likely to be repeated is the blank, and there are simpler ways to
encode repeated blanks. (It was used to great advantage in the past to compress
text files created by reading in punched-card decks, which necessarily
contained many blanks.) In modern systems, repeated strings of blanks are
never entered, never stored: repeated strings of blanks at the beginning of
lines are encoded as "tabs," blanks at the ends of lines are obviated by the
use of "end-of-line" indicators. A run-length encoding implementation like
the one above (but modified to handle all representable characters) saves only
about 4% when used on the text file for this chapter (and this savings all
comes from the letter "q" example!).
Variable-Length Encoding
In this section we'll examine a file compression technique called Huffman
encoding which can save a substantial amount of space on text files (and
many other kinds of files). The idea is to abandon the way that text files are
usually stored: instead of using the usual seven or eight bits for each character,
Huffman's method uses only a few bits for characters which are used often,
more bits for those which are rarely used.
It will be convenient to examine how the code is used before considering
how it is created. Suppose that we wish to encode the string "A SIMPLE
STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS."
Encoding it in our standard compact binary code with the five-bit binary
representation of i representing the ith letter of the alphabet (0 for blank)
gives the following bit sequence:
000010000010011010010110110000011000010100000
100111010010010010010111000111000001010001111
000000001000101000000010101110000110111100100
001010010000000101011001101001011100011100000
000010000001101010010111001001011010000101100
000000111010101011010001000101100100000001111
001100000000010010011010010011
To "decode" this message, simply read off five bits at a time and convert
according to the binary encoding defined above. In this standard code, the
C, which appears only once, requires the same number of bits as the I, which
appears six times. The Huffman code achieves economy in space by encoding
frequently used characters with as few bits as possible so that the total number
of bits used for the message is minimized.
The first step is to count the frequency of each character within the
message to be encoded. The following code fills an array count[0..26] with the
frequency counts for a message in a character array a[1..M]. (This program
uses the index procedure described in Chapter 19 to keep the frequency count
for the ith letter of the alphabet in count[i], with count[0] used for blanks.)
for i:=0 to 26 do count[i]:=0;
for i:=1 to M do
  count[index(a[i])]:=count[index(a[i])]+1;
For our example string, the count table produced is
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
11  3  3  1  2  5  1  2  0  6  0  0  2  4  5  3  1  0  2  4  3  2  0  0  0  0  0
which indicates that there are eleven blanks, three A's, three B's, etc.
The next step is to build a "coding tree" from the bottom up according
to the frequencies. First we create a tree node for each nonzero frequency
from the table above:
Now we pick the two nodes with the smallest frequencies and create a new
node with those two nodes as sons and with frequency value the sum of the
values of the sons:
(It doesn't matter which nodes are used if there are more than two with the
smallest frequency.) Continuing in this way, we build up larger and larger
subtrees. The forest of trees after all nodes with frequency 2 have been put
in is as follows:
Next, the nodes with frequency 3 are put together, creating two new nodes
of frequency 6, etc. Ultimately, all the nodes are combined together into a
single tree:
Note that nodes with low frequencies end up far down in the tree and nodes
with high frequencies end up near the root of the tree. The numbers labeling
the external (square) nodes in this tree are the frequency counts, while the
number labeling each internal (round) node is the sum of the labels of its
two sons. The small number above each node in this tree is the index into
the count array where the label is stored, for reference when examining the
program which constructs the tree below. (The labels for the internal nodes
will be stored in count[27..51] in an order determined by the dynamics of the
construction.) Thus, for example, the 5 in the leftmost external node (the
frequency count for N) is stored in count [14], the 6 in the next external node
(the frequency count for I) is stored in count [9], and the 11 in the father of
these two is stored in count[33], etc.
It turns out that this structural description of the frequencies in the form
of a tree is exactly what is needed to create an efficient encoding. Before
looking at this encoding, let's look at the code for constructing the tree.
The general process involves removing the smallest from a set of unordered
elements, so we'll use the pqdownheap procedure from Chapter 11 to build and
maintain an indirect heap on the frequency values. Since we're interested in
small values first, we'll assume that the sense of the inequalities in pqdownheap
has been reversed. One advantage of using indirection is that it is easy to
ignore zero frequency counts. The following table shows the heap constructed
for our example:
k                1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
heap[k]          3  7 16 21 12 15  6 20  9  4 13 14  5  2 18 19  1  0
count[heap[k]]   1  2  1  2  2  3  1  3  6  2  4  5  5  3  2  4  3 11
Specifically, this heap is built by first initializing the heap array to point to
the non-zero frequency counts, then using the pqdownheap procedure from
Chapter 11, as follows:
N:=0;
for i:=0 to 26 do
  if count[i]<>0 then
    begin N:=N+1; heap[N]:=i end;
for k:=N downto 1 do pqdownheap(k);
As mentioned above, this assumes that the sense of the inequalities in the
pqdownheap code has been reversed.
Now, the use of this procedure to construct the tree as above is straightforward:
we take the two smallest elements off the heap, add them and put the
result back into the heap. At each step we create one new count, and decrease
the size of the heap by one. This process creates N-l new counts, one for
each of the internal nodes of the tree being created, as in the following code:
repeat
  t:=heap[1]; heap[1]:=heap[N]; N:=N-1;
  pqdownheap(1);
  count[26+N]:=count[heap[1]]+count[t];
  dad[t]:=26+N; dad[heap[1]]:=-26-N;
  heap[1]:=26+N; pqdownheap(1)
until N=1;
dad[26+N]:=0;
The first two lines of this loop are actually pqremove; the size of the heap is
decreased by one. Then a new internal node is "created" with index 26+N and
given a value equal to the sum of the value at the root and value just removed.
Then this node is put at the root, which raises its priority, necessitating
another call on pqdownheap to restore order in the heap. The tree itself is
represented with an array of "father" links: dad[t] is the index of the father
of the node whose weight is in count [t]. The sign of dad[t] indicates whether
the node is a left or right son of its father. For example, in the tree above
we might have dad[0]=-30, count[30]=21, dad[30]=-28, and count[28]=37
(indicating that the node of weight 21 has index 30 and its father has index
28 and weight 37).
The Huffman code is derived from this coding tree simply by replacing the
frequencies at the bottom nodes with the associated letters and then viewing
the tree as a radix search trie:
Now the code can be read directly from this tree. The code for N is 000,
the code for I is 001, the code for C is 110100, etc. The following program
fragment reconstructs this information from the representation of the coding
tree computed during the sifting process. The code is represented by two
arrays: code[k] gives the binary representation of the kth letter and len [k]
gives the number of bits from code[k] to use in the code. For example, I is
the 9th letter and has code 001, so code[9]=1 and len[9]=3.
for k:=0 to 26 do
  if count[k]=0 then
    begin code[k]:=0; len[k]:=0 end
  else
    begin
    i:=0; j:=1; t:=dad[k]; x:=0;
    repeat
      if t<0 then begin x:=x+j; t:=-t end;
      t:=dad[t]; j:=j+j; i:=i+1
    until t=0;
    code[k]:=x; len[k]:=i;
    end;
Finally, we can use these computed representations of the code to encode the
message:
for j:=1 to M do
  for i:=len[index(a[j])] downto 1 do
    write(bits(code[index(a[j])], i-1, 1):1);
This program uses the bits procedure from Chapters 10 and 17 to access single
bits. Our sample message is encoded in only 236 bits versus the 300 used for
the straightforward encoding, a 21% savings:
011011110010011010110101100011100111100111011
101110010000111111111011010011101011100111110
000011010001001011011001011011110000100100100
001111111011011110100010000011010011010001111
000100001010010111001011111101000111011101010
01110111001
An interesting feature of the Huffman code that the reader undoubtedly
has noticed is that delimiters between characters are not stored, even though
different characters may be coded with different numbers of bits. How can
we determine when one character stops and the next begins to decode the
message? The answer is to use the radix search trie representation of the
code. Starting at the root, proceed down the tree according to the bits in the
message: each time an external node is encountered, output the character at
that node and restart at the root. But the tree is built at the time we encode
the message: this means that we need to save the tree along with the message
in order to decode it. Fortunately, this does not present any real difficulty.
It is actually necessary only to store the code array, because the radix search
trie which results from inserting the entries from that array into an initially
empty tree is the decoding tree.
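This decoding process might be sketched as follows. The sketch assumes that
the code and len arrays are available as computed above, that the encoded
message arrives as a stream of the characters '0' and '1' on the input, and that
node 1 is the root of the trie being rebuilt; the names lson, rson and leaf are
introduced only for this sketch.

const maxnode=100;
var lson, rson, leaf: array [1..maxnode] of integer;
    avail: integer;
function bit(k, i: integer): integer;    { bit i of code[k] }
var j, x: integer;
begin
x:=code[k];
for j:=1 to i do x:=x div 2;
bit:=x mod 2
end;
procedure buildtrie;
var k, i, t: integer;
begin
avail:=1;
for t:=1 to maxnode do begin lson[t]:=0; rson[t]:=0; leaf[t]:=-1 end;
for k:=0 to 26 do
  if len[k]>0 then
    begin
    t:=1;
    for i:=len[k]-1 downto 0 do
      if bit(k, i)=0 then
        begin
        if lson[t]=0 then begin avail:=avail+1; lson[t]:=avail end;
        t:=lson[t]
        end
      else
        begin
        if rson[t]=0 then begin avail:=avail+1; rson[t]:=avail end;
        t:=rson[t]
        end;
    leaf[t]:=k                           { the kth letter ends here }
    end
end;
procedure decode;
var t: integer; c: char;
begin
t:=1;
while not eof do
  begin
  read(c);
  if c='0' then t:=lson[t] else t:=rson[t];
  if leaf[t]>=0 then
    begin
    if leaf[t]=0 then write(' ') else write(chr(leaf[t]+ord('A')-1));
    t:=1
    end
  end
end;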
Thus, the storage savings quoted above is not entirely accurate, because
the message can't be decoded without the trie and we must take into account
the cost of storing the trie (i.e., the code array) along with the message.
Huffman encoding is therefore only effective for long files where the savings in
the message is enough to offset the cost, or in situations where the coding trie
can be precomputed and used for a large number of messages. For example, a
trie based on the frequencies of occurrence of letters in the English language
could be used for text documents. For that matter, a trie based on the
frequency of occurrence of characters in Pascal programs could be used for
encoding programs (for example, ";" is likely to be near the top of such a
trie). A Huffman encoding algorithm saves about 23% when run on the text
for this chapter.
As before, for truly random files, even this clever encoding scheme won't
work because each character will occur approximately the same number of
times, which will lead to a fully balanced coding tree and an equal number of
bits per letter in the code.
Exercises

1. Implement compression and expansion procedures for the run-length encoding
   method for a fixed alphabet described in the text, using Q as the
   escape character.
2. Could "QQ" occur somewhere in a file compressed using the method
   described in the text? Could "QQQ" occur?
3. Implement compression and expansion procedures for the binary file encoding
   method described in the text.
4. The letter "q" given in the text can be processed as a sequence of five-bit
   characters. Discuss the pros and cons of doing so in order to use a
   character-based run-length encoding method.
5. Draw a Huffman coding tree for the string "ABRACADABRA." How
   many bits does the encoded message require?
6. What is the Huffman code for a binary file? Give an example showing
   the maximum number of bits that could be used in a Huffman code for an
   N-character ternary (three-valued) file.
7. Suppose that the frequencies of the occurrence of all the characters to be
   encoded are different. Is the Huffman encoding tree unique?
8. Huffman coding could be extended in a straightforward way to encode
   in two-bit characters (using 4-way trees). What would be the main
   advantage and the main disadvantage of doing so?
9. What would be the result of breaking up a Huffman-encoded string into
   five-bit characters and Huffman encoding that string?
10. Implement a procedure to decode a Huffman-encoded string, given the
    code and len arrays.
23. Cryptology
In the previous chapter we looked at methods for encoding strings of
characters to save space. Of course, there is another very important
reason to encode strings of characters: to keep them secret.
Cryptology, the study of systems for secret communications, consists of
two competing fields of study: cryptography, the design of secret communications
systems, and cryptanalysis, the study of ways to compromise secret communications
systems. The main application of cryptology has been in military
and diplomatic communications systems, but other significant applications
are becoming apparent. Two principal examples are computer file systems
(where each user would prefer to keep his files private) and "electronic funds
transfer" systems (where very large amounts of money are involved). A computer
user wants to keep his computer files just as private as papers in his
file cabinet, and a bank wants electronic funds transfer to be just as secure
as funds transfer by armored car.
Except for military applications, we assume that cryptographers are "good
guys" and cryptanalysts are "bad guys": our goal is to protect our computer
files and our bank accounts from criminals. If this point of view seems somewhat
unfriendly, it must be noted (without being over-philosophical) that by
using cryptography one is assuming the existence of unfriendliness! Of course,
even "good guys" must know something about cryptanalysis, since the very
best way to be sure that a system is secure is to try to compromise it yourself.
(Also, there are several documented instances of wars being brought to an
end, and many lives saved, through successes in cryptanalysis.)
Cryptology has many close connections with computer science and algorithms,
especially the arithmetic and string-processing algorithms that we
have studied. Indeed, the art (science?) of cryptology has an intimate relationship
with computers and computer science that is only beginning to be fully
understood. Like algorithms, cryptosystems have been around far longer
than computers. Secrecy system design and algorithm design have a common
heritage, and the same people are attracted to both.
It is not entirely clear which branch of cryptology has been affected most
by the availability of computers. The cryptographer now has available a much
more powerful encryption machine than before, but this also gives him more
room to make a mistake. The cryptanalyst has much more powerful tools
for breaking codes, but the codes to be broken are more complicated than
ever before. Cryptanalysis can place an incredible strain on computational
resources; not only was it among the first applications areas for computers,
but it still remains a principal applications area for modern supercomputers.
More recently, the widespread use of computers has led to the emergence
of a variety of important new applications for cryptology, as mentioned above.
New cryptographic methods have recently been developed appropriate for such
applications, and these have led to the discovery of a fundamental relationship
between cryptology and an important area of theoretical computer science
that we'll examine briefly in Chapter 40.
In this chapter, we'll examine some of the basic characteristics of cryptographic
algorithms because of the importance of cryptography in modern
computer systems and because of close relationships with many of the algorithms
we have studied. We'll refrain from delving into detailed implementations:
cryptography is certainly a field that should be left to experts. While
it's not difficult to "keep people honest" by encrypting things with a simple
cryptographic algorithm, it is dangerous to rely upon a method implemented
by a non-expert.
Rules of the Game
All the elements that go into providing a means for secure communications
between two individuals together are called a cryptosystem. The canonical
structure of a typical cryptosystem is diagramed below:
"attack
at dawn"
The sender (S) wishes to send a message (called the plaintext) to the
receiver (R). To do so, he transforms the plaintext into a secret form suitable
for transmission (called the ciphertext) using a cryptographic algorithm (the
encryption method) and some key (K) parameters. To read the message,
the receiver must have a matching cryptographic algorithm (the decryption
method) and the same key parameters, which he can use to transform the
ciphertext back into the plaintext, the message. It is usually assumed that
the ciphertext is sent over insecure communications lines and is available to
the cryptanalyst (A). It also is usually assumed that the encryption and
decryption methods are known to the cryptanalyst: his aim is to recover the
plaintext from the ciphertext without knowing the key parameters. Note that
the whole system depends on some separate prior method of communication
between the sender and receiver to agree on the key parameters. As a rule,
the more key parameters, the more secure the cryptosystem is but the more
inconvenient it is to use. This situation is akin to that for more conventional
security systems: a combination safe is more secure with more numbers on the
combination lock, but it is harder to remember the combination. The parallel
with conventional systems also serves as a reminder that any security system
is only as secure as the trustworthiness of the people that have the key.
It is important to remember that economic questions play a central role
in cryptosystems. There is an economic motivation to build simple encryption
and decryption devices (since many may need to be provided and complicated
devices cost more). Also, there is an economic motivation to reduce the
amount of key information that must be distributed (since a very secure and
expensive method of communications must be used). Balanced against the cost
of implementing cryptographic algorithms and distributing key information
is the amount of money the cryptanalyst would be willing to pay to break
the system. For most applications, it is the cryptographer's aim to develop a
low-cost system with the property that it would cost the cryptanalyst much
more to read messages than he would be willing to pay. For a few applications,
a "provably secure" cryptosystem may be required: one for which it can be
ensured that the cryptanalyst can never read messages no matter what he is
willing to spend. (The very high stakes in some applications of cryptology
naturally imply that very large amounts of money are used for cryptanalysis.)
In algorithm design, we try to keep track of costs to help us choose the best
algorithms; in cryptology, costs play a central role in the design process.
Simple Methods
Among the simplest (and among the oldest) methods for encryption is the
Caesar cipher: if a letter in the plaintext is the Nth letter in the alphabet,
replace it by the (N + K)th letter in the alphabet, where K is some fixed
integer (Caesar used K = 3). For example, the table below shows how a
message is encrypted using this method with K = 1:
Plaintext: ATTACK AT DAWN
Ciphertext: BUUBDLABUAEBXO
This method is weak because the cryptanalyst has only to guess the value
of K: by trying each of the 26 choices, he can be sure that he will read the
message.
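A program for the Caesar cipher might be sketched as follows; it works on the
27-character alphabet in which blank precedes A (as in the example and the
exercises) and assumes that the input contains only blanks and upper-case
letters.

program caesar(input, output);
const K=1;                       { the fixed key }
var c: char;
function ix(c: char): integer;   { blank is 0, A is 1, ..., Z is 26 }
begin
if c=' ' then ix:=0 else ix:=ord(c)-ord('A')+1
end;
function ch(i: integer): char;   { the inverse of ix }
begin
if i=0 then ch:=' ' else ch:=chr(i+ord('A')-1)
end;
begin
while not eof do
  begin
  read(c);
  write(ch((ix(c)+K) mod 27))
  end;
writeln
end.

Decryption is the same program with K replaced by 27-K.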
A far better method is to use a general table to define the substitution
to be made: for each letter in the plaintext, the table tells which letter to put
in the ciphertext. For example, if the table gives the correspondence
 ABCDEFGHIJKLMNOPQRSTUVWXYZ
THE QUICKBROWNFXJMPDVRLAZYG
then the message is encrypted as follows:
Plaintext: ATTACK AT DAWN
Ciphertext: HVVH OTHVTQHAF
This is much more powerful than the simple Caesar cipher because the cryptanalyst
would have to try many more (about 27! > 10^28) tables to be sure
of reading the message. However, "simple substitution" ciphers like this are
easy to break because of letter frequencies inherent in the language. For example,
since E is the most frequent letter in English text, the cryptanalyst
could get a good start on reading the message by looking for the most frequent
letter in the ciphertext and assuming that it is to be replaced by E. While
this might not be the right choice, he certainly is better off than if he had to
try all 26 letters. He can do even better by looking at two-letter combinations
("digrams"): certain digrams (such as QJ) never occur in English text while
others (such as ER) are very common. By examining frequencies of letters
and combinations of letters, a cryptanalyst can very easily break a simple
substitution cipher.
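The first step of such an attack is easy to mechanize; a sketch of a program to
tabulate the letter frequencies in a ciphertext follows (characters other than
the upper-case letters are simply ignored):

program freq(input, output);
var c: char;
    count: array ['A'..'Z'] of integer;
begin
for c:='A' to 'Z' do count[c]:=0;
while not eof do
  begin
  read(c);
  if (c>='A') and (c<='Z') then count[c]:=count[c]+1
  end;
for c:='A' to 'Z' do writeln(c, count[c]:5)
end.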
One way to make this type of attack more difficult is to use more than
one table. A simple example of this is an extension of the Caesar cipher called
the Vigenere cipher: a small repeated key is used to determine the value of K
for each letter. At each step, the key letter index is added to the plaintext
letter index to determine the ciphertext letter index. Our sample plaintext,
with the key ABC, is encrypted as follows:
Key: ABCABCABCABCAB
Plaintext: ATTACK AT DAWN
Ciphertext: BVWBENACWAFDXP
For example, the last letter of the ciphertext is P, the 16th letter of the
alphabet, because the corresponding plaintext letter is N (the 14th letter),
and the corresponding key letter is B (the 2nd letter).
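A program for the Vigenere cipher might be sketched in the same way as for
the Caesar cipher; here the key ABC is built in, the same 27-character alphabet
is used, and the input is again assumed to contain only blanks and upper-case
letters.

program vigenere(input, output);
const keylen=3;
var key: array [1..keylen] of char;
    c: char; k: integer;
function ix(c: char): integer;   { blank is 0, A is 1, ..., Z is 26 }
begin
if c=' ' then ix:=0 else ix:=ord(c)-ord('A')+1
end;
function ch(i: integer): char;   { the inverse of ix }
begin
if i=0 then ch:=' ' else ch:=chr(i+ord('A')-1)
end;
begin
key[1]:='A'; key[2]:='B'; key[3]:='C'; k:=0;
while not eof do
  begin
  read(c); k:=k mod keylen + 1;  { cycle through the key }
  write(ch((ix(c)+ix(key[k])) mod 27))
  end;
writeln
end.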
The Vigenere cipher can obviously be made more complicated by using
different general tables for each letter of the plaintext (rather than simple
offsets). Also, it is obvious that the longer the key, the better. In fact, if the
key is as long as the plaintext, we have the Vernam cipher, more commonly
called the one-time pad. This is the only provably secure cryptosystem known,
and it is reportedly used for the Washington-Moscow hotline and other vital
applications. Since each key letter is used only once, the cryptanalyst can
do no better than try every possible key letter for every message position,
an obviously hopeless situation since this is as difficult as trying all possible
messages. However, using each key letter only once obviously leads to a severe
key distribution problem, and the one-time pad is only useful for relatively
short messages which are to be sent infrequently.
If the message and key are encoded in binary, a more common scheme
for position-by-position encryption is to use the "exclusive-or" function: to
encrypt the plaintext, "exclusive-or" it (bit by bit) with the key. An attractive
feature of this method is that decryption is the same operation as encryption:
the ciphertext is the exclusive-or of the plaintext and the key, but doing
another exclusive-or of the ciphertext and the key returns the plaintext.
Notice that the exclusive-or of the ciphertext and the plaintext is the key.
This seems surprising at first, but actually many cryptographic systems have
the property that the cryptanalyst can discover the key if he knows the
plaintext.
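In a program, this position-by-position encryption is a one-line loop. The
sketch below assumes that the message and the key are stored as arrays of bits
(0's and 1's); the same procedure serves for both encryption and decryption.

const maxM=1000;                 { an assumed maximum message length }
type bitstring=array [1..maxM] of integer;
procedure crypt(var text, key: bitstring; M: integer);
var i: integer;
begin
for i:=1 to M do
  text[i]:=(text[i]+key[i]) mod 2   { the "exclusive-or" of two bits }
end;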
Encryption/Decryption Machines
Many cryptographic applications (for example, voice systems for military
communications) involve the transmission of large amounts of data, and this
makes the one-time pad infeasible. What is needed is an approximation to
the one-time pad in which a large amount of "pseudo-key" can be generated
from a small amount of true key to be distributed.
The usual setup in such situations is as follows: an encryption machine
is fed some cryptovariables (true key) by the sender, which it uses to generate
a long stream of key bits (pseudo-key). The exclusive-or of these bits and
the plaintext forms the ciphertext. The receiver, having a similar machine
and the same cryptovariables, uses them to generate the same key stream to
exclusive-or against the ciphertext and to retrieve the plaintext.
Key generation in this context is obviously very much like random number
generation, and our random number generation methods are appropriate for
key generation (the cryptovariables are the initial seeds of the random number
generator). In fact, the linear feedback shift registers that we discussed in
Chapter 3 were first developed for use in encryption/decryption machines
such as described here. However, key generators have to be somewhat more
complicated than random number generators, because there are easy ways to
attack simple linear feedback shift registers. The problem is that it might be
easy for the cryptanalyst to get some plaintext (for example, silence in a voice
system), and therefore some key. If the cryptanalyst can get enough key that
he has the entire contents of the shift register, then he can get all the key
from that point on.
Cryptographers have several ways to avoid such problems. One way is to
make the feedback function itself a cryptovariable. It is usually assumed that
the cryptanalyst knows everything about the structure of the machine (maybe
he stole one) except the cryptovariables, but if some of the cryptovariables are
used to "configure" the machine, he may have difficulty finding their values.
Another method commonly used to confuse the cryptanalyst is the product
cipher, where two different machines are combined to produce a complicated
key stream (or to drive each other). Another method is nonlinear substitution;
here the translation between plaintext and ciphertext is done in large chunks,
not bit-by-bit. The general problem with such complex methods is that they
can be too complicated for even the cryptographer to understand and that
there always is the possibility that things may degenerate badly for some
choices of the cryptovariables.
Public-Key Cryptosystems
In commercial applications such as electronic funds transfer and (real) computer
mail, the key distribution problem is even more onerous than in the
traditional applications of cryptography. The prospect of providing long
keys (which must be changed often) to every citizen, while still maintaining
both security and cost-effectiveness, certainly inhibits the development of
such systems. Methods have recently been developed, however, which promise
to eliminate the key distribution problem completely. Such systems, called
public-key cryptosystems, are likely to come into widespread use in the near
future. One of the most prominent of these systems is based on some of the
arithmetic algorithms that we have been studying, so we will take a close look
at how it works.
The idea in public-key cryptosystems is to use a "phone book" of encryption
keys. Everyone's encryption key (denoted by P) is public knowledge: a
person's key could be listed, for example, next to his number in the telephone
book. Everyone also has a secret key used for decryption; this secret key
(denoted by S) is not known to anyone else. To transmit a message M, the
sender looks up the receiver's public key, uses it to encrypt the message, and
then transmits the message. We'll denote the encrypted message (ciphertext)
by C=P(M). The receiver uses his private decryption key to decrypt and read
the message. For this system to work we must have at least the following
properties:
(i) S(P(M))=M for every message M.
(ii) All (S,P) pairs are distinct.
(iii) Deriving S from P is as hard as reading M.
(iv) Both S and P are easy to compute.
The first of these is a fundamental cryptographic property, the second two
provide the security, and the fourth makes the system feasible for use.
This general scheme was outlined by W. Diffie and M. Hellman in 1976,
but they had no method which satisfied all of these properties. Such a
method was discovered soon afterwards by R. Rivest, A. Shamir, and L.
Adleman. Their scheme, which has come to be known as the RSA public-key
cryptosystem, is based on arithmetic algorithms performed on very large
integers. The encryption key P is the integer pair (N,p) and the decryption
key S is the integer pair (N,s), where s is kept secret. These numbers are
intended to be very large (typically, N might be 200 digits and p and s might
be 100 digits). The encryption and decryption methods are then simple: first
the message is broken up into numbers less than N (for example, by taking
lg N bits at a time from the binary string corresponding to the character
encoding of the message). Then these numbers are independently raised to a
power modulo N: to encrypt a (piece of a) message M, compute C = P(M) =
M^p mod N, and to decrypt a ciphertext C, compute M = S(C) = C^s mod
N. This computation can be quickly and easily performed by modifying the
elementary exponentiation algorithm that we studied in Chapter 4 to take
the remainder when divided by N after each multiplication. (No more than
2 log N such operations are required for each piece of the message, so the total
number of operations (on 100 digit numbers!) required is linear in the number
of bits in the message.)
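The heart of the computation might be sketched as follows, with ordinary
integers standing in for the multiple-precision numbers that a real implementation
would require; the remainder is taken after each multiplication, exactly as
described above.

function power(M, p, N: integer): integer;   { computes M^p mod N }
var t: integer;
begin
if p=0 then power:=1
else
  begin
  t:=power(M, p div 2, N);
  t:=(t*t) mod N;
  if odd(p) then t:=(t*M) mod N;
  power:=t
  end
end;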
Property (iv) above is therefore satisfied, and property (ii) can be easily
enforced. We still must make sure that the cryptovariables N, p, and s can be
chosen so as to satisfy properties (i) and (iii). To be convinced of these requires
an exposition of number theory which is beyond the scope of this book, but
we can outline the main ideas. First, it is necessary to generate three large
(approximately 100-digit) "random" prime numbers: the largest will be s and
we'll call the other two x and y. Then N is chosen to be the product of x
and y, and p is chosen so that ps mod (x-1)(y-1) = 1. It is possible to prove
that, with N, p, and s chosen in this way, we have M^(ps) mod N = M for all
messages M.
More specifically, each large prime can be generated by generating a large
random number, then testing successive numbers starting at that point until
a prime is found. One simple method performs a calculation on a random
number that, with probability 1/2, will "prove" that the number to be tested
is not prime. (A number which is not prime will survive 20 applications of
this test less than one time out of a million, 30 applications less than 1 time
out of a billion.) The last step is to compute p: it turns out that a variant of
Euclid's algorithm (see Chapter 1) is just what is needed.
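That computation might be sketched as follows, again with ordinary integers
standing in for multiple-precision numbers: given s and m = (x-1)(y-1) with
no common factor, it finds the p for which ps mod m = 1.

function inverse(s, m: integer): integer;
var a, b, q, t, u, v: integer;
begin
a:=m; b:=s; u:=0; v:=1;          { invariant: a=u*s and b=v*s (mod m) }
while b<>0 do
  begin
  q:=a div b;
  t:=a-q*b; a:=b; b:=t;
  t:=u-q*v; u:=v; v:=t
  end;
if u<0 then u:=u+m;              { now a=1 and u*s mod m = 1 }
inverse:=u
end;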
Furthermore, s seems to be difficult to compute from knowledge of p (and
N), though no one has been able to prove that to be the case. Apparently,
finding s from p requires knowledge of x and y, and apparently it is necessary
to factor N to calculate x and y. But factoring N is thought to be very
difficult: the best factoring algorithms known would take millions of years to
factor a 200-digit number, using current technology.
An attractive feature of the RSA system is that the complicated computations
involving N, p, and s are performed only once for each user who
subscribes to the system, while the much more frequent operations of encryption
and decryption involve only breaking up the message and applying the
simple exponentiation procedure. This computational simplicity, combined
with all the convenience features provided by public-key cryptosystems, make
this system quite attractive for secure communications, especially on computer
systems and networks.
The RSA method has its drawbacks: the exponentiation procedure is actually
expensive by cryptographic standards, and, worse, there is the lingering
possibility that it might be possible to read messages encrypted using the
method. This is true with many cryptosystems: a cryptographic method must
withstand serious cryptanalytic attacks before it can be used with confidence.
Several other methods have been suggested for implementing public-key
cryptosystems. Some of the most interesting are linked to an important class
of problems which are generally thought to be very hard (though this is not
known for sure), which we'll discuss in Chapter 40. These cryptosystems
have the interesting property that a successful attack could provide insight on
how to solve some well-known difficult unsolved problems (as with factoring
for the RSA method). This link between cryptology and fundamental topics
in computer science research, along with the potential for widespread use of
public-key cryptography, have made this an active area of current research.
Exercises
1. Decrypt the following message, which was encrypted with a Vigenere
cipher using the pattern CAB (repeated as necessary) for the key (on a
27-letter alphabet, with blank preceding A): DOBHBUAASXFZWJQQ
2. What table should be used to decrypt messages that have been encrypted
using the table substitution method?
3. Suppose that a Vigenere cipher with a two-character key is used to encrypt
a relatively long message. Write a program to infer the key, based on the
assumption that the frequency of occurrence of each character in odd
positions should be roughly equal to the frequency of occurrence of each
character in the even positions.
4. Write matching encryption and decryption procedures which use the
"exclusive or" operation between a binary version of the message with
a binary stream from one of the linear congruential random number
generators of Chapter 3.
5. Write a program to "break" the method given in the previous exercise,
assuming that the first 10 characters of the message are known to be
blanks.
6. Could one encrypt plaintext by "and"ing it (bit by bit) with the key?
Explain why or why not.
7. True or false: Public-key cryptography makes it convenient to send the
same message to several different users. Discuss your answer.
8. What is P(S(M)) for the RSA method for public-key cryptography?
9. RSA encoding might involve computing M^n, where M might be a k-digit
number, represented in an array of k integers, say. About how many
operations would be required for this computation?
10. Implement encryption/decryption procedures for the RSA method (assume
that s, p and N are all given and represented in arrays of integers
of size 25).
SOURCES for String Processing
The best references for further information on many of the algorithms in
this section are the original sources. Knuth, Morris, and Pratt's 1977 paper
and Boyer and Moore's 1977 paper form the basis for much of the material
from Chapter 19. The 1968 paper by Thompson is the basis for the regular-expression
pattern matcher of Chapters 20-21. Huffman's 1952 paper, though
it predates many of the algorithmic considerations here, still makes interesting
reading. Rivest, Shamir, and Adleman describe fully the implementation and
applications of their public-key cryptosystem in their 1978 paper.
The book by Standish is a good general reference for many of the topics
covered in these chapters, especially Chapters 19, 22, and 23. Parsing and
compiling are viewed by many to be the heart of computer science, and there
are a large number of standard references available, for example the book
by Aho and Ullman. An extensive amount of background information on
cryptography may be found in the book by Kahn.
A. V. Aho and J. D. Ullman, Principles of Compiler Design, Addison-Wesley,
Reading, MA, 1977.
R. S. Boyer and J. S. Moore, "A fast string searching algorithm," Communications
of the ACM, 20, 10 (October, 1977).
D. A. Huffman, "A method for the construction of minimum-redundancy
codes," Proceedings of the IRE, 40 (1952).
D. Kahn, The Codebreakers, Macmillan, New York, 1967.
D. E. Knuth, J. H. Morris, and V. R. Pratt, "Fast pattern matching in strings,"
SIAM Journal on Computing, 6, 2 (June, 1977).
R. L. Rivest, A. Shamir and L. Adleman, "A method for obtaining digital
signatures and public-key cryptosystems," Communications of the ACM, 21,
2 (February, 1978).
T. A. Standish, Data Structure Techniques, Addison-Wesley, Reading, MA,
1980.
K. Thompson, "Regular expression search algorithm," Communications of the
ACM, 11, 6 (June, 1968).
GEOMETRIC ALGORITHMS
24. Elementary Geometric Methods
Computers are being used more and more to solve large-scale problems
which are inherently geometric. Geometric objects such as points, lines
and polygons are the basis of a broad variety of important applications and
give rise to an interesting set of problems and algorithms.
Geometric algorithms are important in design and analysis systems for
physical objects ranging from buildings and automobiles to very large-scale
integrated circuits. A designer working with a physical object has a geometric
intuition which is difficult to support in a computer representation. Many
other applications directly involve processing geometric data. For example,
a political "gerrymandering" scheme to divide a district up into areas which
have equal population (and which satisfy other criteria such as putting most
of the members of the other party in one area) is a sophisticated geometric
algorithm. Other applications abound in mathematics and statistics, where
many types of problems arise which can be naturally set in a geometric
representation.
Most of the algorithms that we've studied have involved text and numbers,
which are represented and processed naturally in most programming
environments. Indeed, the primitive operations required are implemented in
the hardware of most computer systems. For geometric problems, we'll see
that the situation is different: even the most elementary operations on points
and lines can be computationally challenging.
Geometric problems are easy to visualize, but that can be a liability.
Many problems which can be solved instantly by a person looking at a piece
of paper (example: is a given point inside a given polygon?) require nontrivial
computer programs. For more complicated problems, as in many other
applications, the method of solution appropriate for implementation on a
computer might be quite different from the method of solution appropriate
for a person.
One might suspect that geometric algorithms would have a long history
because of the constructive nature of ancient geometry and because useful
applications are so widespread, but actually much of the work in the field
has been quite recent. Of course, it is often the case that the work of ancient
mathematicians has useful application in the development of algorithms
for modern computers. The field of geometric algorithms is interesting to
study because there is strong historical context, because new fundamental
algorithms are still being developed, and because many important large-scale
applications require these algorithms.
Points, Lines, and Polygons
Most of the programs that we'll study will operate on simple geometric objects
defined in a two-dimensional space. (But we will consider a few algorithms
which work in higher dimensions.) The fundamental object is a point, which
we'll consider to be a pair of integers - the "coordinates" of the point in
the usual Cartesian system. Considering only points with integer coordinates
leads to slightly simpler and more efficient algorithms, and is not as severe a
restriction as it might seem. A line is a pair of points, which we assume are
connected together by a straight line segment. A polygon is a list of points: we
assume that successive points are connected by lines and that the first point
is connected to the last to make a closed figure.
To work with these geometric objects, we need to decide how to
represent them. Most of our programs will use the obvious representations
type point = record x, y: integer end;
     line = record p1, p2: point end;
Note that points are restricted to have integer coordinates. A real representation
could also be used. However, it is important to note that restricting
the algorithms to process only integers can be a very significant timesaver
in many computing environments, because integer calculations are typically
much more efficient than "floating-point" calculations. Thus, when we can get
by with dealing only with integers without introducing much extra complication,
we will do so.
More complicated geometric objects will be represented in terms of these
basic components. For example, polygons will be represented as arrays of
points. Note that using arrays of lines would result in each point on the
polygon being included twice (though that still might be the natural representation
for some algorithms). Also, it is useful for some applications to
include extra information associated with each point or line. This can clearly
be handled by adding an info field to the records.
We'll use the following set of sixteen points to illustrate the operation of
several geometric algorithms:
(Figure: the sixteen sample points A through P plotted in the plane.)
The points are labeled with single letters for reference in explaining the
examples. The programs usually have no reason to refer to points by "name";
they are simply stored in an array and are referred to by index. Of course,
the order in which the points appear in the array may be important in some
of the programs: indeed, it is the goal of some geometric algorithms to "sort"
the points into some particular order. The labels that we use are assigned
in the order in which the points are assumed to appear in the input. These
points have the following integer coordinates:
A B C D E F G H I J K L M N O P
x: 3 11 6 4 5 8 1 7 9 14 10 16 15 13 2 12
y: 9 1 8 3 15 11 6 4 7 5 13 14 2 16 12 10
A typical program will maintain an array p[1..N] of points and simply
read in N pairs of integers, assigning the first pair to the x and y coordinates
of p[1], the second pair to the x and y coordinates of p[2], etc. When p
is representing a polygon, it is sometimes convenient to maintain "sentinel"
values p[0]=p[N] and p[N+1]=p[1].
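For concreteness, the declarations assumed throughout the chapter might look as follows (a sketch, not the text's code; the value of maxN and the input format are arbitrary choices):

    const maxN=1000;
    var p: array[0..maxN+1] of point;   { room for sentinels p[0] and p[N+1] }
        N: integer;

    procedure readpoints;   { read N, then N coordinate pairs }
      var i: integer;
      begin
      readln(N);
      for i:=1 to N do readln(p[i].x, p[i].y)
      end;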
At some point, we usually want to "draw" our geometric objects. Even
if this is not inherent in the application, it is certainly more pleasant to work
with pictures than numbers when developing or debugging a new implementation.
It is clear that the fundamental operation here is drawing a line. Many
types of hardware devices have this capability, but it is reasonable to consider
drawing lines using just characters, so that we can print approximations to
pictures of our objects directly as output from Pascal programs (for example).
Our restriction to integer point coordinates helps us here: given a line, we
simply want a collection of points that approximates the line. The following
recursive program draws a line by drawing the endpoints, then splitting the
line in half and drawing the two halves.
procedure draw(l: line);
  var dx, dy: integer;
      t: point; l1, l2: line;
  begin
  dot(l.p1.x, l.p1.y); dot(l.p2.x, l.p2.y);
  dx:=l.p2.x-l.p1.x; dy:=l.p2.y-l.p1.y;
  if (abs(dx)>1) or (abs(dy)>1) then
    begin
    t.x:=l.p1.x+dx div 2; t.y:=l.p1.y+dy div 2;
    l1.p1:=l.p1; l1.p2:=t; draw(l1);
    l2.p1:=t; l2.p2:=l.p2; draw(l2);
    end;
  end;
The procedure dot is assumed to "draw" a single point. One way to
implement this is to maintain a two-dimensional array of characters with one
character per point allowed, initialized to ".". The dot simply corresponds to
storing a different character (say "*") in the array position corresponding to
the referenced point. The picture is "drawn" by printing out the whole array.
For example, when the above procedure is used with such a dot procedure
with a 33x33 picture array to "draw" the lines connecting our sample points
BG, CD, EO, IL, and PK at a resolution of two characters per unit of measure,
we get the following picture:
.................................
.................................
........ ..* ......................
....... ..* .......................
...... ..* .................... ...*
..... ..*.......................*.
.... ..*.............*.........* ..
... ..*...............*.......* ...
.. ..*................*......* ....
.................... ..*....* .....
..................... ..*..* ......
..................... ..*.* .......
...................... ..* ........
..................... ..* .........
.................... ..* ..........
................... ..* ...........
.......... ..*.......* ............
.......... ..*......* .............
......... ..*......* ..............
......... ..* .....................
..**.......* .....................
... .***...* ......................
..... ..**.* ......................
....... ..* .......................
....... ..*** .....................
....... ..*..** ...................
...... ..*.....*** ................
............... ..** ..............
................. ..* .............
.................. ..** ...........
.................... ..* ..........
.................................
.................................
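One possible dot procedure along these lines is sketched below (an illustration, not the text's code); the array size, the background character ".", and the printing routine are assumptions, and any scaling (such as the two characters per unit of measure used above) is left to the caller.

    const size=32;
    var pic: array[0..size, 0..size] of char;   { a 33-by-33 picture }

    procedure picinit;   { fill the picture with the background character }
      var i, j: integer;
      begin
      for i:=0 to size do
        for j:=0 to size do pic[i, j]:='.'
      end;

    procedure dot(x, y: integer);   { mark the point (x,y) }
      begin
      pic[x, y]:='*'
      end;

    procedure picprint;   { print the picture, largest y coordinate first }
      var i, j: integer;
      begin
      for j:=size downto 0 do
        begin
        for i:=0 to size do write(pic[i, j]);
        writeln
        end
      end;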
Algorithms for converting geometric objects to points in this manner are called
scan-conversion algorithms. This example illustrates that it is easy to draw
nice-looking diagonal lines like EO and IL, but that it is somewhat harder to
make lines of arbitrary slope look nice using a coarse matrix of characters. The
recursive method given above has the disadvantages that it is not particularly
efficient (some points get plotted several times) and that it doesn't draw
certain lines very well (for example lines which are nearly horizontal and
nearly vertical). It has the advantages that it is relatively simple and that
it handles all the cases of the various orientations of the endpoints of the line
in a uniform way. Many sophisticated scan-conversion algorithms have been
developed which are more efficient and more accurate than this recursive one.
If the array has a very large number of dots, then the ragged edges of
the lines aren't discernible, but the same types of algorithms are appropriate.
However, the very high resolution necessary to make high-quality lines can
require very large amounts of memory (or computer time), so sophisticated
algorithms are called for, or other technologies might be appropriate. For
example, the text of this book was printed on a device capable of printing
millions of dots per square inch, but most of the lines in the figures were drawn
with pen and ink.
Line Intersection
As our first example of an elementary geometric problem, we'll consider the
problem of determining whether or not two given line segments intersect. The
following diagram illustrates some of the situations that can arise.
(Figure: the sample line segments BG, CD, OE, IL, and KP, illustrating the cases described below.)
When line segments actually intersect, as do CD and BG, the situation is quite
straightforward except that the endpoint of one line might fall on the other, as
in the case of KP and IL. When they don't intersect, then we extend the line
segments and consider the position of the intersection point of these extended
lines. This point could fall on one of the segments, as in the case of IL and
BG, or it could lie on neither of the segments, as in the case of CD and OE, or
the lines could be parallel, as in the case of OE and IL. The straightforward
way to solve this problem is to find the intersection point of the lines defined
by the line segments, then check whether this intersection point falls between
the endpoints of both of the segments. Another easy method uses a tool that
we'll find useful later, so we'll consider it in more detail: Given a line and
two points, we're often interested in whether the two points fall on the same
side of the line or not. This function is straightforward to compute from the
equations for the lines as follows:
function same(l: line; p1, p2: point): integer;
  var dx, dy, dx1, dx2, dy1, dy2: integer;
  begin
  dx:=l.p2.x-l.p1.x; dy:=l.p2.y-l.p1.y;
  dx1:=p1.x-l.p1.x; dy1:=p1.y-l.p1.y;
  dx2:=p2.x-l.p2.x; dy2:=p2.y-l.p2.y;
  same:=(dx*dy1-dy*dx1)*(dx*dy2-dy*dx2)
  end;
In terms of the variables in this program, it is easy to check that the quantity
(dx*dy1 - dy*dx1) is 0 if p1 is on the line, positive if p1 is on one side, and
negative if it is on the other side. The same holds true for the other point, so
the product of the quantities for the two points is positive if and only if the
points fall on the same side of the line, negative if and only if the points fall
on different sides of the line, and 0 if and only if one or both points fall on
the line. We'll see that different algorithms need to treat points which fall on
lines in different ways, so this three-way test is quite useful.
This immediately gives an implementation of the intersect function. If
the two endpoints of each line segment fall on different sides of (or on) the
other line, then the two segments must intersect.
function intersect(l1, l2: line): boolean;
  begin
  intersect:=(same(l1, l2.p1, l2.p2)<=0)
         and (same(l2, l1.p1, l1.p2)<=0)
  end;
Unfortunately, there is one case where this function returns the wrong answer:
if the four line endpoints are collinear, it always will report intersection, even
though the lines may be widely separated. Special cases of this type are the
bane of geometric algorithms. The reader may gain some appreciation for the
kind of complications such cases can lead to by finding a clean way to repair
intersect and same to handle all cases.
If many lines are involved, the situation becomes much more complicated.
Later on, we'll see a sophisticated algorithm for determining whether any pair
in a set of N lines intersects.
Simple Closed Path
To get the flavor of problems dealing with sets of points, let's consider the
problem of finding a path through a set of N given points which doesn't
intersect itself, visits all the points, and returns to the point at which it
started. Such a path is called a simple closed path. One can imagine many
applications for this: the points might represent homes and the path the route
that a mailman might take to get to each of the homes without crossing his
path. Or we might simply want a reasonable way to draw the points using a
mechanical plotter. This is an elementary problem because it asks only for
any closed path connecting the points. The problem of finding the best such
path, called the traveling salesman problem, is much, much more difficult.
We'll look at this problem in some detail in the last few chapters of this book.
In the next chapter, we'll consider a related but much easier problem: finding
the shortest path that surrounds a set of N given points. In Chapter 31, we'll
see how to find the best way to "connect" a set of points.
An easy way to solve the elementary problem at hand is the following:
Pick one of the points to serve as an "anchor." Then compute the angle made
by drawing a line from each of the points in the set to the anchor and then
out the positive horizontal direction (this is part of the polar coordinate of
each point with the anchor point as origin). Next, sort the points according
to that angle. Finally, connect adjacent points. The result is a simple closed
path connecting the points, as drawn below:
In this example, B is used as the anchor. If the points are visited in the order
B M J L N P K F I E C O A H G D B then a simple closed polygon will be
traced out.
If dx and dy are the delta x and y distances from some point to the anchor
point, then the angle needed in this algorithm is arctan(dy/dx). Although
the arctangent is a built-in function in Pascal (and some other programming
environments), it is likely to be slow and it leads to at least two annoying extra
conditions to compute: whether dx is zero, and which quadrant the point is
in. Since the angle is used only for the sort in this algorithm, it makes sense
to use a function that is much easier to compute but has the same ordering
properties as the arctangent (so that when we sort, we get the same result).
A good candidate for such a function is simply dy/(dy + dx). Testing for
exceptional conditions is still necessary, but simpler. The following program
returns a number between 0 and 360 that is not the angle made by pl and
p2 with the horizontal but which has the same order properties as the true
angle.
function theta(p1, p2: point): real;
  var dx, dy, ax, ay: integer;
      t: real;
  begin
  dx:=p2.x-p1.x; ax:=abs(dx);
  dy:=p2.y-p1.y; ay:=abs(dy);
  if (dx=0) and (dy=0) then t:=0
  else t:=dy/(ax+ay);
  if dx<0 then t:=2-t
  else if dy<0 then t:=4+t;
  theta:=t*90.0;
  end;
In some programming environments it may not be worthwhile to use such
programs instead of standard trigonometric functions; in others it might lead
to significant savings. (In some cases it might be worthwhile to change theta
to have an integer value, to avoid using real numbers entirely.)
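Putting the pieces together, the whole simple-closed-path computation is just an exchange followed by a sort on theta values. The following sketch (not from the text) uses the lowest point as the anchor and a simple insertion sort, though any sorting method would do:

    procedure simplepath;
      var i, j, min: integer;
          v: real; t: point;
      begin
      min:=1;                              { use the lowest point as the anchor }
      for i:=2 to N do
        if p[i].y<p[min].y then min:=i;
      t:=p[1]; p[1]:=p[min]; p[min]:=t;
      for i:=3 to N do                     { insertion sort p[2..N] on theta }
        begin
        t:=p[i]; v:=theta(p[1], t); j:=i;
        while (j>2) and (theta(p[1], p[j-1])>v) do
          begin p[j]:=p[j-1]; j:=j-1 end;
        p[j]:=t
        end
      end;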
Inclusion in a Polygon
The next problem that we'll consider is a natural one: given a polygon represented
as an array of points and another point, determine whether the point
is inside or outside. A straightforward solution to this problem immediately
suggests itself: draw a long line segment from the point in any direction (long
enough so that its other endpoint is guaranteed to be outside the polygon) and
count the number of lines from the polygon that it crosses. If the number is
odd, the point must be inside; if it is even, the point is outside. This is easily
seen by tracing what happens as we come in from the endpoint on the outside:
after the first line we hit, we are inside, after the second we are outside, etc.
If we proceed an even number of times, the point at which we end up (the
original point) must be outside.
The situation is not quite so simple, because some intersections might
occur right at the vertices of the input polygon. The drawing below shows
some of the situations that need to be handled.
Lines 1 and 2 are straightforward; line 3 leaves the polygon at a vertex; and
line 4 coincides with one of the edges of the polygon. The point where line
3 exits should count as 1 intersection with the polygon; all the other points
where lines intersect vertices should count as 0 (or 2). The reader may be
amused to try to find a simple test to distinguish these cases before reading
further.
The need to handle cases where polygon vertices fall on the test lines
forces us to do more than just count the line segments in the polygon which
intersect the test line. Essentially, we want to travel around the polygon,
incrementing an intersection counter whenever we go from one side of the test
line to another. One way to implement this is to simply ignore points which
fall on the test line, as in the following program:
function inside(t: point): boolean;
  var count, i, j: integer;
      lt, lp: line;
  begin
  count:=0; j:=0;
  p[0]:=p[N]; p[N+1]:=p[1];
  lt.p1:=t; lt.p2:=t; lt.p1.x:=maxint;
  for i:=1 to N do
    begin
    lp.p1:=p[i]; lp.p2:=p[i];
    if not intersect(lp, lt) then
      begin
      lp.p2:=p[j]; j:=i;
      if intersect(lp, lt) then count:=count+1;
      end;
    end;
  inside:=((count mod 2)=1);
  end;
This program uses a horizontal test line for ease of calculation (imagine the
above diagram as rotated 45 degrees). The variable j is maintained as the
index of the last point on the polygon known not to lie on the test line. The
program assumes that p[1] is the point with the smallest x coordinate among
all the points with the smallest y coordinate, so that if p[1] is on the test
line, then p[0] cannot be. For example, this choice might be used for p[1] as
the "anchor" for the procedure suggested above for computing a simple closed
polygon. The same polygon can be represented by N different p arrays, but
as this illustrates it is sometimes convenient to fix a standard rule for p[1]. If
the next point on the polygon which is not on the test line is on the same side
of the test line as the jth point, then we need not increment the intersection
counter (count); otherwise we have an intersection. The reader may wish to
check that this algorithm works properly for lines like lines 3 and 4 in the
diagram above.
If the polygon has only three or four sides, as is true in many applications,
then such a complex program is not called for: a simpler procedure based on
calls to same will be adequate.
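For example, a triangle test needs only three calls to same: the point must lie on the same side of each edge as the opposite vertex. The following sketch (an illustration, not the text's code) counts points falling on an edge as inside:

    function intriangle(t, a, b, c: point): boolean;
      { true if t lies inside (or on) the triangle with vertices a, b, c }
      var l: line; ok: boolean;
      begin
      ok:=true;
      l.p1:=a; l.p2:=b; if same(l, t, c)<0 then ok:=false;
      l.p1:=b; l.p2:=c; if same(l, t, a)<0 then ok:=false;
      l.p1:=c; l.p2:=a; if same(l, t, b)<0 then ok:=false;
      intriangle:=ok
      end;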
Perspective.
From the few examples given, it should be clear that it is easy to underestimate
the difficulty of solving a particular geometric problem with a computer.
There are many other elementary geometric computations that we have not
treated at all. For example, a program to compute the area of a polygon
makes an interesting exercise. However, the problems that we have studied
have provided some basic tools that we will find useful in later sections for
solving the more difficult problems.
Some of the algorithms that we'll study involve building geometric structures
from a given set of points. The "simple closed polygon" is an elementary
example of this. We will need to decide upon appropriate representations
for such structures, develop algorithms to build them, and investigate their
use for particular applications areas. As usual, these considerations are intertwined.
For example, the algorithm used in the inside procedure in this
chapter depends in an essential way on the representation of the simple closed
polygon as an ordered set of points (rather than as an unordered set of lines).
Many of the algorithms that we'll study involve geometric search: we
want to know which points from a given set are close to a given point, or which
points fall in a given rectangle, or which points are closest to each other. Many
of the algorithms appropriate for such search problems are closely related to
the search algorithms that we studied in Chapters 14-17. The parallels will
be quite evident.
Few geometric algorithms have been analyzed to the point where precise
statements can be made about their relative performance characteristics. As
we've already seen, the running time of a geometric algorithm can depend on
many things. The distribution of the points themselves, the order in which
they appear in the input, and whether trigonometric functions are needed
or used can all significantly affect the running time of geometric algorithms.
As usual in such situations, we do have empirical evidence which suggests
good algorithms for particular applications. Also, many of the algorithms are
designed to perform well in the worst case, no matter what the input is.
Exercises
1. List the points plotted by draw when plotting a line from (0,0) to (1,21).
2. Give a quick algorithm for determining whether two line segments are
parallel, without using any divisions.
3. Given an array of lines, how would you test to see if they form a simple
closed polygon?
4. Draw the simple closed polygons that result from using A, C, and D as
"anchors" in the method described in the text.
5. Suppose that we use an arbitrary point for the "anchor" in the method for
computing a simple closed polygon described in the text. Give conditions
which such a point must satisfy for the method to work.
6. What does the intersect function return when called with two copies of
the same line segment?
7. Write a program like draw to "fill in" an arbitrary triangle. (Your program
should call dot for all the points inside the triangle.)
8. Does inside call a vertex of the polygon inside or outside?
9. What is the maximum value achievable by count when inside is executed
on a polygon with N vertices? Give an example supporting your answer.
10. Write an efficient program for determining if a given point is inside a
given quadrilateral.
25. Finding the Convex Hull
Often, when we have a large number of points to process, we're interested
in the boundaries of the point set. When looking at a diagram of
a set of points plotted in the plane, a human has little trouble distinguishing
those on the "inside" of the point set from those which lie on the edge. This
distinction is a fundamental property of point sets; in this chapter we'll see
how it can be precisely characterized by looking at algorithms for separating
out the "boundary" points of a point set.
The mathematical notion of the natural boundary of a point set depends
on a geometric property called convexity. This is a simple concept that the
reader may have encountered before: a convex polygon is a polygon with the
property that any line connecting any two points inside the polygon must
itself lie inside the polygon. For example, the "simple closed polygon" that
we computed in the previous chapter is decidedly nonconvex, but any triangle
or rectangle is convex.
Now, the mathematical name for the natural boundary of a point set is
the convex hull. The convex hull of a set of points in the plane is defined to
be the smallest convex polygon containing them all. Equivalently, the convex
hull is the shortest path which surrounds the points. An obvious property of
the convex hull that is easy to prove is that the vertices of the convex polygon
defining the hull are points from the original point set. Given N points, some
of them form a convex polygon within which all the others are contained. The
problem is to find those points. Many algorithms have been developed to find
the convex hull; in this chapter we'll examine a few representative ones.
For a large number N of points, the convex hull could contain as few as
3 points (if three points form a large triangle containing all the others) or as
many as N points (if all the points fall on a convex polygon, then they all
comprise their own convex hull). Some algorithms work well when there are
many points on the convex hull; others work better when there are only a few.
Below is diagramed our sample set of points and their convex hull.
A fundamental property of the convex hull is that any line outside the hull,
when moved in any direction towards the hull, hits the hull at one of its vertex
points. (This is an alternate way to define the hull: it is the subset of points
from the point set that could be hit by a line moving in at some angle from
infinity.) In particular, it's easy to find a few points that are guaranteed to be
on the hull by applying this rule for horizontal and vertical lines: the points
with the smallest and largest x and y coordinates are all on the convex hull.
The convex hull naturally defines the "boundaries" of a point set and is
a fundamental geometric computation. It plays an important role in many
statistical computations, especially when generalized to higher dimensions.
Rules of the Game
The input to an algorithm for finding the convex hull will obviously be an
array of points; we can use the point type defined in the previous chapter.
The output will be a polygon, also represented as an array of points with the
property that tracing through the points in the order in which they appear in
the array traces the outline of the polygon. On reflection, this might appear
to require an extra ordering condition on the computation of the convex hull
(why not just return the points on the hull in any order?), but output in
the ordered form is obviously more useful, and it has been shown that the
unordered computation is no easier to do. For all of the algorithms that we
consider, it is convenient to do the computation in place: the array used for
the original point set is also used to hold the output. The algorithms simply
rearrange the points in the original array so that the convex hull appears in
the first M positions, in order.
From the description above, it may be clear that computing the convex
hull is closely related to sorting. In fact, a convex hull algorithm can be used
to sort, in the following way. Given N numbers to sort, turn them into points
(in polar coordinates) by treating the numbers as angles (suitably normalized)
with a fixed radius for each point. The convex hull of this point set is an
N-gon containing all of the points. Now, since the output must be ordered
in the order the points appear on this polygon, it can be used to find the
sorted order of the original values (remember that the input was unordered).
This is not a formal proof that computing the convex hull is no easier than
sorting, because, for example, the cost of the trigonometric functions required
to convert the original numbers to be sorted into points on the polygon must be
considered. Comparing convex hull algorithms (which involve trigonometric
operations) to sorting algorithms (which involve comparisons between keys)
is a bit like comparing apples to oranges, but it has been shown that any
convex hull algorithm must require about N log N operations, the same as
sorting (even though the operations allowed are likely to be quite different).
It is helpful to view finding the convex hull of a set of points as a kind of
"two-dimensional sort" because frequent parallels to sorting algorithms arise
in the study of algorithms for finding the convex hull.
In fact, the algorithms that we'll study show that finding the convex hull
is no harder than sorting either: there are several algorithms that run in time
proportional to N log N in the worst case. Many of the algorithms tend to
use even less time on actual point sets, because their running time depends
on the way that the points are distributed and on the number of points on
the hull.
We'll look at three quite different methods for finding the convex hull of
a set of points and then discuss their relative running times.
Package Wrapping
The most natural convex hull algorithm, which parallels the method a human
would use to draw the convex hull of a set of points, is a systematic way to
"wrap up" the set of points. Starting with some point guaranteed to be on the
convex hull (say the one with the smallest y coordinate), take a horizontal ray
in the positive direction and "sweep" it upward until hitting another point; this
point must be on the hull. Then anchor at that point and continue "sweeping"
until hitting another point, etc., until the "package" is fully "wrapped" (the
beginning point is included again). The following diagram shows how the hull
is discovered in this way.
Of course, we don't actually sweep through all possible angles, we just do a
standard find-the-minimum computation to find the point that would be hit
next. This method is easily implemented by using the function theta(pl, p2:
point) developed in the previous chapter, which can be thought of as returning
the angle between pl, p2 and the horizontal (though it actually returns a more
easily computed number with the same ordering properties). The following
program finds the convex hull of an array p[1..N] of points, represented as
described in the previous chapter (the array position p[N+1] is also used, to
hold a sentinel):
function wrap: integer;
  var i, min, M: integer;
      minangle, v: real;
      t: point;
  begin
  min:=1;
  for i:=2 to N do
    if p[i].y<p[min].y then min:=i;
  M:=0; p[N+1]:=p[min]; minangle:=0.0;
  repeat
    M:=M+1; t:=p[M]; p[M]:=p[min]; p[min]:=t;
    min:=N+1; v:=minangle; minangle:=360.0;
    for i:=M+1 to N+1 do
      if theta(p[M], p[i])>v then
        if theta(p[M], p[i])<minangle then
          begin min:=i; minangle:=theta(p[M], p[min]) end;
  until min=N+1;
  wrap:=M;
  end;
First, the point with the lowest y coordinate is found and copied into p[N+1]
in order to stop the loop, as described below. The variable M is maintained
as the number of points so far included on the hull, and v is the current
value of the "sweep" angle (the angle from the horizontal to the line between
p[M-1] and p[M]). The repeat loop puts the last point found into the
hull by exchanging it with the Mth point, and uses the theta function from
the previous chapter to compute the angle from the horizontal made by the
line between that point and each of the points not yet included on the hull,
searching for the one whose angle is smallest among those with angles bigger
than v. The loop stops when the first point (actually the copy of the first
point that was put into p[N+1]) is encountered again.
This program may or may not return points which fall on a convex hull
edge. This happens when more than one point has the same theta value with
p[M] during the execution of the algorithm; the implementation above takes
the first value. In an application where it is important to find points falling
on convex hull edges, this could be achieved by changing theta to take the
distance between the points given as its arguments into account and give the
closer point a smaller value when two points have the same angle.
The following table traces the operation of this algorithm: the Mth line
of the table gives the value of v and the contents of the p array after the Mth
point has been added to the hull.
  7.50  [B] A  C  D  E  F  G  H  I  J  K  L  M  N  O  P
 18.00   B [M] C  D  E  F  G  H  I  J  K  L  A  N  O  P
 83.08   B  M [L] D  E  F  G  H  I  J  K  C  A  N  O  P
144.00   B  M  L [N] E  F  G  H  I  J  K  C  A  D  O  P
190.00   B  M  L  N [E] F  G  H  I  J  K  C  A  D  O  P
225.00   B  M  L  N  E [O] G  H  I  J  K  C  A  D  F  P
257.14   B  M  L  N  E  O [G] H  I  J  K  C  A  D  F  P
315.00   B  M  L  N  E  O  G [D] I  J  K  C  A  H  F  P
One attractive feature of this method is that it generalizes to three (or
more) dimensions. The convex hull of a set of points in 3-space is a convex
three-dimensional object with flat faces. It can be found by "sweeping" a
plane until the hull is hit, then "folding" faces of the plane, anchoring on
different lines on the boundary of the hull, until the "package" is "wrapped."
The program is quite similar to selection sorting, in that we successively
choose the "best" of the points not yet chosen, using a brute-force search for
the minimum. The major disadvantage of the method is that in the worst case,
when all the points fall on the convex hull, the running time is proportional
to N2.
The Graham Scan
The next method that we'll examine, invented by R. L. Graham in 1972,
is interesting because most of the computation involved is for sorting: the
algorithm includes a sort followed by a relatively inexpensive (though not immediately
obvious) computation. The algorithm starts with the construction
of a simple closed polygon from the points using the method of the previous
chapter: sort the points using as keys the theta function values corresponding
to the angle from the horizontal made from the line connecting each point
with an 'anchor' point p[1] (with the lowest y coordinate) so that tracing
p[1], p[2], ..., p[N], p[1] gives a closed polygon. For our example set of points,
we get the simple closed polygon of the previous section. Note that p[N],
p[1], and p[2] are consecutive points on the hull; we've essentially run the
first iteration of the package wrapping procedure (in both directions).
Computation of the convex hull is completed by proceeding around,
trying to place each point on the hull and eliminating previously placed points
that couldn't possibly be on the hull. For our example, we consider the points
in the order B M J L N P K F I E C O A H G D. The test for which points to
eliminate is not difficult. After each point has been added, we assume that we
have eliminated enough points so that what we have traced out so far could
be part of the convex hull, based on the points so far seen. The algorithm
is based on the fact that all of the points in the point set must be on the
same side of each edge of the convex hull. Each time we consider a point, we
eliminate from the hull any edge which violates this condition.
Specifically, the test for eliminating a point is the following: when we
come to examine a new point p[i], we eliminate p[k] from the hull if the line
between p[k] and p[k-1] goes between p[i] and p[1]. If p[i] and p[1] are
on the same side of the line, then p[k] could still be on the hull, so we don't
eliminate it. The following diagram shows the situation for our example when
L is considered:
(Figure: the partially constructed hull when L is considered, with the extended line JM running between L and B.)
The extended line JM runs between L and B, so J couldn't be on the hull. Now
L, N, and P are added to the hull, then P is eliminated when K is considered
(because the extended line NP goes between B and K), then F and I are added,
leaving the following situation when E is considered.
(Figure: the partially constructed hull when E is considered.)
At this point, I must be eliminated because FI runs between E and B, then F
and K must be eliminated because NKF runs between E and B. Continuing
in this way, we finally arrive back at B, as illustrated below:
(Figure: the completed scan, arriving back at B.)
The dotted lines in the diagrams are all the edges that were included, then
eliminated. The initial sort guarantees that each point is considered as a
possible hull point in turn, because all points considered earlier have a smaller
theta value. Each line that survives the "eliminations" has the property that
every point is on the same side of it as p[1], which implies that it must be on
the hull.
Once the basic method is understood, the implementation is straightforward.
First, the point with the minimum y value is exchanged with p[1].
Next, shellsort (or some other appropriate sorting routine) is used to rearrange
the points, modified as necessary to compare two points using their
theta values with p[1]. Finally, the scan described above is performed. The
following program finds the convex hull of the point set p[1..N] (no sentinel
is needed):
function grahamscan: integer;
  var i, j, min, M: integer;
      l: line; t: point;
  begin
  min:=1;
  for i:=2 to N do
    if p[i].y<p[min].y then min:=i;
  t:=p[1]; p[1]:=p[min]; p[min]:=t;
  shellsort;
  M:=2;
  for i:=4 to N do
    begin
    M:=M+2;
    repeat
      M:=M-1;
      l.p1:=p[M]; l.p2:=p[M-1];
    until same(l, p[1], p[i])>=0;
    t:=p[M+1]; p[M+1]:=p[i]; p[i]:=t;
    end;
  grahamscan:=M;
  end;
The loop maintains a partial hull in p[1], ..., p[M], as described in the text
above. For each new i value considered, M is decremented if necessary to
eliminate points from the partial hull and then p[i] is exchanged with p[M+1]
to (tentatively) add it to the partial hull. The following table shows the
contents of the p array each time a new point is considered for our example:
(Table: the contents of the p array each time a new point p[i] is considered, for i from 4 to N, with the current p[M] and p[i] marked.)
The program as given above could fail if there is more than one point with
the lowest y coordinate, unless theta is modified to properly sort collinear
points, as described above. (This is a subtle point which the reader may
wish to check.) Alternatively, the min computation could be modified to find
the point with the lowest x coordinate among all points with the lowest y
coordinate, the canonical form described in Chapter 24.
One reason that this method is interesting to study is that it is a simple
form of backtracking, the algorithm design technique of "try something, if it
doesn't work then try something else" which we'll see in much more complicated
forms in Chapter 39.
Hull Selection
Almost any convex hull method can be vastly improved by a method developed
independently by W. F. Eddy and R. W. Floyd. The general idea is simple:
pick four points known to be on the hull, then throw out everything inside the
quadrilateral formed by those four points. This leaves many fewer points to
be considered by, say, the Graham scan or the package wrapping technique.
The method could be also applied recursively, though this is usually not worth
the trouble.
The four points known to be on the hull should be chosen with an eye
towards any information available about the input points. In the absence
of any information at all, the simplest four points to use are those with the
smallest and largest x and y coordinates. However, it might be better to adapt
the choice of points to the distribution of the input. For example, if all x and
y values within certain ranges are equally likely (a rectangular distribution),
then choosing four points by scanning in from the corners might be better
(find the four points with the largest and smallest sum and difference of the
two coordinates). The diagram below shows that only A and J survive the
application of this technique to our example set of points.
(Figure: only A and J survive the elimination step on the sample points.)
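The elimination step itself can be sketched with the tools of the previous chapter (an illustration, not the text's implementation; the quadrilateral corners a, b, c, d are assumed to be given in order around its boundary, and boundary points are kept as hull candidates):

    function inquad(t, a, b, c, d: point): boolean;
      { true if t lies strictly inside the convex quadrilateral a b c d }
      var l: line; ok: boolean;
      begin
      ok:=true;
      l.p1:=a; l.p2:=b; if same(l, t, c)<=0 then ok:=false;
      l.p1:=b; l.p2:=c; if same(l, t, d)<=0 then ok:=false;
      l.p1:=c; l.p2:=d; if same(l, t, a)<=0 then ok:=false;
      l.p1:=d; l.p2:=a; if same(l, t, b)<=0 then ok:=false;
      inquad:=ok
      end;

    function eliminate(a, b, c, d: point): integer;
      { move interior points to the tail of p; return the number kept }
      var i, M: integer; t: point;
      begin
      M:=N; i:=1;
      while i<=M do
        if inquad(p[i], a, b, c, d) then
          begin t:=p[i]; p[i]:=p[M]; p[M]:=t; M:=M-1 end
        else i:=i+1;
      eliminate:=M
      end;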
The recursive version of this technique is very similar to the Quicksort-like
select procedure for selection that we discussed in Chapter 12. Like that
procedure, it is vulnerable to an N2 worst-case running time. For example, if
all the original points are on the convex hull, then no points will get thrown
out in the recursive step. Like select, the running time is linear on the average,
as discussed further below.
Performance Issues
As mentioned in the previous chapter, geometric algorithms are somewhat
harder to analyze than algorithms from some of the other areas we've studied
because the input (and the output) is more difficult to characterize. It often
doesn't make sense to speak of LLrandom" point sets: for example, as N
gets large, the convex hull of points drawn from a rectangular distribution is
extremely likely to be very close to the rectangle defining the distribution. The
algorithms that we've looked at depend on different properties of the point set
distribution and are thus in practice incomparable, because to compare them
analytically would require an understanding of very complicated interactions
between little-understood properties of point sets. On the other hand, we
can say some things about the performance of the algorithms that will help
in choosing one for a particular application.
The easiest of the three to analyze is the Graham scan. It requires time
proportional to N log N for the sort and N for the scan. A moment's reflection
is necessary to convince oneself that the scan is linear, since it does have
a repeat "loop-within-a-loop." However, it is easy to see that no point is
"eliminated" more than once, so the total number of times the code within
that repeat loop is iterated must be less than N.
The "package-wrapping" technique, on the other hand, obviously takes
about MN steps, where M is the number of vertices on the hull. To compare
this with the Graham scan analytically would require a formula for M in terms
of N, a difficult problem in stochastic geometry. For a circular distribution
(and some others) the answer is that M is about N^(1/3), and for values of N
which are not large, N^(1/3) is comparable to log N (which is the expected value
for a rectangular distribution), so this method will compete very favorably
with the Graham scan for many practical problems. Of course, the N2 worst
case should always be taken into consideration.
Analysis of the Floyd-Eddy method requires even more sophisticated
stochastic geometry, but the general result is the same as that given by
intuition: almost all the points fall inside the quadrilateral and are discarded.
This makes the running time of the whole convex hull algorithm proportional
to N, since most points are examined only once (when they are thrown out).
On the average, it doesn't matter much which method is used after one
application of the Floyd-Eddy method, since so few points are likely to be
left. However, to protect against the worst case (when all points are on the
hull), it is prudent to use the Graham scan. This gives an algorithm which is
almost sure to run in linear time in practice and is guaranteed to run in time
proportional to N log N.
Exercises
1. Suppose it is known in advance that the convex hull of a set of points is
a triangle. Give an easy algorithm for finding the triangle. Answer the
same question for a quadrilateral.
2. Give an efficient method for determining whether a point falls within a
given convex polygon.
3. Implement a convex hull algorithm like insertion sort, using your method
from the previous exercise.
4. Is it strictly necessary for the Graham scan to start with a point guaranteed
to be on the hull? Explain why or why not.
5. Is it strictly necessary for the package-wrapping method to start with a
point guaranteed to be on the hull? Explain why or why not.
6. Draw a set of points that makes the Graham scan for finding the convex
hull particularly inefficient.
7. Does the Graham scan work for finding the convex hull of the points
which make up the vertices of any simple polygon? Explain why or give
a counterexample showing why not.
8. What four points should be used for the Floyd-Eddy method if the input
is assumed to be randomly distributed within a circle (using random polar
coordinates)?
9. Run the package-wrapping method for large point sets with both x and
y equally likely to be between 0 and 1000. Use your curve-fitting routine
to find an approximate formula for the running time of your program for
a point set of size N.
10. Use your curve-fitting routine to find an approximate formula for the
number of points left after the Floyd-Eddy method is used on point sets
with x and y equally likely to be between 0 and 1000.
26. Range Searching
Given a set of points in the plane, it is natural to ask which of those
points fall within some specified area. "List all cities within 50 miles of
Providence" is a question of this type which could reasonably be asked if a
set of points corresponding to the cities of the U.S. were available. When the
geometric shape is restricted to be a rectangle, the issue readily extends to
non-geometric problems. For example, "list all those people between 21 and
25 with incomes between $60,000 and $100,000" asks which "points" from a
file of data on people's names, ages, and incomes fall within a certain rectangle
in the age-income plane.
Extension to more than two dimensions is immediate. If we want to list
all stars within 50 light years of the sun, we have a three-dimensional problem,
and if we want the rich young people of the second example in the paragraph
above to be tall and female as well, we have a four-dimensional problem. In
fact, the dimension can get very high for such problems.
In general, we assume that we have a set of records with certain attributes
that take on values from some ordered set. (This is sometimes called
a database, though more precise and complete definitions have been developed
for this important term.) The problem of finding all records in a database
which satisfy specified range restrictions on a specified set of attributes is
called range searching. For practical applications, this is a difficult and important
problem. In this chapter, we'll concentrate on the two-dimensional
geometric problem in which records are points and attributes are their coordinates,
then we'll discuss appropriate generalizations.
The methods that we'll look at are direct generalizations of methods that
we have seen for searching on single keys (in one dimension). We presume that
many queries will be made on the same set of points, so the problem splits into
two parts: we need a preprocessing algorithm, which builds the given points
into a structure supporting efficient range searching, and a range-searching
algorithm, which uses the structure to return points falling within any given
(multidimensional) range. This separation makes different methods difficult
to compare, since the total cost depends not only on the distribution of the
points involved but also on the number and nature of the queries.
The range-searching problem in one dimension is to return all points
falling within a specified interval. This can be done by sorting the points
for preprocessing and then using binary search (to find all points in a given
interval, do a binary search on the endpoints of the interval and return all the
points that fall in between). Another solution is to build a binary search tree
and then do a simple recursive traversal of the tree, returning points that are
within the interval and ignoring parts of the tree that are outside the interval.
For example, the binary search tree that is built using the x coordinates of
our points from the previous chapter, when inserted in the given order, is the
following:
Now, the program required to find all the points in a given interval
is a direct generalization of the treeprint procedure of Chapter 14. If the
left endpoint of the interval falls to the left of the point at the root, we
(recursively) search the left subtree, similarly for the right, checking each
node we encounter to see whether its point falls within the interval:
type interval = record x1, x2: integer end;
procedure bstrange(t: link; int: interval);
  var tx1, tx2: boolean;
  begin
  if t<>z then
    begin
    tx1:=t↑.key>=int.x1;
    tx2:=t↑.key<=int.x2;
    if tx1 then bstrange(t↑.l, int);
    if tx1 and tx2 then write(name(t↑.id), ' ');
    if tx2 then bstrange(t↑.r, int);
    end
  end;
(This program could be made slightly more efficient by maintaining the interval
int as a global variable rather than passing its unchanged values through
the recursive calls.) For example, when called on the interval [5,9] for the example
tree above, range prints out E C H F I. Note that the points returned
do not necessarily need to be connected in the tree.
These methods require time proportional to about N log N for preprocessing,
and time proportional to about R+log N for range, where R is the number
of points actually falling in the range. (The reader may wish to check that
this is true.) Our goal in this chapter will be to achieve these same running
times for multidimensional range searching.
The parameter R can be quite significant: given the facility to make range
queries, it is easy for a user to formulate queries which could require all or
nearly all of the points. This type of query could reasonably occur in many
applications, but sophisticated algorithms are not necessary if all queries are
of this type. The algorithms that we consider are designed to be efficient for
queries which are not expected to return a large number of points.
Elementary Methods
In two dimensions, our "range" is an area in the plane. For simplicity, we'll
consider the problem of finding all points whose x coordinates fall within a
given x-interval and whose y coordinates fall within a given y-interval: that
is, we seek all points falling within a given rectangle. Thus, we'll assume a
type rectangle which is a record of four integers, the horizontal and vertical
interval endpoints. The basic operation that we'll use is to test whether a
point falls within a given rectangle, so we'll assume a function insiderect(p:
point; rect: rectangle) which checks this in the obvious way, returning true if
p falls within rect. Our goal is to find all the points which fall within a given
rectangle, using as few calls to insiderect as possible.
The simplest way to solve this problem is sequential search: scan through
all the points, testing each to see if it falls within the specified range (by calling
insiderect for each point). This method is in fact used in many database
applications because it is easily improved by "batching" the range queries,
testing for many different ones in the same scan through the points. In a very
large database, where the data is on an external device and the time to read
the data is by far the dominating cost factor, this can be a very reasonable
method: collect as many queries as will fit in the internal memory and search
for them all in one pass through the large external data file. If this type of
batching is inconvenient or the database is somewhat smaller, however, there
are much better methods available.
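A sketch of this sequential method (not from the text) might look like the
following, assuming that the points are kept in a global array pts[1..N] of type
point, with an info field as in the grid program below:

  procedure seqrange(rect: rectangle);
    var k: integer;
    begin
    for k:=1 to N do
      if insiderect(pts[k], rect) then write(name(pts[k].info), ' ')
    end;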
A simple first improvement to sequential search is to apply directly a
known one-dimensional method along one or more of the dimensions to be
searched. For example, suppose the following search rectangle is specified for
our sample set of points:
(Diagram: the sixteen sample points A through P with the specified search rectangle drawn in.)
One way to proceed is to find the points whose x coordinates fall within the x
range specified by the rectangle, then check the y coordinates of those points
to determine whether or not they fall within the rectangle. Thus, points that
could not be within the rectangle because their x coordinates are out of range
are never examined. This technique is called projection; obviously we could
also project on y. For our example, we would check E C H F and I for an x
projection, as described above, and we would check O E F K P N and L for a
y projection.
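One way to implement projection on x is a small variant of the bstrange procedure
above. The following is a sketch under the assumption (not made in the text)
that the points themselves are also kept in a global array pts indexed by the id
field stored in the tree, so that the y coordinate of each candidate can be checked
before it is printed:

  procedure xprojrange(t: link; rect: rectangle);
    var tx1, tx2: boolean;
    begin
    if t<>z then
      begin
      tx1:=t^.key>=rect.x1; tx2:=t^.key<=rect.x2;
      if tx1 then xprojrange(t^.l, rect);
      if tx1 and tx2 then
        if (pts[t^.id].y>=rect.y1) and (pts[t^.id].y<=rect.y2) then
          write(name(t^.id), ' ');
      if tx2 then xprojrange(t^.r, rect);
      end
    end;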
If the points are uniformly distributed in a rectangular shaped region,
then it's trivial to calculate the average number of points checked. The
fraction of points we would expect to find in a given rectangle is simply the
ratio of the area of that rectangle to the area of the full region; the fraction of
points we would expect to check for an x projection is the ratio of the width
of the rectangle to the width of the region, and similarly for a y projection.
For our example, using a 4-by-6 rectangle in a 16-by-16 region means that
we would expect to find 3/32 of the points in the rectangle, 1/4 of them in
an x projection, and 3/8 of them in a y projection. Obviously, under such
circumstances, it's best to project onto the axis corresponding to the narrower
of the two rectangle dimensions. On the other hand, it's easy to construct
situations in which the projection technique could fail miserably: for example
if the point set forms an "L" shape and the search is for a range that encloses
only the point at the corner of the "L," then projection on either axis would
eliminate only half the points.
At first glance, it seems that the projection technique could be improved
somehow to "intersect" the points that fall within the x range and the points
that fall within the y range. Attempts to do this without examining either
all the points in the x range or all the points in the y range in the worst case
serve mainly to give one an appreciation for the more sophisticated methods
that we are about to study.
Grid Method
A simple but effective technique for maintaining proximity relationships among
points in the plane is to construct an artificial grid which divides the area to
be searched into small squares and keep short lists of points that fall into
each square. (This technique is reportedly used in archaeology, for example.)
Then, when points that fall within a given rectangle are sought, only the lists
corresponding to squares that intersect the rectangle have to be searched. In
our example, only E, C, F, and K are examined, as sketched below.
The main decision to be made in implementing this method is determining
the size of the grid: if it is too coarse, each grid square will contain too many
points, and if it is too fine, there will be too many grid squares to search (most
of which will be empty). One way to strike a balance between these two is to
choose the grid size so that the number of grid squares is a constant fraction
of the total number of points. Then the number of points in each square is
expected to be about equal to some small constant. For our example, using a
4 by 4 grid for a sixteen-point set means that each grid square is expected to
contain one point.
Below is a straightforward implementation of a program to read in xy
coordinates of a set of points, then build the grid structure containing those
points. The variable size is used to control how big the grid squares are and
thus determine the resolution of the grid. For simplicity, assume that the
coordinates of all the points fall between 0 and some maximum value max.
Then, to get a G-by-G grid, we set size to the value max/G, the width of
the grid square. To find which grid square a point belongs to, we divide its
coordinates by size, as in the following implementation:
program rangegrid(input, output);
  const Gmax=20;
  type point = record x, y, info: integer end;
    link=^node;
    node=record p: point; next: link end;
  var grid: array[0..Gmax, 0..Gmax] of link;
    p: point;
    i, j, k, size, N: integer;
    z: link;
  procedure insert(p: point);
    var t: link;
    begin
    new(t); t^.p:=p;
    t^.next:=grid[p.x div size, p.y div size];
    grid[p.x div size, p.y div size]:=t;
    end;
  begin
  new(z);
  for i:=0 to Gmax do
    for j:=0 to Gmax do grid[i, j]:=z;
  readln(N);
  for k:=1 to N do
    begin
    readln(p.x, p.y); p.info:=k;
    insert(p)
    end;
  end.
This program uses our standard linked list representations, with dummy tail
node z. The point type is extended to include a field info which contains
the integer k for the kth point read in, for convenience in referencing the
points. In keeping with the style of our examples, we'll assume a function
name(k) to return the kth letter of the alphabet: clearly a more general
naming mechanism will be appropriate for actual applications.
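For example, name might simply be defined as follows (one possible definition,
not given in the text):

  function name(k: integer): char;
    begin
    name:=chr(ord('A')+k-1)
    end;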
As mentioned above, the setting of the variable size (which is omitted
from the above program) depends on the number of points, the amount of
memory available, and the range of coordinate values. Roughly, to get M
points per grid square, size should be chosen to be the nearest integer to max
divided by √(N/M). This leads to about N/M grid squares. These estimates
aren't accurate for small values of the parameters, but they are useful for
most situations, and similar estimates can easily be formulated for specialized
applications.
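In code, this calculation might be written as follows (a minimal sketch, not from
the text, assuming that M and max are available as integer variables; the guard
against a zero grid-square size is added for very small inputs):

  size:=round(max/sqrt(N/M));
  if size<1 then size:=1;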
Now, most of the work for range searching is handled by simply indexing
into the grid array, as follows:
procedure gridrange(rect: rectangle);
  var t: link;
    i, j: integer;
  begin
  for i:=(rect.x1 div size) to (rect.x2 div size) do
    for j:=(rect.y1 div size) to (rect.y2 div size) do
      begin
      t:=grid[i, j];
      while t<>z do
        begin
        if insiderect(t^.p, rect) then write(name(t^.p.info));
        t:=t^.next
        end
      end
  end;
The running time of this program is proportional to the number of grid squares
touched. Since we were careful to arrange things so that each grid square
contains a constant number of points on the average, this is also proportional,
on the average, to the number of points examined. If the number of points
in the search rectangle is R, then the number of grid squares examined is
proportional to R. The number of grid squares examined which do not fall
completely inside the search rectangle is certainly less than a small constant
times R, so the total running time (on the average) is linear in R, the number
of points sought. For large R, the number of points examined which don't fall
in the search rectangle gets quite small: all such points fall in a grid square
which intersects the edge of the search rectangle, and the number of such
squares is proportional to √R for large R. Note that this argument falls
apart if the grid squares are too small (too many empty grid squares inside
the search rectangle) or too large (too many points in grid squares on the
perimeter of the search rectangle) or if the search rectangle is thinner than
the grid squares (it could intersect many grid squares, but have few points
inside it).
The grid method works well if the points are well distributed over the
assumed range but badly if they are clustered together. (For example, all
the points could fall in one grid box, which would mean that all the grid
machinery gained nothing.) The next method that we will examine makes
this worst case very unlikely by subdividing the space in a nonuniform way,
adapting to the point set at hand.
2D Trees
Two-dimensional trees are dynamic, adaptable data structures which are very
similar to binary trees but divide up a geometric space in a manner convenient
for use in range searching and other problems. The idea is to build binary
search trees with points in the nodes, using the y and x coordinates of the
points as keys in a strictly alternating sequence.
The same algorithm is used for inserting points into 2D trees as for normal
binary search trees, except at the root we use the y coordinate (if the point
to be inserted has a smaller y coordinate than the point at the root, go left;
otherwise go right), then at the next level we use the x coordinate, then at
the next level the y coordinate, etc., alternating until an external node is
encountered. For example, the following 2D tree is built for our sample set of
points:
(Diagram: the 2D tree built for the sample set of points.)
The particular coordinate used is given at each node along with the point
name: nodes for which the y coordinate is used are drawn vertically, and
those for which the x coordinate is used are drawn horizontally.
This technique corresponds to dividing up the plane in a simple way: all
the points below the point at the root go in the left subtree, all those above in
the right subtree, then all the points above the point at the root and to the left
of the point in the right subtree go in the left subtree of the right subtree of
the root, etc. Every external node of the tree corresponds to some rectangle in
the plane. The diagram below shows the division of the plane corresponding
to the above tree. Each numbered region corresponds to an external node in
the tree; each point lies on a horizontal or vertical line segment which defines
the division made in the tree at that point.
For example, if a new point was to be inserted into the tree from region 9 in
the diagram above, we would move left at the root, since all such points are
below A, then right at B, since all such points are to the right of B, then right
at J, since all such points are above J. Insertion of a point in region 9 would
correspond to drawing a vertical line through it in the diagram.
The code for the construction of 2D trees is a straightforward modification
of standard binary tree search to switch between x and y coordinates at each
level:
function twoDinsert(p: point; t: link): link;
  var f: link;
    d, td: boolean;
  begin
  d:=true;
  repeat
    if d then td:=p.x<t^.p.x
         else td:=p.y<t^.p.y;
    f:=t;
    if td then t:=t^.l else t:=t^.r;
    d:=not d;
  until t=z;
  new(t); t^.p:=p; t^.l:=z; t^.r:=z;
  if td then f^.l:=t else f^.r:=t;
  twoDinsert:=t
  end;
As usual, we use a header node head with an artificial point (0,0) which is
"less" than all the other points so that the tree hangs off the right link of
head, and an artificial node z is used to represent all the external nodes. The
call twoDinsert(p, head) will insert a new node containing p into the tree. A
boolean variable d is toggled on the way down the tree to effect the alternating
tests on x and y coordinates. Otherwise the procedure is identical to the
standard procedure from Chapter 14. In fact, it turns out that for randomly
distributed points, 2D trees have all the same performance characteristics of
binary search trees. For example, the average time to build such a tree is
proportional to N log N, but there is an N^2 worst case.
To do range searching using 2D trees, we test the point at each node
against the range along the dimension that is used to divide the plane of that
node. For our example, we begin by going right at the root and right at node
E, since our search rectangle is entirely above A and to the right of E. Then,
at node F, we must go down both subtrees, since F falls in the x range defined
by the rectangle (note carefully that this is not the same as F falling within
the rectangle). Then the left subtrees of P and K are checked, corresponding
to checking areas 12 and 14 of the plane, which overlap the search rectangle.
This process is easily implemented with a straightforward generalization of
the 1D range procedure that we examined at the beginning of this chapter:
procedure twoDrange(t: link; rect: rectangle; d: boolean);
  var t1, t2, tx1, tx2, ty1, ty2: boolean;
  begin
  if t<>z then
    begin
    tx1:=rect.x1<t^.p.x; tx2:=t^.p.x<=rect.x2;
    ty1:=rect.y1<t^.p.y; ty2:=t^.p.y<=rect.y2;
    if d then begin t1:=tx1; t2:=tx2 end
         else begin t1:=ty1; t2:=ty2 end;
    if t1 then twoDrange(t^.l, rect, not d);
    if insiderect(t^.p, rect) then write(name(t^.p.info), ' ');
    if t2 then twoDrange(t^.r, rect, not d);
    end
  end;
This procedure goes down both subtrees only when the dividing line cuts the
rectangle, which should happen infrequently for relatively small rectangles.
Although the method hasn't been fully analyzed, its running time seems sure
to be proportional to R + log N to retrieve R points from reasonable ranges in
a region containing N points, which makes it very competitive with the grid
method.
Multidimensional Range Searching
Both the grid method and 2D trees generalize directly to more than two dimensions:
simple, straightforward extensions to the above algorithms immediately
yield range-searching methods which work for more than two dimensions.
However, the nature of multidimensional space dictates that some caution is
called for and that the performance characteristics of the algorithms might
be difficult to predict for a particular application.
To implement the grid method for k-dimensional searching, we simply
make grid a k-dimensional array and use one index per dimension. The main
problem is to pick a reasonable value for size. This problem becomes quite
obvious when large k is considered: what type of grid should we use for 10-dimensional
search? The problem is that even if we use only three divisions
per dimension, we need 3^10 grid squares, most of which will be empty, for
reasonable values of N.
The generalization from 2D to kD trees is also straightforward: simply
cycle through the dimensions (as we did for two dimensions by alternating
between x and y) while going down the tree. As before, in a random situation,
the resulting trees have the same characteristics as binary search trees. Also
as before, there is a natural correspondence between the trees and a simple
geometric process. In three dimensions, branching at each node corresponds
to cutting the three-dimensional region of interest with a plane; in general we
cut the k-dimensional region of interest with a (k- 1)-dimensional hyperplane.
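For instance, the insertion routine might cycle through the dimensions as follows.
This is only a sketch, not code from the text: it assumes a constant kdim giving
the number of dimensions, a hypothetical kpoint type whose coordinates are kept
in an array c: array[1..kdim] of integer, and tree nodes like those used by
twoDinsert but holding kpoints.

  function kDinsert(p: kpoint; t: link): link;
    var f: link;
      d: integer;
      td: boolean;
    begin
    d:=1;
    repeat
      td:=p.c[d]<t^.p.c[d];
      f:=t;
      if td then t:=t^.l else t:=t^.r;
      d:=d mod kdim+1
    until t=z;
    new(t); t^.p:=p; t^.l:=z; t^.r:=z;
    if td then f^.l:=t else f^.r:=t;
    kDinsert:=t
    end;

As with twoDinsert, the call kDinsert(p, head) assumes a header node whose
artificial point is "less" than all the others.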
If k is very large, there is likely to be a significant amount of imbalance
in the kD trees, again because practical point sets can't be large enough to
take notice of randomness over a large number of dimensions. Typically, all
points in a subtree will have the same value across several dimensions, which
leads to several one-way branches in the trees. One way to help alleviate this
problem is, rather than simply cycle through the dimensions, always to use the
dimension that will divide up the point set in the best way. This technique
can also be applied to 2D trees. It requires that extra information (which
dimension should be discriminated upon) be stored in each node, but it does
relieve imbalance, especially in high-dimensional trees.
In summary, though it is easy to see how to generalize the programs for
range searching that we have developed to handle multidimensional problems,
such a step should not be taken lightly for a large application. Large databases
with many attributes per record can be quite complicated objects indeed, and
it is often necessary to have a good understanding of the characteristics of
the database in order to develop an efficient range-searching method for a
particular application. This is a quite important problem which is still being
actively studied.
Exercises
1. Write a nonrecursive version of the 1D range program given in the text.
2. Write a program to print out all points from a binary tree which do not
fall in a specified interval.
3. Give the maximum and minimum number of grid squares that will be
searched in the grid method as functions of the dimensions of the grid
squares and the search rectangle.
4. Discuss the idea of avoiding the search of empty grid squares by using
linked lists: each grid square could be linked to the next nonempty grid
square in the same row and the next nonempty grid square in the same
column. How would the use of such a scheme affect the grid square size
to be used?
5. Draw the tree and the subdivision of the plane that results if we build a
2D tree for our sample points starting with a vertical dividing line. (That
is, call range with a third argument of false rather than true.)
6. Give a set of points which leads to a worst-case 2D tree having no nodes
with two sons; give the subdivision of the plane that results.
7. Describe how you would modify each of the methods to return all points
that fall within a given circle.
8. Of all search rectangles with the same area, what shape is likely to make
each of the methods perform the worst?
9. Which method should be preferred for range searching in the case that
the points cluster together in large groups spaced far apart?
10. Draw the 3D tree that results when the points (3,1,5) (4,8,3) (8,3,9)
(6,2,7) (1,6,3) (1,3,5) (6,4,2) are inserted into an initially empty tree.
27. Geometric Intersection
A natural problem arising frequently in applications involving geometric
data is: "Given a set of N objects, do any two intersect?" The "objects"
involved may be lines, rectangles, circles, polygons, or other types of geometric
objects. For example, in a system for designing and processing integrated
circuits or printed circuit boards, it is important to know that no two wires
intersect to make a short circuit. In an industrial application for designing
layouts to be executed by a numerically controlled cutting tool, it is important
to know that no two parts of the layout intersect. In computer graphics,
the problem of determining which of a set of objects is obscured from a
particular viewpoint can be formulated as a geometric intersection problem
on the projections of the objects onto the viewing plane. And in operations
research, the mathematical formulation of many important problems leads
naturally to a geometric intersection problem.
The obvious solution to the intersection problem is to check each pair of
objects to see if they intersect. Since there are about N^2/2 pairs of objects, the
running time of this algorithm is proportional to N^2. For many applications,
this is not a problem because other factors limit the number of objects which
can be processed. However, geometric applications systems have become
much more ambitious, and it is not uncommon to have to process hundreds
of thousands or even millions of objects. The brute-force N^2 algorithm is
obviously inadequate for such applications. In this section, we'll study a
general method for determining whether any two out of a set of N objects
intersect in time proportional to N log N, based on algorithms presented by
M. Shamos and D. Hoey in a seminal paper in 1976.
First, we'll consider an algorithm for returning all intersecting pairs
among a set of lines that are constrained to be horizontal or vertical. This
makes the problem easier in one sense (horizontal and vertical lines are relatively
simple geometric objects), more difficult in another sense (returning all
intersecting pairs is more difficult than simply determining whether one such
pair exists). The implementation that we'll develop applies binary search trees
and the interval range-searching program of the previous chapter in a doubly
recursive program.
Next, we'll examine the problem of determining whether any two of a
set of N lines intersect, with no constraints on the lines. The same general
strategy as used for the horizontal-vertical case can be applied. In fact, the
same basic idea works for detecting intersections among many other types
of geometric objects. However, for lines and other objects, the extension
to return all intersecting pairs is somewhat more complicated than for the
horizontal-vertical case.
Horizontal and Vertical Lines
To begin, we'll assume that all lines are either horizontal or vertical: the two
points defining each line either have equal x coordinates or equal y coordinates,
as in the following sample set of lines:
(Diagram: the sample set of horizontal and vertical lines, labeled A through J.)
(This is sometimes called Manhattan geometry because, the Chrysler building
notwithstanding, the Manhattan skyline is often sketched using only horizontal
and vertical lines.) Constraining lines to be horizontal or vertical is certainly
a severe restriction, but this is far from a "toy" problem. It is often
the case that this restriction is imposed for some other reason for a particular
application. For example, very large-scale integrated circuits are typically
designed under this constraint.
The general plan of the algorithm to find an intersection in such a set
of lines is to imagine a horizontal scan line sweeping from bottom to top
in the diagram. Projected onto this scan line, vertical lines are points, and
horizontal lines are intervals: as the scan line proceeds from bottom to top,
points (representing vertical lines) appear and disappear, and horizontal lines
are periodically encountered. An intersection is found when a horizontal line is
encountered which represents an interval on the scan line that contains a point
representing a vertical line. The point means that the vertical line intersects
the scan line, and the horizontal line lies on the scan line, so the horizontal
and vertical lines must intersect. In this way, the two-dimensional problem of
finding an intersecting pair of lines is reduced to the one-dimensional range-searching
problem of the previous chapter.
Of course, it is not necessary actually to "sweep" a horizontal line all
the way up through the set of lines: since we only need to take action when
endpoints of the lines are encountered, we can begin by sorting the lines
according to their y coordinate, then processing the lines in that order. If the
bottom endpoint of a vertical line is encountered, we add the x coordinate of
that line to the tree; if the top endpoint of a vertical line is encountered, we
delete that line from the tree; and if a horizontal line is encountered, we do
an interval range search using its two x coordinates. As we'll see, some care is
required to handle equal coordinates among line endpoints (the reader should
now be accustomed to encountering such difficulties in geometric algorithms).
To trace through the operation of our algorithm on our set of sample
points, we first must sort the line endpoints by their y coordinate:
B B D E F H J C G D I C A G J F E I
Each vertical line appears twice in this list, each horizontal line appears
once. For the purposes of the line intersection algorithm, this sorted list
can be thought of as a sequence of insert (vertical lines when the bottom
endpoint is encountered), delete (vertical lines when the top endpoint is encountered),
and range (for the endpoints of horizontal lines) commands. All
of these "commands" are simply calls on the standard binary tree routines
from Chapters 14 and 26, using x coordinates as keys.
For our example, we begin with the following sequence of binary search
trees:
(Diagram: the sequence of binary search trees after B is inserted and deleted, then D, E, and F are inserted.)
First B is inserted into an empty tree, then deleted. Then D, E, and F are
inserted. At this point, H is encountered, and a range search for the interval
defined by H is performed on the rightmost tree in the above diagram. This
search discovers the intersection between H and F. Proceeding down the list
above in order, we add J, C, then G to get the following sequence of trees:
(Diagram: the sequence of trees after J, C, and G are added.)
Next, the upper endpoint of D is encountered, so it is deleted; then I is added
and C deleted, which gives the following sequence of trees:
At this point A is encountered, and a range search for the interval defined
by A is performed on the rightmost tree in the diagram above. This search
discovers the intersections between A and E, F, and I. (Recall that although
G and J are visited during this search, any points to the left of G or to the
right of J would not be touched.) Finally, the upper endpoints of G, J, F, E,
and I are encountered, so those points are successively deleted, leading back
to the empty tree.
The first step in the implementation is to sort the line endpoints on their
y coordinate. But since binary trees are going to be used to maintain the
status of vertical lines with respect to the horizontal scan line, they may as
well be used for the initial y sort! Specifically, we will use two "indirect"
binary trees on the line set, one with header node hy and one with header
node hx. The y tree will contain all the line endpoints, to be processed in
order one at a time; the x tree will contain the lines that intersect the current
horizontal scan line. We begin by initializing both hx and hy with 0 keys
and pointers to a dummy external node z, as in treeinitialize in Chapter 14.
Then the hy tree is constructed by inserting both y coordinates from vertical
lines and the y coordinate of horizontal lines into the binary search tree with
header node hy, as follows:
procedure buildytree;
  var N, k, x1, y1, x2, y2: integer;
  begin
  readln(N);
  for k:=1 to N do
    begin
    readln(x1, y1, x2, y2);
    lines[k].p1.x:=x1; lines[k].p1.y:=y1;
    lines[k].p2.x:=x2; lines[k].p2.y:=y2;
    bstinsert(k, y1, hy);
    if y2<>y1 then bstinsert(k, y2, hy);
    end
  end;
This program reads in groups of four numbers which specify lines, and puts
them into the lines array and the binary search tree on the y coordinate. The
standard bstinsert routine from Chapter 14 is used, with the y coordinates as
keys, and indices into the array of lines as the info field. For our example set
of lines, the following tree is constructed:
Now, the sort on y is effected by a recursive program with the same recursive
structure as the treeprint routine for binary search trees in Chapter 14. We
visit the nodes in increasing y order by visiting all the nodes in the left subtree
of the hy tree, then visiting the root, then visiting all the nodes in the right
subtree of the hy tree. At the same time, we maintain a separate tree (rooted
at hx) as described above, to simulate the operation of passing a horizontal
scan line through. The code at the point where each node is "visited" is rather
straightforward from the description above. First, the coordinates of the
endpoint of the corresponding line are fetched from the lines array, indexed
by the info field of the node. Then the key field in the node is compared
against these to determine whether this node corresponds to the upper or the
lower endpoint of the line: if it is the lower endpoint, it is inserted into the
hx tree, and if it is the upper endpoint, it is deleted from the hx tree and
a range search is performed. The implementation differs slightly from this
description in that horizontal lines are actually inserted into the hx tree, then
immediately deleted, and a range search for a one-point interval is performed
for vertical lines. This makes the code properly handle the case of overlapping
vertical lines, which are considered to "intersect."
procedure scan(next: link);
  var t, x1, x2, y1, y2: integer;
    int: interval;
  begin
  if next<>z then
    begin
    scan(next^.l);
    x1:=lines[next^.info].p1.x; y1:=lines[next^.info].p1.y;
    x2:=lines[next^.info].p2.x; y2:=lines[next^.info].p2.y;
    if x2<x1 then begin t:=x2; x2:=x1; x1:=t end;
    if y2<y1 then begin t:=y2; y2:=y1; y1:=t end;
    if next^.key=y1 then bstinsert(next^.info, x1, hx);
    if next^.key=y2 then
      begin
      bstdelete(next^.info, x1, hx);
      int.x1:=x1; int.x2:=x2;
      write(name(next^.info), ': ');
      bstrange(hx^.r, int);
      writeln;
      end;
    scan(next^.r)
    end
  end;
The running time of this program depends on the number of intersections that
are found as well as the number of lines. The tree manipulation operations
take time proportional to log N on the average (if balanced trees were used,
a log N worst case could be guaranteed), but the time spent in bstrange also
depends on the total number of intersections it returns, so the total running
time is proportional to N log N+I, where I is the number of intersecting pairs.
In general, the number of intersections could be quite large. For example, if
we have N/2 horizontal lines and N/2 vertical lines arranged in a crosshatch
pattern, then the number of intersections is proportional to N^2. As with
range searching, if it is known in advance that the number of intersections
will be very large, then some brute-force approach should be used. Typical
applications involve a "needle-in-haystack" sort of situation where a large set
of lines is to be checked for a few possible intersections.
This approach of intermixed application of recursive procedures operating
on the x and y coordinates is quite important in geometric algorithms.
Another example of this is the 2D tree algorithm of the previous chapter, and
we'll see yet another example in the next chapter.
General Line Intersection
When lines of arbitrary slope are allowed, the situation can become more
complicated, as illustrated by the following example.
First, the various line orientations possible make it necessary to test explicitly
whether certain pairs of lines intersect: we can't get by with a simple interval
range test. Second, the ordering relationship between lines for the binary
tree is more complicated than before, as it depends on the current y range
of interest. Third, any intersections which do occur add new "interesting" y
values which are likely to be different from the set of y values that we get
from the line endpoints.
It turns out that these problems can be handled in an algorithm with the
same basic structure as given above. To simplify the discussion, we'll consider
an algorithm for detecting whether or not there exists an intersecting pair in
a set of N lines, and then we'll discuss how it can be extended to return all
intersections.
As before, we first sort on y to divide the space into strips within which
no line endpoints appear. Just as before, we proceed through the sorted list
of points, adding each line to a binary search tree when its bottom point is
encountered and deleting it when its top point is encountered. Just as before,
the binary tree gives the order in which the lines appear in the horizontal
"strip" between two consecutive y values. For example, in the strip between
the bottom endpoint of D and the top endpoint of B in the diagram above,
the lines should appear in the order F B D G H. We assume that there are
no intersections within the current horizontal strip of interest: our goal is to
maintain this tree structure and use it to help find the first intersection.
To build the tree, we can't simply use x coordinates from line endpoints
as keys (doing this would put B and D in the wrong order in the example
above, for instance). Instead, we use a more general ordering relationship: a
line x is defined to be to the right of a line y if both endpoints of x are on the
same side of y as a point infinitely far to the right, or if y is to the left of x,
with "left" defined analogously. Thus, in the diagram above, B is to the right
of A and B is to the right of C (since C is to the left of B). If x is neither
to the left nor to the right of y, then they must intersect. This generalized
"line comparison" operation is a simple extension of the same procedure of
Chapter 24. Except for the use of this function whenever a comparison is
needed, the standard binary search tree procedures (even balanced trees, if
desired) can be used. For example, the following sequence of diagrams shows
the manipulation of the tree for our example between the time that line C is
encountered and the time that line D is encountered.
(Diagram: the sequence of binary search trees between the time line C is encountered and the time line D is encountered.)
Each "comparison" performed during the tree manipulation procedures is
actually a line intersection test: if the binary search tree procedure can't
decide to go right or left, then the two lines in question must intersect, and
we're finished.
But this is not the whole story, because this generalized comparison
operation is not transitive. In the example above, F is to the left of B (because
B is to the right of F) and B is to the left of D, but F is not to the left of D. It
is essential to note this, because the binary tree deletion procedure assumes
that the comparison operation is transitive: when B is deleted from the last
tree in the above sequence, the tree
is formed without F and D ever having been explicitly compared. For our
intersection-testing algorithm to work correctly, we must explicitly test that
comparisons are valid each time we change the tree structure. Specifically,
every time we make the left link of node x point to node y, we explicitly test
that the line corresponding to x is to the left of the line corresponding to y,
according to the above definition, and similarly for the right. Of course, this
comparison could result in the detection of an intersection, as it does in our
example.
In summary, to test for an intersection among a set of N lines, we use
the program above, but with the call to range removed, and with the binary
tree routines extended to use the generalized comparison as described above.
If there is no intersection, we'll start with a null tree and end with a null tree
without finding any incomparable lines. If there is an intersection, then the
two lines which intersect must be compared against each other at some point
during the scanning process and the intersection discovered.
Once we've found an intersection, we can't simply press on and hope
to find others, because the two lines that intersect should swap places in
the ordering directly after the point of intersection. One way to handle this
problem would be to use a priority queue instead of a binary tree for the "y
sort": initially put lines on the priority queue according to the y coordinates of
their endpoints, then work the scan line up by successively taking the smallest
y coordinate from the priority queue and doing a binary tree insert or delete
as above. When an intersection is found, new entries are added to the priority
queue for each line, using the intersection point as the lower endpoint for each.
Another way to find all intersections, which is appropriate if not too
many are expected, is to simply remove one of the intersecting lines when
an intersection is found. Then after the scan is completed, we know that all
intersecting pairs must involve one of those lines, so we can use a brute force
method to enumerate all the intersections.
An interesting feature of the above procedure is that it can be adapted to
solve the problem for testing for the existence of an intersecting pair among
a set of more general geometric shapes just by changing the generalized
comparison procedure. For example, if we implement a procedure which
compares two rectangles whose edges are horizontal and vertical according to
the trivial rule that rectangle x is to the left of rectangle y if the right edge of
x is to the left of the left edge of y, then we can use the above method to test
for intersection among a set of such rectangles. For circles, we can use the x
coordinates of the centers for the ordering, but explicitly test for intersection
(for example, compare the distance between the centers to the sum of the
radii). Again, if this comparison procedure is used in the above method, we
have an algorithm for testing for intersection among a set of circles. The
problem of returning all intersections in such cases is much more complicated,
though the brute-force method mentioned in the previous paragraph will
always work if few intersections are expected. Another approach that will
suffice for many applications is simply to consider complicated objects as sets
of lines and to use the line intersection procedure.
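For example, an explicit intersection test for circles might be written as follows
(a sketch, not from the text, assuming a hypothetical circle record with integer
center coordinates x, y and radius r; squaring both sides of the comparison avoids
computing a square root):

  function circleintersect(a, b: circle): boolean;
    var dx, dy, s: integer;
    begin
    dx:=a.x-b.x; dy:=a.y-b.y; s:=a.r+b.r;
    circleintersect:=dx*dx+dy*dy<=s*s
    end;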
Exercises
1. How would you determine whether two triangles intersect? Squares?
Regular n-gons for n > 4?
2. In the horizontal-vertical line intersection algorithm, how many pairs of
lines are tested for intersection in a set of lines with no intersections in
the worst case? Give a diagram supporting your answer.
3. What happens if the horizontal-vertical line intersection procedure is used
on a set of lines with arbitrary slope?
4. Write a program to find the number of intersecting pairs among a set
of N random horizontal and vertical lines, each line generated with two
random integer coordinates between 0 and 1000 and a random bit to
distinguish horizontal from vertical.
5. Give a method for testing whether or not a given polygon is simple
(doesn't intersect itself).
6. Give a method for testing whether one polygon is totally contained within
another.
7. Describe how you would solve the general line intersection problem given
the additional fact that the minimum separation between two lines is
greater than the maximum length of the lines.
8. Give the binary tree structure that exists when the line intersection
algorithm detects an intersection in the following set of lines:
9. Are the comparison procedures for circles and Manhattan rectangles that
are described in the text transitive?
10. Write a program to find the number of intersecting pairs among a set
of N random lines, each line generated with random integer coordinates
between 0 and 1000.
28. Closest Point Problems
Geometric problems involving points on the plane usually involve implicit
or explicit treatment of distances between the points. For example,
a very natural problem which arises in many applications is the nearest-neighbor
problem: find the point among a set of given points closest to a given
new point. This seems to involve checking the distance from the given point to
each point in the set, but we'll see that much better solutions are possible. In
this section we'll look at some other distance problems, a prototype algorithm,
and a fundamental geometric structure called the Voronoi diagram that can
be used effectively for a variety of such problems in the plane. Our approach
will be to describe a general method for solving closest point problems through
careful consideration of a prototype implementation, rather than developing
full implementations of programs to solve all of the problems.
Some of the problems that we consider in this chapter are similar to the
range-searching problems of Chapter 26, and the grid and 2D tree methods
developed there are suitable for solving the nearest-neighbor and other problems.
The fundamental shortcoming of those methods is that they rely on
randomness in the point set: they have bad worst-case performance. Our aim
in this chapter is to examine yet another general approach that has guaranteed
good performance for many problems, no matter what the input. Some of the
methods are too complicated for us to examine a full implementation, and
they involve sufficient overhead that the simpler methods may do better for
actual applications where the point set is not large or where it is sufficiently
well dispersed. However, we'll see that the study of methods with good worst-case
performance will uncover some fundamental properties of point sets that
should be understood even if simpler methods turn out to be more suitable.
The general approach that we'll be examining provides yet another example
of the use of doubly recursive procedures to intertwine processing along
the two coordinate directions. The two previous methods of this type that
we've seen (kD trees and line intersection) have been based on binary search
trees; in this case the method is based on mergesort.
Closest Pair
The closest-pair problem is to find the two points that are closest together
among a set of points. This problem is related to the nearest-neighbor problem;
though it is not as widely applicable, it will serve us well as a prototype
closest-point problem in that it can be solved with an algorithm whose general
recursive structure is appropriate for other problems.
It would seem necessary to examine the distances between all pairs of
points to find the smallest such distance: for N points this would mean a
running time proportional to N^2. However, it turns out that we can use
sorting to get by with only examining about N log N distances between points
in the worst case (far fewer on the average) to get a worst-case running time
proportional to N log N (far better on the average). In this section, we'll
examine such an algorithm in detail.
The algorithm that we'll use is based on a straightforward "divide-and-conquer"
strategy. The idea is to sort the points on one coordinate, say the
x coordinate, then use that ordering to divide the points in half. The closest
pair in the whole set is either the closest pair in one of the halves or the
closest pair with one member in each half. The interesting case, of course, is
when the closest pair crosses the dividing line: the closest pair in each half
can obviously be found by using recursive calls, but how can all the pairs on
either side of the dividing line be checked efficiently?
Since the only information we seek is the closest pair of the point set,
we need examine only points within distance min of the dividing line, where
min is the smaller of the distances between the closest pairs found in the two
halves. By itself, however, this observation isn't enough help in the worst
case, since there could be many pairs of points very close to the dividing line.
For example, all the points in each half could be lined up right next to the
dividing line.
To handle such situations, it seems necessary to sort the points on y.
Then we can limit the number of distance computations involving each point
as follows: proceeding through the points in increasing y order, check if each
point is inside the vertical strip consisting of all points in the plane within min
of the dividing line. For each such point, compute the distance between it and
any point also in the strip whose y coordinate is less than the y coordinate
of the current point, but not more than min less. The fact that the distance
between all pairs of points in each half is at least min means that only a few
points are likely to be checked, as demonstrated in our example set of points:
(Diagram: the sample points with a vertical dividing line drawn just to the right of F.)
A vertical dividing line just to the right of F has eight points to the left, eight
points to the right. The closest pair on the left half is AC (or AO), the closest
pair on the right is JM. If we have the points sorted on y, then the closest pair
which is split by the line is found by checking the pairs HI, CI, FK, which is
the closest pair in the whole point set, and finally EK.
Though this algorithm is simply stated, some care is required to implement
it efficiently: for example, it would be too expensive to sort the points
on y within our recursive subroutine. We've seen several algorithms with a
running time described by the recurrence T(N) = 2T(N/2)+N, which implies
that T(N) is proportional to N log N; if we were to do the full sort on y, then
the recurrence would become T(N) = 2T(N/2) + N log N, and it turns out
that this implies that T(N) is proportional to N log^2 N. To avoid this, we
need to avoid the sort on y.
The solution to this problem is simple, but subtle. The mergesort method
from Chapter 12 is based on dividing the elements to be sorted exactly as
the points are divided above. We have two problems to solve and the same
general method to solve them, so we may as well solve them simultaneously!
Specifically, we'll write one recursive routine that both sorts on y and finds the
closest pair. It will do so by splitting the point set in half, then calling itself
recursively to sort the two halves on y and find the closest pair in each half,
then merging to complete the sort on y and applying the procedure above to
complete the closest pair computation. In this way, we avoid the cost of doing
an extra y sort by intermixing the data movement required for the sort with
the data movement required for the closest pair computation.
For the y sort, the split in half could be done in any way, but for
the closest pair computation, it's required that the points in one half all
have smaller x coordinates than the points in the other half. This is easily
accomplished by sorting on x before doing the division. In fact, we may as
well use the same routine to sort on x! Once this general plan is accepted,
the implementation is not difficult to understand.
As mentioned above, the implementation will use the recursive sort and
merge procedures of Chapter 12. The first step is to modify the list structures
to hold points instead of keys, and to modify merge to check a global variable
pass to decide how to do its comparison. If pass=1, the comparison should
be done using the x coordinates of the two points; if pass=2 we do the y
coordinates of the two points. The dummy node z which appears at the
end of all lists will contain a "sentinel" point with artificially high x and y
coordinates.
The next step is to modify the recursive sort of Chapter 12 also to do the
closest-point computation when pass=2. This is done by replacing the line
containing the call to merge and the recursive calls to sort in that program
by the following code:
if pass=2 then middle:=b^.p.x;
c:=merge(sort(a, N div 2), sort(b, N-(N div 2)));
sort:=c;
if pass=2 then
  begin
  a:=c; p1:=z^.p; p2:=z^.p; p3:=z^.p; p4:=z^.p;
  repeat
    if abs(a^.p.x-middle)<min then
      begin
      check(a^.p, p1);
      check(a^.p, p2);
      check(a^.p, p3);
      check(a^.p, p4);
      p1:=p2; p2:=p3; p3:=p4; p4:=a^.p
      end;
    a:=a^.next
  until a=z
  end
If pass=1, this is straight mergesort: it returns a linked list containing the
points sorted on their x coordinates (because of the change to merge). The
magic of this implementation comes when pass=2. The program not only sorts
on y but also completes the closest-point computation, as described in detail
below. The procedure check simply checks whether the distance between the
two points given as arguments is less than the global variable min. If so, it
resets min to that distance and saves the points in the global variables cp1
and cp2. Thus, the global min always contains the distance between cp1 and
cp2, the closest pair found so far.
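The check procedure itself might look like the following sketch (consistent with
the description above, though not necessarily the code used in the text); it
assumes that min is a real global variable and that the sentinel coordinates in z
are maxint, so points copied from z can be recognized and skipped:

  procedure check(p1, p2: point);
    var dist: real;
    begin
    if (p1.x<maxint) and (p2.x<maxint) then
      begin
      dist:=sqrt(sqr(p1.x-p2.x)+sqr(p1.y-p2.y));
      if dist<min then
        begin min:=dist; cp1:=p1; cp2:=p2 end
      end
    end;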
First, we sort on x, then we sort on y and find the closest pair by invoking
sort as follows:
new(z); z^.next:=z;
z^.p.x:=maxint; z^.p.y:=maxint;
new(h); h^.next:=readlist;
min:=maxint;
pass:=1; h^.next:=sort(h^.next, N);
pass:=2; h^.next:=sort(h^.next, N);
After these calls, the closest pair of points is found in the global variables cp1
and cp2 which are managed by the check "find the minimum" procedure.
The crux of the implementation is the operation of sort when pass=2.
Before the recursive calls the points are sorted on x: this ordering is used to
divide the points in half and to find the x coordinate of the dividing line. After
the recursive calls the points are sorted on y and the distance between every
pair of points in each half is known to be greater than min. The ordering on
y is used to scan the points near the dividing line; the value of min is used to
limit the number of points to be tested. Each point within a distance of min
of the dividing line is checked against each of the previous four points found
within a distance of min of the dividing line. This is guaranteed to find any
pair of points closer together than min with one member of the pair on either
side of the dividing line. This is an amusing geometric fact which the reader
may wish to check. (We know that points which fall on the same side of the
dividing line are spaced by at least min, so the number of points falling in any
circle of radius min is limited.)
It is interesting to examine the order in which the various vertical dividing
lines are tried in this algorithm. This can be described with the aid of the
following binary tree:
(Diagram: the binary tree of vertical dividing lines; the points, in x-sorted order from left to right, are G O A D E C H F I K B P N J M L.)
Each node in this tree represents a vertical line dividing the points in the left
and right subtree. The nodes are numbered in the order in which the vertical
lines are tried in the algorithm. Thus, first the line between G and 0 is tried
and the pair GO is retained as the closest so far. Then the line between A and
D is tried, but A and D are too far apart to change min. Then the line between
O and A is tried and the pairs GD, GA, and OA are successively closer pairs.
It happens for this example that no closer pairs are found until FK, which is
the last pair checked for the last dividing line tried. This diagram reflects the
difference between top-down and bottom-up mergesort. A bottom-up version
of the closest-pair problem can be developed in the same way as for mergesort,
which would be described by a tree like the one above, numbered left to right
and bottom to top.
The general approach that we've used for the closest-pair problem can
be used to solve other geometric problems. For example, another problem of
interest is the all-nearest-neighbors problem: for each point we want to find
the point nearest to it. This problem can be solved using a program like the
one above with extra processing along the dividing line to find, for each point,
whether there is a point on the other side closer than its closest point on its
own side. Again, the "free" y sort is helpful for this computation.
Voronoi Diagrams
The set of all points closer to a given point in a point set than to all other points
in the set is an interesting geometric structure called the Voronoi polygon for
the point. The union of all the Voronoi polygons for a point set is called its
Voronoi diagram. This is the ultimate in closest-point computations: we'll see
that most of the problems involving distances between points that we face
have natural and interesting solutions based on the Voronoi diagram. The
diagram for our sample point set is comprised of the thick lines in the diagram
below:
Basically, the Voronoi polygon for a point is made up of the perpendicular
bisectors separating the point from those points closest to it. The actual
definition is the other way around: the Voronoi polygon is defined to be the
set of all points in the plane closer to the given point than to any other point
in the point set, and the points "closest to" a point are defined to be those that
lead to edges on the Voronoi polygon. The dual of the Voronoi diagram makes
this correspondence explicit: in the dual, a line is drawn between each point
and all the points "closest to" it. Put another way, x and y are connected in
the Voronoi dual if their Voronoi polygons have an edge in common. The dual
for our example is comprised of the thin dotted lines in the above diagram.
The Voronoi diagram and its dual have many properties that lead to
efficient algorithms for closest-point problems. The property that makes these
algorithms efficient is that the number of lines in both the diagram and the
dual is proportional to a small constant times N. For example, the line
connecting the closest pair of points must be in the dual, so the problem of
the previous section can be solved by computing the dual and then simply
finding the minimum length line among the lines in the dual. Similarly, the
line connecting each point to its nearest neighbor must be in the dual, so the
all-nearest-neighbors problem reduces directly to finding the dual. The convex
hull of the point set is part of the dual, so computing the Voronoi dual is yet
another convex hull algorithm. We'll see yet another example in Chapter 31
of a problem which can be efficiently solved by first finding the Voronoi dual.
The defining property of the Voronoi diagram means that it can be used
to solve the nearest-neighbor problem: to identify the nearest neighbor in a
point set to a given point, we need only find out which Voronoi polygon the
point falls in. It is possible to organize the Voronoi polygons in a structure
like a 2D tree to allow this search to be done efficiently.
The Voronoi diagram can be computed using an algorithm with the same
general structure as the closest-point algorithm above. The points are first
sorted on their x coordinate. Then that ordering is used to split the points in
half, leading to two recursive calls to find the Voronoi diagram of the point
set for each half. At the same time, the points are sorted on y; finally, the
two Voronoi diagrams for the two halves are merged together. As before, the
merging together (done with pass=2) can make use of the fact that the points
are sorted on x before the recursive calls and that they are sorted on y and the
Voronoi diagrams for the two halves have been built after the recursive calls.
However, even with these aids, it is quite a complicated task, and presentation
of a full implementation would be beyond the scope of this book.
The Voronoi diagram is certainly the natural structure for closest-point
problems, and understanding the characteristics of a problem in terms of
the Voronoi diagram or its dual is certainly a worthwhile exercise. However,
for many particular problems, a direct implementation based on the general
schema given in this chapter may be suitable. This is powerful enough to
compute the Voronoi diagram, so it is powerful enough for algorithms based
on the Voronoi diagram, and it may admit to simpler, more efficient code, just
as we saw for the closest-pair problem.
Exercises
1. Write programs to solve the nearest-neighbor problem, first using the grid
method, then using 2D trees.
2. Describe what happens when the closest-pair procedure is used on a set
of points that fall on the same horizontal line, equally spaced.
3. Describe what happens when the closest-pair procedure is used on a set
of points that fall on the same vertical line, equally spaced.
4. Give an algorithm that, given a set of 2N points, half with positive x
coordinates, half with negative x coordinates, finds the closest pair with
one member of the pair in each half.
5. Give the successive pairs of points assigned to cp1 and cp2 when the
program in the text is run on the example points, but with A removed.
6. Test the effectiveness of making min global by comparing the performance
of the implementation given to a purely recursive implementation for
some large random point set.
7. Give an algorithm for finding the closest pair from a set of lines.
8. Draw the Voronoi diagram and its dual for the points A B C D E F from
the sample point set.
9. Give a "brute-force" method (which might require time proportional to
N^2) for computing the Voronoi diagram.
10. Write a program that uses the same recursive structure as the closest-pair
implementation given in the text to find the convex hull of a set of points.
SOURCES for Geometric Algorithms
Much of the material described in this section has actually been developed
quite recently, so there are many fewer available references than for older,
more central areas such as sorting or mathematical algorithms. Many of the
problems and solutions that we've discussed were presented by M. Shamos in
1975. Shamos' manuscript treats a large number of geometric algorithms, and
has stimulated much of the recent research.
For the most part, each of the geometric algorithms that we've discussed
is described in its own original reference. The convex hull algorithms treated
in Chapter 25 may be found in the papers by Jarvis, Graham, and Eddy. The
range searching methods of Chapter 26 come from Bentley and Friedman's
survey article, which contains many references to original sources (of particular
interest is Bentley's own original article on kD trees, written while he was an
undergraduate). The treatment of the closest point problems in Chapter 28 is
based on Shamos and Hoey's 1975 paper, and the intersection algorithms of
Chapter 27 are from their 1976 paper and the article by Bentley and Ottmann.
But the best route for someone interested in learning more about geometric
algorithms is to implement some, work with them and try to learn about
their behavior on different types of point sets. This field is still in its infancy
and the best algorithms are yet to be discovered.
J. L. Bentley, "Multidimensional binary search trees used for associative
searching," Communications of the ACM, 18, 9 (September, 1975).
J. L. Bentley and J.H. Friedman, "Data structures for range searching,"
Computing Surveys, 11, 4 (December, 1979).
J. L. Bentley and T. Ottmann, "Algorithms for reporting and counting geometric
intersections," IEEE Transactions on Computing, C-28, 9 (September,
1979).
W. F. Eddy, "A new convex hull algorithm for planar sets," ACM Transactions
on Mathematical Software, 3 (1977).
R. L. Graham, "An efficient algorithm for determining the convex hull of a
finite planar set," Information Processing Letters, 1 (1972).
R. A. Jarvis, "On the identification of the convex hull of a finite set of points
in the plane," Information Processing Letters, 2 (1973).
M. I. Shamos, Problems in Computational Geometry, unpublished manuscript,
1975.
M. I. Shamos and D. Hoey, "Closest-point problems," in 16th Annual Symposium
on Foundations of Computer Science, IEEE, 1975.
M. I. Shamos and D. Hoey, "Geometric intersection problems," in 17th Annual
Symposium on Foundations of Computer Science, IEEE, 1976.
GRAPH ALGORITHMS
29. Elementary Graph Algorithms
A great many problems are naturally formulated in terms of objects
and connections between them. For example, given an airline route
map of the eastern U. S., we might be interested in questions like: "What's
the fastest way to get from Providence to Princeton?" Or we might be more
interested in money than in time, and look for the cheapest way to get from
Providence to Princeton. To answer such questions we need only information
about interconnections (airline routes) between objects (towns).
Electric circuits are another obvious example where interconnections between
objects play a central role. Circuit elements like transistors, resistors,
and capacitors are intricately wired together. Such circuits can be represented
and processed within a computer in order to answer simple questions like "Is
everything connected together?" as well as complicated questions like "If
this circuit is built, will it work?" In this case, the answer to the first question
depends only on the properties of the interconnections (wires), while the
answer to the second question requires detailed information about both the
wires and the objects that they connect.
A third example is "job scheduling," where the objects are tasks to be
performed, say in a manufacturing process, and interconnections indicate
which jobs should be done before others. Here we might be interested in
answering questions like "When should each task be performed?"
A graph is a mathematical object which accurately models such situations.
In this chapter, we'll examine some basic properties of graphs, and in the next
several chapters we'll study a variety of algorithms for answering questions of
the type posed above.
Actually, we've already encountered graphs in several instances in previous
chapters. Linked data structures are actually representations of graphs,
and some of the algorithms that we'll see for processing graphs are similar to
algorithms that we've already seen for processing trees and other structures.
For example, the finite-state machines of Chapters 19 and 20 are represented
with graph structures.
Graph theory is a major branch of combinatorial mathematics and has
been intensively studied for hundreds of years. Many important and useful
properties of graphs have been proved, but many difficult problems have yet to
be resolved. We'll be able here only to scratch the surface of what is known
about graphs, covering enough to be able to understand the fundamental
algorithms.
As with so many of the problem domains that we've studied, graphs
have only recently begun to be examined from an algorithmic point of view.
Although some of the fundamental algorithms are quite old, many of the
interesting ones have been discovered within the last ten years. Even trivial
graph algorithms lead to interesting computer programs, and the nontrivial
algorithms that we'll examine are among the most elegant and interesting
(though difficult to understand) algorithms known.
Glossary
A good deal of nomenclature is associated with graphs. Most of the terms
have straightforward definitions, and it is convenient to put them in one place
even though we won't be using some of them until later.
A graph is a collection of vertices and edges. Vertices are simple objects
which can have names and other properties; an edge is a connection between
two vertices. One can draw a graph by marking points for the vertices and
drawing lines connecting them for the edges, but it must be borne in mind
that the graph is defined independently of the representation. For example,
the following two drawings represent the same graph:
We define this graph by saying that it consists of the set of vertices A B C D
E F G H I J K L M and the set of edges between these vertices AG AB AC
LM JM JL JK ED FD HI FE AF GE.
For some applications, such as the airline route example above, it might
not make sense to rearrange the placement of the vertices as in the diagrams
above. But for some other applications, such as the electric circuit application
above, it is best to concentrate only on the edges and vertices, independent
of any particular geometric placement. And for still other applications, such
as the finite-state machines in Chapters 19 and 20, no particular geometric
placement of nodes is ever implied. The relationship between graph algorithms
and geometric problems is discussed in further detail in Chapter 31. For now,
we'll concentrate on "pure" graph algorithms which process simple collections
of edges and nodes.
A path from vertex x to y in a graph is a list of vertices in which successive
vertices are connected by edges in the graph. For example, BAFEG is a path
from B to G in the graph above. A graph is connected if there is a path
from every node to every other node in the graph. Intuitively, if the vertices
were physical objects and the edges were strings connecting them, a connected
graph would stay in one piece if picked up by any vertex. A graph which is
not connected is made up of connected components; for example, the graph
drawn above has three connected components. A simple path is a path in
which no vertex is repeated. (For example, BAFEGAC is not a simple path.)
A cycle is a path which is simple except that the first and last vertex are the
same (a path from a point back to itself): the path AFEGA is a cycle.
A graph with no cycles is called a tree. There is only one path between
any two nodes in a tree. (Note that binary trees and other types of trees that
we've built with algorithms are special cases included in this general definition
of trees.) A group of disconnected trees is called a forest. A spanning tree of a
graph is a subgraph that contains all the vertices but only enough of the edges
to form a tree. For example, below is a spanning tree for the large component
of our sample graph.
Note that if we add any edge to a tree, it must form a cycle (because
there is already a path between the two vertices that it connects). Also, it is
easy to prove by induction that a tree on V vertices has exactly V - 1 edges.
If a graph with V vertices has less than V - 1 edges, it can't be connected.
If it has more than V - 1 edges, it must have a cycle. (But if it has exactly
V - 1 edges, it need not be a tree.)
We'll denote the number of vertices in a given graph by V, the number
of edges by E. Note that E can range anywhere from 0 to V(V - 1)/2. Graphs
with all edges present are called complete graphs; graphs with relatively few
edges (say less than V log V) are called sparse; graphs with relatively few of
the possible edges missing are called dense.
This fundamental dependence on two parameters makes the comparative
study of graph algorithms somewhat more complicated than many algorithms
that we've studied, because more possibilities arise. For example, one algorithm
might take about V^2 steps, while another algorithm for the same problem
might take (E + V) log E steps. The second algorithm would be better for
sparse graphs, but the first would be preferred for dense graphs.
Graphs as defined to this point are called undirected graphs, the simplest
type of graph. We'll also be considering more complicated types of graphs, in
which more information is associated with the nodes and edges. In weighted
graphs integers (weights) are assigned to each edge to represent, say, distances
or costs. In directed graphs, edges are "one-way": an edge may go from x to
y but not the other way. Directed weighted graphs are sometimes called networks.
As we'll discover, the extra information weighted and directed graphs
contain makes them somewhat more difficult to manipulate than simple undirected
graphs.
Representation
In order to process graphs with a computer program, we first need to decide
how to represent them within the computer. We'll look at two commonly used
representations; the choice between them depends on whether the graph is dense
or sparse.
The first step in representing a graph is to map the vertex names to
integers between 1 and V. The main reason for doing this is to make it
possible to quickly access information corresponding to each vertex, using
array indexing. Any standard searching scheme can be used for this purpose:
for instance, we can translate vertex names to integers between 1 and V
by maintaining a hash table or a binary tree which can be searched to find
the integer corresponding to any given vertex name. Since we have already
studied these techniques, we'll assume that we have available a function index
to convert from vertex names to integers between 1 and V and a function name
to convert from integers to vertex names. In order to make the algorithms easy
to follow, our examples will use one-letter vertex names, with the ith letter
of the alphabet corresponding to the integer i. Thus, though name and index
are trivial to implement for our examples, their use makes it easy to extend
the algorithms to handle graphs with real vertex names using techniques from
Chapters 14-17.
The most straightforward representation for graphs is the so-called adjacency
matrix representation. A V-by-V array of boolean values is maintained,
with a[x, y] set to true if there is an edge from vertex x to vertex y
and false otherwise. The adjacency matrix for our example graph is given
below.
     A B C D E F G H I J K L M
  A  1 1 1 0 0 1 1 0 0 0 0 0 0
  B  1 1 0 0 0 0 0 0 0 0 0 0 0
  C  1 0 1 0 0 0 0 0 0 0 0 0 0
  D  0 0 0 1 1 1 0 0 0 0 0 0 0
  E  0 0 0 1 1 1 1 0 0 0 0 0 0
  F  1 0 0 1 1 1 0 0 0 0 0 0 0
  G  1 0 0 0 1 0 1 0 0 0 0 0 0
  H  0 0 0 0 0 0 0 1 1 0 0 0 0
  I  0 0 0 0 0 0 0 1 1 0 0 0 0
  J  0 0 0 0 0 0 0 0 0 1 1 1 1
  K  0 0 0 0 0 0 0 0 0 1 1 0 0
  L  0 0 0 0 0 0 0 0 0 1 0 1 1
  M  0 0 0 0 0 0 0 0 0 1 0 1 1
Notice that each edge is really represented by two bits: an edge connecting
x and y is represented by true values in both a[x, y] and a[y, x]. While it
is possible to save space by storing only half of this symmetric matrix, it
is inconvenient to do so in Pascal and the algorithms are somewhat simpler
with the full matrix. Also, it's sometimes convenient to assume that there's
an "edge" from each vertex to itself, so a[x, x] is set to 1 for x from 1 to V.
A graph is defined by a set of nodes and a set of edges connecting them.
To read in a graph, we need to settle on a format for reading in these sets.
The obvious format to use is first to read in the vertex names and then read
in pairs of vertex names (which define edges). As mentioned above, one easy
way to proceed is to read the vertex names into a hash table or binary search
tree and to assign to each vertex name an integer for use in accessing vertex-indexed
arrays like the adjacency matrix. The ith vertex read can be assigned
the integer i. (Also, as mentioned above, we'll assume for simplicity in our
examples that the vertices are the first V letters of the alphabet, so that we
can read in graphs by reading V and E, then E pairs of letters from the first
V letters of the alphabet.) Of course, the order in which the edges appear is
not relevant. All orderings of the edges represent the same graph and result
in the same adjacency matrix, as computed by the following program:
program adjmatrix(input, output);
  const maxV=50;
  var j, x, y, V, E: integer;
      a: array[1..maxV, 1..maxV] of boolean;
  begin
  readln(V, E);
  for x:=1 to V do
    for y:=1 to V do a[x, y]:=false;
  for x:=1 to V do a[x, x]:=true;
  for j:=1 to E do
    begin
    readln(v1, v2);
    x:=index(v1); y:=index(v2);
    a[x, y]:=true; a[y, x]:=true
    end;
  end.
The types of v1 and v2 are omitted from this program, as well as the code for
index. These can be added in a straightforward manner, depending on the
graph input representation desired. (For our examples, v1 and v2 could be of
type char and index a simple function which uses the Pascal ord function.)
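For instance, with one-letter vertex names read into char variables, index and
name might be as simple as the following sketch (not from the text, but
consistent with the conventions described above):

    function index(c: char): integer;
      begin index:=ord(c)-ord('A')+1 end;

    function name(k: integer): char;
      begin name:=chr(k-1+ord('A')) end;
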
The adjacency matrix representation is satisfactory only if the graphs
to be processed are dense: the matrix requires V^2 bits of storage and V^2
steps just to initialize it. If the number of edges (the number of one bits
in the matrix) is proportional to V^2, then this may be no problem because
about V^2 steps are required to read in the edges in any case, but if the graph
is sparse, just initializing this matrix could be the dominant factor in the
running time of an algorithm. Even so, this might be the best representation for
some algorithms which require more than V^2 steps for execution. Next we'll
look at a representation which is more suitable for graphs which are not dense.
In the adjacency structure representation all the vertices connected to
each vertex are listed on an adjacency list for that vertex. This can be easily
accomplished with linked lists, as shown in the program below which builds
the adjacency structure for our sample graph.
program adjlist(input, output);
  const maxV=1000;
  type link=^node;
       node=record v: integer; next: link end;
  var j, x, y, V, E: integer;
      t, z: link;
      adj: array[1..maxV] of link;
  begin
  readln(V, E);
  new(z); z^.next:=z;
  for j:=1 to V do adj[j]:=z;
  for j:=1 to E do
    begin
    readln(v1, v2);
    x:=index(v1); y:=index(v2);
    new(t); t^.v:=x; t^.next:=adj[y]; adj[y]:=t;
    new(t); t^.v:=y; t^.next:=adj[x]; adj[x]:=t
    end;
  end.
(As usual, each linked list ends with a link to an artificial node z, which
links to itself.) For this representation, the order in which the edges appear
in the input is quite relevant: it (along with the list insertion method used)
determines the order in which the vertices appear on the adjacency lists. Thus,
the same graph can be represented in many different ways in an adjacency
list structure. Indeed, it is difficult to predict what the adjacency lists will
look like by examining just the sequence of edges, because each edge involves
insertions into two adjacency lists.
The order in which edges appear on the adjacency list affects, in turn, the
order in which edges are processed by algorithms. That is, the adjacency list
structure determines the way that various algorithms that we'll be examining
"see" the graph. While an algorithm should produce a correct answer no
matter what the order of the edges on the adjacency lists, it might get to that
answer by quite different sequences of computations for different orders. And
if there is more than one "correct answer," different input orders might lead
to different output results.
If the edges appear in the order listed after the first drawing of our sample
graph at the beginning of the chapter, the program above builds the following
adjacency list structure:
A: F C B G
B: A
C: A
D: F E
E: G F D
F: A E D
G: E A
H: I
I: H
J: K L M
K: J
L: J M
M: J L
Note that again each edge is represented twice: an edge connecting x and
y is represented as a node containing x on y's adjacency list and as a node
containing y on x's adjacency list. It is important to include both, since
otherwise simple questions like "Which nodes are connected directly to node
x?" could not be answered efficiently.
Some simple operations are not supported by this representation. For
example, one might want to delete a vertex, x, and all the edges connected to
it. It's not sufficient to delete nodes from the adjacency list: each node on the
adjacency list specifies another vertex whose adjacency list must be searched
for a node corresponding to x to be deleted. This problem can be corrected by
linking together the two list nodes which correspond to a particular edge and
making the adjacency lists doubly linked. Then if an edge is to be removed,
both list nodes corresponding to that edge can be deleted quickly. Of course,
all these extra links are quite cumbersome to process, and they certainly
shouldn't be included unless operations like deletion are needed.
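If deletion were needed, one way to set up such a structure is sketched below
(the field names are ours, not from the text): each list node carries a link to
its "twin," the node for the same edge on the other vertex's list, and the lists
are doubly linked so that a node can be unlinked without a search.

    type link=^node;
         node=record
                v: integer;
                next, prev: link;  (* doubly linked adjacency list *)
                twin: link         (* node for the same edge on the other list *)
              end;
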
Such considerations also make it plain why we don't use a "direct"
representation for graphs: a data structure which exactly models the graph,
with vertices represented as allocated records and edge lists containing links
to vertices instead of vertex names. How would one add an edge to a graph
represented in this way?
Directed and weighted graphs are represented with similar structures. For
directed graphs, everything is the same, except that each edge is represented
just once: an edge from x to y is represented by a true value in a [x, y] in
the adjacency matrix or by the appearance of y on x's adjacency list in the
adjacency structure. Thus an undirected graph might be thought of as a
directed graph with directed edges going both ways between each pair of
vertices connected by an edge. For weighted graphs, everything again is the
same except that we fill the adjacency matrix with weights instead of boolean
values (using some non-existent weight to represent false), or we include a
field for the edge weight in adjacency list records in the adjacency structure.
It is often necessary to associate other information with the vertices
or nodes of a graph to allow it to model more complicated objects or to
save bookkeeping information in complicated algorithms. Extra information
associated with each vertex can be accommodated by using auxiliary arrays
indexed by vertex number (or by making adj an array of records in the
adjacency structure representation). Extra information associated with each
edge can be put in the adjacency list nodes (or in an array a of records in
the adjacency matrix representation), or in auxiliary arrays indexed by edge
number (this requires numbering the edges).
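For example, for a weighted graph the adjacency-list node used by adjlist might
simply gain a weight field, along the lines of this sketch (the field name w is
ours):

    type link=^node;
         node=record v, w: integer; next: link end;

with the weight read along with each edge and stored in both of the two list
nodes created for that edge.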
Depth-First Search
At the beginning of this chapter, we saw several natural questions that arise
immediately when processing a graph. Is the graph connected? If not, what
are its connected components? Does the graph have a cycle? These and many
other problems can be easily solved with a technique called depth-first search,
which is a natural way to "visit" every node and check every edge in the graph
systematically. We'll see in the chapters that follow that simple variations
on a generalization of this method can be used to solve a variety of graph
problems.
For now, we'll concentrate on the mechanics of examining every piece
of the graph in an organized way. Below is an implementation of depth-first
search which fills in an array val[1..V] as it visits every vertex of the graph.
The array is initially set to all zeros, so val[k]=0 indicates that vertex k has
not yet been visited. The goal is to systematically visit all the vertices of the
graph, setting the val entry for the now-th vertex visited to now, for now =
1,2,..., V. The program uses a recursive procedure visit which visits all the
vertices in the same connected component as the vertex given in the argument.
To visit a vertex, we check all its edges to see if they lead to vertices which
haven't yet been visited (as indicated by zero val entries); if so, we visit them:
procedure dfs;
  var now, k: integer;
      val: array[1..maxV] of integer;
  procedure visit(k: integer);
    var t: link;
    begin
    now:=now+1; val[k]:=now;
    t:=adj[k];
    while t<>z do
      begin
      if val[t^.v]=0 then visit(t^.v);
      t:=t^.next
      end
    end;
  begin
  now:=0;
  for k:=1 to V do val[k]:=0;
  for k:=1 to V do
    if val[k]=0 then visit(k)
  end;
First visit is called for the first vertex, which results in nonzero val values
being set for all the vertices connected to that vertex. Then dfs scans through
the val array to find a zero entry (corresponding to a vertex that hasn't been
seen yet) and calls visit for that vertex, continuing in this way until all vertices
have been visited.
The best way to follow the operation of depth-first search is to redraw
the graph as indicated by the recursive calls during the visit procedure. This
gives the following structure.
Vertices in this structure are numbered with their val values: the vertices are
actually visited in the order A F E G D C B H I J K L M. Each connected
component leads to a tree, called the depth-first search tree. It is important
to note that this forest of depth-first search trees is simply another way of
drawing the graph; all vertices and edges of the graph are examined by the
algorithm.
Solid lines in the diagram indicate that the lower vertex was found by the
algorithm to be on the edge list of the upper vertex and had not been visited
at that time, so that a recursive call was made. Dotted lines correspond to
edges to vertices which had already been visited, so the if test in visit failed,
and the edge was not "followed" with a recursive call. These comments apply
to the first time each edge is encountered; the if test in visit also guards
against following the edge the second time that it is encountered. For example,
once we've gone from A to F (on encountering F in A's adjacency list), we
don't want to go back from F to A (on encountering A in F's adjacency list).
Similarly, dotted links are actually checked twice: even though we checked
that A was already visited while at G (on encountering A in G's adjacency
list), we'll check that G was already visited later on when we're back at A (on
encountering G in A's adjacency list).
A crucial property of these depth-first search trees for undirected graphs
is that the dotted links always go from a node to some ancestor in the tree
(another node in the same tree, that is higher up on the path to the root).
At any point during the execution of the algorithm, the vertices divide into
three classes: those for which visit has finished, those for which visit has only
partially finished, and those which haven't been seen at all. By definition of
visit, we won't encounter an edge pointing to any vertex in the first class,
and if we encounter an edge to a vertex in the third class, a recursive call
will be made (so the edge will be solid in the depth-first search tree). The
only vertices remaining are those in the second class, which are precisely the
vertices on the path from the current vertex to the root in the same tree, and
any edge to any of them will correspond to a dotted link in the depth-first
search tree.
The running time of dfs is clearly proportional to V + E for any graph.
We set each of the V val values (hence the V term), and we examine each
edge twice (hence the E term).
The same method can be applied to graphs represented with adjacency
matrices by using the following visit procedure:
procedure visit(k: integer);
  var t: integer;
  begin
  now:=now+1; val[k]:=now;
  for t:=1 to V do
    if a[k, t] then
      if val[t]=0 then visit(t)
  end;
Traveling through an adjacency list translates to scanning through a row in
the adjacency matrix, looking for true values (which correspond to edges). As
before, any edge to a vertex which hasn't been seen before is "followed" via
a recursive call. Now, the edges connected to each vertex are examined in a
different order, so we get a different depth-first search forest:
This underscores the point that the depth-first search forest is simply
another representation of the graph whose particular structure depends both
on the search algorithm and the internal representation used. The running
time of dfs when this visit procedure is used is proportional to V^2, since every
bit in the adjacency matrix is checked.
Now, testing if a graph has a cycle is a trivial modification of the above
program. A graph has a cycle if and only if a nonzero val entry is discovered
in visit. That is, if we encounter an edge pointing to a vertex that we've
already visited, then we have a cycle. Equivalently, all the dotted links in the
depth-first search trees belong to cycles.
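One way to make this concrete is sketched below (our own variant, not the
program from the text): since each edge appears on two adjacency lists, the
edge by which we arrived at a vertex should not by itself count as a cycle, so
the father is passed along as a parameter and a global boolean cycle (assumed
initialized to false in dfs) is set when any other visited vertex is seen.

    procedure visit(k, father: integer);
      var t: link;
      begin
      now:=now+1; val[k]:=now;
      t:=adj[k];
      while t<>z do
        begin
        if val[t^.v]=0 then visit(t^.v, k)
        else if t^.v<>father then cycle:=true;  (* a dotted link: part of a cycle *)
        t:=t^.next
        end
      end;

(In dfs, the initial call for each connected component would then be visit(k, 0).)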
Similarly, depth-first search finds the connected components of a graph.
Each nonrecursive call to visit corresponds to a different connected component.
An easy way to print out the connected components is to have visit print out
the vertex being visited (say, by inserting write(name(k)) just before exiting),
then print out some indication that a new connected component is to start
just before the call to visit, in dfs (say, by inserting two writeln statements).
This technique would produce the following output when dfs is used on the
adjacency list representation of our sample graph:
G D E F C B A
I H
K M L J
Note that the adjacency matrix version of visit will compute the same connected
components (of course), but that the vertices will be printed out in a
different order.
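Concretely, the two insertions described above amount to something like the
following sketch (we write a blank after each name so that the output reads as
shown):

    procedure visit(k: integer);
      var t: link;
      begin
      now:=now+1; val[k]:=now;
      t:=adj[k];
      while t<>z do
        begin
        if val[t^.v]=0 then visit(t^.v);
        t:=t^.next
        end;
      write(name(k), ' ')              (* print the vertex just before exiting *)
      end;

and, in dfs,

    for k:=1 to V do
      if val[k]=0 then
        begin writeln; writeln; visit(k) end
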
Extensions to do more complicated processing on the connected components
are straightforward. For example, by simply inserting inval[now]:=k
after val[k]:=now we get the "inverse" of the val array, whose now-th entry
is the index of the now-th vertex visited. (This is similar to the inverse heap
that we studied at the end of Chapter 11, though it serves a quite different
purpose.) Vertices in the same connected component are contiguous in this
array, with the index of each new connected component given by the value of now
each time visit is called in dfs. These values could be stored, or used to mark
delimiters in inval (for example, the first entry in each connected component
could be made negative). The following table would be produced for our
example if the adjacency list version of dfs were modified in this way:
     k   name(k)   val[k]   inval[k]
     1      A         1        -1
     2      B         7         6
     3      C         6         5
     4      D         5         7
     5      E         3         4
     6      F         2         3
     7      G         4         2
     8      H         8        -8
     9      I         9         9
    10      J        10       -10
    11      K        11        11
    12      L        12        12
    13      M        13        13
With such techniques, a graph can be divided up into its connected components
for later processing by more sophisticated algorithms.
Mazes
This systematic way of examining every vertex and edge of a graph has a
distinguished history: depth-first search was first stated formally hundreds
of years ago as a method for traversing mazes. For example, at left in the
diagram below is a popular maze, and at right is the graph constructed by
putting a vertex at each point where there is more than one path to take,
then connecting the vertices according to the paths:
This is significantly more complicated than early English garden mazes,
which were constructed as paths through tall hedges. In these mazes, all
walls were connected to the outer walls, so that gentlemen and ladies could
stroll in and clever ones could find their way out by simply keeping their
right hand on the wall (laboratory mice have reportedly learned this trick).
When independent inside walls can occur, it is necessary to resort to a more
sophisticated strategy to get around in a maze, which leads to depth-first
search.
To use depth-first search to get from one place to another in a maze, we
use visit, starting at the vertex on the graph corresponding to our starting
point. Each time visit "follows" an edge via a recursive call, we walk along
the corresponding path in the maze. The trick in getting around is that we
must walk back along the path that we used to enter each vertex when visit
finishes for that vertex. This puts us back at the vertex one step higher up
in the depth-first search tree, ready to follow its next edge.
The maze graph given above is an interesting "medium-sized" graph
which the reader might be amused to use as input for some of the algorithms in
later chapters. To fully capture the correspondence with the maze, a weighted
version of the graph should be used, with weights on edges corresponding to
distances (in the maze) between vertices.
Perspective
In the chapters that follow we'll consider a variety of graph algorithms largely
aimed at determining connectivity properties of both undirected and directed
graphs. These algorithms are fundamental ones for processing graphs, but
are only an introduction to the subject of graph algorithms. Many interesting
and useful algorithms have been developed which are beyond the scope of
this book, and many interesting problems have been studied for which good
algorithms have not yet been found.
Some very efficient algorithms have been developed which are much too
complicated to present here. For example, it is possible to determine efficiently
whether or not a graph can be drawn on the plane without any intersecting
lines. This problem is called the planarity problem, and no efficient algorithm
for solving it was known until 1974, when R. E. Tarjan developed an ingenious
(but quite intricate) algorithm for solving the problem in linear time, using
depth-first search.
Some graph problems which arise naturally and are easy to state seem
to be quite difficult, and no good algorithms are known to solve them. For
example, no efficient algorithm is known for finding the minimum-cost tour
which visits each vertex in a weighted graph. This problem, called the
traveling salesman problem, belongs to a large class of difficult problems that
we'll discuss in more detail in Chapter 40. Most experts believe that no
efficient algorithms exist for these problems.
Other graph problems may well have efficient algorithms, though none has
been found. An example of this is the graph isomorphism problem: determine
whether two graphs could be made identical by renaming vertices. Efficient
algorithms are known for this problem for many special types of graphs, but
the general problem remains open.
In short, there is a wide spectrum of problems and algorithms for dealing
with graphs. We certainly can't expect to solve every problem which comes
along, because even some problems which appear to be simple are still baffling
the experts. But many problems which are relatively easy to solve do arise
quite often, and the graph algorithms that we will study serve well in a great
variety of applications.
Exercises
1. Which undirected graph representation is most appropriate for determining
quickly whether a vertex is isolated (is connected to no other vertices)?
2. Suppose depth-first search is used on a binary search tree and the right
edge taken before the left out of each node. In what order are the nodes
visited?
3. How many bits of storage are required to represent the adjacency matrix
for an undirected graph with V nodes and E edges, and how many are
required for the adjacency list representation?
4. Draw a graph which cannot be written down on a piece of paper without
two edges crossing.
5. Write a program to delete an edge from a graph represented with adjacency
lists.
6. Write a version of adjlist that keeps the adjacency lists in sorted order of
vertex index. Discuss the merits of this approach.
7. Draw the depth-first search forests that result for the example in the text
when dfs scans the vertices in reverse order (from V down to 1), for both
representations.
8. Exactly how many times is visit called in the depth-first search of an
undirected graph, in terms of the number of vertices V, the number of
edges E, and the number of connected components C?
9. Find the shortest path which connects all the vertices in the maze graph
example, assuming each edge to be of length 1.
10. Write a program to generate a "random" graph of V vertices and E edges
as follows: for each pair of integers i < j between 1 and V, include an
edge from i to j if and only if randomint(V*(V-1) div 2) is less than
E. Experiment to determine about how many connected components are
created for V = E = 10, 100, and 1000.
30. Connectivity
The fundamental depth-first search procedure in the previous chapter
finds the connected components of a given graph; in this section we'll
examine related algorithms and problems concerning other graph connectivity
properties.
As a first example of a non-trivial graph algorithm we'll look at a generalization
of connectivity called biconnectivity. Here we are interested in knowing
if there is more than one way to get from one vertex to another in the graph.
A graph is biconnected if and only if there are at least two different paths
connecting each pair of vertices. Thus even if one vertex and all the edges
touching it are removed, the graph is still connected. If it is important that
a graph be connected for some application, it might also be important that
it stay connected. We'll look at a method for testing whether a graph is
biconnected using depth-first search.
Depth-first search is certainly not the only way to traverse the nodes of
a graph. Other strategies are appropriate for other problems. In particular,
we'll look at breadth-first search, a method appropriate for finding the shortest
path from a given vertex to any other vertex. This method turns out to differ
from depth-first search only in the data structure used to save unfinished paths
during the search. This leads to a generalized graph traversal program that
encompasses not just depth-first and breadth-first search, but also classical
algorithms for finding the minimum spanning tree and shortest paths in the
graph, as we'll see in Chapter 31.
One particular version of the connectivity problem which arises frequently
involves a dynamic situation where edges are added to the graph one by one,
interspersed with queries as to whether or not two particular vertices belong
to the same connected component. We'll look at an interesting family of
algorithms for this problem. The problem is sometimes called the "union-find"
problem, a nomenclature which comes from the application of the algorithms
to processing simple operations on sets of elements.
Biconnectivity
It is sometimes reasonable to design more than one route between points on a
graph, so as to handle possible failures at the connection points (vertices). For
example, we can fly from Providence to Princeton even if New York is snowed
in by going through Philadelphia instead. Or the main communications lines
in an integrated circuit might be biconnected, so that the rest of the circuit
still can function if one component fails. Another application, which is not
particularly realistic but which illustrates the concept, is to imagine a wartime
situation where we can make it so that an enemy must bomb at least two
stations in order to cut our rail lines.
An articulation point in a connected graph is a vertex which, if deleted,
would break the graph into two or more pieces. A graph with no articulation
points is said to be biconnected. In a biconnected graph, there are two distinct
paths connecting each pair of vertices. If a graph is not biconnected, it
divides into biconnected components, sets of nodes mutually accessible via two
distinct paths. For example, consider the following undirected graph, which
is connected but not biconnected:
(This graph is obtained from the graph of the previous chapter by adding
the edges GC, GH, JG, and LG. In our examples, we'll assume that these
four edges are added in the order given at the end of the input, so that
(for example) the adjacency lists are similar to those in the example of the
previous chapter with eight new entries added to the lists to reflect the four
new edges.) The articulation points of this graph are A (because it connects
B to the rest of the graph), H (because it connects I to the rest of the graph),
J (because it connects K to the rest of the graph), and G (because the graph
would fall into three pieces if G were deleted). There are six biconnected
components: ACGDEF, GJLM, and the individual nodes B, H, I, and K.
Determining the articulation points turns out to be a simple extension
of depth-first search. To see this, consider the depth-first search tree for this
graph (adjacency list representation):
Deleting node E will not disconnect the graph because G and D both have
dotted links that point above E, giving alternate paths from them to F (E's
father in the tree). On the other hand, deleting G will disconnect the graph
because there are no such alternate paths from L or H to E (G's father).
A vertex x is not an articulation point if every son y has some node
lower in the tree connected (via a dotted link) to a node higher in the tree
than x, thus providing an alternate connection from x to y. This test doesn't
quite work for the root of the depth-first search tree, since there are no nodes
"higher in the tree." The root is an articulation point if it has two or more
sons, since the only path connecting sons of the root goes through the root.
These tests are easily incorporated into depth-first search by changing the
node-visit procedure into a function which returns the highest point in the
tree (lowest val value) seen during the search, as follows:
function visit(k: integer): integer;
  var t: link;
      m, min: integer;
  begin
  now:=now+1; val[k]:=now; min:=now;
  t:=adj[k];
  while t<>z do
    begin
    if val[t^.v]=0 then
      begin
      m:=visit(t^.v);
      if m<min then min:=m;
      if m>=val[k] then write(name(k))
      end
    else if val[t^.v]<min then min:=val[t^.v];
    t:=t^.next
    end;
  visit:=min
  end;
This procedure recursively determines the highest point in the tree reachable
(via a dotted link) from any descendant of vertex k and uses this information
to determine if k is an articulation point. Normally this calculation
simply involves testing whether the minimum value reachable from a son is
higher up in the tree, but we need an extra test to determine whether k is the
root of a depth-first search tree (or, equivalently, whether this is the first call
to visit for the connected component containing k), since we're using the same
recursive program for both cases. This test is properly performed outside the
recursive visit, so it does not appear in the code above.
The program above simply prints out the articulation points. Of course,
as before, it is easily extended to do additional processing on the articulation
points and biconnected components. Also, since it is a depth-first search
procedure, the running time is proportional to V + E.
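For completeness, here is a sketch of one way the root test might be handled
outside visit (our own arrangement, not the program from the text): the outer
loop counts the sons of each depth-first search tree root directly and reports
the root if there are two or more.

    procedure articulationpoints;
      var k, m, sons: integer;
          t: link;
      begin
      now:=0;
      for k:=1 to V do val[k]:=0;
      for k:=1 to V do
        if val[k]=0 then
          begin
          now:=now+1; val[k]:=now; sons:=0;   (* k is the root of a new tree *)
          t:=adj[k];
          while t<>z do
            begin
            if val[t^.v]=0 then
              begin m:=visit(t^.v); sons:=sons+1 end;  (* m is not needed for the root *)
            t:=t^.next
            end;
          if sons>1 then write(name(k))       (* root with two or more sons *)
          end
      end;
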
Besides the "reliability" sort of application mentioned above, biconnectedness
can be helpful in decomposing large graphs into manageable pieces. It is
obvious that a very large graph may be processed one connected component
at a time for many applications; it is somewhat less obvious but sometimes as
useful that a graph can sometimes be processed one biconnected component
at a time.
Graph Traversal Algorithms
Depth-first search is a member of a family of graph traversal algorithms that
are quite natural when viewed nonrecursively. Any one of these methods can
be used to solve the simple connectivity problems posed in the last chapter.
In this section, we'll see how one program can be used to implement graph
traversal methods with quite different characteristics, merely by changing the
value of one variable. This method will be used to solve several graph problems
in the chapters which follow.
Consider the analogy of traveling through a maze. Any time we face a
choice of several vertices to visit, we can only go along a path to one of them,
so we must "save" the others to visit later. One way to implement a program
based on this idea is to remove the recursion from the recursive depth-first
algorithm given in the previous chapter. This would result in a program that
saves, on a stack, the point in the adjacency list of the vertex being visited
at which the search should resume after all vertices connected to previous
vertices on the adjacency list have been visited. Instead of examining this
implementation in more detail, we'll look at a more general framework which
encompasses several algorithms.
We begin by thinking of the vertices as being divided into three classes:
tree (or visited) vertices, those connected together by paths that we've traversed;
fringe vertices, those adjacent to tree vertices but not yet visited; and
unseen vertices, those that haven't been encountered at all yet. To search a
connected component of a graph systematically (implement the visit procedure
of the previous chapter), we begin with one vertex on the fringe, all others
unseen, and perform the following step until all vertices have been visited:
"move one vertex (call it X) from the fringe to the tree, and put any unseen
vertices adjacent to x on the fringe." Graph traversal methods differ in how
it is decided which vertex should be moved from the fringe to the tree.
For depth-first search, we always want to choose the vertex from the
fringe that was most recently encountered. This can be implemented by
always moving the first vertex on the fringe to the tree, then putting the
unseen vertices adjacent to that vertex (x) at the front of the fringe and
moving vertices adjacent to x which happen to be already on the fringe to the
front. (Note carefully that a completely different traversal method results if
we leave untouched the vertices adjacent to x which are already on the fringe.)
For example, consider the undirected graph given at the beginning of
this chapter. The following table shows the contents of the fringe each time
a vertex is moved to the tree; the corresponding search tree is shown at the
right:
   A
   G B C F
   E C H J L B F
   D F C H J L B
   F C H J L B
   C H J L B
   H J L B
   I J L B
   J L B
   M L K B
   L K B
   K B
   B
In this algorithm, the fringe essentially operates as a pushdown stack: we
remove a vertex (call it x) from the beginning of the fringe, then go through
x's edge list, adding unseen vertices to the beginning of the fringe, and moving
fringe vertices to the beginning. Note that this is not strictly a stack, since
we use the additional operation of moving a vertex to the beginning. The
algorithm can be efficiently implemented by maintaining the fringe as a linked
list and keeping pointers into this list in an array indexed by vertex number:
we'll omit this implementation in favor of a program that can implement other
traversal strategies as well.
This gives a different depth-first search tree than the one drawn in the
biconnectivity section above because x's edges are put on the stack one at a
time, so their order is reversed when the stack is popped. The same tree as
for recursive depth-first search would result if we were to append all of x's
edge list on the front of the stack, then go through and delete from the stack
any other occurrences of vertices from x's edge list. The reader might find it
interesting to compare this process with the result of directly removing the
recursion from the visit procedure of the previous chapter.
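For reference, such a direct removal of the recursion might look like the
following sketch (our own, assuming pushdown stack routines push, pop and
stackempty operating on links; any standard stack implementation will do). It
visits the vertices in the same order as the recursive version, because the
position in the current adjacency list is saved and resumed later, rather than
the whole list being placed on the stack at once.

    procedure visit(k: integer);
      var t: link;
      begin
      push(adj[k]); now:=now+1; val[k]:=now;
      repeat
        t:=pop;
        while t<>z do
          if val[t^.v]=0 then
            begin
            now:=now+1; val[t^.v]:=now;
            push(t^.next);       (* remember where to resume this list *)
            t:=adj[t^.v]         (* descend to the newly visited vertex *)
            end
          else t:=t^.next
      until stackempty
      end;
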
By itself, the fringe table does not give enough information to reconstruct
the search tree. In order to actually build the tree, it is necessary to store,
with each node in the fringe table, the name of its father in the tree. This
is available when the node is entered into the table (it's the node that caused
the entry), it is changed whenever the node moves to the front of the fringe,
and it is needed when the node is removed from the table (to determine where
in the tree to attach it).
A second classic traversal method derives from maintaining the fringe
as a queue: always pick the least recently encountered vertex. This can be
maintained by putting the unseen vertices adjacent to x at the end of the fringe
in the general strategy above. This method is called breadth-first search: first
we visit a node, then all the nodes adjacent to it, then all the nodes adjacent
to those nodes, etc. This leads to the following fringe table and search tree
for our example graph:
   A
   F C B G
   C B G E D
   B G E D
   G E D
   E D L J H
   D L J H
   L J H
   J H M
   H M K
   M K I
   K I
   I
We remove a vertex (call it x) from the beginning of the fringe, then go
through x's edge list, putting unseen vertices at the end of the fringe. Again,
an efficient implementation is available using a linked list representation for
the fringe, but we'll omit this in favor of a more general method.
A fundamental feature of this general graph traversal strategy is that the
fringe is actually operating as a priority queue: the vertices can be assigned
priorities with the property that the "highest priority" vertex is the one moved
from the fringe to the tree. That is, we can directly use the priority queue
routines from Chapter 11 to implement a general graph searching program.
For the applications in the next chapter, it is convenient to assign the highest
priority to the lowest value, so we assume that the inequalities are switched
in the programs from Chapter 11. The following program is a "priority first
search" routine for a graph represented with adjacency lists (so it is most
appropriate for sparse graphs).
procedure sparsepfs;
  var now, k: integer;
      t: link;
  begin
  now:=0;
  for k:=1 to V do
    begin val[k]:=unseen; dad[k]:=0 end;
  pqconstruct;
  repeat
    k:=pqremove;
    if val[k]=unseen then
      begin val[k]:=0; now:=now+1 end;
    t:=adj[k];
    while t<>z do
      begin
      if val[t^.v]=unseen then now:=now+1;
      if onpq(t^.v) and (val[t^.v]>priority) then
        begin pqchange(t^.v, priority); dad[t^.v]:=k end;
      t:=t^.next
      end
    until pqempty
  end;
(The functions onpq and pqempty are priority queue utility routines which
are easily implemented additions to the set of programs given in Chapter 11:
pqempty returns true if the priority queue is empty; onpq returns true if the
given vertex is currently on the priority queue.) Below and in Chapter 31, we'll
see how the substitution of various expressions for priority in this program
yields several classical graph traversal algorithms. Specifically, the program
operates as follows: first, we give all the vertices the sentinel value unseen
(which could be maxint) and initialize the dad array, which is used to store
the search tree. Next we construct an indirect priority queue containing all
the vertices (this construction is trivial because the values are initially all the
same). In the terminology above, tree vertices are those which are not on
the priority queue, unseen vertices are those on the priority queue with value
unseen, and fringe vertices are the others on the priority queue. With these
conventions understood, the operation of the program is straightforward: it
repeatedly removes the highest priority vertex from the queue and puts it on
the tree, then updates the priorities of all fringe or unseen vertices connected
to that vertex.
If all vertices on the priority queue are unseen, then no vertex previously
encountered is connected to any vertex on the queue: that is, we're entering
a new connected component. This is automatically handled by the priority
queue mechanism, so there is no need for a separate visit procedure inside a
main procedure. But note that maintaining the proper value of now is more
complicated than for the recursive depth-first search program of the previous
chapter. The convention of this program is to leave the val entry unseen and
zero for the root of the depth-first search tree for each connected component:
it might be more convenient to set it to zero or some other value (for example,
now) for various applications.
Now, recall that now increases from 1 to V during the execution of the
algorithm so it can be used to assign unique priorities to the vertices. If we
change the two occurrences of priority in sparsepfs to V-now, we get depth-first
search, because newly encountered nodes have the highest priority. If
we use now for priority we get breadth-first search, because old nodes have
the highest priority. These priority assignments make the priority queues
operate like stacks and queues as described above. (Of course, if we were only
interested in using depth-first or breadth-first search, we would use a direct
implementation for stacks or queues, not priority queues as in sparsepfs.)
In the next chapter, we'll see that other priority assignments lead to other
classical graph algorithms.
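For instance, the depth-first variant corresponds to literally substituting V-now
for priority in both places in sparsepfs, as in this fragment (a sketch of the
substitution the text describes):

    if onpq(t^.v) and (val[t^.v]>V-now) then
      begin pqchange(t^.v, V-now); dad[t^.v]:=k end;

and using now instead of V-now in the same two places gives breadth-first search.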
The running time for graph traversal when implemented in this way
depends on the method of implementing the priority queue. In general, we
have to do a priority queue operation for each edge and for each vertex, so the
worst case running time should be proportional to (E + V) log V if the priority
queue is implemented as a heap as indicated. However, we've already noted
that for both depth-first and breadth-first search we can take advantage of
the fact that each new priority is the highest or the lowest so far encountered
to get a running time proportional to E + V. Also, other priority queue
implementations might sometimes be appropriate: for example if the graph is
dense then we might as well simply keep the priority queue as an unordered
array. This gives a worst case running time proportional to E + V^2 (or just
V^2), since each edge simply requires setting or resetting a priority, but each
vertex now requires searching through the whole queue to find the highest
priority vertex. An implementation which works in this way is given in the
next chapter.
The difference between depth-first and breadth-first search is quite evident
when a large graph is considered. The diagram at left below shows the
edges and nodes visited when depth-first search is halfway through the maze
graph of the previous chapter starting at the upper left corner; the diagram
at right is the corresponding picture for breadth-first search:
Tree nodes are blackened in these diagrams, fringe nodes are crossed, and
unseen nodes are blank. Depth-first search "explores" the graph by looking
for new vertices far away from the start point, taking closer vertices only when
dead ends are encountered; breadth-first search completely covers the area
close to the starting point, moving farther away only when everything close
has been looked at. Depth-first search is appropriate for one person looking
for something in a maze because the "next place to look" is always close by;
breadth-first search is more like a group of people looking for something by
fanning out in all directions.
Beyond these operational differences, it is interesting to reflect on the
fundamental differences in the implementations of these methods. Depth-first
search is very simply expressed recursively (because its underlying data structure
is a stack), and breadth-first search admits to a very simple nonrecursive
implementation (because its underlying data structure is a queue). But we've
seen that the true underlying data structure for graph algorithms is a priority
queue, and this admits a wealth of interesting properties to consider. Again,
we'll see more examples in the next chapter.
Union-Find Algorithms
In some applications we wish to know simply whether a vertex x is connected
to a vertex y in a graph; the actual path connecting them may not be
relevant. This problem has been carefully studied in recent years; some
efficient algorithms have been developed which are of independent interest
because they can also be used for processing sets (collections of objects).
Graphs correspond to sets of objects in a natural way, with vertices
corresponding to objects and edges having the meaning "is in the same set as."
Thus, the sample graph in the previous chapter corresponds to the sets {A B
C D E F G}, {H I} and {J K L M}. Each connected component corresponds
to a different set. For sets, we're interested in the fundamental question "is
x in the same set as y?" This clearly corresponds to the fundamental graph
question "is vertex x connected to vertex y?"
Given a set of edges, we can build an adjacency list representation of the
corresponding graph and use depth-first search to assign to each vertex the
index of its connected component, so the questions of the form "is x connected
to y?" can be answered with just two array accesses and a comparison. The
extra twist in the methods that we consider here is that they are dynamic: they
can accept new edges arbitrarily intermixed with questions and answer the
questions correctly using the information received. From the correspondence
with the set problem, the addition of a new edge is called a union operation,
and the queries are called find operations.
Our objective is to write a function which can check if two vertices x and
y are in the same set (or, in the graph representation, the same connected
component) and, if not, can put them in the same set (put an edge between
them in the graph). Instead of building a direct adjacency list or other
representation of the graph, we'll gain efficiency by using an internal structure
specifically oriented towards supporting the union and find operations. The
internal structure that we will use is a forest of trees, one for each connected
component. We need to be able to find out if two vertices belong to the same
tree and to be able to combine two trees to make one. It turns out that both
of these operations can be implemented efficiently.
To illustrate the way this algorithm works, we'll look at the forest constructed
when the edges from the sample graph that we've been using in
this chapter are processed in some arbitrary order. Initially, all nodes are in
separate trees. Then the edge AG causes a two node tree to be formed, with
A at the root. (This choice is arbitrary; we could equally well have put G at
the root.) The edges AB and AC add B and C to this tree in the same way,
leaving
The edges LM, JM, JL, and JK build a tree containing J, K, L, and M that
has a slightly different structure (note that JL doesn't contribute anything,
since LM and JM put L and J in the same component), and the edges ED,
FD, and HI build two more trees, leaving the forest:
This forest indicates that the edges processed to this point describe a graph
with four connected components, or, equivalently, that the set union operations
processed to this point have led to four sets {A B C G}, {J K L M}, {D E
F} and {H I}. Now the edge FE doesn't contribute anything to the structure,
since F and E are in the same component, but the edge AF combines the
first two trees; then GC doesn't contribute anything, but GH and JG result
in everything being combined into one tree:
It must be emphasized that, unlike depth-first search trees, the only
relationship between these union-find trees and the underlying graph with
the given edges is that they divide the vertices into sets in the same way. For
example, there is no correspondence between the paths that connect nodes in
the trees and the paths that connect nodes in the graph.
We know that we can represent these trees in exactly the same way that
we represented graph search trees: we keep an array of integers dad[1..V]
which contains, for each vertex, the index of its father (with a 0 entry for
nodes which are at the root of a tree). To find the father of a vertex j, we
simply set j:=dad[j], and to find the root of the tree to which j belongs, we
repeat this operation until reaching 0. The union and find operations are then
very easily implemented:
function find(x, y: integer; union: boolean): boolean;
  var i, j: integer;
  begin
  i:=x; while dad[i]>0 do i:=dad[i];
  j:=y; while dad[j]>0 do j:=dad[j];
  if union and (i<>j) then dad[j]:=i;
  find:=(i<>j)
  end;
This function returns true if the two given vertices are in the same component.
In addition, if they are not in the same component and the union flag is set,
they are put into the same component. The method used is simple: Use the
dad array to get to the root of the tree containing each vertex, then check to
see if they are the same. To merge the tree rooted at j with the tree rooted
at i, we simply set dad[j]:=i, as shown in the following table:
        A  B  C  D  E  F  G  H  I  J  K  L  M
  AG:                     A
  AB:      A              A
  AC:      A  A           A
  LM:      A  A           A                 L
  JM:      A  A           A              J  L
  JL:      A  A           A              J  L   *
  JK:      A  A           A           J  J  L
  ED:      A  A  E        A           J  J  L
  FD:      A  A  E  F     A           J  J  L
  HI:      A  A  E  F     A     H     J  J  L
  FE:      A  A  E  F     A     H     J  J  L   *
  AF:      A  A  E  F  A  A     H     J  J  L
  GE:      A  A  E  F  A  A     H     J  J  L   *
  GC:      A  A  E  F  A  A     H     J  J  L   *
  GH:      A  A  E  F  A  A  A  H     J  J  L
  JG:   J  A  A  E  F  A  A  A  H     J  J  L
  LG:   J  A  A  E  F  A  A  A  H     J  J  L   *
An asterisk at the right indicates that the vertices are already in the same
component at the time the edge is processed. As usual, we are assuming
that we have available functions index and name to translate between vertex
names and integers between 1 and V; each table entry is the name of
the corresponding dad array entry. Also, for example, the function call
find(index(x), index(y), false) would be used to test whether a vertex named
x is in the same component as a vertex named y (without introducing an edge
between them).
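As a usage sketch (ours, assuming v1 and v2 of type char, the dad array
initialized to zero, and index and name as above), a loop that reads vertex-name
pairs and reports on each edge might read:

    while not eof do
      begin
      readln(v1, v2);
      if find(index(v1), index(v2), true)
        then writeln(v1, ' ', v2, ': already connected')
        else writeln(v1, ' ', v2, ': newly connected')
      end

The true third argument means that an edge is added (a union is performed)
whenever the two vertices are found to be in different components.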
The algorithm described above has bad worst-case performance because
the trees formed could be degenerate. For example, taking the edges AB BC
CD DE EF FG GH HI IJ . . . YZ in that order will produce a long chain
with Z pointing to Y, Y pointing to X, etc. This kind of structure takes time
proportional to V2 to build, and has time proportional to V for an average
equivalence test.
Several methods have been suggested to deal with this problem. One
natural method, which may have already occurred to the reader, is to try to
do the "right" thing when merging two trees rather than arbitrarily setting
dad[j]:=i. When a tree rooted at i is to be merged with a tree rooted at j, one
of the nodes must remain a root and the other (and all its descendants) must
go one level down in the tree. To minimize the distance to the root for the
most nodes, it makes sense to take as the root the node with more descendants.
This idea, called weight balancing, is easily implemented by maintaining the
size of each tree (number of descendants of the root) in the dad array entry
for each root node, encoded as a nonpositive number so that the root node
can be detected when traveling up the tree in find.
Ideally, we would like every node to point directly to the root of its tree.
No matter what strategy we use, achieving this ideal would require examining
at least all the nodes in the smaller of the two trees to be merged, and this
could be quite a lot compared to the relatively few nodes on the path to the
root that find usually examines. But we can approach the ideal by making
all the nodes that we do examine point to the root! This seems like a drastic
step at first blush, but it is relatively easy to do, and it must be remembered
that there is nothing sacrosanct about the structure of these trees: if they
can be modified to make the algorithm more efficient, we should do so. This
method, called path compression, is easily implemented by making another
pass through each tree after the root has been found, and setting the dad
entry of each vertex encountered along the way to point to the root.
The combination of weight balancing and path compression ensures that
the algorithms will run very quickly. The following implementation shows
that the extra code involved is a small price to pay to guard against degenerate
cases.
function fastfind(x, y: integer; union: boolean): boolean;
  var i, j, t: integer;
  begin
  i:=x; while dad[i]>0 do i:=dad[i];
  j:=y; while dad[j]>0 do j:=dad[j];
  while dad[x]>0 do
    begin t:=x; x:=dad[x]; dad[t]:=i end;
  while dad[y]>0 do
    begin t:=y; y:=dad[y]; dad[t]:=j end;
  if union and (i<>j) then
    if dad[j]<dad[i]
      then begin dad[j]:=dad[j]+dad[i]-1; dad[i]:=j end
      else begin dad[i]:=dad[i]+dad[j]-1; dad[j]:=i end;
  fastfind:=(i=j)
  end;
The dad array is assumed to be initialized to 0. (We'll assume in later chapters
that this is done in a separate procedure findinit.) For our sample set of edges,
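A minimal sketch of such an initialization procedure (assuming the global declarations of dad and V used throughout this chapter; this is an illustration, not one of the book's listings) might be:

    procedure findinit;
      var k: integer;
      begin
      for k:=1 to V do dad[k]:=0
      end;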
this program builds the following trees from the first ten edges:
The trees containing DEF and JKLM are flatter than before because of
path compression. Now FE doesn't contribute anything, as before, then AF
combines the first two trees (with A at the root, since its tree is larger), then GE
and GC don't contribute anything, then GH puts the HI tree below A, then JG
puts the JKLM tree below A, resulting in the following tree:
The following table gives the contents of the dad array as this forest is
constructed:
        A   B C D E F G H I J K L M
  AG:   1   0 0 0 0 0 A 0 0 0 0 0 0
  AB:   2   A 0 0 0 0 A 0 0 0 0 0 0
  AC:   3   A A 0 0 0 A 0 0 0 0 0 0
  LM:   3   A A 0 0 0 A 0 0 0 0 1 L
  JM:   3   A A 0 0 0 A 0 0 L 0 2 L
  JL:   3   A A 0 0 0 A 0 0 L 0 2 L  *
  JK:   3   A A 0 0 0 A 0 0 L L 3 L
  ED:   3   A A E 1 0 A 0 0 L L 3 L
  FD:   3   A A E 2 E A 0 0 L L 3 L
  HI:   3   A A E 2 E A 1 H L L 3 L
  FE:   3   A A E 2 E A 1 H L L 3 L  *
  AF:   6   A A E A E A 1 H L L 3 L
  GE:   6   A A E A E A 1 H L L 3 L  *
  GC:   6   A A E A E A 1 H L L 3 L  *
  GH:   8   A A E A E A A H L L 3 L
  JG:  12   A A E A E A A H L L A L
  LG:  12   A A E A E A A H L L A L  *
For clarity in this table, each positive entry i is replaced by the ith letter of the
alphabet (the name of the father), and each negative entry is complemented
to give a positive integer (the weight of the tree).
Several other techniques have been developed to avoid degenerate structures.
For example, path compression has the disadvantage that it requires
another pass up through the tree. Another technique, called halving, is to
make each node point to its granddad on the way up the tree. Still another
technique, splitting, is like halving, but is applied only to every other node
on the search path. Either of these can be used in combination with weight
balancing or with height balancing, which is similar but uses tree height instead
of tree size to decide which way to merge trees.
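For example, the root-finding loops in find could do halving with a change along these lines (a sketch under the conventions above, not one of the book's listings; the test on dad[dad[i]] keeps root entries, which may hold nonpositive weights, from being copied into the path):

    i:=x;
    while dad[i]>0 do
      begin
      if dad[dad[i]]>0 then dad[i]:=dad[dad[i]];   { point i to its granddad }
      i:=dad[i]
      end;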
How is one to choose from among all these methods? And exactly how
"flat" are the trees produced? Analysis for this problem is quite difficult
because the performance depends not only on the V and E parameters, but
also on the number of find operations and, what's worse, on the order in which
the union and find operations appear. Unlike sorting, where the actual files
that appear in practice are quite often close to "random," it's hard to see
how to model graphs and request patterns that might appear in practice. For
this reason, algorithms which do well in the worst case are normally preferred
for union-find (and other graph algorithms), though this may be an overly
conservative approach.
Even if only the worst case is being considered, the analysis of union-find
algorithms is extremely complex and intricate. This can be seen even from the
nature of the results, which do give us clear indications of how the algorithms
will perform in a practical situation. If either weight balancing or height
balancing is used in combination with either path compression, halving, or
splitting, then the total number of operations required to build up a structure
with E edges is proportional to Eα(E), where α(E) is a function that is so
slowly growing that α(E) < 4 unless E is so large that taking lg E, then
taking lg of the result, then taking lg of that result, and continuing 16 times
still gives a number bigger than 1. This is a stunningly large number; for all
practical purposes, it is safe to assume that the average amount of time to
execute each union and find operation is constant. This result is due to R. E.
Tarjan, who further showed that no algorithm for this problem (from a certain
general class) can do better than Eα(E), so that this function is intrinsic to
the problem.
An important practical application of union-find algorithms is that they
can be used to determine whether a graph with V vertices and E edges is
connected in space proportional to V (and almost linear time). This is an
advantage over depth-first search in some situations: here we don't need to
ever store the edges. Thus connectivity for a graph with thousands of vertices
and millions of edges can be determined with one quick pass through the
edges.
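For instance, a sketch of such a pass (assuming findinit, fastfind and index as above, with c, d: char and j, components: integer; again, this is an illustration rather than one of the book's programs) is:

    findinit; components:=V;
    for j:=1 to E do
      begin
      readln(c, d);
      if not fastfind(index(c), index(d), true) then
        components:=components-1     { a union was performed }
      end;
    if components=1 then writeln('connected') else writeln('not connected');

Only the dad array (space proportional to V) is kept; each edge is read, used, and discarded.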
Exercises
1. Give the articulation points and the biconnected components of the graph
formed by deleting GJ and adding IK to our sample graph.
2. Write a program to print out the biconnected components of a graph.
3. Give adjacency lists for one graph where breadth-first search would find a
cycle before depth first search would, and another graph where depth-first
search would find the cycle first.
4. Draw the search tree that results if, in the depth-first search we ignore
nodes already on the fringe (as in breadth-first search).
5. Draw the search tree that results if, in the breadth-first search we change
the priority of nodes already on the fringe (as in depth-first search).
6. Draw the union-find forest constructed for the example in the text, but
assuming that find is changed to set dad[i]:=j rather than dad[j]:=i.
7. Solve the previous problem, assuming further that path compression is
used.
8. Draw the union-find forests constructed for the edges AB BC CD DE EF
. . . YZ, assuming first that weight balancing without path compression is
used, then that path compression without weight balancing is used, then
that both are used.
9. Implement the union-find variants described in the text, and empirically
determine their comparative performance for 1000 union operations with
both arguments random integers between 1 and 100.
10. Write a program to generate a random connected graph on V vertices
by generating random pairs of integers between 1 and V. Estimate how
many edges are needed to produce a connected graph as a function of V.
31. Weighted Graphs
It is often necessary to model practical problems using graphs in which
weights or costs are associated with each edge. In an airline map where
edges represent flight routes, these weights might represent distances or fares.
In an electric circuit where edges represent wires, the length or cost of the wire
are natural weights to use. In a job-scheduling chart, weights could represent
time or cost of performing tasks or of waiting for tasks to be performed.
Questions entailing minimizing costs naturally arise for such situations.
In this chapter, we'll examine algorithms for two such problems in detail: "find
the lowest-cost way to connect all of the points," and "find the lowest-cost
path between two given points." The first, which is obviously useful for graphs
representing something like an electric circuit, is called the minimum spanning
tree problem; the second, which is obviously useful for graphs representing
something like an airline route map, is called the shortest path problem. These
problems are representative of a variety of problems that arise on weighted
graphs.
Our algorithms involve searching through the graph, and sometimes our
intuition is supported by thinking of the weights as distances: we speak of "the
closest vertex to x," etc. In fact, this bias is built into the nomenclature for
the shortest path problem. Despite this, it is important to remember that the
weights need not be proportional to any distance at all; they might represent
time or cost or something else entirely different. When the weights actually
do represent distances, other algorithms may be appropriate. This issue is
discussed in further detail at the end of the chapter.
A typical weighted undirected graph is diagramed below, with edges
comprising a minimum spanning tree drawn with double lines. Note that
the shortest paths in the graph do not necessarily use edges of the minimum
spanning tree: for example, the shortest path from vertex A to vertex G is
AFEG.
It is obvious how to represent weighted graphs: in the adjacency matrix
representation, the matrix can contain edge weights rather than boolean
values, and in the adjacency structure representation, each list element (which
represents an edge) can contain a weight.
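For instance, the declarations might be extended along the following lines (a sketch only; node, adj and a follow the conventions of Chapter 29, with maxV as before):

    type link=^node;
         node=record v, weight: integer; next: link end;  { list element: edge to v, with its weight }
    var adj: array[1..maxV] of link;                       { adjacency structure }
        a: array[1..maxV, 1..maxV] of integer;             { adjacency matrix: weight, with 0 for "no edge" }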
We'll start by assuming that all of the weights are positive. Some of
the algorithms can be adapted to handle negative weights, but they become
significantly more complicated. In other cases, negative weights change the
nature of the problem in an essential way, and require far more sophisticated
algorithms than those considered here. For an example of the type of difficulty
that can arise, suppose that we have a situation where the sum of the weights
of the edges around a cycle is negative: an infinitely short path could be
generated by simply spinning around the cycle.
Minimum Spanning Tree
A minimum spanning tree of a weighted graph is a collection of edges that
connects all the vertices such that the sum of the weights of the edges is at
least as small as the sum of the weights of any other collection of edges that
connects all the vertices. The minimum spanning tree need not be unique: for
example, the following diagram shows three other minimum spanning trees
for our sample graph.
It's easy to prove that the "collection of edges" referred to in the definition
above must form a spanning tree: if there's any cycle, some edge in the cycle
can be deleted to give a collection of edges which still connects the vertices
but has a smaller weight.
We've seen in previous chapters that many graph traversal procedures
compute a spanning tree for the graph. How can we arrange things for a
weighted graph so that the tree computed is the one with the lowest total
weight? The answer is simple: always visit next the vertex which can be
connected to the tree using the edge of lowest weight. The following sequence
of diagrams illustrates the sequence in which the edges are visited when this
strategy is used for our example graph.
The implementation of this strategy is a trivial application of the priority
graph search procedure in the previous chapter: we simply add a weight field
to the edge record (and modify the input code to read in weights as well),
then use t^.weight for priority in that program. Thus we always visit next the
vertex in the fringe which is closest to the tree. The traversal is diagramed as
above for comparison with a completely different method that we'll examine
below; we can also redraw the graph in our standard search tree format:
This method is based on the following fundamental property of minimum
spanning trees: "Given any division of the vertices of a graph into two sets,
the minimum spanning tree contains the shortest of the edges connecting a
vertex in one of the sets to a vertex in the other set." For example if we
divide the vertices into the sets ABCD and EFG in our sample graph, this
says that DF must be in any minimum spanning tree. This is easy to prove by
contradiction. Call the shortest edge connecting the two sets s, and assume
that s is not in the minimum spanning tree. Then consider the graph formed
by adding s to the purported minimum spanning tree. This graph has a
cycle; furthermore, that cycle must have some other edge besides s connecting
the two sets. Deleting this edge and adding s gives a shorter spanning tree,
contradicting the assumption that s is not in the minimum spanning tree.
When we use priority-first searching, the two sets of nodes in question are
the visited nodes and the unvisited ones. At each step, we pick the shortest
edge from a visited node to a fringe node (there are no edges from visited
nodes to unseen nodes). By the property above every edge that is picked is
on the minimum spanning tree.
As described in the previous chapter, the priority graph traversal algorithm
has a worst-case running time proportional to (E + V)logV, though
a different implementation of the priority queue can give a V2 algorithm,
which is appropriate for dense graphs. Later in this chapter, we'll examine
this implementation of the priority graph traversal for dense graphs in full
detail. For minimum spanning trees, this reduces to a method discovered by
R. Prim in 1956 (and independently by E. Dijkstra soon thereafter). Though
the methods are the same in essence (just the graph representation and implementation
of priority queues differ), we'll refer to the sparsepfs program of
the previous chapter with priority replaced by t^.weight as the "priority-first
search solution" to the minimum spanning tree problem and the adjacency
matrix version given later in this chapter (for dense graphs) as "Prim's
algorithm." Note that Prim's algorithm takes time proportional to V2 even
for sparse graphs (a factor of about V2/(E log V) slower than the priority-first
search solution), and that the priority-first search solution is a factor of log V
slower than Prim's algorithm for dense graphs.
A completely different approach to finding the minimum spanning tree is
to simply add edges one at a time, at each step using the shortest edge that
does not form a cycle. This algorithm gradually builds up the tree one edge at
a time from disconnected components, as illustrated in the following sequence
of diagrams for our sample graph:
The correctness of this algorithm also follows from the general property of
minimum spanning trees that is proved above.
The code for this method can be pieced together from programs that
we've already seen. A priority queue is obviously the data structure to use to
consider the edges in order of their weight, and the job of testing for cycles can
be obviously done with union-find structures. The appropriate data structure
to use for the graph is simply an array edge with one entry for each edge. The
indirect priority queue procedures pqconstruct and pqremove from Chapter 11
can be used to maintain the priority queue, using the weight fields in the edge
array for priorities. Also, the program uses the findinit and fastfind procedures
from Chapter 30. The program simply prints out the edges which comprise
the spanning tree; with slightly more work a dad array or other representation
could be computed:
program kruskal(input, output);
  const maxV=50; maxE=2500;
  type edge=record x, y, weight: integer end;
  var i, j, m, x, y, V, E: integer;
      c, d: char;
      edges: array [0..maxE] of edge;
  begin
  readln(V, E);
  for j:=1 to E do
    begin
    readln(c, d, edges[j].weight);
    edges[j].x:=index(c);
    edges[j].y:=index(d);
    end;
  findinit; pqconstruct; i:=0;
  repeat
    m:=pqremove; x:=edges[m].x; y:=edges[m].y;
    if not fastfind(x, y, true) then
      begin
      writeln(name(x), name(y), edges[m].weight);
      i:=i+1
      end
  until i=V-1;
  end.
The running time of this program is dominated by the time spent processing
edges in the priority queue. Suppose that the graph consists of two clusters of
vertices all connected together by very short edges, and only one edge which
is very long connecting the two clusters. Then the longest edge in the graph is
in the minimum spanning tree, but it will be the last edge out of the priority
queue. This shows that the running time could be proportional to ElogE in
the worst case, although we might expect it to be much smaller for typical
graphs (though it always takes time proportional to E to build the priority
queue initially).
An alternate implementation of the same strategy is to sort the edges by
weight initially, then simply process them in order. Also, the cycle testing
can be done in time proportional to Elog E with a much simpler strategy
than union-find, to give a minimum spanning tree algorithm that always takes
E log E steps. This method was proposed by J. Kruskal in 1956, even earlier
than Prim's algorithm. We'll refer to the modernized version above, which
uses priority queues and union-find structures, as "Kruskal's algorithm."
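As a concrete illustration of the sorted-edge variant, the main loop might look as follows (a sketch, not one of the book's listings; the edges array is assumed to have been sorted by weight beforehand, and fastfind is used for the cycle test rather than the simpler strategy alluded to above):

    i:=0; j:=0;
    repeat
      j:=j+1;
      if not fastfind(edges[j].x, edges[j].y, true) then
        begin
        writeln(name(edges[j].x), name(edges[j].y), edges[j].weight);
        i:=i+1
        end
    until i=V-1;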
The performance characteristics of these three methods indicate that the
priority-first search method will be faster for some graphs, Prim's for some
others, Kruskal's for still others. As mentioned above, the worst case for the
priority-first search method is (E + V)logV while the worst case for Prim's
is V2 and the worst case for Kruskal's is Elog E. But it is unwise to choose
between the algorithms on the basis of these formulas because "worst-case"
graphs are unlikely to occur in practice. In fact, the priority-first search
method and Kruskal's method are both likely to run in time proportional to
E for graphs that arise in practice: the first because most edges do not really
require a priority queue adjustment that takes log V steps and the second
because the longest edge in the minimum spanning tree is probably sufficiently
short that not many edges are taken off the priority queue. Of course, Prim's
method also runs in time proportional to about E for dense graphs (but it
shouldn't be used for sparse graphs).
Shortest Path
The shortest path problem is to find the path in a weighted graph connecting
two given vertices x and y with the property that the sum of the weights of
all the edges is minimized over all such paths.
If the weights are all 1, then the problem is still interesting: it is to find
the path containing the minimum number of edges which connects x and y.
Moreover, we've already considered an algorithm which solves the problem:
breadth-first search. It is easy to prove by induction that breadth-first search
starting at x will first visit all vertices which can be reached from x with 1
edge, then all vertices which can be reached from x with 2 edges, etc., visiting
all vertices which can be reached with k edges before encountering any that
require k + 1 edges. Thus, when y is first encountered, the shortest path from
x has been found (because no shorter paths reached y).
In general, the path from x to y could touch all the vertices, so we usually
consider the problem of finding the shortest paths connecting a given vertex
x with each of the other vertices in the graph. Again, it turns out that the
problem is simple to solve with the priority graph traversal algorithm of the
previous chapter.
If we draw the shortest path from x to each other vertex in the graph,
then we clearly get no cycles, and we have a spanning tree. Each vertex leads
to a different spanning tree; for example, the following three diagrams show
the shortest path spanning trees for vertices A, B, and E in the example graph
that we've been using.
The priority-first search solution to this problem is very similar to the
solution for the minimum spanning tree: we build the tree for vertex x by
adding, at each step, the vertex on the fringe which is closest to x (before,
we added the one closest to the tree). To find which fringe vertex is closest
to x, we use the val array: for each tree vertex k, val[k] will be the distance
from that vertex to x, using the shortest path (which must be comprised of
tree nodes). When k is added to the tree, we update the fringe by going
through k's adjacency list. For each node t on the list, the shortest distance
to x through k from t^.v is val[k]+t^.weight. Thus, the algorithm is trivially
implemented by using this quantity for priority in the priority graph traversal
program. The following sequence of diagrams shows the construction of the
shortest path search tree for vertex A in our example.
First we visit the closest vertex to A, which is B. Then both C and F are
distance 2 from A, so we visit them next (in whatever order the priority queue
returns them). Then D can be attached at F or at B to get a path of distance
3 to A. (The algorithm attaches it to B because it was put on the tree before
F, so D was already on the fringe when F was put on the tree and F didn't
provide a shorter path to A.) Finally, E and G are visited. As usual, the tree
is represented by the dad array of father links. The following table shows the
array computed by the priority graph traversal procedure for our example:
          A  B  C  D  E  F  G
  dad:       A  B  B  F  A  E
  val:    0  1  2  3  4  2  5
Thus the shortest path from A to G has a total weight of 5 (found in the val
entry for G) and goes from A to F to E to G (found by tracing backwards
in the dad array, starting at G). Note that the correct operation of this
program depends on the val entry for the root being zero, the convention that
we adopted for sparsepfs.
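For instance, the path itself could be recovered with a fragment like the following (a sketch, assuming dad and val have been filled in as above, with a 0 dad entry at the root):

    procedure printpath(k: integer);   { print the distance to k and the path back to the root }
      begin
      write(val[k], ': ');
      while dad[k]<>0 do
        begin write(name(k)); k:=dad[k] end;
      writeln(name(k))
      end;

For vertex G in the example this would print the distance 5 followed by G E F A, the shortest path traced backwards from G to A.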
As before, the priority graph traversal algorithm has a worst-case running
time proportional to (E + V) log V, though a different implementation of the
priority queue can give a V2 algorithm, which is appropriate for dense graphs.
Below, we'll examine this implementation of priority graph traversal for dense
graphs in full detail. For the shortest path problem, this reduces to a method
discovered by E. Dijkstra in 1956. Though the methods are the same in
essence, we'll refer to the sparsepfs program of the previous chapter with
priority replaced by val[k]+t^.weight as the "priority-first search solution" to
the shortest paths problem and the adjacency matrix version given below as
"Dijkstra's algorithm."
Dense Graphs
As we've discussed, when a graph is represented with an adjacency matrix, it is
best to use an unordered array representation for the priority queue in order
to achieve a V2 running time for any priority graph traversal algorithm. That
is, this provides a linear algorithm for the priority first search (and thus the
minimum spanning tree and shortest path problems) for dense graphs.
Specifically, we maintain the priority queue in the val array just as in
sparsepfs but we implement the priority queue operations directly rather than
using heaps. First, we adopt the convention that the priority values in the
val array will be negated, so that the sign of a val entry tells whether the
corresponding vertex is on the tree or the priority queue. To change the
priority of a vertex, we simply assign the new priority to the val entry for that
vertex. To remove the highest priority vertex, we simply scan through the
val array to find the vertex with the largest negative (closest to 0) val value
(then complement its val entry). After making these mechanical changes to
the sparsepfs program of the previous chapter, we are left with the following
compact program.
procedure densepfs;
  var k, min, t: integer;
  begin
  for k:=1 to V do
    begin val[k]:=-unseen; dad[k]:=0 end;
  val[0]:=-(unseen+1);
  min:=1;
  repeat
    k:=min; val[k]:=-val[k]; min:=0;
    if val[k]=unseen then val[k]:=0;
    for t:=1 to V do
      if val[t]<0 then
        begin
        if (a[k, t]<>0) and (val[t]<-priority) then
          begin val[t]:=-priority; dad[t]:=k end;
        if val[t]>val[min] then min:=t;
        end
  until min=0;
  end;
Note that the loop to update the priorities and the loop to find the minimum
are combined: each time we remove a vertex from the fringe, we pass through
all the vertices, updating their priority if necessary, and keeping track of the
minimum value found. (Also, note that unseen must be slightly less than
maxint since a value one higher is used as a sentinel to find the minimum,
and the negative of this value must be representable.)
If we use a[k, t] for priority in this program, we get Prim's algorithm
for finding the minimum spanning tree; if we use vaJ[k]+a[k, t] for priority
we get Dijkstra's algorithm for the shortest path problem. As in Chapter
30, if we include the code to maintain now as the number of vertices so far
searched and use V-now for priority, we get depth-first search; if we use now
we get breadth-first search. This program differs from the sparsepfs program
of Chapter 30 only in the graph representation used (adjacency matrix instead
of adjacency list) and the priority queue implementation (unordered array
instead of indirect heap). These changes yield a worst-case running time
proportional to V2, as opposed to (E + V)logV for sparsepfs. That is, the
running time is linear for dense graphs (when E is proportional to V2), but
sparsepfs is likely to be much faster for sparse graphs.
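For example, with the shortest path priority substituted in, the fringe-update test inside densepfs reads roughly as follows (a sketch of the substitution only, not a separate listing; recall that a[k, t]=0 means "no edge" and that fringe val entries are negated):

    if (a[k, t]<>0) and (val[t]<-(val[k]+a[k, t])) then
      begin val[t]:=-(val[k]+a[k, t]); dad[t]:=k end;

For Prim's algorithm, val[k]+a[k, t] is simply replaced by a[k, t] in both places.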
Geometric Problems
Suppose that we are given N points in the plane and we want to find the
shortest set of lines connecting all the points. This is a geometric problem,
called the Euclidean minimum spanning tree problem. It can be solved using
the graph algorithm given above, but it seems clear that the geometry
provides enough extra structure to allow much more efficient algorithms to be
developed.
The way to solve the Euclidean problem using the algorithm given above
is to build a complete graph with N vertices and N(N - 1)/2 edges, one
edge connecting each pair of vertices weighted with the distance between the
corresponding points. Then the minimum spanning tree can be found with
the algorithm above for dense graphs in time proportional to N2.
It has been proven that it is possible to do better. The point is that the
geometric structure makes most of the edges in the complete graph irrelevant
to the problem, and we can eliminate most of the edges before even starting
to construct the minimum spanning tree. In fact, it has been proven that
the minimum spanning tree is a subset of the graph derived by taking only
the edges from the dual of the Voronoi diagram (see Chapter 28). We know
that this graph has a number of edges proportional to N, and both Kruskal's
algorithm and the priority-first search method work efficiently on such sparse
graphs. In principle, then, we could compute the Voronoi dual (which takes
time proportional to Nlog N), then run either Kruskal's algorithm or the
priority-first search method to get a Euclidean minimum spanning tree algorithm
which runs in time proportional to N log N. But writing a program
to compute the Voronoi dual is quite a challenge even for an experienced
programmer.
Another approach which can be used for random point sets is to take
advantage of the distribution of the points to limit the number of edges
included in the graph, as in the grid method used in Chapter 26 for range
searching. If we divide up the plane into squares such that each square
is likely to contain about 5 points, and then include in the graph only the
edges connecting each point to the points in the neighboring squares, then we
are very likely (though not guaranteed) to get all the edges in the minimum
spanning tree, which would mean that Kruskal's algorithm or the priority-first
search method would efficiently finish the job.
It is interesting to reflect on the relationship between graph and geometric
algorithms brought out by the problem posed in the previous paragraphs. It
is certainly true that many problems can be formulated either as geometric
problems or as graph problems. If the actual physical placement of objects
is a dominating characteristic, then the geometric algorithms of the previous
section may be appropriate, but if interconnections between objects are of
fundamental importance, then the graph algorithms of this section may be
better. The Euclidean minimum spanning tree seems to fall at the interface
between these two approaches (the input involves geometry and the output
involves interconnections) and the development of simple, straightforward
methods for this and related problems remains an important though elusive
goal.
Another example of the interaction between geometric and graph algorithms
is the problem of finding the shortest path from x to y in a graph
whose vertices are points in the plane and whose edges are lines connecting
the points. For example, the maze graph at the end of Chapter 29 might be
viewed as such a graph. The solution to this problem is simple: use priority
first searching, setting the priority of each fringe vertex encountered to the
distance in the tree from x to the fringe vertex (as in the algorithm given) plus
the Euclidean distance from the fringe vertex to y. Then we stop when y is
added to the tree. This method will very quickly find the shortest path from
x to y by always going towards y, while the standard graph algorithm has to
"search" for y. Going from one corner to another of a large maze graph like
that one at the end of Chapter 29 might require examining a number of nodes
proportional to √V, while the standard algorithm has to examine virtually
all the nodes.
Exercises
1. Give another minimum spanning tree for the example graph at the beginning
of the chapter.
2. Give an algorithm to find the minimum spanning forest of a connected
graph (each vertex must be touched by some edge, but the resulting graph
doesn't have to be connected).
3. Is there a graph with V vertices and E edges for which the priority-first
search solution to the minimum spanning tree problem could require
time proportional to (E + V) log V? Give an example or explain your
answer.
4. Suppose we maintained the priority queue as a sorted list in the general
graph traversal implementations. What would be the worst-case running
time, to within a constant factor? When would this method be
appropriate, if at all?
5. Give counterexamples which show why the following "greedy" strategy
doesn't work for either the shortest path or the minimum spanning tree
problems: "at each step visit the unvisited vertex closest to the one just
visited."
6. Give the shortest path trees for the other nodes in the example graph.
7. Find the shortest path from the upper right-hand corner to the lower
left-hand corner in the maze graph of Chapter 29, assuming all edges
have weight 1.
8. Write a program to generate random connected graphs with V vertices,
then find the minimum spanning tree and shortest path tree for some
vertex. Use random weights between 1 and V. How do the weights of the
trees compare for different values of V?
9. Write a program to generate random complete weighted graphs with V
vertices by simply filling in an adjacency matrix with random numbers between
1 and V. Run empirical tests to determine which method finds the
minimum spanning tree faster for V = 10, 25, 100: Prim's or Kruskal's.
10. Give a counterexample to show why the following method for finding the
Euclidean minimum spanning tree doesn't work: "Sort the points on their
x coordinates, then find the minimum spanning trees of the first half and
the second half, then find the shortest edge that connects them."
32. Directed Graphs
Directed graphs are graphs in which edges connecting nodes are one-way;
this added structure makes it more difficult to determine various
properties. Processing such graphs is akin to traveling around in a city with
many one-way streets or to traveling around in a country where airlines rarely
run round-trip routes: getting from one point to another in such situations
can be a challenge indeed.
Often the edge direction reflects some type of precedence relationship in
the application being modeled. For example, a directed graph might be used
to model a manufacturing line, with nodes corresponding to jobs to be done
and with an edge from node x to node y if the job corresponding to node x
must be done before the job corresponding to node y. How do we decide when
to perform each of the jobs so that none of these precedence relationships are
violated?
In this chapter, we'll look at depth-first search for directed graphs, as well
as algorithms for computing the transitive closure (which summarizes connectivity
information) and for topological sorting and for computing strongly
connected components (which have to do with precedence relationships).
As mentioned in Chapter 29, representations for directed graphs are
simple extensions of representations for undirected graphs. In the adjacency
list representation, each edge appears only once: the edge from x to y is
represented as a list node containing y in the linked list corresponding to x.
In the adjacency matrix representation, we need to maintain a full V-by-V
matrix, with a 1 bit in row x and column y (but not necessarily in row y and
column x) if there is an edge from x to y.
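For example, the input loop of Chapter 29 would create only one list node per edge read, roughly as follows (a sketch under the conventions of that chapter, with c and d holding the two vertex names just read and t: link):

    new(t); t^.v:=index(d); t^.next:=adj[index(c)]; adj[index(c)]:=t;

and the adjacency matrix version would set a[index(c), index(d)] but not the reverse entry.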
A directed graph similar to the undirected graph that we've been considering
is drawn below. This graph consists of the edges AG AB CA LM JM JL
JK ED DF HI FE AF GE GC HG GJ LG IH ML.
The order in which the edges appear is now significant: the notation AG
describes an edge which points from A to G, but not from G to A. But it is
possible to have two edges between two nodes, one in either direction (we have
both HI and IH and both LM and ML in the above graph).
Note that, in these representations, no difference could be perceived
between an undirected graph and a directed graph with two opposite directed
edges for each edge in the undirected graph. Thus, some of the algorithms in this
chapter can be considered generalizations of algorithms in previous chapters.
Depth-First Search
The depth-first search algorithm of Chapter 29 works properly for directed
graphs exactly as given. In fact, its operation is a little more straightforward
than for undirected graphs because we don't have to be concerned with double
edges between nodes unless they're explicitly included in the graph. However,
the search trees have a somewhat more complicated structure. For example,
the following depth-first search structure describes the operation of the recursive
algorithm of Chapter 29 on our sample graph.
As before, this is a redrawn version of the graph, with solid edges correspond-
ing to those edges that were actually used to visit vertices via recursive calls
and dotted edges corresponding to those edges pointing to vertices that had
already been visited at the time the edge was considered. The nodes are
visited in the order A F E D B G J K L M C H I.
Note that the directions on the edges make this depth-first search forest
quite different from the depth-first search forests that we saw for undirected
graphs. For example, even though the original graph was connected, the
depth-first search structure defined by the solid edges is not connected: it is
a forest, not a tree.
For undirected graphs, we had only one kind of dotted edge, one that
connected a vertex with some ancestor in the tree. For directed graphs, there
are three kinds of dotted edges: up edges, which point from a vertex to some
ancestor in the tree, down edges, which point from a vertex to some descendant
in the tree, and cross edges, which point from a vertex to some vertex which
is neither a descendant nor an ancestor in the tree.
As with undirected graphs, we're interested in connectivity properties of
directed graphs. We would like to be able to answer questions like "Is there
a directed path from vertex x to vertex y (a path which only follows edges in
the indicated direction)?" and "Which vertices can we get to from vertex x
with a directed path?" and "Is there a directed path from vertex x to vertex
y and a directed path from y to x?" Just as with undirected graphs, we'll be
able to answer such questions by appropriately modifying the basic depth-first
search algorithm, though the various different types of dotted edges make the
modifications somewhat more complicated.
Transitive Closure
In undirected graphs, simple connectivity gives the vertices that can be reached
from a given vertex by traversing edges from the graph: they are all those in
the same connected component. Similarly, for directed graphs, we're often
interested in the set of vertices which can be reached from a given vertex by
traversing edges from the graph in the indicated direction.
It is easy to prove that the recursive visit procedure from the depth-first
search method in Chapter 29 visits all the nodes that can be reached from the
start node. Thus, if we modify that procedure to print out the nodes that it is
visiting (say, by inserting write(name(k)) just upon entering), we are printing
out all the nodes that can be reached from the start node. But note carefully
that it is not necessarily true that each tree in the depth-first search forest
contains all the nodes that can be reached from the root of that tree (in our
example, all the nodes in the graph can be reached from H, not just I). To
get all the nodes that can be visited from each node, we simply call visit V
times, once for each node:
for k:=1 to V do
  begin
  now:=0;
  for j:=1 to V do val[j]:=0;
  visit(k);
  writeln
  end;
This program produces the following output for our sample graph:
A F E D B G J K L M C
B
C A F E D B G J K L M
D F E
E D F
F E D
G J K L M C A F E D B
H G J K L M C A F E D B I
I H G J K L M C A F E D B
J K L G C A F E D B M
K
L G J K M C A F E D B
M L G J K C A F E D B
For undirected graphs, this computation would produce a table with the
property that each line corresponding to the nodes in a connected component
lists all the nodes in that component. The table above has a similar property:
certain of the lines list identical sets of nodes. Below we shall examine the
generalization of connectedness that explains this property.
As usual, we could add code to do extra processing rather than just
writing out the table. One operation we might want to perform is to add an
edge directly from x to y if there is some way to get from x to y. The graph
which results from adding all edges of this form to a directed graph is called
the transitive closure of the graph. Normally, a large number of edges will be
added and the transitive closure is likely to be dense, so an adjacency matrix
representation is called for. This is an analogue to connected components in
an undirected graph; once we've performed this computation once, then we
can quickly answer questions like "Is there a way to get from x to y?"
Using depth-first search to compute the transitive closure requires V3
steps in the worst case, since we may have to examine every bit of the
adjacency matrix for the depth-first search from each vertex. There is a
remarkably simple nonrecursive program for computing the transitive closure
of a graph represented with an adjacency matrix:
for y:=1 to V do
  for x:=1 to V do
    if a[x, y] then
      for j:=1 to V do
        if a[y, j] then a[x, j]:=true;
S. Warshall invented this method in 1962, using the simple observation that
"if there's a way to get from node x to node y and a way to get from node y to
node j then there's a way to get from node x to node j." The trick is to make
this observation a little stronger, so that the computation can be done in only
one pass through the matrix, to wit: "if there's a way to get from node x to
node y using only nodes with indices less than x and a way to get from node
y to node j, then there's a way to get from node x to node j using only nodes
with indices less than x+1." The above program is a direct implementation
of this.
Warshall's method converts the adjacency matrix for our sample graph,
given at left in the table below, into the adjacency matrix for its transitive
closure, given at the right:
       A B C D E F G H I J K L M        A B C D E F G H I J K L M
   A   1 1 0 0 0 1 1 0 0 0 0 0 0    A   1 1 1 1 1 1 1 0 0 1 1 1 1
   B   0 1 0 0 0 0 0 0 0 0 0 0 0    B   0 1 0 0 0 0 0 0 0 0 0 0 0
   C   1 0 1 0 0 0 0 0 0 0 0 0 0    C   1 1 1 1 1 1 1 0 0 1 1 1 1
   D   0 0 0 1 0 1 0 0 0 0 0 0 0    D   0 0 0 1 1 1 0 0 0 0 0 0 0
   E   0 0 0 1 1 0 0 0 0 0 0 0 0    E   0 0 0 1 1 1 0 0 0 0 0 0 0
   F   0 0 0 0 1 1 0 0 0 0 0 0 0    F   0 0 0 1 1 1 0 0 0 0 0 0 0
   G   0 0 1 0 1 0 1 0 0 1 0 0 0    G   1 1 1 1 1 1 1 0 0 1 1 1 1
   H   0 0 0 0 0 0 1 1 1 0 0 0 0    H   1 1 1 1 1 1 1 1 1 1 1 1 1
   I   0 0 0 0 0 0 0 1 1 0 0 0 0    I   1 1 1 1 1 1 1 1 1 1 1 1 1
   J   0 0 0 0 0 0 0 0 0 1 1 1 1    J   1 1 1 1 1 1 1 0 0 1 1 1 1
   K   0 0 0 0 0 0 0 0 0 0 1 0 0    K   0 0 0 0 0 0 0 0 0 0 1 0 0
   L   0 0 0 0 0 0 1 0 0 0 0 1 1    L   1 1 1 1 1 1 1 0 0 1 1 1 1
   M   0 0 0 0 0 0 0 0 0 0 0 1 1    M   1 1 1 1 1 1 1 0 0 1 1 1 1
For very large graphs, this computation can be organized so that the
operations on bits can be done a computer word at a time, which will lead to
significant savings in many environments. (As we've seen, it is not intended
that such optimizations be tried with Pascal.)
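In Pascal, the nearest thing to this trick is to represent each row of the boolean matrix as a set and to replace the inner loop by a set union, along the following lines (a sketch only; whether the set operations are actually performed a word at a time depends on the compiler, and maxV must lie within the implementation's set size limit):

    type row=set of 1..maxV;
    var rows: array[1..maxV] of row;   { rows[x] holds the y with a[x, y] true }

    for y:=1 to V do
      for x:=1 to V do
        if y in rows[x] then rows[x]:=rows[x]+rows[y];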
Topological Sorting
For many applications involving directed graphs, cyclic graphs do arise. If,
however, the graph above modeled a manufacturing line, then it would imply,
say, that job A must be done before job G, which must be done before job
C, which must be done before job A. But such a situation is inconsistent:
for this and many other applications, directed graphs with no directed cycles
(cycles with all edges pointing the same way) are called for. Such graphs are
called directed acyclic graphs, or just dags for short. Dags may have many
cycles if the directions on the edges are not taken into account; their defining
property is simply that one should never get in a cycle by following edges in
the indicated direction. A dag similar to the directed graph above, with a
few edges removed or directions switched in order to remove cycles, is given
below.
The edge list for this graph is the same as for the connected graph of Chapter
30, but here, again, the order in which the vertices are given when the edge
is specified makes a difference.
Dags really are quite different objects from general directed graphs: in
a sense, they are part tree, part graph. We can certainly take advantage of
their special structure when processing them. Viewed from any vertex, a dag
looks like a tree; put another way, the depth-first search forest for a dag has
no up edges. For example, the following depth-first search forest describes
the operation of dfs on the example dag above.
A fundamental operation on dags is to process the vertices of the graph
in such an order that no vertex is processed before any vertex that points
to it. For example, the nodes in the above graph could be processed in the
following order:
J K L M A G H I F E D B C
If edges were to be drawn with the vertices in these positions, all the edges
would go from left to right. As mentioned above, this has obvious application,
for example, to graphs which represent manufacturing processes, for it gives a
specific way to proceed within the constraints represented by the graph. This
operation is called topological sorting, because it involves ordering the vertices
of the graph.
In general, the vertex order produced by a topological sort is not unique.
For example, the order
A J G F K L E M B H C I D
is a legal topological ordering for our example (and there are many others).
In the manufacturing application mentioned, this situation occurs when one
job has no direct or indirect dependence on another and thus they can be
performed in either order.
It is occasionally useful to interpret the edges in a graph the other way
around: to say that an edge directed from x to y means that vertex x
"depends" on vertex y. For example, the vertices might represent terms to be
defined in a programming language manual (or a book on algorithms!) with
an edge from x to y if the definition of x uses y. In this case, it would be
useful to find an ordering with the property that every term is defined before
it is used in another definition. This corresponds to positioning the vertices
in a line so that edges would all go from right to left. A reverse topological
order for our sample graph is:
D E F C B I H G A K M L J
The distinction here is not crucial: performing a reverse topological sort on a
graph is equivalent to performing a topological sort on the graph obtained by
reversing all the edges.
But we've already seen an algorithm for reverse topological sorting, the
standard recursive depth-first search procedure of Chapter 29! Simply changing
visit to print out the vertex visited just before exiting, for example by
inserting write(name(k)) right at the end, causes dfs to print out the vertices
in reverse topological order, when the input graph is a dag. A simple induction
argument proves that this works: we print out the name of each vertex after
we've printed out the names of all the vertices that it points to. When visit
is changed in this way and run on our example, it prints out the vertices in
the reverse topological order given above. Printing out the vertex name on
exit from this recursive procedure is exactly equivalent to putting the vertex
name on a stack on entry, then popping it and printing it on exit. It would
be ridiculous to use an explicit stack in this case, since the mechanism for
recursion provides it automatically; we mention this because we do need a
stack for the more difficult problem to be considered next.
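A sketch of the modified visit, under the adjacency-list conventions used elsewhere in the text (this is an illustration, not the Chapter 29 listing itself), is:

    procedure visit(k: integer);   { print vertices in reverse topological order (for a dag) }
      var t: link;
      begin
      now:=now+1; val[k]:=now;
      t:=adj[k];
      while t<>z do
        begin
        if val[t^.v]=0 then visit(t^.v);
        t:=t^.next
        end;
      write(name(k))
      end;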
Strongly Connected Components
If a graph contains a directed cycle (that is, if we can get from a node back to itself
by following edges in the indicated direction), then it is not a dag and it
can't be topologically sorted: whichever vertex on the cycle is printed out first
will have another vertex which points to it which hasn't yet been printed out.
The nodes on the cycle are mutually accessible in the sense that there is a
way to get from every node on the cycle to another node on the cycle and
back. On the other hand, even though a graph may be connected, it is not
likely to be true that any node can be reached from any other via a directed
path. In fact, the nodes divide themselves into sets called strongly connected
components with the property that all nodes within a component are mutually
accessible, but there is no way to get from a node in one component to a node
in another component and back. The strongly connected components of the
directed graph at the beginning of this chapter are two single nodes B and K,
one pair of nodes H I, one triple of nodes D E F, and one large component with
six nodes A C G J L M. For example, vertex A is in a different component
from vertex F because though there is a path from A to F, there is no way to
get from F to A.
The strongly connected components of a directed graph can be found
using a variant of depth-first search, as the reader may have learned to expect.
The method that we'll examine was discovered by R. E. Tarjan in 1972. Since
it is based on depth-first search, it runs in time proportional to V + E, but it is
actually quite an ingenious method. It requires only a few simple modifications
to our basic visit procedure, but before Tarjan presented the method, no linear
time algorithm was known for this problem, even though many people had
worked on it.
The modified version of depth-first search that we use to find the strongly
connected components of a graph is quite similar to the program that we
studied in Chapter 30 for finding biconnected components. The recursive
visit function given below uses the same min computation to find the highest
vertex reachable (via an up link) from any descendant of vertex k, but uses
the value of min in a slightly different way to write out the strongly connected
components:
function visit(k: integer): integer;
  var t: link;
      m, min: integer;
  begin
  now:=now+1; val[k]:=now; min:=now;
  stack[p]:=k; p:=p+1;
  t:=adj[k];
  while t<>z do
    begin
    if val[t^.v]=0
      then m:=visit(t^.v)
      else m:=val[t^.v];
    if m<min then min:=m;
    t:=t^.next
    end;
  if min=val[k] then
    begin
    repeat
      p:=p-1; write(name(stack[p]));
      val[stack[p]]:=V+1
    until stack[p]=k;
    writeln
    end;
  visit:=min
  end;
This program pushes the vertex names onto a stack on entry to visit, then
pops them and prints them on exit from visiting the last member of each
strongly connected component. The point of the computation is the test
whether min=vaJ[k] at the end: if so, all vertices encountered since entry
(except those already printed out) belong to the same strongly connected
component as k. As usual, this program could easily be modified to do more
sophisticated processing than simply writing out the components.
The method is based on two observations that we've actually already
made in other contexts. First, once we reach the end of a call to visit for
a vertex, then we won't encounter any more vertices in the same strongly
connected component (because all the vertices which can be reached from that
vertex have been processed, as we noted above for topological sorting). Second,
the "up" links in the tree provide a second path from one vertex to another and
bind together the strong components. As with the algorithm in Chapter 30 for
finding articulation points, we keep track of the highest ancestor reachable
via one "up" link from all descendants of each node. Now, if a vertex x
has no descendants or "up" links in the depth-first search tree, or if it has a
descendant in the depth-first search tree with an "up" link that points to x,
and no descendants with "up" links that point higher up in the tree, then it
and all its descendants (except those vertices satisfying the same property and
their descendants) comprise a strongly connected component. In the depth-first
search tree at the beginning of the chapter, nodes B and K satisfy the
first condition (so they represent strongly connected components themselves)
and nodes F (representing F E D), H (representing H I), and A (representing
A G J L M C) satisfy the second condition. The members of the component
represented by A are found by deleting B K F and their descendants (they
appear in previously discovered components). Every descendant y of x that
does not satisfy this same property has some descendant that has an "up"
link that points higher than y in the tree. There is a path from x to y down
through the tree; and a path from y to x can be found by going down from
y to the vertex with the "up" link that reaches past y, then continuing the
same process until x is reached. A crucial extra twist is that once we're done
with a vertex, we give it a high val, so that "cross" links to that vertex will
be ignored.
This program provides a deceptively simple solution to a relatively difficult
problem. It is certainly testimony to the subtleties involved in searching
directed graphs, subtleties which can be handled (in this case) by a carefully
crafted recursive program.
Exercises
1. Give the adjacency matrix for the transitive closure of the example dag
given in this chapter.
2. What would be the result of running the transitive closure algorithms on
an undirected graph which is represented with an adjacency matrix?
3. Write a program to determine the number of edges in the transitive closure
of a given directed graph, using the adjacency list representation.
4. Discuss how Warshall's algorithm compares with the transitive closure
algorithm derived from using the depth-first search technique described
in the text, but using the adjacency matrix form of visit and removing
the recursion.
5. Give the topological ordering produced for the example dag given in
the text when the suggested method is used with an adjacency matrix
representation, but dfs scans the vertices in reverse order (from V down
to 1) when looking for unvisited vertices.
6. Does the shortest path algorithm from Chapter 31 work for directed
graphs? Explain why or give an example for which it fails.
7. Write a program to determine whether or not a given directed graph is a
dag.
8. How many strongly connected components are there in a dag? In a graph
with a directed cycle of size V?
9. Use your programs from Chapters 29 and 30 to produce large random
directed graphs with V vertices. How many strongly connected components
do such graphs tend to have?
10. Write a program that is functionally analogous to find from Chapter
30, but maintains strongly connected components of the directed graph
described by the input edges. (This is not an easy problem: you certainly
won't be able to get as efficient a program as find.)
33. Network Flow
Weighted directed graphs are useful models for several types of applications
involving commodities flowing through an interconnected network.
Consider, for example, a network of oil pipes of varying sizes, interconnected
in complex ways, with switches controlling the direction of flow at junctions.
Suppose further that the network has a single source (say, an oil field) and a
single destination (say, a large refinery) to which all of the pipes ultimately
connect. What switch settings will maximize the amount of oil flowing from
source to destination? Complex interactions involving material flow at junctions
make this problem, called the network flow problem, a nontrivial problem
to solve.
This same general setup can be used to describe traffic flowing along
highways, materials flowing through factories, etc. Many different versions
of the problem have been studied, corresponding to many different practical
situations where it has been applied. There is clearly strong motivation to
find an efficient algorithm for these problems.
This type of problem lies at the interface between computer science
and the field of operations research. Operations researchers are generally
concerned with mathematical modeling of complex systems for the purpose
of (preferably optimal) decision-making. Network flow is a typical example
of an operations research problem; we'll briefly touch upon some others in
Chapters 37-40.
As we'll see, the modeling can lead to complex mathematical equations
that require computers for solution. For example, the classical solution to the
network flow problem given below is closely related to the graph algorithms
that we have been examining. But this problem is one which is still actively
being studied: unlike many of the problems that we've looked at, the "best"
solution has not yet been found and good new algorithms are still being
discovered.
The Network Flow Problem
Consider the following rather idealized drawing of a small network of oil pipes:
The pipes are of fixed capacity proportional to their size and oil can flow
in them only in the direction indicated (perhaps they run downhill or have
unidirectional pumps). Furthermore, switches at each junction control how
much of the oil goes in each direction. No matter how the switches are set,
the system reaches a state of equilibrium when the amount of oil flowing into
the system at the left is equal to the amount flowing out of the system at the
right (this is the quantity that we want to maximize) and when the amount
of oil flowing in at each junction is equal to the amount of oil flowing out. We
measure both flow and pipe capacity in terms of integral "units" (say, gallons
per second).
It is not immediately obvious that the switch settings can really affect the
total flow: the following example will illustrate that they can. First, suppose
that all switches are open so that the two diagonal pipes and the top and
bottom pipes are full. This gives the following configuration:
The total flow into and out of the network in this case is less than half the
capacity of the input pipe, only slightly more than half the capacity of the
output pipe. Now suppose that the upward diagonal pipe is shut off. This
shuts flow equal to its capacity out of the bottom, and the top is unaffected
because there's room to replace its flow from the input pipe; thus we have:
The total flow into and out of the network is increased substantially.
This situation can obviously be modeled using a directed graph, and it
turns out that the programs that we have studied can apply. Define a network
as a weighted directed graph with two distinguished vertices: one with no
edges pointing in (the source); one with no edges pointing out (the sink). The
weights on the edges, which we assume to be non-negative, are called the edge
capacities. Now, a flow is defined as another set of weights on the edges such
that the flow on each edge is equal to or less than the capacity, and the flow
into each vertex is equal to the flow out of that vertex. The value of the flow
is the flow out of the source (or into the sink). The network flow problem is
to find a flow with maximum value for a given network.
Networks can obviously be represented using either the adjacency matrix
or adjacency list representations that we have used for graphs in previous
chapters. Instead of a single weight, two weights are associated with each
edge, the size and the flow. These could be represented as two fields in an
adjacency list node, as two matrices in the adjacency matrix representation,
or as two fields within a single record in either representation. Even though
networks are directed graphs, the algorithms that we'll be examining need
to traverse edges in the "wrong" direction, so we use an undirected graph
representation: if there is an edge from x to y with size s and flow f, we also
keep an edge from y to x with size -s and flow -f. In an adjacency list
representation, it is necessary to maintain links connecting the two list nodes
which represent each edge, so that when we change the flow in one we can
update it in the other.
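For concreteness, the adjacency matrix version of this representation might be set
up as in the following sketch (the bound maxV and the procedure name insertedge
are illustrative, not part of the text):

   const maxV=50;
   var size, flow: array[1..maxV, 1..maxV] of integer;
       V: integer;

   procedure insertedge(x, y, s: integer);
     begin
     { an edge from x to y of capacity s, plus its reverse entry of capacity -s }
     size[x, y]:=s;  flow[x, y]:=0;
     size[y, x]:=-s; flow[y, x]:=0
     end;

The program given later in this chapter keeps flow[y, x] equal to -flow[x, y]
whenever it changes flow[x, y], so that traversing the edge in either direction gives
a consistent picture.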
Ford-Fulkerson Method
The classical approach to the network flow problem was developed by L. R.
Ford and D. R. Fulkerson in 1962. They gave a method to improve any legal
flow (except, of course, the maximum). Starting with a zero flow, we apply
the method repeatedly; as long as the method can be applied, it produces an
increased flow; if it can't be applied, the maximum flow has been found.
Consider any directed path through the network (from source to sink).
Clearly, the flow can be increased by at least the smallest amount of unused
capacity on any edge on the path, by increasing the flow in all edges on the
path by that amount. In our example, this rule could be applied along the
path ADEBCF:
then along the path ABCDEF:
and then along the path ABCF:
Now all directed paths through the network have at least one edge which
is filled to capacity. But there is another way to increase the flow: we can
consider arbitrary paths through the network which can contain edges which
point the "wrong way" (from sink to source along the path). The flow can
be increased along such a path by increasing the flow on edges from source
to sink and decreasing the flow on edges from sink to source by the same
amount. To simplify terminology, we'll call edges which flow from source to
sink along a particular path forward edges and edges which flow from sink to
source backward edges. For example, the flow in the network above can be
increased by 2 along the path ABEF.
This corresponds to shutting off the oil on the pipe from E to B; this allows 2
units to be redirected from E to F without losing any flow at the other end,
because the 2 units which used to come from E to B can be replaced by 2
units from A.
Notice that the amount by which the flow can be increased is limited by
the minimum of the unused capacities in the forward edges and the minimum
of the flows in the backward edges. Put another way, in the new flow, at
least one of the forward edges along the path becomes full or at least one of
the backward edges along the path becomes empty. Furthermore, the flow
can't be increased on any path containing a full forward edge or an empty
backward edge.
The paragraph above gives a method for increasing the flow on any
network, provided that a path with no full forward edges or empty backward
edges can be found. The crux of the Ford-Fulkerson method is the observation
that if no such path can be found then the flow is maximal. The proof of
this fact goes as follows: if every path from the source to the sink has a full
forward edge or an empty backward edge, then go through the graph and
identify the first such edge on every path. This set of edges cuts the graph in
two parts, as shown in the diagram below for our example.
For any cut of the network into two parts, we can measure the flow "across"
the cut: the total of the flow on the edges which go from the source's side of
the cut to the sink's side, minus the total of the flow on the edges which go the
other way. Our example
cut has a value of 8, which is equal to the total flow for the network. It turns
out that whenever the cut flow equals the total flow, we know not only that
the flow is maximal, but also that the cut is minimal (that is, every other
cut has at least as high a flow "across"). This is called the maxflow-mincut
theorem: the flow couldn't be any larger (otherwise the cut would have to be
larger also); and no smaller cuts exist (otherwise the flow would have to be
smaller also).
Network Searching
The Ford-Fulkerson method described above may be summarized as follows:
"start with zero flow everywhere and increase the flow along any path from
source to sink with no full forward edges or empty backward edges, continuing
until there are no such paths in the network." But this is not an algorithm in
the sense to which we have become accustomed, since the method for finding
paths is not specified. The example above is based on the intuition that the
longer the path, the more the network is filled up, and thus that long paths
should be preferred. But the following classical example shows that some care
should be exercised:
(figure: a network with source A and sink C; edges AB, AD, BC, and DC each
have capacity 1000, and edge BD has capacity 1)
Now, if the first path chosen is ABDC, then the flow is increased by only one.
Then the second path chosen might be ADBC, again increasing the flow by
one, and leaving a situation identical to the initial situation, except that the
flows on the outside edges are increased only by 1. Any algorithm which chose
those two paths (for example, one that looks for long paths) would continue
to do so, thus requiring 1000 pairs of iterations before the maximum flow is
found. If the numbers on the sides were a billion, then a billion iterations
would be used. Obviously, this is an undesirable situation, since the paths
ABC and ADC would give the maximum flow in just two steps. For the
algorithm to be useful, we must avoid having the running time so dependent
on the magnitude of the capacities.
Fortunately, the problem is easily eliminated. It was proven by Edmonds
and Karp that if breadth-first search were used to find the path, then the
number of paths used before the maximum flow is found in a network of V
vertices and E edges must be less than VE. (This is a worst-case bound: a
typical network is likely to require many fewer steps.) In other words, simply
use the shortest available path from source to sink in the Ford-Fulkerson
method.
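One way such a breadth-first search might be coded is sketched below, using the
size and flow matrices of the adjacency matrix representation; the names bfspath
and residual and the simple array-based queue are illustrative, and the dad array
records each vertex's predecessor on the path, just as in the priority-first search
procedures:

   var dad: array[1..maxV] of integer;

   { unused capacity of a forward edge, or cancellable flow of a backward edge }
   function residual(x, y: integer): integer;
     begin
     if size[x, y]>0 then residual:=size[x, y]-flow[x, y]
                     else residual:=-flow[x, y]
     end;

   { find a shortest augmenting path from source to sink, recording it in dad }
   function bfspath(source, sink: integer): boolean;
     var queue: array[1..maxV] of integer;
         visited: array[1..maxV] of boolean;
         head, tail, x, y: integer;
     begin
     for x:=1 to V do begin visited[x]:=false; dad[x]:=0 end;
     head:=1; tail:=1; queue[1]:=source; visited[source]:=true;
     while (head<=tail) and not visited[sink] do
       begin
       x:=queue[head]; head:=head+1;
       for y:=1 to V do
         if (not visited[y]) and (residual(x, y)>0) then
           begin
           visited[y]:=true; dad[y]:=x;
           tail:=tail+1; queue[tail]:=y
           end
       end;
     bfspath:=visited[sink]
     end;

Augmenting along the recorded path by the smallest residual capacity on it, and
repeating until bfspath fails, gives the Edmonds-Karp variant of the method.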
With the priority graph traversal method of Chapters 30 and 31, we
can implement another method suggested by Edmonds and Karp: find the
path through the network which increases the flow by the largest amount.
This can be achieved simply by using a variable for priority (whose value
is set appropriately) in either the adjacency list sparsepfs of Chapter 30 or
the adjacency matrix densepfs of Chapter 31. For example, the following
statements compute the priority assuming a matrix representation:
if size[k, t]>0 then priority:=size[k, t]-flow[k, t]
  else priority:=-flow[k, t];
if priority>val[k] then priority:=val[k];
Then, since we want to take the node with the highest priority value, we must
either reorient the priority queue mechanisms in those programs to return
the maximum instead of the minimum or use them as is with priority set
to maxint-1-priority (and the process reversed when the value is removed).
Also, we modify the priority first search procedure to take the source and sink
as arguments, then to start at the source and stop when a path to the sink
has been found. If such a path is not found, the partial priority search tree
defines a mincut for the network. Finally, the val for the source should be set
to maxint before the search is started, to indicate that any amount of flow
can be achieved at the source (though this is immediately restricted by the
total capacity of all the pipes leading directly out of the source).
With densepfs implemented as described in the previous paragraph, finding
the maximum flow is actually quite simple, as shown by the following
program:
repeat
  densepfs(1, V);
  y:=V; x:=dad[V];
  while x<>0 do
    begin
    flow[x, y]:=flow[x, y]+val[V];   { push the extra flow along the path found }
    flow[y, x]:=-flow[x, y];
    y:=x; x:=dad[y]
    end
until val[V]=unseen
This program assumes that the adjacency matrix representation is being used
for the network. As long as densepfs can find a path which increases the
flow (by the maximum amount), we trace back through the path (using the
dad array constructed by densepfs) and increase the flow as indicated. If V
remains unseen after some call to densepfs, then a mincut has been found and
the algorithm terminates.
For our example network, the algorithm first increases the flow along the
path ABCF, then along ADEF, then along ABCDEF. No backwards edges
are used in this example, since the algorithm does not make the unwise choice
ADEBCF that we used to illustrate the need for backwards edges. In the next
chapter we'll see a graph for which this algorithm uses backwards edges to
find the maximum flow.
Though this algorithm is easily implemented and is likely to work well
for networks that arise in practice, the analysis of this method is quite complicated.
First, as usual, densepfs requires V^2 steps in the worst case, or we
could use sparsepfs to run in time proportional to (E + V) log V per iteration,
though the algorithm is likely to run somewhat faster than this, since it stops
when it reaches the sink. But how many iterations are required? Edmonds
and Karp show the worst case to be 1 + log_{M/(M-1)} f*, where f* is the cost
of the flow and M is the maximum number of edges in a cut of the network.
This is certainly complicated to compute, but it is not likely to be large for
real networks. This formula is included to give an indication not of how long
the algorithm might take on an actual network, but rather of the complexity
of the analysis. Actually, this problem has been quite widely studied, and
complicated algorithms with much better worst-case time bounds have been
developed.
The network flow problem can be extended in several ways, and many
variations have been studied in some detail because they are important in
actual applications. For example, the multicommodity flow problem involves
introducing multiple sources, sinks, and types of material in the network. This
makes the problem much more difficult and requires more advanced algorithms
than those considered here: for example, no analogue to the maxflow-mincut
theorem is known to hold for the general case. Other extensions to the
network flow problem include placing capacity constraints on vertices (easily
handled by introducing artificial edges to handle these capacities), allowing
undirected edges (also easily handled by replacing undirected edges by pairs
of directed edges), and introducing lower bounds on edge flows (not so easily
handled). If we make the realistic assumption that pipes have associated costs
as well as capacities, then we have the min-cost flow problem, a quite difficult
problem from operations research.
Exercises
1. Give an algorithm to solve the network flow problem for the case that the
network forms a tree if the sink is removed.
2. What paths are traced by the algorithm given in the text when finding
the maximum flow in the following network?
3. Draw the priority search trees computed on each call to densepfs for the
example discussed in the text.
4. True or false: No algorithm can find the maximum flow without examining
every edge in the network.
5. What happens to the Ford-Fulkerson method in the case that the network
has a directed cycle?
6. Give an example where a shortest-path traversal would produce a different
set of paths than the method given in the text.
7. Give a counterexample which shows why depth-first search is not appropriate
for the network flow problem.
8. Find an assignment of sizes that would make the algorithm given in the
text use a backward edge on the example graph, or prove that none exists.
9. Implement the breadth-first search solution to the network flow problem,
using sparsepfs.
10. Write a program to find maximum flows in random networks with V
nodes and about 10V edges. How many calls to sparsepfs are made, for
V = 25, 50, 100?
34. Matching
A problem which often arises is to "pair up" objects according to preference
relationships which are likely to conflict. For example, a quite
complicated system has been set up in the U. S. to place graduating medical
students into hospital residence positions. Each student lists several hospitals
in order of preference, and each hospital lists several students in order of
preference. The problem is to assign students to positions in a fair way,
respecting all the stated preferences. A sophisticated algorithm is required
because the best students are likely to be preferred by several hospitals, and
the best hospital positions are likely to be preferred by several students. It's
not even clear that each hospital position can be filled by a student that the
hospital has listed and each student can be assigned to a position that the
student has listed, let alone respect the order in the preference lists. Actually
this frequently occurs: after the algorithm has done the best that it can,
there is a last minute scramble among unmatched hospitals and students to
complete the process.
This example is a special case of a difficult fundamental problem on
graphs that has been widely studied. Given a graph, a matching is a subset
of the edges in which no vertex appears more than once. That is, each vertex
touched by one of the edges in the matching is paired with the other vertex
of that edge, but some vertices may be left unmatched. Even if we insist
that there should be no edges connecting unmatched vertices, different ways
of choosing the edges could lead to different numbers of leftover (unmatched)
vertices.
Of particular interest is a maximum matching, which contains as many
edges as possible or, equivalently, which minimizes the number of unmatched
vertices. The best that we could hope to do would be to have a set of edges
in which each vertex appears exactly once (such a matching in a graph with
2V vertices would have V edges), but it is not always possible to achieve this.
For example, consider our sample undirected graph:
The edges AF DE CG HI JK LM make a maximum matching for this graph,
which is the best that can be done, but there's no three-edge matching for
the subgraph consisting of just the first six vertices and the edges connecting
them.
For the medical student matching problem described above, the students
and hospitals would correspond to nodes in the graph; their preferences to
edges. If they assign values to their preferences (perhaps using the time-honored
"1-10" scale), then we have the weighted matching problem: given
a weighted graph, find a set of edges in which no vertex appears more than
once such that the sum of the weights on the edges in the set chosen is
maximized. Below we'll see another alternative, where we respect the order in
the preferences, but do not require (arbitrary) values to be assigned to them.
The matching problem has attracted attention because of its intuitive
nature and its wide applicability. Its solution in the general case involves
intricate and beautiful combinatorial mathematics beyond the scope of this
book. Our intent here is to provide the reader with an appreciation for the
problem by considering some interesting special cases while at the same time
developing some useful algorithms.
Bipartite Graphs
The example mentioned above, matching medical students to residencies, is
certainly representative of many other matching applications. For example,
we might be matching men and women for a dating service, job applicants to
available positions, courses to available hours, or congressmen to committee
assignments. The graphs resulting from modeling such cases are called bipartite
graphs, which are defined to be graphs in which all edges go between two
sets of nodes (that is, the nodes divide into two sets and no edges connect
two nodes in the same set). Obviously, we wouldn't want to "match" one job
applicant to another or one committee assignment to another.
The reader might be amused to search for a maximum matching in the
typical bipartite graph drawn below:
In an adjacency matrix representation for bipartite graphs, one can achieve
obvious savings by including only rows for one set and only columns for the
other set. In an adjacency list representation, no particular saving suggests
itself, except naming the vertices intelligently so that it is easy to tell which
set a vertex belongs to.
In our examples, we use letters for nodes in one set, numbers for nodes
in the other. The maximum matching problem for bipartite graphs can be
simply expressed in this representation: "Find the largest subset of a set of
letter-number pairs with the property that no two pairs have the same letter
or number." Finding the maximum matching for our example bipartite graph
corresponds to solving this puzzle on the pairs E5 A2 A1 C1 B4 C3 D3 B2 A4
D5 E3 B1.
It is an interesting exercise to attempt to find a direct solution to the
matching problem for bipartite graphs. The problem seems easy at first
glance, but subtleties quickly become apparent. Certainly there are far too
many pairings to try all possibilities: a solution to the problem must be clever
enough to try only a few of the possible ways to match the vertices.
The solution that we'll examine is an indirect one: to solve a particular
instance of the matching problem, we'll construct an instance of the network
flow problem, use the algorithm from the previous chapter, then use the
solution to the network flow problem to solve the matching problem. That is,
we reduce the matching problem to the network flow problem. Reduction is a
method of algorithm design somewhat akin to the use of a library subroutine
by a systems programmer. It is of fundamental importance in the theory
of advanced combinatorial algorithms (see Chapter 40). For the moment,
reduction will provide us with an efficient solution to the bipartite matching
problem.
The construction is straightforward: given an instance of bipartite matching,
construct an instance of network flow by creating a source vertex with
edges pointing to all the members of one set in the bipartite graph, then make
all the edges in the bipartite graph point from that set to the other, then add
a sink vertex pointed to by all the members of the other set. All of the edges
in the resulting graph are given a capacity of 1. For example, the bipartite
graph given above corresponds to the network below: the darkened edges show
the first four paths found when the network flow algorithm of the previous
chapter is run on this graph.
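The construction might be coded roughly as follows, assuming an insertedge(x, y,
capacity) procedure like the one sketched for the network representation in the
previous chapter, and a boolean function edge(i, j) telling whether the bipartite
graph has an edge between the ith letter and the jth number (both names are
illustrative):

   { letters are vertices 1..M, numbers are vertices M+1..M+N;
     the source and the sink are two extra vertices }
   procedure buildnetwork(M, N: integer);
     var i, j, source, sink: integer;
     begin
     source:=M+N+1; sink:=M+N+2; V:=M+N+2;
     for i:=1 to M do insertedge(source, i, 1);       { source to every letter }
     for i:=1 to M do
       for j:=1 to N do
         if edge(i, j) then insertedge(i, M+j, 1);    { every bipartite edge, capacity 1 }
     for j:=1 to N do insertedge(M+j, sink, 1)        { every number to the sink }
     end;

Running the network flow program on the result gives the matching directly: it
consists of the letter-number edges that carry one unit of flow.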
Note that the bipartite property of the graph, the direction of the flow, and
the fact that all capacities are 1 force each path through the network to
correspond to an edge in a matching: in the example, the paths found so far
correspond to the partial matching A1 B2 C3 D5. Each time the network flow
algorithm calls pfs it either finds a path which increases the flow by one or
terminates.
Now all forward paths through the network are full, and the algorithm
must use backward edges. The path found in this example is the path
04A1C3EZ. This path clearly increases the flow in the network, as described in
the previous chapter. In the present context, we can think of the path as a set
of instructions to create a new partial matching (with one more edge) from the
current one. This construction follows in a natural way from tracing through
the path in order: "4A" means to add A4 to the matching, which requires
that "A1" be deleted; "1C" means to add C1 to the matching, which requires
that "C3" be deleted; "3E" means to add E3 to the matching. Thus, after
this path is processed, we have the matching A4 B2 C1 D5 E3; equivalently,
the flow in the network is given by full pipes in the edges connecting those
nodes, and all pipes leaving 0 and entering Z full.
The proof that the matching is exactly those edges which are filled to
capacity by the maxflow algorithm is straightforward. First, the network flow
always gives a legal matching: since each vertex has an edge of capacity 1
either coming in (from the source) or going out (to the sink), at most one unit
of flow can go through each vertex, which implies that each vertex will be
included at most once in the matching. Second, no matching can have more
edges, since any such matching would lead directly to a better flow than that
produced by the maxflow algorithm.
Thus, to compute the maximum matching for a bipartite graph we simply
format the graph so as to be suitable for input to the network flow algorithm
of the previous chapter. Of course, the graphs presented to the network flow
algorithm in this case are much simpler than the general graphs the algorithm
is designed to handle, and it turns out that the algorithm is somewhat more
efficient for this case. The construction ensures that each call to pfs adds
one edge to the matching, so we know that there are at most V/2 calls to
pfs during the execution of the algorithm. Thus, for example, the total time
to find the maximum matching for a dense bipartite graph with V vertices
(using the adjacency matrix representation) is proportional to V^3.
Stable Marriage Problem
The example given at the beginning of this chapter, involving medical students
and hospitals, is obviously taken quite seriously by the participants. But
the method that we'll examine for doing the matching is perhaps better
understood in terms of a somewhat whimsical model of the situation. We
assume that we have N men and N women who have expressed mutual
preferences (each man must say exactly how he feels about each of the N
women and vice versa). The problem is to find a set of N marriages that
respects everyone's preferences.
How should the preferences be expressed? One method would be to use
the "1-10" scale, each side assigning an absolute score to certain members of
the other side. This makes the marriage problem the same as the weighted
matching problem, a relatively difficult problem to solve. Furthermore, use of
absolute scales in itself can lead to inaccuracies, since people's scales will be
inconsistent (one woman's 10 might be another woman's 7). A more natural
way to express the preferences is to have each person list in order of preference
all the people of the opposite sex. The following two tables might show
preferences among a set of five women and five men. As usual (and to protect
the innocent!) we assume that hashing or some other method has been used to
translate actual names to single digits for women and single letters for men:
A: 2 5 1 3 4 1: E A D B C
B: 1 2 3 4 5 2: D E B A C
C: 2 3 5 4 1 3: A D B C E
D: 1 3 2 4 5 4: C B D A E
E: 5 3 2 1 4 5: D B C E A
Clearly, these preferences often conflict: for example, both A and C list 2 as
their first choice, and nobody seems to want 4 very much (but someone must
get her). The problem is to engage all the women to all the men in such a
way as to respect all their preferences as much as possible, then perform N
marriages in a grand ceremony. In developing a solution, we must assume that
anyone assigned to someone less than their first choice will be disappointed
and will always prefer anyone higher up on the list. A set of marriages is
called unstable if two people who are not married both prefer each other to
their spouses. For example, the assignment A1 B3 C2 D4 E5 is unstable
because A prefers 2 to 1 and 2 prefers A to C. Thus, acting according to their
preferences, A would leave 1 for 2 and 2 would leave C for A (leaving 1 and
C with little choice but to get together).
Finding a stable configuration seems on the face of it to be a difficult
problem, since there are so many possible assignments. Even determining
whether a configuration is stable is not simple, as the reader may discover
by looking (before reading the next paragraph) for the unstable couple in the
example above after the new matches A2 and C1 have been made. In general,
there are many different stable assignments for a given set of preference lists,
and we only need to find one. (Finding all stable assignments is a much more
difficult problem.)
One possible algorithm for finding a stable configuration might be to
remove unstable couples one at a time. However, not only is this slow because
of the time required to determine stability, but also the process does not
even necessarily terminate! For example, after A2 and C1 have been matched
in the example above, B and 2 make an unstable couple, which leads to
the configuration A3 B2 C1 D4 E5. In this arrangement, B and 1 make an
unstable couple, which leads to the configuration A3 B1 C2 D4 E5. Finally,
A and 1 make an unstable couple, which leads back to the original
configuration. An algorithm which attempts to solve the stable marriage
problem by removing unstable couples one by one is bound to get caught in this
type of loop.
Instead, we'll look at an algorithm which tries to build stable pairings
systematically using a method based on what might happen in the somewhat
idealized "real-life" version of the problem. The idea is to have each man,
in turn, become a "suitor" and seek a bride. Obviously, the first step in his
quest is to propose to the first woman on his list. If she is already engaged
to a man whom she prefers, then our suitor must try the next woman on his
list, continuing until he finds a woman who is not engaged or who prefers
him to her current fiancee. If this woman is not engaged, then she becomes
engaged to the suitor and the next man becomes the suitor. If she is engaged,
then she breaks the engagement and becomes engaged to the suitor (whom
she prefers). This leaves her old fiancee with nothing to do but become the
suitor once again, starting where he left off on his list. Eventually he finds a
new fiancee, but another engagement may need to be broken. We continue in
this way, breaking engagements as necessary, until some suitor finds a woman
who has not yet been engaged.
This method may model what happens in some 19th-century novels, but
some careful examination is required to show that it produces a stable set of
assignments. The diagram below shows the sequence of events for the initial
stages of the process for our example. First, A proposes to 2 (his first choice)
and is accepted; then B proposes to 1 (his first choice) and is accepted; then C
proposes to 2, is turned down, and proposes to 3 and is accepted, as depicted
in the third diagram:
Each diagram shows the sequence of events when a new man sets out as the
suitor to seek a fiancee. Each line gives the "used" preference list for the
corresponding man, with each link labeled with an integer telling when that
link was used by that man to propose to that woman. This extra information
is useful in tracking the sequence of proposals when D and E become the
suitor, as shown in the following figure:
When D proposes to 1, we have our first broken engagement, since 1 prefers
D to B. Then B becomes the suitor and proposes to 2, which gives our second
broken engagement, since 2 prefers B to A. Then A becomes the suitor and
proposes to 5, which leaves a stable situation. The reader might wish to trace
through the sequence of proposals made when E becomes the suitor. Things
don't settle down until after eight proposals are made. Note that E takes on
the suitor role twice in the process.
To begin the implementation, we need data structures to represent the
preference lists. Different structures are appropriate for the men and the
women, since they use the preference lists in different ways. The men simply
go through their preference lists in order, so a straightforward implementation
as a two-dimensional array is called for: we'll maintain a two-dimensional
array for the preference list so that, for example, prefer[m, w] will be the wth
woman in the preference list of the mth man. In addition, we need to keep
track of how far each man has progressed on his list. This can be handled
with a one-dimensional array next, initialized to zero, with next[m]+1 the
index of the next woman on man m's preference list: her identifier is found in
prefer[m, next[m]+1].
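In Pascal these structures might be declared as follows (a sketch; maxN is an
assumed bound on N):

   const maxN=50;
   var prefer: array[1..maxN, 1..maxN] of integer;  { prefer[m, w]: wth woman on man m's list }
       next: array[1..maxN] of integer;             { how far man m has gotten down his list }
       N: integer;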
For each woman, we need to keep track of her fiancee (fiancee[w] will
be the man engaged to woman w) and we need to be able to answer the
question "Is man s preferable to fiancee[w]?" This could be done by searching
the preference list sequentially until either s or fiancee[w] is found, but this
method would be rather inefficient if they're both near the end. What is called
for is the "inverse" of the preference list: rank[w, s] is the index of man s on
woman w's preference list. For the example above this array is
1: 2 4 5 3 1
2: 4 3 5 1 2
3: 1 3 4 2 5
4: 4 2 1 3 5
5: 5 2 3 1 4
The suitability of suitor s can be very quickly tested by the statement if
rank[w, s]<rank[w, fiancee[w]] . . . . These arrays are easily constructed directly
from the preference lists. To get things started, we use a "sentinel" man
0 as the initial suitor, and put him at the end of all the women's preference
lists.
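The remaining structures and their initialization might look like this; the array
wpref, holding the women's preference lists as they are read in, is an assumption
of this sketch:

   var rank: array[1..maxN, 0..maxN] of integer;   { rank[w, s]: position of man s on woman w's list }
       fiancee: array[1..maxN] of integer;
       wpref: array[1..maxN, 1..maxN] of integer;  { wpref[w, i]: ith man on woman w's list }
       w, i, m: integer;

   for w:=1 to N do
     begin
     for i:=1 to N do rank[w, wpref[w, i]]:=i;
     rank[w, 0]:=N+1;     { the sentinel man 0 ranks below every real man }
     fiancee[w]:=0        { so every woman starts out "engaged" to the sentinel }
     end;
   for m:=1 to N do next[m]:=0;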
With the data structures initialized in this way, the implementation as
described above is straightforward:
for m:=1 to N do
  begin
  s:=m;                   { man m starts out as the suitor }
  repeat
    next[s]:=next[s]+1; w:=prefer[s,next[s]];
    if rank[w,s]<rank[w,fiancee[w]] then
      begin t:=fiancee[w]; fiancee[w]:=s; s:=t end;  { her old fiancee becomes the suitor }
  until s=0;
  end;
Each iteration starts with an unengaged man and ends with an engaged
woman. The repeat loop must terminate because every man's list contains
every woman and each iteration of the loop involves incrementing some man's
list, and thus an unengaged woman must be encountered before any man's
list is exhausted. The set of engagements produced by the algorithm is stable
because every woman whom any man prefers to his fiancee is engaged to
someone that she prefers to him.
There are several obvious built-in biases in this algorithm. First, the
men go through the women on their lists in order, while the women must
wait for the "right man" to come along. This bias may be corrected (in a
somewhat easier manner than in real life) by interchanging the order in which
the preference lists are input. This produces the stable configuration 1E 2D
3A 4C 5B, where every woman gets her first choice except 5, who gets her
second. In general, there may be many stable configurations: it can be shown
that this one is "optimal" for the women, in the sense that no other stable
configuration will give any woman a better choice from her list. (Of course,
the first stable configuration for our example is optimal for the men.)
Another feature of the algorithm which seems to be biased is the order in
which the men become the suitor: is it better to be the first man to propose
(and therefore be engaged at least for a little while to your first choice) or
the last (and therefore have a reduced chance to suffer the indignities of a
broken engagement)? The answer is that this is not a bias at all: it doesn't
matter in what order the men become the suitor. As long as each man makes
proposals and each woman accepts according to their lists, the same stable
configuration results.
Advanced Algorithms
The two special cases that we've examined give some indication of how complicated
the matching problem can be. Among the more general problems
that have been studied in some detail are: the maximum matching problem
for general (not necessarily bipartite) graphs; weighted matching for bipartite
graphs, where edges have weights and a matching with maximum total weight
is sought; and weighted matching for general graphs. Treating the many techniques
that have been tried for matching on general graphs would fill an entire
volume: it is one of the most extensively studied problems in graph theory.
Exercises
1. Find all the matchings with five edges for the given sample bipartite
graph.
2. Use the algorithm given in the text to find maximum matchings for
random bipartite graphs with 50 vertices and 100 edges. About how many
edges are in the matchings?
3. Construct a bipartite graph with six nodes and eight edges which has a
three-edge matching, or prove that none exists.
4. Suppose that vertices in a bipartite graph represent jobs and people and
that each person is to be assigned to two jobs. Will reduction to network
flow give an algorithm for this problem? Prove your answer.
5. Modify the network flow program of Chapter 33 to take advantage of the
special structure of the 0-1 networks which arise for bipartite matching.
6. Write an efficient program for determining whether an assignment for the
marriage problem is stable.
7. Is it possible for two men to get their last choice in the stable marriage
algorithm? Prove your answer.
8. Construct a set of preference lists for N = 4 for the stable marriage
problem where everyone gets their second choice, or prove that no such
set exists.
9. Give a stable configuration for the stable marriage problem for the case
where the preference lists for men and women are all the same: in ascending
order.
10. Run the stable marriage program for N = 50, using random permutations
for preference lists. About how many proposals are made during the
execution of the algorithm?
SOURCES for Graph Algorithms
There are several textbooks on graph algorithms, but the reader should
be forewarned that there is a great deal to be learned about graphs, that
they still are not fully understood, and that they are traditionally studied
from a mathematical (as opposed to an algorithmic) standpoint. Thus, many
references have more rigorous and deeper coverage of much more difficult
topics than our treatment here.
Many of the topics that we've treated here are covered in the book by
Even, for example, our network flow example in Chapter 33. Another source
for further material is the book by Papadimitriou and Steiglitz. Though most
of that book is about much more advanced topics (for example, there is a full
treatment of matching in general graphs), it has up-to-date coverage of many
of the algorithms that we've discussed, including pointers to further reference
material.
The application of depth-first search to solve graph connectivity and other
problems is the work of R. E. Tarjan, whose original paper merits further
study. The many variants on algorithms for the union-find problem of Chapter
30 are ably categorized and compared by van Leeuwen and Tarjan. The
algorithms for shortest paths and minimum spanning trees in dense graphs
in Chapter 31 are quite old, but the original papers by Dijkstra, Prim, and
Kruskal still make interesting reading. Our treatment of the stable marriage
problem in Chapter 34 is based on the entertaining account given by Knuth.
E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische
Mathematik, 1 (1959).
S. Even, Graph Algorithms, Computer Science Press, Rockville, MD, 1980.
D. E. Knuth, Mariages stables, Les Presses de L'Universite de Montreal,
Montreal, 1976.
J. B. Kruskal, Jr., "On the shortest spanning subtree of a graph and the
traveling salesman problem," Proceedings AMS, 7, 1 (1956).
C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms
and Complexity, Prentice-Hall, Englewood Cliffs, NJ, 1982.
R. C. Prim, "Shortest connection networks and some generalizations," Bell
System Technical Journal, 36 (1957).
R. E. Tarjan, "Depth-first search and linear graph algorithms," SIAM Journal
on Computing, 1, 2 (1972).
J. van Leeuwen and R. E. Tarjan, "Worst-case analysis of set-union algorithms,"
Journal of the ACM, to appear.
ADVANCED TOPICS
35. Algorithm Machines
The algorithms that we have studied are, for the most part, remarkably
robust in their applicability. Most of the methods that we have seen
are a decade or more old and have survived many quite radical changes in
computer hardware and software. New hardware designs and new software
capabilities certainly can have a significant impact on specific algorithms, but
good algorithms on old machines are, for the most part, good algorithms on
new machines.
One reason for this is that the fundamental design of "conventional"
computers has changed little over the years. The design of the vast majority
of computing systems is guided by the same underlying principle, which was
developed by the mathematician J. von Neumann in the early days of modern
computing. When we speak of the von Neumann model of computation, we
refer to a view of computing in which instructions and data are stored in the
same memory and a single processor fetches instructions from the memory
and executes them (perhaps operating on the data), one by one. Elaborate
mechanisms have been developed to make computers cheaper, faster, smaller
(physically), and larger (logically), but the architecture of most computer
systems can be viewed as variations on the von Neumann theme.
Recently, however, radical changes in the cost of computing components
have made it plausible to consider radically different types of machines, ones
in which a large number of instructions can be executed at each time instant
or in which the instructions are "wired in" to make special-purpose machines
capable of solving only one problem or in which a large number of smaller
machines can cooperate to solve the same problem. In short, rather than
having a machine execute just one instruction at each time instant, we can
think about having a large number of actions being performed simultaneously.
In this chapter, we shall consider the potential effect of such ideas on some of
the problems and algorithms we have been considering.
General Approaches
Certain fundamental algorithms are used so frequently or for such large problems
that there is always pressure to run them on bigger and faster computers.
One result of this has been a series of "supercomputers" which embody
the latest technology; they make some concessions to the fundamental
von Neumann concept but still are designed to be general-purpose and useful
for all programs. The common approach to using such a machine for the type
of problem we have been studying is to start with the algorithms that are
best on conventional machines and adapt them to the particular features of
the new machine. This approach encourages the persistence of old algorithms
and old architecture in new machines.
Microprocessors with significant computing capabilities have become quite
inexpensive. An obvious approach is to try to use a large number of processors
together to solve a large problem. Some algorithms can adapt well to being
"distributed" in this way; others simply are not appropriate for this kind of
implementation.
The development of inexpensive, relatively powerful processors has involved
the appearance of general-purpose tools for use in designing and building
new processors. This has led to increased activity in the development
of special-purpose machines for particular problems. If no machine is particularly
well-suited to execute some important algorithm, then we can design
and build one that is! For many problems, an appropriate machine can be
designed and built that fits on one (very-large-scale) integrated circuit chip.
A common thread in all of these approaches is parallelism: we try to
have as many different things as possible happening at any instant. This can
lead to chaos if it is not done in an orderly manner. Below, we'll consider
two examples which illustrate some techniques for achieving a high degree of
parallelism for some specific classes of problems. The idea is to assume that
we have not just one but M processors on which our program can run. Thus,
if things work out well, we can hope to have our program run M times faster
than before.
There are several immediate problems involved in getting M processors
to work together to solve the same problem. The most important is that they
must communicate in some way: there must be wires interconnecting them
and specific mechanisms for sending data back and forth along those wires.
Furthermore, there are physical limitations on the type of interconnection
allowed. For example, suppose that our "processors" are integrated circuit
chips (these can now contain more circuitry than small computers of the past)
which have, say, 32 pins to be used for interconnection. Even if we had 1000
such processors, we could connect each to at most 32 others. The choice
of how to interconnect the processors is fundamental in parallel computing.
Moreover, it's important to remember that this decision must be made ahead
of time: a program can change the way in which it does things depending on
the particular instance of the problem being solved, but a machine generally
can't change the way its parts are wired together.
This general view of parallel computation in terms of independent processors
with some fixed interconnection pattern applies in each of the three
domains described above: a supercomputer has very specific processors and
interconnection patterns that are integral to its architecture (and affect many
aspects of its performance); interconnected microprocessors involve a relatively
small number of powerful processors with simple interconnections; and very-large-
scale integrated circuits themselves involve a very large number of simple
processors (circuit elements) with complex interconnections.
Many other views of parallel computation have been studied extensively
since von Neumann, with renewed interest since inexpensive processors have
become available. It would certainly be beyond the scope of this book to
treat all the issues involved. Instead, we'll consider two specific machines
that have been proposed for some familiar problems. The machines that we
consider illustrate the effects of machine architecture on algorithm design and
vice versa. There is a certain symbiosis at work here: one certainly wouldn't
design a new computer without some idea of what it will be used for, and one
would like to use the best available computers to execute the most important
fundamental algorithms.
Perfect Shuffles
To illustrate some of the issues involved in implementing algorithms as machines
instead of programs, we'll look at an interesting method for merging
which is suitable for hardware implementation. As we'll see, the same general
method can be developed into a design for an "algorithm machine" which
incorporates a fundamental interconnection pattern to achieve parallel operation
of M processors for solving several problems in addition to merging.
As mentioned above, a fundamental difference between writing a program
to solve a problem and designing a machine is that a program can adapt its
behavior to the particular instance of the problem being solved, while the
machine must be "wired" ahead of time always to perform the same sequence
of operations. To see the difference, consider the first sorting program that we
studied, sort3 from Chapter 8. No matter what three numbers appear in the
data, the program always performs the same sequence of three fundamental
"compare-exchange" operations. None of the other sorting algorithms that
we studied have this property. They all perform a sequence of comparisons
that depends on the outcome of previous comparisons, which presents severe
problems for hardware implementation.
Specifically, if we have a piece of hardware with two input wires and two
output wires that can compare the two numbers on the input and exchange
them if necessary for the output, then we can wire three of these together
as follows to produce a sorting machine with three inputs (at the top in the
figure) and three outputs (at the bottom):
Thus, for example, if C B A were to appear at the top, the first box would
exchange the C and the B to give B C A, then the second box would exchange
the B and the A to give A C B, then the third box would exchange the C and
the B to produce the sorted result.
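In software terms each box performs a single compare-exchange, and the machine
corresponds to the following fixed sequence of three operations on the inputs a, b, c
(a sketch; the name compexch is illustrative):

   procedure compexch(var x, y: integer);
     var t: integer;
     begin
     if x>y then begin t:=x; x:=y; y:=t end
     end;

   { the three-input sorter: always the same three operations, whatever the data }
   compexch(a, b); compexch(a, c); compexch(b, c);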
Of course, there are many details to be worked out before an actual
sorting machine based on this scheme can be built. For example, the method
of encoding the inputs is left unspecified: one way would be to think of each
wire in the diagram above as a "bus" of enough wires to carry the data with
one bit per wire; another way is to have the compare-exchangers read their
inputs one bit at a time along a single wire (most significant bit first). Also
left unspecified is the timing: mechanisms must be included to ensure that
no compare-exchanger performs its operation before its input is ready. We
clearly won't be able to delve much deeper into such circuit design questions;
instead we'll concentrate on the higher level issues concerning interconnecting
simple processors such as compare-exchangers for solving larger problems.
To begin, we'll consider an algorithm for merging together two sorted
files, using a sequence of "compare-exchange" operations that is independent
of the particular numbers to be merged and is thus suitable for hardware
implementation. Suppose that we have two sorted files of eight keys to be
merged together into one sorted file. First write one file below the other, then
compare those that are vertically adjacent and exchange them if necessary to
put the larger one below the smaller one.
A E G G I M N R      A B E E I M N R
A B E E L M P X      A E G G L M P X
Next, split each line in half and interleave the halves, then perform the same
compare-exchange operations on the numbers in the second and third lines.
(Note that comparisons involving other pairs of lines are not necessary because
of previous sorting.)
A B E E      A B E E
I M N R      A E G G
A E G G      I M N R
L M P X      L M P X
This leaves both the rows and the columns of the table sorted. This fact is
a fundamental property of this method: the reader may wish to check that
it is true, but a rigorous proof is a trickier exercise than one might think. It
turns out that this property is preserved by the same operation: split each
line in half, interleave the halves, and do compare-exchanges between items
now vertically adjacent that came from different lines.
A B      A B
E E      A E
A E      E E
G G      G G
I M      I M
N R      L M
L M      N R
P X      P X
We have doubled the number of rows, halved the number of columns, and still
kept the rows and the columns sorted. One more step completes the merge:
A    A
B    A
A    B
E    E
E    E
E    E
G    G
G    G
I    I
M    L
L    M
M    M
N    N
R    P
P    R
X    X
At last we have 16 rows and 1 column, which is sorted. This method obviously
extends to merge files of equal lengths which are powers of two. Other sizes
can be handled by adding dummy keys in a straightforward manner, though
the number of dummy keys can get large (if N is just larger than a power of
2).
The basic "split each line in half and interleave the halves" operation in
the above description is easy to visualize on paper, but how can it be translated
into wiring for a machine? There is a surprising and elegant answer to this
question which follows directly from writing the tables down in a different
way. Rather than writing them down in a two-dimensional fashion, we'll
write them down as a simple (one-dimensional) list of numbers, organized
in column-major order: first put the elements in the first column, then put
the elements in the second column, etc. Since compare-exchanges are only
done between vertically adjacent items, this means that each stage involves
a group of compare-exchange boxes, wired together according to the "split-and-
interleave" operation which is necessary to bring items together into the
compare-exchange boxes.
This leads to the following diagram, which corresponds precisely to the
description using tables above, except that the tables are all written in column-major
order (including an initial 1 by 16 table with one file, then the other).
The reader should be sure to check the correspondence between this diagram
and the tables given above. The compare-exchange boxes are drawn explicitly,
and explicit lines are drawn showing how elements move in the "split-and-interleave"
operation:
(figure: the odd-even merging network drawn with perfect shuffle interconnections;
the initial line A E G G I M N R A B E E L M P X is transformed by successive
shuffle and compare-exchange stages into the sorted line A A B E E E G G I L M M N P R X)
Surprisingly, in this representation each "split-and-interleave" operation reduces
to precisely the same interconnection pattern. This pattern is called
the perfect shuffle because the wires are exactly interleaved, in the same way
that cards from the two halves would be interleaved in an ideal mix of a deck
of cards.
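In software, the perfect shuffle of an array of 2N elements might be written as
follows (a sketch; the constant maxsize, the type intarray, and the temporary
array are illustrative):

   const maxsize=100;
   type intarray=array[1..maxsize] of integer;

   procedure shuffle(var a: intarray; N: integer);
     var t: intarray;
         i: integer;
     begin
     for i:=1 to 2*N do t[i]:=a[i];
     for i:=1 to N do
       begin
       a[2*i-1]:=t[i];     { ith element of the first half }
       a[2*i]:=t[N+i]      { ith element of the second half }
       end
     end;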
This method was named the odd-even merge by K. E. Batcher, who
invented it in 1968. The essential feature of the method is that all of the
compare-exchange operations in each stage can be done in parallel. It clearly
demonstrates that two files of N elements can be merged together in log N
parallel steps (the number of rows in the table is halved at every step), using
less than N log N compare-exchange boxes. From the description above, this
might seem like a straightforward result: actually, the problem of finding such
a machine had stumped researchers for quite some time.
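A software rendition of the odd-even merge follows directly from the column-major
description above: shuffle, then compare-exchange vertically adjacent items, and
repeat until only one column remains. The sketch below merges two sorted files of
length M (a power of two) stored in a[1..M] and a[M+1..2*M], using the shuffle
and compexch procedures sketched earlier; the organization into R rows follows
the tables in the text, with the first stage comparing rows 1 and 2 and later stages
comparing rows 2 and 3, 4 and 5, and so on:

   procedure oddevenmerge(var a: intarray; M: integer);
     var R, c, r, base: integer;
     begin
     R:=1;
     repeat
       shuffle(a, M);       { the "split-and-interleave" step }
       R:=R+R;              { the number of rows doubles }
       for c:=1 to (2*M) div R do
         begin
         base:=(c-1)*R;
         if R=2 then compexch(a[base+1], a[base+2])
           else for r:=2 to R-2 do
             if not odd(r) then compexch(a[base+r], a[base+r+1])
         end
     until R=2*M
     end;

Of course this is a (sequential) program, not a machine, but each pass corresponds
exactly to one parallel stage of the merging network.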
Batcher also developed a closely related (but more difficult to understand)
merging algorithm, the bitonic merge, which leads to an even simpler machine.
This method can be described in terms of the "split-and-interleave" operation
on tables exactly as above, except that we begin with the second file in reverse
sorted order and always do compare-exchanges between vertically adjacent
items that came from the same lines. We won't go into the proof that this
method works: our interest in it is that it removes the annoying feature in the
odd-even merge that the compare-exchange boxes in the first stage are shifted
one position from those in following stages. As the following diagram shows,
each stage of the bitonic merge has exactly the same number of comparators,
in exactly the same positions:
(figure: the bitonic merging network; the initial line A E G G I M N R X P M L E E B A,
with the second file in reverse order, is transformed by successive shuffle and
compare-exchange stages into the sorted line A A B E E E G G I L M M N P R X)
Now there is regularity not only in the interconnections but in the positions of
the compare-exchange boxes. There are more compare-exchange boxes than
for the odd-even merge, but this is not a problem, since the same number
of parallel steps is involved. The importance of this method is that it leads
directly to a way to do the merge using only N compare-exchange boxes. The
idea is to simply collapse the rows in the table above to just one pair of rows,
and thus produce a cycling machine wired together as follows:
(figure: a single rank of compare-exchange boxes whose outputs are fed back to
their inputs through perfect shuffle interconnections)
Such a machine can do log N compare-exchange-shuffle "cycles," one for each
of the stages in the figure above.
Note carefully that this is not quite "ideal" parallel performance: since
we can merge together two files of N elements using one processor in a number
of steps proportional to N, we would hope to be able to do it in a constant
number of steps using N processors. In this case, it has been proven that it
is not possible to achieve this ideal and that the above machine achieves the
best possible parallel performance for merging using compare-exchange boxes.
The perfect shuffle interconnection pattern is appropriate for a variety of
other problems. For example, if a 2^n-by-2^n square matrix is kept in row-major
order, then n perfect shuffles will transpose the matrix (convert it to column-major
order). More important examples include the fast Fourier transform
(which we'll examine in the next chapter); sorting (which can be developed by
applying either of the methods above recursively); polynomial evaluation; and
a host of others. Each of these problems can be solved using a cycling perfect
shuffle machine with the same interconnections as the one diagramed above
but with different (somewhat more complicated) processors. Some researchers
have even suggested the use of the perfect shuffle interconnection for "general-purpose"
parallel computers.
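As a small check of the transposition claim above, the shuffle procedure sketched
earlier can simply be applied n times (the fragment below uses a 4-by-4 matrix, so
n=2 and the array holds 16 elements):

   { a[1..16] holds a 4-by-4 matrix in row-major order;
     two perfect shuffles leave it in column-major order, i.e. transposed }
   for k:=1 to 2 do shuffle(a, 8);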
Systolic Arrays
One problem with the perfect shuffle is that the wires used for interconnection
are long. Furthermore, there are many wire crossings: a shuffle with N wires
involves a number of crossings proportional to N^2. These two properties turn
out to create difficulties when a perfect shuffle machine is actually constructed:
long wires lead to time delays and crossings make the interconnection expensive
and inconvenient.
A natural way to avoid both of these problems is to insist that processors
be connected only to processors which are physically adjacent. As above, we
operate the processors synchronously: at each step, each processor reads inputs
from its neighbors, does a computation, and writes outputs to its neighbors.
It turns out that this is not necessarily restrictive, and in fact H. T. Kung
showed in 1978 that arrays of such processors, which he termed systolic arrays
(because the way data flows within them is reminiscent of a heartbeat), allow
very efficient use of the processors for some fundamental problems.
As a typical application, we'll consider the use of systolic arrays for
matrix-vector multiplication. For a particular example, consider the matrix
operation
( 1  3 -4 ) ( 1 )   (  8 )
( 1  1 -2 ) ( 5 ) = (  2 )
(-1 -2  5 ) ( 2 )   ( -1 )
This computation will be carried out on a row of simple processors each of
which has three input lines and two output lines, as depicted below:
Five processors are used because we'll be presenting the inputs and reading
the outputs in a carefully timed manner, as described below.
During each step, each processor reads one input from the left, one from
the top, and one from the right; performs a simple computation; and writes
one output to the left and one output to the right. Specifically, the right
output gets whatever was on the left input, and the left output gets the result
computed by multiplying together the left and top inputs and adding the right
input. A crucial characteristic of the processors is that they always perform a
dynamic transformation of inputs to outputs; they never have to "remember"
computed values. (This is also true of the processors in the perfect shuffle
machine.) This is a ground rule imposed by low-level constraints on the
hardware design, since the addition of such a "memory" capability can be
(relatively) quite expensive.
The paragraph above gives the "program" for the systolic machine; to
complete the description of the computation, we need to also describe exactly
how the input values are presented. This timing is an essential feature of the
systolic machine, in marked contrast to the perfect shuffle machine, where
all the input values are presented at one time and all the output values are
available at some later time.
The general plan is to bring in the matrix through the top inputs of the
processors, reflected about the main diagonal and rotated forty-five degrees,
and the vector through the left input of processor A, to be passed on to the
other processors. Intermediate results are passed from right to left in the
array, with the output eventually appearing on the left output of processor
A. The specific timing for our example is shown in the following table, which
gives the values of the left, top, and right inputs for each processor at each
step:
        left inputs           top inputs        right inputs
step   A   B   C   D   E     A   B   C   D   E   out   A   B   C   D
   1   1
   2       1
   3   5       1                     1
   4       5       1             3       1                 1
   5   2       5       1    -4       1      -1        16       1
   6       2       5            -2      -2         8       6      -1
   7           2       5             5                 2     -11
   8               2                               2      -1
   9                   2                              -1
  10                                              -1
The input vector is presented to the left input of processor A at steps 1, 3,
and 5 and passed right to the other processors in subsequent steps. The input
matrix is presented to the top inputs of the processors starting at step 3,
skewed so the right-to-left diagonals of the matrix are presented in successive
steps. The output vector appears as the left output of processor A at steps
6, 8, and 10. (In the table, this appears as the right input of an imaginary
processor to the left of A, which is collecting the answer.)
The actual computation can be traced by following the right inputs (left
outputs) which move from right to left through the array. All computations
produce a zero result until step 3, when processor C has 1 for its left input
and 1 for its top input, so it computes the result 1, which is passed along
as processor B's right input for step 4. At step 4, processor B has non-zero
values for all three of its inputs, and it computes the value 16, to be passed
on to processor A for step 5. Meanwhile, processor D computes a value 1 for
processor C's use at step 5. Then at step 5, processor A computes the value
8, which is presented as the first output value at step 6; C computes the value
6 for B's use at step 6, and E computes its first nonzero value (-1) for use by
D at step 6. The computation of the second output value is completed by B
at step 6 and passed through A for output at step 8, and the computation of
the third output value is completed by C at step 7 and passed through B and
A for output at step 10.
Once the process has been checked at a detailed level as in the previous
paragraph, the method is better understood at a somewhat higher level. The
numbers in the middle part of the table above are simply a copy of the input
matrix, rotated and reflected as required for presentation to the top inputs
of the processors. If we check the numbers in the corresponding positions at
the left part of the table, we find three copies of the input vector, located in
exactly the right positions and at the right times for multiplication against
the rows of the matrix. And the corresponding positions on the right give
the intermediate results for each multiplication of the input vector with each
matrix row. For example, the multiplication of the input vector with the
middle matrix row requires the partial computations 1*1 = 1, 1 + 1*5 = 6,
and 6 + (-2)*2 = 2, which appear in the entries 1 6 2 in the reflected middle
row on the right-hand side of the table. The systolic machine manages to time
things so that each matrix element "meets" the proper input vector entry and
the proper partial computation at the processor where it is input, so that it
can be incorporated into the partial result.
The method extends in an obvious manner to multiply an N-by-N matrix
by an N-by-l vector using 2N - 1 processors in 4N - 2 steps. This does come
close to the ideal situation of having every processor perform useful work at
every step: a quadratic algorithm is reduced to a linear algorithm using a
linear number of processors.
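As a check on the timing rules worked out above, the following conventional
Pascal sketch simulates the array on the example (the variable names, the
input-schedule formulas, and the constant settings are illustrative
assumptions for the simulation, not part of any particular hardware design):

    program systolicsim;
    const N = 3; P = 5;                   { P = 2N-1 processors }
          steps = 10;                      { 4N-2 steps suffice }
    var a: array[1..N, 1..N] of integer;
        v, y: array[1..N] of integer;
        leftin, topin, rightin,
        leftout, rightout: array[1..P] of integer;
        i, j, t, p: integer;
    begin
    { the example from the text }
    a[1,1]:=1;  a[1,2]:=3;  a[1,3]:=-4;
    a[2,1]:=1;  a[2,2]:=1;  a[2,3]:=-2;
    a[3,1]:=-1; a[3,2]:=-2; a[3,3]:=5;
    v[1]:=1; v[2]:=5; v[3]:=2;
    for p:=1 to P do
      begin leftin[p]:=0; topin[p]:=0; rightin[p]:=0;
            leftout[p]:=0; rightout[p]:=0 end;
    for t:=1 to steps do
      begin
      { present the scheduled inputs for step t:
        a[i,j] goes to the top of processor N+i-j at step N+i+j-2,
        v[j] goes to the left of processor 1 at step 2j-1 }
      for p:=1 to P do topin[p]:=0;
      for i:=1 to N do
        for j:=1 to N do
          if t=N+i+j-2 then topin[N+i-j]:=a[i,j];
      leftin[1]:=0;
      for j:=1 to N do if t=2*j-1 then leftin[1]:=v[j];
      { every processor computes simultaneously }
      for p:=1 to P do
        begin
        rightout[p]:=leftin[p];
        leftout[p]:=leftin[p]*topin[p]+rightin[p]
        end;
      { y[i] is processor 1's left output at step 2N+2i-3 }
      for i:=1 to N do if t=2*N+2*i-3 then y[i]:=leftout[1];
      { outputs become the neighbors' inputs for the next step }
      for p:=P downto 2 do leftin[p]:=rightout[p-1];
      for p:=1 to P-1 do rightin[p]:=leftout[p+1];
      rightin[P]:=0
      end;
    for i:=1 to N do writeln('y[', i, '] = ', y[i])
    end.

Running this sketch should print 8, 2, and -1, matching the outputs traced
above.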
One can appreciate from this example that systolic arrays are at once
simple and powerful. The output vector at the edge appears almost as if
by magic! However, each individual processor is just performing the simple
computation described above: the magic is in the interconnection and the
timed presentation of the inputs. As before, we've only described a general
method of parallel computation. Many details in the logical design need to
be worked out before such a systolic machine can be constructed.
As with perfect shuffle machines, systolic arrays may be used in many
different types of problems, including string matching and matrix multiplication
among others. Some researchers have even suggested the use of this
interconnection pattern for "general-purpose" parallel machines.
Certainly, the study of the perfect shuffle and systolic machines illustrates
that hardware design can have a significant effect on algorithm design, suggesting
changes that can provide interesting new algorithms and fresh challenges
for the algorithm designer. While this is an interesting and fruitful area for
further research, we must conclude with a few sobering notes. First, a great
deal of engineering effort is required to translate general schemes for parallel
computation such as those sketched above to actual algorithm machines with
good performance. For many applications, the resource expenditure required
is simply not justified, and a simple "algorithm machine" consisting of a conventional
(inexpensive) microprocessor running a conventional algorithm will
do quite well. For example, if one has many instances of the same problem
to solve and several microprocessors with which to solve them, then ideal
parallel performance can be achieved by having each microprocessor (using a
conventional algorithm) working on a different instance of the problem, with
no interconnection at all required. If one has N files to sort and N processors
available on which to sort them, why not simply use one processor for
each sort, rather than having all N processors labor together on all N sorts?
Techniques such as those discussed in this chapter are currently justified only
for applications with very special time or space requirements. In studying
various parallel computation schemes and their effects on the performance of
various algorithms, we can look forward to the development of general-purpose
parallel computers that will provide improved performance for a wide variety
of algorithms.
Exercises
1. Outline two possible ways to use parallelism in Quicksort.
2. Write a conventional Pascal program which merges files using Batcher's
   bitonic method.
3. Write a conventional Pascal program which merges files using Batcher's
   bitonic method, but doesn't actually do any shuffles.
4. Does the program of the previous exercise have any advantage over
   conventional merging?
5. How many perfect shuffles will bring all the elements in an array of size
   2^n back to their original positions?
6. Draw a table like the one given in the text to illustrate the operation of
   the systolic matrix-vector multiplier for the following problem:
7. Write a conventional Pascal program which simulates the operation of
   the systolic array for multiplying an N-by-N matrix by an N-by-1 vector.
8. Show how to use a systolic array to transpose a matrix.
9. How many processors and how many steps would be required for a systolic
   machine which can multiply an M-by-N matrix by an N-by-1 vector?
10. Give a simple parallel scheme for matrix-vector multiplication using
    processors which have the capability to "remember" computed values.
36. The Fast Fourier Transform
One of the most widely used arithmetic algorithms is the fast Fourier
transform, which (among many other applications) can provide a substantial
reduction in the time required to multiply two polynomials. The
Fourier transform is of fundamental importance in mathematical analysis and
is the subject of volumes of study. The emergence of an efficient algorithm
for this computation was a milestone in the history of computing.
It would be beyond the scope of this book to outline the mathematical
basis for the Fourier transform or to survey its many applications. Our purpose
is to learn the characteristics of a fundamental algorithm within the
context of some of the other algorithms that we've been studying. In particular,
we'll examine how to use the algorithm for polynomial multiplication,
a problem that we studied in Chapter 4. Only a very few elementary facts
from complex analysis are needed to show how the Fourier transform can be
used to multiply polynomials, and it is possible to appreciate the fast Fourier
transform algorithm without fully understanding the underlying mathematics.
The divide-and-conquer technique is applied in a way similar to other important
algorithms that we've seen.
Evaluate, Multiply, Interpolate
The general strategy of the improved method for polynomial multiplication
that we'll be examining takes advantage of the fact that a polynomial of degree
N - 1 is completely determined by its value at N different points. When we
multiply two polynomials of degree N - 1 together, we get a polynomial of
degree 2N - 2: if we can find that polynomial's value at 2N - 1 points, then it
is completely determined. But we can find the value of the result at any point
simply by evaluating the two polynomials to be multiplied at that point and
then multiplying those numbers. This leads to the following general scheme
for multiplying two polynomials of degree N - 1:
Evaluate the input polynomials at 2N - 1 distinct points.
Multiply the two values obtained at each point.
Interpolate to find the unique result polynomial that has the given value
at the given points.
For example, to compute r(x) = p(x)q(x) with p(x) = 1 + x + x^2 and q(x) =
2 - x + x^2, we can evaluate p(x) and q(x) at any five points, say -2, -1, 0, 1, 2,
to get the values

    [p(-2), p(-1), p(0), p(1), p(2)] = [3, 1, 1, 3, 7],
    [q(-2), q(-1), q(0), q(1), q(2)] = [8, 4, 2, 2, 4].
Multiplying these together term-by-term gives enough values for the product
polynomial,
    [r(-2), r(-1), r(0), r(1), r(2)] = [24, 4, 2, 6, 28],
that its coefficients can be found by interpolation. By the Lagrange formula,
    r(x) = 24 (x+1)/(-2+1) * (x-0)/(-2-0) * (x-1)/(-2-1) * (x-2)/(-2-2)
          + 4 (x+2)/(-1+2) * (x-0)/(-1-0) * (x-1)/(-1-1) * (x-2)/(-1-2)
          + 2 (x+2)/(0+2)  * (x+1)/(0+1)  * (x-1)/(0-1)  * (x-2)/(0-2)
          + 6 (x+2)/(1+2)  * (x+1)/(1+1)  * (x-0)/(1-0)  * (x-2)/(1-2)
         + 28 (x+2)/(2+2)  * (x+1)/(2+1)  * (x-0)/(2-0)  * (x-1)/(2-1),
which simplifies to the result
    r(x) = 2 + x + 2x^2 + x^4.
As described so far, this method is not a good algorithm for polynomial multiplication
since the best algorithms we have so far for both evaluation (repeated
application of Horner's method) and interpolation (Lagrange formula) require
N^2 operations. However, there is some hope of finding a better algorithm because
the method works for any choice of 2N - 1 distinct points whatsoever,
and it is reasonable to expect that evaluation and interpolation will be easier
for some sets of points than for others.
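To make the costs concrete, here is a minimal Pascal sketch of one evaluation
by Horner's method (assuming a type realpoly = array[0..maxN] of real, with
the coefficients in p[0..N-1]); each call uses about N multiplications and
additions, so evaluating at 2N - 1 points costs about 2N^2 operations:

    function horner(var p: realpoly; N: integer; x: real): real;
      var i: integer; t: real;
      begin
      t:=p[N-1];
      for i:=N-2 downto 0 do t:=t*x+p[i];
      horner:=t
      end;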
Complex Roots of Unity
It turns out that the most convenient points to use for polynomial interpolation
and evaluation are complex numbers, in fact, a particular set of complex
numbers called the complex roots of unity.
A brief review of some facts about complex analysis is necessary. The
number i = √-1 is an imaginary number: though √-1 is meaningless as
a real number, it is convenient to give it a name, i, and perform algebraic
manipulations with it, replacing i^2 with -1 whenever it appears. A complex
number consists of two parts, real and imaginary, usually written as a + bi,
where a and b are reals. To multiply complex numbers, apply the usual rules,
but replace i^2 with -1 whenever it appears. For example,

    (a + bi)(c + di) = (ac - bd) + (ad + bc)i.

Sometimes the real or imaginary part can cancel out when a complex multiplication
is performed. For example,

    (1 - i)(1 - i) = -2i,
    (1 + i)^4 = -4,
    (1 + i)^8 = 16.
Scaling this last equation by dividing through by 16 = (√2)^8, we find that
((1 + i)/√2)^8 = 1.
In general, there are many complex numbers that evaluate to 1 when raised
to a power. These are the so-called complex roots of unity. In fact, it turns
out that for each N, there are exactly N complex numbers z with zN = 1.
out that for each N, there are exactly N complex numbers z with z^N = 1.
One of these, named w_N, is called the principal Nth root of unity; the others
are obtained by raising w_N to the kth power, for k = 0, 1, 2, ..., N-1. For
example, we can list the eighth roots of unity as follows:

    W_8: w_8^0, w_8^1, w_8^2, w_8^3, w_8^4, w_8^5, w_8^6, w_8^7.

The first root, w_8^0, is 1 and the second, w_8^1, is the principal root. Also, for
N even, the root w_N^{N/2} is -1 (because (w_N^{N/2})^2 = w_N^N = 1).
The precise values of the roots are unimportant for the moment. We'll be
using only simple properties which can easily be derived from the basic fact
that the Nth power of any Nth root of unity must be 1.
Evaluation at the Roots of Unity
The crux of our implementation will be a procedure for evaluating a polynomial
of degree N - 1 at the Nth roots of unity. That is, this procedure
transforms the N coefficients which define the polynomial into the N values
resulting from evaluating that polynomial at all of the Nth roots of unity.
This may not seem to be exactly what we want, since for the first step of
the polynomial multiplication procedure we need to evaluate polynomials of
degree N - 1 at 2N - 1 points. Actually, this is no problem, since we can view
a polynomial of degree N - 1 as a polynomial of degree 2N - 2 with N - 1
coefficients (those for the terms of largest degree) which are 0.
The algorithm that we'll use to evaluate a polynomial of degree N - 1
at N points simultaneously will be based on a simple divide-and-conquer
strategy. Rather than dividing the polynomials in the middle (as in the
multiplication algorithm in Chapter 4) we'll divide them into two parts by
putting alternate terms in each part. This division can easily be expressed in
terms of polynomials with half the number of coefficients. For example, for
N = 8, the rearrangement of terms is as follows:
    p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3 + p_4 x^4 + p_5 x^5 + p_6 x^6 + p_7 x^7
         = (p_0 + p_2 x^2 + p_4 x^4 + p_6 x^6) + x(p_1 + p_3 x^2 + p_5 x^4 + p_7 x^6)
         = p_e(x^2) + x p_o(x^2).
The Nth roots of unity are convenient for this decomposition because if you
square a root of unity, you get another root of unity. In fact, even more is
true: for N even, if you square an Nth root of unity, you get an (N/2)th root of
unity (a number which evaluates to 1 when raised to the (N/2)th power). This
is exactly what is needed to make the divide-and-conquer method work. To
evaluate a polynomial with N coefficients on N points, we split it into two
polynomials with N/2 coefficients. These polynomials need only be evaluated
on N/2 points (the (N/2)th roots of unity) to compute the values needed for the
full evaluation.
To see this more clearly, consider the evaluation of a degree-7 polynomial
p(x) on the eighth roots of unity
    W_8: w_8^0, w_8^1, w_8^2, w_8^3, w_8^4, w_8^5, w_8^6, w_8^7.

Since w_8^4 = -1, this is the same as the sequence

    W_8: w_8^0, w_8^1, w_8^2, w_8^3, -w_8^0, -w_8^1, -w_8^2, -w_8^3.

Squaring each term of this sequence gives two copies of the sequence W_4 of
the fourth roots of unity:

    W_4: w_4^0, w_4^1, w_4^2, w_4^3, w_4^0, w_4^1, w_4^2, w_4^3.
Now, our equation p(x) = p_e(x^2) + x p_o(x^2) tells us immediately how to
evaluate p(x) at the eighth roots of unity from these sequences. First, we
evaluate p_e(x) and p_o(x) at the fourth roots of unity. Then we substitute
each of the eighth roots of unity for x in the equation above, which requires
adding the appropriate p_e value to the product of the appropriate p_o value
and the eighth root of unity:

    p(w_8^0) = p_e(w_4^0) + w_8^0 p_o(w_4^0),
    p(w_8^1) = p_e(w_4^1) + w_8^1 p_o(w_4^1),
    p(w_8^2) = p_e(w_4^2) + w_8^2 p_o(w_4^2),
    p(w_8^3) = p_e(w_4^3) + w_8^3 p_o(w_4^3),
    p(w_8^4) = p_e(w_4^0) - w_8^0 p_o(w_4^0),
    p(w_8^5) = p_e(w_4^1) - w_8^1 p_o(w_4^1),
    p(w_8^6) = p_e(w_4^2) - w_8^2 p_o(w_4^2),
    p(w_8^7) = p_e(w_4^3) - w_8^3 p_o(w_4^3).
In general, to evaluate p(x) on the Nth roots of unity, we recursively evaluate
p_e(x) and p_o(x) on the (N/2)th roots of unity and perform the N multiplications
as above. This only works when N is even, so we'll assume from now on that
N is a power of two, so that it remains even throughout the recursion. The
recursion stops when N = 2 and we have p_0 + p_1 x to be evaluated at 1 and
-1, with the results p_0 + p_1 and p_0 - p_1.
The number of multiplications used satisfies the "fundamental divide-and-conquer"
recurrence

    M(N) = 2M(N/2) + N,

which has the solution M(N) = N lg N. This is a substantial improvement
over the straightforward N^2 method for interpolation but, of course, it works
only at the roots of unity.
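One way to see where the N lg N solution comes from is to unroll the
recurrence (assuming N is a power of two and M(1) = 0):

    M(N) = 2M(N/2) + N = 4M(N/4) + 2N = 8M(N/8) + 3N = ...
         = N M(1) + N lg N = N lg N.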
This gives a method for transforming a polynomial from its representation
as N coefficients in the conventional manner to its representation in terms of
its values at the roots of unity. This conversion of the polynomial from the
first representation to the second is the Fourier transform, and the efficient
recursive calculation procedure that we have described is called the "fast"
Fourier transform (FFT). (These same techniques apply to more general
functions than polynomials. More precisely we're doing the "discrete" Fourier
transform.)
Interpolation at the Roots of Unity
Now that we have a fast way to evaluate polynomials at a specific set of points,
all that we need is a fast way to interpolate polynomials at those same points,
and we will have a fast polynomial multiplication method. Surprisingly, it
works out that, for the complex roots of unity, running the evaluation program
on a particular set of points will do the interpolation! This is a specific instance
of a fundamental "inversion" property of the Fourier transform, from which
many important mathematical results can be derived.
For our example with N = 8, the interpolation problem is to find the
polynomial
    r(x) = r_0 + r_1 x + r_2 x^2 + r_3 x^3 + r_4 x^4 + r_5 x^5 + r_6 x^6 + r_7 x^7

which has the values

    r(w_8^0) = s_0,  r(w_8^1) = s_1,  r(w_8^2) = s_2,  r(w_8^3) = s_3,
    r(w_8^4) = s_4,  r(w_8^5) = s_5,  r(w_8^6) = s_6,  r(w_8^7) = s_7.
As we've said before, the interpolation problem is the "inverse" of the evaluation
problem. When the points under consideration are the complex roots of
unity, this is literally true. If we let

    s(x) = s_0 + s_1 x + s_2 x^2 + s_3 x^3 + s_4 x^4 + s_5 x^5 + s_6 x^6 + s_7 x^7,

then we can get the coefficients

    r_0, r_1, r_2, r_3, r_4, r_5, r_6, r_7

just by evaluating the polynomial s(x) at the inverses of the complex roots of
unity

    W_8^{-1}: w_8^0, w_8^{-1}, w_8^{-2}, w_8^{-3}, w_8^{-4}, w_8^{-5}, w_8^{-6}, w_8^{-7},

which is the same sequence as the complex roots of unity, but in a different
order:

    W_8^{-1}: w_8^0, w_8^7, w_8^6, w_8^5, w_8^4, w_8^3, w_8^2, w_8^1.
In other words, we can use exactly the same routine for interpolation as
for evaluation: only a simple rearrangement of the points to be evaluated is
required.
The proof of this fact requires some elementary manipulations with finite
sums: those unfamiliar with such manipulations may wish to skip to the end
of this paragraph. Evaluating s(x) at the inverse of the t-th Nth root of unity
gives

    s(w_N^{-t}) = Σ_{0≤j<N} s_j (w_N^{-t})^j
                = Σ_{0≤j<N} r(w_N^j) (w_N^{-t})^j
                = Σ_{0≤j<N} Σ_{0≤i<N} r_i (w_N^j)^i (w_N^{-t})^j
                = Σ_{0≤i<N} r_i Σ_{0≤j<N} w_N^{(i-t)j}.

Nearly everything disappears in the last term because the inner sum is trivially
N if i = t; if i ≠ t then it evaluates to

    Σ_{0≤j<N} w_N^{(i-t)j} = (w_N^{(i-t)N} - 1)/(w_N^{i-t} - 1) = 0,

since w_N^{(i-t)N} = (w_N^N)^{i-t} = 1. Thus s(w_N^{-t}) = N r_t.
Note that an extra scaling factor of N arises. This is the "inversion theorem"
for the discrete Fourier transform, which says that the same method will
convert a polynomial both ways: between its representation as coefficients and
its representation as values at the complex roots of unity.
While the mathematics may seem complicated, the results indicated are
quite easy to apply: to interpolate a polynomial on the Nth roots of unity,
use the same procedure as for evaluation, using the interpolation values as
polynomial coefficients, then rearrange and scale the answers.
Implementation
Now we have all the pieces for a divide-and-conquer algorithm to multiply
two polynomials using only about N lg N operations. The general scheme is
to:
Evaluate the input polynomials at the (2N - 1)st roots of unity.
Multiply the two values obtained at each point.
Interpolate to find the result by evaluating the polynomial defined by
the numbers just computed at the (2N - 1)st roots of unity.
The description above can be directly translated into a program which uses a
procedure that can evaluate a polynomial of degree N - 1 at the Nth roots
of unity. Unfortunately, all the arithmetic in this algorithm is to be complex
arithmetic, and Pascal has no built-in type complex. While it is possible
to have a user-defined type for the complex numbers, it is then necessary
to also define procedures or functions for all the arithmetic operations on
the numbers, and this obscures the algorithm unnecessarily. The following
implementation assumes a type complex for which the obvious arithmetic
functions are defined:
    eval(p, outN, 0);
    eval(q, outN, 0);
    for i:=0 to outN do r[i]:=p[i]*q[i];
    eval(r, outN, 0);
    for i:=1 to N do
      begin t:=r[i]; r[i]:=r[outN+1-i]; r[outN+1-i]:=t end;
    for i:=0 to outN do r[i]:=r[i]/(outN+1);
This program assumes that the global variable outN has been set to 2N-1,
and that p, q, and r are arrays indexed from 0 to 2N - 1 which hold complex
numbers. The two polynomials to be multiplied, p and q, are of degree N - 1,
and the other coefficients in those arrays are initially set to 0. The procedure
eval replaces the coefficients of the polynomial given as the first argument by
the values obtained when the polynomial is evaluated at the roots of unity.
The second argument specifies the degree of the polynomial (one less than the
number of coefficients and roots of unity) and the third argument is described
below. The above code computes the product of p and q and leaves the result
in r.
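For concreteness, one plausible way to supply such a type (an illustrative
assumption, not the only choice) is a record with procedures for the arithmetic,
since standard Pascal functions cannot return record values; expressions such
as p[i]*q[i] above would then become procedure calls:

    type complex = record re, im: real end;

    procedure cadd(a, b: complex; var c: complex);    { c := a + b }
      begin c.re:=a.re+b.re; c.im:=a.im+b.im end;

    procedure csub(a, b: complex; var c: complex);    { c := a - b }
      begin c.re:=a.re-b.re; c.im:=a.im-b.im end;

    procedure cmul(a, b: complex; var c: complex);    { c := a * b }
      var t: complex;
      begin
      t.re:=a.re*b.re-a.im*b.im;
      t.im:=a.re*b.im+a.im*b.re;
      c:=t
      end;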
Now we are left with the implementation of eval. As we've seen before,
recursive programs involving arrays can be quite cumbersome to implement. It
turns out that for this algorithm it is possible to get around the usual storage
management problem by reusing the storage in a clever way. What we would
like to do is have a recursive procedure that takes as input a contiguous array
of N + 1 coefficients and returns the N + 1 values in the same array. But
the recursive step involves processing two noncontiguous arrays: the odd and
even coefficients. On reflection, the reader will see that the "perfect shuffle"
of the previous chapter is exactly what is needed here. We can get the
coefficients of p_e in a contiguous subarray (the first half) and the coefficients
of p_o in a contiguous subarray (the second half) by doing a "perfect unshuffle" of
the input, as diagramed below for N = 15:
    p0  p1  p2  p3  p4  p5  p6  p7  p8  p9  p10 p11 p12 p13 p14 p15

    p0  p2  p4  p6  p8  p10 p12 p14 p1  p3  p5  p7  p9  p11 p13 p15
This leads to the following implementation of the FFT:
    procedure eval(var p: poly; N, k: integer);
      var i, j: integer;
          t: poly;
          z, p1: complex;
      begin
      if N=1 then
        begin
        z:=p[k]; p1:=p[k+1];
        p[k]:=z+p1; p[k+1]:=z-p1
        end
      else
        begin
        { unshuffle: coefficients of p_e into the first half, p_o into the second }
        for i:=0 to N div 2 do
          begin
          j:=k+2*i;
          t[i]:=p[j]; t[i+1+N div 2]:=p[j+1]
          end;
        for i:=0 to N do p[k+i]:=t[i];
        eval(p, N div 2, k);
        eval(p, N div 2, k+1+N div 2);
        { combine the two half-size transforms }
        j:=(outN+1) div (N+1);
        for i:=0 to N div 2 do
          begin
          z:=w[i*j]*p[k+(N div 2)+1+i];
          t[i]:=p[k+i]+z; t[i+(N div 2)+1]:=p[k+i]-z
          end;
        for i:=0 to N do p[k+i]:=t[i]
        end
      end;
This program transforms the polynomial of degree N in place in the subarray
p[k..k+N] using the recursive method outlined above. (For simplicity, the
code assumes that N+1 is a power of two, though this dependence is not hard
to remove.) If N = 1, then the easy computation to evaluate at 1 and -1 is
performed. Otherwise the procedure first shuffles, then recursively calls itself
to transform the two halves, then combines the results of these computations
as described above. Of course, the actual values of the complex roots of unity
are needed to do the implementation. It is well known that

    w_N^k = cos(2πk/N) + i sin(2πk/N),

so these values are easily computed using conventional trigonometric functions.
In the above program, the array w is assumed to hold the (outN+1)st roots of
unity. To get the roots of unity needed, the program selects from this array
at an interval determined by the variable j. For example, if outN were 15,
the fourth roots of unity would be found in w[0], w[4], w[8], and w[12]. This
eliminates the need to recompute roots of unity each time they are used.
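A sketch of that initialization, under the same record representation of the
complex type sketched earlier (the variable pi is assumed to have been set,
for example with pi:=4*arctan(1.0)):

    for i:=0 to outN do
      begin
      w[i].re:=cos(2*pi*i/(outN+1));
      w[i].im:=sin(2*pi*i/(outN+1))
      end;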
As mentioned at the outset, the scope of applicability of the FFT is far
greater than can be indicated here; and the algorithm has been intensively
used and studied in a variety of domains. Nevertheless, the fundamental
principles of operation in more advanced applications are the same as for the
polynomial multiplication problem that we've discussed here. The FFT is
a classic example of the application of the "divide-and-conquer" algorithm
design paradigm to achieve truly significant computational savings.
Exercises
1. Explain how you would improve the simple evaluate-multiply-interpolate
algorithm for multiplying together two polynomials p(x) and q(x) with
known roots p_0, p_1, ..., p_{N-1} and q_0, q_1, ..., q_{N-1}.
2. Find a set of N real numbers at which a polynomial of degree N can be
evaluated using substantially fewer than N^2 operations.
3. Find a set of N real numbers at which a polynomial of degree N can be
interpolated using substantially fewer than N^2 operations.
4. What is the value of w_N^M for M > N?
5. Would it be worthwhile to multiply sparse polynomials using the FFT?
6. The FFT implementation has three calls to eval, just as the polynomial
multiplication procedure in Chapter 4 has three calls to mult. Why is the
FFT implementation more efficient?
7. Give a way to multiply two complex numbers together using fewer than
four integer multiplication operations.
8. How much storage would be used by the FFT if we didn't circumvent the
"storage management problem" with the perfect shuffle?
9. Why can't some technique like the perfect shuffle be used to avoid the
problems with dynamically declared arrays in the polynomial multiplication
procedure of Chapter 4?
10. Write an efficient program to multiply a polynomial of degree N by a
polynomial of degree M (not necessarily powers of two).
37. Dynamic Programming
The principle of divide-and-conquer has guided the design of many of
the algorithms we've studied: to solve a large problem, break it up into
smaller problems which can be solved independently. In dynamic programming
this principle is carried to an extreme: when we don't know exactly which
smaller problems to solve, we simply solve them all, then store the answers
away to be used later in solving larger problems.
There are two principal difficulties with the application of this technique.
First, it may not always be possible to combine the solutions of two problems
to form the solution of a larger one. Second, there may be an unacceptably
large number of small problems to solve. No one has precisely characterized
which problems can be effectively solved with dynamic programming; there are
certainly many "hard" problems for which it does not seem to be applicable
(see Chapters 39 and 40), as well as many "easy" problems for which it is
less efficient than standard algorithms. However, there is a certain class of
problems for which dynamic programming is quite effective. We'll see several
examples in this section. These problems involve looking for the "best" way to
do something, and they have the general property that any decision involved
in finding the best way to do a small subproblem remains a good decision even
when that subproblem is included as a piece of some larger problem.
Knapsack Problem
Suppose that a thief robbing a safe finds N items of varying size and value
that he could steal, but has only a small knapsack of capacity M which he
can use to carry the goods. The knapsack problem is to find the combination
of items which the thief should choose for his knapsack in order to maximize
the total take. For example, suppose that he has a knapsack of capacity 17
and the safe contains many items of each of the following sizes and values:
name A B C D E
size 3 4 7 8 9
value 4 5 10 11 13
(As before, we use single letter names for the items in the example and integer
indices in the programs, with the knowledge that more complicated names
could be translated to integers using standard searching techniques.) Then
the thief could take five A's (but not six) for a total take of 20, or he could
fill up his knapsack with a D and an E for a total take of 24, or he could try
many other combinations.
Obviously, there are many commercial applications for which a solution
to the knapsack problem could be important. For example, a shipping company
might wish to know the best way to load a truck or cargo plane with
items for shipment. In such applications, other variants to the problem might
arise: for example, there might be a limited number of each kind of item
available. Many such variants can be handled with the same approach that
we're about to examine for solving the basic problem stated above.
In a dynamic programming solution to the knapsack problem, we calculate
the best combination for all knapsack sizes up to M. It turns out that we
can perform this calculation very efficiently by doing things in an appropriate
order, as in the following program:
    for j:=1 to N do
      for i:=1 to M do
        if i-size[j]>=0 then
          if cost[i]<(cost[i-size[j]]+val[j]) then
            begin
            cost[i]:=cost[i-size[j]]+val[j];
            best[i]:=j
            end;
In this program, cost[i] is the highest value that can be achieved with a
knapsack of capacity i and best [i] is the last item that was added to achieve
that maximum (this is used to recover the contents of the knapsack, as
described below). First, we calculate the best that we can do for all knapsack
sizes when only items of type A are taken, then we calculate the best that we
can do when only A's and B's are taken, etc. The solution reduces to a simple
calculation for cost[i]. Suppose an item j is chosen for the knapsack: then the
best value that could be achieved for the total would be val[j] (for the item)
plus cost[i-size[j]] (to fill up the rest of the knapsack). If this value exceeds
the best value that can be achieved without an item j, then we update cost[i]
and best[i]; otherwise we leave them alone. A simple induction proof shows
that this strategy solves the problem.
The following table traces the computation for our example. The first
pair of lines shows the best that can be done (the contents of the cost and
best arrays) with only A's, the second pair of lines shows the best that can be
done with only A's and B's, etc.:
      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
      0  0  4  4  4  8  8  8 12 12 12 16 16 16 20 20 20
            A  A  A  A  A  A  A  A  A  A  A  A  A  A  A
      0  0  4  5  5  8  9 10 12 13 14 16 17 18 20 21 22
            A  B  B  A  B  B  A  B  B  A  B  B  A  B  B
      0  0  4  5  5  8 10 10 12 14 15 16 18 20 20 22 24
            A  B  B  A  C  B  A  C  C  A  C  C  A  C  C
      0  0  4  5  5  8 10 11 12 14 15 16 18 20 21 22 24
            A  B  B  A  C  D  A  C  C  A  C  C  D  C  C
      0  0  4  5  5  8 10 11 13 14 15 17 18 20 21 23 24
            A  B  B  A  C  D  E  C  C  E  C  C  D  E  C
Thus the highest value that can be achieved with a knapsack of size 17 is 24.
In order to compute this result, we also solved many smaller subproblems.
For example, the highest value that can be achieved with a knapsack of size
16 using only A's, B's, and C's is 22.
The actual contents of the optimal knapsack can be computed with the
aid of the best array. By definition, best[M] is included, and the remaining
contents are the same as for the optimal knapsack of size M-size[best[M]].
Therefore, best[M-size[best[M]]] is included, and so forth. For our example,
best[17]=C, then we find another type C item at size 10, then a type A item
at size 3.
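This recovery is easy to sketch in Pascal (assuming that cost[0..M] was
initialized to zero before the program above was run, and using a hypothetical
function name that translates an item index back to its letter):

    i:=M;
    while (i>0) and (cost[i]>0) do
      begin
      write(name(best[i]));
      i:=i-size[best[i]]
      end;
    writeln;

For the example this prints C, C, A, the contents found above.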
It is obvious from inspection of the code that the running time of this
algorithm is proportional to NM. Thus, it will be fine if M is not large,
but could become unacceptable for large capacities. In particular, a crucial
point that should not be overlooked is that the method does not work at all if
M and the sizes or values are, for example, real numbers instead of integers.
This is more than a minor annoyance: it is a fundamental difficulty. No good
solution is known for this problem, and we'll see in Chapter 40 that many
people believe that no good solution exists. To appreciate the difficulty of the
problem, the reader might wish to try solving the case where the values are
all 1, the size of the jth item is √j, and M is N/2.
But when capacities, sizes and values are all integers, we have the fundamental
principle that optimal decisions, once made, do not need to be
changed. Once we know the best way to pack knapsacks of any size with the
first j items, we do not need to reexamine those problems, regardless of what
the next items are. Any time this general principle can be made to work,
dynamic programming is applicable.
In this algorithm, only a small amount of information about previous
optimal decisions needs to be saved. Different dynamic programming applications
have widely different requirements in this regard: we'll see other examples
below.
Matrix Chain Product
Suppose that six matrices A, B, C, D, E, and F, with dimensions 4-by-2,
2-by-3, 3-by-1, 1-by-2, 2-by-2, and 2-by-3 respectively,
are to be multiplied together. Of course, for the multiplications to be valid,
the number of columns in one matrix must be the same as the number of rows
in the next. But the total number of scalar multiplications involved depends
on the order in which the matrices are multiplied. For example, we could
proceed from left to right: multiplying A by B, we get a 4-by-3 matrix after
using 24 scalar multiplications. Multiplying this result by C gives a 4-by-1
matrix after 12 more scalar multiplications. Multiplying this result by D gives
a 4-by-2 matrix after 8 more scalar multiplications. Continuing in this way,
we get a 4-by-3 result after a grand total of 84 scalar multiplications. But if
we proceed from right to left instead, we get the same 4-by-3 result with only
69 scalar multiplications.
Many other orders are clearly possible. The order of multiplication can be
expressed by parenthesization: for example the left-to-right order described
above is the ordering (((((A*B)*C)*D)*E)*F), and the right-to-left order is
(A*(B*(C*(D*(E*F))))). Any legal parenthesization will lead to the correct
answer, but which leads to the fewest scalar multiplications?
Very substantial savings can be achieved when large matrices are involved:
for example, if matrices B, C, and F in the example above were to each have
a dimension of 300 where their dimension is 3, then the left-to-right order
will require 6024 scalar multiplications but the right-to-left order will use an
astronomical 274,200. (In these calculations we're assuming that the standard
method of matrix multiplication is used. Strassen's or some similar method
could save some work for large matrices, but the same considerations about
the order of multiplications apply. Thus, multiplying a p-by-q matrix by
a q-by-r matrix will produce a p-by-r matrix, each entry computed with q
multiplications, for a total of pqr multiplications.)
In general, suppose that N matrices

    M_1 M_2 M_3 ... M_N

are to be multiplied together, where the matrices satisfy the constraint that
M_i has r_i rows and r_{i+1} columns for 1 ≤ i ≤ N. Our task is to find the
order of multiplying the matrices that minimizes the total number of
multiplications used. Certainly trying all possible orderings is impractical.
(The number of orderings is a well-studied combinatorial function called the
Catalan number: the number of ways to parenthesize N variables is about
4^{N-1}/(N√(πN)).) But it is certainly worthwhile to expend some effort to
find a good solution because N is generally quite small compared to the
number of multiplications to be done.
As above, the dynamic programming solution to this problem involves
working "bottom up," saving computed answers to small partial problems to
avoid recomputation. First, there's only one way to multiply M_1 by M_2, M_2
by M_3, ..., M_{N-1} by M_N; we record those costs. Next, we calculate the best
way to multiply successive triples, using all the information computed so far.
For example, to find the best way to multiply M_1 M_2 M_3, first we find the cost
of computing M_1 M_2 from the table that we saved and then add the cost of
multiplying that result by M_3. This total is compared with the cost of first
multiplying M_2 M_3 then multiplying by M_1, which can be computed in the
same way. The smaller of these is saved, and the same procedure followed for
all triples. Next, we calculate the best way to multiply successive groups of
four, using all the information gained so far. By continuing in this way we
eventually find the best way to multiply together all the matrices.
In general, for 1 ≤ j ≤ N - 1, we can find the minimum cost of computing

    M_i M_{i+1} ... M_{i+j}

for 1 ≤ i ≤ N - j by finding, for each k between i and i + j, the cost of
computing M_i M_{i+1} ... M_{k-1} and M_k M_{k+1} ... M_{i+j} and then adding the cost
of multiplying these results together. Since we always break a group into two
smaller groups, the minimum costs for the two groups need only be looked
up in a table, not recomputed. In particular, if we maintain an array with
entries cost[l, r] giving the minimum cost of computing M_l M_{l+1} ... M_r, then
the cost of the first group above is cost[i, k-1] and the cost of the second
group is cost[k, i+j]. The cost of the final multiplication is easily determined:
M_i M_{i+1} ... M_{k-1} is an r_i-by-r_k matrix, and M_k M_{k+1} ... M_{i+j} is an r_k-by-r_{i+j+1}
matrix, so the cost of multiplying these two is r_i r_k r_{i+j+1}. This gives a way
to compute cost[i, i+j] for 1 ≤ i ≤ N-j with j increasing from 1 to N - 1.
When we reach j = N - 1 (and i = 1), then we've found the minimum cost of
computing M_1 M_2 ... M_N, as desired. This leads to the following program:
    for i:=1 to N do
      for j:=i+1 to N do cost[i, j]:=maxint;
    for i:=1 to N do cost[i, i]:=0;
    for j:=1 to N-1 do
      for i:=1 to N-j do
        for k:=i+1 to i+j do
          begin
          t:=cost[i, k-1]+cost[k, i+j]+r[i]*r[k]*r[i+j+1];
          if t<cost[i, i+j] then
            begin cost[i, i+j]:=t; best[i, i+j]:=k end;
          end;
As above, we need to keep track of the decisions made in a separate array
best for later recovery when the actual sequence of multiplications is to be
generated.
The following table is derived in a straightforward way from the cost and
best arrays for the sample problem given above:
           B       C       D       E       F
    A     24      14      22      26      36
         A B     A B     C D     C D     C D
    B              6      10      14      22
                 B C     C D     C D     C D
    C                      6      10      19
                         C D     C D     C D
    D                              4      10
                                 D E     E F
    E                                     12
                                         E F
For example, the entry in row A and column F says that 36 scalar multiplications
are required to multiply matrices A through F together, and that this can
be achieved by multiplying A through C in the optimal way, then multiplying
D through F in the optimal way, then multiplying the resulting matrices
together. (Only D is actually in the best array: the optimal splits are indicated
by pairs of letters in the table for clarity.) To find how to multiply A through
C in the optimal way, we look in row A and column C, etc. The following
program implements this process of extracting the optimal parenthesization
from the cost and best arrays computed by the program above:
    procedure order(i, j: integer);
      begin
      if i=j then write(name(i)) else
        begin
        write('(');
        order(i, best[i, j]-1); write('*'); order(best[i, j], j);
        write(')')
        end
      end;
For our example, the parenthesization computed is ((A*(B*C))*((D*E)*F))
which, as mentioned above, requires only 36 scalar multiplications. For the
example cited earlier with the dimensions of 3 in B, C and F changed to 300,
the same parenthesization is optimal, requiring 2412 scalar multiplications.
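In terms of the program above, the dimensions for this example correspond to
the array r[1..7]; the literal assignments below are just one way to set it up,
and calling order(1, 6) then prints the optimal parenthesization:

    { M[i] is r[i]-by-r[i+1]: A is 4-by-2, B is 2-by-3, C is 3-by-1,
      D is 1-by-2, E is 2-by-2, F is 2-by-3 }
    r[1]:=4; r[2]:=2; r[3]:=3; r[4]:=1; r[5]:=2; r[6]:=2; r[7]:=3;
    order(1, 6); writeln;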
The triple loop in the dynamic programming code leads to a running time
proportional to N^3 and the space required is proportional to N^2, substantially
more than we used for the knapsack problem. But this is quite palatable
compared to the alternative of trying all 4^{N-1}/(N√(πN)) possibilities.
Optimal Binary Search Trees
In many applications of searching, it is known that the search keys may occur
with widely varying frequency. For example, a program which checks the
spelling of words in English text is likely to look up words like "and" and "the"
far more often than words like "dynamic" and "programming." Similarly,
a Pascal compiler is likely to see keywords like "end" and "do" far more
often than "label" or "downto." If binary tree searching is used, it is clearly
advantageous to have the most frequently sought keys near the top of the tree.
A dynamic programming algorithm can be used to determine how to arrange
the keys in the tree so that the total cost of searching is minimized.
Each node in the following binary search tree on the keys A through G is
labeled with an integer which is assumed to be proportional to its frequency
of access:

                      C(1)
                    /      \
               A(4)          F(2)
                   \        /    \
                   B(2)  D(3)    G(1)
                             \
                             E(5)
That is, out of every 18 searches in this tree, we expect 4 to be for A, 2 to
be for B, 1 to be for C, etc. Each of the 4 searches for A requires two node
accesses, each of the 2 searches for B requires 3 node accesses, and so forth.
We can compute a measure of the "cost" of the tree by simply multiplying
the frequency for each node by its distance to the root and summing. This is
the weighted internal path length of the tree. For the example tree above, the
weighted internal path length is 4*2 + 2*3 + 1*1 + 3*3 + 5*4 + 2*2 + 1*3 = 51.
We would like to find the binary search tree for the given keys with the given
frequencies that has the smallest internal path length over all such trees.
This problem is similar to the problem of minimizing weighted external
path length that we saw in studying Huffman encoding, but in Huffman
encoding it was not necessary to maintain the order of the keys: in the binary
search tree, we must preserve the property that all nodes to the left of the
root have keys which are less, etc. This requirement makes the problem very
similar to the matrix chain multiplication problem treated above: virtually
the same program can be used.
Specifically, we assume that we are given a set of search keys K_1 < K_2 <
... < K_N and associated frequencies r_1, r_2, ..., r_N, where r_i is the anticipated
frequency of reference to key K_i. We want to find the binary search tree that
minimizes the sum, over all keys, of these frequencies times the distance of
the key from the root (the cost of accessing the associated node).
We proceed exactly as for the matrix chain problem: we compute, for
each j increasing from 1 to N - 1, the best way to build a subtree containing
K_i, K_{i+1}, ..., K_{i+j} for 1 ≤ i ≤ N-j. This computation is done by trying each
node as the root and using precomputed values to determine the best way to
do the subtrees. For each k between i and i + j, we want to find the optimal
tree containing K_i, K_{i+1}, ..., K_{i+j} with K_k at the root. This tree is formed
by using the optimal tree for K_i, K_{i+1}, ..., K_{k-1} as the left subtree and the
optimal tree for K_{k+1}, K_{k+2}, ..., K_{i+j} as the right subtree. The internal path
length of this tree is the sum of the internal path lengths for the two subtrees
plus the sum of the frequencies for all the nodes (since each node is one step
further from the root in the new tree). This leads to the following program:
    for i:=1 to N do
      for j:=i+1 to N+1 do cost[i, j]:=maxint;
    for i:=1 to N do cost[i, i]:=f[i];
    for i:=1 to N+1 do cost[i, i-1]:=0;
    for j:=1 to N-1 do
      for i:=1 to N-j do
        begin
        for k:=i to i+j do
          begin
          t:=cost[i, k-1]+cost[k+1, i+j];
          if t<cost[i, i+j] then
            begin cost[i, i+j]:=t; best[i, i+j]:=k end;
          end;
        t:=0; for k:=i to i+j do t:=t+f[k];
        cost[i, i+j]:=cost[i, i+j]+t;
        end;
Note that the sum of all the frequencies would be added to any cost so it is not
needed when looking for the minimum. Also, we must have cost[i, i-1]=0 to
cover the possibility that a node could just have one son (there was no analog
to this in the matrix chain problem).
As before, a short recursive program is required to recover the actual tree
from the best array computed by the program. For the example given above,
the optimal tree computed is
D3
A" YB2;
C'
ES 2 F
73G'
which has a weighted internal path length of 41.
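A sketch of one such program, which prints each key with its depth in the
optimal tree rather than drawing it (the procedure name and output format
are illustrative only; it is called as printtree(1, N, 1)):

    procedure printtree(i, j, depth: integer);
      begin
      if i=j then writeln(name(i), ' at depth ', depth)
      else if i<j then
        begin
        writeln(name(best[i, j]), ' at depth ', depth);
        printtree(i, best[i, j]-1, depth+1);
        printtree(best[i, j]+1, j, depth+1)
        end
      end;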
As above, this algorithm requires time proportional to N^3 since it works
with a matrix of size N^2 and spends time proportional to N on each entry.
It is actually possible in this case to reduce the time requirement to N^2 by
taking advantage of the fact that the optimal position for the root of a tree
can't be too far from the optimal position for the root of a slightly smaller
tree, so that k doesn't have to range over all the values from i to i + j in the
program above.
Shortest Paths
In some cases, the dynamic programming formulation of a method to solve
a problem produces a familiar algorithm. For example, Warshall's algorithm
(given in Chapter 32) for finding the transitive closure of a directed graph
follows directly from a dynamic programming formulation. To show this,
we'll consider the more general all-pairs shortest paths problem: given a graph
with vertices {1, 2, ..., V} determine the shortest distance from each vertex
to every other vertex.
Since the problem calls for V^2 numbers as output, the adjacency matrix
representation for the graph is obviously appropriate, as in Chapters 31 and
32. Thus we'll assume our input to be a V-by-V array a of edge weights, with
a[i, j]:=w if there is an edge from vertex i to vertex j of weight w. If a[i, j]=
a[j, i] for all i and j then this could represent an undirected graph, otherwise it
represents a directed graph. Our task is to find the directed path of minimum
weight connecting each pair of vertices. One way to solve this problem is to
simply run the shortest path algorithm of Chapter 31 for each vertex, for a
total running time proportional to V^3. An even simpler algorithm with the
same performance can be derived from a dynamic programming approach.
The dynamic programming algorithm for this problem follows directly
from our description of Warshall's algorithm in Chapter 32. We compute,
1 5 k 5 N, the shortest path from each vertex to each other vertex which
uses only vertices from {1,2,. . . , k}. The shortest path from vertex i to vertex
j using only vertices from 1,2, . . . , k is either the shortest path from vertex i
to vertex j using only vertices from 1,2,. . . , k - 1 or a path composed of the
shortest path from vertex i to vertex k using only vertices from 1,2, . . . , k - 1
and the shortest path from vertex k to vertex j using only vertices from
1,2,. . . , k - 1. This leads immediately to the following program.
    for y:=1 to V do
      for x:=1 to V do
        if a[x, y]<>maxint div 2 then
          for j:=1 to V do
            if a[x, j]>(a[x, y]+a[y, j]) then
              a[x, j]:=a[x, y]+a[y, j];
The value maxint div 2 is used as a sentinel in matrix positions corresponding
to edges not present in the graph. This eliminates the need to test explicitly
in the inner loop whether there is an edge from x to j or from y to j. A "small"
sentinel value is used so that there will be no overflow.
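The initialization assumed here can be sketched as follows (variable names as
above; the edge assignments are indicated only schematically):

    for x:=1 to V do
      for j:=1 to V do
        if x=j then a[x, j]:=0
        else a[x, j]:=maxint div 2;
    { then, for each edge from x to j of weight w in the graph: a[x, j]:=w }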
This is virtually the same program that we used to compute the transitive
closure of a directed graph: logical operations have been replaced by arithmetic
operations. The following table shows the adjacency matrix before and after
this algorithm is run on directed graph example of Chapter 32, with all edge
weights set to 1:
A B C D E F G H I J K L M
A 0 1 1 1
B 0
C l 0
D 0 1
E 1 0
F 1 0
G 1 1 0 1
H 1 0 1
I 1 0
J 0 1 1 1
K 0
L 1 0 1
M 1 0
A B C D E F G H I J K L M
A 0 1 2 3 2 1 1 2 33 3
B 0
C l 2 0 4 3 2 2 3 4 4 4
D 0 2 1
E 1 0 2
F 2 1 0
G2312130 1 2 2 2
H 3 4 2 3 2 4 1 0 1 2 3 3 3
1 4 5 3 4 3 5 2 1 0 3 4 4 4
J 4 5 3 4 3 5 2 0 1 1 1
K 0
L 3 4 2 3 2 4 1 2 3 0 1
M 4 5 3 4 3 5 2 3 4 1 0
Thus the shortest path from M to B is of length 5, etc. Note that, for this
algorithm, the weight corresponding to the edge between a vertex and itself
is 0. Except for this, if we consider nonzero entries as 1 bits, we have exactly
the bit matrix produced by the transitive closure algorithm of Chapter 32.
From a dynamic programming standpoint, note that the amount of information
saved about small subproblems is nearly the same as the amount
of information to be output, so little space is wasted.
One advantage of this algorithm over the shortest paths algorithm of
Chapter 31 is that it works properly even if negative edge weights are allowed,
as long as there are no cycles of negative weight in the graph (in which case
the shortest paths connecting nodes on the cycle are not defined). If a cycle
of negative weight is present in the graph, then the algorithm can detect that
fact, because in that case a[i, i] will become negative for some i at some point
during the algorithm.
Time and Space Requirements
The above examples demonstrate that dynamic programming applications can
have quite different time and space requirements depending on the amount of
information about small subproblems that must be saved. For the shortest
paths algorithm, no extra space was required; for the knapsack problem,
space proportional to the size of the knapsack was needed; and for the other
problems N2 space was needed. For each problem, the time required was a
factor of N greater than the space required.
The range of possible applicability of dynamic programming is far larger
than covered in the examples. From a dynamic programming point of view,
divide-and-conquer recursion could be thought of as a special case in which
a minimal amount of information about small cases must be computed and
stored, and exhaustive search (which we'll examine in Chapter 39) could be
thought of as a special case in which a maximal amount of information about
small cases must be computed and stored. Dynamic programming is a natural
design technique that appears in many guises to solve problems throughout
this range.
Exercises
1. In the example given for the knapsack problem, the items are sorted by
size. Does the algorithm still work properly if they appear in arbitrary
order?
2. Modify the knapsack program to take into account another constraint
defined by an array num [1..N] which contains the number of items of
each type that are available.
3. What would the knapsack program do if one of the values were negative?
4. True or false: If a matrix chain involves a 1-by-k by k-by-1 multiplication,
then there is an optimal solution for which that multiplication is last.
Defend your answer.
5. Write a program which actually multiplies together N matrices in an optimal
way. Assume that the matrices are stored in a three-dimensional array
matrices[1..Nmax, 1..Dmax, 1..Dmax], where Dmax is the maximum
dimension, with the ith matrix stored in matrices[i, 1..r[i], 1..r[i+1]].
6. Draw the optimal binary search tree for the example in the text, but with
all the frequencies increased by 1.
7. Write the program omitted from the text for actually constructing the
optimal binary search tree.
8. Suppose that we've computed the optimum binary search tree for some
set of keys and frequencies, and say that one frequency is incremented by
1. Write a program to compute the new optimum tree.
9. Why not solve the knapsack problem in the same way as the matrix chain
and optimum binary search tree problems, by minimizing, for k from 1
to M, the sum of the best value achievable for a knapsack of size k and
the best value achievable for a knapsack of size M-k?
10. Extend the program for the shortest paths problem to include a procedure
paths(i, j: integer) that will fill an array path with the shortest path from
i to j. This procedure should take time proportional to the length of the
path each time it is called, using an auxiliary data structure built up by
a modified version of the program given in the text.
38. Linear Programming
Many practical problems involve complicated interactions between a
number of varying quantities. One example of this is the network flow
problem discussed in Chapter 33: the flows in the various pipes in the network
must obey physical laws over a rather complicated network. Another example
is scheduling various tasks in (say) a manufacturing process in the face of
deadlines, priorities, etc. Very often it is possible to develop a precise mathematical
formulation which captures the interactions involved and reduces
the problem at hand to a more straightforward mathematical problem. This
process of deriving a set of mathematical equations whose solution implies the
solution of a given practical problem is called mathematical programming. In
this section, we consider a fundamental variant of mathematical programming,
linear programming, and an efficient algorithm for solving linear programs, the
simplex method.
Linear programming and the simplex method are of fundamental importance
because a wide variety of important problems are amenable to formulation
as linear programs and efficient solution by the simplex method. Better
algorithms are known for some specific problems, but few problem-solving
techniques are as widely applicable as the process of first formulating the
problem as a linear program, then computing the solution using the simplex
method.
Research in linear programming has been extensive, and a full understanding
of all the issues involved requires mathematical maturity somewhat
beyond that assumed for this book. On the other hand, some of the basic ideas
are easy to comprehend, and the actual simplex algorithm is not difficult to
implement, as we'll see below. As with the fast Fourier transform in Chapter
36, our intent is not to provide a full practical implementation, but rather
to learn some of the basic properties of the algorithm and its relationship to
other algorithms that we've studied.
Linear Programs
Mathematical programs involve a set of variables related by a set of mathematical
equations called constraints and an objective function involving the
variables that is to be maximized subject to the constraints. If all of
the equations involved are simply linear combinations of the variables, we
have the special case that we're considering called linear programming. The
"programming" necessary to solve any particular problem involves choosing
the variables and setting up the equations so that a solution to the equations
corresponds to a solution to the problem. This is an art that we won't pursue
in any further detail, except to look at a few examples. (The "programming"
that we'll be interested in involves writing Pascal programs to find solutions
to the mathematical equations.)
The following linear program corresponds to the network flow problem
that we considered in Chapter 33.
    Maximize x_AB + x_AD
    subject to the constraints

        x_AB ≤ 8,    x_CD ≤ 2,
        x_AD ≤ 2,    x_CF ≤ 4,
        x_BC ≤ 6,    x_EB ≤ 3,
        x_DE ≤ 5,    x_EF ≤ 5,

        x_AB + x_EB = x_BC,
        x_AD + x_CD = x_DE,
        x_EF + x_EB = x_DE,
        x_CD + x_CF = x_BC,

        x_AB, x_AD, x_BC, x_CD, x_CF, x_DE, x_EB, x_EF ≥ 0.
There is one variable in this linear program corresponding to the flow in
each of the pipes. These variables satisfy two types of equations: inequalities,
corresponding to capacity constraints on the pipes, and equalities, corresponding
to flow constraints at every junction. Thus, for example, the inequality
x_AB ≤ 8 says that pipe AB has capacity 8, and the equation x_AB + x_EB = x_BC
says that the inflow must equal the outflow at junction B. Note that adding all
the equalities together gives the implicit constraint x_AB + x_AD = x_CF + x_EF
which says that the inflow must equal the outflow for the whole network. Also,
of course, all of the flows must be positive.
This is clearly a mathematical formulation of the network flow problem: a
solution to this particular mathematical problem is a solution to the particular
instance of the network flow problem. The point of this example is not
that linear programming will provide a better algorithm for this problem,
but rather that linear programming is a quite general technique that can be
applied to a variety of problems. For example, if we were to generalize the
network flow problem to include costs as well as capacities, or whatever, the
linear programming formulation would not look much different, even though
the problem might be significantly more difficult to solve directly.
Not only are linear programs richly expressive but also there exists an
algorithm for solving them (the simplex algorithm) which has proven to be
quite efficient for many problems arising in practice. For some problems
(such as network flow) there may be an algorithm specifically oriented to that
problem which can perform better than linear programming/simplex; for other
problems (including various extensions of network flow), no better algorithms
are known. Even if there is a better algorithm, it may be complicated or
difficult to implement, while the procedure of developing a linear program
and solving it with a simplex library routine is often quite straightforward.
This "general-purpose" aspect of the method is quite attractive and has led
to its widespread use. The danger in relying upon it too heavily is that it may
lead to inefficient solutions for some simple problems (for example, many of
those for which we have studied algorithms in this book).
Geometric Interpretation
Linear programs can be cast in a geometric setting. The following linear
program is easy to visualize because only two variables are involved.
    Maximize x1 + x2
    subject to the constraints

        -x1 +  x2 ≤  5,
         x1 + 4x2 ≤ 45,
        2x1 +  x2 ≤ 27,
        3x1 - 4x2 ≤ 24,
           x1, x2 ≥ 0.
It corresponds to the following diagram:
Each inequality defines a halfplane in which any solution to the linear program
must lie. For example, x1 ≥ 0 means that any solution must lie to the right
of the x2 axis, and -x1 + x2 ≤ 5 means that any solution must lie below and
to the right of the line -x1 + x2 = 5 (which goes through (0,5) and (5,10)).
Any solution to the linear program must satisfy all of these constraints, so the
region defined by the intersection of all these halfplanes (shaded in the diagram
above) is the set of all possible solutions. To solve the linear program we must
find the point within this region which maximizes the objective function.
It is always the case that a region defined by intersecting halfplanes is
convex (we've encountered this before, in one of the definitions of the convex
hull in Chapter 25). This convex region, called the simplex, forms the basis
for an algorithm to find the solution to the linear program which maximizes
the objective function.
A fundamental property of the simplex, which is exploited by the algorithm,
is that the objective function is maximized at one of the vertices of the
simplex: thus only these points need to be examined, not all the points inside.
To see why this is so for our example, consider the dotted line at the right,
which corresponds to the objective function. The objective function can be
thought of as defining a line of known slope (in this case -1) and unknown
position. We're interested in the point at which the line hits the simplex, as it
is moved in from infinity. This point is the solution to the linear program: it
satisfies all the inequalities because it is in the simplex, and it maximizes the
objective function because no points with larger values were encountered. For
our example, the line hits the simplex at (9,9) which maximizes the objective
function at 18.
Other objective functions correspond to lines of other slopes, but always
the maximum will occur at one of the vertices of the simplex. The algorithm
that we'll examine below is a systematic way of moving from vertex to vertex
in search of the maximum. In two dimensions, there's not much choice about
what to do, but, as we'll see, the simplex is a much more complicated object
when more variables are involved.
From the geometric representation, one can also appreciate why mathematical
programs involving nonlinear functions are so much more difficult
to handle. For example, if the objective function is nonlinear, it could be a
curve that could strike the simplex along one of its edges, not at a vertex. If
the inequalities are also nonlinear, quite complicated geometric shapes which
correspond to the simplex could arise.
Geometric intuition makes it clear that various anomalous situations can
arise. For example, suppose that we add the inequality x1 ≥ 13 to the linear
program in the example above. It is quite clear from the diagram above that
in this case the intersection of the half-planes is empty. Such a linear program
is called infeasible: there are no points which satisfy the inequalities, let alone
one which maximizes the objective function. On the other hand the inequality
x1 ≤ 13 is redundant: the simplex is entirely contained within its halfplane,
so it is not represented in the simplex. Redundant inequalities do not affect
the solution at all, but they need to be dealt with during the search for the
solution.
A more serious problem is that the simplex may be an open (unbounded)
region, in which case the solution may not be well-defined. This would be the
case for our example if the second and third inequalities were deleted. Even if
the simplex is unbounded the solution may be well-defined for some objective
functions, but an algorithm to find it might have significant difficulty getting
around the unbounded region.
It must be emphasized that, though these problems are quite easy to
see when we have two variables and a few inequalities, they are very much
less apparent for a general problem with many variables and inequalities.
Indeed, detection of these anomalous situations is a significant part of the
computational burden of solving linear programs.
The same geometric intuition holds for more variables. In 3 dimensions
the simplex is a convex 3-dimensional solid defined by the intersection of
halfspaces defined by the planes whose equations are given by changing the
inequalities to equalities. For example, if we add the inequalities x3 5 4 and
x3 2 0 to the linear program above, the simplex becomes the solid object
diagramed below:
To make the example more three-dimensional, suppose that we change the
objective function to x1 + x2 + x3. This defines a plane perpendicular to the
line x1 = x2 = x3. If we move a plane in from infinity along this line, we
hit the simplex at the point (9,9,4) which is the solution. (Also shown in
the diagram is a path along the vertices of the simplex from (0,0,0) to the
solution, for reference in the description of the algorithm below.)
In n dimensions, we intersect halfspaces defined by (n - 1)-dimensional
hyperplanes to define the n-dimensional simplex, and bring in an (n - 1)-
dimensional hyperplane from infinity to intersect the simplex at the solution
point. As mentioned above, we risk oversimplification by concentrating on
intuitive two- and three-dimensional situations, but proofs of the facts above
involving convexity, intersecting hyperplanes, etc. involve a facility with
linear algebra somewhat beyond the scope of this book. Still, the geometric
intuition is valuable, since it can help us to understand the fundamental
characteristics of the basic method that is used in practice to solve higher-dimensional
problems.
The Simplex Method
Simplex is the name commonly used to describe a general approach to solving
linear programs by using pivoting, the same fundamental operation used in
Gaussian elimination. It turns out that pivoting corresponds in a natural way
to the geometric operation of moving from point to point on the simplex, in
search of the solution. The several algorithms which are commonly used differ
in essential details having to do with the order in which simplex vertices are
searched. That is, the well-known "algorithm" for solving this problem could
more precisely be described as a generic method which can be refined in any
of several different ways. We've encountered this sort of situation before, for
example Gaussian elimination or the Ford-Fulkerson algorithm.
First, as the reader surely has noticed, linear programs can take on
many different forms. For example, the linear program above for the network
flow problem has a mixture of equalities and inequalities, but the geometric
examples above use only inequalities. It is convenient to reduce the number
of possibilities somewhat by insisting that all linear programs be presented in
the same standard form, where all the equations are equalities except for an
inequality for each variable stating that it is nonnegative. This may seem like
a severe restriction, but actually it is not difficult to convert general linear
programs to this standard form. For example, the following linear program is
the standard form for the three-dimensional example given above:
Maximize x1 +x2 + x3
subject to the constraints
-x1 + x2 + y1 = 5
x1 + 4x2 + y2 = 45
2x1 + x2 + y3 = 27
3x1 - 4x2 + y4 = 24
x3 + y5 = 4
x1, x2, x3, y1, y2, y3, y4, y5 ≥ 0.
Each inequality involving more than one variable is converted into an equality
by introducing a new variable. The y's are called slack variables because they
take up the slack allowed by the inequality. Any inequality involving only one
variable can be converted to the standard nonnegative constraint simply by
renaming the variable. For example, a constraint such as x3 ≤ -1 would be
handled by replacing x3 by -1 - x3' (where x3' ≥ 0) everywhere that it appears.
This formulation makes obvious the parallels between linear programming
and simultaneous equations. We have N equations in M unknown variables,
all constrained to be positive. In this case, note that there are N slack
variables, one for each equation (since we started out with all inequalities).
We assume that M > N which implies that there are many solutions to
the equations: the problem is to find the one which maximizes the objective
function.
For our example, there is a trivial solution to the equations: take x1 =
x2 = x3 = 0, then assign appropriate values to the slack variables to satisfy the
equalities. This works because (0,0,0) is a point on the simplex. Although this
need not be the case in general, to explain the simplex method, we'll restrict
attention for now to linear programs where it is known to be the case. This
is still a quite large class of linear programs: for example, if all the numbers
on the right-hand side of the inequalities in the standard form of the linear
program are positive and slack variables all have positive coefficients (as in
our example) then there is clearly a solution with all the original variables
zero. Later we'll return to the general case.
Given a solution with M-N variables set to 0, it turns out that we can
find another solution with the same property by using a familiar operation,
pivoting. This is essentially the same operation used in Gaussian elimination:
an element a[p, q] is chosen in the matrix of coefficients defined by the equations,
then the pth row is multiplied by an appropriate scalar and added to all
other rows to make the qth column all 0 except for the entry in row p, which
is made 1. For example, consider the following matrix, which represents the
linear program given above:
-1.00 -1.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00
-1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 5.00
1.00 4.00 0.00 0.00 1.00 0.00 0.00 0.00 45.00
2.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 27.00
3.00 -4.00 0.00 0.00 0.00 0.00 1.00 0.00 24.00
0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 4.00
This (N + 1)-by-(M + 1) matrix contains the coefficients of the linear program
in standard form, with the (M + 1)st column containing the numbers on the
right-hand sides of the equations (as in Gaussian elimination), and the 0th row
containing the coefficients of the objective function, with the sign reversed.
The significance of the 0th row is discussed below; for now we'll treat it just
like all of the other rows.
For our example, we'll carry out all computations to two decimal places.
Obviously, issues such as computational accuracy and accumulated error are
just as important here as they are in Gaussian elimination.
The variables which correspond to a solution are called the basis variables
and those which are set to 0 to make the solution are called non-basis variables.
In the matrix, the columns corresponding to basis variables have exactly one 1
with all other values 0, while non-basis variables correspond to columns with
more than one nonzero entry.
Now, suppose that we wish to pivot this matrix for p = 4 and q = 1. That
is, an appropriate multiple of the fourth row is added to each of the other
rows to make the first column all 0 except for a 1 in row 4. This produces the
following result:
0.00 -2.33 -1.00 0.00 0.00 0.00 0.33 0.00 8.00
0.00 -0.33 0.00 1.00 0.00 0.00 0.33 0.00 13.00
0.00 5.33 0.00 0.00 1.00 0.00 -0.33 0.00 37.00
0.00 3.67 0.00 0.00 0.00 1.00 -0.67 0.00 11.00
1.00 -1.33 0.00 0.00 0.00 0.00 0.33 0.00 8.00
0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 4.00.
This operation removes the 7th column from the basis and adds the 1st column
to the basis. Exactly one basis column is removed because exactly one basis
column has a 1 in row p.
By definition, we can get a solution to the linear program by setting all the
non-basis variables to zero, then using the trivial solution given in the basis.
In the solution corresponding to the above matrix, both x2 and x3 are zero
because they are non-basis variables and x1 = 8, so the matrix corresponds
to the point (8,0,0) on the simplex. (We're not interested particularly in the
values of the slack variables.) Note that the upper right hand corner of the
matrix (row 0, column M + 1) contains the value of the objective function at
this point. This is by design, as we shall soon see.
Now suppose that we perform the pivot operation for p = 3 and q = 2:
0.00 0.00 -1.00 0.00 0.00 0.64 -0.09 0.00 15.00
0.00 0.00 0.00 1.00 0.00 0.09 0.27 0.00 14.00
0.00 0.00 0.00 0.00 1.00 -1.45 0.64 0.00 21.00
0.00 1.00 0.00 0.00 0.00 0.27 -0.18 0.00 3.00
1.00 0.00 0.00 0.00 0.00 0.36 0.09 0.00 12.00
0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 4.00
This removes column 6 from the basis and adds column 2. By setting non-basis
variables to 0 and solving for basis variables as before, we see that this matrix
corresponds to the point (12,3,0) on the simplex, for which the objective
function has the value 15. Note that the value of the objective function is
strictly increasing. Again, this is by design, as we shall soon see.
How do we decide which values of p and q to use for pivoting? This
is where row 0 comes in. For each non-basis variable, row 0 contains the
amount by which the objective function would increase if that variable were
changed from 0 to 1, with the sign reversed. (The sign is reversed so that the
standard pivoting operation will maintain row 0, with no changes.) Pivoting
using column q amounts to changing the value of the corresponding variable
from 0 to some positive value, so we can be sure the objective function will
increase if we use any column with a negative entry in row 0.
Now, pivoting on any row with a positive entry for that column will
increase the objective function, but we also must make sure that it will result
in a matrix corresponding to a point on the simplex. Here the central concern
is that one of the entries in column M + 1 might become negative. This can be
forestalled by finding, among the positive elements in column q (not including
row 0), the one that gives the smallest value when divided into the (M + 1)st
element in the same row. If we take p to be the index of the row containing
this element and pivot, then we can be sure that the objective function will
increase and that none of the entries in column M + 1 will become negative;
this is enough to ensure that the resulting matrix corresponds to a point on
the simplex.
There are two potential problems with this procedure for finding the
pivot row. First, what if there are no positive entries in column q? This is
an inconsistent situation: the negative entry in row 0 says that the objective
function can be increased, but there is no way to increase it. It turns out that
this situation arises if and only if the simplex is unbounded, so the algorithm
can terminate and report the problem. A more subtle difficulty arises in the
degenerate case when the (M + 1)st entry in some row (with a positive entry
in column q) is 0. Then this row will be chosen, but the objective function
will increase by 0. This is not a problem in itself: the problem arises when
there are two such rows. Certain natural policies for choosing between such
rows lead to cycling: an infinite sequence of pivots which do not increase the
objective function at all. Again, several possibilities are available for avoiding
cycling. One method is to break ties randomly. This makes cycling extremely
unlikely (but not mathematically impossible). Another anti-cycling policy is
described below.
We have been avoiding difficulties such as cycling in our example to make
the description of the method clear, but it must be emphasized that such
degenerate cases are quite likely to arise in practice. The generality offered by
using linear programming implies that degenerate cases of the general problem
will arise in the solution of specific problems.
In our example, we can pivot again with q = 3 (because of the -1 in row
0 and column 3) and p = 5 (because 1 is the only positive value in column 3).
This gives the following matrix:
0.00 0.00 0.00 0.00 0.00 0.64 -0.09 1.00 19.00
0.00 0.00 0.00 1.00 0.00 0.09 0.27 0.00 14.00
0.00 0.00 0.00 0.00 1.00 -1.45 0.64 0.00 21.00
0.00 1.00 0.00 0.00 0.00 0.27 -0.18 0.00 3.00
1.00 0.00 0.00 0.00 0.00 0.36 0.09 0.00 12.00
0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 4.00
This corresponds to the point (12,3,4) on the simplex, for which the value of
the objective function is 19.
In general, there might be several negative entries in row 0, and several
different strategies for choosing from among them have been suggested. We
have been proceeding according to one of the most popular methods, called
the greatest increment method: always choose the column with the smallest
value in row 0 (largest in absolute value). This does not necessarily lead to the
largest increase in the objective function, since scaling according to the row p
chosen has to be done. If this column selection policy is combined with the row
selection policy of using, in case of ties, the row that will result in the column
of lowest index being removed from the basis, then cycling cannot happen.
(This anticycling policy is due to R. G. Bland.) Another possibility for column
selection is to actually calculate the amount by which the objective function
would increase for each column, then use the column which gives the largest
result. This is called the steepest descent method. Yet another interesting
possibility is to choose randomly from among the available columns.
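As a concrete illustration of the column selection rule we have been following (a sketch only, not part of the program given later in the chapter; it assumes the same global array a and dimensions N and M used there), the scan of row 0 for the most negative entry might be written:

function selectq: integer;
var j, q: integer;
begin
q:=0;
for j:=1 to M do
  if a[0, j]<0 then
    begin
    if q=0 then q:=j;
    if a[0, j]<a[0, q] then q:=j
    end;
selectq:=q    { 0 means no negative entry: the optimum has been reached }
end;

(The simple loop in the implementation below settles for the first negative entry in row 0 rather than the most negative one; any column with a negative entry will do, in the sense that the objective function still increases.)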
Finally, after one more pivot at p = 2 and q = 7, we arrive at the solution:
0.00 0.00 0.00 0.00 0.14 0.43 0.00 1.00 22.00
0.00 0.00 0.00 1.00 -0.43 0.71 0.00 0.00 5.00
0.00 0.00 0.00 0.00 1.57 -2.29 1.00 0.00 33.00
0.00 1.00 0.00 0.00 0.29 -0.14 0.00 0.00 9.00
1.00 0.00 0.00 0.00 -0.14 0.57 0.00 0.00 9.00
0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 4.00
This corresponds to the point (9,9,4) on the simplex, which maximizes the
objective function at 22. All the entries in row 0 are nonnegative, so any pivot
will only serve to decrease the objective function.
The above example outlines the simplex method for solving linear programs.
In summary, if we begin with a matrix of coefficients corresponding
to a point on the simplex, we can do a series of pivot steps which move to
adjacent points on the simplex, always increasing the objective function, until
the maximum is reached.
There is one fundamental fact which we have not yet noted but is crucial
to the correct operation of this procedure: once we reach a point where no
single pivot can improve the objective function (a "local" maximum), then
we have reached the "global" maximum. This is the basis for the simplex
algorithm. As mentioned above, the proof of this (and many other facts
which may seem obvious from the geometric interpretation) in general is quite
beyond the scope of this book. But the simplex algorithm for the general
case operates in essentially the same manner as for the simple problem traced
above.
Implementation
The implementation of the simplex method for the case described above
is quite straightforward from the description. First, the requisite pivoting
procedure uses code similar to our implementation of Gaussian elimination in
Chapter 5:
procedure pivot(p, q: integer);
var j, k: integer;
begin
for j:=0 to N do
  for k:=M+1 downto 1 do
    if (j<>p) and (k<>q) then
      a[j, k]:=a[j, k]-a[p, k]*a[j, q]/a[p, q];
for j:=0 to N do if j<>p then a[j, q]:=0;
for k:=1 to M+1 do if k<>q then a[p, k]:=a[p, k]/a[p, q];
a[p, q]:=1
end;
This program adds multiples of row p to each row as necessary to make column
q all zero except for a 1 in row p, as described above. As in Chapter 5, it is
necessary to take care not to change the value of a[p, q] before we're done
using it.
In Gaussian elimination, we processed only rows below p in the matrix
during forward elimination and only rows above p during backward substitution
using the Gauss-Jordan method. A system of N linear equations in N
unknowns could be solved by calling pivot(i, i) for i ranging from 1 to N then
back down to 1 again.
The simplex algorithm, then, consists simply of finding the values of p
and q as described above and calling pivot, repeating the process until the
optimum is reached or the simplex is determined to be unbounded:
repeat
q:=0; repeat q:=q+1 until (q=M+1) or (a[0, q]<0);
p:=0; repeat p:=p+1 until (p=N+1) or (a[p, q]>0);
for i:=p+1 to N do
  if a[i, q]>0 then
    if (a[i, M+1]/a[i, q])<(a[p, M+1]/a[p, q]) then p:=i;
if (q<M+1) and (p<N+1) then pivot(p, q)
until (q=M+1) or (p=N+1);
If the program terminates with q=M+1 then an optimal solution has been
found: the value achieved for the objective function will be in a[0, M+1] and
the values for the variables can be recovered from the basis. If the program
terminates with p=N+1, then an unbounded situation has been detected.
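One way to carry out that recovery is sketched below (this is not part of the program above and the names are ours: it assumes the same global array a and dimensions N and M, relies on the fact that pivot leaves exact 0 and 1 entries in basis columns, and prints a value for each of the M variables, slack variables included):

procedure writesolution;
var i, j, r, count: integer;
begin
for j:=1 to M do
  begin
  r:=0; count:=0;
  for i:=1 to N do
    if a[i, j]<>0.0 then begin count:=count+1; r:=i end;
  { a basis column has a single nonzero entry, a 1, and a 0 in row 0; }
  { its variable takes the value on the right-hand side of that row   }
  if (count=1) and (a[r, j]=1.0) and (a[0, j]=0.0) then
    writeln('x', j, ' = ', a[r, M+1])
  else
    writeln('x', j, ' = ', 0.0)
  end
end;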
This program ignores the problem of cycle avoidance. To implement
Bland's method, it is necessary to keep track of the column that would leave
the basis, were a pivot to be done using row p. This is easily done by setting
outb[p]:=q after each pivot. Then the loop to calculate p can be modified
to set p:=i also if equality holds in the ratio test and outb[i]<outb[p].
Alternatively, the selection of a random element could be implemented by
generating a random integer x and replacing each array reference a[p, q]
(or a[i, q]) by a[(p+x)mod(N+1), q] (or a[(i+x)mod(N+1), q]). This has
the effect of searching through the column q in the same way as before,
but starting at a random point instead of the beginning. The same sort of
technique could be used to choose a random column (with a negative entry in
row 0) to pivot on.
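A sketch of the modified ratio-test loop is given below (an illustration, not code from the text; outb is a global integer array maintained by performing outb[p]:=q after each call to pivot, so that outb[i] names the basis column associated with row i):

for i:=p+1 to N do
  if a[i, q]>0 then
    if (a[i, M+1]/a[i, q])<(a[p, M+1]/a[p, q]) then p:=i
    else if ((a[i, M+1]/a[i, q])=(a[p, M+1]/a[p, q]))
            and (outb[i]<outb[p]) then p:=i;

Breaking ties in favor of the row whose basis column has the lowest index is exactly the anti-cycling tie-breaking rule described above.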
The program and example above treat a simple case that illustrates the
principle behind the simplex algorithm but avoids the substantial complications
that can arise in actual applications. The main omission is that the
program requires that the matrix have a feasible basis: a set of rows and
columns which can be permuted into the identity matrix. The program starts
with the assumption that there is a solution with the M - N variables appearing
in the objective function set to zero and that the N-by-N submatrix
involving the slack variables has been "solved" to make that submatrix the
identity matrix. This is easy to do for the particular type of linear program
that we stated (with all inequalities on positive variables), but in general we
need to find some point on the simplex. Once we have found one solution, we
can make appropriate transformations (mapping that point to the origin) to
bring the matrix into the required form, but at the outset we don't even know
whether a solution exists. In fact, it has been shown that detecting whether a
solution exists is as difficult computationally as finding the optimum solution,
given that one exists.
Thus it should not be surprising that the technique that is commonly used
to detect the existence of a solution is the simplex algorithm! Specifically, we
add another set of artificial variables s1, s2, ..., sN and add variable si to the
ith equation. This is done simply by adding N columns to the matrix, filled
with the identity matrix. Now, this gives immediately a feasible basis for this
new linear program. The trick is to run the above algorithm with the objective
function -s1 - s2 - ... - sN. If there is a solution to the original linear program,
then this objective function can be maximized at 0. If the maximum reached
is not zero, then the original linear program is infeasible. If the maximum
is zero, then the normal situation is that s1, s2, ..., sN all become non-basis
510 CRAPTER 38
variables, so we have computed a feasible basis for the original linear program.
In degenerate cases, some of the artificial variables may remain in the basis,
so it is necessary to do further pivoting to remove them (without changing
the cost).
To summarize, a two-phase process is normally used to solve general linear
programs. First, we solve a linear program involving the artificial s variables
to get a point on the simplex for our original problem. Then, we dispose of
the s variables and reintroduce our original objective function to proceed from
this point to the solution.
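The setup for the first phase might be sketched as follows (an illustration only, under the conventions of the program above and with names of our choosing: i and j are integer indices, the array a is assumed to have room for N extra columns, the original right-hand sides are in column M+1, and the original row 0 is assumed to have been saved elsewhere so that it can be restored for the second phase):

{ append the identity matrix in columns M+1..M+N, moving the }
{ right-hand sides over to column M+N+1                       }
for i:=1 to N do
  begin
  a[i, M+N+1]:=a[i, M+1];
  for j:=1 to N do
    if i=j then a[i, M+j]:=1.0 else a[i, M+j]:=0.0
  end;
{ install the phase-1 objective -s1-s2-...-sN (sign reversed in }
{ row 0), then subtract each constraint row from row 0 so that  }
{ the artificial basis columns again have 0 in row 0            }
for j:=1 to M do a[0, j]:=0.0;
for j:=1 to N do a[0, M+j]:=1.0;
a[0, M+N+1]:=0.0;
for i:=1 to N do
  for j:=1 to M+N+1 do a[0, j]:=a[0, j]-a[i, j];

With M then increased by N, the simplex loop above can be run unchanged; if it stops with a[0, M+1] equal to zero, the artificial variables can be dropped and the original objective restored for the second phase.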
The analysis of the running time of the simplex method is an extremely
complicated problem, and few results are available. No one knows the "best"
pivot selection strategy, because there are no results to tell us how many pivot
steps to expect, for any reasonable class of problems. It is possible to construct
artificial examples for which the running time of the simplex could be very
large (an exponential function of the number of variables). However, those
who have used the algorithm in practical settings are unanimous in testifying
to its efficiency in solving actual problems.
The simple version of the simplex algorithm that we've considered, while
quite useful, is merely part of a general and beautiful mathematical framework
providing a complete set of tools which can be used to solve a variety of very
important practical problems.
Exercises
1. Draw the simplex defined by the inequalities x1 ≥ 0, x2 ≥ 0, x3 ≥ 0,
   x1 + 2x2 ≤ 20, and x1 + x2 + x3 ≤ 10.
2. Give the sequence of matrices produced for the example in the text if the
   pivot column chosen is the largest q for which a[0, q] is negative.
3. Give the sequence of matrices produced for the example in the text for
   the objective function x1 + 5x2 + x3.
4. Describe what happens if the simplex algorithm is run on a matrix with
   a column of all 0's.
5. Does the simplex algorithm use the same number of steps if the rows of
   the input matrix are permuted?
6. Give a linear programming formulation of the example in the previous
   chapter for the knapsack problem.
7. How many pivot steps are required to solve the linear program "Maximize
   x1 + ... + xN subject to the constraints x1, ..., xN ≤ 1 and x1, ..., xN ≥ 0"?
8. Construct a linear program consisting of N inequalities on two variables
   for which the simplex algorithm requires at least N/2 pivots.
9. Give a three-dimensional linear programming problem which illustrates
   the difference between the greatest increment and steepest descent column
   selection methods.
10. Modify the implementation given in the text to actually write out the
    coordinates of the optimal solution point.
39. Exhaustive Search
Some problems involve searching through a vast number of potential
solutions to find an answer, and simply do not seem to be amenable to
solution by efficient algorithms. In this chapter, we'll examine some characteristics
of problems of this sort and some techniques which have proven to
be useful for solving them.
To begin, we should reorient our thinking somewhat as to exactly what
constitutes an "efficient" algorithm. For most of the applications that we
have discussed, we have become conditioned to think that an algorithm must
be linear or run in time proportional to something like N log N or N^(3/2) to
be considered efficient. We've generally considered quadratic algorithms to be
bad and cubic algorithms to be awful. But for the problems that we'll consider
in this and the next chapter, any computer scientist would be absolutely
delighted to know a cubic algorithm. In fact, even an N^50 algorithm would be
pleasing (from a theoretical standpoint) because these problems are believed
to require exponential time.
Suppose that we have an algorithm that takes time proportional to 2^N. If
we were to have a computer 1000 times faster than the fastest supercomputer
available today, then we could perhaps solve a problem for N = 50 in an
hour's time under the most generous assumptions about the simplicity of the
algorithm. But in two hours' time we could only do N = 51, and even in
a year's time we could only get to N = 59. And even if a new computer
were to be developed with a million times the speed, and we were to have
a million such computers available, we couldn't get to N = 100 in a year's
time. Realistically, we have to settle for N on the order of 25 or 30. A "more
efficient" algorithm in this situation may be one that could solve a problem
for N = 100 with a realistic amount of time and money.
The most famous problem of this type is the traveling salesman problem:
given a set of N cities, find the shortest route connecting them all, with no
city visited twice. This problem arises naturally in a number of important applications,
so it has been studied quite extensively. We'll use it as an example
in this chapter to examine some fundamental techniques. Many advanced
methods have been developed for this problem but it is still unthinkable to
solve an instance of the problem for N = 1000.
The traveling salesman problem is difficult because there seems to be no
way to avoid having to check the length of a very large number of possible
tours. To check each and every tour is exhaustive search: first we'll see how
that is done. Then we'll see how to modify that procedure to greatly reduce
the number of possibilities checked, by trying to discover incorrect decisions
as early as possible in the decision-making process.
As mentioned above, to solve a large traveling salesman problem is unthinkable,
even with the very best techniques known. As we'll see in the next
chapter, the same is true of many other important practical problems. But
what can be done when such problems arise in practice? Some sort of answer is
expected (the traveling salesman has to do something): we can't simply ignore
the existence of the problem or state that it's too hard to solve. At the end of
this chapter, we'll see examples of some methods which have been developed
for coping with practical problems which seem to require exhaustive search.
In the next chapter, we'll examine in some detail the reasons why no efficient
algorithm is likely to be found for many such problems.
Exhaustive Search in Graphs
If the traveling salesman is restricted to travel only between certain pairs of
cities (for example, if he is traveling by air), then the problem is directly
modeled by a graph: given a weighted (possibly directed) graph, we want to
find the shortest simple cycle that connects all the nodes.
This immediately brings to mind another problem that would seem to
be easier: given an undirected graph, is there any way to connect all the
nodes with a simple cycle? That is, starting at some node, can we "visit" all
the other nodes and return to the original node, visiting every node in the
graph exactly once? This is known as the Hamilton cycle problem. In the
next chapter, we'll see that it is computationally equivalent to the traveling
salesman problem in a strict technical sense.
In Chapters 30-32 we saw a number of methods for systematically visiting
all the nodes of a graph. For all of the algorithms in those chapters, it was
possible to arrange the computation so that each node is visited just once, and
this leads to very efficient algorithms. For the Hamilton cycle problem, such
a solution is not apparent: it seems to be necessary to visit each node many
times. For the other problems, we were building a tree: when a "dead end"
was reached in the search, we could start it up again, working on another
part of the tree. For this problem, the tree must have a particular structure
(a cycle): if we discover during the search that the tree being built cannot be
a cycle, we have to go back and rebuild part of it.
To illustrate some of the issues involved, we'll look at the Hamilton cycle
problem and the traveling salesman problem for the example graph from
Chapter 31:
Depth-first search would visit the nodes in this graph in the order A B C E
D F G (assuming an adjacency matrix or sorted adjacency list representation).
This is not a simple cycle: to find a Hamilton cycle we have to try another way
to visit the nodes. It turns out that we can systematically try all possibilities
with a simple modification to the visit procedure, as follows:
procedure visit(k: integer);
var t: integer;
begin
now:=now+1; val[k]:=now;
for t:=1 to V do
  if a[k, t] then
    if val[t]=0 then visit(t);
now:=now-1; val[k]:=0
end;
Rather than leaving every node that it touches marked with a nonzero
val entry, this procedure "cleans up after itself" and leaves now and the val
array exactly as it found them. The only marked nodes are those for which
visit hasn't completed, which correspond exactly to a simple path of length
now in the graph, from the initial node to the one currently being visited. To
visit a node, we simply visit all unmarked adjacent nodes (marked ones would
not correspond to a simple path). The recursive procedure checks all simple
paths in the graph which start at the initial node.
The following tree shows the order in which paths are checked by the
above procedure for the example graph given above. Each node in the tree
corresponds to a call of visit: thus the descendants of each node are adjacent
nodes which are unmarked at the time of the call. Each path in the tree from
a node to the root corresponds to a simple path in the graph:
Thus, the first path checked is A B C E D F. At this point all vertices adjacent
to F are marked (have non-zero val entries), so visit for F unmarks F and
returns. Then visit for D unmarks D and returns. Then visit for E tries F
which tries D, corresponding to the path A B C E F D. Note carefully that
in depth-first search F and D remain marked after they are visited, so that F
would not be visited from E. The "unmarking" of the nodes makes exhaustive
search essentially different from depth-first search, and the reader should be
sure to understand the distinction.
As mentioned above, now is the current length of the path being tried,
and val[k] is the position of node k on that path. Thus we can make the visit
procedure given above test for the existence of a Hamilton cycle by having
it test whether there is an edge from k to 1 when val[k]=V. In the example
above, there is only one Hamilton cycle, which appears twice in the tree,
traversed in both directions. The program can be made to solve the traveling
salesman problem by keeping track of the length of the current path in the
val array, then keeping track of the minimum of the lengths of the Hamilton
cycles found.
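For example, the Hamilton cycle test can be added to visit as in the following sketch (an illustration of the modification just described, not the program from the text: it assumes that the search is started at node 1 with visit(1) and that a global boolean found, initialized to false, records whether a cycle has been seen):

procedure visit(k: integer);
var t: integer;
begin
now:=now+1; val[k]:=now;
if now=V then
  if a[k, 1] then found:=true;   { a simple path containing all V nodes }
                                 { plus an edge back to node 1 is a cycle }
for t:=1 to V do
  if a[k, t] then
    if val[t]=0 then visit(t);
now:=now-1; val[k]:=0
end;

Carrying a running path length through the recursion in the same way, and keeping the minimum over all cycles found, turns this into the exhaustive-search solution of the traveling salesman problem mentioned above.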
Backtracking
The time taken by the exhaustive search procedure given above is proportional
to the number of calls to visit, which is the number of nodes in the exhaustive
search tree. For large graphs, this will clearly be very large. For example, if
the graph is complete (every node connected to every other node), then there
are V! simple cycles, one corresponding to each arrangement of the nodes.
(This case is studied in more detail below.) Next we'll examine techniques to
greatly reduce the number of possibilities tried. All of these techniques involve
adding tests to visit to discover that recursive calls should not be made for
certain nodes. This corresponds to pruning the exhaustive search tree: cutting
certain branches and deleting everything connected to them.
One important pruning technique is to remove symmetries. In the above
example, this is manifested by the fact that we find each cycle twice, traversed
in both directions. In this case, we can ensure that we find each cycle just
once by insisting that three particular nodes appear in a particular order. For
example, if we insist that node C appear after node A but before node B in
the example above, then we don't have to call visit for node B unless node C
is already on the path. This leads to a drastically smaller tree:
This technique is not always applicable: for example, suppose that we're trying
to find the minimum-cost path (not cycle) connecting all the vertices. In the
above example, A G E F D B C is a path which connects all the vertices, but
it is not a cycle. Now the above technique doesn't apply, since we can't know
in advance whether a path will lead to a cycle or not.
Another important pruning technique is to cut off the search as soon as it
is determined that it can't possibly be successful. For example, suppose that
we're trying to find the minimum cost path in the graph above. Once we've
found A F D B C E G, which has cost 11, it's fruitless, for example, to search
anywhere further along the path A G E B, since the cost is already 11. This
can be implemented simply by making no recursive calls in visit if the cost
of the current partial path is greater than the cost of the best full path found
so far. Certainly, we can't miss the minimum cost path by adhering to such
a policy.
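In the program, such a cutoff is a single extra test guarding the recursive call, as in this sketch (not the text's code; here cost is assumed to hold the length of the current partial path, best the length of the best full path found so far, and len[k, t] the weight of the edge between k and t):

for t:=1 to V do
  if a[k, t] then
    if (val[t]=0) and (cost+len[k, t]<best) then
      begin
      cost:=cost+len[k, t];
      visit(t);
      cost:=cost-len[k, t]
      end;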
The pruning will be more effective if a low-cost path is found early in
the search; one way to make this more likely is to visit the nodes adjacent
to the current node in order of increasing cost. In fact, we can do even
better: often, we can compute a bound on the cost of all full paths that begin
with a given partial path. For example, suppose that we have the additional
information that all edges in the diagram have a weight of at least 1 (this could
be determined by an initial scan through the edges). Then, for example, we
know that any full path starting with AG must cost at least 11, so we don't
have to search further along that path if we've already found a solution which
costs 11.
Each time that we cut off the search at a node, we avoid searching the
entire subtree below that node. For very large trees, this is a very substantial
savings. Indeed, the savings is so significant that it is worthwhile to do as much
as possible within visit to avoid making recursive calls. For our example, we
can get a much better bound on the cost of any full path which starts with the
partial path made up of the marked nodes by adding the cost of the minimum
spanning tree of the unmarked nodes. (The rest of the path is a spanning tree
for the unmarked nodes; its cost will certainly not be lower than the cost of
the minimum spanning tree of those nodes.) In particular, some paths might
divide the graph in such a way that the unmarked nodes aren't connected;
clearly we stop the search on such paths also. (This might be implemented by
returning an artificially high cost for the spanning tree.) For example, there
can't be any simple path that starts with ABE.
Drawn below is the search tree that results when all of these rules are
applied to the problem of finding the best Hamilton path in the sample graph
that we've been considering:
Again the tree is drastically smaller. It is important to note that the savings
achieved for this toy problem is only indicative of the situation for larger
problems. A cutoff high in the tree can lead to truly significant savings;
missing an obvious cutoff can lead to truly significant waste.
The general procedure of solving a problem by systematically generating
all possible solutions as described above is called backtracking. Whenever we
have a situation where partial solutions to a problem can be successively augmented
in many ways to produce a complete solution, a recursive implementation
like the program above may be appropriate. As above, the process
can be described by an exhaustive search tree whose nodes correspond to the
partial solutions. Going down in the tree corresponds to forward progress
towards creating a more complete solution; going up in the tree corresponds
to "backtracking" to some previously generated partial solution, from which
point it might be worthwhile to proceed forwards again. The general technique
of calculating bounds on partial solutions in order to limit the number of full
solutions which need to be examined is sometimes called branch-and-bound.
For another example, consider the knapsack problem of the previous
chapter, where the values are not necessarily restricted to be integers. For
this problem, the partial solutions are clearly some selection of items for the
knapsack, and backtracking corresponds to taking an item out to try some
other combination. Pruning the search tree by removing symmetries is quite
effective for this problem, since the order in which objects are put into the
knapsack doesn't affect the cost.
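A backtracking search of this kind might be sketched as follows (an illustration under assumed conventions, not code from the text: size and val are taken to be the item size and value arrays of the previous chapter, with an unlimited supply of each of the N item types, and cap is the remaining capacity; considering only items with index at least i removes the symmetry, since each selection is then generated in just one order):

function knap(i: integer; cap: real): real;
var t: integer; best, cur: real;
begin
best:=0.0;
for t:=i to N do
  if size[t]<=cap then
    begin
    { put in an item of type t, then fill the rest of the knapsack }
    cur:=val[t]+knap(t, cap-size[t]);
    if cur>best then best:=cur
    end;
knap:=best
end;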
Backtracking and branch-and-bound are quite widely applicable as general
problem-solving techniques. For example, they form the basis for many programs
which play games such as chess or checkers. In this case, a partial
solution is some legal positioning of all the pieces on the board, and the descendant
of a node in the exhaustive search tree is a position that can be
the result of some legal move. Ideally, it would be best if a program could
exhaustively search through all possibilities and choose a move that will lead
to a win no matter what the opponent does, but there are normally far too
many possibilities to do this, so a backtracking search is typically done with
quite sophisticated pruning rules so that only "interesting" positions are examined.
Exhaustive search techniques are also used for other applications in
artificial intelligence.
In the next chapter we'll see several other problems similar to those
we've been studying that can be attacked using these techniques. Solving
a particular problem involves the development of sophisticated criteria which
can be used to limit the search. For the traveling salesman problem we've
given only a few examples of the many techniques that have been tried,
and equally sophisticated methods have been developed for other important
problems.
However sophisticated the criteria, it is generally true that the running
time of backtracking algorithms remains exponential. Roughly, if each node
in the search tree has α sons, on the average, and the length of the solution
path is N, then we expect the number of nodes in the tree to be proportional
to α^N. Different backtracking rules correspond to reducing the value of α,
the number of choices to try at each node. It is worthwhile to expend effort
to do this because a reduction in α will lead to an increase in the size of the
problem that can be solved. For example, an algorithm which runs in time
proportional to 1.1^N can solve a problem perhaps eight times as large as one
which runs in time proportional to 2^N.
Digression: Permutation Generation
An interesting computational puzzle is to write a program that generates all
possible ways of rearranging N distinct items. A simple program for this
permutation generation problem can be derived directly from the exhaustive
search program above because, as noted above, if it is run on a complete graph,
then it must try to visit the vertices of that graph in all possible orders.
procedure visit(k: integer);
var t: integer;
begin
now:=now+1; val[k]:=now;
if now=V then writeperm;
for t:=1 to V do
  if val[t]=0 then visit(t);
now:=now-1; val[k]:=0
end;
This program is derived from the procedure above by eliminating all reference
to the adjacency matrix (since all edges are present in a complete graph). The
procedure writeperm simply writes out the entries of the val array. This is
done each time now=V, corresponding to the discovery of a complete path
in the graph. (Actually, the program can be improved somewhat by skipping
the for loop when now=V, since at that point it is known that all the val entries
are nonzero.) To print out all permutations of the integers 1 through N, we
invoke this procedure with the call visit(0) with now initialized to -1 and the
val array initialized to 0. This corresponds to introducing a dummy node to
the complete graph, and checking all paths in the graph starting with node
0. When invoked in this way for N=4, this procedure produces the following
output:
1 2 3 4
1 2 4 3
1 3 2 4
1 4 2 3
1 3 4 2
1 4 3 2
2 1 3 4
2 1 4 3
3 1 2 4
4 1 2 3
3 1 4 2
4 1 3 2
2 3 1 4
2 4 1 3
3 2 1 4
4 2 1 3
3 4 1 2
4 3 1 2
2 3 4 1
2 4 3 1
3 2 4 1
4 2 3 1
3 4 2 1
4 3 2 1
Admittedly, the interpretation of the procedure as generating paths in a
complete graph is barely visible. But a direct examination of the procedure
reveals that it generates all N! permutations of the integers 1 to N by
first generating all (N - 1)! permutations with the 1 in the first position
(calling itself recursively to place 2 through N), then generating the (N - 1)!
permutations with the 1 in the second position, etc.
Now, it would be unthinkable to use this program even for N = 16,
because 16! > 10^13. Still, it is important to study because it can form the basis
for a backtracking program to solve any problem involving reordering a set
of elements.
For example, consider the Euclidean traveling salesman problem: given
a set of N points in the plane, find the shortest tour that connects them
all. Since each ordering of the points corresponds to a legal tour, the above
program can be made to exhaustively search for the solution to this problem
simply by changing it to keep track of the cost of each tour and the minimum
of the costs of the full tours, in the same manner as above. Then the
same branch-and-bound technique as above can be applied, as well as various
backtracking heuristics specific to the Euclidean problem. (For example, it is
easy to prove that the optimal tour cannot cross itself, so the search can be cut
off on all partial paths that cross themselves.) Different search heuristics might
correspond to different ways of ordering the permutations. Such techniques
can save an enormous amount of work but always leave an enormous amount
of work to be done. It is not at all a simple matter to find an exact solution
to the Euclidean traveling salesman problem, even for N as low as 16.
Another reason that permutation generation is of interest is that there
are a number of related procedures for generating other combinatorial objects.
In some cases, the objects generated are not quite so numerous as
permutations, and such procedures can be useful for larger N in practice.
An example of this is a procedure to generate all ways of choosing a subset of
size k out of a set of N items. For large N and small k, the number of ways
of doing this is roughly proportional to N^k. Such a procedure could be used
as the basis for a backtracking program to solve the knapsack problem.
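Such a procedure might be sketched as follows (an illustration with names of our choosing, not code from the text: it prints every subset of size k of the integers 1 to N, using a global array chosen[1..k] to hold the current partial selection in increasing order):

procedure subsets(start, left: integer);
var t, j: integer;
begin
if left=0 then
  begin
  for j:=1 to k do write(chosen[j], ' ');
  writeln
  end
else
  for t:=start to N-left+1 do
    begin
    chosen[k-left+1]:=t;       { fill the next position, then recur }
    subsets(t+1, left-1)       { on the items that can still follow }
    end
end;

Invoking subsets(1, k) produces each of the subsets exactly once.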
Approximation Algorithms
Since finding the shortest tour seems to require so much computation, it is
reasonable to consider whether it might be easier to find a tour that is almost
as short as the shortest. If we're willing to relax the restriction that we
absolutely must have the shortest possible path, then it turns out that we can
deal with problems much larger than is possible with the techniques above.
For example, it's relatively easy to find a tour which is longer by at most a
factor of two than the optimal tour. The method is based on simply finding the
minimum spanning tree: this not only, as mentioned above, provides a lower
bound on the length of the tour but also turns out to provide an upper bound
on the length of the tour, as follows. Consider the tour produced by visiting
the nodes of the minimum spanning tree using the following procedure: to
process node x, visit x, then visit each son of x, applying this visiting procedure
recursively and returning to node x after each son has been visited, ending up
at node x. This tour traverses every edge in the spanning tree twice, so its
cost is twice the cost of the tree. It is not a simple tour, since a node may be
visited many times, but it can be converted to a simple tour simply by deleting
all but the first occurrence of each node. Deleting an occurrence of a node
corresponds to taking a shortcut past that node: certainly it can't increase
the cost of the tour. Thus, we have a simple tour which has a cost less than
twice that of the minimum spanning tree. For example, the following diagram
shows a minimum spanning tree for our set of sample points (computed as
described in Chapter 31) along with a corresponding simple tour.
This tour is clearly not the optimum, because it self-intersects. For a large
random point set, it seems likely that the tour produced in this way will
be close to the optimum, though no analysis has been done to support this
conclusion.
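The tour construction just described amounts to a preorder traversal of the spanning tree, printing each node at its first visit (the later visits of the twice-around walk are exactly the occurrences that get shortcut away). A sketch, under assumed conventions rather than the text's data structures, with the tree given by a boolean matrix tree[x, y] and nodes printed by number:

procedure tourvisit(x, parent: integer);
var t: integer;
begin
write(x, ' ');                 { output x at its first occurrence only }
for t:=1 to V do
  if tree[x, t] and (t<>parent) then
    tourvisit(t, x)            { walk down into each subtree and back }
end;

Calling tourvisit(r, 0) for any root r lists the nodes in the order the shortcut tour visits them; closing the tour back to r gives the factor-of-two guarantee discussed above.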
Another approach that has been tried is to develop techniques to improve
an existing tour in the hope that a short tour can be found by applying
such improvements repeatedly. For example, if we have (as above)
a Euclidean traveling salesman problem where graph distances are distances
between points in the plane, then a self-intersecting tour can be improved by
removing each intersection as follows. If the line AB intersects the line CD,
the situation can be diagramed as at left below, without loss of generality.
But it follows immediately that a shorter tour can be formed by deleting AB
and CD and adding AD and CB, as diagramed at right:
Applying this procedure successively will, given any tour, produce a tour that
is no longer and which is not self-intersecting. For example, the procedure
applied to the tour produced from the minimum spanning tree in the example
above gives the shorter tour AGOENLPKFJMBDHICA. In fact, one of the
most effective approaches to producing approximate solutions to the Euclidean
traveling salesman problem, developed by S. Lin, is to generalize the procedure
above to improve tours by switching around three or more edges in an existing
tour. Very good results have been obtained by applying such a procedure
successively, until it no longer leads to an improvement, to an initially random
tour. One might think that it would be better to start with a tour that is
already close to the optimum, but Lin's studies indicate that this may not be
the case.
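The basic exchange step can be sketched as follows (an illustration only, with conventions of our choosing: the tour is stored as an array tour[1..V] of point numbers, with tour[V+1] understood to be tour[1], and i < j index the two edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]) to be removed):

procedure exchange(i, j: integer);
var lo, hi, tmp: integer;
begin
{ reversing the segment tour[i+1..j] replaces the two old edges by }
{ (tour[i], tour[j]) and (tour[i+1], tour[j+1]), the swap pictured }
{ above                                                            }
lo:=i+1; hi:=j;
while lo<hi do
  begin
  tmp:=tour[lo]; tour[lo]:=tour[hi]; tour[hi]:=tmp;
  lo:=lo+1; hi:=hi-1
  end
end;

Applying it whenever the two new edges are together shorter than the two old ones (in particular, whenever the old edges cross) never lengthens the tour; Lin's method extends the same idea to exchanges of three or more edges.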
The various approaches to producing approximate solutions to the traveling
salesman problem which are described above are only indicative of the
types of techniques that can be used in order to avoid exhaustive search. The
brief descriptions above do not do justice to the many ingenious ideas that
have been developed: the formulation and analysis of algorithms of this type
is still a quite active area of research in computer science.
One might legitimately question why the traveling salesman problem and
the other problems that we have been alluding to require exhaustive search.
Couldn't there be a clever algorithm that finds the minimal tour as easily
and quickly as we can find the minimum spanning tree? In the next chapter
we'll see why most computer scientists believe that there is no such algorithm
and why approximation algorithms of the type discussed in this section must
therefore be studied.
Exercises
1. Which would you prefer to use, an algorithm that requires N^5 steps or
   one that requires 2^N steps?
2. Does the "maze" graph at the end of Chapter 29 have a Hamilton cycle?
3. Draw the tree describing the operation of the exhaustive search procedure
   when looking for a Hamilton cycle on the sample graph starting at vertex
   B instead of vertex A.
4. How long could exhaustive search take to find a Hamilton cycle in a graph
   where all nodes are connected to exactly two other nodes? Answer the
   same question for the case where all nodes are connected to exactly three
   other nodes.
5. How many calls to visit are made (as a function of V) by the permutation
   generation procedure?
6. Derive a nonrecursive permutation generation procedure from the program
   given.
7. Write a program which determines whether or not two given adjacency
   matrices represent the same graph, except with different vertex names.
8. Write a program to solve the knapsack problem of Chapter 37 when the
   sizes can be real numbers.
9. Define another cutoff rule for the Euclidean traveling salesman problem,
   and show the search tree that it leads to for the first six points of our
   sample point set.
10. Write a program to count the number of spanning trees of a set of N
    given points in the plane with no intersecting edges.
11. Solve the Euclidean traveling salesman problem for our sixteen sample
    points.
40. NP-complete Problems
The algorithms we've studied in this book generally are used to solve
practical problems and therefore consume reasonable amounts of resources.
The practical utility of most of the algorithms is obvious: for many
problems we have the luxury of several efficient algorithms to choose from.
Many of the algorithms that we have studied are routinely used to solve actual
practical problems. Unfortunately, as pointed out in the previous chapter,
many problems arise in practice which do not admit such efficient solutions.
What's worse, for a large class of such problems we can't even tell whether or
not an efficient solution might exist.
This state of affairs has been a source of extreme frustration for programmers
and algorithm designers, who can't find any efficient algorithm for
a wide range of practical problems, and for theoreticians, who have been unable
to find any reason why these problems should be difficult. A great deal
of research has been done in this area and has led to the development of
mechanisms by which new problems can be classified as being "as difficult as"
old problems in a particular technical sense. Though much of this work is
beyond the scope of this book, the central ideas are not difficult to learn. It
is certainly useful when faced with a new problem to have some appreciation
for the types of problems for which no one knows any efficient algorithm.
Sometimes there is quite a fine line between "easy" and "hard" problems.
For example, we saw an efficient algorithm in Chapter 31 for the following
problem: "Find the shortest path from vertex x to vertex y in a given weighted
graph." But if we ask for the longest path (without cycles) from x to y, we
have a problem for which no one knows a solution substantially better than
checking all possible paths. The fine line is even more striking when we
consider similar problems that ask for only "yes-no" answers:
Easy: Is there a path from x to y with weight ≤ M?
Hard(?): Is there a path from x to y with weight ≥ M?
Breadth-first search will lead to a solution for the first problem in linear time,
but all known algorithms for the second problem could take exponential time.
We can be much more precise than "could take exponential time," but
that will not be necessary for the present discussion. Generally, it is useful
to think of an exponential-time algorithm as one which, for some input of
size N, takes time proportional to 2^N (at least). (The substance of the
results that we're about to discuss is not changed if 2 is replaced by any
number α > 1.) This means, for example, that an exponential-time algorithm
could not be guaranteed to work for all problems of size 100 (say) or greater,
because no one could wait for an algorithm to take 2^100 steps, regardless of
the speed of the computer. Exponential growth dwarfs technological changes:
a supercomputer may be a trillion times faster than an abacus, but neither
can come close to solving a problem that requires 2^100 steps.
Deterministic and Nondeterministic Polynomial-Time Algorithms
The great disparity in performance between "efficient" algorithms of the type
we've been studying and brute-force "exponential" algorithms that check each
possibility makes it possible to study the interface between them with a simple
formal model.
In this model, the efficiency of an algorithm is a function of the number
of bits used to encode the input, using a "reasonable" encoding scheme. (The
precise definition of "reasonable" includes all common methods of encoding
things for computers: an example of an unreasonable coding scheme is unary,
where M bits are used to represent the number M. Rather, we would
expect that the number of bits used to represent the number M should be
proportional to log M.) We're interested merely in identifying algorithms
guaranteed to run in time proportional to some polynomial in the number of
bits of input. Any problem which can be solved by such an algorithm is said
to belong to
P : the set of all problems which can be solved by deterministic
algorithms in polynomial time.
By deterministic we mean that at any time, whatever the algorithm is doing,
there is only one thing that it could do next. This very general notion covers
the way that programs run on actual computers. Note that the polynomial
is not specified at all and that this definition certainly covers the standard
algorithms that we've studied so far. Sorting belongs to P because (for
example) insertion sort runs in time proportional to N^2: the existence of
N log N sorting algorithms is not relevant to the present discussion. Also, the
time taken by an algorithm obviously depends on the computer used, but it
turns out that using a different computer will affect the running time by only
a polynomial factor (again, assuming reasonable limits), so that also is not
particularly relevant to the present discussion.
Of course, the theoretical results that we're discussing are based on a
completely specified model of computation within which the general statements
that we're making here can be proved. Our intent is to examine some of
the central ideas, not to develop rigorous definitions and theorem statements.
The reader may rest assured that any apparent logical flaws are due to the
informal nature of the description, not the theory itself.
One "unreasonable" way to extend the power of a computer is to endow it
with the power of nondeterminism: when an algorithm is faced with a choice
of several options, it has the power to "guess" the right one. For the purposes
of the discussion below, we can think of an algorithm for a nondeterministic
machine as "guessing" the solution to a problem, then verifying that the
solution is correct. In Chapter 20, we saw how nondeterminism can be useful
as a tool for algorithm design; here we use it as a theoretical device to help
classify problems. We have
NP: the set of all problems which can be solved by nondeterministic
algorithms in polynomial time.
Obviously, any problem in P is also in NP. But it seems that there should be
many other problems in NP: to show that a problem is in NP, we need only
find a polynomial-time algorithm to check that a given solution (the guessed
solution) is valid. For example, the "yes-no" version of the longest-path
problem is in NP. Another example of a problem in NP is the satisfiability
problem. Given a logical formula of the form
(x1 + x̄3 + x̄5)*(x̄1 + x̄2 + x4)*(x̄3 + x4 + x̄5)*(x2 + x̄3 + x5)
where the xi's represent variables which take on truth values (true or false),
"+" represents or, "*" represents and, and x̄ represents not, the satisfiability
problem is to determine whether or not there exists an assignment of truth
values to the variables that makes the formula true ("satisfies" it). We'll see
below that this particular problem plays a special role in the theory.
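Membership of satisfiability in NP amounts to the observation that a guessed truth assignment can be checked quickly. The function below is a sketch of such a check, added for illustration (it is not from the text, and the type and identifier names are chosen only for this example). Each clause is held as three integers, with +i standing for xi and -i for x̄i; the running time is clearly proportional to the size of the formula.

const maxclause = 100; maxvar = 50;
type clause = array [1..3] of integer;
     formula = array [1..maxclause] of clause;
     assignment = array [1..maxvar] of boolean;
function satisfied(var f: formula; M: integer; var t: assignment): boolean;
  var i, j, lit: integer;
      clauseok, ok: boolean;
  begin
  ok := true;
  for i := 1 to M do
    begin
    clauseok := false;
    for j := 1 to 3 do
      begin
      lit := f[i][j];
      { a positive literal is true if its variable is true,
        a negative literal is true if its variable is false }
      if lit > 0 then
        begin if t[lit] then clauseok := true end
      else
        begin if not t[-lit] then clauseok := true end
      end;
    if not clauseok then ok := false
    end;
  satisfied := ok
  end;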
Nondeterminism is such a powerful operation that it seems almost absurd
to consider it seriously. Why bother considering an imaginary tool that
makes difficult problems seem trivial? The answer is that, powerful as nondeterminism
may seem, no one has been able to prove that it helps for any
particular problem! Put another way, no one has been able to find a single
example of a problem which can be proven to be in NP but not in P (or even
prove that one exists): we do not know whether or not P = NP. This is a
quite frustrating situation because many important practical problems belong
to NP (they could be solved efficiently on a non-deterministic machine) but
may or may not belong to P (we don't know any efficient algorithms for
them on a deterministic machine). If we could prove that a problem doesn't
belong to P, then we could abandon the search for an efficient solution to
it. In the absence of such a proof, there is the lingering possibility that some
efficient algorithm has gone undiscovered. In fact, given the current state
of our knowledge, it could be the case that there is some efficient algorithm
for every problem in NP, which would imply that many efficient algorithms
have gone undiscovered. Virtually no one believes that P = NP, and a considerable
amount of effort has gone into proving the contrary, but this remains
the outstanding open research problem in computer science.
NP-Completeness
Below we'll look at a list of problems that are known to belong to NP but
which might or might not belong to P. That is, they are easy to solve on a
non-deterministic machine, but, despite considerable effort, no one has been
able to find an efficient algorithm on a conventional machine (or prove that
none exists) for any of them. These problems have an additional property
that provides convincing evidence that P ≠ NP: if any of the problems can be
solved in polynomial time on a deterministic machine, then so can all problems
in NP (i.e., P = NP). That is, the collective failure of all the researchers
to find efficient algorithms for all of these problems might be viewed as a
collective failure to prove that P = NP. Such problems are said to be NP-complete.
It turns out that a large number of interesting practical problems
have this characteristic.
The primary tool used to prove that problems are NP-complete uses
the idea of polynomial reducibility. We show that any algorithm to solve a
new problem in NP can be used to solve some known NP-complete problem
by the following process: transform any instance of the known NP-complete
problem to an instance of the new problem, solve the problem using the given
algorithm, then transform the solution back to a solution of the NP-complete
problem. We saw an example of a similar process in Chapter 34, where we
reduced bipartite matching to network flow. By "polynomially" reducible,
we mean that the transformations can be done in polynomial time: thus the
existence of a polynomial-time algorithm for the new problem would imply the
existence of a polynomial-time algorithm for the NP-complete problem, and
this would (by definition) imply the existence of polynomial-time algorithms
for all problems in NP.
The concept of reduction provides a useful mechanism for classifying
algorithms. For example, to prove that a problem in NP is NP-complete,
we need only show that some known NP-complete problem is polynomially
reducible to it: that is, that a polynomial-time algorithm for the new problem
could be used to solve the NP-complete problem, and then could, in turn, be
used to solve all problems in NP. For an example of reduction, consider the
following two problems:
TRAVELING SALESMAN: Given a set of cities, and distances between
all pairs, find a tour of all the cities of distance less than M.
HAMILTON CYCLE: Given a graph, find a simple cycle that includes
all the vertices.
Suppose that we know the Hamilton cycle problem to be NP-complete and
we wish to determine whether or not the traveling salesman problem is also
NP-complete. Any algorithm for solving the traveling salesman problem
can be used to solve the Hamilton cycle problem, through the following
reduction: given an instance of the Hamilton cycle problem (a graph), construct
an instance of the traveling salesman problem (a set of cities, with distances
between all pairs) as follows: for cities for the traveling salesman use the set
of vertices in the graph; for distances between each pair of cities use 1 if there
is an edge between the corresponding vertices in the graph, 2 if there is no
edge. Then have the algorithm for the traveling salesman problem find a tour
of distance less than or equal to N, the number of vertices in the graph. That
tour must correspond precisely to a Hamilton cycle. An efficient algorithm for
the traveling salesman problem would also be an efficient algorithm for the
Hamilton cycle problem. That is, the Hamilton cycle problem reduces to the
traveling salesman problem, so the NP-completeness of the Hamilton cycle
problem implies the NP-completeness of the traveling salesman problem.
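In program form, the construction used in this reduction is little more than a pair of nested loops. The procedure below is a sketch for illustration only (the type and procedure names are hypothetical, not from the text): it takes the adjacency matrix a of an N-vertex graph, assumed symmetric, and fills in a distance matrix d for the traveling salesman problem.

const maxV = 50;
type matrix = array [1..maxV, 1..maxV] of integer;
procedure buildtsp(var a: matrix; N: integer; var d: matrix);
  var i, j: integer;
  begin
  for i := 1 to N do
    for j := 1 to N do
      { distance 1 for an edge of the graph, 2 for a "missing" edge }
      if a[i, j] = 1 then d[i, j] := 1 else d[i, j] := 2
  end;

A tour of total distance N must use N distances that are all equal to 1, so it traces out a cycle through all the vertices using only edges of the original graph.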
The reduction of the Hamilton cycle problem to the traveling salesman
problem is relatively simple because the problems are so similar. Actually,
polynomial-time reductions can be quite complicated indeed and can connect
problems which seem to be quite dissimilar. For example, it is possible to
reduce the satisfiability problem to the Hamilton cycle problem. Without
going into details, we can look at a sketch of the proof. We wish to show
that if we had a polynomial-time solution to the Hamilton cycle problem,
then we could get a polynomial-time solution to the satisfiability problem by
polynomial reduction. The proof consists of a detailed method of construction
showing how, given an instance of the satisfiability problem (a Boolean
formula), to construct (in polynomial time) an instance of the Hamilton cycle
problem (a graph) with the property that knowing whether the graph has a
Hamilton cycle tells us whether the formula is satisfiable. The graph is built
from small components (corresponding to the variables) which can be traversed
by a simple path in only one of two ways (corresponding to the truth or falsity
of the variables). These small components are attached together as specified
by the clauses, using more complicated subgraphs which can be traversed by
simple paths corresponding to the truth or falsity of the clauses. It is quite
a large step from this brief description to the full construction: the point
is to illustrate that polynomial reduction can be applied to quite dissimilar
problems.
Thus, if we were to have a polynomial-time algorithm for the traveling
salesman problem, then we would have a polynomial-time algorithm for the
Hamilton cycle problem, which would also give us a polynomial-time algorithm
for the satisfiability problem. Each problem that is proven NP-complete
provides another potential basis for proving yet another future problem NP-complete.
The proof might be as simple as the reduction given above from the
Hamilton cycle problem to the traveling salesman problem, or as complicated
as the transformation sketched above from the satisfiability problem to the
Hamilton cycle problem, or somewhere in between. Literally thousands of
problems have been proven to be NP-complete over the last ten years by
transforming one to another in this way.
Cook's Theorem
Reduction uses the NP-completeness of one problem to imply the NP-completeness
of another. There is one case where it doesn't apply: how was the
first problem proven to be NP-complete? This was done by S. A. Cook in
1971. Cook gave a direct proof that satisfiability is NP-complete: that if
there is a polynomial time algorithm for satisfiability, then all problems in
NP can be solved in polynomial time.
The proof is extremely complicated but the general method can be explained.
First, a full mathematical definition of a machine capable of solving
any problem in NP is developed. This is a simple model of a general-purpose
computer known as a Turing machine which can read inputs, perform certain
operations, and write outputs. A Turing machine can perform any computation
that any other general purpose computer can, using the same amount of
time (to within a polynomial factor), and it has the additional advantage that
it can be concisely described mathematically. Endowed with the additional
power of nondeterminism, a Turing machine can solve any problem in NP.
The next step in the proof is to describe each feature of the machine, including
the way that instructions are executed, in terms of logical formulas such
as appear in the satisfiability problem. In this way a correspondence is established
between every problem in NP (which can be expressed as a program on
the nondeterministic Turing machine) and some instance of satisfiability (the
translation of that program into a logical formula). Now, the solution to the
satisfiability problem essentially corresponds to a simulation of the machine
running the given program on the given input, so it produces a solution to an
instance of the given problem. Further details of this proof are well beyond
the scope of this book. Fortunately, only one such proof is really necessary:
it is much easier to use reduction to prove NP-completeness.
Some NP-Complete Problems
As mentioned above, literally thousands of diverse problems are known to be
NP-complete. In this section, we list a few for purposes of illustrating the
wide range of problems that have been studied. Of course, the list begins
with satisfiability and includes traveling salesman and Hamilton cycle, as well
as longest path. The following additional problems are representative:
PARTITION: Given a set of integers, can they be divided into two sets
whose sums are equal?
INTEGER LINEAR PROGRAMMING: Given a linear program, is there
a solution in integers?
MULTIPROCESSOR SCHEDULING: Given a deadline and a set of
tasks of varying length to be performed on two identical processors, can
the tasks be arranged so that the deadline is met?
VERTEX COVER: Given a graph and an integer N, is there a set of
fewer than N vertices which touches all the edges?
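Each of these belongs to NP for the same reason as satisfiability: a proposed solution can be checked in polynomial time. For instance, the following function is a sketch of such a check for VERTEX COVER (it is not from the text, and the names are hypothetical): given the adjacency matrix a of an undirected graph on V vertices, a proposed cover marked in the boolean array s, and the bound N, it verifies both conditions directly.

const maxV = 50;
type matrix = array [1..maxV, 1..maxV] of integer;
     vertexset = array [1..maxV] of boolean;
function goodcover(var a: matrix; V, N: integer; var s: vertexset): boolean;
  var i, j, count: integer;
      ok: boolean;
  begin
  count := 0;
  for i := 1 to V do if s[i] then count := count+1;
  { the set must be small enough and must touch every edge }
  ok := (count < N);
  for i := 1 to V do
    for j := i+1 to V do
      if (a[i, j] = 1) and not (s[i] or s[j]) then ok := false;
  goodcover := ok
  end;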
These and many related problems have important natural practical applications,
and there has been strong motivation for some time to find good algorithms
to solve them. The fact that no good algorithm has been found for any
of these problems is surely strong evidence that P ≠ NP, and most researchers
certainly believe this to be the case. (On the other hand, the fact that
no one has been able to prove that any of these problems does not belong to P
could be construed to comprise a similar body of circumstantial evidence on
the other side.) Whether or not P = NP, the practical fact is that we have at
present no algorithms that are guaranteed to solve any of the NP-complete
problems efficiently.
As indicated in the previous chapter, several techniques have been developed
to cope with this situation, since some sort of solution to these various
problems must be found in practice. One approach is to change the problem
and find an "approximation" algorithm that finds not the best solution but
a solution that is guaranteed to be close to the best. (Unfortunately, this is
sometimes not sufficient to fend off NP-completeness.) Another approach is
to rely on "average-time" performance and develop an algorithm that finds
the solution in some cases, but doesn't necessarily work in all cases. That is,
while it may not be possible to find an algorithm that is guaranteed to work
well on all instances of a problem, it may well be possible to solve efficiently
virtually all of the instances that arise in practice. A third approach is to work
with "efficient" exponential algorithms, using the backtracking techniques
described in the previous chapter. Finally, there is quite a large gap between
polynomial and exponential time which is not addressed by the theory. What
about an algorithm that runs in time proportional to N^(lg N) or 2^(√N)?
All of the application areas that we've studied in this book are touched
by NP-completeness: there are NP-complete problems in numerical applications,
in sorting and searching, in string processing, in geometry, and in graph
processing. The most important practical contribution of the theory of NP-completeness
is that it provides a mechanism to discover whether a new problem
from any of these diverse areas is "easy" or "hard." If one can find an
efficient algorithm to solve a new problem, then there is no difficulty. If not,
a proof that the problem is NP-complete at least gives the information that
the development of an efficient algorithm would be a stunning achievement
(and suggests that a different approach should perhaps be tried). The scores
of efficient algorithms that we've examined in this book are testimony that we
have learned a great deal about efficient computational methods since Euclid,
but the theory of NP-completeness shows that, indeed, we still have a great
deal to learn.
Exercises
1. Write a program to find the longest simple path from x to y in a given
weighted graph.
2. Could there be an algorithm which solves an NP-complete problem in
an average time of N log N, if P ≠ NP? Explain your answer.
3. Give a nondeterministic polynomial-time algorithm for solving the
PARTITION problem.
4. Is there an immediate polynomial-time reduction from the traveling
salesman problem on graphs to the Euclidean traveling salesman problem, or
vice versa?
5. What would be the significance of a program that could solve the traveling
salesman problem in time proportional to 1.1^N?
6. Is the logical formula given in the text satisfiable?
7. Could one of the "algorithm machines" with full parallelism be used to
solve an NP-complete problem in polynomial time, if P ≠ NP? Explain
your answer.
8. How does the problem "compute the exact value of 2^N" fit into the P-NP
classification scheme?
9. Prove that the problem of finding a Hamilton cycle in a directed graph is
NP-complete, using the NP-completeness of the Hamilton cycle problem
for undirected graphs.
10. Suppose that two problems are known to be NP-complete. Does this
imply that there is a polynomial-time reduction from one to the other, if
P ≠ NP?
SOURCES for Advanced Topics
Each of the topics covered in this section is the subject of volumes of
reference material. From our introductory treatment, the reader seeking more
information should anticipate engaging in serious study; we'll only be able to
indicate some basic references here.
The perfect shuffle machine of Chapter 35 is described in the 1968 paper
by Stone, which covers many other applications. One place to look for more
information on systolic arrays is the chapter by Kung and Leiserson in Mead
and Conway's book on VLSI. A good reference for applications and implementation
of the FFT is the book by Rabiner and Gold. Further information on
dynamic programming (and topics from other chapters) may be found in the
book by Hu. Our treatment of linear programming in Chapter 38 is based on
the excellent treatment in the book by Papadimitriou and Steiglitz, where all
the intuitive arguments are backed up by full mathematical proofs. Further
information on exhaustive search techniques may be found in the books by
Wells and by Reingold, Nievergelt, and Deo. Finally, the reader interested
in more information on NP-completeness may consult the survey article by
Lewis and Papadimitriou and the book by Garey and Johnson, which has a
full description of various types of NP-completeness and a categorized listing
of hundreds of NP-complete problems.
M. R. Garey and D. S. Johnson, Computers and Intractability: a Guide to the
Theory of NP-Completeness, Freeman, San Francisco, CA, 1979.
T. C. Hu, Combinatorial Algorithms, Addison-Wesley, Reading, MA, 1982.
H. R. Lewis and C. H. Papadimitriou, "The efficiency of algorithms," Scientific
American, 238, 1 (1978).
C. A. Mead and L. C. Conway, Introduction to VLSI Design, Addison-Wesley,
Reading, MA, 1980.
C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms
and Complexity, Prentice-Hall, Englewood Cliffs, NJ, 1982.
E. M. Reingold, J. Nievergelt, and N. Deo, Combinatorial Algorithms: Theory
and Practice, Prentice-Hall, Englewood Cliffs, NJ, 1982.
L. R. Rabiner and B. Gold, Digital Signal Processing, Prentice-Hall, Englewood
Cliffs, NJ, 1974.
H. S. Stone, "Parallel processing with the perfect shuffle," IEEE Transactions
on Computers, C-20, 2 (February, 1971).
M. B. Wells, Elements of Combinatorial Computing, Pergamon Press, Oxford,
1971.
Index
Abacus, 528.
Abstract data structures, 30, 88,
128, 136.
adapt (integration, adaptive
quadrature), 85.
Additive congruential generator
(randomint), 38-40.
add (polynomials represented
with linked lists), 27.
add (sparse polynomials), 28.
Adjacency lists, 378-381, 382-
383, 391-392, 410-411, 435.
Adjacency matrix, 377-378, 384,
410-411, 425, 435, 493, 515.
Adjacency structure; see adjacency
lists.
adjlist (graph input, adjacency
lists), 379.
adjmatrix (graph input, adjacency
matrix), 378.
Adleman, L., 301, 304.
Aho, A. V., 304.
Algorithm machines, 457-469.
All-nearest-neighbors, 366.
All-pairs shortest paths, 492-494.
Analysis of algorithms, 12-16, 19.
Approximation algorithms, 522-
524, 533.
Arbitrary numbers, 33.
Arithmetic, 23-30.
Arrays, 24.
Articulation points, 390-392,
430.
Artificial (slack) variables, 503,
509.
Attributes, 335.
Average case, 12-13.
AVL trees, 198.
B-trees, 228-231, 237.
Backtracking, 517-522.
Backward substitution, 60, 62
(substitute), 64.
Balanced multiway merging,
156-161.
Balanced trees, 187-199, 237,
355.
Basis variables, 504.
Batcher, K. E., 463-465.
Bayer, R., 228.
Bentley, J. L., 370.
Biconnectivity, 390-392, 429.
Binary search, 175-177, 176
(binarysearch), 336.
Binary search trees, 169, 178-
185, 204, 210, 336, 343-346,
353, 356-359.
array representation, 184-185.
indirect representation, 184-
185, 353.
optimal, 489-492.
standard representation, 178-
179.
weighted internal path length,
490.
Binary trees, 179, 237.
Binomial queues, 167.
Bipartite graphs, 444-447.
Bitonic merge, 463-465.
bits, 116, 118, 122, 214, 215, 221,
222.
Bland, R. G., 507.
Bland's method (for cycle
avoidance in simplex), 509.
Borodin, A., 88.
Bottom-up parsing, 275-276.
Boyer, R. S., 242, 304.
Boyer-Moore string searching,
250-251.
Branch-and-bound, 519-520.
Breadth-first search, 395, 397-
398, 439.
Brown, M. R., 167.
brutesearch (brute-force string
searching), 243.
bstdelete (binary search tree deletion),
185, 355.
bstinsert (binary search tree insertion),
184, 353, 355.
bstrange (one-dimensional range
search), 337, 355.
bubblesort, 99.
Caesar cipher, 297.
Catalan numbers, 487.
Chi-square (χ²) test (chisquare),
41-42.
Ciphers, 297-300.
Caesar, 297.
Vernam, 299.
Vigenere, 298.
product, 300.
Ciphertext, 297.
Clancy, M., 19.
Closest-pair problem, 362-366,
368.
Closest-point problems, 361-368,
370.
Closure, 258, 261.
Clustering, 207.
Comer, D., 237.
Compare-exchange, 93, 460-465.
Compilers, 247, 269, 276-279,
304.
Complete binary tree, 130.
Complete graphs, 376.
Complex numbers, 473-478.
Complex roots of unity, 473-477.
Computational accuracy, 61, 63,
86, 504.
Concatenation, 258, 261.
Connected components, 375.
Connected graph, 375.
Connectivity, 389-405, 454.
Conquer-and-divide, 152.
Constant running time, 14.
Constraints, 498.
Context-free grammars, 270-272.
Context-sensitive grammars, 272.
Convex hull, 321.
Convex hull algorithms, 321-333,
368, 370.
divide-and-conquer, 368.
Floyd-Eddy method, 331-332.
Graham scan, 326-330, 329
(grahamscan), 332.
hull selection, 331-332.
package wrapping, 323-326,
325 (wrap), 332.
Convex polygons, 321.
Convexity, 321.
Conway, L. C., 536.
Cook, S. A., 242, 532.
Cook's theorem (satisfiability is
NP-complete), 532-533.
Cooper, D., 19.
Counting, 455.
Cross edges, 423, 430.
Cryptanalysis, 295-296.
Cryptography, 295-296.
Cryptology, 295-302, 304.
Cryptosystem, 296.
Cryptovariables, 299.
Cubic running time, 15.
Curve fitting, 67-76.
Cycle, 375, 384.
Cycling in the simplex method,
506-507, 509.
Dags (directed acyclic graphs),
426-428.
Data fitting, 67-76.
Data structures.
abstract, 30, 128, 136.
adjacency lists, 378-381.
adjacency matrix, 377-378.
adjacency structure, 378-381
array, 24.
B-tree, 228-231, 237.
binary search tree, 178-185.
deque, 263-267.
heap, 129-140.
indirect binary search tree,
184-185.
indirect heap, 138-139.
linked list, 27-28, 202-203,
379.
priority queue, 127-140.
queue, 264, 395.
red-black tree, 192-199.
sorted list, 129.
stack, 109-110, 264, 394, 428,
429.
string, 241.
top-down 2-3-4 tree, 187-199.
unordered list, 129.
Database, 226, 237, 335.
Decryption, 297, 301.
Deletion in binary search trees,
183-184.
Deletion in hash tables, 208.
Dense graphs, 376, 378, 397-398,
411, 413, 415-417.
densepfs (priority graph traversal),
416, 439-440.
Deo, N., 536.
Depth-first search, 371, 381-387,
391-395, 397-399, 422-423,
428-430, 454, 515.
Depth-first search forest, 382,
384, 394, 422-423.
Derivation, 270.
Deterministic algorithm, 528.
dfs (recursive depth-first search),
382-385.
Dictionaries, 171.
Diffie, W., 301.
Digital search trees, 213-216.
digitalinsert, 215.
digitalsearch, 214.
Dijkstra's algorithm (for finding
the shortest path), 415.
Dijkstra, E. W., 410, 415, 454.
Directed acyclic graphs (dags),
426-428.
Directed cycle, 428.
Directed graphs, 376, 380, 421-
430.
Directed path, 423.
Directory, 233.
Discrete mathematics, 19.
Disk searching, 225-235.
Distribution counting, 99-101,
116, 122-123.
Divide-and-conquer, 48, 51, 104,
152, 175, 362, 474, 477-480,
483.
Divide-and-conquer recurrence,
51, 108, 149, 475, 363.
Dot product, 74.
Double buffering, 161.
Double hashing, 207-210.
Double rotation, 198.
Down edges, 423.
downheap (top-down heap
repair), 134.
Drawing lines, 310 (draw), 311.
Dual of Voronoi diagram, 367-
368.
Dummy node; see z.
Duplicate keys; see equal keys.
Dynamic programming, 483-494,
536.
Eddy, W. F., 331, 370.
Edges, 374.
backward, 437.
capacities, 435.
cross, 423, 430.
down, 423.
forward, 437.
negative weight, 494.
up, 423, 430.
Edmonds, J., 439-440.
eliminate (forward elimination),
62.
Encryption, 297, 301.
eof, 9.
Equal keys, 172, 177, 193, 204,
214, 227-228, 234.
Escape sequence, 286.
Euclid's algorithm (for finding
the gcd), 10-11, 19, 302.
Euclidean minimum spanning
tree, 417.
Euclidean shortest path problem,
418.
Euclidean traveling salesman
problem, 522-524.
eval (fast Fourier transform), 479.
eval (spline evaluation), 72.
Even, S., 454.
Exception dictionary, 210.
Exhaustive graph traversal
(visit), 515.
Exhaustive search, 513-524, 536.
Exponential running time, 15,
513, 520, 528, 534.
Exponentiation, 46-47, 301.
expression (top-down compiler),
277.
expression (top-down parser),
273.
Extendible hashing, 231-235,
237.
External nodes, 180, 230, 289,
490.
External searching, 225-235.
External sorting, 155-165.
factor (top-down compiler), 278.
factor (top-down parser), 274.
Fagin, R., 231, 237.
fastfind (union-find with compression
and balancing), 403,
411.
Fast Fourier transform, 465, 471-
480, 479 (eval), 536.
Feasible basis, 509-510.
File compression, 283-293.
Huffman encoding, 286-293.
run-length encoding, 284-286.
variable-length encoding, 286-
293.
Find, 399.
find (union-find, quick union),
401.
findinit (fastfind initialization),
403, 411.
Finite-state machine.
deterministic, 248, 259.
nondeterministic, 259-267.
Flow, 435.
Floyd, R. W., 331.
Ford, L. R., 435.
Forecasting, 161.
Forest, 375.
Forsythe, G. E., 88.
Forward elimination, 59, 60-62,
62 (eliminate), 64.
4-node, 188.
Fourier transform, 471-480.
Fredkin, E., 216.
Friedman, J. H., 370.
Fringe vertices, 393, 410.
Fulkerson, D. R., 435.
Garey, M. R., 536.
Gauss-Jordan method, 63, 65,
508.
Gaussian elimination, 57-65, 60
(gauss), 71, 76, 504, 508.
gcd (greatest common divisor,
Euclid's algorithm), 11, 12.
General regular-expression pattern
matching, 265 (match),
279.
Geometric algorithms, 307-370.
closest pair, 362-366.
convex hull, 321-333, 368.
elementary, 307-319.
grid method, 339-342.
inside polygon test, 316-318.
intersection, 349-359.
line drawing, 310-311.
range searching, 336-347.
simple closed path, 313-315.
2D-trees, 343-346.
Gerrymandering, 307.
Gold, B., 536.
Gosper, R. W., 242.
Graham, R. L., 326, 370.
Graham scan, 326-330, 329
(grahamscan).
Grammars, 270-272.
Graph algorithms, 373-454.
all-pairs shortest paths, 492-
494.
biconnectivity, 390-392.
bipartite matching, 444-447.
breadth-first search, 395.
connected components, 384.
cycle testing, 384.
depth-first search, 381-387.
elementary, 373-387.
exhaustive search for cycles,
515-520.
maximum flow in a network,
439-440.
minimum spanning tree, 408-
413.
priority traversal, 395-397.
shortest path, 413-415.
stable marriage, 447-452.
strongly connected components,
428-430.
topological sorting, 426-428.
transitive closure, 423-426.
union-find, 398-405.
Graph input, adjacency lists, 379
(adjlist).
Graph input, adjacency matrix,
378 (adjmatrix).
Graph isomorphism, 387.
Graph traversal, 393-398.
Graphs, 492-494.
adjacency list, 416.
adjacency matrix, 416.
bipartite, 444-447.
complete, 376.
connected, 375.
connectivity, 389-405.
dense, 376.
directed, 376, 421-430.
directed acyclic, 426-428.
representation, 376-381, 416,
421, 435.
sparse, 376.
traversal, 393-398.
undirected, 376.
weighted, 376.
Greatest common divisor (gcd),
9-12.
Greatest increment method, 507.
Grid method, 339-342, 341
g7ngegrid), 342 (gridrange),
Guibas, L., 237.
Hamilton cycle problem, 514-
520, 531-532.
Hash functions, 202.
Hashing, 201-210, 234.
double hashing, 207-210.
initialization for open addressing,
205 (hashinitialize).
linear probing, 205-207, 205
(hashinsert).
open addressing, 205-210.
separate chaining, 202-204.
Head node, 174-175, 180, 181,
199, 203-204, 214, 222, 352-
353.
Heaps, 89, 129-140, 289-290,
397.
Heap algorithms, 129-140.
change, 135.
construct, 136-137.
downheap, 134, 136.
insert, 132, 135.
join, 139-140.
pqconstruct, 138.
pqdownheap, 139, 289-290.
pqinsert, 139, 158, 160.
pqremove, 139, 290.
pqreplace, 159, 160.
remove, 134, 135.
replace, 135.
upheap, 132.
Heap condition, 130.
Heapsort, 135-137, 136
(heapsort).
Hellman, M. E., 301.
Hoare, C. A. R., 103, 167.
Hoey, D., 349, 370.
Holt, R., 19.
Horner's rule, 45-46.
Hu, T. C., 536.
Huffman, D. A., 304.
Huffman's algorithm (for file
compression), 239, 286-293,
490.
Hume, J. P., 19.
Hybrid searching, 219.
Increment sequence, 98.
Indexed sequential access, 226-
228.
index (convert from name to integer),
227, 230, 231, 376.
Indirect binary search trees, 184-
185.
Indirect heaps, 138-139, 159-160,
289-290.
Infeasible linear program, 501.
Inner loop, 13-14, 106, 124.
Insertion sort, 95-96, 96
(insertion), 112, 123-124.
inside (point inside test), 318.
insiderect (point inside rectangle
test), 338.
Integer linear programming, 533.
Integration, 79-86.
adaptive quadrature, 85-86, 85
(adapt).
rectangle method, 80-82, 81
(intrect), 85.
Romberg, 84.
Simpson's method, 83-84, 84
(intsimp), 85-86.
spline quadrature, 85.
symbolic, 79-80.
trapezoid method, 82-83, 83
(inttrap), 85.
Internal nodes, 180, 230, 289,
490.
Interpolation search, 177-178.
Interpolation.
polynomial, 68.
spline, 68-72.
Intersection, 349-359, 370.
Manhattan geometry, 350-356.
circles, 359.
horizontal and vertical lines,
305, 350-356.
lines, 356-359.
rectangles, 359.
two lines, 312-313, 313
(intersect).
interval, 337.
Inverse, 138, 385, 450-451.
Jarvis, R. A., 370.
Jensen, K., 19.
Johnson, D. S., 536.
Kahn, D., 304.
Karp, R. M., 243, 439-440.
Key generation, 299.
Keys.
binary representation, 119.
cryptology, 297.
searching, 171.
strings, 254.
Knapsack problem, 483-486, 519.
Knuth, D. E., 19, 36, 88, 167, 209,
237, 242, 304, 454.
Knuth-Morris-Pratt string searching,
244-249.
Kruskal, J. B. Jr., 412, 454.
Kruskal's algorithm (minimum
spanning tree), 411-413, 412
(kruskal), 417.
Kung, H. T., 466.
Lagrange's interpolation formula,
47, 472.
Leading term, 14, 15.
Leaf pages, 233.
Least-squares data fitting, 73-76.
Lewis, H. R., 536.
lg N, 16.
Lin, S., 524.
Line, 308.
Line drawing, 310-311.
Line intersection, 312-313, 349-
359.
one pair, 312-313.
initialization (buildytree), 353.
Manhattan (scan), 355.
Linear congruential generator,
35-38, 37 (random).
Linear feedback shift registers,
38.
Linear probing, 205-207, 209.
Linear programming, 497-510,
536.
Linear running time, 14.
Linked lists, 25-28.
create and add node, 27
(listadd).
input and construction, 26
(readlist).
merging, 148 (listmerge).
output, 26 (writelist).
sequential search, 174 (listinsert,
listsearch), 203, 341,
343.
sorting, 149-152, 149 (sort),
151 (mergesort).
ln N, 16.
Logarithm, 16.
Logarithmic running time, 14.
Longest path, 527.
Lookahead, 273.
MACSYMA, 88.
Malcolm, M. A., 88.
Master index, 227.
Matching, 443-452, 454.
match (general regular-expression
pattern matching), 265.
Mathematical algorithms, 23-88.
Mathematical programming, 497.
Matrices.
addition, 28-29 (matradd).
band, 64.
chain product, 486-489.
inverse, 65.
multiplication, 29, 53-54, 487.
multiplication by vector, 466-
469.
representation, 28-30.
sparse, 30, 63.
Strassen's multiplication method,
53-54, 65, 487.
transposition, 465.
tridiagonal, 64, 71.
Maxflow-mincut theorem, 438.
Maximum flow, 435-438.
Maximum matching, 443.
Mazes, 385-386, 398, 418.
McCreight, E., 228.
Mead, C. A., 536.
Merging, 146-152, 156-164, 363-
366.
mergesort (non-recursive),
150-152, 151 (mergesort),
366.
mergesort (recursive), 148-149,
148 (sort), 363.
multiway, 156-162.
polyphase, 163.
Microprocessors, 458, 469.
Minimum cut, 438.
Minimum spanning trees, 408-
413, 417, 454, 518, 522-524.
mischarsearch (Boyer-Moore
string searching), 251.
mod, 10-12, 34-40, 301-302.
Moler, C. B., 88.
Moore, J. S., 242, 304.
Morris, J. H., 242, 304.
Morrison, D. R., 219.
Multidimensional range searching,
346-347.
Multiplication.
large integers, 37 (mult).
matrices, 27-28, 51-52.
polynomials (divide-and-conquer),
48-50 (mult).
polynomials (fast Fourier
transform), 471-480.
Multiprocessor scheduling, 533.
Multiway merging, 156-162.
Multiway radix searching, 218-
219.
Munro, I., 88.
N log N running time, 15.
name (convert from integer to
name), 376, 428, 429.
Nearest-neighbor problem, 366.
Network flow, 433-441, 445-447,
454, 497-499.
Networks, 376, 435.
Nievergelt, J., 231, 237, 536.
Node transformations, 189-191.
Non-basis variables, 504.
Nondeterminism, 259-267, 529.
Nonterminal symbol, 270.
NP, 529.
NP-complete problems, 527-534,
536.
Numerical analysis, 88.
Objective function, 498.
Odd-even merge, 459-463.
One-dimensional range search
(bstrange), 337.
One-way branching, 218.
Open addressing, 205-210.
Operations research, 433, 441.
Optimal binary search trees, 489-
492.
Or, 258, 261.
Ordered hashing, 210.
P, 528.
Package wrapping, 323-326.
Pages, 226-239.
Papadimitriou, C. H., 454, 536.
Parallel computation, 457-469.
Parse tree, 271.
Parser generator, 280.
Parsing, 269-280, 304.
bottom-up, 275-276.
recursive descent, 272-275.
shift-reduce, 276.
top-down, 272-275.
Partition, 533.
Partitioning, 104-105 (partition),
112, 145.
Pascal, 9, 19, 271-272.
Path compression, 403.
Paths in graphs, 374-423.
Patricia, 219-223, 254.
patriciainsert, 222.
patriciasearch, 221.
Pattern matching, 241, 257-267,
279.
Perfect shuffle, 459-465, 468-
469, 478-480, 536.
Permutation generation, 520-
522.
Pippenger, N., 231, 237.
Pivoting, 504-510, 508 (pivot).
Plaintext, 296.
Planarity, 387.
Point, 308.
Polygon, 308.
convex, 321.
simple closed, 313-315.
standard representation, 318.
test if point inside, 316-318.
Voronoi, 367.
Polynomials, 45-54.
addition, 24-28.
evaluation, 45-46, 465, 471-
472, 474-475.
interpolation, 47-48, 471-472,
475-477.
multiplication, 24-25, 48-50,
471-472, 477-480.
representation, 23-28.
Polyphase merging, 163.
Pop, 109, 439.
pqchange (change priority in
priority queue), 396.
pqconstruct (heap construction,
indirect), 138, 396, 411.
pqdownheap (top-down heap
repair, indirect), 139, 289,
290.
pqinsert, 139.
pqremove (remove largest item
from priority queue), 396,
139, 290, 411.
Pratt, V. R., 242, 304.
Preprocessing, 335.
Prim, R. C., 410, 454.
Prim's algorithm (minimum
spanning tree), 410-411, 413.
Print binary search tree
(treeprint), 336.
Priority graph traversal (priority-first search).
breadth-first search, 397, 416.
densepfs, 416.
depth-first search, 397, 416.
Euclidean shortest path, 418.
minimum spanning tree, 409-
411, 416.
network flow, 439-440.
shortest path, 413-416.
sparsepfs, 395-397.
Priority queues, 127-140, 144,
158-161, 167, 395-397.
Probe, 205.
Projection, 339.
Pruning, 517-522.
Pseudo-angle calculation (theta),
316.
Public-key cryptosystems, 300-
302, 304.
Push, 109.
Pushdown stack, 109-110, 394.
Quadrature; see integration.
Queue, 109, 395.
Quicksort, 103-113, 118, 124,
135, 144, 152, 165, 167, 183,
218.
Rabin, M. O., 243.
Rabin-Karp string searching
(rksearch), 252-253.
Rabiner, L. R., 536.
radixexchange (radix exchange
sort), 118.
Radix searching, 213-223.
digital search trees, 213-216.
multiway, 218-219.
Patricia, 219-223.
tries, 216-218, 291-293,
Radix sorting, 115-124, 165, 218.
radix exchange, 117-121.
straight radix, 121-124.
Random integer in a fixed range
(randomint), 38, 40.
Random number generation, 88,
202, 299.
Random numbers, 33-42, 112.
additive congruential generator,
38-40, 42.
linear congruential generator,
35-38, 42.
pseudo-, 33.
quasi-, 34.
uniform, 34.
Range searching.
grid method, 339-342, 346.
kD trees, 346-347.
multidimensional, 346-347.
one-dimensional, 336-337.
projection, 339.
sequential search, 338.
2D trees, 343-346.
rbtreeinsert (red-black tree insertion),
194.
readlist (linked list input and
construction), 26, 148.
readln, 9.
Records.
database, 335.
searching, 171-172.
sorting, 93-94.
Recursion, 11-12, 176, 363-366,
381-382, 398, 465, 479, 489,
491, 515, 517-522.
removal, 110-111, 145-146,
152, 176, 179-180, 275, 366,
12.
two-dimensional, 356, 361,
363-367.
Red-black trees, 192-199.
Reduction, 445, 530-532.
Regular expression, 258.
Regular-expression pattern
matching, 258, 279, 304.
Reingold, E. M., 536.
remove (delete largest element in
heap), 134.
Replacement selection, 158-161.
replace (replace largest element
in heap), 135.
Representation.
binary search trees, 178-179,
184-185.
finite state machines, 247, 262-
263.
functions, 65.
graphs, 376-381.
lines, 308.
matrices, 28-30.
points, 308.
polygons, 306, 318.
polynomials, 23, 28.
trees (father link), 290-292,
395-396, 400-404, 411, 415.
Rivest, R. L., 167, 301, 304.
rksearch (Rabin-Karp string
searching), 253.
Root node, 230, 233.
Roots of unity, 473-477.
Rotation, 196-197.
Run-length encoding, 284-286.
RSA public-key cryptosystem,
301-302.
same (test if two points are on the
same side of a line), 313.
Satisfiability, 529, 531-532.
Scan conversion, 310-311.
scan (line intersection, Manhattan),
355.
Scheduling, 373.
Searching, 171-237.
binary search, 175-177.
binary tree search, 178-185.
digital search trees, 213-216.
disk searching, 225-235.
elementary methods, 171-185.
extendible hashing, 231-235.
external searching, 225-235.
hashing, 201-210.
indexed sequential access,
226-228.
Patricia, 221-222.
radix search tries, 216-218.
radix searching, 213-223.
sequential, 172.
sequential list, 174.
varying length keys, 223.
Sedgewick, R., 167, 237.
Selection, 144-146.
select (selection, nonrecursive),
146.
select (selection, recursive), 145.
Selection sort, 95 (selection), 144,
326.
Self-organizing search, 175.
Seminumerical algorithms, 88.
Sentinel, 106, 173, 273, 309, 329,
96, 247, 254, 493.
Separate chaining, 202-204, 209.
Sequential searching, 172-174,
339.
Sets, 398-405.
Shamir, A., 301, 304.
Shamos, M. I., 349, 370.
Shellsort (shellsort), 97-99, 329.
Shortest path, 413-415, 418, 454,
492-494.
Simple closed path, 313-315.
Simplex method, 497-510.
Simultaneous equations, 58, 75,
503-504.
Single rotation, 196-197.
Sink, 435.
Slack (artificial) variables, 503.
Sort-merge, 156.
sort3 (sorting three elements), 93,
459-460.
Sorting, 91-167.
bubble, 99.
disk, 162, 165, 155-165.
distribution counting, 99-101.
elementary methods, 91-101.
external, 92.
Heapsort, 135-137.
insertion, 95-96.
internal, 92.
linear, 123-124.
mergesort (non-recursive),
150-152.
mergesort (recursive), 148-149.
Quicksort, 103-114.
radix exchange, 117-121.
relationship to convex hull,
323.
selection, 94-95.
shellsort, 97-99.
stability, 92-93, 121, 152.
straight radix, 121-124.
tape, 155-165.
three elements (sort3), 93.
Source, 435.
Spanning trees, 375, 408-413.
Sparse graphs, 376, 378, 396,
397-398, 411, 413.
sparsepfs (priority graph traversal),
396, 410, 415-417, 439-
440.
Spline interpolation, 68-72, 71
(makespline), 72 (eval).
Spline quadrature, 85.
Splitting, 189-191, 194-199, 228-
229.
Stable marriage problem, 447-
452, 454.
Stack, 394, 428, 429.
Standard form of linear programs,
503.
Standish, T. A., 304.
Steepest descent method, 507.
Steiglitz, K., 454, 536.
Stone, H. S., 536.
straightradix (straight radix
sort), 121-124.
Strassen's method, 53-54, 65, 88,
487.
String processing, 241-304.
String searching, 241-254.
Boyer-Moore, 249-252.
brute-force, 243.
Knuth-Morris-Pratt, 244-249.
mismatched character, 250-
251.
multiple searches, 254.
Rabin-Karp, 252-253.
Strings, 241, 283, 284-285.
Strong, H. R., 231, 237.
Strongly connected components,
428-430.
substitute (backward substitution),
62.
Supercomputer, 458, 513, 528.
Symbol tables, 171.
Systolic arrays, 466, 536.
Tail node, 25-28, 174-175, 180,
203.
Tarjan, R. E., 387, 405, 428, 454.
Terminal symbol, 270.
term (top-down compiler), 278.
term (top-down parser), 273.
theta (pseudo-angle calculation),
316, 324, 325.
Thompson, K., 304.
3-node, 188.
Top-down 2-3-4 trees, 187-199.
Top-down compiler (expression,
term, factor), 277-278.
Top-down parsing, 272-275
(expression, term, factor),
273-274.
Topological sorting, 426-428,
430.
Transitive closure, 423-426, 493.
Traveling salesman problem, 387,
513-524, 531-532.
Tree vertices, 393.
treeinitialize (binary search tree
initialization), 181.
treeinsert (binary search tree insertion),
181.
treeprint (binary search tree
sorted output), 182, 346, 354.
Trees.
AVL, 198.
balanced, 187-199.
binary, 179, 237.
binary search, 178-185.
breadth-first search, 395.
depth-first search, 382, 384,
394, 422-423.
exhaustive search, 516-519.
father link representation,
290-292, 395-396, 400-404,
411, 415.
parse, 271.
red-black, 192-199.
spanning, 375, 408-413.
top-down 2-3-4, 187-199.
2-3, 198.
2-3-4, 188.
union-find, 399-404.
treesearch (binary tree search),
180, 193.
Tries, 216-218, 291-293.
2D (two-dimensional) trees, 343-
346.
twoDinsert (insertion into 2D
trees), 345.
twoDrange (range searching with
2D trees), 346.
2-node, 188.
2-3 trees, 198.
2-3-4 tree, 188.
Ullman, J. D., 237, 304.
Undirected graphs, 376.
Union, 399.
Union-find, 454.
Union-find algorithms, 398-405.
analysis, 405.
(fastfind), 403.
(find), 401.
halving, 404.
height balancing, 404.
path compression, 403.
quick union, 401.
splitting, 404.
weight balancing, 402.
Unseen vertices, 393, 410.
Up edges, 423, 430.
upheap, insert (heap insertion at
bottom), 132.
van Leeuwen, J., 454.
Variable-length encoding, 286-
293.
Vernam cipher, 299.
Vertex cover, 533.
Vertex visit, adjacency lists
(visit), 382.
Vertex visit, adjacency matrix
(visit), 384.
Vertices, 374.
fringe, 393.
tree, 393.
unseen, 393.
Very large scale integrated circuits,
458.
Vigenere cipher, 298.
Virtual memory, 165, 234.
Visited vertices, 410.
visit.
vertex visit for graph searching,
adjacency lists, 382.
vertex visit for graph searching,
adjacency matrix, 384.
graph search to test biconnectivity,
392.
graph traversal to find strong
components, 429.
exhaustive graph traversal,
515.
permutation generation, 521.
Von Neumann, J., 457.
Von Neumann model of computation,
457.
Voronoi diagram, 366-368.
Voronoi dual, 417.
Warshall, S., 425.
Warshall's algorithm (computing
transitive closure), 425, 492-
493.
Wegner, P., 88.
Weight balancing, 402.
Weighted graphs, 376, 380, 407-
418.
Weighted internal path length,
490.
Weighted matching, 444.
Wells, M. B., 536.
Wirth, N., 19.
Worst case, 13.
wrap (convex hull by package
wrapping), 325.
writelist (linked list output), 26,
148.
writeln, 9.
z, 25-28, 174-175, 180-181, 194,
203, 214-215, 221-222, 341,
345, 352-353, 364-365.
DESIGNS
Cover: Insertion sort: Color represents the key value; the ith column (from
right to left) shows result of ith insertion.
Page 1: Relatively prime numbers: A mark is in positions i, j for which the
greatest common divisor of i and j is not 1.
Page 21: Random points: A mark is in position i, j with i and j generated by
a linear congruential random number generator.
Page 89: A heap: Horizontal coordinate is position in heap, vertical coordinate
is value.
Page 169: A binary search tree laid out in the manner of an H-tree.
Page 239: Huffman's algorithm before and after: run on the initial part of the
text file for Chapter 22.
Page 305: One intersecting pair among a set of random horizontal and vertical
lines.
Page 371: Depth first search on a grid graph: each node is adjacent to its
immediate neighbors; adjacency lists are in random order.
Page 455: Counting to 28: eight cyclic rotations.
Back: Random permutation: Color represents the key value; the ith column
(from right to left) shows result of exchanging ith item with one
having a random index greater than i.
Heap design inspired by the movie "Sorting out Sorting," R. Baecker, University
of Toronto.
Pictures printed by Tom Freeman, using programs from the text.