
Nick Galbreath

Cryptography for Internet and Database Applications
Developing Secret and Public Key Techniques with Java™
Publisher: Bob Ipsen
Editor: Carol A. Long
Developmental Editor: Adaobi Obi
Managing Editor: Micheline Frederick
New Media Editor: Brian Snapp
Text Design & Composition: Wiley Composition Services
Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where Wiley Publishing, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
This book is printed on acid-free paper. ∞
Copyright © 2002 by Nicholas Galbreath. All rights reserved.
Published by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspointe Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, e-mail: permcoordinator@wiley.com.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic books.
Library of Congress Cataloging-in-Publication Data:
ISBN: 0-471-21029-3
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Contents

Preface
Introduction

Chapter 1  Bits and Bytes
  General Operations
  Number Bases
  Bits and Bytes
  Signed Bytes
  Bitwise Operators
  Complementation or Bitwise NOT
  Bitwise AND
  Bitwise OR
  Bitwise Exclusive OR (XOR)
  Left-Shift
  Right-Shift
  Special Operations and Abbreviations
  Packed Words
  Integers and Endian Notation
  Java Numerics
  Basic Types
  Type Conversion
  Unsigned to Signed Conversions
  Overflow
  Arrays
  Numeric Classes
  Booleans and BitFields
  Chars
  Working with Bytes
  Sign Problems
  Conversion of Integral Types to Byte Arrays
  Converting to Hex Strings
  BigInteger
  Creating and Converting
  BigInteger and Cryptography
  Secret Methods in BigInteger

Chapter 2  Secret Key Cryptography
  Symmetric Block Ciphers
  Cipher Properties
  Security Properties
  Brute-Force Attacks
  Other Attacks
  Common Block Ciphers
  Data Encryption Standard (DES)
  Triple DES
  Blowfish
  IDEA
  RC5
  Rijndael and the Advanced Encryption Standard (AES)
  Twofish
  RC6
  Ciphers You Shouldn't Use
  Password XOR Schemes
  Classical Cryptography
  ROT 13
  Padding
  Fixed-Value Padding
  Random Padding
  PKCS Padding
  Modes of Operations
  Initialization Vectors
  Electronic Codebook Mode (ECB)
  Cipher Block Chaining Mode (CBC)
  Cipher Feedback Mode (CFB)
  Output Feedback Mode (OFB)
  Counter Mode (CTR)
  Propagating CBC (PCBC) Mode
  Key Wrapping
  Triple DES Key Wrapping
  AES Key Wrapping
  Turning Passwords into Keys
  Hashes
  Cryptographic Hashes
  Collisions
  Attacks
  Algorithms
  The MD Family
  The SHA Family
  The RIPE-MD Family
  Hash Standards and Practices
  Hashed Message Authentication Codes (HMACs)
  The Standard HMAC
  Legacy Concatenation Schemes
  HMAC Standards and Practices
  Summary

Chapter 3  Public Key Cryptography
  Public Key Ciphers
  Other Systems
  Digital Signatures
  Key Agreements
  Zero-Knowledge Schemes
  Secret Sharing
  Public Key Security Classification
  Fundamental Mathematics
  Prime Numbers
  The Distribution of Prime Numbers
  Prime Testing
  Probabilistic Tests
  Sequence-Based Tests
  Elementary Number Theory
  Modular Arithmetic
  Additive Groups
  Multiplicative Groups
  Fields
  Rings
  Orders and Generators
  Public Key Encryption and Major PKCS Categories
  RSA and Integer Factorization
  Factoring
  The RSA Problem
  The Algorithm
  Message Representation and OAEP
  In Practice and Standards
  Choice of Parameters
  Discrete Logarithm Systems
  Underlying Mathematics
  The Algorithm
  Standards and Practice
  Elliptic Curves
  Underlying Mathematics: Elliptic Curves
  The Algorithm
  Standards and Practice
  Other Public Key Cryptographic Systems
  Rabin Cryptosystem
  NTRU
  Summary

Chapter 4  Random Numbers
  Randomness and Security
  Testing for Randomness
  FIPS 140-2 Requirements
  Pseudorandom Number Generators
  Cryptographic PRNG
  SHA-1 PRNG
  Cipher-CBC or ANSI X9.17
  FIPS 186
  Blum-Blum-Shub
  Stream Ciphers
  One-Time Pads
  RC4 or ArcFour
  Using Randomness
  Generating Random Numbers for Gaming
  Generating Random Numbers in a Range
  Shuffling
  Generating Random Permutations
  Small Permutations
  Large Fixed Permutations
  Random Sampling
  Accessing Entropy
  OS Services
  Win32 CryptoAPI CryptGenRandom
  /dev/random and friends
  Userland Services
  Entropy Generating Daemon (EGD)
  PRNGD
  Yarrow and EGADS
  TrueRand Library
  Remote Services
  RAND Corporation
  HotBits
  Random.org
  LavaRnd
  Java and Random Numbers
  Random and SecureRandom
  java.util.Random
  java.security.SecureRandom
  Developer Issues
  Reseeding
  Collecting Entropy
  An Entropy Pool Implementation
  Basic System State
  Thread Schemes
  Reading External Sources
  Application Events
  User Events

Chapter 5  Java Cryptography
  Organization
  Providers and Engine Classes
  Parameters, Keys, and Certificates
  Error Handling
  Providers
  Standard Names
  Standard Sun and SunJCE Providers
  Other Providers
  Initializing Providers
  Writing Your Own Provider
  Core Engine Classes
  MessageDigest
  Digest Streams
  MAC
  SecureRandom
  Ciphers
  Additional Cipher-Related Objects
  Signatures
  SignedObject
  Key Agreement Protocols
  Parameters, Keys, and Certificates
  Algorithm Parameters
  AlgorithmParameters
  AlgorithmParameterGenerators
  Keys
  Secret Keys
  Public/Private Keys
  Encoding and Encrypting Keys
  Summary

Chapter 6  Small Message Encoding and Encryption
  Preprocessing
  Converting Digits into Bytes
  7-bit to 8-bit Compression
  General Compression and java.util.zip.Deflate
  Adding Check and Parity Bits
  Small Message Encryption
  Single-Block Encryption
  n-Block Encryption
  Very Small Message Encryption
  XOR Tables
  Small Codebooks
  RC5-16/16
  Small Message Encoding
  Encoding for Customer-Usable Data
  Capacity and Range
  Selecting a Base Representation
  Selecting Base Alphabets
  Mixed Bases and Alphabets
  Adding Check Digits
  Encoding for Machines and Customer-Visible Applications
  Base 64
  Base 85
  Base 128 and Java Source Encoding

Chapter 7  Application and Data Architecture
  Database Architecture for Encrypted Data
  Selecting a Cipher
  Secret or Public?
  Cipher Selection
  Data
  Passwords
  Challenges and Responses
  Payment, Credit Card, and Other Account Numbers
  Social Security Number (U.S.)
  Birthdates and Birthdays
  Last Name
  Searching, Indexing, and Constraints
  Removing Randomness
  Deterministic Key Selection
  Indexing and Hashing
  Uniqueness Constraints
  Asymmetric Data Usages
  Null Values and Database Applications
  Secure Memory Management in Java
  Smart Array Classes
  Char Arrays
  Using SecureRandom
  Secret Key Management
  Secret Key Data
  Key Generation
  Key Encryption
  Storage
  Key Access and Distribution
  Using Keys with Ciphers and MACs
  Passwords
  Startup Passwords
  Member Names and Passwords
  Selecting Passwords
  Member Login, Success and Failure
  Changing Passwords and Challenges
  Web-Based Password Entry
  Generating New Passwords
  Member Names
  Logging
  Embedded-Encryption Logging
  Fully Encrypted Log Files
  Public Key Logging
  Split Log Files
  Network-Based Logs
  Cryptographic Tokens and Applications
  Token Design
  Expirations and Time Quantization
  Creating the Security Bits
  URL Tokens
  Tamper-Evident URLs
  Protecting Static Content
  A Simple URL MAC Implementation
  Fast Query String Parsing
  URL Encryption
  Very Short URLs
  Cookie Tokens
  Detecting Cookie Capability
  Cookies and Authentication
  Tokens for Access Control
  Buy-on-Demand Systems
  Multiple Key Systems
  Trials and Expirations
  Decimal and Monetary Computations
  Doubles and Floats
  BigDecimal
  Rounding
  BigDecimal and Mathematics
  BigDecimal Alternatives and Wrappers
  Usage and Storage

Appendix A  Java Cryptography Class Reference
References
Index

Preface

I wrote this book for software engineers with little or no exposure to cryptography. Most other books fall into one of two categories: the encyclopedic and descriptive, or the purely API-oriented. The goal was to bridge the two by providing a solid introduction to cryptography along with concrete examples and uses. In addition, many books focus overwhelmingly on public key techniques. In my experience, the most common uses for public key ciphers are handled by third-party applications (VPNs, email) or off-the-shelf protocols such as SSL and SSH.
While there are a number of excellent cryptographic libraries in C and C++, Java serves as the reference since:

- It's very popular for new business and server applications.
- Memory management is done automatically, eliminating entire classes of bugs (stack smashing and buffer overflows).
- It provides a standard cryptographic API that, while not perfect or complete, is about as "universal" as we can get.
The cryptographic API for Java is scattered among several packages. Instead of listing classes by package as is commonly done, Appendix A lists the classes alphabetically. I found this to be much more useful than flipping between different package sections.
I've tried to limit source code to examples that are relatively simple, that demonstrate main points, or that are self-contained utilities. More complicated (and perhaps more useful) examples were excluded simply because the error and exception handling for a security application can be quite long and tedious and didn't really add much value. I have always disliked
CD-ROMs glued into books and, likewise, never found chapters and chapters of source code to be very useful. Instead, full source code and more examples are presented at the companion Web site at www.wiley.com/compbooks.galbreath.
Unfortunately, time and space didn't permit examination of many topics. In particular:

- XML key management, encryption, and signature schemes
- Privacy issues, including privacy "seals," the Gramm-Leach-Bliley Act of 1999, and the HIPAA privacy rule
- More detailed database tips and techniques

Perhaps these will be detailed in future editions. Until then, I hope you find this book useful.
Finally, this book would not have been possible without the many fruitful conversations and support from Dave Andre, Jeff Bussgang, Ann Calvit, Roy Dixit, Jim Finucane, Bill French, Venkat Gaddipati, Nabil Hachem, Sam Haradhvala, Adam Hirsch, Steve Morris, Jack Orenstein, Rich O'Neil, Matt Rowe, and many others from Upromise, SightPath, and Open Market. Special thanks to my family and friends who had to put up with my social hiatus while working on this book. Thank you all.
Nick Galbreath
Plum Island, MA
June 2002

Introduction

The goal of this book is to present the systems programmer with a full introduction to the science and application of cryptography using Java. Unlike many texts, this book does not focus on cryptography for transmission (allowing two parties to communicate securely). For most applications, those requirements are handled by the SSL or SSH protocols or by a third-party application. Instead, this book focuses on cryptography for storage, message integrity, and authentication (the so-called "single-point" techniques), which are more common in custom applications. Besides pure cryptography, many "auxiliary" techniques that make cryptography useful are covered.
The first chapter covers basic logical and numeric operations, both at the theoretical level and with the specifics of the Java numeric model. While it may be a review for many readers, I've found that many system programmers just do not have exposure to these low-level details. Either they never learned, or, more likely, because it just isn't normally used in day-to-day work, it's been forgotten. Those readers who are more familiar with C or C++ should find this chapter especially useful for translating code to or from Java.
The next two chapters introduce the reader to the science, mathematics, and standards of both secret key and public key cryptography. Here you'll learn why the algorithms work and the various tradeoffs between them. While the mathematics isn't essential for day-to-day programming tasks, the goal is to make the reader familiar with the terminology that is often tossed around, so as to be able to make informed decisions.
Chapter 4 discusses random numbers and random number generators. While not technically cryptography, random numbers are essential to it. Those readers developing gaming systems should find this section especially interesting. Java is used for the example algorithms, but they can easily be converted to any other programming language.
Finally, in Chapter 5 we introduce Java's cryptographic APIs. The Java SDK already comes with some excellent documentation, but I've tried to pull it together, consolidating the JCA and JCE into a coherent format. Likewise, I made Appendix A the Java cryptography reference I wanted when programming. Instead of listing classes by package, I list them all alphabetically. You'll find you'll do a lot less page flipping this way and get a much better understanding of the API with this organization.
Chapter 6 is on small message encoding, which, like Chapter 4, isn't technically cryptography, but it's always an issue when using cryptography. Here you'll learn how to convert binary data (perhaps encrypted) into various ASCII formats. This is critical when embedding data in a URL or creating passwords and cryptographic tokens.
Chapter 7 pulls everything together and discusses many topics: application and database design, working with passwords and tokens, key management, and logging.
Numerous source code examples are used, and the source itself can be found at the companion Web site, www.modp.com. Many of the examples in the book will require a little reworking for production use; you'll want to modify the error handling to suit your needs. In some cases, if the code is especially long and not particularly illuminating, I decided not to list it and instead just refer to the Web site. I find page after page of printed source code in a book to not be particularly useful.
Unfortunately, due to time and space requirements, a lot of important topics are not covered. Digital signatures are only mentioned, and they deserve more; but I've found that for basic application work they just aren't that useful. Either most people won't use them, there is already a trust relationship in place eliminating their usefulness, or encryption with a hash or MAC is preferable. And finally, when I do need to use them, the process is normally handled by a third-party application. Other topics not given their due are the new XML and SAML standards for creating documents that contain encrypted data, and embedding cryptography within the database (instead of the application), as newer Oracle databases can do. A lot more could be said for key management and database design as well. Perhaps future editions will remedy this.

CHAPTER 1

Bits and Bytes
Before getting into the details of cryptographic operators, we'll review some basics of working with bits and number bases. For those of you who have worked on operating systems, low-level protocols, and embedded systems, the material in this chapter will probably be a review; for the rest of you, whose day-to-day interactions with information technology and basic software development don't involve working directly with bits, this information will be an important foundation for the information contained in the rest of the book.
General Operations
Most cryptographic functions operate directly on the machine representation of numbers. What follows are overviews of how numbers are represented in different bases, how computers internally represent numbers as a collection of bits, and how to directly manipulate bits. This material is presented in a computer- and language-generic way; the next section focuses specifically on the Java model.
Number Bases

In day-to-day operations, we represent numbers using base 10. Each digit is 0 through 9; thus, given a string of decimal digits d_n d_(n-1) ... d_2 d_1 d_0, the numeric value would be:

10^n d_n + 10^(n-1) d_(n-1) + ... + 10^2 d_2 + 10 d_1 + d_0

This can be generalized to any base x, where there are x different digits and a number is represented by:

x^n d_n + x^(n-1) d_(n-1) + ... + x^2 d_2 + x d_1 + d_0

In computer science, the most important base is base 2, or the binary representation of the number. Here each digit, or bit, can be either 0 or 1. The decimal number 30 can be represented in binary as 11110, or 16 + 8 + 4 + 2. Also common is hexadecimal, or base 16, where the digits are 0 to 9 and A, B, C, D, E, and F, representing 10 to 15, respectively. The number 30 can now be represented in hexadecimal as 1E, or 16 + 14. The relationship between digits in binary, decimal, and hexadecimal is listed in Table 1.1.
Table 1.1 Binary, Decimal, and Hexadecimal Representations

BINARY   DECIMAL   HEXADECIMAL
0000        0          0
0001        1          1
0010        2          2
0011        3          3
0100        4          4
0101        5          5
0110        6          6
0111        7          7
1000        8          8
1001        9          9
1010       10          A
1011       11          B
1100       12          C
1101       13          D
1110       14          E
1111       15          F

When you are working with different bases, the base of a number may be ambiguous. For instance, is 99 the decimal 99 or the hexadecimal 99 (= 9 × 16 + 9 = 153)? In this case, it's common to prefix a hexadecimal number with 0x or just x (e.g., 99 becomes 0x99 or x99). The same confusion can happen with binary numbers: Is 101 the decimal 101 or the binary 101 (4 + 1 = 5)? When a number in binary may be confused, it's customary to add a subscript 2 at the end of the string of binary digits, for example, 101₂.

Any positive number can be represented in another base b; however, doing the conversion isn't necessarily easy. We'll discuss general-purpose base conversion in a later section, but it's useful to note that conversion between two bases is especially easy if one of the bases is a power of the other. For instance, the decimal number 1234 has the canonical representation in base 10 as 1 × 1000 + 2 × 100 + 3 × 10 + 4. However, 1234 can also be thought of as two "digits" in base 100: (12) and (34), with a value of 12 × 100 + 34. It's the same number with the same value; the digits have just been regrouped. This property is especially useful for base 2. Given a binary string, it's possible to convert to hexadecimal by grouping 4 bits at a time and computing the hexadecimal value of each group:

10011100 = 1001 1100 = 9C
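The standard conversions, and the bit-grouping trick above, can be demonstrated with Java's built-in radix support. A minimal sketch (the class name is mine, for illustration only):

```java
// Sketch: base conversions using the JDK's radix-aware methods.
public class BaseDemo {
    public static void main(String[] args) {
        // Decimal 30 rendered in base 2 and base 16:
        System.out.println(Integer.toBinaryString(30)); // 11110
        System.out.println(Integer.toHexString(30));    // 1e
        // Grouping 10011100 as 1001 1100 gives the hex digits 9C:
        System.out.println(Integer.parseInt("10011100", 2) == 0x9C); // true
    }
}
```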
Bits and Bytes
A bit is the smallest unit of information that a computer can work on and can take two values, "1" and "0," although sometimes, depending on context, the values "true" and "false" are used. On most modern computers, we do not work directly on bits but on a collection of bits, sometimes called a word, the smallest of which is a byte. Today, a byte by default means 8 bits, but technically it can range from 4 to 10 bits, the odd values coming from old or experimental CPU architectures that aren't really in use anymore. To be precise, many standards use octet to mean 8 bits, but we'll use the more common byte. Modern CPUs operate on much larger word sizes: The term 32-bit microprocessor means the CPU operates primarily on 32-bit words in one clock cycle. It can, of course, operate on 8-bit words, but that doesn't mean it happens any faster. Many CPUs also have special instructions that can operate on larger words, such as the SSE and similar instructions for multimedia, as well as vector processing, such as on the PowerPC.
A byte has a natural numeric interpretation as an integer from 0 to 255 using base 2, as described earlier. Bit n represents the value 2^n, and the value of the byte becomes the sum of these bits. The bits are laid out exactly as expected for a numeric representation, with bit 0 on the right and bit 7 on the left (so the bit indices run backward compared to Western text):

(b₇b₆b₅b₄b₃b₂b₁b₀) = 2^7 b₇ + 2^6 b₆ + 2^5 b₅ + 2^4 b₄ + 2^3 b₃ + 2^2 b₂ + 2^1 b₁ + 2^0 b₀

or, using decimal notation:

(b₇b₆b₅b₄b₃b₂b₁b₀) = 128b₇ + 64b₆ + 32b₅ + 16b₄ + 8b₃ + 4b₂ + 2b₁ + b₀

For example, 00110111 = 32 + 16 + 4 + 2 + 1 = 55.

Bits on the left are referred to as the most-significant bits, since they contribute the most to the overall value of the number. Likewise, the rightmost bits are called the least-significant bits. This layout is also known as Big-Endian, which we'll discuss later.
Signed Bytes
Negative numbers can be represented in a few ways. The simplest is to reserve one bit to represent the sign of the number, either positive or negative. Bits 0 through 6 would represent the number, and bit 7 would represent the sign. Although this allows a range from -127 to 127, it has the quirk of two zeros: a "positive" zero and a "negative" zero. Having two zeros is odd, but it can be worked around. The bigger problem is when an overflow occurs; for instance, adding 127 + 2 gives 129 in unsigned arithmetic, or 10000001. However, in this signed representation, the value is -1.

The most common representation is known as two's complement. Given x, its negative is represented by flipping all the bits (turning 1s into 0s and vice versa, which computes -1 - x) and then adding 1. For example, note in Table 1.2 that adding 1 to 127 makes the value -128. While this method is a bit odd, there are many benefits. Microprocessors can encode just an addition circuit and a complementation circuit to do both addition and subtraction (in fact, many CPUs carry around the complement with the original value just in case subtraction comes up). The other main benefit appears when casting occurs, or converting a byte into a larger word, as described in the following section.
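Two's complement negation is easy to verify in code by flipping the bits and adding one. A minimal sketch (the class and method names are my own, not from the book):

```java
// Sketch: two's complement negation of a byte.
public class TwosComplement {
    // Negate by flipping all bits (~x computes -1 - x) and adding 1.
    public static byte negate(byte x) {
        return (byte) (~x + 1);
    }
    public static void main(String[] args) {
        System.out.println(negate((byte) 3));    // -3
        // The asymmetric edge case: negating -128 overflows back to -128.
        System.out.println(negate((byte) -128)); // -128
    }
}
```

Note the asymmetry of the range [-128, 127]: -128 has no positive counterpart, so negating it wraps around.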
Bitwise Operators
The usual arithmetic functions such as addition and multiplication interpret words as numbers and perform the appropriate operations. However, other operations work directly on the bits without regard to their representation as numbers. These are bitwise or logical operations. While the examples shown in the next sections use 8-bit bytes, they naturally extend to any word size.

Table 1.2 Two's Complement Representation

UNSIGNED VALUE   SIGNED VALUE   HEXADECIMAL   BINARY
0                   0              00         00000000
1                   1              01         00000001
2                   2              02         00000010
126                 126            7e         01111110
127                 127            7f         01111111
128                -128            80         10000000
129                -127            81         10000001
130                -126            82         10000010
253                  -3            fd         11111101
254                  -2            fe         11111110
255                  -1            ff         11111111
As shown in Table 1.3, the operations are represented by different symbols depending on whether the context is programming or typographical. When required, this book will use the programming notations, with the exception of XOR, since the caret symbol (^) is used to denote exponentiation in some systems.
Table 1.3 Bitwise Operations and Notation

OPERATION                                     C-STYLE NOTATION           C-STYLE SELF-ASSIGNMENT   TYPOGRAPHICAL
NOT a                                         ~a                         n/a                       ¬a
a AND b                                       a & b                      a &= b                    a ∧ b
a OR b                                        a | b                      a |= b                    a ∨ b
a XOR b                                       a ^ b                      a ^= b                    a ⊕ b
Shift a left by n bits                        a << n                     a <<= n                   a << n
Shift a right by n bits, preserving sign      a >> n                     a >>= n                   none; all shifts and words are assumed to be unsigned
Shift a right by n bits, unsigned             a >>> n                    a >>>= n                  a >> n
Rotate a right by n bits; w is the word size  (a >>> n) | (a << (w-n))   n/a                       ROTRn(a)
Rotate a left by n bits; w is the word size   (a << n) | (a >>> (w-n))   n/a                       ROTLn(a)
Concatenate a and b                           (a << shift) | b           n/a                       a || b
Take the n least-significant bits of a        a & mask                   a &= mask                 LSBn(a)
Take the n most-significant bits of a         (a >>> shift) & mask       n/a                       MSBn(a)
Complementation or Bitwise NOT
The simplest bit operation is complementation, or the bitwise NOT. This simply flips the bits within a word, where 0s become 1s and 1s become 0s; for example, ~11001 = 00110. Cryptographically, this operation is not used much; it serves primarily for implementing basic arithmetic in hardware.
Bitwise AND
AND is useful in bit testing and bit extraction. It's based on the usual truth table for logical AND, as shown in Table 1.4. You can remember the truth table by observing that it's the same as binary multiplication: anything times zero is zero.

Table 1.4 Bitwise AND

AND   0   1
0     0   0
1     0   1

To test if a bit is set, create a mask with 1s in the positions you wish to test and perform the logical AND operation. If the result is nonzero, the bit is set; otherwise, it is not:

    01011101
AND 00001000
    --------
    00001000  // not == 0, bit 3 is set

AND is also useful for bit clearing. A mask is created with 1s in the positions to preserve and 0s where you want to clear. For instance, to clear the four least-significant bits in a byte, use the mask 11110000:

    11011101
AND 11110000
    --------
    11010000
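The masking idioms above translate directly to Java. A minimal sketch (the helper names are mine, not from the book) tests a single bit and clears the low bits of a word:

```java
// Sketch: bit testing and bit clearing with AND masks.
public class BitMask {
    // Test whether bit n (counting from 0 on the right) is set.
    public static boolean isSet(int x, int n) {
        return (x & (1 << n)) != 0;
    }
    // Clear the k least-significant bits by ANDing with the inverted mask.
    public static int clearLow(int x, int k) {
        return x & ~((1 << k) - 1);
    }
    public static void main(String[] args) {
        System.out.println(isSet(0b01011101, 3));                            // true
        System.out.println(Integer.toBinaryString(clearLow(0b11011101, 4))); // 11010000
    }
}
```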
Bitwise OR
Bitwise OR extends the logical OR to a series of bits. In English, or means one or the other but not both. The bitwise version means one or the other or both (the same as the logical OR). See Table 1.5.

Table 1.5 Bitwise OR

OR    0   1
0     0   1
1     1   1

Logical OR is useful in joining nonoverlapping words. The following example joins two 4-bit words together:

    11010000
 OR 00001111
    --------
    11011111

In this case, the same effect could be achieved with regular addition; however, it's pretty standard to use OR instead, to make it clear that addition is not the point of the operation and that you are doing bit operations.
Bitwise Exclusive OR (XOR)
Exclusive OR is abbreviated as XOR, and its operation is denoted by ⊕. It is less common in normal programming but very common in cryptographic applications. Table 1.6 shows the bitwise XOR truth table. This table is easy to remember, since it is the same as addition but without any carry (this is also known as addition modulo 2). It's also equivalent to the English usage of or: one or the other but not both.

Table 1.6 Bitwise XOR

XOR   0   1
0     0   1
1     1   0

XOR tests whether two bits are different: If the bits are both 0 or both 1, the result is 0; if they are different, the result is 1. XOR is useful in cryptographic applications since, unlike AND and OR, it's an invertible operation, so that A ⊕ B ⊕ B = A:

    11010011
XOR 10101010
    --------
    01111001
XOR 10101010
    --------
    11010011
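The invertibility property is easy to check in code. A minimal sketch using the example values above (in Java source, ^ is the XOR operator):

```java
// Sketch: XOR round trip -- applying the same key twice restores the input.
public class XorRoundTrip {
    public static void main(String[] args) {
        int a = 0b11010011;
        int k = 0b10101010;
        int c = a ^ k;                                 // 0b01111001
        System.out.println(Integer.toBinaryString(c)); // 1111001
        System.out.println((c ^ k) == a);              // true
    }
}
```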
Left-Shift
As its name implies, the left-shift operator shifts the bits within a word to the left by a certain number of positions. Left-shifts are denoted by a << b, where a is the value and b is the number of positions to shift left. Zeros are filled in on the least-significant positions:

11111111 << 1 = 11111110
11001100 << 3 = 01100000

Mathematically, each shift left can be thought of as a multiplication by 2:

 3 × 2 = 00000011 << 1 = 00000110 = 4 + 2 = 6
 3 × 4 = 00000011 << 2 = 00001100 = 8 + 4 = 12
-3 × 2 = 11111101 << 1 = 11111010 = -6
-128 × 2 = 10000000 << 1 = 00000000 = 0 (overflow)
When you are working with large word sizes (such as 32-bit integers), left-shifts are used to compute large powers of 2, since 2^n = 1 << n. A trick to test whether only one bit is set (or rather, whether the value is a power of 2) is checking x & -x == x. In many cryptographic operations, we need to extract a certain number of the least-significant bits from a word, normally with word sizes much bigger than a byte. Say, however, that you needed to extract the six least-significant bits from a byte B. You could perform B & 0x3f (B & 00111111). Computing this for larger word sizes is a bit clumsy, and worse, some algorithms may extract a variable number of bits, so the mask has to be dynamic. Hardwiring a hexadecimal mask is also prone to errors, and someone reading your code would have to think about how many bits you are extracting. How many bits are in 0x7fffff? The answer is 23, but it's not automatically clear, and staring at a computer monitor all day where the text is in a small font makes the task harder. Instead, a left-shift can be used, since 2^n - 1, or (1 << n) - 1, has a binary representation of n digits of "1." Instead of 0x7fffff, we can use ((1 << 23) - 1), which will be perfectly clear to someone familiar with bit operations. Even better, you can make your intentions clearer by making the mask a constant:

public static final int mask23lsb = (1 << 23) - 1; // 23-bit mask
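The dynamic-mask idiom can be wrapped in a small helper. A minimal sketch (the class and method names are mine) that extracts the n least-significant bits of a word:

```java
// Sketch: extracting the n least-significant bits with a computed mask.
public class LsbMask {
    // (1 << n) - 1 builds a mask of n one-bits; AND extracts those bits.
    public static int lowBits(int x, int n) {
        return x & ((1 << n) - 1);
    }
    public static void main(String[] args) {
        System.out.println(Integer.toHexString((1 << 23) - 1)); // 7fffff
        System.out.println(lowBits(0xABCD, 6));                 // 13, i.e. 001101
    }
}
```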
Right-Shift
Not surprisingly, right-shifts, denoted by >>, are the opposite of left-shifts,
and positions are shifted to the right. However, there is another significant
difference. In left-shift notation, zeros are placed in the least-significant
position. With right-shift, the value of the most-significant bit is used to
fill in the shifted positions. For example, 1000000 >> 2 = 1100000, but
0100000 >> 2 = 00100000. This is designed so that the sign is preserved and
the right-shift is equivalent to division by 2 (dropping any fractional part),
regardless of the sign. For example, we’ll “undo” the previous left-shift
examples:
6 / 2 = 00000110 >> 1 = 00000011 = 3
12 / 4 = 00001100 >> 2 = 00000011 = 3
-6 / 2 = 11111010 >> 1 = 11111101 = -3
-127 / 2 = 10000001 >> 1 = 11000000 = -64
This works well when you are treating the byte as a number. In crypto-
graphic applications, the bits are just treated as bits without any numerical
interpretation. In every case, it is assumed the byte or word is unsigned
(e.g., unsigned long in C). If you are using a signed type (such as Java),

you’ll want to use the unsigned right-shift, denoted by >>> (three less-than
symbols). This just shifts the bits to the right and fills in zeros for the most
significant bits. All cryptographic papers assume the shift is unsigned and
normally use the plain >>. When coding an algorithm using signed types,
make sure you convert >> to >>>.
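The difference between the two right-shifts is easy to see on a negative value; a minimal sketch:

```java
public class ShiftDemo {
    public static void main(String[] args) {
        int x = -8; // 0xfffffff8 in two's complement
        // signed shift: the sign bit (1) is copied into the top position
        System.out.println(x >> 1);   // -4
        // unsigned shift: a zero is filled in, producing a large positive int
        System.out.println(x >>> 1);  // 2147483644
        System.out.println(Integer.toHexString(x >>> 1)); // 7ffffffc
    }
}
```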
Special Operations and Abbreviations
The previous operations, while useful, are fairly basic and typically directly
implemented in hardware. By combining and composing them, you can
create new higher-level bit operations. For cryptography, the most useful
operations are rotations, concatenation, and extraction of most- and least-
significant bits.
Rotations
Bit rotations are fairly self-explanatory: Bits are shifted either left or right,
and the bits that “fall off the edge” roll over onto the other side. These are
commonly used in cryptography, since this method provides an invertible
operation that can aid in scrambling bits. The problem is that they aren’t
used for much else, so many CPUs do not have rotation instructions, or if
they do, they are frequently slow. Even if they have a fast rotation, the only
way to access them is by writing assembly language, since rotations do not
have a standard operator like the shifts do. Common programming rota-
tions can be accomplished with a combination of shifts and logical opera-
tors, which is the slowest method of all. In its most general case, a rotation
will take one subtraction, two shifts, and a logical OR operation. Remember
to use the unsigned right-shift operator:
(x >>> n) | (x << 32-n); // rotate right n positions, x 32-bit int
(x << n) | (x >>> 32-n); // rotate left n position, x 32-bit int
There is no special typographical symbol for rotations; we normally use
the abbreviations ROTR or ROTL, although certain technical papers may
define their own symbols in their work.
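Putting the two expressions above into complete helper methods gives a small sketch you can test (the method names rotl and rotr are our own abbreviations of ROTL/ROTR; note that Java masks int shift distances to 0–31, so n = 0 and n = 32 behave correctly here):

```java
public class Rot {
    // rotate a 32-bit word left by n positions
    static int rotl(int x, int n) {
        return (x << n) | (x >>> (32 - n));
    }

    // rotate a 32-bit word right by n positions
    static int rotr(int x, int n) {
        return (x >>> n) | (x << (32 - n));
    }

    public static void main(String[] args) {
        // the high bit rolls around to the low end, and vice versa
        System.out.println(Integer.toHexString(rotl(0x80000001, 1))); // 3
        System.out.println(Integer.toHexString(rotr(0x80000001, 1))); // c0000000
    }
}
```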
Bit Concatenation
Bit concatenation is the process of simply joining together two sets of bits
into one word. If b is n-bits long, then a || b is a << n | b:
a = 101
b = 0011
a || b = 101 << 4 | 0011 = 1010000 | 0011 = 1010011

MSB and LSB operations
Two other common operations are extracting a certain number of most-
significant bits (MSB) or least-significant bits (LSB). We’ll use the notation
MSBn(a) to denote extracting the n most-significant bits. Likewise with
LSBn(a). These operations are accomplished by making the appropriate
mask, and in the MSB case, by shifting appropriately:
MSB3(10111111) = (10111111 & 11100000) >>> 5 = 101
LSB2(11111101) = 11111101 & 00000011 = 00000001
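These two operations can be sketched as helper methods for 32-bit words (the names msb and lsb are ours, not a standard API; both assume 0 < n < 32):

```java
public class BitExtract {
    // MSBn(x): the n most-significant bits of a 32-bit word, shifted
    // down so they form a small integer. Masking is unnecessary here
    // because the unsigned shift fills the top with zeros.
    static int msb(int x, int n) {
        return x >>> (32 - n);
    }

    // LSBn(x): the n least-significant bits.
    static int lsb(int x, int n) {
        return x & ((1 << n) - 1);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(msb(0xABCD0000, 8))); // ab
        System.out.println(Integer.toHexString(lsb(0x12345678, 4))); // 8
    }
}
```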
Packed Words
Shifts are useful in creating packed words, where you treat the word as an
array of bits rather than as a number, and where the bits represent anything
you like. When doing this, you should use an unsigned numeric type.
Unfortunately, Java and many scripting languages do not have unsigned
types, so you must be extremely careful. As an example, suppose you have
two variables that are numbers between 0 and 15. When transmitting or
storing them, you might want to use the most space-efficient representa-
tion. To do this, you represent them not as 8-bit numbers from 0 to 255, but
as two numbers each with 4 bits. For this example, we’ll use pseudocode
and use the type int to represent an unsigned 8-bit value:
int a = 5; // 00000101
int b = 13; // 00001101
int packed = a << 4 | b; // = 00000101 << 4 | 00001101
// = 01010000 | 00001101
// = 01011101
To undo the operation:
b = packed & 0x0F; // 01011101 & 00001111 = 00001101
a = packed >>> 4; // note unsigned right-shift
// 01011101 >>> 4 = 00000101
Likewise, you might wish to view the byte as holding eight true/false
values:
int c = b0 | (b1 << 1) | (b2 << 2) | (b3 << 3) | (b4 << 4)
| (b5 << 5) | (b6 << 6) | (b7 << 7);
int mask = 0x01;
b0 = c & mask;
b1 = (c >>> 1) & mask;
b2 = (c >>> 2) & mask;

and so on. If you were using an array, you could do this dynamically in a
loop as well.
for (int i = 0; i < 8; ++i)
b[i] = (c >>> i) & mask;
It’s quite easy to make mistakes, especially when the packed word has a
complicated structure. If you are writing code and the answers aren’t what
you’d expect:
 Check to make sure you are using the unsigned right-shift operator
>>> instead of the signed version >>.
 Check that the language isn’t doing some type of automatic type
conversion of signed values (e.g., turning bytes into integers before
the shift operation happens).
 Check to make sure your masks are correct.
 If you are using dynamic shifts, make sure the shift amount isn’t
larger than the word size. For instance, if you are shifting a byte,
make sure you aren’t doing >> 9. How this is interpreted depends
on the environment.
Integers and Endian Notation
Endian refers to how a string of bytes should be interpreted as an integer,
and this notation comes in two flavors: Little-Endian and Big-Endian. The
names come from Jonathan Swift’s Gulliver’s Travels, where the tiny people,
Lilliputians, were divided into two camps based on how they ate eggs. The
Little-Endians opened the eggs from the small end, and the Big-Endians
opened theirs from the big end. As you might expect with that reference,
there are pros and cons to Big-Endian and Little-Endian representations,
but overall it doesn’t make any difference except when you have to convert
between the two formats. A comparison of the two is shown in Table 1.7.
Table 1.7 Comparison of Little-Endian versus Big-Endian Representation

ENDIAN TYPE      B0 B1 B2 B3 = 0xAABBCCDD   SAMPLE MICROPROCESSORS
Little-Endian    dd cc bb aa                Intel x86, Digital (VAX, Alpha)
Big-Endian       aa bb cc dd                Sun, HP, IBM RS6000, SGI, “Java”

Big-Endian, also known as most-significant byte order (MSB) or network
order, puts the most-significant or highest byte value first. This is equiva-
lent to how we write decimal numbers: left to right. The downside is that
many numerical algorithms have to work backward starting at the end of
the array and working forward (just as you would manually with pencil
and paper).
The Little-Endian, or least significant byte (LSB), format is the opposite.
This makes it harder for humans to read hex dumps, but numeric algorithms
are a little easier to implement. Adding capacity (or widening conversions)
is also easier, since you just append bytes to the end of the array (i.e., the
byte sequence ff becomes ff 00).
Fortunately, regardless of what the byte Endian order is, the bits within
bytes are always in Big-Endian format. For instance, 1 is always stored in a
byte as binary 00000001 no matter what platform.
The Endian issue becomes critical when you are working with heteroge-
neous systems—that is, systems that use different Endian models. When
shipping bytes between these machines, you must use a standard Endian
format or an ASCII format. In many other programming languages, you
must determine in advance what the Endian architecture is and adjust sub-
sequent bit operations appropriately. For cryptographic applications this
becomes critical, since you are often manipulating bits directly.
With Java, the underlying architecture is hidden and you program using
a Big-Endian format. The Java virtual machine does any Endian conversion
needed behind the scenes.
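If you are on JDK 1.4 or later, the java.nio.ByteBuffer class also lets you make the byte order explicit, which is handy when exchanging binary data with Little-Endian systems; a minimal sketch:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        // ByteBuffer defaults to Big-Endian (network) order
        ByteBuffer big = ByteBuffer.allocate(4);
        big.putInt(0xAABBCCDD);

        ByteBuffer little = ByteBuffer.allocate(4);
        little.order(ByteOrder.LITTLE_ENDIAN);
        little.putInt(0xAABBCCDD);

        // first byte in memory differs between the two orders
        System.out.println(Integer.toHexString(big.get(0) & 0xff));    // aa
        System.out.println(Integer.toHexString(little.get(0) & 0xff)); // dd
    }
}
```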
For C and C++ programmers, normally a BIG_ENDIAN or LITTLE_
ENDIAN macro is defined by the compiler or from an include file. If not,
you can use code similar to this for testing. It sets raw memory and then
converts it to an integer type. The value will be different depending on the
CPU’s Endian type. This C code assumes an int is a standard 4 bytes or
32 bits, but you may wish to generalize:
int isBigEndian() {
    static unsigned char test[sizeof(unsigned int)] = {0, 0, 0, 1};
    unsigned int i = *(unsigned int *) test;
    if (i == 0x00000001) return 1; // true, big-endian
    return 0; // false, little-endian
}
Java Numerics
We’ll now apply the information in the previous section to the Java
numeric model. While notation is similar to the C or C++ model, there are

Java-specific issues with using signed and unsigned types—specifically,
byte arrays. Java also provides class wrappers around the native types, as
well as an unlimited-capacity integer arithmetic.
Basic Types
Java provides the standard basic numeric types—integers with 8, 16, 32,
and 64 bits, and floating-point types with 32- and 64-bit representations.
Unlike C and other languages, all the types are signed—there are no native
unsigned types. Table 1.8 shows Java’s primitive numeric types.
For integer types, a literal integer can be expressed in decimal or hexa-
decimal formats (using lower- or uppercase letters for hex digits):
int i1 = 1000;
int i2 = 0x3e8; // == 1000 decimal
int i3 = 0x3E8; // same
For literal long values, a prefix of L or l should always be used, even if it
appears unnecessary:
long l1 = 1000; // compiles but is not recommended
long l1 = 1000L; // recommended
long l2 = 0xfffffffff; // won’t compile, error
long l2 = 0xfffffffffL; // correct
Table 1.8 Primitive Numeric Types in Java

NAME    TYPE                          LOGICAL SIZE   RANGE
byte    signed integer                8 bits         -128 to 127
short   signed integer                16 bits        -32768 to 32767
int     signed integer                32 bits        -2,147,483,648 to
                                                     2,147,483,647 (2.1 billion)
long    signed integer                64 bits        -9,223,372,036,854,775,808
                                                     to 9,223,372,036,854,775,807
                                                     (±9.2 × 10^18)
float   ANSI/IEEE 754 floating point  32 bits        ±1.4 × 10^-45 to
                                                     ±3.4028235 × 10^38
                                                     (6–7 significant decimal digits)
double  ANSI/IEEE 754 floating point  64 bits        ±4.9 × 10^-324 to
                                                     ±1.7976931348623157 × 10^308
                                                     (15 significant decimal digits)

Specifying literal values for bytes can be tricky because a byte is signed
from –128 to 127, while very often you’ll be using constants specified from
0 to 255. If the value is within –128 to 127, the code will compile. If the value
is from 128 to 255, you can either convert it to its negative equivalent or use
a cast operation. The same principles apply to the short type:
byte b1 = 10;
byte b2 = 189; // error, out of bounds
byte b3 = -67; // = 189, ok
byte b4 = (byte) 189; // ok
byte b5 = (byte) 0xbd; // ok, same value
Floating-type literals are assumed to be a double type unless suffixed
by an f for float. Floating types can also be represented using scientific
notation: val E scale = val × 10^scale.
float f1 = 0.1; // compiler error
float f2 = 0.1f; // by default “0.1” is a double type
double d1 = 0.1;
double d2 = 1.0E2; // == 100
In practice, the short type is rarely used because it almost always is
converted into an int type before anything else is done. The float
type also isn’t used because it doesn’t provide enough precision for most
applications.
Type Conversion
Conversions between types can either be widening or narrowing. A widen-
ing conversion is where the new type can represent larger numbers and can
“contain” the old type. For an integer-to-integer type conversion (i.e.,
short to long), this just means putting the same number in a larger
container. Integer-to-floating-point conversions are also considered as
widening, but some of the least-significant digits may be scrambled or
zeroed due to how floating-point numbers are represented. These types of
conversion happen automatically and silently either at compile time or at
run time. For example:
int i1 = 123;
long l1 = i1; // ok -- l1 = 123
float f1 = i1; // ok -- f1 = 123
int i2 = 123456789;
float f2 = i2; // ok, but f2 = 123456792

Narrowing conversions may result in a loss of magnitude and precision.
Any conversion from floating-point to an integer type is considered
narrowing, as is a conversion from a larger integer type to a smaller one. Java
will not automatically do narrowing conversions, and a compiler error will
be issued if the compiler detects such a conversion. To do these conver-
sions, you must explicitly declare them with the cast operator.
Converting floating point to integers drops or truncates any digits to the
right of the decimal point:
float f1 = 0.123456f;
int i0 = f1; // compiler error -- requires a cast.
int i1 = (int) f1; // == 0
float f2 = 0.99999f;
int i2 = (int) f2; // == 0
float f3 = 100.9f;
int i3 = (int) f3; // == 100
float f4 = -100.9f;
int i4 = (int) f4; // == -100
int i5 = 0xffL; // compiler error; long to int conversion
For narrowing conversions from one integer type to another, the least-
significant bits (or bytes) of the larger type are kept:
int i1 = 0xfffff01;
byte b1 = (byte) i1; // b1 = 0x01
These rules are summarized in Table 1.9, with N representing a narrow-
ing conversion, W for a widening conversion, and W* for a widening
conversion that may result in a loss of precision.
Table 1.9 Primitive Type Conversion

(from \ to)  BYTE   SHORT  INT    LONG   FLOAT  DOUBLE
byte         —      W      W      W      W      W
short        N      —      W      W      W      W
int          N      N      —      W      W*     W
long         N      N      N      —      W*     W*
float        N      N      N      N      —      W
double       N      N      N      N      N      —

Unsigned to Signed Conversions
There are no unsigned types in Java; however, you can still use the signed
types as an unsigned container. Even though a byte is signed, it still has
8 bits and can represent an integer from 0 to 255 even if Java thinks other-
wise. To use the unsigned value, it must be converted into a larger type
using an AND mask.
To convert an unsigned byte:
byte x = (byte) 254; // x = -2 in Java.
short c1 = (short)(x & 0xff); // c1 = 254
int c2 = x & 0xff; // c2 = 254
long c3 = x & 0xffL; // c3 = 254, must put L at end of 0xffL
Unsigned short values are converted by the following:
int c1 = x & 0xffff;
long c2 = x & 0xffffL;
Finally, the unsigned int is converted into a signed long by the
following:
long c = x & 0xffffffffL;
It’s critical to put an L at the end of the mask when working with long
types. Without it, Java will respond as if you are doing an int computa-
tion, then doing a widening conversion to a long:
byte b = (byte) 0xff;
long l1 = (b & 0xff) << 56; // wrong: an int shift by 56 is really a
// shift by 24, so l1 = 0xffffffffff000000
long l2 = (long)((b & 0xff) << 56); // same as previous
long l3 = (b & 0xffL) << 56; // correct: l3 = 0xff00000000000000
Overflow
When a computation exceeds an integer type’s bounds, the value is “rolled
over” silently; no exception is thrown:
byte b = (byte) 127; // b = 01111111
b++; // b = 10000000, but b has the value of -128
While the silence may seem disturbing, in practice overflow is rarely a
problem. For floating-point computations overflow is still silent, but the
result is not rolled over. Instead, the type takes on a special Infinity

value that can be checked using normal comparison operators or by using
the Double.isInfinite method. Also note that floating-point division
follows the IEEE 754 rules: dividing a nonzero value by 0.0 is not an error
but yields an infinite value. Keep in mind, however, that Java throws an
exception if you try to divide by the integer 0. In
the event that a computation doesn’t make sense, the value is NaN, which
stands for “not a number.” If you wish to see if a value is NaN, then you
must use the Double.isNaN method. Various examples of manipulating
double types are shown as follows:
double d1 = 1.7E308 * 2.0; // overflow = Infinity
double d2 = 1.0/0.0; // = Infinity
double d3 = -1.0/0.0; // = -Infinity
int i = 1 / 0; // throws an ArithmeticException
double d4 = 0.0/0.0 // = NaN
boolean b1 = (d4 == d4); // = false, always (NaN is never equal to anything)
boolean b2 = Double.isNaN(d4); // true
boolean b3 = Double.isInfinite(d3); // true
Arrays
In Java, native-type arrays are treated as objects and are created with the
new operator. Arrays can also be initialized with literal values, as shown
in the following code snippet.
int[] a = new int[3]; // all three elements set to zero
int[] b = {1, 2, 3}; // a pre-set 3-element array.
Once constructed, an array is not resizable. If dynamically allocated
storage is needed, you should use one of the collections in the java.util
package. Arrays are indexed with int types, and the first element starts
at 0 (as in C). Smaller types will have widening conversions done to them,
and the long type cannot be used without a cast operation. Thus, single-
dimensional arrays cannot have more than 2^31, or roughly 2.1 billion, entries.
Hopefully, this will be sufficient for your needs.
Since arrays are objects, assignment is done by reference, as shown in the
following:
int[] a = {1,2};
int[] b = a;
b[0] = 100; // modifies the object that both a and b point to.
System.out.println(a[0]); // will print 100, not 1
a = null; // a is null, but b is intact.

In addition, they can be copied using the clone method. To use this, cast
the result to the correct array type:
int[] a = {1, 2};
int[] b = (int[]) a.clone(); // b is a “deep copy” of a
The System.arraycopy method is extremely useful for copying parts
of an array or concatenating arrays. It makes a native system call to copy
memory directly instead of copying each element individually and is much
faster than writing a custom for loop:
System.arraycopy(Object input, int inputOffset,
Object output, int outputOffset, int length)
Even though the method has Object in its signature, this method only
works on array types. Other useful methods for manipulating native
arrays can be found in the class java.util.Arrays and are summarized
in Table 1.10.
Table 1.10 Summary of Methods from java.util.Arrays

JAVA.UTIL.ARRAYS METHOD             DESCRIPTION
static boolean                      Returns true if and only if the arrays are
equals(type[] a, type[] b)          the same length and every element is equal.

static void                         Fills an array with a single value.
fill(type[] a, type val)

static void                         Performs a fast numeric sort. Array is
sort(type[] a)                      modified.

static void                         Sorts only part of an array.
sort(type[] a, int fromIndex,
int toIndex)

static int                          Performs a binary search of a sorted array.
binarySearch(type[] a,              Returns the array index if a match is found
type val)                           and a negative value if no match. Arrays
                                    must be sorted first.

Numeric Classes
Each numeric type has a matching class that acts as an object wrapper, as
shown in Table 1.11.
These classes do not have any support for mathematical operations—
you can’t add two integer objects directly. Instead, these classes are pri-
marily used to allow numeric types to be used in methods that expect an
Object type, such as in collections from the java.util package (e.g.,
ArrayList, HashMap). The wrapper classes also provide basic string for-
matting and parsing from strings for numbers. Since they are objects, they
can also be null. Note, however, that object references are passed by
value: reassigning a parameter inside a method does not change the
caller’s variable.
public void changeInt(Integer i) {
i = new Integer("1");
}
Integer i = new Integer("0");
changeInt(i); // i is still 0 -- only the method's local reference changed
All of the classes share some common traits (examples are just shown for
Integer and Long):
 Two public fields, MAX_VALUE and MIN_VALUE, in case you don’t
want to memorize the previous table.
 byteValue, doubleValue, floatValue, intValue, and
longValue methods that return the underlying number in
a native type.
 A static method valueOf that accepts a string and an optional radix
to parse a string and return an object. The radix can be 2 to 36.
static Integer Integer.valueOf(String s)
static Integer Integer.valueOf(String s, int radix)
static Long Long.valueOf(String s)
static Long Long.valueOf(String s, int radix)
 A static method parseClass (where Class is the name of the class,
such as Byte or Integer) that also accepts a string and an optional
radix, and returns the native type instead of an object.
static int Integer.parseInt(String s)
static int Integer.parseInt(String s, int radix)
static long Long.parseLong(String s)
static long Long.parseLong(String s, int radix)
 toString that returns the number as unformatted decimal number.
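The parse methods with an explicit radix look like this in practice; a small sketch:

```java
public class ParseDemo {
    public static void main(String[] args) {
        // parse hexadecimal and binary strings into native types
        System.out.println(Integer.parseInt("ff", 16));  // 255
        System.out.println(Integer.parseInt("1010", 2)); // 10
        System.out.println(Long.parseLong("7fffffffffffffff", 16)); // Long.MAX_VALUE

        // valueOf does the same parse but returns a wrapper object
        Integer obj = Integer.valueOf("ff", 16);
        System.out.println(obj.intValue()); // 255
    }
}
```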

Table 1.11 Java Class Wrappers for Native Types

NATIVE TYPE    MATCHING JAVA.LANG CLASS
int            Integer
byte           Byte
double         Double
float          Float
long           Long
short          Short
The Long and Integer classes have a few other useful static methods
that provide an unsigned representation of the number in binary, hex, or
octal formats as shown:
static String Integer.toBinaryString(int val)
static String Integer.toHexString(int val)
static String Integer.toOctalString(int val)
static String Long.toBinaryString(long val)
static String Long.toHexString(long val)
static String Long.toOctalString(long val)
Binary representation is especially useful for debugging bit fields.
These methods do not add any “leading zeros” to the output, so
Integer.toHexString(15) just returns f and not 0f or 0x0f.
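If you need fixed-width output, a small left-padding helper is easy to write; a sketch (the name toHex is ours, not a standard API):

```java
public class HexPad {
    // Left-pad the output of Integer.toHexString with zeros
    // until it reaches the requested width.
    static String toHex(int val, int width) {
        String s = Integer.toHexString(val);
        StringBuffer sb = new StringBuffer();
        for (int i = s.length(); i < width; ++i) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(15, 2));  // 0f
        System.out.println(toHex(255, 8)); // 000000ff
    }
}
```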
Booleans and BitFields
Java provides a native boolean type that can either be true or false.
Unlike C and C++, it is not a numeric type and does not have a numeric
value, and it is not automatically cast. Like the numeric types, there is also
a boolean class wrapper. If you want to create an array of boolean values,
you could use Boolean with one of the collection classes (e.g.,
ArrayList). However, there is a special class java.util.BitSet
specially designed for use with boolean values that provides a huge
performance increase and memory savings. More specialized applications will
convert a native type or byte array into a bit field directly.

Chars
Java has another native type char that represents a Unicode character and
is used by String and StringBuffer internally. It is represented by an
unsigned 16-bit integer, but it is not a numeric type. It is automatically cast
to an integer in situations when needed, such as in mathematical or bitwise
operations, but it’s not automatically cast to any other type. However, you
can explicitly convert a char to another type by using a cast operator.
Since it’s unsigned, it might be tempting to use char in places for math-
ematical purposes and for raw bit fields. In practice, there is not much
point, since char types are automatically converted into an int before any
operation is done, eliminating any advantage.
Working with Bytes
Because the Java byte type is signed and because of the automatic
conversion rules, working with bytes and byte arrays can be frustrating.
Following are some tips and tricks to simplify byte manipulation.
Sign Problems
It’s always easiest to specify a constant byte value by using a cast (later in
this chapter you will see tricks on making byte arrays):
byte b = 0xff; // Compiler error. 0xff is an ‘int’ type
byte b = (byte) 0xff; // right
Bytes range from -128 to 127, and not 0 to 255. Setting a byte with a value
from 128 to 255 and a cast is fine—the byte contains the correct bits. The
problem is when you want to retrieve the unsigned value or when you
perform a computation with bytes and ints. For instance:
byte b = (byte) 128;
int i = 3;
System.out.println(i + b); // Prints -125 and not 131
In the addition step, byte b is automatically converted to an int type,
including the sign. While b was set with 128, internally it’s really -128, and
this is its value when converted.
The key point to remember is that all bitwise and arithmetic operators
work on int and long types only. All other types are automatically cast. If
you are using a byte as an unsigned value, you must manually convert the
type to an int or long before the operation using val & 0xFF or val &
0xFFL respectively.

The problem is more mysterious with the shift operators. Let’s say you
want to do a right-shift on 0xFF. In binary, this is 11111111, so you’d expect an
unsigned right-shift by one to be 01111111, or 0x7F. The natural way to write this is:
byte b1 = (byte) 0xff;
byte b2 = b1 >>> 1; // compiler error
However, this results in a strange error from the javac compiler, and the
problem isn’t clearly defined:
aprogram.java:6: possible loss of precision
found: int
required: byte
byte b2 = b1 >>> 1;
^
You might try and fix this by casting the result:
byte b1 = (byte) 0xff;
byte b2 = (byte)(b1 >>> 1); // error #2
This compiles; however, the result is still wrong. The
value of b2 is not 0x7f, but strangely remains at 0xff or -1. Again, Java
is converting the byte into an integer before the shift. Since bytes are signed,
it converts the byte value of –1 into an int value of –1, which is 0xffffffff.
Then it performs the shift, resulting in 0x7fffffff. Finally, the cast takes the
least bits, or 0xff, and converts them to a byte. The step-by-step details
are listed in Table 1.12. The correct solution is to treat the byte as an
unsigned value, then convert to an int, and then do the shift and cast back
down to a byte:
byte b1 = (byte) 0xff;
byte b2 = (byte)((b1 & 0xff) >>> 1); // correct
Table 1.12 Comparison of Bit-Shifting a Byte Type

STEPS                INCORRECT               CORRECT
Initial value        0xff = -1               0xff = -1
Convert to int       0xffffffff = -1         0x000000ff = 255
Right-shift >>>      0x7fffffff = large int  0x0000007f = 127
Cast down to byte    0xff = -1               0x7f = 127

To convert a byte back to its unsigned value, you have to do a little trick;
you must bitwise AND the byte with 0xff, that is, b & 0xff. For example:
byte b = (byte) 128;
int i = 1;
int unsignedByteValue = b & 0xff;
System.out.println(i + unsignedByteValue); // 129
System.out.println(i + (b & 0xff)); // combined
This trick works by observing that the least-significant bytes in an int
match exactly with the byte type, as shown in Table 1.13.
Conversion of Integral Types to Byte Arrays
Java does not have any easy way of converting integral types into raw
bytes. There are the serialization methods, but these involve a lot of work
for something that should be relatively simple. For example:
public static byte[] intToByteArray(int i) {
byte[] buf = new byte[4];
buf[0] = (byte) (i >>> 24);
buf[1] = (byte) (i >>> 16);
buf[2] = (byte) (i >>> 8);
buf[3] = (byte) i;
return buf;
}
public static int byteArrayToInt(byte[] buf) {
if (buf.length != 4)
throw new RuntimeException("Bad Length");
return ((buf[0] & 0xff) << 24) | ((buf[1] & 0xff) << 16) |
((buf[2] & 0xff) << 8) | (buf[3] & 0xff);
}
Table 1.13 Comparison of Representation between int and byte Types

UNSIGNED VALUE   SIGNED VALUE   BYTE REPRESENTATION   INT REPRESENTATION
0                0              00                    00000000
1                1              01                    00000001
2                2              02                    00000002
126              126            7e                    0000007e
127              127            7f                    0000007f
128              -128           80                    ffffff80
129              -127           81                    ffffff81
130              -126           82                    ffffff82
253              -3             fd                    fffffffd
254              -2             fe                    fffffffe
255              -1             ff                    ffffffff
256              256            00                    00000100
We could do the same thing for longs. Instead, we’ll demonstrate a
different technique that builds the result by incrementally shifting the value
in question:
public static byte[] longToByteArray(long l) {
byte[] buf = new byte[8];
for (int j = 7; j >= 0; --j) {
buf[j] = (byte)(l & 0xffL); // !! must use 0xffL, not 0xff !!
l >>>= 8; // l = l >>> 8;
}
return buf;
}
public static long byteArrayToLong(byte[] buf)
{
long i = 0;
if (buf.length != 8)
throw new RuntimeException("Bad Length");
for (int j = 0; j < 8; ++j) {
i <<= 8; // i = i << 8;
i |= buf[j] & 0xffL;
}
return i;
}
Converting to Hex Strings
Converting an array into a hexadecimal string is a fairly common task for
printing, data interchange, and debugging. A naive way to do hex conver-
sions is by using one of the previously mentioned byte array-to-long
methods followed by Long.toHexString. However, this is very slow

and is limited to arrays smaller than 8 bytes. Even worse would be using
BigInteger (discussed later in the chapter). For example:
byte[] mybytes = ...;
BigInteger a = new BigInteger(1, mybytes);
return a.toString(16);
While mathematically correct, most programs operate on printing two
hex digits per byte; for example, decimal 15 is converted to 0F, not F. The
other problem is that this method is horrendously slow.
While these methods are useful in a pinch, for high performance appli-
cations they won’t do. If one is willing to use a little memory, performance
can be doubled or tripled. The conversion itself is fairly simple, but an
implementation that uses tables has some important benefits:
 Provides typically higher performance (at the expense of some minor
setup and memory costs).
 Allows using different alphabets instead of the standard. These
issues are further detailed in Chapter 4.
 Typically allows simpler coding.
The conversion class starts out defining the hexadecimal alphabet (in
our case, the standard one). The hexDecode table is the inverse mapping;
for instance, character A is mapped to 10. Characters that are invalid are set
to –1. For example:
public class Hexify
{
protected static char[] hexDigits = {'0', '1', '2', '3', '4',
'5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
protected static int[] hexDecode = new int[256];
static {
for (int i = 0; i < 256; ++i) hexDecode[i] = -1;
for (int i = ‘0’; i <= ‘9’; ++i) hexDecode[i] = i - ‘0’;
for (int i = ‘A’; i <= ‘F’; ++i) hexDecode[i] = i - ‘A’ + 10;
for (int i = ‘a’; i <= ‘f’; ++i) hexDecode[i] = i - ‘a’ + 10;
}
Encoding is fairly straightforward. Each byte is split into two sections,
and the table is used to determine the appropriate character:
public static String encode(byte[] b) {
char[] buf = new char[b.length * 2];
int max = b.length;
int j = 0;

for (int i = 0; i < max; ++i) {
buf[j++] = hexDigits[(b[i] & 0xf0) >> 4];
buf[j++] = hexDigits[b[i] & 0x0f];
}
return new String(buf);
}
Decoding is more interesting. First, a function, given a hexadecimal
character, returns the integer value. This is done via a table lookup, and the
result is verified to make sure it’s valid. It’s always important to check the
input for validity, especially if you don’t have control of what the input is.
You could easily add or inline these checks into the main decode routine
for a minimal improvement in execution time, as follows:
protected static int getHexDecode(char c) {
int x = hexDecode[c];
if (x < 0) throw new RuntimeException(“Bad hex digit “ + c);
return x;
}
While the encode function always encodes two characters per byte, you
may receive a hexadecimal string that doesn’t conform to this rule. If you
are only processing strings that you created, you could check that the
number of input characters is even and throw an exception if it’s not.
The following implementation handles full generality. The string is checked
to see if it has an odd number of characters, by computing max & 0x01
(or, equivalently, max % 2), which just looks at the least-significant bit. If
it’s set, then the value is odd.
public static byte[] decode(String s) {
char[] input = s.toCharArray();
int max = input.length;
int maxodd = max & 0x01;
byte[] buf = new byte[max/2 + maxodd];
int i = 0, j = 0;
if (maxodd == 1) {
buf[j++] = (byte) getHexDecode(input[i++]);
}
while (i < max) {
buf[j++] = (byte)(getHexDecode(input[i++]) << 4 |
getHexDecode(input[i++]));
}
return buf;
}
} // end of class Hexify

BigInteger
The BigInteger class in the java.math package provides a way of
doing computations with arbitrary large integer types. Unlike C++, Java
does not overload operators, so to do basic math, you have to call methods
to perform mathematical operations such as addition. All of the operations
create new BigInteger objects as results. The objects used in the opera-
tions remain untouched. For instance:
BigInteger b1 = new BigInteger(“1”);
BigInteger b2 = new BigInteger(“2”);
BigInteger b3 = b1.add(b2);
At the end, b1 is still 1, b2 is still 2, and b3 is what you’d expect it to be,
3. Modifying an object requires explicit code. For instance, if we wanted
b1 to contain the result of the addition, we would use self-assignment like
this:
b1 = b1.add(b2); // b1 is now 3
While the notation is different, the behavior is no different than using
normal operators. Adding 1 + 2 doesn’t change the original values of 1 or 2.
All of the usual arithmetic operations are available: add, subtract,
multiply, divide, and remainder. Another method, divideAnd-
Remainder, returns a two-element BigInteger array where element 0
is the division result and element 1 is the remainder. A complete list and
comparison of BigInteger operations is given in Table 1.14.
Table 1.14 BigInteger Arithmetic Operators

ARITHMETIC OPERATION   NATIVE TYPE NOTATION         BIGINTEGER NOTATION
Addition               a + b                        a.add(b)
Subtraction            a - b                        a.subtract(b)
Multiplication         a * b                        a.multiply(b)
Integer division       a / b                        a.divide(b)
Remainder              a % b                        a.remainder(b)
Division with          int[] result =               BigInteger[] result =
remainder                { a / b, a % b }             a.divideAndRemainder(b)
Negation               -a                           a.negate()
Exponentiation         Math.pow(a, b)               a.pow(b)
Random value           Random r = new Random();     Random r = ...;
                       a = r.nextInt();             int bits = ...;
                                                    BigInteger a =
                                                      new BigInteger(bits, r);
Absolute value         (a >= 0) ? a : -a or         a.abs()
                       Math.abs(a)
Minimum of a, b        (a < b) ? a : b or           a.min(b)
                       Math.min(a, b)
Maximum of a, b        (a > b) ? a : b or           a.max(b)
                       Math.max(a, b)
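For instance, the method-call style and divideAndRemainder from Table 1.14 can be exercised like this:

```java
import java.math.BigInteger;

public class BigIntBasics {
    public static void main(String[] args) {
        BigInteger a = new BigInteger("17");
        BigInteger b = new BigInteger("5");
        // element 0 is the quotient, element 1 the remainder
        BigInteger[] qr = a.divideAndRemainder(b);
        System.out.println(qr[0]); // 3
        System.out.println(qr[1]); // 2
    }
}
```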
Likewise, all the usual bit operations are available, as described in
Table 1.15. There is no need for a special unsigned right-shift operator, since
BigInteger has unlimited capacity; the result of shifting n places to the
right is the equivalent of dividing by 2^n, regardless of sign. In addition,
there are a few new methods that simplify working with bits: testBit,
setBit, and flipBit. Two others, bitCount and bitLength, need a bit
more explaining. For positive integers the results are what you'd expect:
bitLength returns the minimal number of bits required to represent the
integer, and bitCount returns the number of one-bits in the representation.
For negative numbers, bitLength gives the minimal length excluding the
sign bit, and bitCount returns the number of zeros in the representation.
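A short example of these bit methods, with a value chosen for illustration:

```java
import java.math.BigInteger;

public class BitMethods {
    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(0xB4);  // binary 10110100
        System.out.println(a.bitLength());  // 8    (bits needed)
        System.out.println(a.bitCount());   // 4    (number of one-bits)
        System.out.println(a.testBit(2));   // true (bit 2 is set)
        System.out.println(a.setBit(0));    // 181  (binary 10110101)
    }
}
```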
Creating and Converting
BigInteger objects can be created from, and converted back to, a string
representation of a number, a native numeric type, or a byte array.
Strings
You can create BigInteger objects by using a constructor that takes a
string representation of a number. By default, the constructor expects the
string representation to be in decimal (base 10) form. However, you can
also specify another radix, such as base 16, as follows:
BigInteger(String base10Rep)
BigInteger(String representation, int radix)

Table 1.15 BigInteger Bit Operations

BIT OPERATION               NATIVE TYPE NOTATION     BIGINTEGER NOTATION
AND                         a & b                    a.and(b)
ANDNOT                      a & ~b                   a.andNot(b)
NOT (complement)            ~a                       a.not()
OR                          a | b                    a.or(b)
XOR                         a ^ b                    a.xor(b)
Shift left n bits, signed   a << n                   a.shiftLeft(n)
Shift right n bits, signed  a >> n                   a.shiftRight(n)
Test bit n                  ((a & (1 << n)) != 0)    a.testBit(n)
Set bit n                   (a | (1 << n))           a.setBit(n)
Flip bit n                  (a ^ (1 << n))           a.flipBit(n)
Conversely, you can retrieve a string representation in an appropriate
base by using the toString method:
String toString()
String toString(int radix)
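A quick round trip between radixes shows both directions:

```java
import java.math.BigInteger;

public class RadixDemo {
    public static void main(String[] args) {
        BigInteger fromHex = new BigInteger("ff", 16); // parse base 16
        System.out.println(fromHex);                   // 255
        System.out.println(fromHex.toString(2));       // 11111111

        BigInteger fromDec = new BigInteger("255");    // parse base 10
        System.out.println(fromDec.toString(16));      // ff
    }
}
```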
Numeric Types
The BigInteger class can be created from a long by using the static
method valueOf:
long val = 123456789123L;
BigInteger bi = BigInteger.valueOf(val);
Once you have the BigInteger object, it can be converted directly
into a numeric type by using intValue, longValue, floatValue, or
doubleValue. Not that we’ll mind, but there is no method to directly
convert into a short or char value. If the value is too large to fit into the
numeric type, a silent narrowing conversion is done (the high bits are
chopped off).
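The silent narrowing can be seen with a value such as 2^40, which fits in a long but not in an int:

```java
import java.math.BigInteger;

public class NarrowingDemo {
    public static void main(String[] args) {
        BigInteger big = BigInteger.ONE.shiftLeft(40); // 2^40
        System.out.println(big.longValue()); // 1099511627776 (fits in a long)
        System.out.println(big.intValue());  // 0 -- the high bits are chopped off
    }
}
```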
Byte Arrays
Byte arrays can both be generated from and used to construct BigInteger
objects. Assuming you have a BigInteger object, a byte array can be
generated with:

byte[] toByteArray()
And the results of this output can be used to create a new BigInteger
object:
BigInteger(byte[] array)
However, because of signing issues, typically this format is not used for
cryptographic purposes. If the bit length is a multiple of 8, and it always is
for cryptographic applications, the byte array starts with an extra byte
indicating the sign (0 for positive). Either your application can deal with a
leading zero, or you can strip it away, as in the following:
byte ba[] = bi.toByteArray();
if (ba[0] == 0) {
    byte[] tmp = new byte[ba.length - 1];
    System.arraycopy(ba, 1, tmp, 0, tmp.length);
    ba = tmp;
}
A lower-speed option is to use strings as the common medium:
byte ba[] = Hexify.decode(bi.toString(16));
Since BigInteger expects a sign-bit, you’ll need to use a special con-
structor and manually indicate the sign by passing in 1, –1, or 0 for positive,
negative, or zero:
BigInteger(int signum, byte[] magnitude)
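Putting the pieces together, here is a sketch of a round trip through an unsigned byte array, using the stripping code above and the signum constructor:

```java
import java.math.BigInteger;

public class ByteArrayDemo {
    public static void main(String[] args) {
        BigInteger bi = new BigInteger("ff00ff00", 16); // high bit of magnitude set
        byte[] ba = bi.toByteArray();  // 5 bytes: leading 0 carries the sign

        if (ba[0] == 0) {              // strip the sign byte
            byte[] tmp = new byte[ba.length - 1];
            System.arraycopy(ba, 1, tmp, 0, tmp.length);
            ba = tmp;
        }

        // rebuild from the unsigned magnitude with the signum constructor
        BigInteger back = new BigInteger(1, ba);
        System.out.println(back.equals(bi)); // true
    }
}
```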
BigInteger and Cryptography
BigInteger has a few special methods specifically designed for cryptog-
raphy, listed in Table 1.16. These operations are discussed fully in Chapter 3.
Secret Methods in BigInteger
For Sun Microsystems to effectively test for primality, many generic
methods and algorithms had to be implemented; however, they are not
part of the public API. For people working in number theory or who are
developing more advanced cryptographic algorithms, these "hidden"
methods may be useful. Unfortunately, they are declared private; but with
a few modifications to the source code, you can create your own enhanced
BigInteger variant.

Table 1.16 BigInteger Operations Useful in Implementing Cryptographic Algorithms

OPERATION                                    BIGINTEGER NOTATION
Create a random nonnegative integer          Random r = ...;
uniformly distributed from 0 to 2^n - 1      int numBits = ...;
                                             BigInteger a =
                                               new BigInteger(numBits, r);
Create a random integer that is prime        Random r = ...;
with certainty 1 - 2^(-certainty)            int bitLength = ...;
                                             int certainty = ...;
                                             BigInteger a = new
                                               BigInteger(bitLength, certainty, r);
Check if prime with certainty 1 - 2^(-100)   a.isProbablePrime(100)
a mod b                                      a.mod(b)
a^n mod b                                    a.modPow(n, b) // n, b are BigIntegers
Find a^(-1), such that a * a^(-1) = 1 mod b  a.modInverse(b)
Greatest common divisor of a, b              a.gcd(b)
The most interesting of these are the following methods:
int jacobiSymbol(int p, BigInteger n); // this is "package protected"
private boolean passesMillerRabin(int iterations)
private boolean passesLucasLehmer()
private static BigInteger lucasLehmerSequence(int z,
BigInteger k, BigInteger n)
In addition, there is an implementation of sieve for testing primality of
small numbers in the class BitSieve.
To “free” them:
1. Copy all the source files from this directory into a new directory.
2. Edit each file and change the package name from sun.math to one
of your own choosing.
3. Make the BitSieve class “public.”
4. Change the desired methods and classes to public. Note that the
method jacobiSymbol in BigInteger and the class BitSieve
do not have an access modifier, so you have to add public in front
of the method.
5. Compile.

This code is copyrighted by Sun Microsystems. For commercial use and
distribution, refer to the license agreement distributed with the source
code.
Now that we have a full understanding of bit operations and the Java
model, we’ll discuss secret and public key cryptography.

CHAPTER 2
Secret Key Cryptography

Secret key cryptography is based on cryptographic functions that rely on a
single key that must be kept confidential. Secret key ciphers perform
encryption and decryption using the same key, and cryptographic hashes
must be computed and verified using the same key.

Symmetric Block Ciphers

In general, a cipher is a function that maps a message, known as plaintext,
into an unreadable form, known as ciphertext, by use of a key. This is known
as encryption. Without the key, you cannot do the inverse transformation,
the decryption: turning the ciphertext back into its original plaintext
form. In lay terminology, code is often used interchangeably with cipher.
However, they have different meanings: a code is a translation of a message
by use of a dictionary, without any key being used.

Modern block ciphers operate on a fixed number of bytes in a single pass,
for instance, transforming 8 bytes of plaintext into 8 bytes of ciphertext.
Messages larger than the block must be chopped into pieces, and if the
message isn't a multiple of the block size, the message must be padded
appropriately.
Cipher Properties
To be useful a cipher must be secure and must be usable. What makes a
cipher secure? For ciphers that have stood the test of time (i.e., those in
common use that have not been cracked), a rough estimate of security is
key length, with longer keys being more secure. A cipher is deemed
secure if there is no attack that is significantly better than brute force. How-
ever, clearly this by itself is not enough, and the notion of security is tied to
current or contextual computation resources available to an attacker. An
algorithm may be structurally secure, but if the key is too small, brute-force
attacks can easily break the cipher by directly recovering the key.
Security Properties
The fundamental property of an encryption algorithm is that the output be
effectively random, regardless of what the input or key is. From this prop-
erty, many notions emerge regarding how well the algorithm performs:
Semantically secure, polynomially secure, or "indistinguishability
of encryptions." There are many variants of this idea, but the
simplest is that, given two plaintext messages and their ciphertext
equivalents, it is impossible to determine which plaintext matches
the correct encrypted version with a probability greater than .5. In
other words, the ciphertext does not reveal any information about
the plaintext.
Nonmalleability. Given a ciphertext, it is impossible to manip-
ulate the ciphertext to produce related plaintext. In other words,
a 1-bit change to a ciphertext should result in a very different
plaintext.
Plaintext-aware.
It is computationally infeasible to produce a valid
ciphertext without knowing the corresponding plaintext. For instance,
you should not be able to generate ciphertext that decrypts to 7-bit
ASCII unless you encrypted the original plaintext yourself. This is useful
to prevent the generation of false messages.
Lack of algebraic structure.
It’s very important that a cipher doesn’t
have certain algebraic properties that might allow multiple encryp-
tion or multiple keys to negate each other. In particular, it is important
that the cipher does not form a group where:
 Each key has an inverse key.

 Encrypting something twice with different keys is equivalent to
encrypting it once with another key.
 The result of encrypting with one key and decrypting with
another key is the same as encrypting with one key. This would
be very useful for attack purposes.
Ideally, you don’t even want a subset of keys and messages to form a
group; however, proving that is quite tricky. You also don’t want “fixed
points,” that is, messages that encrypt to themselves.
Brute-Force Attacks
Brute-force attacks are the simplest attack, and for symmetric ciphers, this
means trying each key, one after another, until the message decrypts prop-
erly. In decryption, there is no inherent flag saying “this is the right key” or
“decryption occurred correctly.” For instance, a ciphertext with one key
might decrypt to cat, while the same ciphertext decrypted with a different
key might decrypt to dog. Another way of looking at this is if you encrypted
a random number or binary data, an attacker using the brute-force tech-
nique would be unable to determine if the resulting plaintext was the orig-
inal number without using some additional context.
Typically, you do not encrypt random binary data, but rather something
with more structure. During brute-force attacks, the decrypted result is
checked to see if the structure exists. For instance, English text is encoded
using ASCII where only 7 bits in each byte are used. During a brute-force
attack, every output that is only 7 bits is stored in a “candidates” file, which
will need to be checked by an actual human to determine what the correct
result is. False-positive results are extremely rare for language texts.
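The 7-bit structure test described above amounts to a check like the following; looksLikeAscii is a hypothetical helper, not from any library:

```java
// Hypothetical brute-force candidate filter: a decrypted block is kept
// as a candidate only if every byte looks like 7-bit ASCII (high bit clear).
public class AsciiFilter {
    static boolean looksLikeAscii(byte[] plaintext) {
        for (byte b : plaintext)
            if ((b & 0x80) != 0) return false; // high bit set: not 7-bit ASCII
        return true;
    }
}
```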
Even with small key sizes, such as DES (for Data Encryption Standard),
there are 2^55, or approximately 3.6 × 10^16, keys to check. When measuring
cryptographic security, you can't be afraid of large numbers. While 2^55
sounds out of reach, in fact, it's quite doable. Brute-force algorithms are
easily parallelizable, and with the Internet, tens of thousands of computers
are easily pooled for such attacks, thereby enabling trillions of keys to be
tested per second.
Other Attacks
Brute-force attacks can easily be thwarted by using a larger key, so a full
key-space scan becomes impossible. The following attacks attempt to find

weaknesses in the cipher and provide shortcuts to a brute-force attack. If a
significant attack occurs, the cipher is said to be broken and thus insecure.
Ciphertext-only attack. This is probably the most common situation in
non-networked applications. The adversary has a ciphertext message and
that's it. In the vast majority of cases, this attack means brute force,
where every key is checked.
Chosen plaintext. Here the attacker supplies a bulk list of plaintexts,
receives the corresponding ciphertexts, and through analysis can
determine the key.
Adaptive chosen plaintext. Similar to the previous except the process
is more interactive. Plaintext may depend on the results of the previ-
ous plaintext and ciphertexts.
Chosen ciphertext. The attacker selects ciphertext, which is then
decrypted for him.
Related key. The attacker requires plaintext-ciphertext pairs produced
under different, related keys.
Partial known plaintext attack. This attack is one of my own devis-
ing; I haven't seen anything in the literature about it. Here the attacker
has the ciphertext and some of the plaintext. This potential attack
turns out to have applications for database security.
Power attack.
This only applies to smartcard-like devices. An attacker
can actually measure the radiation and time of the cryptographic
computation and estimate the key or size of the key.
Many of these attacks come in the form of linear or differential (or some
combination) analysis. Knowing the algorithm being used and with many
pairs of text (plaintext and/or ciphertext), an attacker can determine the
internal structure and thus extract the key or limit the key space.
Many might be wondering what the point is of the attacks if the attacker
already has plaintext-ciphertext pairs. The trick is that the text pairs are
chosen by the attackers and by themselves are not interesting (the values
are more or less random or generated by an algorithm). The plan is to use
these pairings to recover the key and thus decrypt ciphertexts that do con-
tain useful information. Still, the attacks may seem odd, yet many network
protocols allow many of these types of attacks, since they echo data back. For
common ciphers most of these attacks require a lot of (at least 2^40) pairings,
which would certainly be noticed by even the most primitive network
monitoring. Another way these attacks can be exploited is by an attacker

having access to either the encryption or decryption functions of a system.
A hacker breaking into a system might not have access to the keys but will
have access to system calls to do encryption. Then the attack might only
generate a modest strain on the system’s resources and not be noticeable.
Common Block Ciphers
There are dozens of block ciphers; however, you really only need to consider
a small handful (see Table 2.1). Block ciphers come in two flavors: 64-bit
block and 128-bit block.
Data Encryption Standard (DES)
The Data Encryption Standard, first developed in the 1970s, is the most
widely used cipher. It was the first industrial-strength algorithm that was
fully specified to the public, in Federal Information Processing Standard,
or FIPS, 46-2. (FIPS are the standards used by the U.S. federal government
for information processing.)
DES keys are specified as a 64-bit value; however, only 56 bits are used
as part of the key. For each byte, the 8th bit is a parity bit that causes bytes
to have an odd number of one-bits in them (otherwise known as "odd
parity"). This is left over from the early days of computing.
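The parity rule is easy to check in code; hasOddParity below is a hypothetical helper for illustration:

```java
// Check DES odd parity: every byte of a DES key should contain an odd
// number of one-bits (the 8th bit of each byte is the parity bit).
public class DesParity {
    static boolean hasOddParity(byte[] key) {
        for (byte b : key)
            if (Integer.bitCount(b & 0xFF) % 2 == 0) return false;
        return true;
    }
}
```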
Table 2.1 Selected Secret-Key Ciphers

ALGORITHM    YEAR   BLOCK SIZE, BITS      KEY SIZE, BITS   REASON
DES          1977   64                    56               Standard
Triple DES   1985   64                    168              Standard
Blowfish     1993   64                    40-448           Popular, fast
RC5          1996   64 standard, but      0-2048           Popular, supported
                    supports 32 and 128
IDEA         1992   64                    128              Popular
Rijndael     1998   128, 192, 256         128, 192, 256    Standard, fast
Twofish      1998   128                   128, 192, 256    Popular
RC6          1998   128                   128, 192, 256    Popular, supported

It’s worth looking at the inner workings of DES, since the ideas are used
in many ciphers and the concept is actually fairly simple. First, the key is
expanded. Although the key itself is only 56 bits, internally the key is
expanded into sixteen 48-bit values. Likewise, various stages in the pipeline
also expand data from 32 bits to 48 bits. Later, these expanded values get
contracted as well. While, theoretically, expanding a key into larger values
doesn’t make a key longer than it really is, it does have the benefit of allow-
ing more complicated functions to be constructed with simpler code.
The basic structure of DES is reused in many other ciphers and is what is
known as Feistel structure. The plaintext block is split into left and right sec-
tions, and then the following is iterated:
Ln+1 = Rn
Rn+1 = Ln ⊕ f(Rn, Kn)
where Kn is a subkey of the main key and f is a mixing function. This mix-
ing function is really what makes DES successful. It uses what are called
S-Boxes, which are maps from a 6-bit input to a 4-bit output (the contraction)
based on a lookup table. In DES (and other ciphers) the map is nonlinear and
normally specified by using tables. The security of DES depends entirely
on the values in the S-Boxes and exactly how they were constructed is a
mystery to this day.
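The Feistel structure can be sketched with a toy cipher. The mixing function f below is a made-up placeholder, not the real DES f; the point is that decryption is just the rounds run in reverse, whatever f happens to be:

```java
// Toy Feistel network on a 64-bit block, illustrating the structure
// L(n+1) = R(n), R(n+1) = L(n) XOR f(R(n), K(n)).
public class ToyFeistel {
    // Placeholder mixing function -- NOT the real DES f.
    static int f(int r, int k) {
        return Integer.rotateLeft(r ^ k, 3) * 0x9E3779B9;
    }

    public static long encrypt(long block, int[] keys) {
        int l = (int)(block >>> 32), r = (int)block;
        for (int k : keys) {             // each round swaps halves and mixes
            int t = r; r = l ^ f(r, k); l = t;
        }
        return ((long)l << 32) | (r & 0xFFFFFFFFL);
    }

    public static long decrypt(long block, int[] keys) {
        int l = (int)(block >>> 32), r = (int)block;
        for (int i = keys.length - 1; i >= 0; i--) {  // rounds in reverse
            int t = l; l = r ^ f(l, keys[i]); r = t;
        }
        return ((long)l << 32) | (r & 0xFFFFFFFFL);
    }
}
```

Note that decrypt never needs an inverse of f; this is why a Feistel design can use any mixing function at all.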
DES Security
As of this writing, DES has had no attacks that are significantly better than
brute force. So in structure DES is solid and secure. However, the problem
of short key size and vulnerability to brute-force attacks remains. Nowa-
days DES can be cracked either by specialized hardware or by distributed
networks in a few hours. (See www.eff.org/descracker/ for full
details.) What’s interesting is that various modifications have been pro-
posed to enhance DES. It has been shown that just about any tampering or
enhancements to the S-Box structure results in a much weaker cipher. These
days there really is no reason to use DES except in legacy applications.
Triple DES
While DES appears to have a solid design, the key space is just too short
and is susceptible to brute-force attacks. To help fix the problem, an interim
solution was proposed that effectively performs multiple iterations of DES
on the message using different keys. What’s known as Triple DES, TDES,
DES-EDE, or 3DES can provide 168 bits of security using three DES keys. A
less common version is a two-key version providing 112 bits of security
where three operations are done, but where one key is used twice. It does
this by composition of three DES operations, as shown in Table 2.2.

Table 2.2 Triple DES Operations

OPERATION     3 KEYS               2 KEYS
Encryption    Ek3(Dk2(Ek1(m)))     Ek1(Dk2(Ek1(m)))
Decryption    Dk1(Ek2(Dk3(c)))     Dk1(Ek2(Dk1(c)))
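The EDE composition is what the Java JCE exposes under the name "DESede". A minimal sketch (ECB mode with PKCS5 padding is chosen only to keep the example short, and the key bytes are arbitrary placeholders):

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class TripleDesDemo {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[24];           // three 8-byte DES keys: k1|k2|k3
        for (int i = 0; i < 24; i++) key[i] = (byte) i;  // placeholder key
        SecretKeySpec k = new SecretKeySpec(key, "DESede");

        Cipher c = Cipher.getInstance("DESede/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, k);
        byte[] ct = c.doFinal("attack at dawn".getBytes("US-ASCII"));

        c.init(Cipher.DECRYPT_MODE, k);
        System.out.println(new String(c.doFinal(ct), "US-ASCII"));
    }
}
```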
In software this is a bit slow, but for basic networking (100 megabit), the
speed limitations aren’t much of a problem, since the algorithm can be
implemented in hardware fairly efficiently. Triple DES is not a FIPS but
instead is specified under the ANSI X.9.52 standard and NIST SP 800-20
(ANSI stands for American National Standard Institute, NIST for National
Institute of Standards and Technology).
Triple DES Security
Providing 168 bits of security, Triple DES is by far the most studied,
strongest algorithm available. When in doubt, Triple DES is probably the
best choice. Despite more modern ciphers, nobody suggests that Triple DES
is less secure than others.
Why Not “Double DES”
Using two keys, or “Double DES,” isn’t done because the key size of 112 is
too small; moreover, it isn’t much more secure than a single iteration of
DES. Given a plaintext-ciphertext pair, an attacker could extract the key
using a brute-force attack requiring up to 2112 computations. However, the
attacker can trade computation costs for memory storage by using a “meet-
in-the-middle” attack. There are many variations, but the simplest is to
compute 256 different encryptions of the plaintext and store them, then
begin decrypting the ciphertext using keys and compare the result to the
table. The attacker will likely get multiple matches and would need
another plaintext-ciphertext pair to verify that is the correct key. The worst
case is that the attacker would need to perform 256 decryptions, but even
then that’s only 257 total operations.
There is nothing specific to DES in this; any doubled cipher will have
this problem. The attack extends to the two-key version of Triple DES as
well, but it’s generally less feasible, since the storage requirements become
quite large.
Blowfish
In 1994, 56-bit DES was becoming obsolete and Triple DES was still very
slow. Every other secure encryption algorithm was classified, incompletely

specified (GOST, a Soviet algorithm, was known, but the critical S-Boxes
were classified), or known but unusable, since they were patented (Khufu,
REDOC II, IDEA) or crippled for legal export (RC2, RC4).
In late 1993, Bruce Schneier presented the Blowfish cipher, a free, non-
patented, drop-in replacement for DES with a 64-bit block size but a vari-
able key length of up to 448 bits [Schneier1994]. In today’s world awash
with cryptologists and a healthy open-source movement, it’s easy to forget
how revolutionary this was—strong cryptography for anyone who wanted
it. Even better, the Blowfish cipher is still secure.
The block size is the same as DES with 64 bits. However, Blowfish can
have a variable key size in multiples of 8 bits, from 40 to 448 bits (56 bytes).
During initialization, a key, regardless of length, is expanded to an internal
structure of 4,168 bytes. The setup costs are expensive, since it does 521 iter-
ations of encryption to generate the key schedule. However, after this, the
key size has no effect on performance; in other words, the speed using a
448-bit key is the same as with a 56-bit key. This allows key choice to be based
purely on security, storage, and protocol requirements instead of on perfor-
mance. The initialization penalty can somewhat be avoided by directly
specifying the internal array instead of having it generated (of course, then
a key of 4,168 bytes instead of at most 56 bytes must be saved and stored).
In overall structure, it’s not that different from DES. The subkey array is
eighteen 32-bit subkeys (compare to DES with sixteen 48-bit subkeys).
There are four S-Boxes, each with 256 entries (compare to DES, which has
eight S-Boxes with 64 entries). The most important difference is that Blow-
fish dynamically creates S-Box values during initialization, while DES uses
fixed values. Notice that in Blowfish, all data is a nice round number of
32 and 8 bits, all designed to fit into machine registers with minimal
amount of bit extractions and shifts. This gives Blowfish its speed.
Cryptographically, Blowfish has been deemed strong and has been quite
well received. The algorithm has been included in dozens of free and
limited commercial products. There have been no significant faults found
with the algorithm. In a reduced eight-round version (not the standard
16 rounds), roughly 1 in 2^14 keys are weak [Vau1996]. The weakness allows
an attacker to know if a weak key is being used or not, but not what the
specific key is (and it also allows some other more abstract attacks). The
weak keys come about because Blowfish dynamically creates S-Boxes.
Apparently, in some cases Blowfish generates some stinkers where some of
the S-Box values are duplicated. There is no way of knowing which key is
weak in advance; you have to go through the iteration and then check the
S-Box values to see. Again, no weak keys have been found for the full-
strength 16-round version.

IDEA
The IDEA cipher also operates on 64-bit blocks but uses a fixed 128-bit
key [LaMa1990, LaMaMu1991]. It is generally regarded as a very
well-designed cipher and has experienced no known cryptographic attacks
better than brute force. It is covered by patents owned by the Swiss firm
Ascom, but they have allowed free noncommercial use, provided permis-
sions are first obtained. Because of this licensing, IDEA became popular in
the email encryption program PGP (Pretty Good Privacy). Today it seems
to still be quite popular in the Perl community, partially since it is a very
secure cipher and has received a positive recommendation from Bruce
Schneier in Applied Cryptography (at the time Schneier had just developed
Blowfish and it was still too new to be recommended). However, IDEA,
like Blowfish, generates some nonlinear aspects of the cipher based on the
key; because of this there are classes of weak keys [DaGoVn1993]. Fortu-
nately, the odds of using one of the weak keys is 2^-77, so it's not considered
a grave threat.
While IDEA is fairly easy to implement in software, it’s also very slow
because of generous use of expensive multiplication operations. A naïve
implementation is typically about the same speed as DES, if not slower.
Those same multiplications allow it to speed up significantly on 64-bit
machines, or if someone hand-codes an assembly version that uses some
microprocessors’ extensions such as MMX (Intel and AMD processors),
SSE2 (Intel’s Pentium 4), or AltiVec (IBM/Motorola’s PowerPC). In these
cases, it’s about twice as fast as DES. See [Lipmaa1998] for a discussion of
using the MMX extension with IDEA.
RC5
RC5 was presented by Ron Rivest (coinventor of the RSA, for Rivest-
Shamir-Adleman, algorithm) in 1995 [Rivest1995]. It is also fully specified
in RFC 2040; however, it is patented (U.S. Patent 5,724,428) and is propri-
etary to RSA Security. It is fully parameterized in terms of block size, key
size, and security represented as RC5-w/r/b, where:
 w = Word size in bits—16, 32, and 64—which results in a block size
of 2w bits (two words). The standard is 32.
 r = Number of rounds, which, in turn, provides security. The appro-
priate number depends on w.
 b = Number of bytes in secret key; can be 0 to 255.

Internally, the key expansion stage takes a key and turns it into an internal
array S of 2(r + 1) words of w bits each. Following that, encryption is
amazingly short and simple:
A = A + S[0]; // A is first half of plaintext block
B = B + S[1]; // B is second half of plaintext block
for (i = 1 .. r)
    A = ((A ^ B) <<< B) + S[2i];   // <-- data-dependent rotations
    B = ((B ^ A) <<< A) + S[2i+1];
The matching decryption algorithm is equally terse. As you can see,
RC5's hallmark is its use of data-dependent rotations.
RC5 is typically used with 64-bit blocks, 16 rounds, and keys being at
least 128 bits (e.g., RC5-32/16/128+). While you can use larger blocks, you
might consider using the newer ciphers like Rijndael or RC6 (discussed in
the following sections). Using these guidelines, the most effective attacks
are brute force, and the security is based on the size of the key. There is no
reason not to use at least 128 bits, since after key setup, the cipher runs at
the same speed independent of key size.
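The rounds above can be sketched in Java for w = 32, assuming the subkey array S has already been produced by the key schedule (not shown here). Integer.rotateLeft supplies the data-dependent rotation; only the low 5 bits of the rotation amount are used, as RC5 requires for 32-bit words:

```java
// Sketch of the RC5-32 encryption rounds; S must come from the RC5
// key schedule, which is omitted.  Not a complete RC5 implementation.
public class RC5Sketch {
    static int[] rc5Encrypt(int a, int b, int[] s, int rounds) {
        a += s[0];
        b += s[1];
        for (int i = 1; i <= rounds; i++) {
            a = Integer.rotateLeft(a ^ b, b) + s[2 * i];     // rotate by B
            b = Integer.rotateLeft(b ^ a, a) + s[2 * i + 1]; // rotate by A
        }
        return new int[] { a, b };
    }
}
```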
Rijndael and the Advanced Encryption Standard (AES)
In 2001, the NIST selected Rijndael as the replacement for DES (FIPS 197).
Flemish for XYZ and pronounced "rain-doll," Rijndael is an interesting
cipher, since it works in a completely different way from the previous ciphers.
The algorithm is in some ways similar to shuffling and cutting a deck of
cards. The internal state is laid out in a square, and the rows and columns are
shifted, mixed, and added in various ways. The entries themselves are also
substituted and altered. It has a lot of parallel and symmetric structure
because of the mathematics, which provides a lot of flexibility in how it is
implemented. However, some have criticized it as having too much structure,
which may lead to future attacks. Apparently that didn’t bother the NSA
(National Security Agency) or the NIST. No known cryptographic attacks
are known, and it works well on a wide variety of processors, doesn’t use
bit shifting or rotation, and is very fast.
The AES Competition
How Rijndael was selected is also interesting. Knowing that DES had out-
lived its useful lifetime, the NIST issued an open call for replacements.
Anyone was welcome to submit a cipher. The general requirements were:

 Must be a symmetric block cipher.
 Must support a block size of 128 bits.
 Must support key sizes of 128, 192, and 256 bits.
 Should be at least as fast as Triple DES.
 Should run on a variety of hardware and should have “good”
characteristics for implementation in hardware.
 If selected as the winner, the submitter relinquishes all intellectual
property claims such as patents and trademarks.
Out of 15 semifinalists, 5 were selected:
 Rijndael, the eventual winner.
 MARS, from IBM.
 RC6, from RSA Labs (similar to RC5).
 Serpent, from Ross Anderson, Eli Biham, and Lars Knudsen.
 Twofish, from Bruce Schneier, John Kelsey, Doug Whiting, David
Wagner, Chris Hall, and Niels Ferguson. It is the successor to the
previously mentioned Blowfish algorithm.
How did they rate? Table 2.3 presents summaries from the confer-
ence reports from the Third Advanced Encryption Standard Candidate
Conference.
The rankings are extremely relative, and a last-place finish does not
mean the algorithm was unacceptable. Each algorithm author had his or
her own interpretation of what the NIST selection process called for. Ser-
pent, for instance, was extremely conservative in security, and this shows
in its performance, while RC6 focused on 32- and 64-bit processors.
That the four other ciphers were not selected does not imply they are
insecure. To the contrary, the NSA said any of the final candidates would
make a good standard from a security standpoint (the speed with which
they made their decision would seem to imply that the NSA has much more
sophisticated methods of cipher analysis than the public has). In addition,
the NIST has not specifically excluded them from being used. Since the
NIST wanted a “generalist” cipher, these alternates may find their way into
other more specialized standards and applications. However, Twofish and
RC6 are particularly noteworthy because they have strong corporate or
independent support and are likely to be used in future applications.

Table 2.3 Evaluation of AES Finalists on Various Platforms

PROCESSOR ARCHITECTURE          RELATIVE RANKING (BETTER ... WORSE)
FPGA (programmable logic)       Rijndael, Serpent, RC6, MARS, Twofish
DSP TMS320C62x                  RC6, Rijndael, Twofish
Toshiba T6N55 (smart card)      Rijndael, RC6, Twofish
Pentium II                      Rijndael, RC6, MARS, Serpent, Twofish
Itanium / PA-RISC (HP)          Rijndael, RC6, MARS, Serpent, Twofish
Alpha 21264                     Rijndael, Twofish, RC6, MARS, Serpent
ASIC hardware, worst case       Rijndael, Serpent, Twofish, MARS/RC6
"Parallelism" (feedback modes)  Rijndael, Twofish, Serpent, MARS, RC6
Memory usage                    RC6, Serpent, MARS, Twofish, Rijndael
Software (summary, Whiting)     Rijndael, MARS, Serpent, Twofish, RC6
Software (summary, Bassham)     RC6, Rijndael, MARS, Twofish, Serpent
Java (speed)                    Rijndael, MARS, RC6
Java (throughput)               RC6, MARS, Rijndael, Twofish
Java (key setup)                Rijndael, Serpent, RC6, MARS, Twofish
Twofish
The successor to Blowfish, Twofish has already been widely deployed in
many open-source and minor commercial projects. Twofish is meticulously
constructed and well documented, and the submission paper is an excel-
lent reference and bibliography on cryptanalysis. The developers even

published a book on its design [ScKeWhWaFe1999]. In some ways, Twofish
is a very traditional design that isn’t dissimilar to DES: it’s based on a Feistel
design using expanded keys and S-Boxes for nonlinearity. The big differ-
ence is how each part was designed and how each part interacts with the
others. Many components were found by an exhaustive search for a map-
ping that provided an optimal result using the smallest number of machine
operations. While each piece of Twofish separately is fairly straightfor-
ward, the final result is hard to understand and analyze. One description of
it is “a collection of patches which fix various problems” [Baudron1999].
That said, no one is suggesting it is insecure and currently the best attack is
brute force.
RC6
RC6 builds upon the success of RC5 and is modified to handle 128-bit block
sizes using four 32-bit registers. Like RC5, the algorithm is completely
parameterized and specified as RC6-w/r/b. As submitted to the AES, RC6
uses w = 32 and r = 20. The key may be any whole number of bytes in length;
however, it is most common to use b = 16, 24, or 32 bytes (128, 192, and 256 bits).
The key scheduling function is almost identical to the one used in RC5,
except that RC6 generates more bytes from the key. It is fast—and in many
cases faster than Rijndael—and Java performance is excellent. RSA Secu-
rity has submitted RC6 to various other standard bodies, so you can be
sure you will be seeing more of it. They also have trademarked the name
“RC6” and received a patent on its design (U.S. Patent 6,269,163).
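To make the parameterized design concrete, here is a minimal, illustrative Java sketch of RC6-32/20/16 encryption and decryption (the class and helper names are my own, not a standard library; the operations follow the published RC6 specification). It is meant to show the structure of the key schedule and the four-register rounds, not to replace a vetted implementation:

```java
public class RC6Demo {
    static final int R = 20;            // rounds, as in the AES submission
    static final int P32 = 0xB7E15163;  // key-schedule "magic" constants
    static final int Q32 = 0x9E3779B9;

    // RC5-style key schedule: expand the key bytes into 2R + 4 round keys.
    static int[] keySchedule(byte[] key) {
        int c = Math.max(1, (key.length + 3) / 4);
        int[] L = new int[c];
        for (int i = 0; i < key.length; i++)      // load key little-endian
            L[i / 4] |= (key[i] & 0xFF) << (8 * (i % 4));
        int t = 2 * R + 4;
        int[] S = new int[t];
        S[0] = P32;
        for (int i = 1; i < t; i++) S[i] = S[i - 1] + Q32;
        int A = 0, B = 0, i = 0, j = 0;
        for (int s = 0, v = 3 * Math.max(c, t); s < v; s++) {
            A = S[i] = Integer.rotateLeft(S[i] + A + B, 3);
            B = L[j] = Integer.rotateLeft(L[j] + A + B, A + B);
            i = (i + 1) % t;
            j = (j + 1) % c;
        }
        return S;
    }

    // Encrypt one 16-byte block in place: four 32-bit registers A, B, C, D.
    static void encryptBlock(byte[] blk, int[] S) {
        int A = load(blk, 0), B = load(blk, 4), C = load(blk, 8), D = load(blk, 12);
        B += S[0];
        D += S[1];
        for (int i = 1; i <= R; i++) {
            int t = Integer.rotateLeft(B * (2 * B + 1), 5); // lg(32) = 5
            int u = Integer.rotateLeft(D * (2 * D + 1), 5);
            A = Integer.rotateLeft(A ^ t, u) + S[2 * i];
            C = Integer.rotateLeft(C ^ u, t) + S[2 * i + 1];
            int tmp = A; A = B; B = C; C = D; D = tmp;      // rotate registers
        }
        A += S[2 * R + 2];
        C += S[2 * R + 3];
        store(blk, 0, A); store(blk, 4, B); store(blk, 8, C); store(blk, 12, D);
    }

    // Decrypt one 16-byte block in place: invert each step above, in reverse.
    static void decryptBlock(byte[] blk, int[] S) {
        int A = load(blk, 0), B = load(blk, 4), C = load(blk, 8), D = load(blk, 12);
        C -= S[2 * R + 3];
        A -= S[2 * R + 2];
        for (int i = R; i >= 1; i--) {
            int tmp = D; D = C; C = B; B = A; A = tmp;      // un-rotate registers
            int u = Integer.rotateLeft(D * (2 * D + 1), 5);
            int t = Integer.rotateLeft(B * (2 * B + 1), 5);
            C = Integer.rotateRight(C - S[2 * i + 1], t) ^ u;
            A = Integer.rotateRight(A - S[2 * i], u) ^ t;
        }
        D -= S[1];
        B -= S[0];
        store(blk, 0, A); store(blk, 4, B); store(blk, 8, C); store(blk, 12, D);
    }

    static int load(byte[] b, int off) {                    // little-endian word
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8
             | (b[off + 2] & 0xFF) << 16 | (b[off + 3] & 0xFF) << 24;
    }

    static void store(byte[] b, int off, int w) {
        b[off] = (byte) w;             b[off + 1] = (byte) (w >> 8);
        b[off + 2] = (byte) (w >> 16); b[off + 3] = (byte) (w >> 24);
    }
}
```

Note how small the core is: the nonlinearity comes from data-dependent rotations and the quadratic function B(2B + 1), which is why RC6 maps so well onto 32-bit processors and onto Java's int type.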
Ciphers You Shouldn’t Use
This section lists ciphers that are insecure and should not be used. Oddly,
they keep turning up in supposedly high-security applications such as
authentication tokens (see [FuSiSmFe2001] for examples). The ciphers are
most likely used because they are simple to implement in a few lines of
code and visually appear to do a good job obscuring the underlying data.
Password XOR Schemes
In this type of scheme, the user takes a password and repeatedly uses it to
XOR against the message. This has been used in various forms, including
file protection. It is also common in protecting machine code where the
program starts to decrypt the machine code and then jumps into the newly
decrypted code. However, this is a weak scheme, and it can be easily

cracked, especially if there is any knowledge about what the plaintext
looks like (such as common headers or ASCII). It can be used to obscure
data to make it visually unreadable, but not for anything that is susceptible
to an attack:
// same function to encode and decode: XOR is its own inverse
public static void xorData(byte[] data, byte[] key) {
    for (int i = 0; i < data.length; i++) {
        data[i] ^= key[i % key.length];  // repeat the key over the message
    }
}
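To make the weakness concrete, here is a sketch of a known-plaintext attack on this scheme (the class name XorBreak is my own): if an attacker knows, or can guess, at least key.length bytes of the plaintext, such as a standard file header, XORing those bytes against the ciphertext yields the key directly.

```java
// Known-plaintext attack on a repeating-key XOR scheme:
// ciphertext[i] = plaintext[i] ^ key[i % key.length], so
// key[i] = ciphertext[i] ^ plaintext[i] for the first key.length bytes.
public class XorBreak {
    public static byte[] recoverKey(byte[] ciphertext, byte[] knownPlaintext) {
        byte[] key = new byte[knownPlaintext.length];
        for (int i = 0; i < key.length; i++) {
            key[i] = (byte) (ciphertext[i] ^ knownPlaintext[i]);
        }
        return key;
    }
}
```

With the key in hand, the attacker decrypts the entire message; common headers such as ASCII text or file magic numbers make the “known” plaintext easy to come by.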
Classical Cryptography
Classical cryptography involved simple ciphers that typically worked on
letters or characters as opposed to bytes. More advanced classical ciphers
operated on blocks, but typically blocks of letters instead of blocks of bytes.
These are now primarily of historical interest and useful as examples for
more advanced concepts.
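As a concrete illustration, here is a minimal Java sketch of one such letter-oriented cipher, a simple Caesar shift (the class and method names are my own):

```java
// A classical cipher: it operates on letters of the alphabet, not on bytes.
// Shifting by k encrypts; shifting by -k (or 26 - k) decrypts.
public class Caesar {
    public static String shift(String text, int k) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                sb.append((char) ('A' + (c - 'A' + k % 26 + 26) % 26));
            } else {
                sb.append(c); // pass non-letters through unchanged
            }
        }
        return sb.toString();
    }
}
```

Because the key space is only 26 shifts, an attacker can simply try them all, which is one reason such ciphers are now only of historical and pedagogical interest.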
Interestingly, some of the work of classical cryptography is still classified.
From 1938 to 1941, William Friedman wrote a four-volume series titled
Military Cryptanalysis. In 1984 the first two volumes were declassified. A
more modern series from Lambros Callimahos included three volumes
written from 1959 to 1977. Again only the first two are declassified. In 1992,
an attempt to declassify the remaining volumes using the Freedom of Information
Act was rejected. According to the NSA (Gilmore v. NSA, 10-02-93 letter,
available at www.eff.org/Privacy/Key_escrow/ITAR_export/
ITAR_FOIA/Gilmore_v_NSA/10-02-93.letter):
The documents are classified because their disclosure could reasonably be
expected to cause serious damage to the national security . . . . In addition,
this Agency is authorized by various statutes to protect certain information
concerning its activities. We have determined that such information exists in
these documents.
In other words, 45 years later, World War II cryptography is still essential
to national security. One can only speculate as to why this would be. To my
knowledge no further attempts have been made to release the material
since 1992, but maybe someone will try again.
Regardless of what is hidden in those texts, classical cryptography has
no place in your security solution.