### About Global Documents

**Global Documents provides you with documents from around the globe on a variety of topics for your enjoyment. **

**Global Documents utilizes edocr for all its document needs due to edocr's wonderful content features. Thousands of professionals and businesses around the globe publish marketing, sales, operations, customer service and financial documents making it easier for prospects and customers to find content. **

Nick Galbreath

Cryptography for

Internet and Database

Applications

Developing Secret and Public Key

Techniques with Java™

Cryptography for Internet and

Database Applications

Nick Galbreath

Cryptography for

Internet and Database

Applications

Developing Secret and Public Key

Techniques with Java™

Publisher: Bob Ipsen

Editor: Carol A. Long

Developmental Editor: Adaobi Obi

Managing Editor: Micheline Frederick

New Media Editor: Brian Snapp

Text Design & Composition: Wiley Composition Services

Designations used by companies to distinguish their products are often claimed as trade-

marks. In all instances where Wiley Publishing, Inc., is aware of a claim, the product names

appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appro-

priate companies for more complete information regarding trademarks and registration.

This book is printed on acid-free paper. ∞

Copyright © 2002 by Nicholas Galbreath. All rights reserved.

Published by Wiley Publishing, Inc., Indianapolis, Indiana

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted

in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or

otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright

Act, without either the prior written permission of the Publisher, or authorization through

payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rose-

wood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Pub-

lisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc.,

10475 Crosspointe Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail:

permcoordinator@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their

best efforts in preparing this book, they make no representations or warranties with respect

to the accuracy or completeness of the contents of this book and specifically disclaim any

implied warranties of merchantability or fitness for a particular purpose. No warranty may

be created or extended by sales representatives or written sales materials. The advice and

strategies contained herein may not be suitable for your situation. You should consult with

a professional where appropriate. Neither the publisher nor author shall be liable for any

loss of profit or any other commercial damages, including but not limited to special, inci-

dental, consequential, or other damages.

For general information on our other products and services please contact our Customer

Care Department within the United States at (800) 762-2974, outside the United States at

(317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears

in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data:

ISBN: 0-471-21029-3

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Preface

xiii

Introduction

xv

Chapter 1

Bits and Bytes

1

General Operations

1

Number Bases

2

Bits and Bytes

3

Signed Bytes

4

Bitwise Operators

4

Complementation or Bitwise NOT

6

Bitwise AND

6

Bitwise OR

7

Bitwise Exclusive OR (XOR)

8

Left-Shift

8

Right-Shift

9

Special Operations and Abbreviations

10

Packed Words

11

Integers and Endian Notation

12

Java Numerics

13

Basic Types

14

Type Conversion

15

Unsigned to Signed Conversions

17

Overflow

17

Arrays

18

Numeric Classes

20

Contents

v

Booleans and BitFields

21

Chars

22

Working with Bytes

22

Sign Problems

22

Conversion of Integral Types to Byte Arrays

24

Converting to Hex Strings

25

BigInteger

28

Creating and Converting

29

BigInteger and Cryptography

31

Secret Methods in BigInteger

31

Chapter 2

Secret Key Cryptography

35

Symmetric Block Ciphers

35

Cipher Properties

36

Security Properties

36

Brute-Force Attacks

37

Other Attacks

37

Common Block Ciphers

39

Data Encryption Standard (DES)

39

Triple DES

40

Blowfish

41

IDEA

43

RC5

43

Rijndael and the Advanced Encryption Standard (AES)

44

Twofish

46

RC6

47

Ciphers You Shouldn’t Use

47

Password XOR Schemes

47

Classical Cryptography

48

ROT 13

49

Padding

49

Fixed-Value Padding

49

Random Padding

50

PKCS Padding

50

Modes of Operations

51

Initialization Vectors

51

Electronic Codebook Mode (ECB)

53

Cipher Block Chaining Mode (CBC)

54

Cipher Feedback Mode (CFB)

56

Output Feedback Mode (OFB)

58

Counter Mode (CTR)

60

Propagating CBC (PCBC) Mode

60

Key Wrapping

61

Triple DES KEY Wrapping

61

AES Key Wrapping

62

Turning Passwords into Keys

63

Hashes

64

Cryptographic Hashes

65

Collisions

66

Attacks

67

Algorithms

69

The MD Family

71

The SHA Family

71

The RIPE-MD Family

72

Hash Standards and Practices

73

Hashed Message Authentication Codes (HMACs)

74

The Standard HMAC

74

Legacy Concatenation Schemes

75

HMAC Standards and Practices

76

Summary

76

Chapter 3

Public Key Cryptography

77

Public Key Ciphers

77

Other Systems

79

Digital Signatures

79

Key Agreements

79

Zero-Knowlege Schemes

79

Secret Sharing

80

Public Key Security Classification

80

Fundamental Mathematics

82

Prime Numbers

82

The Distribution of Prime Numbers

83

Prime Testing

84

Probabilistic Tests

85

Sequence-Based Tests

87

Elementary Number Theory

88

Modular Arithmetic

88

Additive Groups

89

Multiplicative Groups

89

Fields

90

Rings

90

Orders and Generators

90

Public Key Encryption and Major PKCS Categories

91

RSA and Integer Factorization

92

Factoring

92

The RSA Problem

94

The Algorithm

94

Message Representation and OAEP

96

In Practice and Standards

98

Choice of Parameters

98

Discrete Logarithm Systems

100

Underlying Mathematics

100

The Algorithm

103

Standards and Practice

106

Elliptic Curves

106

Underlying Mathematics: Elliptic Curves

106

The Algorithm

110

Standards and Practice

112

Other Public Key Cryptographic Systems

112

Rabin Cryptosystem

112

NTRU

114

Summary

115

Chapter 4

Random Numbers

117

Randomness and Security

119

Testing for Randomness

120

FIPS 140-2 Requirements

121

Pseudorandom Number Generators

122

Cryptographic PRNG

123

SHA-1 PRNG

123

Cipher-CBC or ANSI X9.17

123

FIPS 186

123

Blum-Blum-Shub

124

Stream Ciphers

125

One-Time Pads

125

RC4 or ArcFour

125

Using Randomness

126

Generating Random Numbers for Gaming

126

Generating Random Numbers in a Range

127

Shuffling

129

Generating Random Permutations

131

Small Permutations

131

Large Fixed Permutations

132

Random Sampling

133

Accessing Entropy

134

OS Services

134

Win32 CryptoAPI CryptGenRandom

135

/dev/random and friends

135

Userland Services

136

Entropy Generating Daemon (EGD)

137

PRNGD

141

Yarrow and EGADS

141

TrueRand Library

141

Remote Services

142

RAND Corporation

143

HotBits

143

Random.org

143

LavaRnd

144

Java and Random Numbers

144

Random and SecureRandom

144

java.util.random

145

java.security.SecureRandom

146

Developer Issues

148

Reseeding

148

Collecting Entropy

150

An Entropy Pool Implementation

150

Basic System State

151

Thread Schemes

153

Reading External Sources

156

Application Events

156

User Events

157

Chapter 5

Java Cryptography

159

Organization

160

Providers and Engine Classes

161

Parameters, Keys, and Certificates

162

Error Handling

163

Providers

164

Standard Names

165

Standard Sun and SunJCE Providers

168

Other Providers

169

Initializing Providers

170

Writing Your Own Provider

171

Core Engine Classes

171

MessageDigest

171

Digest Streams

172

MAC

173

SecureRandom

174

Ciphers

177

Additional Cipher-Related Objects

180

Signatures

183

SignedObject

184

Key Agreement Protocols

184

Parameters, Keys, and Certificates

185

Algorithm Parameters

186

AlgorithmParameters

186

AlgorithmParameterGenerators

188

Keys

189

Secret Keys

192

Public/Private Keys

195

Encoding and Encrypting Keys

197

Summary

202

Chapter 6

Small Message Encoding and Encryption

203

Preprocessing

203

Converting Digits into Bytes

203

7-bit to 8-bit Compression

205

General Compression and java.util.zip.Deflate

206

Adding Check and Parity Bits

207

Small Message Encryption

209

Single-Block Encryption

210

n-Block Encryption

210

Very Small Message Encryption

211

XOR Tables

211

Small Codebooks

211

RC5-16/16

212

Small Message Encoding

212

Encoding for Customer-Usable Data

213

Capacity and Range

213

Selecting a Base Representation

214

Selecting Base Alphabets

215

Mixed Bases and Alphabets

221

Adding Check Digits

221

Encoding for Machines and Customer-Visible Applications

230

Base 64

230

Base 85

234

Base 128 and Java Source Encoding

237

Chapter 7

Application and Data Architecture

241

Database Architecture for Encrypted Data

241

Selecting a Cipher

243

Secret or Public?

243

Cipher Selection

244

Data

245

Passwords

245

Challenges and Responses

246

Payment, Credit Card, and Other Account Numbers

247

Social Security Number (U.S.)

250

Birthdates and Birthdays

250

Last Name

251

Searching, Indexing, and Constraints

251

Removing Randomness

252

Deterministic Key Selection

252

Indexing and Hashing

253

Uniqueness Constraints

254

Asymmetric Data Usages

255

Null Values and Database Applications

256

Secure Memory Management in Java

258

Smart Array Classes

259

Char Arrays

262

Using SecureRandom

263

Secret Key Management

263

Secret Key Data

264

Key Generation

266

Key Encryption

266

Storage

267

Key Access and Distribution

268

Using Keys with Ciphers and MACs

269

Passwords

272

Startup Passwords

272

Member Names and Passwords

274

Selecting Passwords

274

Member Login, Success and Failure

275

Changing Passwords and Challenges

276

Web-Based Password Entry

276

Generating New Passwords

277

Member Names

278

Logging

278

Embedded-Encryption Logging

279

Fully Encrypted Log Files

279

Public Key Logging

281

Split Log Files

281

Network-Based Logs

282

Cryptographic Tokens and Applications

282

Token Design

282

Expirations and Time Quantization

283

Creating the Security Bits

285

URL Tokens

285

Tamper-Evident URLs

285

Protecting Static Content

286

A Simple URL MAC Implementation

287

Fast Query String Parsing

290

URL Encryption

296

Very Short URLs

296

Cookie Tokens

297

Detecting Cookie Capability

297

Cookies and Authentication

297

Tokens for Access Control

298

Buy-on-Demand Systems

299

Multiple Key Systems

299

Trials and Expirations

300

Decimal and Monetary Computations

300

Doubles and Floats

300

BigDecimal

301

Rounding

302

BigDecimal and Mathematics

303

BigDecimal Alternatives and Wrappers

304

Usage and Storage

304

Appendix A Java Cryptography Class Reference

305

References

367

Index

381

I wrote this book for software engineers with little or no exposure to cryp-

tography. Most other books fall into one of two categories, the encyclope-

dia and description or the purely API descriptive. The goal was try and

bridge the two by providing a solid introduction to cryptography while

providing solid examples and uses. In addition, many books focus over-

whelmingly on public key techniques. In my experience the most common

uses for public key ciphers are handled by third-party applications (VPNs,

Emails) or off-the-shelf protocols such as SSL and SSH.

While there are a number of excellent cryptographic libraries in C and

C++, Java serves as the reference since:

It’s very popular for new business and server applications.

Memory management is automatically done, eliminating entire

classes of bugs (stack smashing and buffer overflows).

It provides a standard cryptographic API that, while not perfect or

complete, is about as “universal” as we can get.

The cryptographic API for Java is scattered among several packages.

Instead of listing classes by packages as is commonly done, Appendix A

lists the classes alphabetically. I found this to be much more useful than

flipping between different package sections.

I’ve tried to limit source code to examples that are relatively simple or

that demonstrate main points or self-contained utilities. More complicated

(and perhaps more useful) examples were excluded simply because the

error and exception handling for a security application can be quite long

and tedious and didn’t really add much value. I have always disliked

Preface

xiii

CD-ROMs glued into books and likewise, never found chapters and chap-

ters of source code to be very useful. Instead, full source code and more

examples are presented at the companion Web site at www.wiley.com/

compbooks.galbreath.

Unfortunately, time and space didn’t permit examination of many top-

ics. In particular:

XML key management, encryption, and signature scheme

Privacy Issues, including privacy “seals,” Gramm-Leech-Biley Act of

1999 and HIPPA privacy rule.

More detailed database tips and techniques.

Perhaps these will be detailed in future editions. Until then, I hope you find

this book useful.

Finally, this book would not have been possible without the many fruit-

ful conversations and support from Dave Andre, Jeff Bussgang, Ann

Calvit, Roy Dixit, Jim Finucane, Bill French, Venkat Gaddipati, Nabil

Hachem, Sam Haradhvala, Adam Hirsch, Steve Morris, Jack Orenstein,

Rich O’Neil, and Matt Rowe, and many others from Upromise, SightPath

and Open Market. Special thanks to my family and friends that had to put

up with my social hiatus while working on this book. Thank you all.

Nick Galbreath

Plum Island, MA

June 2002

The goal of the book is to present the systems programmer with a full intro-

duction into the science and the application of cryptography using Java.

Unlike many texts, this book does not focus on cryptography for transmis-

sion (allowing two parties to communicate securely). For most applica-

tions, those requirements are handled by the SSL or SSH protocols, or a

third-party application. Instead this book focuses on cryptography for stor-

age, message integrity, and authentication, (the so-called “single-point”

techniques) which is more common with custom applications. Beside pure

cryptography, many “auxiliary” techniques are covered that make cryp-

tography useful.

The first chapter covers basic logical and numeric operations, both at the

theoretical level and with the specifics of the Java numeric model. While it

may be a review to many readers, I’ve found that many system program-

mers just do not have exposure to these low-level details. Either they never

learned, or more likely, because it just isn’t normally used in day-to-day

work, it’s been forgotten. Those readers who are more familiar with C or

C++, should find this chapter especially useful for translating code one

way or another from Java.

The next two chapters introduce the reader to science, mathematics, and

standards of both secret key and public key cryptography. Here you’ll

learn why the algorithms work and the various tradeoffs between them.

While the mathematics aren’t essential in day-to-day programming tasks,

the goal was to teach the reader to be familiar with the terminology that is

often tossed around to be able to make informed decisions.

Introduction

xv

Chapter 4 discusses random numbers and random number generators.

While not technically cryptography, random numbers are essential to it.

Those readers developing gaming systems should find this section espe-

cially interesting. Java is used for examples of algorithms, but it could be

easily converted to any other programming language.

Finally, in Chapter 5 we introduce Java’s cryptographic APIs. The Java

SDK already comes with some excellent documentation, but I’ve tried to

pull it together consolidating the JCA and JCE into a coherent format. Like-

wise, I made Appendix A the Java cryptography reference I wanted when

I’m programming. Instead of listing classes by package, I list them all

alphabetically. You’ll find you’ll do a lot less page flipping this way and get

a much better understanding of the API with this organization.

Chapter 6 is on small message encoding, and like Chapter 4, isn’t techni-

cally cryptography, but it’s always an issue when using cryptography.

Here you’ll learn how to convert binary data (perhaps encrypted) into var-

ious ASCII formats. This is critical when embedding data in a URL, creating

passwords and cryptographic tokens.

Chapter 7 pulls everything together and discusses many topics: applica-

tion and database design, working the passwords and tokens, key man-

agement, and logging.

Numerous source code examples are used, and the source itself can be

found at the companion Web site www.modp.com. Many of the examples

in the book will require a little reworking for production use. You’ll want

to modify the error handling to suite your needs. In some cases, if the code

is especially long and not particularly illuminating, I decided not to list it

and instead just refer to the Web site. I find page after page of printed

source code in a book to not be particularly useful.

Unfortunately, due to time and space requirements a lot of important

topics are not covered. Digital signatures are only mentioned and they

deserve more, but I’ve found for basic application work, they just aren’t

that useful. Either, most people won’t use them or there is already a trust

relationship in place eliminating their usefulness, or encryption with a hash

or MAC is preferable. And finally, when I do need to use them, the process

is normally handled by a third party application. Other topics not given

their due are the new XML and SAML standards for creating documents

that contain encrypted data, and embedding cryptography within the data-

base (instead of the application), such as the newer Oracle database can do.

A lot more could be said for key management and database design as well.

Perhaps future editions will remedy this.

1

Before getting into the details of cryptographic operators, we’ll review

some basics of working with bits and number bases. For those of you who

have worked on operating systems, low-level protocols, and embedded

systems, the material in this chapter will probably be a review; for the rest

of you, whose day-to-day interactions with information technology and

basic software development don’t involve working directly with bits, this

information will be an important foundation to the information contained

in the rest of the book.

General Operations

Most cryptographic functions operate directly on machine representation

of numbers. What follows are overviews of how numbers are presented in

different bases, how computers internally represent numbers as a collec-

tion of bits, and how to directly manipulate bits. This material is presented

in a computer- and language-generic way; the next section focuses specifi-

cally on the Java model.

Bits and Bytes

C HAPTE R

1

Number Bases

In day-to-day operations, we represent numbers using base 10. Each digit

is 0 through 9; thus, given a string of decimal digits dndn-1 ... d2d1d0, the

numeric value would be:

10ndn + 10n-1dn-1 + ... + 102d2 + 10d1 + d0

This can be generalized to any base x, where there are x different digits

and a number is represented by:

xndn + xn-1dn-1 + ... + x2d2 + xd1 + d0

In computer science, the most important base is base 2, or the binary rep-

resentation of the number. Here each digit, or bit, can either be 0 or 1. The

decimal number 30 can be represented in binary as 11110 or 16 + 4 + 2. Also

common is hexadecimal, or base 16, where the digits are 0 to 9 and A, B, C,

D, E, and F, representing 10 to 15, respectively. The number 30 can now be

represented in hexadecimal as 1E or 16 + 14. The relationship between dig-

its in binary, decimal, and hexadecimal is listed in Table 1.1.

Table 1.1 Binary, Decimal, and Hexadecimal Representations

BINARY

DECIMAL

HEXADECIMAL

0000

0

0

0001

1

1

0010

2

2

0011

3

3

0100

4

4

0101

5

5

0110

6

6

0111

7

7

1000

8

8

1001

9

9

1010

10

A

1011

11

B

1100

12

C

1101

13

D

1110

14

E

1111

15

F

When you are working with different bases, the base of a number

may be ambiguous. For instance, is 99 the decimal or the hexadecimal 99

(= 9 × 16+9)? In this case, it’s common to prefix a hexadecimal number with

0x or just x (e.g., 99 becomes 0x99 or x99). The same confusion can happen

with binary numbers: Is 101 the decimal 101 or the binary 101 (4 + 1 = 5)?

When a number in binary may be confused, it’s customary to add a sub-

script 2 at the end of a string of binary digits, for example, 1012.

Any number can be represented in another base b if it is positive; how-

ever, doing the conversion isn’t necessarily easy. We’ll discuss general-

purpose base conversion in a later section, but it’s useful to note that

conversion between two bases is especially easy if one of the bases is a

power of the other. For instance, the decimal number 1234 has the canoni-

cal representation in base 10 as 1 × 1000 + 2 × 100 + 3 × 10 + 4. However,

1234 can also be thought as two “digits” in base 100: (12) and (34), with a

value of 12 × 1000 + 34 × 100. It’s the same number with the same value; the

digits have just been regrouped. This property is especially useful for base

2. Given a binary string, it’s possible to convert to hexadecimal by group-

ing 4 bits and computing the hexadecimal value:

10011100 = 1001 1100 = 9C

Bits and Bytes

A bit is the smallest unit of information that a computer can work on and

can take two values “1” and “0,” although sometimes depending on con-

text the values of “true” and “false” are used. On most modern computers,

we do not work directly on bits but on a collection of bits, sometimes called

a word, the smallest of which is a byte. Today, a byte by default means 8 bits,

but technically it can range from 4 to 10 bits. The odd values are from either

old or experimental CPU architectures that aren’t really in use anymore. To

be precise, many standards use octet to mean 8 bits, but we’ll use the more

common byte. Modern CPUs operate on much larger word sizes: The term

32-bit microprocessor means the CPU operates primarily on 32-bit words in

one clock cycle. It can, of course, operate on 8-bit words, but it doesn’t

mean it happens any faster. Many CPUs also have special instructions that

sometimes can operate on larger words, such as the SSE and similar

instructions for multimedia, as well as vector processing, such as on the

PowerPC.

A byte has a natural numeric interpretation as an integer from 0 to 255

using base 2, as described earlier. Bit n represents the value 2n, and the

value of the byte becomes the sum of these bits. The bits are laid out exactly

as expected for a numeric representation, with bit 0 on the right and bit 7 on

the left, but the layout is backward when compared to Western languages.

(b7b6b5b4b3b2b1b0) = 27b7 + 26b6 + 25b5 + 24b4 + 23b3 + 22b2 + 21b1 + 20b0

or using decimal notation

(b7b6b5b4b3b2b1b0) = 128b7 + 64b6 + 32b5 + 16b4+ 8b3 + 4b2+ 2b1 + b0

For example, 00110111 = 32 + 16 + 4 + 2 + 1 = 55.

Bits on the left are referred to as the most-significant bits, since they

contribute the most to the overall value of the number. Likewise, the right-

most bits are called the least-significant bits. This layout is also known as

Big-Endian, which we’ll discuss later.

Signed Bytes

Negative numbers can be represented in a few ways. The simplest is to

reverse one bit to represent the sign of the number, either positive or nega-

tive. Bits 0 through 6 would represent the number, and bit 7 would repre-

sent the sign. Although this allows a range from –127 to 127, it has the quirk

of two zeros: a “positive” zero and a “negative” zero. Having the two zeros

is odd, but it can be worked around. The bigger problem is when an over-

flow occurs—for instance, adding 127 + 2 is 129 in unsigned arithmetic, or

1000001. However, in signed arithmetic, the value is –1.

The most common representation is known as two’s complement. Given x,

its negative is represented by flipping all the bits (turning 1s into 0s and

vice versa) and adding 1, or computing –1 –x (the same value). For example,

note in Table 1.2 that adding 1 to 127 makes the value –128. While this

method is a bit odd, there are many benefits. Microprocessors can encode

just an addition circuit and a complementation circuit to do both addition

and subtraction (in fact, many CPUs carry around the complement with

the original value just in case subtraction comes up). The other main bene-

fit is when casting occurs, or converting a byte into a larger word, as

described in the following section.

Bitwise Operators

The usual arithmetic functions such as addition and multiplication inter-

pret words as numbers and perform the appropriate operations. However,

other operations work directly on the bits without regard to their represen-

tation as numbers. These are bitwise or logical operations. While examples

shown in the next sections use 8-bit bytes, they naturally extend to any

word size.

Table 1.2 Two’s Complement Representation

UNSIGNED

SIGNED

HEXADECIMAL

BINARY

VALUE

VALUE

REPRESENTATION

REPRESENTATION

0

0

00

00000000

1

1

01

00000001

2

2

02

00000010

126

126

7d

01111110

127

127

7f

01111111

128

-128

80

10000000

129

-127

81

10000001

130

-126

82

10000010

253

-3

fd

11111101

254

-2

fe

11111110

255

-1

ff

11111111

As shown in Table 1.3, the operations are represented by different sym-

bols depending if the context is programming or typographical. When

required, this book will use the programming notations, with the exception

of XOR, since the caret symbol (^) is used to denote exponents in some

systems.

Table 1.3 Bitwise Operations and Notation

C-STYLE

C-STYLE SELF-

OPERATION

NOTATION

ASSIGNMENT

TYPOGRAPHICAL

NOT a

~a

n/a

¬ a

a AND b

a & b

a &= b

a ^ b

a OR b

a | b

a |= b

a b

a XOR b

a ^ b

a ^= b

a ⊕ b

Shift a left by

a << n

a <<= n

a << n

n bits

(continues)

^

Table 1.3 Bitwise Operations and Notation (Continued)

C-STYLE

C-STYLE SELF-

OPERATION

NOTATION

ASSIGNMENT TYPOGRAPHICAL

Shift a right by

a >> n

a >>=n

None; all shifts and

n bits, preserve

words are assumed

sign of a

to be unsigned

Shift a right by

a >>> n

a >>>=n

a >> n

n bits, unsigned

Rotate a right by

(x >>> n) |

n/a

ROTRn(a)

n bits; w is

(x << w-n)

number of bits of a

Rotate a left by

(x << n) |

n/a

ROTLn(a)

n bits; w is the

( x >>> w-n)

number of bits of a

Concatenating

(a << shift) | b

n/a

a || b

a and b

Take n least-

a & mask

a &= mask

MSBn(a)

significant bits of a

Take n most-

a & mask

LSBn(a)

significant bits of a

>>> shift

Complementation or Bitwise NOT

The simplest bit operation is complementation or the bitwise NOT. This

simply flips bits within a word, where 0s become 1s and 1s become 0s—for

example ~11001 = 00110. Cryptographically, this operation is not used

much, primarily for implementing basic arithmetic in hardware.

Bitwise AND

AND is useful in bit testing and bit extraction. It’s based on the usual truth

table for logical AND, as shown in Table 1.4. You can remember the truth

table by observing that it’s the same as binary multiplication—anything

times zero is zero.

Table 1.4 Bitwise AND

0

1

0

0

0

1

0

1

AND

To test if a bit is set, create a mask with 1s in positions you wish to test

and perform the logical AND operation. If the result is nonzero, the bit is

set; otherwise, the bit is not:

01011101

AND 00001000

________

00001000 // not == 0, bit 4 is set

It’s also useful for bit clearing. A mask is created with 1s in positions to

preserve and 0s where you want to clear. For instance, to clear the four

least-significant bits in a byte, use the mask 11110000:

11011101

AND 11110000

________

11010000

Bitwise OR

Bitwise OR extends the logical OR to a series of bits. In English, or means

one or the other but not both. The bitwise version means one or the other

or both (the same as the logical OR). See Table 1.5.

Logical OR is useful in joining nonoverlapping words. The following

example joins two 4-bit words together:

11010000

OR 00001111

___________

11011111

In this case, the same effect could be done by using regular addition;

however, it’s pretty standard to use OR instead to make it clear that addi-

tion is not the point of the operation and that you are doing bit operations.

Table 1.5 Bitwise OR

0

1

0

0

1

1

1

1

OR

Table 1.6 Bitwise XOR

Bitwise Exclusive OR (XOR)

Exclusive OR is abbreviated as XOR, and its operation is denoted by ⊕. It

is less common in normal programming operations but very common in

cryptographic applications. Table 1.6 shows the Bitwise XOR table. This

table is easy to remember, since it is the same as addition but without any

carry (this is also known as addition modulo 2). It’s also equivalent to the

English usage of or—one or the other but not both.

XOR tests to see if two bits are different: If the bits are both 0s or both 1s,

the result is 0; if they are different, the result is 1. XOR is useful in crypto-

graphic applications, since unlike AND and OR, it’s an invertible opera-

tion, so that A ^ B ^ B = A:

11010011

XOR 10101010

____________

01111101

XOR 10101010

____________

11010011

Left-Shift

As its name implies, the left-shift operator shifts the bits within a word to

the left by a certain number of positions. Left-shifts are denoted by a << b,

where a is the value and b denotes the number of positions to shift left.

Zeros are filled in on the least-significant positions:

11111111 << 1 = 11111110

11001100 << 3 = 01100000

Mathematically, each shift left can be thought of as multiplication by 2:

3 × 2 = 00000011 << 1 = 00000110 = 4 + 2 = 6

3 × 4 = 00000011 << 2 = 00001100 = 8 + 4 = 12

-3 × 2 = 11111101 << 1 = 11111010 = -6

-128 × 2 = 10000000 << 1 = 00000000 = 0 (overflow)

0

1

0

0

1

1

1

0

XOR

When you are working with large word sizes (such as 32-bit integers),

left-shifts are used to compute large powers of 2, since 2n = 1 << n. A trick

to test to see if only one bit is set (or rather the value is a power of 2) is using

x & -x == x. In many cryptographic operations, we need to extract a cer-

tain number of the least-significant bits from a word. Normally you are

working with word sizes much bigger than a byte. However, say you

needed to extract the six least-significant bits from a byte B. You could per-

form B & 0x3f (B & 0011111). Computing this for larger word sizes is a bit

clumsy, and worse, some algorithms may extract a variable number of bits,

so the mask has to be dynamic. Hardwiring a hexadecimal mask is also

prone to errors, and someone reading your code would have to think about

how many bits you are extracting. How many bits are in 0x7fffff? The

answer is 23, but it’s not automatically clear, and staring at a computer mon-

itor all day where the text is in a small font makes the task harder. Instead,

a left-shift can be used, since 2n – 1 or (1 << n) – 1 has binary representation

of with (n – 1) digits of “1.” Instead of 0x7fffff, we can use ((1 << 24) – 1), which

will be perfectly clear to someone familiar with bit operations. Even better,

you can make your intentions clearer by making the mask a constant:

public static final int mask23lsb = (1<<24) -1; // 23-bit mask

Right-Shift

Not surprisingly, right-shifts, denoted by >>, are the opposite of left-shifts,

and positions are shifted to the right. However, there is another significant

difference. In left-shift notation, zeros are placed in the least-significant

position. With right-shift, the value of the most-significant bit is used to

fill in the shifted positions. For example, 1000000 >> 2 = 1100000, but

0100000 >> 2 = 00100000. This is designed so that the sign is preserved and

the right-shift is equivalent to division by 2 (dropping any fractional part),

regardless of the sign. For example, we’ll “undo” the previous left-shift

examples:

6 / 2 = 00000110 >> 1 = 00000011 = 3

12 / 4 = 00001100 >> 2 = 00000011 = 3

-6 / 2 = 11111010 >> 2 = 11111101 = -3

-127 / 2 = 10000001 >> 1 = 11000000 = -64

This works well when you are treating the byte as a number. In crypto-

graphic applications, the bits are just treated as bits without any numerical

interpretation. In every case, it is assumed the byte or word is unsigned

(e.g., unsigned long in C). If you are using a signed type (such as Java),

you’ll want to use the unsigned right-shift, denoted by >>> (three less-than

symbols). This just shifts the bits to the right and fills in zeros for the most

significant bits. All cryptographic papers assume the shift is unsigned and

normally use the plain >>. When coding an algorithm using signed types,

make sure you convert >> to >>>.

Special Operations and Abbreviations

The previous operations, while useful, are fairly basic and typically directly

implemented in hardware. By combining and composing them, you can

create new higher-level bit operations. For cryptography, the most useful

operations are rotations, concatenation, and extraction of most- and least-

significant bits.

Rotations

Bit rotations are fairly self-explanatory: Bits are shifted either left or right,

and the bits that “fall off the edge” roll over onto the other side. These are

commonly used in cryptography, since this method provides an invertible

operation that can aid in scrambling bits. The problem is that they aren’t

used for much else, so many CPUs do not have rotation instructions, or if

they do, they are frequently slow. Even if they have a fast rotation, the only

way to access them is by writing assembly language, since rotations do not

have a standard operator like the shifts do. Common programming rota-

tions can be accomplished with a combination of shifts and logical opera-

tors, which is the slowest method of all. In its most general case, a rotation

will take one subtraction, two shifts, and a logical OR operation. Remember

to use the unsigned right-shift operator:

(x >>> n) | (x << 32-n); // rotate right n positions, x 32-bit int

(x << n) | (x >>> 32-n); // rotate left n position, x 32-bit int

There is no special typographical symbol for rotations; we normally use

the abbreviations ROTR or ROTL, although certain technical papers may

define their own symbols in their work.

Bit Concatenation

Bit concatenation is the process of simply joining together two sets of bits

into one word. If b is n-bits long, then a || b is a << n | a:

a = 101

b = 0011

a || b = 11 << 4 | 0011 = 1010000 | 0011 = 1010011

MSB and LSB operations

Two other common operations are extracting a certain number of most-

significant bits (MSB) or least-significant bits (LSB). We’ll use the notation

MSBn(a) to denote extracting the n most-significant bits. Likewise with

LSBn(a). These operations are accomplished by making the appropriate

mask, and in the MSB case, by shifting appropriately:

MSB3(10111111) = (10111111 & 11100000) >>> 5 = 101

LSB2(11111101) = 11111111 & 00000011 = 00000001

Packed Words

Shifts are useful in creating packed words, where you treat the word as an

array of bits rather than as a number, and where the bits represent anything

you like. When doing this, you should use an unsigned numeric type.

Unfortunately, Java and many scripting languages do not have unsigned

types, so you must be extremely careful. As an example, suppose you have

two variables that are numbers between 0 and 15. When transmitting or

storing them, you might want to use the most space-efficient representa-

tion. To do this, you represent them not as 8-bit numbers from 0 to 255, but

as two numbers each with 4 bits. For this example, we’ll use pseudocode

and use the type int to represent an unsigned 8-bit value:

int a = 5; // 00000101

int b = 13; // 00001101

int packed = a << 4 | b; // = 00000101 << 4 | 00001101

// = 01010000 | 00001101

// = 01011101

To undo the operation:

b = packed & 0x0F; // 01011101 & 00001111 = 00001101

a = packed >>> 4; // note unsigned right-shift

// 01011101 >>> 4 = 00000101

Likewise, you might wish to view the byte as holding eight true/false

values:

int c = b0 | (b1 << 1) | (b2 << 2) | (b3 << 3) | (b4 << 4)

| (b5 << 5) | (b6 << 6) | (b7 << 7);

int mask = 0x01;

b0 = c & mask

b1 = (c >>> 1) & mask;

b2 = (c >>> 2) & mask;

and so on. If you were using an array, you could do this dynamically in a

loop as well.

for (int i = 0; i < 8; ++i)

b[i] = (c >>> i) & mask;

It’s quite easy to make mistakes, especially when the packed word has a

complicated structure. If you are writing code and the answers aren’t what

you’d expect:

Check to make sure you are using the unsigned right-shift operator

>>> instead of the signed version >>.

Check that the language isn’t doing some type of automatic type

conversion of signed values (e.g., turning bytes into integers before

the shift operation happens).

Check to make sure your masks are correct.

If you are using dynamic shifts, make sure the shift amount isn’t

larger than the word size. For instance, if you are shifting a byte,

make sure you aren’t doing >> 9. How this is interpreted depends

on the environment.

Integers and Endian Notation

Endian refers to how a string of bytes should be interpreted as an integer,

and this notation comes is two flavors: Little-Endian and Big-Endian. The

names come from Jonathan Swift’s Gulliver’s Travels, where the tiny people,

Lilliputians, were divided into two camps based on how they ate eggs. The

Little-Endians opened the eggs from the small end, and the Big-Endians

opened theirs from the big end. As you might expect with that reference,

there are pros and cons to Big-Endian and Little-Endian representations,

but overall it doesn’t make any difference except when you have to convert

between the two formats. A comparison of the two is shown in Table 1.7.

Table 1.7 Comparison of Little-Endian versus Big-Endian Representation

ENDIAN TYPE

B0B1B2B3 = 0XAABBCCDD

SAMPLE

MICROPROCESSORS

Little-Endian

aa bb cc dd

Intel x86, Digital (VAX,

Alpha)

Big-Endian

dd cc bb aa

Sun, HP, IBM RS6000, SGI,

“Java”

Big-Endian, also know as most-significant byte order (MSB) or network

order, puts the most-significant or highest byte value first. This is equiva-

lent to how we write decimal numbers: left to right. The downside is that

many numerical algorithms have to work backward starting at the end of

the array and working forward (just as you would manually with pencil

and paper).

The Little-Endian, or least significant byte (LSB), format is the opposite.

This makes it harder for humans to read hex dumps, but numeric algorithms

are a little easier to implement. Adding capacity (or widening conversions)

is also easier, since you just add bytes to the end of the array of bytes (i.e.,

0xff becomes 0xff00).

Fortunately, regardless of what the byte Endian order is, the bits within

bytes are always in Big-Endian format. For instance, 1 is always stored in a

byte as 000000012 no matter what platform.

The Endian issue becomes critical when you are working with heteroge-

neous systems—that is, systems that use different Endian models. When

shipping bytes between these machines, you must use a standard Endian

format or an ASCII format. In many other programming languages, you

must determine in advance what the Endian architecture is and adjust sub-

sequent bit operations appropriately. For cryptographic applications this

becomes critical, since you are often manipulating bits directly.

With Java, the underlying architecture is hidden and you program using

a Big-Endian format. The Java virtual machine does any Endian conversion

needed behind the scenes.

For C and C++ programmers, normally a BIG_ENDIAN or LITTLE_

ENDIAN macro is defined by the compiler or from an include file. If not,

you can use code similar to this for testing. It sets raw memory and then

converts it to an integer type. The value will be different depending on the

CPU’s Endian type. This C code assumes an int is a standard 4 bytes or

32 bits, but you may wish to generalize:

int isBigEndian() {

static unsigned char test[sizeof(unsigned int)] = {0, 0, 0, 1};

unsigned int i = *(unsigned int) test;

if (i == 0x0000001) return 1; // true, big-endian

return 0; // false, little-endian

}

Java Numerics

We’ll now apply the information in the previous section to the Java

numeric model. While notation is similar to the C or C++ model, there are

Java-specific issues with using signed and unsigned types—specifically,

byte arrays. Java also provides class wrappers around the native types, as

well as an unlimited-capacity integer arithmetic.

Basic Types

Java provides the standard basic numeric types—integers with 8, 16, 32,

and 64 bits, and floating-point types with 32- and 64-bit representations.

Unlike C and other languages, all the types are signed—there are no native

unsigned types. Table 1.8 shows Java’s primitive numeric types.

For integer types, a literal integer can be expressed in decimal or hexa-

decimal formats (using lower- or uppercase letters for hex digits):

int i1 = 1000;

int i2 = 0x3e8; // == 1000 decimal

int i3 = 0x3E8; // same

For literal long values, a prefix of L or l should always be used, even if it

appears unnecessary:

long l1 = 1000; // compiles but is not recommended

long l1 = 1000L; // recommended

long l2 = 0xfffffffff; // won’t compile, error

long l2 = 0xfffffffffL; // correct

Table 1.8 Primitive Numeric Types in Java

NAME

TYPE

LOGICAL SIZE

RANGE

byte

signed integer

8 bits

–128 to 127

short

signed integer

16 bits

–32768 to 32767

int

signed integer

32 bits

–2,147,483,648 to

2,147,483,647 (2.1 billion)

long

signed integer

64 bits

–9,223,372,036,854,775,808

to 9,223,372,036,854,775,807

(± 9.2 × 1018)

float

ANSI/IEEE 754

32 bits

±1.4 × 10-45 to

floating point

±3.4028235 × 1038

(6–7 significant decimal digits)

double

ANSI/IEEE 754

64 bits

±4.9 × 10-324 to

floating point

±1.7976931348623157 × 10308

(15 significant decimal digits)

Specifying literal values for bytes can be tricky because a byte is signed

from –128 to 127, while very often you’ll be using constants specified from

0 to 255. If the value is within –128 to 127, the code will compile. If the value

is from 128 to 255, you can either convert it to its negative equivalent or use

a cast operation. The same principles apply to the short type:

byte b1 = 10;

byte b2 = 189; // error, out of bounds

byte b3 = -67; // = 189, ok

byte b4 = (byte) 189; // ok

byte b4 = (byte) 0xbd; // ok

Floating-type literals are assumed to be a double type unless suffixed

by an f for float. Floating types can also be representing using scientific

notation using valEscale = val × 10scale.

float f1 = 0.1; // compiler error

float f2 = 0.1f; // by default “0.1” is a double type

double d1 = 0.1;

double d2 = 1.0E2 // == 100

In practice, the short type is rarely used because it almost always is

converted into an int type before anything else is done. The float

type also isn’t used because it doesn’t provide enough precision for most

applications.

Type Conversion

Conversions between types can either be widening or narrowing. A widen-

ing conversion is where the new type can represent larger numbers and can

“contain” the old type. For an integer-to-integer type conversion (i.e.,

short to long), this just means putting the same number in a larger

container. Integer-to-floating-point conversions are also considered as

widening, but some of the least-significant digits may be scrambled or

zeroed due to how floating point numbers are presented. These types of

conversion happen automatically and silently either at compile time or at

run time. For example:

int i1 = 123;

long l1 = i; // ok -- l = 123

float f1 = i; // ok -- f = 123

int i2 = 123456789;

float f2 = i; // ok, but f = 123456792

Narrowing conversions may result in a loss of magnitude and precision.

Any conversion from floating-point to an integer type is considered nar-

rowing and is clearly a larger integer type to a smaller integer type. Java

will not automatically do narrowing conversions, and a compiler error will

be issued if the compiler detects such a conversion. To do these conver-

sions, you must explicitly declare them with the cast operator.

Converting floating point to integers drops or truncates any digits to the

right of the decimal point:

float f1 = 0.123456;

int i0 = f1; // compiler error -- requires a cast.

int i1 = (int) f1; // == 0

float f2 = 0.99999;

int i2 = (int) f2; // == 0

float f3 = 100.9;

int i3 = (int) f3; // == 100

float f4 = -100.9;

int i4 = (int) f4; // == -100

int i5 = 0xffL; // compiler error; long to int conversion

For narrowing conversions from one integer type to another, the least-

significant bits (or bytes) form the larger type:

int i1 = 0xfffff01;

byte b1 = (byte) b; // b = 0x01;

These rules are summarized in Table 1.9, with N representing a narrow-

ing conversion, W for a widening conversion, and W* for a widening

conversion that may result in a loss of precision.

Table 1.9 Primitive Type Conversion

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

byte

W

W

W

W

W

short

N

W

W

W

W

int

N

N

W

W*

W

long

N

N

N

W*

W*

float

N

N

N

N

W

double

N

N

N

N

N

Unsigned to Signed Conversions

There are no unsigned types in Java; however, you can still use the signed

types as an unsigned container. Even though a byte is signed, it still has

8 bits and can represent an integer from 0 to 255 even if Java thinks other-

wise. To use the unsigned value, it must be converted into a larger type

using an AND mask.

To convert a unsigned byte:

byte c = (byte) 254; // b = -2 in Java.

short c = (short)(x & 0xff); // c = 254

int c = x & 0xff // c = 254

long c = x & 0xffL; // c = 254, must put L at end of 0xffL

Unsigned short values are converted by the following:

int c = x & 0xffff;

long c = x & 0xffffL;

Finally, the unsigned int is converted into a signed long by the

following:

long c = x & 0xffffffffL

It’s critical to put an L at the end of the mask when working with long

types. Without it, Java will respond as if you are doing an int computa-

tion, then doing a widening conversion to a long:

byte b = (byte)0xff;

long b = (b & 0xff) << 56; // Wrong. b = 0

long b = (long)((int)(b & 0xff) << 56); // same as previous

long b = (b & 0xffL) << 56; // Correct. b = 0xff00000000000000

Overflow

When a computation exceeds an integer type’s bounds, the value is “rolled

over” silently; no exception is thrown:

byte b = (byte) 127; // b = 0111111

b++; // b = 1000000, but b has the value of -128

While the silence may seem disturbing, in practice overflow is rarely a

problem. For floating-point computations overflow is still silent, but the

result is not rolled over. Instead, the type takes on a special Infinity

value that can be checked using normal comparison operators or by using

the Double.isInfinite method. Also note that Java does not think 0.0

is the same as 0 (zero). Instead, 0.0 is treated as a number that is very close

to zero, so division by 0.0 results in an infinite value. Keep in mind, how-

ever, that Java throws an exception if you try to divide by the integer 0. In

the event that a computation doesn’t make sense, the value is NaN, which

stands for “not a number.” If you wish to see if a value is NaN, then you

must use the Double.isNaN method. Various examples of manipulating

double types are shown as follows:

double d1 = 1.7E308 * 2.0; // overflow = Infinity

double d2 = 1.0/0.0; // = Infinity

double d3 = -1.0/0.0; // = -Infinity

int i = 1 / 0; // throws a DivisionByZero exception

double d4 = 0.0/0.0 // = NaN

boolean b = (d4 == d4); // = false, always

boolean b = Double.isNan(d4); // true;

boolean b = Double.isInfinite(d3); // true

Arrays

In Java, native-type arrays are treated and created as if there were objects

created using a constructor with the new operator. Arrays can also be set

with initial values, as shown in the following code snippet.

int[] a= new int[3]; // all three elements set to zero

int[] a = {1, 2, 3}; // a pre-set 3 element array.

Once constructed, an array is not resizable. If dynamically allocated

storage is needed, you should use one of the collections in the java.util

package. The index to arrays is with int types, and the first element starts

at 0 (as in C). Small types will have widening conversions done to them,

and the long type cannot be used without a cast operation. Thus, single

dimensional arrays cannot be larger than 231, or roughly 2.1 billion, entries.

Hopefully, this will be sufficient for your needs.

Since arrays are objects, assignment is done by reference, as shown in the

following:

int[] a = {1,2};

int[] b = a;

b[0] = 100; // modifies the object that both a and b point too.

System.out.println(a[0]); // will print 100, not 1

a = null; // b is null, but a is intact.

In addition, they can be copied using the clonemethod. To use this, cast

the result to the correct array type:

int[] a = {1, 2};

int[] b = (int[]) a.clone(); // b is a “deep copy” of a

The System.arraycopy method is extremely useful for copying parts

of an array or concatenating arrays. It makes a native system call to copy

memory directly instead of copying each element individually and is much

faster than writing a custom for loop:

System.arrayCopy(Object input, int inputOffset,

Object output, int outputOffset, int length)

Even though the method has Object in its signature, this method only

works on array types. Other useful methods for manipulating native

arrays can be found in the class java.util.Arrays and are summarized

in Table 1.10.

Table 1.10 Summary of Methods from java.util.Arrays

JAVA.UTIL.ARRAYS METHOD

DESCRIPTION

static boolean

Returns true if and only if the arrays are

equals(type[] a, type[] b)

the same length and every element is

equal.

static void

Fills an array with a single value.

fill(type[] a, type val)

static void

Performs a fast numeric sort. Array is

sort(type[] a)

modified.

static void

Sorts only part of an array.

sort(type[] a, int fromIndex,

int toIndex)

static int

Performs a binary search of a sorted

binarySearch(type[] a,

array. Returns the array index if a match

type val)

is found and –1 if no match. Arrays must

be sorted first.

Numeric Classes

Each numeric type has a matching class that acts as an object wrapper, as

shown in Table 1.11.

These classes do not have any support for mathematical operations—

you can’t add two integer objects directly. Instead, these classes are pri-

marily used to allow numeric types to be used in methods that expect an

Object type, such as in collections from the java.util package (e.g.,

ArrayList, HashMap). The wrapper classes also provide basic string for-

matting and parsing from strings for numbers. Since they are objects, they

can also be null. Likewise, objects are always pass-by-reference.

public void changeInt(Integer i) {

i = new Integer(“1”);

}

Integer I = new Integer(“0”);

changeInt(i); // i is now 1

All of the classes share some common traits (examples are just shown for

Integer and Long):

Two public fields, MAX_VALUE and MIN_VALUE, in case you don’t

want to memorize the previous table.

byteValue, doubleValue, floatValue, intValue, and

longValue methods that return the underlying number in

a native type.

A static method valueOf that accepts a string and an optional radix

to parse a string and return an object. The radix can be 2 to 32.

static Integer Integer.valueOf(int val)

static Integer Integer.valueOf(int val, int radix)

static Long Long.valueOf(long val)

static Long Long.valueOf(long val, int radix)

A static method parseClass (where Class is the name of the class,

such as Byte or Integer) that also accepts a string, and an optional

radix to parse a string returns the native type instead of an object.

static int Integer.parseLong(int val)

static int Integer.parseFloat(int val, int radix)

static long Long.parseLong(long val)

static long Long.parseLong(long val, int radix)

toString that returns the number as unformatted decimal number.

Table 1.11

Java Class Wrappers for Native Types

NATIVE TYPE

MATCHING JAVA.LANG CLASS

int

Integer

byte

Byte

double

Double

float

Float

long

Long

short

Short

The Long and Integer classes have a few other useful static methods

that provide an unsigned representation of the number in binary, hex, or

octal formats as shown:

static String Integer.toBinaryString(int val)

static String Integer.toHexString(int val)

static String Integer.toOctalString(int val)

static String Long.toBinaryString(long val)

static String Long.toHexString(long val)

static String Long.toOctalString(long val)

Binary representation is especially useful for debugging bit fields.

These objects do not add any “leading zeros” to the output, so new

Integer(16).toHexString() just returns F and not 0F or 0x0F.

Booleans and BitFields

Java provides a native boolean type that can either be true or false.

Unlike C and C++, it is not a numeric type and does not have a numeric

value, and it is not automatically cast. Like the numeric types, there is also

a boolean class wrapper. If you want to create an array of boolean values,

you could use Boolean with one of the collection classes (e.g.,

ArrayList). However, there is a special class java.utl.BitField

specially designed for use with boolean types that provides a huge perfor-

mance increase and memory savings. More specialized applications will

convert a native type or byte array into a bit field directly.

Chars

Java has another native type char that represents a Unicode character and

is used by String and StringBuffer internally. It is represented by an

unsigned 16-bit integer, but it is not a numeric type. It is automatically cast

to an integer in situations when needed, such in mathematical or bitwise

operations, but it’s not automatically cast to any other type. However, you

can explicitly convert a char to another type by using a cast operator.

Since it’s unsigned, it might be tempting to use char in places for math-

ematical purposes and for raw bit fields. In practice, there is not much

point, since char types are automatically converted into an int before any

operation is done, eliminating any advantage

Working with Bytes

Because the Java byte type is signed and because of the automatic conver-

sion rules, working with bytes and byte array can be frustrating. Following

are some tips and tricks to simplify byte manipulation.

Sign Problems

It’s always easiest to specify a constant byte value by using a cast (later in

this chapter you will see tricks on making byte arrays):

byte b = 0xff; // Compiler error. 0xff is an ‘int’ type

byte b = (byte) 0xff; // right

Bytes range from –127 to 128, and not 0 to 255. Setting a byte with the

value 128–255 and a cast is fine—the byte contains the correct bits. The

problem is when you want to retrieve the unsigned value or when you per-

form a computation with bytes and ints. For instance:

byte b = 127;

int i = 3;

System.out.println((i+b)); // Prints -126 and not 130

results in –127 being printed, not 129. In the addition step, byte b is auto-

matically converted to an int type, including the sign. While b was set

with 128, internally it’s really –1, and this is its value when converted.

The key point to remember is that all bitwise and arithmetic operators

work on int and long types only. All other types are automatically cast. If

you are using a byte as an unsigned value, you must manually convert the

type to an int or long before the operation using val & 0xFF or val &

0xFFL respectively.

The problem is more mysterious with the shift operators. Let’s say you

want to do a right-shift on 0xFF. In binary, this is 11111111, so you’d expect an

unsigned right-shift to be 0111111, or 0x7F. The natural way to write this is:

byte b1 = (byte) 0xff;

byte b2 = b >>> 1; // compiler error

However, this results in a strange error from the javac compiler, and the

problem isn’t clearly defined:

aprogram:java.6: possible loss of precision

found: int

required: byte

byte b2 = b >>> 1;

^

You might try and fix this by casting the result:

byte b1 = (byte) 0xff;

byte b2 = (byte)(b >>> 1); // error #2

This performs the compile operation; however, the result is still wrong.

The value of b2 is not 0x7f, but strangely remains at 0xff or –1. Again, Java

is converting the byte into an integer before the shift. Since bytes are signed,

it converts the byte value of –1 into an int value of –1, which is 0xffffffff.

Then it performs the shift, resulting in 0x7fffffff. Finally, the cast takes the

least bits, or 0xff, and converts them to a byte. The step-by-step details

are listed in Table 1.12. The correct solution is to treat the byte as an

unsigned value, then convert to an int, and then do the shift and cast back

down to a byte:

byte b1 = (byte) 0xff;

byte b2 = (byte)((b & 0xff) >>> 1); // correct:

Table 1.12 Comparison of Bit-Shifting a Byte Type

STEPS

INCORRECT

CORRECT

Initial value

0xff = –1

0xff = –1

Convert to int

0xffffffff = –1

0x000000ff = 255

Right-shift >>>

0x7fffffff = large int

0x0000007f = 127

Cast down to byte

0xff = –1

0x7f = 127

To convert a byte back to its unsigned value, you have to do a little trick;

you must bitwise AND the byte with 0xff, that is, b & 0xff. For example:

byte b = 128;

int i = 1;

int unsignedByteValue = b & 0xff

System.out.println((i + unsignedByteValue); // 129

System.out.println((i + (b & 0xff)); // combined

This trick works by observing that the least-significant bytes in an int

match exactly with the byte type, as shown in Table 1.13.

Conversion of Integral Types to Byte Arrays

Java does not have any easy way of converting integral types into raw

bytes. There are the serialization methods, but these involve a lot of work

for something that should be relatively simple. For example:

public static byte[] intToByteArray(int i) {

byte[] buf = new byte[4];

b[0] = (byte) (i >>> 24);

b[1] = (byte) (i >>> 16);

b[2] = (byte) (i >>> 8);

b[3] = (byte) i;

}

public static int byteArrayToInt(byte[]) {

If (buf.length != 4)

throw new RuntimeException(“Bad Length”);

return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) |

(b[2] & 0xff) << 8) | (b[3]);

}

Table 1.13 Comparison of Representation between int and byte Types

UNSIGNED

SIGNED

BYTE

INT

VALUE

VALUE

REPRESENTATION

REPRESENTATION

0

0

00

00000000

1

1

01

00000001

2

2

02

00000002

126

126

7d

0000007e

127

127

7f

0000007f

128

-128

80

00000080

Table 1.13

(Continued)

UNSIGNED

SIGNED

BYTE

INT

VALUE

VALUE

REPRESENTATION

REPRESENTATION

129

-127

81

00000081

130

-126

82

00000082

253

-3

fd

fffffffd

254

-2

fe

fffffffe

255

-1

ff

ffffffff

256

256

00

00000100

We could do the same thing for longs. Instead, we’ll demonstrate a dif-

ferent technique that builds the result by incrementally shifting the value

in question:

public static byte[] longToByteArray(long I) {

byte[] buf = new byte[8];

for (int j = 7; j >= 0; ++j) {

buf[j] = (l & 0xffL); // !! must use 0xffL, not 0xff !!

l >>>= 8; // l = l >>> 8;

}

}

public static long byteArrayToLong(byte[] buf)

{

long i = 0;

if (buf.length != 8)

throw new RuntimeException(“Bad Length”);

for (int j = 0; j < 8; ++j) {

i |= buf[j];

i <<= 8; // l = l << 8;

}

return i;

}

Converting to Hex Strings

Converting an array into a hexadecimal string is a fairly common task for

printing, data interchange, and debugging. A naive way to do hex conver-

sions is by using one of the previously mentioned byte array-to-long

methods followed by Long.toHexString. However this is very slow

and is limited to arrays smaller than 8 bytes. Even worse would be using

BigInteger (discussed later in the chapter). For example:

byte[] mybytes = ...;

BigInteger a = new BigInteger(1, mybytes);

return a.toString(16);

While mathematically correct, most programs operate on printing two

hex digits per byte; for example, decimal 16 is converted to 0F, not F. The

other problem is that this method is horrendously slow.

While these methods are useful in a pinch, for high performance appli-

cations they won’t do. If one is willing to use a little memory, performance

can be doubled or tripled. The conversion itself is fairly simple, but an

implementation that uses tables has some important benefits:

Provides typically higher performance (at the expense of some minor

setup and memory costs).

Allows using different alphabets instead of the standard. These

issues are further detailed in Chapter 4.

Typically allows simpler coding.

The conversion class starts out defining the hexadecimal alphabet (in

our case, the standard one). The hexDecode table is the inverse mapping;

for instance, character A is mapped to 10. Characters that are invalid are set

to –1. For example:

public class Hexify

{

protected static char[] hexDigits = {‘0’, ‘1’, ‘2’, ‘3’, ‘4’,

‘5’ ‘6’, ‘7’, ‘8’, ‘9’, ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’},

protected static int[] hexDecode = new int[256];

static {

for (int i = 0; i < 256; ++i) hexDecode[i] = -1;

for (int i = ‘0’; i <= ‘9’; ++i) hexDecode[i] = i - ‘0’;

for (int i = ‘A’; i <= ‘F’; ++i) hexDecode[i] = i - ‘A’ + 10;

for (int i = ‘a’; i <= ‘f’; ++i) hexDecode[i] = i - ‘a’ + 10;

}

Encoding is fairly straightforward. Each byte is split into two sections,

and the table is used to determine the appropriate character:

public static String encode(byte[] b) {

char[] buf = new char[b.length * 2];

int max = b.length;

int j = 0;

for (int i = 0; i < max; ++i) {

buf[j++] = hexDigits[(b[i] & 0xf0) >> 4];

buf[j++] = hexDigits[b[i] & 0x0f];

}

return new String[buf];

}

Decoding is more interesting. First, a function, given a hexadecimal

character, returns the integer value. This is done via a table lookup, and the

result is verified to make sure it’s valid. It’s always important to check the

input for validity, especially if you don’t have control of what the input is.

You could easily add or inline these checks into the main decode routine

for a minimal improvement in execution time, as follows:

protected static int getHexDecode(char c) {

int x = hexDecode[c];

if (x < 0) throw new RuntimeException(“Bad hex digit “ + c);

return x;

}

While the encode function always encodes two characters per byte, you

may receive a hexadecimal string that doesn’t conform to this rule. If you

are only processing strings that you created, you could check that the num-

ber of input characters is even and that an exception is thrown if it’s not.

The following implementation handles full generality. The string is checked

to see if it has an odd number of characters, by computing max & 0x01

(or, equivalently, max % 2), which just looks at the least-significant bit. If

it’s set, then the value is odd.

public static byte[] decode(String s) {

char[] input = s.charArray[];

int max = input.length;

int maxodd = max & 0x01;

byte b;

byte[] buf = new byte[max/2 + odd];

int i = 0, j = 0;

if (maxodd == 1) {

buf[j++] = getHexDecode[input[i++]]

}

while (i < max) {

buf[j++] == (byte)(getHexDecode[input[i++]] << 4 |

getHexdecode[input[i++]]);

}

return buf;

} //end of class Hexity

BigInteger

The BigInteger class in the java.math package provides a way of

doing computations with arbitrary large integer types. Unlike C++, Java

does not overload operators, so to do basic math, you have to call methods

to perform mathematical operations such as addition. All of the operations

create new BigInteger objects as results. The objects used in the opera-

tions remain untouched. For instance:

BigInteger b1 = new BigInteger(“1”);

BigInteger b2 = new BigInteger(“2”);

BigInteger b3 = b1.add(b2);

At the end, b1 is still 1, b2 is still 2, and b3 is what you’d expect it to be,

3. Modifying an object requires explicit code. For instance, if we wanted

b1 to contain the result of the addition, we would use self-assignment like

this:

b1 = b1.add(b2); // b1 is now 3

While the notation is different, the behavior is no different than using

normal operators. Adding 1 + 2 doesn’t change the original values of 1 or 2.

All of the usual arithmetic operations are available: add, subtract,

multiply, divide, and remainder. Another method, divideAnd-

Remainder, returns a two-element BigInteger array where element 0

is the division result and element 1 is the remainder. A complete list and

comparison of BigInteger operations is given in Table 1.14.

Table 1.14 BigInteger Arithmetic Operators

ARITHMETIC

NATIVE TYPE

BIGINTEGER

OPERATION

NOTATION

NOTATION

Addition

a + b

a.add(b)

Subtraction

a – b

a.subtract(b)

Multiplication

a * b

a.mult(b)

Integer division

a / b

a.divide(b)

Remainder

a % b

a.remainder(b)

Division with

int[] result =

BigInteger[] result =

remainder

{ a / b, a % b }

a.divideAndRemainder(b)

Negation

–a

a.negate()

Table 1.14

(Continued)

ARITHMETIC

NATIVE TYPE

BIGINTEGER

OPERATION

NOTATION

NOTATION

Exponentiation

Math.pow(a,b)

a.pow(b)

Random value

Random r = new Random(); Random r = ...

int bits = ...;

a = r.getInt();

BigInteger a =

new BigInteger(bits, r);

Absolute value

(a >= 0) ? a : b or

a.abs()

Math.abs(a)

Minimum of a, b

(a < b) ? a : b or

a.min(b)

Math.min(a,b)

Maximum of a, b

(a > b) ? a : b or

a.max(b)

Math.max(a,b)

Likewise, all the usual bit operations are available, as described in

Table 1.15. There is no need for a special unsigned right-shift operator, since

BigInteger has unlimited capacity; the result of shifting n places to the

right is the equivalent of dividing by 2n, regardless of sign. In addition are

a few new methods that simplify working with bits: testBit, setBit,

and flipBit. Two others called bitCount and bitLength need a bit

more explaining. For positive integers the results are what you’d expect:

bitLength returns the minimal number of bits required to represent the

integer, and BitCount returns the number of one-bits in the representation.

For negative numbers, bitLength gives a minimal length excluding the

sign bit, and bitCount returns the number of zeros in the representation.

Creating and Converting

BigInteger objects can be converted by using a string representation of a

number, a native type, or with a byte array. Using a native type, a string, or

a byte array representation of a number creates BigInteger objects.

Strings

You can create BigInteger objects by using a construction that takes a

string representation of a number. By default, the constructor expects the

string representation to be in decimal (base 10) form. However, you can

also use the standard base 16 representations, as follows:

BigInteger(String base10Rep)

BigInteger(String representation, int radix)

Table 1.15 BigInteger Bit Operations

NATIVE TYPE

BIGINTEGER

BIT OPERATION

NOTATION

NOTATION

AND

a & b

a.and(b)

ANDNOT

a & ~b

a.andNot(b)

NOT (complement)

~a

a.not(b)

OR

a | b

a.or(b)

XOR

a ^ b

a.xor(b)

Shift left n bits, signed

a << n

a.shiftLeft(n)

Shift right n bits, signed

a >> n

a.shiftRight(n)

Test bit n

(a & (1 << n) != 0)

a.testBit(n)

Set bit n

(a | (1 << n))

a.setBit(n)

Flip bit n

(a ^ (1 << n))

a.flipBit(n)

Conversely, you can retrieve a string representation in an appropriate

base by using the toString method:

String toString()

String toString(int radix)

Numeric Types

The BigInteger class can be created from a long by using the static

method:

Long val = 123456789123;

BigInteger bi = BigInteger.valueOf(val);

Once you have the BigInteger object, it can be converted directly

into a numeric type by using intValue, longValue, floatValue, or

doubleValue. Not that we’ll mind, but there is no method to directly

convert into a short or char value. If the value is too large to fit into the

numeric type, a silent narrowing conversion is done (the high bits are

chopped off).

Byte Arrays

Generating byte arrays and constructing byte a. Assuming you have a

BigInteger object, a byte array can be generated with:

byte[] toByteArray()

And the results of this output can be used to create a new BigInteger

object:

BigInteger(byte[] array)

However, because of signing issues, typically this format is not used for

cryptographic purposes. If the bit length is a multiple of 8, and it always is

for cryptographic applications, the byte array starts with an extra byte

indicating the sign (0 for positive). Either your application can deal with a

leading zero, or you can strip it away, as in the following:

byte ba[] = bi.toByteArray();

if (ba[0] == 0) {

byte[] tmp = new byte[ba.length - 1];

System.arraycopy(ba, 1, tmp, 0, tmp.length)

ba = tmp;

}

A lower-speed option is to use strings as the common medium:

Byte ba[] = Hexify.decode(bi.toString(16);

Since BigInteger expects a sign-bit, you’ll need to use a special con-

structor and manually indicate the sign by passing in 1, –1, or 0 for positive,

negative, or zero:

BigInteger(int signum, byte[] magnitude)

BigInteger and Cryptography

BigInteger has a few special methods specifically designed for cryptog-

raphy, listed in Table 1.16. These operations are discussed fully in Chapter 3.

Secret Methods in BigInteger

For Sun Microsystems to effectively test for primality, many generic

methods and algorithms had to be implemented; however, they are not

part of the public API. For people working in numbers theory or who are

developing more advanced cryptographic algorithms, these “hidden”

methods may be useful. Unfortunately, they are declared private; but with

a few modifications to the source code, you can create your own enhanced

BigInteger variant.

Table 1.16 BigInteger Operations Useful in Implementing Cryptographic Algorithms

OPERATION

BIGINTEGER NOTATION

Create a random nonnegative integer

BigInteger

uniformly distributed from 0 to 2n–1

(int numBits, Random r)

Create a random integer that is prime

Random r = ...;

with certainty 1 – 2n

int bitLength = ...;

BigInteger a =

BigInteger(bitLength, r)

Check if prime with certainty using

a.isProbablePrime()

IEEE standard of 1 – 2100

a mod b

a.mod(b)

an mod b

a.modPow(b, n); // n is an

int, not BigInteger

find a-1, such that aa-1 = 1 mod b

a.modInv(b)

Greatest common denominator of a, b

a.gcb(b)

The most interesting of these are the following methods:

int jacobiSymbol(int p, BigInteger n); // this is “package protected”

private boolean passesMillerRabin(int iterations)

private boolean passesLucasLehmer()

private static BigInteger lucasLehmerSequence(int z,

BigInteger k, BigInteger n)

In addition, there is an implementation of sieve for testing primality of

small numbers in the class BitSieve.

To “free” them:

1. Copy all the source files from this directory into a new directory.

2. Edit each file and change the package name from sun.math to one

of your own choosing.

3. Make the BitSieve class “public.”

4. Change the desired methods and classes to public. Note that the

method jacobiSymbol in BigInteger and the class BitSieve

do not have an access modifier, so you have to add public in front

of the method.

5. Compile.

This code is copyrighted by Sun Microsystems. For commercial use and

distribution, refer to the license agreement distributed with the source

code.

Now that we have a full understanding of bit operations and the Java

model, we’ll discuss secret and public key cryptography.

35

Secret key cryptography is based on cryptographic functions that rely on a

single key that must be kept confidential. Secret key ciphers perform

encryption and decryption using the same key, and cryptographic hashes

must be computed and verified using the same key.

Symmetric Block Ciphers

In general, a cipher is a function that maps a message, known as plaintext,

into an unreadable form, known as ciphertext, by use of a key. This is known

as encryption. Without the key, you cannot do the inverse transformation,

the decryption, or turning the ciphertext back into its original plaintext

form. In lay terminology code is often used interchangeably with cipher.

However, they have different meanings. A code is translation of a message

by use of a dictionary, without any key being used.

Modern block ciphers operate on a fixed number of bytes in a single pass,

for instance, transforming 8 bytes plaintext into 8 bytes ciphertext. Messages

larger than the block must be chopped into pieces, and if the message isn’t

a multiple of the block size, the message must be padded appropriately.

Secret Key Cryptography

C HAPTE R

2

Cipher Properties

To be useful a cipher must be secure and must be usable. What makes a

cipher secure? For ciphers that have stood the test of time (i.e., those in

common use and have not been cracked), a rough estimate of security is

key length, with the longer the key being more secure. A cipher is deemed

secure if there is no attack that is significantly better than brute force. How-

ever, clearly this by itself is not enough, and the notion of security is tied to

current or contextual computation resources available to an attacker. An

algorithm may be structurally secure, but if the key is too small, brute-force

attacks can easily break the cipher by directly recovering the key.

Security Properties

The fundamental property of an encryption algorithm is that the output be

effectively random, regardless of what the input or key is. From this prop-

erty, many notions emerge regarding how well the algorithm performs:

Semantically secure, polynomially secure, or “indistinguishably

of encryptions.” There are many variants of this idea, but the

simplest is that, given two plaintext messages and their ciphertext

equivalents, it is impossible to determine which plaintext matches

the correct encrypted version with a probability greater than .5. In

other words, the ciphertext does not reveal any information about

the plaintext.

Nonmalleability. Given a ciphertext, it is impossible to manip-

ulate the ciphertext to produce related plaintext. In other words,

a 1-bit change to a ciphertext should result in a very different

plaintext.

Plaintext-aware.

It is computationally infeasible to produce a valid

ciphertext without knowing the corresponding plaintext. For instance,

you should be able to generate ciphertext that decrypts to 7-bit

ASCII, unless you are encrypting the original plaintext. This is useful

to prevent the generation of false messages.

Lack of algebraic structure.

It’s very important that a cipher doesn’t

have certain algebraic properties that might allow multiple encryp-

tion or multiple keys to negate each other. In particular, it is important

that the cipher does not form a group where:

Each key has an inverse key.

Encrypting something twice with different keys is equivalent to

encrypting it once with another key.

The result of encrypting with one key and decrypting with

another key is the same as encrypting with one key. This would

be very useful for attack purposes.

Ideally, you don’t even want a subset of keys and messages to form a

group; however, proving that is quite tricky. You also don’t want “fixed

points,” that is, messages that encrypt to themselves.

Brute-Force Attacks

Brute-force attacks are the simplest attack, and for symmetric ciphers, this

means trying each key, one after another, until the message decrypts prop-

erly. In decryption, there is no inherent flag saying “this is the right key” or

“decryption occurred correctly.” For instance, a ciphertext with one key

might decrypt to cat, while the same ciphertext decrypted with a different

key might decrypt to dog. Another way of looking at this is if you encrypted

a random number or binary data, an attacker using the brute-force tech-

nique would be unable to determine if the resulting plaintext was the orig-

inal number without using some additional context.

Typically, you do not encrypt random binary data, but rather something

with more structure. During brute-force attacks, the decrypted result is

checked to see if the structure exists. For instance, English text is encoded

using ASCII where only 7 bits in each byte are used. During a brute-force

attack, every output that is only 7 bits is stored in a “candidates” file, which

will need to be checked by an actual human to determine what the correct

result is. False-positive results are extremely rare for language texts.

Even with small key sizes, such as DES (for Data Encryption Standard),

there are 255, or approximately 3.6 × 1016, keys to check. When measuring

cryptographic security, you can’t be afraid of large numbers. While 255

sounds out of reach, in fact, it’s quite doable. Brute-force algorithms are

easily parallelizable, and with the Internet, tens of thousands of computers

are easily pooled for such attacks, thereby enabling trillions of keys to be

tested per second.

Other Attacks

Brute-force attacks can easily be thwarted by using a larger key, so a full

key-space scan becomes impossible. The following attacks attempt to find

weaknesses in the cipher and provide shortcuts to a brute-force attack. If a

significant attack occurs, the cipher is said to be broken and thus insecure.

Ciphertext-only attack. This is probably the most non-networked

application. The adversary here has a ciphertext message and that’s

it. In the vast majority of cases, this attack indicates brute force,

where every key is checked.

Chosen plaintext. Here the attack has a bulk list of plaintexts

and receives the corresponding ciphertext and through analysis can

determine the key.

Adaptive chosen plaintext. Similar to the previous except the process

is more interactive. Plaintext may depend on the results of the previ-

ous plaintext and ciphertexts.

Chosen ciphertext. The attack selects ciphertext, which is then

decrypted.

Related key. The attack requires the plaintext and the ciphertext from

different keys.

Partial known plaintext attack. This attack is one of my own devis-

ing. I haven’t seen anything in literature of this. Here the attacker

has the ciphertext and some of the plaintext. This potential attack

turns out to have applications for database security.

Power attack.

This only applies to smartcard-like devices. An attacker

can actually measure the radiation and time of the cryptographic

computation and estimate the key or size of the key.

Many of these attacks come in the form of linear or differential (or some

combination) analysis. Knowing the algorithm being used and with many

pairs of text (plaintext and/or ciphertext), an attacker can determine the

internal structure and thus extract the key or limit the key space.

Many might be wondering what the point is of the attacks if the attacker

already has plaintext-ciphertext pairs. The trick is that the text pairs are

chosen by the attackers and by themselves are not interesting (the values

are more or less random or generated by an algorithm). The plan is to use

these pairings to recover the key and thus decrypt ciphertexts that do con-

tain useful information. Still, the attacks may seem odd, yet many network

protocols allow many of these types of attacks since they echo back. For

common ciphers most of these attacks require a lot of (at least 240) pairings,

which would certainly be noticed by even the most primitive network

monitoring. Another way these attacks can be exploited is by an attacker

having access to either the encryption or decryption functions of a system.

A hacker breaking into a system might not have access to the keys but will

have access to system calls to do encryption. Then the attack might only

generate a modest strain on the system’s resources and not be noticeable.

Common Block Ciphers

There are dozens of block ciphers; however, you really only need to consider

a small handful (see Table 2.1). Block ciphers come in two flavors: 64-bit

block and 128-bit block.

Data Encryption Standard (DES)

The Data Encryption Standard, first developed in the 1970s, is the most

widely used cipher. It was the first industrial-strength algorithm that was

fully specified to the public, in Federal Information Processing Standard,

or FIPS, 46-2. (FIPS are the standards used by the U.S. federal government

for information processing.)

DES keys are specified as a 64-bit value; however, only 56 bits are used

as part of the key. For each byte, the 8th bit is a parity bit that causes bytes

to have an odd number of one-bits in them (other wise known as “odd-

parity”). This is left over from the early days of computing.

Table 2.1 Selected Secret-Key Ciphers

BLOCK SIZE

KEY SIZE

ALGORITHM

YEAR

BITS

BITS

REASON

DES

1977

64

56

Standard

Triple DES

1985

64

168

Standard

Blowfish

1993

64

40–442

Popular, fast

RC5

1996

64 standard,

0–2048

Popular,

but supports

supported

32 and 128

IDEA

1992

64

128

Popular

Rijndael

1998

128, 192, 256

128, 192, 256

Standard, fast

Twofish

1998

128, 192, 256

128, 192, 256

Popular

RC6

1998

128, 192, 256

128, 192, 256

Popular,

supported

It’s worth looking at the inner workings of DES, since the ideas are used

in many ciphers and the concept is actually fairly simple. First, the key is

expanded. Although the key itself is only 56 bits, internally the key is

expanded into sixteen 48-bit values. Likewise, various stages in the pipeline

also expand data from 32 bits to 48 bits. Later, these expanded values get

contracted as well. While, theoretically, expanding a key into larger values

doesn’t make a key longer than it really is, it does have the benefit of allow-

ing more complicated functions to be constructed with simpler code.

The basic structure of DES is reused in many other ciphers and is what is

known as Feistel structure. The plaintext block is split into left and right sec-

tions, and then the following is iterated:

Ln+1 = Rn

Rn+1 = Ln ⊕ f(Rn, Kn)

where Kn is a subkey of the main key and f is a mixing function. This mix-

ing function is really what makes DES successful. It uses what are called

S-Boxes, which are maps from a 6-bit input to a 4-bit output (the contraction)

based on a lookup table. In DES (and other ciphers) the map is nonlinear and

normally specified by using tables. The security of DES depends entirely

on the values in the S-Boxes and exactly how they were constructed is a

mystery to this day.

DES Security

As of this writing, DES has had no attacks that are significantly better than

brute force. So in structure DES is solid and secure. However, the problem

of short key size and vulnerability to brute-force attacks remains. Nowa-

days DES can be cracked either by specialized hardware or by distributed

networks in a few hours. (See www.eff.org/descracker/ for full

details.) What’s interesting is that various modifications have been pro-

posed to enhance DES. It has been shown that just about any tampering or

enhancements to the S-Box structure results in a much weaker cipher. These

days there really is no reason to use DES except in legacy applications.

Triple DES

While DES appears to have a solid design, the key space is just too short

and is susceptible to brute-force attacks. To help fix the problem, an interim

solution was proposed that effectively performs multiple iterations of DES

on the message using different keys. What’s known as Triple DES, TDES,

DES-EDE, or 3DES can provide 168 bits of security using three DES keys. A

less common version is a two-key version providing 112 bits of security

where three operations are done, but where one key is used twice. It does

this by composition of three DES operations, as shown in Table 2.2.

Table 2.2 Triple DES Operations

OPERATION

3 KEYS

2 KEYS

Encryption

Ek3(Dk2(Ek1(m)))

Ek1(Dk2(Ek1(m)))

Decryption

Dk3(Ek2(Dk1(m)))

Dk1(Ek2(Dk1(m)))

In software this is a bit slow, but for basic networking (100 megabit), the

speed limitations aren’t much of a problem, since the algorithm can be

implemented in hardware fairly efficiently. Triple DES is not a FIPS but

instead is specified under the ANSI X.9.52 standard and NIST SP 800-20

(ANSI stands for American National Standard Institute, NIST for National

Institute of Standards and Technology).

Triple DES Security

Providing 168 bits of security, Triple DES is by far the most studied,

strongest algorithm available. When in doubt, DES is probably the best

choice. Despite more modern ciphers, nobody suggests that Triple DES is

less secure than others.

Why Not “Double DES”

Using two keys, or “Double DES,” isn’t done because the key size of 112 is

too small; moreover, it isn’t much more secure than a single iteration of

DES. Given a plaintext-ciphertext pair, an attacker could extract the key

using a brute-force attack requiring up to 2112 computations. However, the

attacker can trade computation costs for memory storage by using a “meet-

in-the-middle” attack. There are many variations, but the simplest is to

compute 256 different encryptions of the plaintext and store them, then

begin decrypting the ciphertext using keys and compare the result to the

table. The attacker will likely get multiple matches and would need

another plaintext-ciphertext pair to verify that is the correct key. The worst

case is that the attacker would need to perform 256 decryptions, but even

then that’s only 257 total operations.

There is nothing specific to DES in this; any doubled-cipher will have

this problem. The attack extends to the two-key version of Triple DES as

well, but it’s generally less feasible, since the storage requirements become

quite large.

Blowfish

In 1994, 56-bit DES was becoming obsolete and Triple DES was still very

slow. Every other secure encryption algorithm was classified, incompletely

specified (GOST, a Soviet algorithm was known, but the critical S-Boxes

were classified), or known but unusable, since they were patented (Khufu,

REDOC II, IDEA) or crippled for legal export (RC2, RC4).

In late 1993, Bruce Schneier presented the Blowfish cipher, a free, non-

patented, drop-in replacement for DES with a 64-bit block size but a vari-

able key length of up to 448 bits [Schneier1994]. In today’s world awash

with cryptologists and a healthy open-source movement, it’s easy to forget

how revolutionary this was—strong cryptography for anyone who wanted

it. Even better, the Blowfish cipher is still secure.

The block size is the same as DES with 64 bits. However, Blowfish can

have a variable key size in multiples of 8 bits, from 40 to 442 bits (56 bytes).

During initialization, a key regardless of length is expanded to an internal

structure of 4,168 bytes. The setup costs are expensive, since it does 521 iter-

ations of encryption to generate the key schedule. However, after this, the

key size has no effect on performance—in other words, the speed using a

442-bit key is the same as with a 56-bit key. This allows key choice to be based

purely on security, storage, and protocol requirements instead of on perfor-

mance. The initialization penalty can somewhat be avoided by directly

specifying the internal array instead of having it generated (of course, then

a key of 4168 bytes instead of at most 56 bytes must be saved and stored).

In overall structure, it’s not that different from DES. The subkey array is

eighteen 32-bit subkeys (compare to DES with sixteen 48-bit subkeys).

There are four S-Boxes, each with 256 entries (compare to DES, which has

eight S-Boxes with 64 entries). The most important difference is that Blow-

fish dynamically creates S-Box values during initialization, while DES uses

fixed values. Notice that in Blowfish, all data is a nice round number of

32 and 8 bits, all designed to fit into machine registers with minimal

amount of bit extractions and shifts. This gives Blowfish its speed.

Cryptographically, Blowfish has been deemed strong and has been quite

well received. The algorithm has been included in dozens of free and

limited commercial products. There have been no significant faults found

with the algorithm. In a reduced eight-round version (not the standard

16 rounds), roughly 1 in 214 keys are weak [Vau1996]. The weakness allows

an attacker to know if a weak key is being used or not, but not what the

specific key is (and it also allows some other more abstract attacks). The

weak keys come about because Blowfish dynamically creates S-Boxes.

Apparently, in some cases Blowfish generates some stinkers where some of

the S-Box values are duplicated. There is no way of knowing which key is

weak in advance; you have to go through the iteration and then check the

S-Box values to see. Again, no weak keys have been found for the full-

strength 16-round version.

IDEA

The IDEA cipher also operates on 64-bit blocks but uses a fixed-size 128-bit

key length [LaMa1990, LaMaMu1991]. It is generally regarded as a very

well-designed cipher and has experienced no known cryptographic attacks

better than brute force. It is covered by patents owned by the Swiss firm

Ascom, but they have allowed free noncommercial use, provided permis-

sions are first obtained. Because of this licensing, IDEA became popular in

the email encryption program PGP (Pretty Good Privacy). Today it seems

to still be quite popular in the Perl community, partially since it is a very

secure cipher and has received a positive recommendation from Bruce

Schneier in Applied Cryptography (at the time Schneier had just developed

Blowfish and it was still too new to be recommended). However, IDEA,

like Blowfish, generates some nonlinear aspects of the cipher based on the

key; because of this there are classes of weak keys [DaGoVn1993]. Fortu-

nately, the odds of using one of the weak keys is 2-77, so it’s not considered

a grave threat.

While IDEA is fairly easy to implement in software, it’s also very slow

because of generous use of expensive multiplication operations. A naïve

implementation is typically about the same speed as DES, if not slower.

Those same multiplications allow it to speed up significantly on 64-bit

machines, or if someone hand-codes an assembly version that uses some

microprocessors’ extensions such as MMX (Intel and AMD processors),

SSE2 (Intel’s Pentium 4), or AltiVec (IBM/Motorola’s PowerPC). In these

cases, it’s about twice as fast as DES. See [Lipmaa1998] for a discussion of

using the MMX extension with IDEA.

RC5

RC5 was presented by Ron Rivest (coinventor of the RSA, for Rivest-

Shamir-Adelman, algorithm) in 1995 [Rivest1995]. It is also fully specified

in RFC 2105; however, it is patented (U.S. Patent 5,724,428) and is propri-

etary to RSA Security. It is fully parameterized in terms of block size, key

size, and security represented as RC5-w/r/b, where:

w = Word size in bits—16, 32, and 64—which results in a block size

of 2w. The standard is 32.

r = Number of rounds, which, in turn, provides security. The appro-

priate number depends on w.

b = Number of bytes in secret key; can be 0 to 255.

Internally, the key expansion stage takes a key and turns into an internal

array S of r × w/8 bytes. Following that, encryption is amazingly short and

simple:

A = A + S[0]; // A is first half of plaintext block

B = B + S[1]; // B is second half of plaintext block

for (i = 1 .. r)

A = ((A 5 B) << B) + S[2i]; // <-- data-dependant rotations

B = ((B 5 A) << A) + S[2i+1] ;

The matching decryption algorithm is equally terse. As you can see, like

Blowfish, RC5 uses data-dependant rotations.

RC5 is typically used with 64-bit blocks, 16 rounds, and keys being at

least 128 bits (e.g., RC5-32/16/128+). While you can use larger blocks, you

might consider using the newer ciphers like Rijndael or RC6 (discussed in

the following sections). Using these guidelines, the most effective attacks

are brute force, and the security is based on the size of the key. There is no

reason not to use at least 128 bits, since after key setup, the cipher runs at

the same speed independent of key size.

Rijndael and the Advanced Encryption Standard (AES)

In 2001, the NIST selected Rijndael as the replacement for DES (FIPS 197).

Flemish for XYZ and pronounced “rain-doll,” Rijndael is an interesting

cipher, since it works in a completely different way from the previous ciphers.

The algorithm is in some ways similar to shuffling and cutting a deck of

cards. The interstate is laid out in a square, and the rows and columns are

shifted, mixed, and added in various ways. The entries themselves are also

substituted and altered. It has a lot of parallel and symmetric structure

because of the mathematics, which provides a lot of flexibility in how it is

implemented. However, some have criticized it as having too much structure,

which may lead to future attacks. Apparently that didn’t bother the NSA

(National Security Agency) or the NIST. No known cryptographic attacks

are known, and it works well on a wide variety of processors, doesn’t use

bit shifting or rotation, and is very fast.

The AES Competition

How Rijndael was selected is also interesting. Knowing that DES had out-

lived its useful lifetime, the NIST issued an open call for replacements.

Anyone was welcome to submit a cipher. The general requirements were:

Must be a symmetric block cipher.

Must support a block size of 128 bits.

Must support key sizes of 128, 192, and 256 bits.

Should be at least as fast Triple DES.

Should run on a variety of hardware and should have “good”

characteristics for implementation in hardware.

If selected as the winner, the submitter relinquishes all intellectual

property claims such as patents and trademarks.

Out of 15 semifinalists, 5 were selected:

Rijndael, the eventual winner.

MARS, from IBM.

RC6, from RSA Labs (similar to RC5).

Serpent, from Ross Anderson, Eli Biham, and Lars Kundsen.

Twofish, from Bruce Schneier, John Kelsey, Doug Whiting, David

Wagner, Chris Hall, and Niels Ferguson. It is the successor to the

previously mentioned Blowfish algorithm.

How did they rate? Table 2.3 presents summaries from the confer-

ence reports from the Third Advanced Encryption Standard Candidate

Conference.

The rankings are extremely relative, and a last-place finish does not

mean the algorithm was unacceptable. Each algorithm author had his or

her own interpretation of what the NIST selection process called for. Ser-

pent, for instance, was extremely conservative in security, and this shows

in its performance, while RC6 focused on 32- and 64-bit processors.

That the four other ciphers were not selected does not imply they are

insecure. To the contrary, the NSA said any of the final candidates would

make a good standard from a security standpoint (the speed with which

they made their decision would seem to imply that the NSA has much more

sophisticated methods of cipher analysis than the public has). In addition,

the NIST has not specifically excluded them from being used. Since the

NIST wanted a “generalist” cipher, these alternates may find their way into

other more specialized standards and applications. However, Twofish and

RC6 are particularly noteworthy because they have strong corporate or

independent support and are likely to be used in future applications.

Table 2.3 Evaluation of AES Finalists on Various Platforms

PROCESSOR

RELATIVE RANKING

ARCHITECTURE

BETTER......................................................WORSE

FGPA

Rijndael

Serpent

RC6

MARS

(programmable

Twofish

logic)

DSP TMS320C62x

RC6

Rijndael

Twofish

Toshiba T6N55

Rijndael

RC6

(smart card)

Twofish

Pentium II

Rijndael

RC6

MARS

Serpent

Twofish

Itanium /

Rijndael

RC6

MARS

Serpent

PA-RISC (HP)

Twofish

Alpha 21264

Rijndael

Twofish

RC6

MARS

Serpent

ASIC hardware,

Rijndael

Serpent

Twofish

MARS/RC6

worst case

“Parallelism”

Rijndael

Twofish

Serpent

MARS

RC6

(feedback modes)

Memory usage

RC6

Serpent

MARS

Twofish

Rijndael

Software

Rijndael

MARS

Serpent

(summary, Whiting)

Twofish

RC6

Software (summary,

RC6

Rijndael

MARS

Twofish

Serpent

Bassham)

Java (speed)

Rijndael

MARS

RC6

Java (throughput)

RC6

MARS

Rijndael

Twofish

Java (key setup)

Rijndael

Serpent

RC6

MARS

Twofish

Twofish

The successor to Blowfish, Twofish has already been widely deployed in

many open-source and minor commercial projects. Twofish is meticulously

constructed and well documented, and the submission paper is an excel-

lent reference and bibliography on cryptanalysis. The developers even

published a book on its design [ScKeWhWaFe1999]. In some ways, Twofish

is a very traditional design that isn’t dissimilar to DES: it’s based on a Feistel

design using expanded keys and S-Boxes for nonlinearity. The big differ-

ence is how each part was designed and how each part interacts with the

others. Many components were found by an exhaustive search for a map-

ping that provided an optimal result using the smallest number of machine

operations. While each piece of Twofish separately is fairly straightfor-

ward, the final result is hard to understand and analyze. One description of

it is “a collection of patches which fix various problems” [Baudron1999].

That said, no one is suggesting it is insecure and currently the best attack is

brute force.

RC6

RC6 builds upon the success of RC5 and is modified to handle 128-bit block

sizes using four 32-bit registers. Like RC5, the algorithm is completely

parameterized and specified as RC6-w/r/b. As submitted to the AES, RC6

uses w = 32 and r = 20. The length of the key may be any multiple of 8; how-

ever, it is most common to use b = 16, 24 or 32 bytes (128, 192, and 256 bits).

The key scheduling function is almost identical to the one used in RC5,

except that RC6 generates more bytes from the key. It is fast—and in many

cases faster than Rijndael—and Java performance is excellent. RSA Secu-

rity has submitted RC6 to various other standard bodies, so you can be

sure you will be seeing more of it. They also have trademarked the name

“RC6” and received a patent on its design (U.S. Patent 6,269,163).

Ciphers You Shouldn’t Use

This section lists ciphers that are insecure and should not be used. Oddly,

they keep turning up in supposedly high-security applications such as

authentication tokens (see [FuSiSmFe2001] for examples). The ciphers are

most likely used because they are simple to implement in a few lines of

code and visually appear to do a good job obscuring the underlying data.

Password XOR Schemes

In this type of scheme, the user takes a password and repeatedly uses it to

XOR against the message. This has been used in various forms, including

file protection. It is also common in protecting machine code where the

program starts to decrypt the machine code and then jumps into the newly

decrypted code. However, this is a weak scheme, and it can be easily

cracked, especially if there is any knowledge about what the plaintext

looks like (such as common headers or ASCII). It can be used to obscure

data to make it visually unreadable, but not for anything that is susceptible

to an attack:

// same function to encode and decode

public static void xorData(byte[] data, byte[] key) {

for (int i = 0, j = 0; i < data.length; i++) {

data[i] ^= key[j++];

if (j == key.length) j = 0;

}

}

Classical Cryptography

Classical cryptography involved simple ciphers that typically worked on

letters or characters as opposed to bytes. More advanced classical cipher

operated on blocks, but typically blocks of letters instead of blocks of bytes.

These are now primarily of historical interest and useful as examples for

more advanced concepts.

Interestingly, some of the work of classical cryptography is still classified.

From 1938 to 1941, William Friedman wrote a four-volume series titled

Military Cryptanalysis. In 1984 the first two volumes were declassified. A

more modern series from Lambros Callimahos included three volumes

written from 1959 to 1977. Again only the first two are declassified. In 1992,

an attempt to declassify the remaining volumes using the Freedom of Infor-

mation Act was rejected. According to the NSA (Gilmore v. NSA: 10-02-93 let-

ter, available at: www.eff.org/Privacy/Key_escrow/ITAR_export/

ITAR_FOIA/Gilmore_v_NSA/10-02-93.letter

The documents are classified because their disclosure could reasonably be

expected to cause serious damage to the national security . . . . In addition,

this Agency is authorized by various statutes to protect certain information

concerning its activities. We have determined that such information exists in

these documents.

In other words, 45 years later, World War II cryptography is still essential

to national security. One can only speculate as to why this would be. To my

knowledge no further attempts have been made to release the material

since 1992, but maybe someone will try again.

Regardless of what is hidden in those texts, classical cryptography has

no place in your security solution.

Cryptography for

Internet and Database

Applications

Developing Secret and Public Key

Techniques with Java™

Cryptography for Internet and

Database Applications

Nick Galbreath

Cryptography for

Internet and Database

Applications

Developing Secret and Public Key

Techniques with Java™

Publisher: Bob Ipsen

Editor: Carol A. Long

Developmental Editor: Adaobi Obi

Managing Editor: Micheline Frederick

New Media Editor: Brian Snapp

Text Design & Composition: Wiley Composition Services

Designations used by companies to distinguish their products are often claimed as trade-

marks. In all instances where Wiley Publishing, Inc., is aware of a claim, the product names

appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appro-

priate companies for more complete information regarding trademarks and registration.

This book is printed on acid-free paper. ∞

Copyright © 2002 by Nicholas Galbreath. All rights reserved.

Published by Wiley Publishing, Inc., Indianapolis, Indiana

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted

in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or

otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright

Act, without either the prior written permission of the Publisher, or authorization through

payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rose-

wood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Pub-

lisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc.,

10475 Crosspointe Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail:

permcoordinator@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their

best efforts in preparing this book, they make no representations or warranties with respect

to the accuracy or completeness of the contents of this book and specifically disclaim any

implied warranties of merchantability or fitness for a particular purpose. No warranty may

be created or extended by sales representatives or written sales materials. The advice and

strategies contained herein may not be suitable for your situation. You should consult with

a professional where appropriate. Neither the publisher nor author shall be liable for any

loss of profit or any other commercial damages, including but not limited to special, inci-

dental, consequential, or other damages.

For general information on our other products and services please contact our Customer

Care Department within the United States at (800) 762-2974, outside the United States at

(317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears

in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data:

ISBN: 0-471-21029-3

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Preface

xiii

Introduction

xv

Chapter 1

Bits and Bytes

1

General Operations

1

Number Bases

2

Bits and Bytes

3

Signed Bytes

4

Bitwise Operators

4

Complementation or Bitwise NOT

6

Bitwise AND

6

Bitwise OR

7

Bitwise Exclusive OR (XOR)

8

Left-Shift

8

Right-Shift

9

Special Operations and Abbreviations

10

Packed Words

11

Integers and Endian Notation

12

Java Numerics

13

Basic Types

14

Type Conversion

15

Unsigned to Signed Conversions

17

Overflow

17

Arrays

18

Numeric Classes

20

Contents

v

Booleans and BitFields

21

Chars

22

Working with Bytes

22

Sign Problems

22

Conversion of Integral Types to Byte Arrays

24

Converting to Hex Strings

25

BigInteger

28

Creating and Converting

29

BigInteger and Cryptography

31

Secret Methods in BigInteger

31

Chapter 2

Secret Key Cryptography

35

Symmetric Block Ciphers

35

Cipher Properties

36

Security Properties

36

Brute-Force Attacks

37

Other Attacks

37

Common Block Ciphers

39

Data Encryption Standard (DES)

39

Triple DES

40

Blowfish

41

IDEA

43

RC5

43

Rijndael and the Advanced Encryption Standard (AES)

44

Twofish

46

RC6

47

Ciphers You Shouldn’t Use

47

Password XOR Schemes

47

Classical Cryptography

48

ROT 13

49

Padding

49

Fixed-Value Padding

49

Random Padding

50

PKCS Padding

50

Modes of Operations

51

Initialization Vectors

51

Electronic Codebook Mode (ECB)

53

Cipher Block Chaining Mode (CBC)

54

Cipher Feedback Mode (CFB)

56

Output Feedback Mode (OFB)

58

Counter Mode (CTR)

60

Propagating CBC (PCBC) Mode

60

Key Wrapping

61

Triple DES KEY Wrapping

61

AES Key Wrapping

62

Turning Passwords into Keys

63

Hashes

64

Cryptographic Hashes

65

Collisions

66

Attacks

67

Algorithms

69

The MD Family

71

The SHA Family

71

The RIPE-MD Family

72

Hash Standards and Practices

73

Hashed Message Authentication Codes (HMACs)

74

The Standard HMAC

74

Legacy Concatenation Schemes

75

HMAC Standards and Practices

76

Summary

76

Chapter 3

Public Key Cryptography

77

Public Key Ciphers

77

Other Systems

79

Digital Signatures

79

Key Agreements

79

Zero-Knowlege Schemes

79

Secret Sharing

80

Public Key Security Classification

80

Fundamental Mathematics

82

Prime Numbers

82

The Distribution of Prime Numbers

83

Prime Testing

84

Probabilistic Tests

85

Sequence-Based Tests

87

Elementary Number Theory

88

Modular Arithmetic

88

Additive Groups

89

Multiplicative Groups

89

Fields

90

Rings

90

Orders and Generators

90

Public Key Encryption and Major PKCS Categories

91

RSA and Integer Factorization

92

Factoring

92

The RSA Problem

94

The Algorithm

94

Message Representation and OAEP

96

In Practice and Standards

98

Choice of Parameters

98

Discrete Logarithm Systems

100

Underlying Mathematics

100

The Algorithm

103

Standards and Practice

106

Elliptic Curves

106

Underlying Mathematics: Elliptic Curves

106

The Algorithm

110

Standards and Practice

112

Other Public Key Cryptographic Systems

112

Rabin Cryptosystem

112

NTRU

114

Summary

115

Chapter 4

Random Numbers

117

Randomness and Security

119

Testing for Randomness

120

FIPS 140-2 Requirements

121

Pseudorandom Number Generators

122

Cryptographic PRNG

123

SHA-1 PRNG

123

Cipher-CBC or ANSI X9.17

123

FIPS 186

123

Blum-Blum-Shub

124

Stream Ciphers

125

One-Time Pads

125

RC4 or ArcFour

125

Using Randomness

126

Generating Random Numbers for Gaming

126

Generating Random Numbers in a Range

127

Shuffling

129

Generating Random Permutations

131

Small Permutations

131

Large Fixed Permutations

132

Random Sampling

133

Accessing Entropy

134

OS Services

134

Win32 CryptoAPI CryptGenRandom

135

/dev/random and friends

135

Userland Services

136

Entropy Generating Daemon (EGD)

137

PRNGD

141

Yarrow and EGADS

141

TrueRand Library

141

Remote Services

142

RAND Corporation

143

HotBits

143

Random.org

143

LavaRnd

144

Java and Random Numbers

144

Random and SecureRandom

144

java.util.random

145

java.security.SecureRandom

146

Developer Issues

148

Reseeding

148

Collecting Entropy

150

An Entropy Pool Implementation

150

Basic System State

151

Thread Schemes

153

Reading External Sources

156

Application Events

156

User Events

157

Chapter 5

Java Cryptography

159

Organization

160

Providers and Engine Classes

161

Parameters, Keys, and Certificates

162

Error Handling

163

Providers

164

Standard Names

165

Standard Sun and SunJCE Providers

168

Other Providers

169

Initializing Providers

170

Writing Your Own Provider

171

Core Engine Classes

171

MessageDigest

171

Digest Streams

172

MAC

173

SecureRandom

174

Ciphers

177

Additional Cipher-Related Objects

180

Signatures

183

SignedObject

184

Key Agreement Protocols

184

Parameters, Keys, and Certificates

185

Algorithm Parameters

186

AlgorithmParameters

186

AlgorithmParameterGenerators

188

Keys

189

Secret Keys

192

Public/Private Keys

195

Encoding and Encrypting Keys

197

Summary

202

Chapter 6

Small Message Encoding and Encryption

203

Preprocessing

203

Converting Digits into Bytes

203

7-bit to 8-bit Compression

205

General Compression and java.util.zip.Deflate

206

Adding Check and Parity Bits

207

Small Message Encryption

209

Single-Block Encryption

210

n-Block Encryption

210

Very Small Message Encryption

211

XOR Tables

211

Small Codebooks

211

RC5-16/16

212

Small Message Encoding

212

Encoding for Customer-Usable Data

213

Capacity and Range

213

Selecting a Base Representation

214

Selecting Base Alphabets

215

Mixed Bases and Alphabets

221

Adding Check Digits

221

Encoding for Machines and Customer-Visible Applications

230

Base 64

230

Base 85

234

Base 128 and Java Source Encoding

237

Chapter 7

Application and Data Architecture

241

Database Architecture for Encrypted Data

241

Selecting a Cipher

243

Secret or Public?

243

Cipher Selection

244

Data

245

Passwords

245

Challenges and Responses

246

Payment, Credit Card, and Other Account Numbers

247

Social Security Number (U.S.)

250

Birthdates and Birthdays

250

Last Name

251

Searching, Indexing, and Constraints

251

Removing Randomness

252

Deterministic Key Selection

252

Indexing and Hashing

253

Uniqueness Constraints

254

Asymmetric Data Usages

255

Null Values and Database Applications

256

Secure Memory Management in Java

258

Smart Array Classes

259

Char Arrays

262

Using SecureRandom

263

Secret Key Management

263

Secret Key Data

264

Key Generation

266

Key Encryption

266

Storage

267

Key Access and Distribution

268

Using Keys with Ciphers and MACs

269

Passwords

272

Startup Passwords

272

Member Names and Passwords

274

Selecting Passwords

274

Member Login, Success and Failure

275

Changing Passwords and Challenges

276

Web-Based Password Entry

276

Generating New Passwords

277

Member Names

278

Logging

278

Embedded-Encryption Logging

279

Fully Encrypted Log Files

279

Public Key Logging

281

Split Log Files

281

Network-Based Logs

282

Cryptographic Tokens and Applications

282

Token Design

282

Expirations and Time Quantization

283

Creating the Security Bits

285

URL Tokens

285

Tamper-Evident URLs

285

Protecting Static Content

286

A Simple URL MAC Implementation

287

Fast Query String Parsing

290

URL Encryption

296

Very Short URLs

296

Cookie Tokens

297

Detecting Cookie Capability

297

Cookies and Authentication

297

Tokens for Access Control

298

Buy-on-Demand Systems

299

Multiple Key Systems

299

Trials and Expirations

300

Decimal and Monetary Computations

300

Doubles and Floats

300

BigDecimal

301

Rounding

302

BigDecimal and Mathematics

303

BigDecimal Alternatives and Wrappers

304

Usage and Storage

304

Appendix A Java Cryptography Class Reference

305

References

367

Index

381

I wrote this book for software engineers with little or no exposure to cryp-

tography. Most other books fall into one of two categories, the encyclope-

dia and description or the purely API descriptive. The goal was try and

bridge the two by providing a solid introduction to cryptography while

providing solid examples and uses. In addition, many books focus over-

whelmingly on public key techniques. In my experience the most common

uses for public key ciphers are handled by third-party applications (VPNs,

Emails) or off-the-shelf protocols such as SSL and SSH.

While there are a number of excellent cryptographic libraries in C and

C++, Java serves as the reference since:

It’s very popular for new business and server applications.

Memory management is automatically done, eliminating entire

classes of bugs (stack smashing and buffer overflows).

It provides a standard cryptographic API that, while not perfect or

complete, is about as “universal” as we can get.

The cryptographic API for Java is scattered among several packages.

Instead of listing classes by packages as is commonly done, Appendix A

lists the classes alphabetically. I found this to be much more useful than

flipping between different package sections.

I’ve tried to limit source code to examples that are relatively simple or

that demonstrate main points or self-contained utilities. More complicated

(and perhaps more useful) examples were excluded simply because the

error and exception handling for a security application can be quite long

and tedious and didn’t really add much value. I have always disliked

Preface

xiii

CD-ROMs glued into books and likewise, never found chapters and chap-

ters of source code to be very useful. Instead, full source code and more

examples are presented at the companion Web site at www.wiley.com/

compbooks.galbreath.

Unfortunately, time and space didn’t permit examination of many top-

ics. In particular:

XML key management, encryption, and signature scheme

Privacy Issues, including privacy “seals,” Gramm-Leech-Biley Act of

1999 and HIPPA privacy rule.

More detailed database tips and techniques.

Perhaps these will be detailed in future editions. Until then, I hope you find

this book useful.

Finally, this book would not have been possible without the many fruit-

ful conversations and support from Dave Andre, Jeff Bussgang, Ann

Calvit, Roy Dixit, Jim Finucane, Bill French, Venkat Gaddipati, Nabil

Hachem, Sam Haradhvala, Adam Hirsch, Steve Morris, Jack Orenstein,

Rich O’Neil, and Matt Rowe, and many others from Upromise, SightPath

and Open Market. Special thanks to my family and friends that had to put

up with my social hiatus while working on this book. Thank you all.

Nick Galbreath

Plum Island, MA

June 2002

The goal of the book is to present the systems programmer with a full intro-

duction into the science and the application of cryptography using Java.

Unlike many texts, this book does not focus on cryptography for transmis-

sion (allowing two parties to communicate securely). For most applica-

tions, those requirements are handled by the SSL or SSH protocols, or a

third-party application. Instead this book focuses on cryptography for stor-

age, message integrity, and authentication, (the so-called “single-point”

techniques) which is more common with custom applications. Beside pure

cryptography, many “auxiliary” techniques are covered that make cryp-

tography useful.

The first chapter covers basic logical and numeric operations, both at the

theoretical level and with the specifics of the Java numeric model. While it

may be a review to many readers, I’ve found that many system program-

mers just do not have exposure to these low-level details. Either they never

learned, or more likely, because it just isn’t normally used in day-to-day

work, it’s been forgotten. Those readers who are more familiar with C or

C++, should find this chapter especially useful for translating code one

way or another from Java.

The next two chapters introduce the reader to science, mathematics, and

standards of both secret key and public key cryptography. Here you’ll

learn why the algorithms work and the various tradeoffs between them.

While the mathematics aren’t essential in day-to-day programming tasks,

the goal was to teach the reader to be familiar with the terminology that is

often tossed around to be able to make informed decisions.

Introduction

xv

Chapter 4 discusses random numbers and random number generators.

While not technically cryptography, random numbers are essential to it.

Those readers developing gaming systems should find this section espe-

cially interesting. Java is used for examples of algorithms, but it could be

easily converted to any other programming language.

Finally, in Chapter 5 we introduce Java’s cryptographic APIs. The Java

SDK already comes with some excellent documentation, but I’ve tried to

pull it together consolidating the JCA and JCE into a coherent format. Like-

wise, I made Appendix A the Java cryptography reference I wanted when

I’m programming. Instead of listing classes by package, I list them all

alphabetically. You’ll find you’ll do a lot less page flipping this way and get

a much better understanding of the API with this organization.

Chapter 6 is on small message encoding, and like Chapter 4, isn’t techni-

cally cryptography, but it’s always an issue when using cryptography.

Here you’ll learn how to convert binary data (perhaps encrypted) into var-

ious ASCII formats. This is critical when embedding data in a URL, creating

passwords and cryptographic tokens.

Chapter 7 pulls everything together and discusses many topics: applica-

tion and database design, working the passwords and tokens, key man-

agement, and logging.

Numerous source code examples are used, and the source itself can be

found at the companion Web site www.modp.com. Many of the examples

in the book will require a little reworking for production use. You’ll want

to modify the error handling to suite your needs. In some cases, if the code

is especially long and not particularly illuminating, I decided not to list it

and instead just refer to the Web site. I find page after page of printed

source code in a book to not be particularly useful.

Unfortunately, due to time and space requirements a lot of important

topics are not covered. Digital signatures are only mentioned and they

deserve more, but I’ve found for basic application work, they just aren’t

that useful. Either, most people won’t use them or there is already a trust

relationship in place eliminating their usefulness, or encryption with a hash

or MAC is preferable. And finally, when I do need to use them, the process

is normally handled by a third party application. Other topics not given

their due are the new XML and SAML standards for creating documents

that contain encrypted data, and embedding cryptography within the data-

base (instead of the application), such as the newer Oracle database can do.

A lot more could be said for key management and database design as well.

Perhaps future editions will remedy this.

1

Before getting into the details of cryptographic operators, we’ll review

some basics of working with bits and number bases. For those of you who

have worked on operating systems, low-level protocols, and embedded

systems, the material in this chapter will probably be a review; for the rest

of you, whose day-to-day interactions with information technology and

basic software development don’t involve working directly with bits, this

information will be an important foundation to the information contained

in the rest of the book.

General Operations

Most cryptographic functions operate directly on machine representation

of numbers. What follows are overviews of how numbers are presented in

different bases, how computers internally represent numbers as a collec-

tion of bits, and how to directly manipulate bits. This material is presented

in a computer- and language-generic way; the next section focuses specifi-

cally on the Java model.

Bits and Bytes

C HAPTE R

1

Number Bases

In day-to-day operations, we represent numbers using base 10. Each digit

is 0 through 9; thus, given a string of decimal digits dndn-1 ... d2d1d0, the

numeric value would be:

10ndn + 10n-1dn-1 + ... + 102d2 + 10d1 + d0

This can be generalized to any base x, where there are x different digits

and a number is represented by:

xndn + xn-1dn-1 + ... + x2d2 + xd1 + d0

In computer science, the most important base is base 2, or the binary rep-

resentation of the number. Here each digit, or bit, can either be 0 or 1. The

decimal number 30 can be represented in binary as 11110 or 16 + 4 + 2. Also

common is hexadecimal, or base 16, where the digits are 0 to 9 and A, B, C,

D, E, and F, representing 10 to 15, respectively. The number 30 can now be

represented in hexadecimal as 1E or 16 + 14. The relationship between dig-

its in binary, decimal, and hexadecimal is listed in Table 1.1.

Table 1.1 Binary, Decimal, and Hexadecimal Representations

BINARY

DECIMAL

HEXADECIMAL

0000

0

0

0001

1

1

0010

2

2

0011

3

3

0100

4

4

0101

5

5

0110

6

6

0111

7

7

1000

8

8

1001

9

9

1010

10

A

1011

11

B

1100

12

C

1101

13

D

1110

14

E

1111

15

F

When you are working with different bases, the base of a number

may be ambiguous. For instance, is 99 the decimal or the hexadecimal 99

(= 9 × 16+9)? In this case, it’s common to prefix a hexadecimal number with

0x or just x (e.g., 99 becomes 0x99 or x99). The same confusion can happen

with binary numbers: Is 101 the decimal 101 or the binary 101 (4 + 1 = 5)?

When a number in binary may be confused, it’s customary to add a sub-

script 2 at the end of a string of binary digits, for example, 1012.

Any number can be represented in another base b if it is positive; how-

ever, doing the conversion isn’t necessarily easy. We’ll discuss general-

purpose base conversion in a later section, but it’s useful to note that

conversion between two bases is especially easy if one of the bases is a

power of the other. For instance, the decimal number 1234 has the canoni-

cal representation in base 10 as 1 × 1000 + 2 × 100 + 3 × 10 + 4. However,

1234 can also be thought as two “digits” in base 100: (12) and (34), with a

value of 12 × 1000 + 34 × 100. It’s the same number with the same value; the

digits have just been regrouped. This property is especially useful for base

2. Given a binary string, it’s possible to convert to hexadecimal by group-

ing 4 bits and computing the hexadecimal value:

10011100 = 1001 1100 = 9C

Bits and Bytes

A bit is the smallest unit of information that a computer can work on and

can take two values “1” and “0,” although sometimes depending on con-

text the values of “true” and “false” are used. On most modern computers,

we do not work directly on bits but on a collection of bits, sometimes called

a word, the smallest of which is a byte. Today, a byte by default means 8 bits,

but technically it can range from 4 to 10 bits. The odd values are from either

old or experimental CPU architectures that aren’t really in use anymore. To

be precise, many standards use octet to mean 8 bits, but we’ll use the more

common byte. Modern CPUs operate on much larger word sizes: The term

32-bit microprocessor means the CPU operates primarily on 32-bit words in

one clock cycle. It can, of course, operate on 8-bit words, but it doesn’t

mean it happens any faster. Many CPUs also have special instructions that

sometimes can operate on larger words, such as the SSE and similar

instructions for multimedia, as well as vector processing, such as on the

PowerPC.

A byte has a natural numeric interpretation as an integer from 0 to 255

using base 2, as described earlier. Bit n represents the value 2n, and the

value of the byte becomes the sum of these bits. The bits are laid out exactly

as expected for a numeric representation, with bit 0 on the right and bit 7 on

the left, but the layout is backward when compared to Western languages.

(b7b6b5b4b3b2b1b0) = 27b7 + 26b6 + 25b5 + 24b4 + 23b3 + 22b2 + 21b1 + 20b0

or using decimal notation

(b7b6b5b4b3b2b1b0) = 128b7 + 64b6 + 32b5 + 16b4+ 8b3 + 4b2+ 2b1 + b0

For example, 00110111 = 32 + 16 + 4 + 2 + 1 = 55.

Bits on the left are referred to as the most-significant bits, since they

contribute the most to the overall value of the number. Likewise, the right-

most bits are called the least-significant bits. This layout is also known as

Big-Endian, which we’ll discuss later.

Signed Bytes

Negative numbers can be represented in a few ways. The simplest is to

reverse one bit to represent the sign of the number, either positive or nega-

tive. Bits 0 through 6 would represent the number, and bit 7 would repre-

sent the sign. Although this allows a range from –127 to 127, it has the quirk

of two zeros: a “positive” zero and a “negative” zero. Having the two zeros

is odd, but it can be worked around. The bigger problem is when an over-

flow occurs—for instance, adding 127 + 2 is 129 in unsigned arithmetic, or

1000001. However, in signed arithmetic, the value is –1.

The most common representation is known as two’s complement. Given x,

its negative is represented by flipping all the bits (turning 1s into 0s and

vice versa) and adding 1, or computing –1 –x (the same value). For example,

note in Table 1.2 that adding 1 to 127 makes the value –128. While this

method is a bit odd, there are many benefits. Microprocessors can encode

just an addition circuit and a complementation circuit to do both addition

and subtraction (in fact, many CPUs carry around the complement with

the original value just in case subtraction comes up). The other main bene-

fit is when casting occurs, or converting a byte into a larger word, as

described in the following section.

Bitwise Operators

The usual arithmetic functions such as addition and multiplication inter-

pret words as numbers and perform the appropriate operations. However,

other operations work directly on the bits without regard to their represen-

tation as numbers. These are bitwise or logical operations. While examples

shown in the next sections use 8-bit bytes, they naturally extend to any

word size.

Table 1.2 Two’s Complement Representation

UNSIGNED

SIGNED

HEXADECIMAL

BINARY

VALUE

VALUE

REPRESENTATION

REPRESENTATION

0

0

00

00000000

1

1

01

00000001

2

2

02

00000010

126

126

7d

01111110

127

127

7f

01111111

128

-128

80

10000000

129

-127

81

10000001

130

-126

82

10000010

253

-3

fd

11111101

254

-2

fe

11111110

255

-1

ff

11111111

As shown in Table 1.3, the operations are represented by different sym-

bols depending if the context is programming or typographical. When

required, this book will use the programming notations, with the exception

of XOR, since the caret symbol (^) is used to denote exponents in some

systems.

Table 1.3 Bitwise Operations and Notation

C-STYLE

C-STYLE SELF-

OPERATION

NOTATION

ASSIGNMENT

TYPOGRAPHICAL

NOT a

~a

n/a

¬ a

a AND b

a & b

a &= b

a ^ b

a OR b

a | b

a |= b

a b

a XOR b

a ^ b

a ^= b

a ⊕ b

Shift a left by

a << n

a <<= n

a << n

n bits

(continues)

^

Table 1.3 Bitwise Operations and Notation (Continued)

C-STYLE

C-STYLE SELF-

OPERATION

NOTATION

ASSIGNMENT TYPOGRAPHICAL

Shift a right by

a >> n

a >>=n

None; all shifts and

n bits, preserve

words are assumed

sign of a

to be unsigned

Shift a right by

a >>> n

a >>>=n

a >> n

n bits, unsigned

Rotate a right by

(x >>> n) |

n/a

ROTRn(a)

n bits; w is

(x << w-n)

number of bits of a

Rotate a left by

(x << n) |

n/a

ROTLn(a)

n bits; w is the

( x >>> w-n)

number of bits of a

Concatenating

(a << shift) | b

n/a

a || b

a and b

Take n least-

a & mask

a &= mask

MSBn(a)

significant bits of a

Take n most-

a & mask

LSBn(a)

significant bits of a

>>> shift

Complementation or Bitwise NOT

The simplest bit operation is complementation or the bitwise NOT. This

simply flips bits within a word, where 0s become 1s and 1s become 0s—for

example ~11001 = 00110. Cryptographically, this operation is not used

much, primarily for implementing basic arithmetic in hardware.

Bitwise AND

AND is useful in bit testing and bit extraction. It’s based on the usual truth

table for logical AND, as shown in Table 1.4. You can remember the truth

table by observing that it’s the same as binary multiplication—anything

times zero is zero.

Table 1.4 Bitwise AND

0

1

0

0

0

1

0

1

AND

To test if a bit is set, create a mask with 1s in positions you wish to test

and perform the logical AND operation. If the result is nonzero, the bit is

set; otherwise, the bit is not:

01011101

AND 00001000

________

00001000 // not == 0, bit 4 is set

It’s also useful for bit clearing. A mask is created with 1s in positions to

preserve and 0s where you want to clear. For instance, to clear the four

least-significant bits in a byte, use the mask 11110000:

11011101

AND 11110000

________

11010000

Bitwise OR

Bitwise OR extends the logical OR to a series of bits. In English, or means

one or the other but not both. The bitwise version means one or the other

or both (the same as the logical OR). See Table 1.5.

Logical OR is useful in joining nonoverlapping words. The following

example joins two 4-bit words together:

11010000

OR 00001111

___________

11011111

In this case, the same effect could be done by using regular addition;

however, it’s pretty standard to use OR instead to make it clear that addi-

tion is not the point of the operation and that you are doing bit operations.

Table 1.5 Bitwise OR

0

1

0

0

1

1

1

1

OR

Table 1.6 Bitwise XOR

Bitwise Exclusive OR (XOR)

Exclusive OR is abbreviated as XOR, and its operation is denoted by ⊕. It

is less common in normal programming operations but very common in

cryptographic applications. Table 1.6 shows the Bitwise XOR table. This

table is easy to remember, since it is the same as addition but without any

carry (this is also known as addition modulo 2). It’s also equivalent to the

English usage of or—one or the other but not both.

XOR tests to see if two bits are different: If the bits are both 0s or both 1s,

the result is 0; if they are different, the result is 1. XOR is useful in crypto-

graphic applications, since unlike AND and OR, it’s an invertible opera-

tion, so that A ^ B ^ B = A:

11010011

XOR 10101010

____________

01111101

XOR 10101010

____________

11010011

Left-Shift

As its name implies, the left-shift operator shifts the bits within a word to

the left by a certain number of positions. Left-shifts are denoted by a << b,

where a is the value and b denotes the number of positions to shift left.

Zeros are filled in on the least-significant positions:

11111111 << 1 = 11111110

11001100 << 3 = 01100000

Mathematically, each shift left can be thought of as multiplication by 2:

3 × 2 = 00000011 << 1 = 00000110 = 4 + 2 = 6

3 × 4 = 00000011 << 2 = 00001100 = 8 + 4 = 12

-3 × 2 = 11111101 << 1 = 11111010 = -6

-128 × 2 = 10000000 << 1 = 00000000 = 0 (overflow)

0

1

0

0

1

1

1

0

XOR

When you are working with large word sizes (such as 32-bit integers),

left-shifts are used to compute large powers of 2, since 2n = 1 << n. A trick

to test to see if only one bit is set (or rather the value is a power of 2) is using

x & -x == x. In many cryptographic operations, we need to extract a cer-

tain number of the least-significant bits from a word. Normally you are

working with word sizes much bigger than a byte. However, say you

needed to extract the six least-significant bits from a byte B. You could per-

form B & 0x3f (B & 0011111). Computing this for larger word sizes is a bit

clumsy, and worse, some algorithms may extract a variable number of bits,

so the mask has to be dynamic. Hardwiring a hexadecimal mask is also

prone to errors, and someone reading your code would have to think about

how many bits you are extracting. How many bits are in 0x7fffff? The

answer is 23, but it’s not automatically clear, and staring at a computer mon-

itor all day where the text is in a small font makes the task harder. Instead,

a left-shift can be used, since 2n – 1 or (1 << n) – 1 has binary representation

of with (n – 1) digits of “1.” Instead of 0x7fffff, we can use ((1 << 24) – 1), which

will be perfectly clear to someone familiar with bit operations. Even better,

you can make your intentions clearer by making the mask a constant:

public static final int mask23lsb = (1<<24) -1; // 23-bit mask

Right-Shift

Not surprisingly, right-shifts, denoted by >>, are the opposite of left-shifts,

and positions are shifted to the right. However, there is another significant

difference. In left-shift notation, zeros are placed in the least-significant

position. With right-shift, the value of the most-significant bit is used to

fill in the shifted positions. For example, 1000000 >> 2 = 1100000, but

0100000 >> 2 = 00100000. This is designed so that the sign is preserved and

the right-shift is equivalent to division by 2 (dropping any fractional part),

regardless of the sign. For example, we’ll “undo” the previous left-shift

examples:

6 / 2 = 00000110 >> 1 = 00000011 = 3

12 / 4 = 00001100 >> 2 = 00000011 = 3

-6 / 2 = 11111010 >> 2 = 11111101 = -3

-127 / 2 = 10000001 >> 1 = 11000000 = -64

This works well when you are treating the byte as a number. In crypto-

graphic applications, the bits are just treated as bits without any numerical

interpretation. In every case, it is assumed the byte or word is unsigned

(e.g., unsigned long in C). If you are using a signed type (such as Java),

you’ll want to use the unsigned right-shift, denoted by >>> (three less-than

symbols). This just shifts the bits to the right and fills in zeros for the most

significant bits. All cryptographic papers assume the shift is unsigned and

normally use the plain >>. When coding an algorithm using signed types,

make sure you convert >> to >>>.

Special Operations and Abbreviations

The previous operations, while useful, are fairly basic and typically directly

implemented in hardware. By combining and composing them, you can

create new higher-level bit operations. For cryptography, the most useful

operations are rotations, concatenation, and extraction of most- and least-

significant bits.

Rotations

Bit rotations are fairly self-explanatory: Bits are shifted either left or right,

and the bits that “fall off the edge” roll over onto the other side. These are

commonly used in cryptography, since this method provides an invertible

operation that can aid in scrambling bits. The problem is that they aren’t

used for much else, so many CPUs do not have rotation instructions, or if

they do, they are frequently slow. Even if they have a fast rotation, the only

way to access them is by writing assembly language, since rotations do not

have a standard operator like the shifts do. Common programming rota-

tions can be accomplished with a combination of shifts and logical opera-

tors, which is the slowest method of all. In its most general case, a rotation

will take one subtraction, two shifts, and a logical OR operation. Remember

to use the unsigned right-shift operator:

(x >>> n) | (x << 32-n); // rotate right n positions, x 32-bit int

(x << n) | (x >>> 32-n); // rotate left n position, x 32-bit int

There is no special typographical symbol for rotations; we normally use

the abbreviations ROTR or ROTL, although certain technical papers may

define their own symbols in their work.

Bit Concatenation

Bit concatenation is the process of simply joining together two sets of bits

into one word. If b is n-bits long, then a || b is a << n | a:

a = 101

b = 0011

a || b = 11 << 4 | 0011 = 1010000 | 0011 = 1010011

MSB and LSB operations

Two other common operations are extracting a certain number of most-

significant bits (MSB) or least-significant bits (LSB). We’ll use the notation

MSBn(a) to denote extracting the n most-significant bits. Likewise with

LSBn(a). These operations are accomplished by making the appropriate

mask, and in the MSB case, by shifting appropriately:

MSB3(10111111) = (10111111 & 11100000) >>> 5 = 101

LSB2(11111101) = 11111111 & 00000011 = 00000001

Packed Words

Shifts are useful in creating packed words, where you treat the word as an

array of bits rather than as a number, and where the bits represent anything

you like. When doing this, you should use an unsigned numeric type.

Unfortunately, Java and many scripting languages do not have unsigned

types, so you must be extremely careful. As an example, suppose you have

two variables that are numbers between 0 and 15. When transmitting or

storing them, you might want to use the most space-efficient representa-

tion. To do this, you represent them not as 8-bit numbers from 0 to 255, but

as two numbers each with 4 bits. For this example, we’ll use pseudocode

and use the type int to represent an unsigned 8-bit value:

int a = 5; // 00000101

int b = 13; // 00001101

int packed = a << 4 | b; // = 00000101 << 4 | 00001101

// = 01010000 | 00001101

// = 01011101

To undo the operation:

b = packed & 0x0F; // 01011101 & 00001111 = 00001101

a = packed >>> 4; // note unsigned right-shift

// 01011101 >>> 4 = 00000101

Likewise, you might wish to view the byte as holding eight true/false

values:

int c = b0 | (b1 << 1) | (b2 << 2) | (b3 << 3) | (b4 << 4)

| (b5 << 5) | (b6 << 6) | (b7 << 7);

int mask = 0x01;

b0 = c & mask

b1 = (c >>> 1) & mask;

b2 = (c >>> 2) & mask;

and so on. If you were using an array, you could do this dynamically in a

loop as well.

for (int i = 0; i < 8; ++i)

b[i] = (c >>> i) & mask;

It’s quite easy to make mistakes, especially when the packed word has a

complicated structure. If you are writing code and the answers aren’t what

you’d expect:

Check to make sure you are using the unsigned right-shift operator

>>> instead of the signed version >>.

Check that the language isn’t doing some type of automatic type

conversion of signed values (e.g., turning bytes into integers before

the shift operation happens).

Check to make sure your masks are correct.

If you are using dynamic shifts, make sure the shift amount isn’t

larger than the word size. For instance, if you are shifting a byte,

make sure you aren’t doing >> 9. How this is interpreted depends

on the environment.

Integers and Endian Notation

Endian refers to how a string of bytes should be interpreted as an integer,

and this notation comes is two flavors: Little-Endian and Big-Endian. The

names come from Jonathan Swift’s Gulliver’s Travels, where the tiny people,

Lilliputians, were divided into two camps based on how they ate eggs. The

Little-Endians opened the eggs from the small end, and the Big-Endians

opened theirs from the big end. As you might expect with that reference,

there are pros and cons to Big-Endian and Little-Endian representations,

but overall it doesn’t make any difference except when you have to convert

between the two formats. A comparison of the two is shown in Table 1.7.

Table 1.7 Comparison of Little-Endian versus Big-Endian Representation

ENDIAN TYPE

B0B1B2B3 = 0XAABBCCDD

SAMPLE

MICROPROCESSORS

Little-Endian

aa bb cc dd

Intel x86, Digital (VAX,

Alpha)

Big-Endian

dd cc bb aa

Sun, HP, IBM RS6000, SGI,

“Java”

Big-Endian, also know as most-significant byte order (MSB) or network

order, puts the most-significant or highest byte value first. This is equiva-

lent to how we write decimal numbers: left to right. The downside is that

many numerical algorithms have to work backward starting at the end of

the array and working forward (just as you would manually with pencil

and paper).

The Little-Endian, or least significant byte (LSB), format is the opposite.

This makes it harder for humans to read hex dumps, but numeric algorithms

are a little easier to implement. Adding capacity (or widening conversions)

is also easier, since you just add bytes to the end of the array of bytes (i.e.,

0xff becomes 0xff00).

Fortunately, regardless of what the byte Endian order is, the bits within

bytes are always in Big-Endian format. For instance, 1 is always stored in a

byte as 000000012 no matter what platform.

The Endian issue becomes critical when you are working with heteroge-

neous systems—that is, systems that use different Endian models. When

shipping bytes between these machines, you must use a standard Endian

format or an ASCII format. In many other programming languages, you

must determine in advance what the Endian architecture is and adjust sub-

sequent bit operations appropriately. For cryptographic applications this

becomes critical, since you are often manipulating bits directly.

With Java, the underlying architecture is hidden and you program using

a Big-Endian format. The Java virtual machine does any Endian conversion

needed behind the scenes.

For C and C++ programmers, normally a BIG_ENDIAN or LITTLE_

ENDIAN macro is defined by the compiler or from an include file. If not,

you can use code similar to this for testing. It sets raw memory and then

converts it to an integer type. The value will be different depending on the

CPU’s Endian type. This C code assumes an int is a standard 4 bytes or

32 bits, but you may wish to generalize:

int isBigEndian() {

static unsigned char test[sizeof(unsigned int)] = {0, 0, 0, 1};

unsigned int i = *(unsigned int) test;

if (i == 0x0000001) return 1; // true, big-endian

return 0; // false, little-endian

}

Java Numerics

We’ll now apply the information in the previous section to the Java

numeric model. While notation is similar to the C or C++ model, there are

Java-specific issues with using signed and unsigned types—specifically,

byte arrays. Java also provides class wrappers around the native types, as

well as an unlimited-capacity integer arithmetic.

Basic Types

Java provides the standard basic numeric types—integers with 8, 16, 32,

and 64 bits, and floating-point types with 32- and 64-bit representations.

Unlike C and other languages, all the types are signed—there are no native

unsigned types. Table 1.8 shows Java’s primitive numeric types.

For integer types, a literal integer can be expressed in decimal or hexa-

decimal formats (using lower- or uppercase letters for hex digits):

int i1 = 1000;

int i2 = 0x3e8; // == 1000 decimal

int i3 = 0x3E8; // same

For literal long values, a prefix of L or l should always be used, even if it

appears unnecessary:

long l1 = 1000; // compiles but is not recommended

long l1 = 1000L; // recommended

long l2 = 0xfffffffff; // won’t compile, error

long l2 = 0xfffffffffL; // correct

Table 1.8 Primitive Numeric Types in Java

NAME

TYPE

LOGICAL SIZE

RANGE

byte

signed integer

8 bits

–128 to 127

short

signed integer

16 bits

–32768 to 32767

int

signed integer

32 bits

–2,147,483,648 to

2,147,483,647 (2.1 billion)

long

signed integer

64 bits

–9,223,372,036,854,775,808

to 9,223,372,036,854,775,807

(± 9.2 × 1018)

float

ANSI/IEEE 754

32 bits

±1.4 × 10-45 to

floating point

±3.4028235 × 1038

(6–7 significant decimal digits)

double

ANSI/IEEE 754

64 bits

±4.9 × 10-324 to

floating point

±1.7976931348623157 × 10308

(15 significant decimal digits)

Specifying literal values for bytes can be tricky because a byte is signed

from –128 to 127, while very often you’ll be using constants specified from

0 to 255. If the value is within –128 to 127, the code will compile. If the value

is from 128 to 255, you can either convert it to its negative equivalent or use

a cast operation. The same principles apply to the short type:

byte b1 = 10;

byte b2 = 189; // error, out of bounds

byte b3 = -67; // = 189, ok

byte b4 = (byte) 189; // ok

byte b4 = (byte) 0xbd; // ok

Floating-type literals are assumed to be a double type unless suffixed

by an f for float. Floating types can also be representing using scientific

notation using valEscale = val × 10scale.

float f1 = 0.1; // compiler error

float f2 = 0.1f; // by default “0.1” is a double type

double d1 = 0.1;

double d2 = 1.0E2 // == 100

In practice, the short type is rarely used because it almost always is

converted into an int type before anything else is done. The float

type also isn’t used because it doesn’t provide enough precision for most

applications.

Type Conversion

Conversions between types can either be widening or narrowing. A widen-

ing conversion is where the new type can represent larger numbers and can

“contain” the old type. For an integer-to-integer type conversion (i.e.,

short to long), this just means putting the same number in a larger

container. Integer-to-floating-point conversions are also considered as

widening, but some of the least-significant digits may be scrambled or

zeroed due to how floating point numbers are presented. These types of

conversion happen automatically and silently either at compile time or at

run time. For example:

int i1 = 123;

long l1 = i; // ok -- l = 123

float f1 = i; // ok -- f = 123

int i2 = 123456789;

float f2 = i; // ok, but f = 123456792

Narrowing conversions may result in a loss of magnitude and precision.

Any conversion from floating-point to an integer type is considered nar-

rowing and is clearly a larger integer type to a smaller integer type. Java

will not automatically do narrowing conversions, and a compiler error will

be issued if the compiler detects such a conversion. To do these conver-

sions, you must explicitly declare them with the cast operator.

Converting floating point to integers drops or truncates any digits to the

right of the decimal point:

float f1 = 0.123456;

int i0 = f1; // compiler error -- requires a cast.

int i1 = (int) f1; // == 0

float f2 = 0.99999;

int i2 = (int) f2; // == 0

float f3 = 100.9;

int i3 = (int) f3; // == 100

float f4 = -100.9;

int i4 = (int) f4; // == -100

int i5 = 0xffL; // compiler error; long to int conversion

For narrowing conversions from one integer type to another, the least-

significant bits (or bytes) form the larger type:

int i1 = 0xfffff01;

byte b1 = (byte) b; // b = 0x01;

These rules are summarized in Table 1.9, with N representing a narrow-

ing conversion, W for a widening conversion, and W* for a widening

conversion that may result in a loss of precision.

Table 1.9 Primitive Type Conversion

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

byte

W

W

W

W

W

short

N

W

W

W

W

int

N

N

W

W*

W

long

N

N

N

W*

W*

float

N

N

N

N

W

double

N

N

N

N

N

Unsigned to Signed Conversions

There are no unsigned types in Java; however, you can still use the signed

types as an unsigned container. Even though a byte is signed, it still has

8 bits and can represent an integer from 0 to 255 even if Java thinks other-

wise. To use the unsigned value, it must be converted into a larger type

using an AND mask.

To convert a unsigned byte:

byte c = (byte) 254; // b = -2 in Java.

short c = (short)(x & 0xff); // c = 254

int c = x & 0xff // c = 254

long c = x & 0xffL; // c = 254, must put L at end of 0xffL

Unsigned short values are converted by the following:

int c = x & 0xffff;

long c = x & 0xffffL;

Finally, the unsigned int is converted into a signed long by the

following:

long c = x & 0xffffffffL

It’s critical to put an L at the end of the mask when working with long

types. Without it, Java will respond as if you are doing an int computa-

tion, then doing a widening conversion to a long:

byte b = (byte)0xff;

long b = (b & 0xff) << 56; // Wrong. b = 0

long b = (long)((int)(b & 0xff) << 56); // same as previous

long b = (b & 0xffL) << 56; // Correct. b = 0xff00000000000000

Overflow

When a computation exceeds an integer type’s bounds, the value is “rolled

over” silently; no exception is thrown:

byte b = (byte) 127; // b = 0111111

b++; // b = 1000000, but b has the value of -128

While the silence may seem disturbing, in practice overflow is rarely a

problem. For floating-point computations overflow is still silent, but the

result is not rolled over. Instead, the type takes on a special Infinity

value that can be checked using normal comparison operators or by using

the Double.isInfinite method. Also note that Java does not think 0.0

is the same as 0 (zero). Instead, 0.0 is treated as a number that is very close

to zero, so division by 0.0 results in an infinite value. Keep in mind, how-

ever, that Java throws an exception if you try to divide by the integer 0. In

the event that a computation doesn’t make sense, the value is NaN, which

stands for “not a number.” If you wish to see if a value is NaN, then you

must use the Double.isNaN method. Various examples of manipulating

double types are shown as follows:

double d1 = 1.7E308 * 2.0; // overflow = Infinity

double d2 = 1.0/0.0; // = Infinity

double d3 = -1.0/0.0; // = -Infinity

int i = 1 / 0; // throws a DivisionByZero exception

double d4 = 0.0/0.0 // = NaN

boolean b = (d4 == d4); // = false, always

boolean b = Double.isNan(d4); // true;

boolean b = Double.isInfinite(d3); // true

Arrays

In Java, native-type arrays are treated and created as if there were objects

created using a constructor with the new operator. Arrays can also be set

with initial values, as shown in the following code snippet.

int[] a= new int[3]; // all three elements set to zero

int[] a = {1, 2, 3}; // a pre-set 3 element array.

Once constructed, an array is not resizable. If dynamically allocated

storage is needed, you should use one of the collections in the java.util

package. The index to arrays is with int types, and the first element starts

at 0 (as in C). Small types will have widening conversions done to them,

and the long type cannot be used without a cast operation. Thus, single

dimensional arrays cannot be larger than 231, or roughly 2.1 billion, entries.

Hopefully, this will be sufficient for your needs.

Since arrays are objects, assignment is done by reference, as shown in the

following:

int[] a = {1,2};

int[] b = a;

b[0] = 100; // modifies the object that both a and b point too.

System.out.println(a[0]); // will print 100, not 1

a = null; // b is null, but a is intact.

In addition, they can be copied using the clonemethod. To use this, cast

the result to the correct array type:

int[] a = {1, 2};

int[] b = (int[]) a.clone(); // b is a “deep copy” of a

The System.arraycopy method is extremely useful for copying parts

of an array or concatenating arrays. It makes a native system call to copy

memory directly instead of copying each element individually and is much

faster than writing a custom for loop:

System.arrayCopy(Object input, int inputOffset,

Object output, int outputOffset, int length)

Even though the method has Object in its signature, this method only

works on array types. Other useful methods for manipulating native

arrays can be found in the class java.util.Arrays and are summarized

in Table 1.10.

Table 1.10 Summary of Methods from java.util.Arrays

JAVA.UTIL.ARRAYS METHOD

DESCRIPTION

static boolean

Returns true if and only if the arrays are

equals(type[] a, type[] b)

the same length and every element is

equal.

static void

Fills an array with a single value.

fill(type[] a, type val)

static void

Performs a fast numeric sort. Array is

sort(type[] a)

modified.

static void

Sorts only part of an array.

sort(type[] a, int fromIndex,

int toIndex)

static int

Performs a binary search of a sorted

binarySearch(type[] a,

array. Returns the array index if a match

type val)

is found and –1 if no match. Arrays must

be sorted first.

Numeric Classes

Each numeric type has a matching class that acts as an object wrapper, as

shown in Table 1.11.

These classes do not have any support for mathematical operations—

you can’t add two integer objects directly. Instead, these classes are pri-

marily used to allow numeric types to be used in methods that expect an

Object type, such as in collections from the java.util package (e.g.,

ArrayList, HashMap). The wrapper classes also provide basic string for-

matting and parsing from strings for numbers. Since they are objects, they

can also be null. Likewise, objects are always pass-by-reference.

public void changeInt(Integer i) {

i = new Integer(“1”);

}

Integer I = new Integer(“0”);

changeInt(i); // i is now 1

All of the classes share some common traits (examples are just shown for

Integer and Long):

Two public fields, MAX_VALUE and MIN_VALUE, in case you don’t

want to memorize the previous table.

byteValue, doubleValue, floatValue, intValue, and

longValue methods that return the underlying number in

a native type.

A static method valueOf that accepts a string and an optional radix

to parse a string and return an object. The radix can be 2 to 32.

static Integer Integer.valueOf(int val)

static Integer Integer.valueOf(int val, int radix)

static Long Long.valueOf(long val)

static Long Long.valueOf(long val, int radix)

A static method parseClass (where Class is the name of the class,

such as Byte or Integer) that also accepts a string, and an optional

radix to parse a string returns the native type instead of an object.

static int Integer.parseLong(int val)

static int Integer.parseFloat(int val, int radix)

static long Long.parseLong(long val)

static long Long.parseLong(long val, int radix)

toString that returns the number as unformatted decimal number.

Table 1.11

Java Class Wrappers for Native Types

NATIVE TYPE

MATCHING JAVA.LANG CLASS

int

Integer

byte

Byte

double

Double

float

Float

long

Long

short

Short

The Long and Integer classes have a few other useful static methods

that provide an unsigned representation of the number in binary, hex, or

octal formats as shown:

static String Integer.toBinaryString(int val)

static String Integer.toHexString(int val)

static String Integer.toOctalString(int val)

static String Long.toBinaryString(long val)

static String Long.toHexString(long val)

static String Long.toOctalString(long val)

Binary representation is especially useful for debugging bit fields.

These objects do not add any “leading zeros” to the output, so new

Integer(16).toHexString() just returns F and not 0F or 0x0F.

Booleans and BitFields

Java provides a native boolean type that can either be true or false.

Unlike C and C++, it is not a numeric type and does not have a numeric

value, and it is not automatically cast. Like the numeric types, there is also

a boolean class wrapper. If you want to create an array of boolean values,

you could use Boolean with one of the collection classes (e.g.,

ArrayList). However, there is a special class java.utl.BitField

specially designed for use with boolean types that provides a huge perfor-

mance increase and memory savings. More specialized applications will

convert a native type or byte array into a bit field directly.

Chars

Java has another native type char that represents a Unicode character and

is used by String and StringBuffer internally. It is represented by an

unsigned 16-bit integer, but it is not a numeric type. It is automatically cast

to an integer in situations when needed, such in mathematical or bitwise

operations, but it’s not automatically cast to any other type. However, you

can explicitly convert a char to another type by using a cast operator.

Since it’s unsigned, it might be tempting to use char in places for math-

ematical purposes and for raw bit fields. In practice, there is not much

point, since char types are automatically converted into an int before any

operation is done, eliminating any advantage

Working with Bytes

Because the Java byte type is signed and because of the automatic conver-

sion rules, working with bytes and byte array can be frustrating. Following

are some tips and tricks to simplify byte manipulation.

Sign Problems

It’s always easiest to specify a constant byte value by using a cast (later in

this chapter you will see tricks on making byte arrays):

byte b = 0xff; // Compiler error. 0xff is an ‘int’ type

byte b = (byte) 0xff; // right

Bytes range from –127 to 128, and not 0 to 255. Setting a byte with the

value 128–255 and a cast is fine—the byte contains the correct bits. The

problem is when you want to retrieve the unsigned value or when you per-

form a computation with bytes and ints. For instance:

byte b = 127;

int i = 3;

System.out.println((i+b)); // Prints -126 and not 130

results in –127 being printed, not 129. In the addition step, byte b is auto-

matically converted to an int type, including the sign. While b was set

with 128, internally it’s really –1, and this is its value when converted.

The key point to remember is that all bitwise and arithmetic operators

work on int and long types only. All other types are automatically cast. If

you are using a byte as an unsigned value, you must manually convert the

type to an int or long before the operation using val & 0xFF or val &

0xFFL respectively.

The problem is more mysterious with the shift operators. Let’s say you

want to do a right-shift on 0xFF. In binary, this is 11111111, so you’d expect an

unsigned right-shift to be 0111111, or 0x7F. The natural way to write this is:

byte b1 = (byte) 0xff;

byte b2 = b >>> 1; // compiler error

However, this results in a strange error from the javac compiler, and the

problem isn’t clearly defined:

aprogram:java.6: possible loss of precision

found: int

required: byte

byte b2 = b >>> 1;

^

You might try and fix this by casting the result:

byte b1 = (byte) 0xff;

byte b2 = (byte)(b >>> 1); // error #2

This performs the compile operation; however, the result is still wrong.

The value of b2 is not 0x7f, but strangely remains at 0xff or –1. Again, Java

is converting the byte into an integer before the shift. Since bytes are signed,

it converts the byte value of –1 into an int value of –1, which is 0xffffffff.

Then it performs the shift, resulting in 0x7fffffff. Finally, the cast takes the

least bits, or 0xff, and converts them to a byte. The step-by-step details

are listed in Table 1.12. The correct solution is to treat the byte as an

unsigned value, then convert to an int, and then do the shift and cast back

down to a byte:

byte b1 = (byte) 0xff;

byte b2 = (byte)((b & 0xff) >>> 1); // correct:

Table 1.12 Comparison of Bit-Shifting a Byte Type

STEPS

INCORRECT

CORRECT

Initial value

0xff = –1

0xff = –1

Convert to int

0xffffffff = –1

0x000000ff = 255

Right-shift >>>

0x7fffffff = large int

0x0000007f = 127

Cast down to byte

0xff = –1

0x7f = 127

To convert a byte back to its unsigned value, you have to do a little trick;

you must bitwise AND the byte with 0xff, that is, b & 0xff. For example:

byte b = 128;

int i = 1;

int unsignedByteValue = b & 0xff

System.out.println((i + unsignedByteValue); // 129

System.out.println((i + (b & 0xff)); // combined

This trick works by observing that the least-significant bytes in an int

match exactly with the byte type, as shown in Table 1.13.

Conversion of Integral Types to Byte Arrays

Java does not have any easy way of converting integral types into raw

bytes. There are the serialization methods, but these involve a lot of work

for something that should be relatively simple. For example:

public static byte[] intToByteArray(int i) {

byte[] buf = new byte[4];

b[0] = (byte) (i >>> 24);

b[1] = (byte) (i >>> 16);

b[2] = (byte) (i >>> 8);

b[3] = (byte) i;

}

public static int byteArrayToInt(byte[]) {

If (buf.length != 4)

throw new RuntimeException(“Bad Length”);

return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) |

(b[2] & 0xff) << 8) | (b[3]);

}

Table 1.13 Comparison of Representation between int and byte Types

UNSIGNED

SIGNED

BYTE

INT

VALUE

VALUE

REPRESENTATION

REPRESENTATION

0

0

00

00000000

1

1

01

00000001

2

2

02

00000002

126

126

7d

0000007e

127

127

7f

0000007f

128

-128

80

00000080

Table 1.13

(Continued)

UNSIGNED

SIGNED

BYTE

INT

VALUE

VALUE

REPRESENTATION

REPRESENTATION

129

-127

81

00000081

130

-126

82

00000082

253

-3

fd

fffffffd

254

-2

fe

fffffffe

255

-1

ff

ffffffff

256

256

00

00000100

We could do the same thing for longs. Instead, we’ll demonstrate a dif-

ferent technique that builds the result by incrementally shifting the value

in question:

public static byte[] longToByteArray(long I) {

byte[] buf = new byte[8];

for (int j = 7; j >= 0; ++j) {

buf[j] = (l & 0xffL); // !! must use 0xffL, not 0xff !!

l >>>= 8; // l = l >>> 8;

}

}

public static long byteArrayToLong(byte[] buf)

{

long i = 0;

if (buf.length != 8)

throw new RuntimeException(“Bad Length”);

for (int j = 0; j < 8; ++j) {

i |= buf[j];

i <<= 8; // l = l << 8;

}

return i;

}

Converting to Hex Strings

Converting an array into a hexadecimal string is a fairly common task for

printing, data interchange, and debugging. A naive way to do hex conver-

sions is by using one of the previously mentioned byte array-to-long

methods followed by Long.toHexString. However this is very slow

and is limited to arrays smaller than 8 bytes. Even worse would be using

BigInteger (discussed later in the chapter). For example:

byte[] mybytes = ...;

BigInteger a = new BigInteger(1, mybytes);

return a.toString(16);

While mathematically correct, most programs operate on printing two

hex digits per byte; for example, decimal 16 is converted to 0F, not F. The

other problem is that this method is horrendously slow.

While these methods are useful in a pinch, for high performance appli-

cations they won’t do. If one is willing to use a little memory, performance

can be doubled or tripled. The conversion itself is fairly simple, but an

implementation that uses tables has some important benefits:

Provides typically higher performance (at the expense of some minor

setup and memory costs).

Allows using different alphabets instead of the standard. These

issues are further detailed in Chapter 4.

Typically allows simpler coding.

The conversion class starts out defining the hexadecimal alphabet (in

our case, the standard one). The hexDecode table is the inverse mapping;

for instance, character A is mapped to 10. Characters that are invalid are set

to –1. For example:

public class Hexify

{

protected static char[] hexDigits = {‘0’, ‘1’, ‘2’, ‘3’, ‘4’,

‘5’ ‘6’, ‘7’, ‘8’, ‘9’, ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’},

protected static int[] hexDecode = new int[256];

static {

for (int i = 0; i < 256; ++i) hexDecode[i] = -1;

for (int i = ‘0’; i <= ‘9’; ++i) hexDecode[i] = i - ‘0’;

for (int i = ‘A’; i <= ‘F’; ++i) hexDecode[i] = i - ‘A’ + 10;

for (int i = ‘a’; i <= ‘f’; ++i) hexDecode[i] = i - ‘a’ + 10;

}

Encoding is fairly straightforward. Each byte is split into two sections,

and the table is used to determine the appropriate character:

public static String encode(byte[] b) {

char[] buf = new char[b.length * 2];

int max = b.length;

int j = 0;

for (int i = 0; i < max; ++i) {

buf[j++] = hexDigits[(b[i] & 0xf0) >> 4];

buf[j++] = hexDigits[b[i] & 0x0f];

}

return new String[buf];

}

Decoding is more interesting. First, a function, given a hexadecimal

character, returns the integer value. This is done via a table lookup, and the

result is verified to make sure it’s valid. It’s always important to check the

input for validity, especially if you don’t have control of what the input is.

You could easily add or inline these checks into the main decode routine

for a minimal improvement in execution time, as follows:

protected static int getHexDecode(char c) {

int x = hexDecode[c];

if (x < 0) throw new RuntimeException(“Bad hex digit “ + c);

return x;

}

While the encode function always encodes two characters per byte, you

may receive a hexadecimal string that doesn’t conform to this rule. If you

are only processing strings that you created, you could check that the num-

ber of input characters is even and that an exception is thrown if it’s not.

The following implementation handles full generality. The string is checked

to see if it has an odd number of characters, by computing max & 0x01

(or, equivalently, max % 2), which just looks at the least-significant bit. If

it’s set, then the value is odd.

public static byte[] decode(String s) {

char[] input = s.charArray[];

int max = input.length;

int maxodd = max & 0x01;

byte b;

byte[] buf = new byte[max/2 + odd];

int i = 0, j = 0;

if (maxodd == 1) {

buf[j++] = getHexDecode[input[i++]]

}

while (i < max) {

buf[j++] == (byte)(getHexDecode[input[i++]] << 4 |

getHexdecode[input[i++]]);

}

return buf;

} //end of class Hexity

BigInteger

The BigInteger class in the java.math package provides a way of

doing computations with arbitrary large integer types. Unlike C++, Java

does not overload operators, so to do basic math, you have to call methods

to perform mathematical operations such as addition. All of the operations

create new BigInteger objects as results. The objects used in the opera-

tions remain untouched. For instance:

BigInteger b1 = new BigInteger(“1”);

BigInteger b2 = new BigInteger(“2”);

BigInteger b3 = b1.add(b2);

At the end, b1 is still 1, b2 is still 2, and b3 is what you’d expect it to be,

3. Modifying an object requires explicit code. For instance, if we wanted

b1 to contain the result of the addition, we would use self-assignment like

this:

b1 = b1.add(b2); // b1 is now 3

While the notation is different, the behavior is no different than using

normal operators. Adding 1 + 2 doesn’t change the original values of 1 or 2.

All of the usual arithmetic operations are available: add, subtract,

multiply, divide, and remainder. Another method, divideAnd-

Remainder, returns a two-element BigInteger array where element 0

is the division result and element 1 is the remainder. A complete list and

comparison of BigInteger operations is given in Table 1.14.

Table 1.14 BigInteger Arithmetic Operators

ARITHMETIC

NATIVE TYPE

BIGINTEGER

OPERATION

NOTATION

NOTATION

Addition

a + b

a.add(b)

Subtraction

a – b

a.subtract(b)

Multiplication

a * b

a.mult(b)

Integer division

a / b

a.divide(b)

Remainder

a % b

a.remainder(b)

Division with

int[] result =

BigInteger[] result =

remainder

{ a / b, a % b }

a.divideAndRemainder(b)

Negation

–a

a.negate()

Table 1.14

(Continued)

ARITHMETIC

NATIVE TYPE

BIGINTEGER

OPERATION

NOTATION

NOTATION

Exponentiation

Math.pow(a,b)

a.pow(b)

Random value

Random r = new Random(); Random r = ...

int bits = ...;

a = r.getInt();

BigInteger a =

new BigInteger(bits, r);

Absolute value

(a >= 0) ? a : b or

a.abs()

Math.abs(a)

Minimum of a, b

(a < b) ? a : b or

a.min(b)

Math.min(a,b)

Maximum of a, b

(a > b) ? a : b or

a.max(b)

Math.max(a,b)

Likewise, all the usual bit operations are available, as described in

Table 1.15. There is no need for a special unsigned right-shift operator, since

BigInteger has unlimited capacity; the result of shifting n places to the

right is the equivalent of dividing by 2n, regardless of sign. In addition are

a few new methods that simplify working with bits: testBit, setBit,

and flipBit. Two others called bitCount and bitLength need a bit

more explaining. For positive integers the results are what you’d expect:

bitLength returns the minimal number of bits required to represent the

integer, and BitCount returns the number of one-bits in the representation.

For negative numbers, bitLength gives a minimal length excluding the

sign bit, and bitCount returns the number of zeros in the representation.

Creating and Converting

BigInteger objects can be converted by using a string representation of a

number, a native type, or with a byte array. Using a native type, a string, or

a byte array representation of a number creates BigInteger objects.

Strings

You can create BigInteger objects by using a construction that takes a

string representation of a number. By default, the constructor expects the

string representation to be in decimal (base 10) form. However, you can

also use the standard base 16 representations, as follows:

BigInteger(String base10Rep)

BigInteger(String representation, int radix)

Table 1.15 BigInteger Bit Operations

NATIVE TYPE

BIGINTEGER

BIT OPERATION

NOTATION

NOTATION

AND

a & b

a.and(b)

ANDNOT

a & ~b

a.andNot(b)

NOT (complement)

~a

a.not(b)

OR

a | b

a.or(b)

XOR

a ^ b

a.xor(b)

Shift left n bits, signed

a << n

a.shiftLeft(n)

Shift right n bits, signed

a >> n

a.shiftRight(n)

Test bit n

(a & (1 << n) != 0)

a.testBit(n)

Set bit n

(a | (1 << n))

a.setBit(n)

Flip bit n

(a ^ (1 << n))

a.flipBit(n)

Conversely, you can retrieve a string representation in an appropriate

base by using the toString method:

String toString()

String toString(int radix)

Numeric Types

The BigInteger class can be created from a long by using the static

method:

Long val = 123456789123;

BigInteger bi = BigInteger.valueOf(val);

Once you have the BigInteger object, it can be converted directly

into a numeric type by using intValue, longValue, floatValue, or

doubleValue. Not that we’ll mind, but there is no method to directly

convert into a short or char value. If the value is too large to fit into the

numeric type, a silent narrowing conversion is done (the high bits are

chopped off).

Byte Arrays

Generating byte arrays and constructing byte a. Assuming you have a

BigInteger object, a byte array can be generated with:

byte[] toByteArray()

And the results of this output can be used to create a new BigInteger

object:

BigInteger(byte[] array)

However, because of signing issues, typically this format is not used for

cryptographic purposes. If the bit length is a multiple of 8, and it always is

for cryptographic applications, the byte array starts with an extra byte

indicating the sign (0 for positive). Either your application can deal with a

leading zero, or you can strip it away, as in the following:

byte ba[] = bi.toByteArray();

if (ba[0] == 0) {

byte[] tmp = new byte[ba.length - 1];

System.arraycopy(ba, 1, tmp, 0, tmp.length)

ba = tmp;

}

A lower-speed option is to use strings as the common medium:

Byte ba[] = Hexify.decode(bi.toString(16);

Since BigInteger expects a sign-bit, you’ll need to use a special con-

structor and manually indicate the sign by passing in 1, –1, or 0 for positive,

negative, or zero:

BigInteger(int signum, byte[] magnitude)

BigInteger and Cryptography

BigInteger has a few special methods specifically designed for cryptog-

raphy, listed in Table 1.16. These operations are discussed fully in Chapter 3.

Secret Methods in BigInteger

For Sun Microsystems to effectively test for primality, many generic

methods and algorithms had to be implemented; however, they are not

part of the public API. For people working in numbers theory or who are

developing more advanced cryptographic algorithms, these “hidden”

methods may be useful. Unfortunately, they are declared private; but with

a few modifications to the source code, you can create your own enhanced

BigInteger variant.

Table 1.16 BigInteger Operations Useful in Implementing Cryptographic Algorithms

OPERATION

BIGINTEGER NOTATION

Create a random nonnegative integer

BigInteger

uniformly distributed from 0 to 2n–1

(int numBits, Random r)

Create a random integer that is prime

Random r = ...;

with certainty 1 – 2n

int bitLength = ...;

BigInteger a =

BigInteger(bitLength, r)

Check if prime with certainty using

a.isProbablePrime()

IEEE standard of 1 – 2100

a mod b

a.mod(b)

an mod b

a.modPow(b, n); // n is an

int, not BigInteger

find a-1, such that aa-1 = 1 mod b

a.modInv(b)

Greatest common denominator of a, b

a.gcb(b)

The most interesting of these are the following methods:

int jacobiSymbol(int p, BigInteger n); // this is “package protected”

private boolean passesMillerRabin(int iterations)

private boolean passesLucasLehmer()

private static BigInteger lucasLehmerSequence(int z,

BigInteger k, BigInteger n)

In addition, there is an implementation of sieve for testing primality of

small numbers in the class BitSieve.

To “free” them:

1. Copy all the source files from this directory into a new directory.

2. Edit each file and change the package name from sun.math to one

of your own choosing.

3. Make the BitSieve class “public.”

4. Change the desired methods and classes to public. Note that the

method jacobiSymbol in BigInteger and the class BitSieve

do not have an access modifier, so you have to add public in front

of the method.

5. Compile.

This code is copyrighted by Sun Microsystems. For commercial use and

distribution, refer to the license agreement distributed with the source

code.

Now that we have a full understanding of bit operations and the Java

model, we’ll discuss secret and public key cryptography.

35

Secret key cryptography is based on cryptographic functions that rely on a

single key that must be kept confidential. Secret key ciphers perform

encryption and decryption using the same key, and cryptographic hashes

must be computed and verified using the same key.

Symmetric Block Ciphers

In general, a cipher is a function that maps a message, known as plaintext,

into an unreadable form, known as ciphertext, by use of a key. This is known

as encryption. Without the key, you cannot do the inverse transformation,

the decryption, or turning the ciphertext back into its original plaintext

form. In lay terminology code is often used interchangeably with cipher.

However, they have different meanings. A code is translation of a message

by use of a dictionary, without any key being used.

Modern block ciphers operate on a fixed number of bytes in a single pass,

for instance, transforming 8 bytes plaintext into 8 bytes ciphertext. Messages

larger than the block must be chopped into pieces, and if the message isn’t

a multiple of the block size, the message must be padded appropriately.

Secret Key Cryptography

C HAPTE R

2

Cipher Properties

To be useful a cipher must be secure and must be usable. What makes a

cipher secure? For ciphers that have stood the test of time (i.e., those in

common use and have not been cracked), a rough estimate of security is

key length, with the longer the key being more secure. A cipher is deemed

secure if there is no attack that is significantly better than brute force. How-

ever, clearly this by itself is not enough, and the notion of security is tied to

current or contextual computation resources available to an attacker. An

algorithm may be structurally secure, but if the key is too small, brute-force

attacks can easily break the cipher by directly recovering the key.

Security Properties

The fundamental property of an encryption algorithm is that the output be

effectively random, regardless of what the input or key is. From this prop-

erty, many notions emerge regarding how well the algorithm performs:

Semantically secure, polynomially secure, or “indistinguishably

of encryptions.” There are many variants of this idea, but the

simplest is that, given two plaintext messages and their ciphertext

equivalents, it is impossible to determine which plaintext matches

the correct encrypted version with a probability greater than .5. In

other words, the ciphertext does not reveal any information about

the plaintext.

Nonmalleability. Given a ciphertext, it is impossible to manip-

ulate the ciphertext to produce related plaintext. In other words,

a 1-bit change to a ciphertext should result in a very different

plaintext.

Plaintext-aware.

It is computationally infeasible to produce a valid

ciphertext without knowing the corresponding plaintext. For instance,

you should be able to generate ciphertext that decrypts to 7-bit

ASCII, unless you are encrypting the original plaintext. This is useful

to prevent the generation of false messages.

Lack of algebraic structure.

It’s very important that a cipher doesn’t

have certain algebraic properties that might allow multiple encryp-

tion or multiple keys to negate each other. In particular, it is important

that the cipher does not form a group where:

Each key has an inverse key.

Encrypting something twice with different keys is equivalent to

encrypting it once with another key.

The result of encrypting with one key and decrypting with

another key is the same as encrypting with one key. This would

be very useful for attack purposes.

Ideally, you don’t even want a subset of keys and messages to form a

group; however, proving that is quite tricky. You also don’t want “fixed

points,” that is, messages that encrypt to themselves.

Brute-Force Attacks

Brute-force attacks are the simplest attack, and for symmetric ciphers, this

means trying each key, one after another, until the message decrypts prop-

erly. In decryption, there is no inherent flag saying “this is the right key” or

“decryption occurred correctly.” For instance, a ciphertext with one key

might decrypt to cat, while the same ciphertext decrypted with a different

key might decrypt to dog. Another way of looking at this is if you encrypted

a random number or binary data, an attacker using the brute-force tech-

nique would be unable to determine if the resulting plaintext was the orig-

inal number without using some additional context.

Typically, you do not encrypt random binary data, but rather something

with more structure. During brute-force attacks, the decrypted result is

checked to see if the structure exists. For instance, English text is encoded

using ASCII where only 7 bits in each byte are used. During a brute-force

attack, every output that is only 7 bits is stored in a “candidates” file, which

will need to be checked by an actual human to determine what the correct

result is. False-positive results are extremely rare for language texts.

Even with small key sizes, such as DES (for Data Encryption Standard),

there are 255, or approximately 3.6 × 1016, keys to check. When measuring

cryptographic security, you can’t be afraid of large numbers. While 255

sounds out of reach, in fact, it’s quite doable. Brute-force algorithms are

easily parallelizable, and with the Internet, tens of thousands of computers

are easily pooled for such attacks, thereby enabling trillions of keys to be

tested per second.

Other Attacks

Brute-force attacks can easily be thwarted by using a larger key, so a full

key-space scan becomes impossible. The following attacks attempt to find

weaknesses in the cipher and provide shortcuts to a brute-force attack. If a

significant attack occurs, the cipher is said to be broken and thus insecure.

Ciphertext-only attack. This is probably the most non-networked

application. The adversary here has a ciphertext message and that’s

it. In the vast majority of cases, this attack indicates brute force,

where every key is checked.

Chosen plaintext. Here the attack has a bulk list of plaintexts

and receives the corresponding ciphertext and through analysis can

determine the key.

Adaptive chosen plaintext. Similar to the previous except the process

is more interactive. Plaintext may depend on the results of the previ-

ous plaintext and ciphertexts.

Chosen ciphertext. The attack selects ciphertext, which is then

decrypted.

Related key. The attack requires the plaintext and the ciphertext from

different keys.

Partial known plaintext attack. This attack is one of my own devis-

ing. I haven’t seen anything in literature of this. Here the attacker

has the ciphertext and some of the plaintext. This potential attack

turns out to have applications for database security.

Power attack.

This only applies to smartcard-like devices. An attacker

can actually measure the radiation and time of the cryptographic

computation and estimate the key or size of the key.

Many of these attacks come in the form of linear or differential (or some

combination) analysis. Knowing the algorithm being used and with many

pairs of text (plaintext and/or ciphertext), an attacker can determine the

internal structure and thus extract the key or limit the key space.

Many might be wondering what the point is of the attacks if the attacker

already has plaintext-ciphertext pairs. The trick is that the text pairs are

chosen by the attackers and by themselves are not interesting (the values

are more or less random or generated by an algorithm). The plan is to use

these pairings to recover the key and thus decrypt ciphertexts that do con-

tain useful information. Still, the attacks may seem odd, yet many network

protocols allow many of these types of attacks since they echo back. For

common ciphers most of these attacks require a lot of (at least 240) pairings,

which would certainly be noticed by even the most primitive network

monitoring. Another way these attacks can be exploited is by an attacker

having access to either the encryption or decryption functions of a system.

A hacker breaking into a system might not have access to the keys but will

have access to system calls to do encryption. Then the attack might only

generate a modest strain on the system’s resources and not be noticeable.

Common Block Ciphers

There are dozens of block ciphers; however, you really only need to consider

a small handful (see Table 2.1). Block ciphers come in two flavors: 64-bit

block and 128-bit block.

Data Encryption Standard (DES)

The Data Encryption Standard, first developed in the 1970s, is the most

widely used cipher. It was the first industrial-strength algorithm that was

fully specified to the public, in Federal Information Processing Standard,

or FIPS, 46-2. (FIPS are the standards used by the U.S. federal government

for information processing.)

DES keys are specified as a 64-bit value; however, only 56 bits are used

as part of the key. For each byte, the 8th bit is a parity bit that causes bytes

to have an odd number of one-bits in them (other wise known as “odd-

parity”). This is left over from the early days of computing.

Table 2.1 Selected Secret-Key Ciphers

BLOCK SIZE

KEY SIZE

ALGORITHM

YEAR

BITS

BITS

REASON

DES

1977

64

56

Standard

Triple DES

1985

64

168

Standard

Blowfish

1993

64

40–442

Popular, fast

RC5

1996

64 standard,

0–2048

Popular,

but supports

supported

32 and 128

IDEA

1992

64

128

Popular

Rijndael

1998

128, 192, 256

128, 192, 256

Standard, fast

Twofish

1998

128, 192, 256

128, 192, 256

Popular

RC6

1998

128, 192, 256

128, 192, 256

Popular,

supported

It’s worth looking at the inner workings of DES, since the ideas are used

in many ciphers and the concept is actually fairly simple. First, the key is

expanded. Although the key itself is only 56 bits, internally the key is

expanded into sixteen 48-bit values. Likewise, various stages in the pipeline

also expand data from 32 bits to 48 bits. Later, these expanded values get

contracted as well. While, theoretically, expanding a key into larger values

doesn’t make a key longer than it really is, it does have the benefit of allow-

ing more complicated functions to be constructed with simpler code.

The basic structure of DES is reused in many other ciphers and is what is

known as Feistel structure. The plaintext block is split into left and right sec-

tions, and then the following is iterated:

Ln+1 = Rn

Rn+1 = Ln ⊕ f(Rn, Kn)

where Kn is a subkey of the main key and f is a mixing function. This mix-

ing function is really what makes DES successful. It uses what are called

S-Boxes, which are maps from a 6-bit input to a 4-bit output (the contraction)

based on a lookup table. In DES (and other ciphers) the map is nonlinear and

normally specified by using tables. The security of DES depends entirely

on the values in the S-Boxes and exactly how they were constructed is a

mystery to this day.

DES Security

As of this writing, DES has had no attacks that are significantly better than

brute force. So in structure DES is solid and secure. However, the problem

of short key size and vulnerability to brute-force attacks remains. Nowa-

days DES can be cracked either by specialized hardware or by distributed

networks in a few hours. (See www.eff.org/descracker/ for full

details.) What’s interesting is that various modifications have been pro-

posed to enhance DES. It has been shown that just about any tampering or

enhancements to the S-Box structure results in a much weaker cipher. These

days there really is no reason to use DES except in legacy applications.

Triple DES

While DES appears to have a solid design, the key space is just too short

and is susceptible to brute-force attacks. To help fix the problem, an interim

solution was proposed that effectively performs multiple iterations of DES

on the message using different keys. What’s known as Triple DES, TDES,

DES-EDE, or 3DES can provide 168 bits of security using three DES keys. A

less common version is a two-key version providing 112 bits of security

where three operations are done, but where one key is used twice. It does

this by composition of three DES operations, as shown in Table 2.2.

Table 2.2 Triple DES Operations

OPERATION

3 KEYS

2 KEYS

Encryption

Ek3(Dk2(Ek1(m)))

Ek1(Dk2(Ek1(m)))

Decryption

Dk3(Ek2(Dk1(m)))

Dk1(Ek2(Dk1(m)))

In software this is a bit slow, but for basic networking (100 megabit), the

speed limitations aren’t much of a problem, since the algorithm can be

implemented in hardware fairly efficiently. Triple DES is not a FIPS but

instead is specified under the ANSI X.9.52 standard and NIST SP 800-20

(ANSI stands for American National Standard Institute, NIST for National

Institute of Standards and Technology).

Triple DES Security

Providing 168 bits of security, Triple DES is by far the most studied,

strongest algorithm available. When in doubt, DES is probably the best

choice. Despite more modern ciphers, nobody suggests that Triple DES is

less secure than others.

Why Not “Double DES”

Using two keys, or “Double DES,” isn’t done because the key size of 112 is

too small; moreover, it isn’t much more secure than a single iteration of

DES. Given a plaintext-ciphertext pair, an attacker could extract the key

using a brute-force attack requiring up to 2112 computations. However, the

attacker can trade computation costs for memory storage by using a “meet-

in-the-middle” attack. There are many variations, but the simplest is to

compute 256 different encryptions of the plaintext and store them, then

begin decrypting the ciphertext using keys and compare the result to the

table. The attacker will likely get multiple matches and would need

another plaintext-ciphertext pair to verify that is the correct key. The worst

case is that the attacker would need to perform 256 decryptions, but even

then that’s only 257 total operations.

There is nothing specific to DES in this; any doubled-cipher will have

this problem. The attack extends to the two-key version of Triple DES as

well, but it’s generally less feasible, since the storage requirements become

quite large.

Blowfish

In 1994, 56-bit DES was becoming obsolete and Triple DES was still very

slow. Every other secure encryption algorithm was classified, incompletely

specified (GOST, a Soviet algorithm was known, but the critical S-Boxes

were classified), or known but unusable, since they were patented (Khufu,

REDOC II, IDEA) or crippled for legal export (RC2, RC4).

In late 1993, Bruce Schneier presented the Blowfish cipher, a free, non-

patented, drop-in replacement for DES with a 64-bit block size but a vari-

able key length of up to 448 bits [Schneier1994]. In today’s world awash

with cryptologists and a healthy open-source movement, it’s easy to forget

how revolutionary this was—strong cryptography for anyone who wanted

it. Even better, the Blowfish cipher is still secure.

The block size is the same as DES with 64 bits. However, Blowfish can

have a variable key size in multiples of 8 bits, from 40 to 442 bits (56 bytes).

During initialization, a key regardless of length is expanded to an internal

structure of 4,168 bytes. The setup costs are expensive, since it does 521 iter-

ations of encryption to generate the key schedule. However, after this, the

key size has no effect on performance—in other words, the speed using a

442-bit key is the same as with a 56-bit key. This allows key choice to be based

purely on security, storage, and protocol requirements instead of on perfor-

mance. The initialization penalty can somewhat be avoided by directly

specifying the internal array instead of having it generated (of course, then

a key of 4168 bytes instead of at most 56 bytes must be saved and stored).

In overall structure, it’s not that different from DES. The subkey array is

eighteen 32-bit subkeys (compare to DES with sixteen 48-bit subkeys).

There are four S-Boxes, each with 256 entries (compare to DES, which has

eight S-Boxes with 64 entries). The most important difference is that Blow-

fish dynamically creates S-Box values during initialization, while DES uses

fixed values. Notice that in Blowfish, all data is a nice round number of

32 and 8 bits, all designed to fit into machine registers with minimal

amount of bit extractions and shifts. This gives Blowfish its speed.

Cryptographically, Blowfish has been deemed strong and has been quite

well received. The algorithm has been included in dozens of free and

limited commercial products. There have been no significant faults found

with the algorithm. In a reduced eight-round version (not the standard

16 rounds), roughly 1 in 214 keys are weak [Vau1996]. The weakness allows

an attacker to know if a weak key is being used or not, but not what the

specific key is (and it also allows some other more abstract attacks). The

weak keys come about because Blowfish dynamically creates S-Boxes.

Apparently, in some cases Blowfish generates some stinkers where some of

the S-Box values are duplicated. There is no way of knowing which key is

weak in advance; you have to go through the iteration and then check the

S-Box values to see. Again, no weak keys have been found for the full-

strength 16-round version.

IDEA

The IDEA cipher also operates on 64-bit blocks but uses a fixed-size 128-bit

key length [LaMa1990, LaMaMu1991]. It is generally regarded as a very

well-designed cipher and has experienced no known cryptographic attacks

better than brute force. It is covered by patents owned by the Swiss firm

Ascom, but they have allowed free noncommercial use, provided permis-

sions are first obtained. Because of this licensing, IDEA became popular in

the email encryption program PGP (Pretty Good Privacy). Today it seems

to still be quite popular in the Perl community, partially since it is a very

secure cipher and has received a positive recommendation from Bruce

Schneier in Applied Cryptography (at the time Schneier had just developed

Blowfish and it was still too new to be recommended). However, IDEA,

like Blowfish, generates some nonlinear aspects of the cipher based on the

key; because of this there are classes of weak keys [DaGoVn1993]. Fortu-

nately, the odds of using one of the weak keys is 2-77, so it’s not considered

a grave threat.

While IDEA is fairly easy to implement in software, it’s also very slow

because of generous use of expensive multiplication operations. A naïve

implementation is typically about the same speed as DES, if not slower.

Those same multiplications allow it to speed up significantly on 64-bit

machines, or if someone hand-codes an assembly version that uses some

microprocessors’ extensions such as MMX (Intel and AMD processors),

SSE2 (Intel’s Pentium 4), or AltiVec (IBM/Motorola’s PowerPC). In these

cases, it’s about twice as fast as DES. See [Lipmaa1998] for a discussion of

using the MMX extension with IDEA.

RC5

RC5 was presented by Ron Rivest (coinventor of the RSA, for Rivest-

Shamir-Adelman, algorithm) in 1995 [Rivest1995]. It is also fully specified

in RFC 2105; however, it is patented (U.S. Patent 5,724,428) and is propri-

etary to RSA Security. It is fully parameterized in terms of block size, key

size, and security represented as RC5-w/r/b, where:

w = Word size in bits—16, 32, and 64—which results in a block size

of 2w. The standard is 32.

r = Number of rounds, which, in turn, provides security. The appro-

priate number depends on w.

b = Number of bytes in secret key; can be 0 to 255.

Internally, the key expansion stage takes a key and turns into an internal

array S of r × w/8 bytes. Following that, encryption is amazingly short and

simple:

A = A + S[0]; // A is first half of plaintext block

B = B + S[1]; // B is second half of plaintext block

for (i = 1 .. r)

A = ((A 5 B) << B) + S[2i]; // <-- data-dependant rotations

B = ((B 5 A) << A) + S[2i+1] ;

The matching decryption algorithm is equally terse. As you can see, like

Blowfish, RC5 uses data-dependant rotations.

RC5 is typically used with 64-bit blocks, 16 rounds, and keys being at

least 128 bits (e.g., RC5-32/16/128+). While you can use larger blocks, you

might consider using the newer ciphers like Rijndael or RC6 (discussed in

the following sections). Using these guidelines, the most effective attacks

are brute force, and the security is based on the size of the key. There is no

reason not to use at least 128 bits, since after key setup, the cipher runs at

the same speed independent of key size.

Rijndael and the Advanced Encryption Standard (AES)

In 2001, the NIST selected Rijndael as the replacement for DES (FIPS 197).

Flemish for XYZ and pronounced “rain-doll,” Rijndael is an interesting

cipher, since it works in a completely different way from the previous ciphers.

The algorithm is in some ways similar to shuffling and cutting a deck of

cards. The interstate is laid out in a square, and the rows and columns are

shifted, mixed, and added in various ways. The entries themselves are also

substituted and altered. It has a lot of parallel and symmetric structure

because of the mathematics, which provides a lot of flexibility in how it is

implemented. However, some have criticized it as having too much structure,

which may lead to future attacks. Apparently that didn’t bother the NSA

(National Security Agency) or the NIST. No known cryptographic attacks

are known, and it works well on a wide variety of processors, doesn’t use

bit shifting or rotation, and is very fast.

The AES Competition

How Rijndael was selected is also interesting. Knowing that DES had out-

lived its useful lifetime, the NIST issued an open call for replacements.

Anyone was welcome to submit a cipher. The general requirements were:

Must be a symmetric block cipher.

Must support a block size of 128 bits.

Must support key sizes of 128, 192, and 256 bits.

Should be at least as fast Triple DES.

Should run on a variety of hardware and should have “good”

characteristics for implementation in hardware.

If selected as the winner, the submitter relinquishes all intellectual

property claims such as patents and trademarks.

Out of 15 semifinalists, 5 were selected:

Rijndael, the eventual winner.

MARS, from IBM.

RC6, from RSA Labs (similar to RC5).

Serpent, from Ross Anderson, Eli Biham, and Lars Kundsen.

Twofish, from Bruce Schneier, John Kelsey, Doug Whiting, David

Wagner, Chris Hall, and Niels Ferguson. It is the successor to the

previously mentioned Blowfish algorithm.

How did they rate? Table 2.3 presents summaries from the confer-

ence reports from the Third Advanced Encryption Standard Candidate

Conference.

The rankings are extremely relative, and a last-place finish does not

mean the algorithm was unacceptable. Each algorithm author had his or

her own interpretation of what the NIST selection process called for. Ser-

pent, for instance, was extremely conservative in security, and this shows

in its performance, while RC6 focused on 32- and 64-bit processors.

That the four other ciphers were not selected does not imply they are

insecure. To the contrary, the NSA said any of the final candidates would

make a good standard from a security standpoint (the speed with which

they made their decision would seem to imply that the NSA has much more

sophisticated methods of cipher analysis than the public has). In addition,

the NIST has not specifically excluded them from being used. Since the

NIST wanted a “generalist” cipher, these alternates may find their way into

other more specialized standards and applications. However, Twofish and

RC6 are particularly noteworthy because they have strong corporate or

independent support and are likely to be used in future applications.

Table 2.3 Evaluation of AES Finalists on Various Platforms

PROCESSOR

RELATIVE RANKING

ARCHITECTURE

BETTER......................................................WORSE

FGPA

Rijndael

Serpent

RC6

MARS

(programmable

Twofish

logic)

DSP TMS320C62x

RC6

Rijndael

Twofish

Toshiba T6N55

Rijndael

RC6

(smart card)

Twofish

Pentium II

Rijndael

RC6

MARS

Serpent

Twofish

Itanium /

Rijndael

RC6

MARS

Serpent

PA-RISC (HP)

Twofish

Alpha 21264

Rijndael

Twofish

RC6

MARS

Serpent

ASIC hardware,

Rijndael

Serpent

Twofish

MARS/RC6

worst case

“Parallelism”

Rijndael

Twofish

Serpent

MARS

RC6

(feedback modes)

Memory usage

RC6

Serpent

MARS

Twofish

Rijndael

Software

Rijndael

MARS

Serpent

(summary, Whiting)

Twofish

RC6

Software (summary,

RC6

Rijndael

MARS

Twofish

Serpent

Bassham)

Java (speed)

Rijndael

MARS

RC6

Java (throughput)

RC6

MARS

Rijndael

Twofish

Java (key setup)

Rijndael

Serpent

RC6

MARS

Twofish

Twofish

The successor to Blowfish, Twofish has already been widely deployed in

many open-source and minor commercial projects. Twofish is meticulously

constructed and well documented, and the submission paper is an excel-

lent reference and bibliography on cryptanalysis. The developers even

published a book on its design [ScKeWhWaFe1999]. In some ways, Twofish

is a very traditional design that isn’t dissimilar to DES: it’s based on a Feistel

design using expanded keys and S-Boxes for nonlinearity. The big differ-

ence is how each part was designed and how each part interacts with the

others. Many components were found by an exhaustive search for a map-

ping that provided an optimal result using the smallest number of machine

operations. While each piece of Twofish separately is fairly straightfor-

ward, the final result is hard to understand and analyze. One description of

it is “a collection of patches which fix various problems” [Baudron1999].

That said, no one is suggesting it is insecure and currently the best attack is

brute force.

RC6

RC6 builds upon the success of RC5 and is modified to handle 128-bit block

sizes using four 32-bit registers. Like RC5, the algorithm is completely

parameterized and specified as RC6-w/r/b. As submitted to the AES, RC6

uses w = 32 and r = 20. The length of the key may be any multiple of 8; how-

ever, it is most common to use b = 16, 24 or 32 bytes (128, 192, and 256 bits).

The key scheduling function is almost identical to the one used in RC5,

except that RC6 generates more bytes from the key. It is fast—and in many

cases faster than Rijndael—and Java performance is excellent. RSA Secu-

rity has submitted RC6 to various other standard bodies, so you can be

sure you will be seeing more of it. They also have trademarked the name

“RC6” and received a patent on its design (U.S. Patent 6,269,163).

Ciphers You Shouldn’t Use

This section lists ciphers that are insecure and should not be used. Oddly,

they keep turning up in supposedly high-security applications such as

authentication tokens (see [FuSiSmFe2001] for examples). The ciphers are

most likely used because they are simple to implement in a few lines of

code and visually appear to do a good job obscuring the underlying data.

Password XOR Schemes

In this type of scheme, the user takes a password and repeatedly uses it to

XOR against the message. This has been used in various forms, including

file protection. It is also common in protecting machine code where the

program starts to decrypt the machine code and then jumps into the newly

decrypted code. However, this is a weak scheme, and it can be easily

cracked, especially if there is any knowledge about what the plaintext

looks like (such as common headers or ASCII). It can be used to obscure

data to make it visually unreadable, but not for anything that is susceptible

to an attack:

// same function to encode and decode

public static void xorData(byte[] data, byte[] key) {

for (int i = 0, j = 0; i < data.length; i++) {

data[i] ^= key[j++];

if (j == key.length) j = 0;

}

}

Classical Cryptography

Classical cryptography involved simple ciphers that typically worked on

letters or characters as opposed to bytes. More advanced classical cipher

operated on blocks, but typically blocks of letters instead of blocks of bytes.

These are now primarily of historical interest and useful as examples for

more advanced concepts.

Interestingly, some of the work of classical cryptography is still classified.

From 1938 to 1941, William Friedman wrote a four-volume series titled

Military Cryptanalysis. In 1984 the first two volumes were declassified. A

more modern series from Lambros Callimahos included three volumes

written from 1959 to 1977. Again only the first two are declassified. In 1992,

an attempt to declassify the remaining volumes using the Freedom of Infor-

mation Act was rejected. According to the NSA (Gilmore v. NSA: 10-02-93 let-

ter, available at: www.eff.org/Privacy/Key_escrow/ITAR_export/

ITAR_FOIA/Gilmore_v_NSA/10-02-93.letter

The documents are classified because their disclosure could reasonably be

expected to cause serious damage to the national security . . . . In addition,

this Agency is authorized by various statutes to protect certain information

concerning its activities. We have determined that such information exists in

these documents.

In other words, 45 years later, World War II cryptography is still essential

to national security. One can only speculate as to why this would be. To my

knowledge no further attempts have been made to release the material

since 1992, but maybe someone will try again.

Regardless of what is hidden in those texts, classical cryptography has

no place in your security solution.