Nick Galbreath

Cryptography for Internet and Database Applications: Developing Secret and Public Key Techniques with Java™

Publisher: Bob Ipsen
Editor: Carol A. Long
Developmental Editor: Adaobi Obi
Managing Editor: Micheline Frederick
New Media Editor: Brian Snapp
Text Design & Composition: Wiley Composition Services

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where Wiley Publishing, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

This book is printed on acid-free paper. ∞

Copyright © 2002 by Nicholas Galbreath. All rights reserved.

Published by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspointe Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail: permcoordinator@wiley.com.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Library of Congress Cataloging-in-Publication Data:
ISBN: 0-471-21029-3

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Contents

Preface xiii
Introduction xv

Chapter 1 Bits and Bytes 1
  General Operations 1
  Number Bases 2
  Bits and Bytes 3
  Signed Bytes 4
  Bitwise Operators 4
  Complementation or Bitwise NOT 6
  Bitwise AND 6
  Bitwise OR 7
  Bitwise Exclusive OR (XOR) 8
  Left-Shift 8
  Right-Shift 9
  Special Operations and Abbreviations 10
  Packed Words 11
  Integers and Endian Notation 12
  Java Numerics 13
  Basic Types 14
  Type Conversion 15
  Unsigned to Signed Conversions 17
  Overflow 17
  Arrays 18
  Numeric Classes 20
  Booleans and BitFields 21
  Chars 22
  Working with Bytes 22
  Sign Problems 22
  Conversion of Integral Types to Byte Arrays 24
  Converting to Hex Strings 25
  BigInteger 28
  Creating and Converting 29
  BigInteger and Cryptography 31
  Secret Methods in BigInteger 31

Chapter 2 Secret Key Cryptography 35
  Symmetric Block Ciphers 35
  Cipher Properties 36
  Security Properties 36
  Brute-Force Attacks 37
  Other Attacks 37
  Common Block Ciphers 39
  Data Encryption Standard (DES) 39
  Triple DES 40
  Blowfish 41
  IDEA 43
  RC5 43
  Rijndael and the Advanced Encryption Standard (AES) 44
  Twofish 46
  RC6 47
  Ciphers You Shouldn't Use 47
  Password XOR Schemes 47
  Classical Cryptography 48
  ROT 13 49
  Padding 49
  Fixed-Value Padding 49
  Random Padding 50
  PKCS Padding 50
  Modes of Operations 51
  Initialization Vectors 51
  Electronic Codebook Mode (ECB) 53
  Cipher Block Chaining Mode (CBC) 54
  Cipher Feedback Mode (CFB) 56
  Output Feedback Mode (OFB) 58
  Counter Mode (CTR) 60
  Propagating CBC (PCBC) Mode 60
  Key Wrapping 61
  Triple DES Key Wrapping 61
  AES Key Wrapping 62
  Turning Passwords into Keys 63
  Hashes 64
  Cryptographic Hashes 65
  Collisions 66
  Attacks 67
  Algorithms 69
  The MD Family 71
  The SHA Family 71
  The RIPE-MD Family 72
  Hash Standards and Practices 73
  Hashed Message Authentication Codes (HMACs) 74
  The Standard HMAC 74
  Legacy Concatenation Schemes 75
  HMAC Standards and Practices 76
  Summary 76

Chapter 3 Public Key Cryptography 77
  Public Key Ciphers 77
  Other Systems 79
  Digital Signatures 79
  Key Agreements 79
  Zero-Knowledge Schemes 79
  Secret Sharing 80
  Public Key Security Classification 80
  Fundamental Mathematics 82
  Prime Numbers 82
  The Distribution of Prime Numbers 83
  Prime Testing 84
  Probabilistic Tests 85
  Sequence-Based Tests 87
  Elementary Number Theory 88
  Modular Arithmetic 88
  Additive Groups 89
  Multiplicative Groups 89
  Fields 90
  Rings 90
  Orders and Generators 90
  Public Key Encryption and Major PKCS Categories 91
  RSA and Integer Factorization 92
  Factoring 92
  The RSA Problem 94
  The Algorithm 94
  Message Representation and OAEP 96
  In Practice and Standards 98
  Choice of Parameters 98
  Discrete Logarithm Systems 100
  Underlying Mathematics 100
  The Algorithm 103
  Standards and Practice 106
  Elliptic Curves 106
  Underlying Mathematics: Elliptic Curves 106
  The Algorithm 110
  Standards and Practice 112
  Other Public Key Cryptographic Systems 112
  Rabin Cryptosystem 112
  NTRU 114
  Summary 115

Chapter 4 Random Numbers 117
  Randomness and Security 119
  Testing for Randomness 120
  FIPS 140-2 Requirements 121
  Pseudorandom Number Generators 122
  Cryptographic PRNG 123
  SHA-1 PRNG 123
  Cipher-CBC or ANSI X9.17 123
  FIPS 186 123
  Blum-Blum-Shub 124
  Stream Ciphers 125
  One-Time Pads 125
  RC4 or ArcFour 125
  Using Randomness 126
  Generating Random Numbers for Gaming 126
  Generating Random Numbers in a Range 127
  Shuffling 129
  Generating Random Permutations 131
  Small Permutations 131
  Large Fixed Permutations 132
  Random Sampling 133
  Accessing Entropy 134
  OS Services 134
  Win32 CryptoAPI CryptGenRandom 135
  /dev/random and friends 135
  Userland Services 136
  Entropy Generating Daemon (EGD) 137
  PRNGD 141
  Yarrow and EGADS 141
  TrueRand Library 141
  Remote Services 142
  RAND Corporation 143
  HotBits 143
  Random.org 143
  LavaRnd 144
  Java and Random Numbers 144
  Random and SecureRandom 144
  java.util.Random 145
  java.security.SecureRandom 146
  Developer Issues 148
  Reseeding 148
  Collecting Entropy 150
  An Entropy Pool Implementation 150
  Basic System State 151
  Thread Schemes 153
  Reading External Sources 156
  Application Events 156
  User Events 157

Chapter 5 Java Cryptography 159
  Organization 160
  Providers and Engine Classes 161
  Parameters, Keys, and Certificates 162
  Error Handling 163
  Providers 164
  Standard Names 165
  Standard Sun and SunJCE Providers 168
  Other Providers 169
  Initializing Providers 170
  Writing Your Own Provider 171
  Core Engine Classes 171
  MessageDigest 171
  Digest Streams 172
  MAC 173
  SecureRandom 174
  Ciphers 177
  Additional Cipher-Related Objects 180
  Signatures 183
  SignedObject 184
  Key Agreement Protocols 184
  Parameters, Keys, and Certificates 185
  Algorithm Parameters 186
  AlgorithmParameters 186
  AlgorithmParameterGenerators 188
  Keys 189
  Secret Keys 192
  Public/Private Keys 195
  Encoding and Encrypting Keys 197
  Summary 202

Chapter 6 Small Message Encoding and Encryption 203
  Preprocessing 203
  Converting Digits into Bytes 203
  7-bit to 8-bit Compression 205
  General Compression and java.util.zip.Deflate 206
  Adding Check and Parity Bits 207
  Small Message Encryption 209
  Single-Block Encryption 210
  n-Block Encryption 210
  Very Small Message Encryption 211
  XOR Tables 211
  Small Codebooks 211
  RC5-16/16 212
  Small Message Encoding 212
  Encoding for Customer-Usable Data 213
  Capacity and Range 213
  Selecting a Base Representation 214
  Selecting Base Alphabets 215
  Mixed Bases and Alphabets 221
  Adding Check Digits 221
  Encoding for Machines and Customer-Visible Applications 230
  Base 64 230
  Base 85 234
  Base 128 and Java Source Encoding 237

Chapter 7 Application and Data Architecture 241
  Database Architecture for Encrypted Data 241
  Selecting a Cipher 243
  Secret or Public? 243
  Cipher Selection 244
  Data 245
  Passwords 245
  Challenges and Responses 246
  Payment, Credit Card, and Other Account Numbers 247
  Social Security Number (U.S.) 250
  Birthdates and Birthdays 250
  Last Name 251
  Searching, Indexing, and Constraints 251
  Removing Randomness 252
  Deterministic Key Selection 252
  Indexing and Hashing 253
  Uniqueness Constraints 254
  Asymmetric Data Usages 255
  Null Values and Database Applications 256
  Secure Memory Management in Java 258
  Smart Array Classes 259
  Char Arrays 262
  Using SecureRandom 263
  Secret Key Management 263
  Secret Key Data 264
  Key Generation 266
  Key Encryption 266
  Storage 267
  Key Access and Distribution 268
  Using Keys with Ciphers and MACs 269
  Passwords 272
  Startup Passwords 272
  Member Names and Passwords 274
  Selecting Passwords 274
  Member Login, Success and Failure 275
  Changing Passwords and Challenges 276
  Web-Based Password Entry 276
  Generating New Passwords 277
  Member Names 278
  Logging 278
  Embedded-Encryption Logging 279
  Fully Encrypted Log Files 279
  Public Key Logging 281
  Split Log Files 281
  Network-Based Logs 282
  Cryptographic Tokens and Applications 282
  Token Design 282
  Expirations and Time Quantization 283
  Creating the Security Bits 285
  URL Tokens 285
  Tamper-Evident URLs 285
  Protecting Static Content 286
  A Simple URL MAC Implementation 287
  Fast Query String Parsing 290
  URL Encryption 296
  Very Short URLs 296
  Cookie Tokens 297
  Detecting Cookie Capability 297
  Cookies and Authentication 297
  Tokens for Access Control 298
  Buy-on-Demand Systems 299
  Multiple Key Systems 299
  Trials and Expirations 300
  Decimal and Monetary Computations 300
  Doubles and Floats 300
  BigDecimal 301
  Rounding 302
  BigDecimal and Mathematics 303
  BigDecimal Alternatives and Wrappers 304
  Usage and Storage 304

Appendix A Java Cryptography Class Reference 305
References 367
Index 381

Preface

I wrote this book for software engineers with little or no exposure to cryptography. Most other books fall into one of two categories: the encyclopedic description or the purely API-descriptive. The goal was to try to bridge the two by providing a solid introduction to cryptography along with solid examples and uses.
In addition, many books focus overwhelmingly on public key techniques. In my experience, the most common uses for public key ciphers are handled by third-party applications (VPNs, email) or off-the-shelf protocols such as SSL and SSH. While there are a number of excellent cryptographic libraries in C and C++, Java serves as the reference since:

  It's very popular for new business and server applications.
  Memory management is done automatically, eliminating entire classes of bugs (stack smashing and buffer overflows).
  It provides a standard cryptographic API that, while not perfect or complete, is about as "universal" as we can get.

The cryptographic API for Java is scattered among several packages. Instead of listing classes by package as is commonly done, Appendix A lists the classes alphabetically. I found this to be much more useful than flipping between different package sections.

I've tried to limit source code to examples that are relatively simple, that demonstrate main points, or that are self-contained utilities. More complicated (and perhaps more useful) examples were excluded simply because the error and exception handling for a security application can be quite long and tedious and didn't really add much value. I have always disliked CD-ROMs glued into books and, likewise, never found chapters and chapters of source code to be very useful. Instead, full source code and more examples are presented at the companion Web site at www.wiley.com/compbooks.galbreath.

Unfortunately, time and space didn't permit examination of many topics. In particular:

  XML key management, encryption, and signature schemes
  Privacy issues, including privacy "seals," the Gramm-Leach-Bliley Act of 1999, and the HIPAA privacy rule
  More detailed database tips and techniques

Perhaps these will be detailed in future editions. Until then, I hope you find this book useful.
Finally, this book would not have been possible without the many fruitful conversations and support from Dave Andre, Jeff Bussgang, Ann Calvit, Roy Dixit, Jim Finucane, Bill French, Venkat Gaddipati, Nabil Hachem, Sam Haradhvala, Adam Hirsch, Steve Morris, Jack Orenstein, Rich O'Neil, Matt Rowe, and many others from Upromise, SightPath, and Open Market. Special thanks to my family and friends who had to put up with my social hiatus while working on this book. Thank you all.

Nick Galbreath
Plum Island, MA
June 2002

Introduction

The goal of the book is to present the systems programmer with a full introduction to the science and the application of cryptography using Java. Unlike many texts, this book does not focus on cryptography for transmission (allowing two parties to communicate securely). For most applications, those requirements are handled by the SSL or SSH protocols, or a third-party application. Instead, this book focuses on cryptography for storage, message integrity, and authentication (the so-called "single-point" techniques), which is more common with custom applications. Besides pure cryptography, many "auxiliary" techniques are covered that make cryptography useful.

The first chapter covers basic logical and numeric operations, both at the theoretical level and with the specifics of the Java numeric model. While it may be a review to many readers, I've found that many system programmers just do not have exposure to these low-level details. Either they never learned, or, more likely, because it just isn't normally used in day-to-day work, it's been forgotten. Those readers who are more familiar with C or C++ should find this chapter especially useful for translating code to or from Java.

The next two chapters introduce the reader to the science, mathematics, and standards of both secret key and public key cryptography. Here you'll learn why the algorithms work and the various tradeoffs between them.
While the mathematics isn't essential in day-to-day programming tasks, the goal is to make the reader familiar with the terminology that is often tossed around, so as to be able to make informed decisions.

Chapter 4 discusses random numbers and random number generators. While not technically cryptography, random numbers are essential to it. Those readers developing gaming systems should find this section especially interesting. Java is used for the example algorithms, but they could easily be converted to any other programming language.

Finally, in Chapter 5 we introduce Java's cryptographic APIs. The Java SDK already comes with some excellent documentation, but I've tried to pull it together, consolidating the JCA and JCE into a coherent format. Likewise, I made Appendix A the Java cryptography reference I wanted when programming. Instead of listing classes by package, I list them all alphabetically. You'll find you'll do a lot less page flipping this way and get a much better understanding of the API with this organization.

Chapter 6 is on small message encoding and, like Chapter 4, isn't technically cryptography, but it's always an issue when using cryptography. Here you'll learn how to convert binary data (perhaps encrypted) into various ASCII formats. This is critical when embedding data in a URL or creating passwords and cryptographic tokens.

Chapter 7 pulls everything together and discusses many topics: application and database design, working with passwords and tokens, key management, and logging. Numerous source code examples are used, and the source itself can be found at the companion Web site www.modp.com.

Many of the examples in the book will require a little reworking for production use. You'll want to modify the error handling to suit your needs. In some cases, if the code is especially long and not particularly illuminating, I decided not to list it and instead just refer to the Web site.
I find page after page of printed source code in a book not to be particularly useful.

Unfortunately, due to time and space constraints, a lot of important topics are not covered. Digital signatures are only mentioned, and they deserve more, but I've found that for basic application work they just aren't that useful. In most cases people won't use them, there is already a trust relationship in place that eliminates their usefulness, or encryption with a hash or MAC is preferable. And finally, when I do need to use them, the process is normally handled by a third-party application. Other topics not given their due are the new XML and SAML standards for creating documents that contain encrypted data, and embedding cryptography within the database (instead of the application), as newer Oracle databases can do. A lot more could be said for key management and database design as well. Perhaps future editions will remedy this.

CHAPTER 1
Bits and Bytes

Before getting into the details of cryptographic operators, we'll review some basics of working with bits and number bases. For those of you who have worked on operating systems, low-level protocols, and embedded systems, the material in this chapter will probably be a review; for the rest of you, whose day-to-day interactions with information technology and basic software development don't involve working directly with bits, this information will be an important foundation to the information contained in the rest of the book.

General Operations

Most cryptographic functions operate directly on the machine representation of numbers. What follows are overviews of how numbers are represented in different bases, how computers internally represent numbers as a collection of bits, and how to directly manipulate bits. This material is presented in a computer- and language-generic way; the next section focuses specifically on the Java model.

Number Bases

In day-to-day operations, we represent numbers using base 10.
Each digit is 0 through 9; thus, given a string of decimal digits dn dn-1 ... d2 d1 d0, the numeric value would be:

10^n dn + 10^(n-1) dn-1 + ... + 10^2 d2 + 10 d1 + d0

This can be generalized to any base x, where there are x different digits and a number is represented by:

x^n dn + x^(n-1) dn-1 + ... + x^2 d2 + x d1 + d0

In computer science, the most important base is base 2, or the binary representation of the number. Here each digit, or bit, can be either 0 or 1. The decimal number 30 can be represented in binary as 11110, or 16 + 8 + 4 + 2. Also common is hexadecimal, or base 16, where the digits are 0 to 9 and A, B, C, D, E, and F, representing 10 to 15, respectively. The number 30 can now be represented in hexadecimal as 1E, or 16 + 14. The relationship between digits in binary, decimal, and hexadecimal is listed in Table 1.1.

Table 1.1 Binary, Decimal, and Hexadecimal Representations

BINARY   DECIMAL   HEXADECIMAL
0000     0         0
0001     1         1
0010     2         2
0011     3         3
0100     4         4
0101     5         5
0110     6         6
0111     7         7
1000     8         8
1001     9         9
1010     10        A
1011     11        B
1100     12        C
1101     13        D
1110     14        E
1111     15        F

When you are working with different bases, the base of a number may be ambiguous. For instance, is 99 the decimal or the hexadecimal 99 (= 9 × 16 + 9)? In this case, it's common to prefix a hexadecimal number with 0x or just x (e.g., 99 becomes 0x99 or x99). The same confusion can happen with binary numbers: Is 101 the decimal 101 or the binary 101 (4 + 1 = 5)? When a number in binary may be confused, it's customary to add a subscript 2 at the end of a string of binary digits, for example, 101₂.

Any positive number can be represented in any other base b; however, doing the conversion isn't necessarily easy. We'll discuss general-purpose base conversion in a later section, but it's useful to note that conversion between two bases is especially easy if one of the bases is a power of the other. For instance, the decimal number 1234 has the canonical representation in base 10 as 1 × 1000 + 2 × 100 + 3 × 10 + 4.
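The positional formula can be checked directly in Java. The sketch below is mine, not the book's code; fromDigits simply accumulates the formula Horner-style, and the standard java.lang.Integer methods handle parsing and formatting in any base:

```java
public class BaseDemo {
    // Evaluate digits dn ... d1 d0 in the given base:
    // x^n*dn + ... + x*d1 + d0, accumulated as ((dn*x + dn-1)*x + ...).
    static int fromDigits(int base, int... digits) {
        int value = 0;
        for (int d : digits) {
            value = value * base + d;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(fromDigits(2, 1, 1, 1, 1, 0));  // 30, the binary 11110
        System.out.println(fromDigits(10, 1, 2, 3, 4));    // 1234
        System.out.println(Integer.parseInt("1E", 16));    // 30, the hexadecimal 1E
        System.out.println(Integer.toBinaryString(30));    // 11110
    }
}
```

Integer.parseInt(String, radix) and Integer.toString(int, radix) work for any base from 2 to 36, which covers the general-purpose conversions discussed later.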
However, 1234 can also be thought of as two "digits" in base 100: (12) and (34), with a value of 12 × 100 + 34. It's the same number with the same value; the digits have just been regrouped. This property is especially useful for base 2. Given a binary string, it's possible to convert to hexadecimal by grouping 4 bits and computing the hexadecimal value:

10011100 = 1001 1100 = 9C

Bits and Bytes

A bit is the smallest unit of information that a computer can work on and can take two values, "1" and "0," although sometimes, depending on context, the values "true" and "false" are used. On most modern computers, we do not work directly on bits but on a collection of bits, sometimes called a word, the smallest of which is a byte. Today, a byte by default means 8 bits, but technically it can range from 4 to 10 bits. The odd values are from either old or experimental CPU architectures that aren't really in use anymore. To be precise, many standards use octet to mean 8 bits, but we'll use the more common byte. Modern CPUs operate on much larger word sizes: The term 32-bit microprocessor means the CPU operates primarily on 32-bit words in one clock cycle. It can, of course, operate on 8-bit words, but that doesn't mean those operations happen any faster. Many CPUs also have special instructions that can operate on larger words, such as the SSE and similar instructions for multimedia, as well as vector processing, such as on the PowerPC.

A byte has a natural numeric interpretation as an integer from 0 to 255, using base 2 as described earlier. Bit n represents the value 2^n, and the value of the byte becomes the sum of these bits. The bits are laid out exactly as expected for a numeric representation, with bit 0 on the right and bit 7 on the left, but the layout is backward when compared to Western languages.
(b7 b6 b5 b4 b3 b2 b1 b0) = 2^7 b7 + 2^6 b6 + 2^5 b5 + 2^4 b4 + 2^3 b3 + 2^2 b2 + 2^1 b1 + 2^0 b0

or, using decimal notation,

(b7 b6 b5 b4 b3 b2 b1 b0) = 128 b7 + 64 b6 + 32 b5 + 16 b4 + 8 b3 + 4 b2 + 2 b1 + b0

For example, 00110111 = 32 + 16 + 4 + 2 + 1 = 55. Bits on the left are referred to as the most-significant bits, since they contribute the most to the overall value of the number. Likewise, the rightmost bits are called the least-significant bits. This layout is also known as Big-Endian, which we'll discuss later.

Signed Bytes

Negative numbers can be represented in a few ways. The simplest is to reserve one bit to represent the sign of the number, either positive or negative. Bits 0 through 6 would represent the number, and bit 7 would represent the sign. Although this allows a range from -127 to 127, it has the quirk of two zeros: a "positive" zero and a "negative" zero. Having two zeros is odd, but it can be worked around. The bigger problem is when an overflow occurs—for instance, adding 127 + 2 gives 129 in unsigned arithmetic, or 10000001. However, in this signed scheme, that bit pattern is -1.

The most common representation is known as two's complement. Given x, its negative is represented by flipping all the bits (turning 1s into 0s and vice versa, which by itself computes -1 - x) and then adding 1. For example, note in Table 1.2 that adding 1 to 127 makes the value -128. While this method is a bit odd, there are many benefits. Microprocessors can encode just an addition circuit and a complementation circuit to do both addition and subtraction (in fact, many CPUs carry around the complement with the original value just in case subtraction comes up). The other main benefit appears when casting occurs, or converting a byte into a larger word, as described in the following section.

Bitwise Operators

The usual arithmetic functions such as addition and multiplication interpret words as numbers and perform the appropriate operations.
However, other operations work directly on the bits without regard to their representation as numbers. These are bitwise or logical operations. While the examples shown in the next sections use 8-bit bytes, they naturally extend to any word size.

Table 1.2 Two's Complement Representation

UNSIGNED VALUE   SIGNED VALUE   HEXADECIMAL   BINARY
0                0              00            00000000
1                1              01            00000001
2                2              02            00000010
126              126            7e            01111110
127              127            7f            01111111
128              -128           80            10000000
129              -127           81            10000001
130              -126           82            10000010
253              -3             fd            11111101
254              -2             fe            11111110
255              -1             ff            11111111

As shown in Table 1.3, the operations are represented by different symbols depending on whether the context is programming or typographical. When required, this book will use the programming notations, with the exception of XOR, since the caret symbol (^) is used to denote exponents in some systems.

Table 1.3 Bitwise Operations and Notation

OPERATION                             C-STYLE NOTATION        C-STYLE SELF-ASSIGNMENT   TYPOGRAPHICAL
NOT a                                 ~a                      n/a                       ¬a
a AND b                               a & b                   a &= b                    a ∧ b
a OR b                                a | b                   a |= b                    a ∨ b
a XOR b                               a ^ b                   a ^= b                    a ⊕ b
Shift a left by n bits                a << n                  a <<= n                   a << n
Shift a right by n bits,              a >> n                  a >>= n                   none; all shifts and words
  preserving the sign of a                                                              are assumed to be unsigned
Shift a right by n bits, unsigned     a >>> n                 a >>>= n                  a >> n
Rotate a right by n bits              (a >>> n) | (a << w-n)  n/a                       ROTRn(a)
  (w is the number of bits of a)
Rotate a left by n bits               (a << n) | (a >>> w-n)  n/a                       ROTLn(a)
  (w is the number of bits of a)
Concatenate a and b                   (a << shift) | b        n/a                       a || b
Take the n least-significant bits     a & mask                a &= mask                 LSBn(a)
  of a
Take the n most-significant bits      (a & mask) >>> shift    n/a                       MSBn(a)
  of a

Complementation or Bitwise NOT

The simplest bit operation is complementation, or the bitwise NOT. This simply flips the bits within a word, where 0s become 1s and 1s become 0s—for example, ~11001 = 00110.
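In Java, complementation is the ~ operator. A minimal sketch (class name mine, binary literals require Java 7+) that also ties NOT back to the two's complement rule from the previous section, since flipping all the bits of x computes -1 - x:

```java
public class NotDemo {
    public static void main(String[] args) {
        int x = 0b11001;
        // ~ flips all 32 bits of an int; mask to 5 bits to match ~11001 = 00110
        System.out.println(Integer.toBinaryString(~x & 0b11111)); // 110
        // Flipping the bits computes -1 - x; adding 1 gives the negative:
        System.out.println(~5 == -1 - 5);  // true
        System.out.println(~5 + 1);        // -5
    }
}
```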
Cryptographically, this operation is not used much; it is used primarily for implementing basic arithmetic in hardware.

Bitwise AND

AND is useful in bit testing and bit extraction. It's based on the usual truth table for logical AND, as shown in Table 1.4. You can remember the truth table by observing that it's the same as binary multiplication—anything times zero is zero.

Table 1.4 Bitwise AND

AND   0   1
 0    0   0
 1    0   1

To test if a bit is set, create a mask with 1s in the positions you wish to test and perform the logical AND operation. If the result is nonzero, the bit is set; otherwise, it is not:

    01011101
AND 00001000
    ________
    00001000   // nonzero, so bit 3 is set

It's also useful for bit clearing. A mask is created with 1s in the positions to preserve and 0s in those you want to clear. For instance, to clear the four least-significant bits in a byte, use the mask 11110000:

    11011101
AND 11110000
    ________
    11010000

Bitwise OR

Bitwise OR extends the logical OR to a series of bits. In English, or means one or the other but not both. The bitwise version means one or the other or both (the same as the logical OR). See Table 1.5.

Table 1.5 Bitwise OR

OR   0   1
 0   0   1
 1   1   1

Logical OR is useful in joining nonoverlapping words. The following example joins two 4-bit words together:

    11010000
 OR 00001111
    ________
    11011111

In this case, the same effect could be achieved with regular addition; however, it's pretty standard to use OR instead, to make it clear that addition is not the point of the operation and that you are doing bit operations.

Bitwise Exclusive OR (XOR)

Exclusive OR is abbreviated as XOR, and its operation is denoted by ⊕. It is less common in normal programming operations but very common in cryptographic applications. Table 1.6 shows the bitwise XOR table. This table is easy to remember, since it is the same as addition but without any carry (this is also known as addition modulo 2). It's also equivalent to the English usage of or—one or the other but not both.

Table 1.6 Bitwise XOR

XOR   0   1
 0    0   1
 1    1   0
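The AND and OR idioms above translate directly to Java. A small sketch (method names are my own, binary literals require Java 7+):

```java
public class MaskDemo {
    // Bit test: AND with a single-bit mask, then compare to zero.
    static boolean isSet(int word, int bit) {
        return (word & (1 << bit)) != 0;
    }

    // Bit clearing: AND with 1s over the bits to preserve.
    static int clearLowNibble(int word) {
        return word & 0b11110000;
    }

    public static void main(String[] args) {
        System.out.println(isSet(0b01011101, 3));   // true
        System.out.println(isSet(0b01011101, 5));   // false
        System.out.println(Integer.toBinaryString(clearLowNibble(0b11011101))); // 11010000
        // OR joins nonoverlapping words:
        System.out.println(Integer.toBinaryString(0b11010000 | 0b00001111));    // 11011111
    }
}
```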
XOR tests whether two bits are different: If the bits are both 0 or both 1, the result is 0; if they are different, the result is 1. XOR is useful in cryptographic applications, since unlike AND and OR it's an invertible operation, so that A ⊕ B ⊕ B = A:

    11010011
XOR 10101010
    ________
    01111001
XOR 10101010
    ________
    11010011

Left-Shift

As its name implies, the left-shift operator shifts the bits within a word to the left by a certain number of positions. Left-shifts are denoted by a << b, where a is the value and b denotes the number of positions to shift left. Zeros are filled in on the least-significant positions:

11111111 << 1 = 11111110
11001100 << 3 = 01100000

Mathematically, each shift left can be thought of as multiplication by 2:

   3 × 2 = 00000011 << 1 = 00000110 = 4 + 2 = 6
   3 × 4 = 00000011 << 2 = 00001100 = 8 + 4 = 12
  -3 × 2 = 11111101 << 1 = 11111010 = -6
-128 × 2 = 10000000 << 1 = 00000000 = 0 (overflow)

When you are working with large word sizes (such as 32-bit integers), left-shifts are used to compute large powers of 2, since 2^n = 1 << n. (A trick to test whether only one bit is set—or rather, whether the value is a power of 2—is checking that x & -x == x for nonzero x.) In many cryptographic operations, we need to extract a certain number of the least-significant bits from a word. Normally you are working with word sizes much bigger than a byte. However, say you needed to extract the six least-significant bits from a byte B. You could perform B & 0x3f (B & 00111111). Computing this for larger word sizes is a bit clumsy, and worse, some algorithms may extract a variable number of bits, so the mask has to be dynamic. Hardwiring a hexadecimal mask is also prone to errors, and someone reading your code would have to think about how many bits you are extracting. How many bits are in 0x7fffff? The answer is 23, but it's not automatically clear, and staring at a computer monitor all day where the text is in a small font makes the task harder.
Instead, a left-shift can be used, since 2^n - 1, or (1 << n) - 1, has a binary representation of n digits of "1." Instead of 0x7fffff, we can use ((1 << 23) - 1), which will be perfectly clear to someone familiar with bit operations. Even better, you can make your intentions clearer by making the mask a constant:

public static final int mask23lsb = (1 << 23) - 1; // 23-bit mask

Right-Shift

Not surprisingly, right-shifts, denoted by >>, are the opposite of left-shifts, and positions are shifted to the right. However, there is another significant difference. In a left-shift, zeros are placed in the least-significant positions. With a right-shift, the value of the most-significant bit is used to fill in the shifted positions. For example, 10000000 >> 2 = 11100000, but 01000000 >> 2 = 00010000. This is designed so that the sign is preserved and the right-shift is equivalent to division by 2 (rounding toward negative infinity), regardless of the sign. For example, we'll "undo" the previous left-shift examples:

   6 / 2 = 00000110 >> 1 = 00000011 = 3
  12 / 4 = 00001100 >> 2 = 00000011 = 3
  -6 / 2 = 11111010 >> 1 = 11111101 = -3
-127 / 2 = 10000001 >> 1 = 11000000 = -64

This works well when you are treating the byte as a number. In cryptographic applications, however, the bits are just treated as bits without any numerical interpretation, and in every case it is assumed the byte or word is unsigned (e.g., unsigned long in C). If you are using a language with signed types (such as Java), you'll want to use the unsigned right-shift, denoted by >>> (three greater-than symbols). This just shifts the bits to the right and fills in zeros for the most-significant bits. Cryptographic papers assume the shift is unsigned and normally use the plain >>. When coding an algorithm using signed types, make sure you convert >> to >>>.

Special Operations and Abbreviations

The previous operations, while useful, are fairly basic and typically directly implemented in hardware.
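The difference between the two right-shifts, and the shift-built mask, can be seen in a few lines of Java (a sketch of mine, not the book's code):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        // Signed shift copies the sign bit in; unsigned shift fills with zeros.
        System.out.println(-6 >> 1);   // -3
        System.out.println(-6 >>> 1);  // 2147483645, since -6 is a 32-bit int
        // A dynamic n-bit mask instead of a hardwired hex constant:
        int mask23lsb = (1 << 23) - 1;
        System.out.println(mask23lsb == 0x7fffff); // true
    }
}
```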
By combining and composing them, you can create new higher-level bit operations. For cryptography, the most useful operations are rotations, concatenation, and extraction of most- and least-significant bits.

Rotations

Bit rotations are fairly self-explanatory: Bits are shifted either left or right, and the bits that "fall off the edge" roll over onto the other side. These are commonly used in cryptography, since this method provides an invertible operation that can aid in scrambling bits. The problem is that they aren't used for much else, so many CPUs do not have rotation instructions, or if they do, they are frequently slow. Even if a CPU has a fast rotation, the only way to access it is by writing assembly language, since rotations do not have a standard operator like the shifts do. In most programming languages, rotations must be accomplished with a combination of shifts and logical operators, which is the slowest method of all. In its most general case, a rotation will take one subtraction, two shifts, and a logical OR operation. Remember to use the unsigned right-shift operator:

(x >>> n) | (x << (32 - n)); // rotate right n positions, x 32-bit int
(x << n) | (x >>> (32 - n)); // rotate left n positions, x 32-bit int

There is no special typographical symbol for rotations; we normally use the abbreviations ROTR or ROTL, although certain technical papers may define their own symbols in their work.

Bit Concatenation

Bit concatenation is the process of simply joining together two sets of bits into one word. If b is n bits long, then a || b is a << n | b:

a = 101
b = 0011
a || b = 101 << 4 | 0011 = 1010000 | 0011 = 1010011

MSB and LSB Operations

Two other common operations are extracting a certain number of most-significant bits (MSB) or least-significant bits (LSB). We'll use the notation MSBn(a) to denote extracting the n most-significant bits, and likewise with LSBn(a).
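The rotation idiom above can be cross-checked against the JDK's own Integer.rotateLeft and Integer.rotateRight (available since Java 5), which JIT compilers typically map to a single CPU rotate instruction. A minimal sketch, with an illustrative class name:

```java
public class Rotate {
    static int rotr(int x, int n) { // rotate right n positions, 0 < n < 32
        return (x >>> n) | (x << (32 - n));
    }

    static int rotl(int x, int n) { // rotate left n positions, 0 < n < 32
        return (x << n) | (x >>> (32 - n));
    }

    public static void main(String[] args) {
        int x = 0xC0FFEE01;
        for (int n = 1; n < 32; ++n) {
            assert rotr(x, n) == Integer.rotateRight(x, n);
            assert rotl(x, n) == Integer.rotateLeft(x, n);
        }
        // rotation is invertible: rotating back recovers the input
        assert rotl(rotr(x, 7), 7) == x;
        System.out.println("ok");
    }
}
```

Note the shift-and-OR idiom only works for 0 < n < 32; a shift by 32 in Java is a shift by 0, so n = 0 and n = 32 must be special-cased if they can occur.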
These operations are accomplished by making the appropriate mask, and in the MSB case, by shifting appropriately:

MSB3(10111111) = (10111111 & 11100000) >>> 5 = 101
LSB2(11111101) = 11111101 & 00000011 = 01

Packed Words

Shifts are useful in creating packed words, where you treat the word as an array of bits rather than as a number, and where the bits represent anything you like. When doing this, you should use an unsigned numeric type. Unfortunately, Java and many scripting languages do not have unsigned types, so you must be extremely careful. As an example, suppose you have two variables that are numbers between 0 and 15. When transmitting or storing them, you might want to use the most space-efficient representation. To do this, you represent them not as 8-bit numbers from 0 to 255, but as two numbers each with 4 bits. For this example, we'll use pseudocode and use the type int to represent an unsigned 8-bit value:

int a = 5;  // 00000101
int b = 13; // 00001101
int packed = a << 4 | b;
// = 00000101 << 4 | 00001101
// = 01010000 | 00001101
// = 01011101

To undo the operation:

b = packed & 0x0F; // 01011101 & 00001111 = 00001101
a = packed >>> 4;  // note unsigned right-shift
                   // 01011101 >>> 4 = 00000101

Likewise, you might wish to view the byte as holding eight true/false values:

int c = b0 | (b1 << 1) | (b2 << 2) | (b3 << 3) |
        (b4 << 4) | (b5 << 5) | (b6 << 6) | (b7 << 7);

int mask = 0x01;
b0 = c & mask;
b1 = (c >>> 1) & mask;
b2 = (c >>> 2) & mask;

and so on. If you were using an array, you could do this dynamically in a loop as well:

for (int i = 0; i < 8; ++i)
    b[i] = (c >>> i) & mask;

It's quite easy to make mistakes, especially when the packed word has a complicated structure. If you are writing code and the answers aren't what you'd expect:

Check to make sure you are using the unsigned right-shift operator >>> instead of the signed version >>.
Check that the language isn't doing some type of automatic type conversion of signed values (e.g., turning bytes into integers before the shift operation happens).

Check to make sure your masks are correct.

If you are using dynamic shifts, make sure the shift amount isn't larger than the word size. For instance, if you are shifting a byte, make sure you aren't doing >> 9. How this is interpreted depends on the environment.

Integers and Endian Notation

Endian refers to how a string of bytes should be interpreted as an integer, and this notation comes in two flavors: Little-Endian and Big-Endian. The names come from Jonathan Swift's Gulliver's Travels, where the tiny people, the Lilliputians, were divided into two camps based on how they ate eggs. The Little-Endians opened their eggs from the small end, and the Big-Endians opened theirs from the big end. As you might expect with that reference, there are pros and cons to Big-Endian and Little-Endian representations, but overall it doesn't make any difference except when you have to convert between the two formats. A comparison of the two is shown in Table 1.7.

Table 1.7 Comparison of Little-Endian versus Big-Endian Representation

ENDIAN TYPE    B0 B1 B2 B3 = 0xAABBCCDD   SAMPLE MICROPROCESSORS
Little-Endian  dd cc bb aa                Intel x86, Digital (VAX, Alpha)
Big-Endian     aa bb cc dd                Sun, HP, IBM RS6000, SGI, "Java"

Big-Endian, also known as most-significant byte order (MSB) or network order, puts the most-significant or highest byte value first. This is equivalent to how we write decimal numbers: left to right. The downside is that many numerical algorithms have to work backward, starting at the end of the array and working forward (just as you would manually with pencil and paper). The Little-Endian, or least-significant byte (LSB), format is the opposite. This makes it harder for humans to read hex dumps, but numeric algorithms are a little easier to implement.
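Java code can observe both byte orders directly through java.nio.ByteBuffer, which defaults to Big-Endian (network order) and can be switched with the order method. A minimal sketch (class name illustrative; run with java -ea):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        int value = 0xAABBCCDD;

        // Big-Endian (network order) is ByteBuffer's default:
        // the most-significant byte aa comes first
        byte[] big = ByteBuffer.allocate(4).putInt(value).array();
        assert big[0] == (byte) 0xAA && big[3] == (byte) 0xDD;

        // Little-Endian: the least-significant byte dd comes first
        byte[] little = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        assert little[0] == (byte) 0xDD && little[3] == (byte) 0xAA;
        System.out.println("ok");
    }
}
```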
Adding capacity (or widening conversions) is also easier in Little-Endian, since you just add bytes to the end of the array of bytes (i.e., the byte sequence ff becomes ff 00). Fortunately, regardless of what the byte Endian order is, the bits within bytes are always in Big-Endian format. For instance, 1 is always stored in a byte as 00000001, no matter what the platform.

The Endian issue becomes critical when you are working with heterogeneous systems, that is, systems that use different Endian models. When shipping bytes between these machines, you must use a standard Endian format or an ASCII format. In many other programming languages, you must determine in advance what the Endian architecture is and adjust subsequent bit operations appropriately. For cryptographic applications this becomes critical, since you are often manipulating bits directly. With Java, the underlying architecture is hidden and you program using a Big-Endian format. The Java virtual machine does any Endian conversion needed behind the scenes.

For C and C++ programmers, normally a BIG_ENDIAN or LITTLE_ENDIAN macro is defined by the compiler or from an include file. If not, you can use code similar to this for testing. It sets raw memory and then reads it back as an integer type. The value will be different depending on the CPU's Endian type. This C code assumes an int is a standard 4 bytes or 32 bits, but you may wish to generalize:

int isBigEndian()
{
    static unsigned char test[sizeof(unsigned int)] = {0, 0, 0, 1};
    unsigned int i = *(unsigned int *) test;
    if (i == 0x00000001)
        return 1; /* true, big-endian */
    return 0;     /* false, little-endian */
}

Java Numerics

We'll now apply the information in the previous section to the Java numeric model. While the notation is similar to the C or C++ model, there are Java-specific issues with using signed and unsigned types, specifically with byte arrays. Java also provides class wrappers around the native types, as well as unlimited-capacity integer arithmetic.
Basic Types

Java provides the standard basic numeric types: integers with 8, 16, 32, and 64 bits, and floating-point types with 32- and 64-bit representations. Unlike C and other languages, all the types are signed; there are no native unsigned types. Table 1.8 shows Java's primitive numeric types.

For integer types, a literal integer can be expressed in decimal or hexadecimal format (using lower- or uppercase letters for hex digits):

int i1 = 1000;
int i2 = 0x3e8; // == 1000 decimal
int i3 = 0x3E8; // same

For literal long values, a suffix of L or l should always be used, even if it appears unnecessary:

long l1 = 1000;         // compiles but is not recommended
long l1 = 1000L;        // recommended
long l2 = 0xfffffffff;  // won't compile, error
long l2 = 0xfffffffffL; // correct

Table 1.8 Primitive Numeric Types in Java

NAME    TYPE                          LOGICAL SIZE   RANGE
byte    signed integer                8 bits         -128 to 127
short   signed integer                16 bits        -32,768 to 32,767
int     signed integer                32 bits        -2,147,483,648 to 2,147,483,647 (about 2.1 billion)
long    signed integer                64 bits        -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (about ±9.2 × 10^18)
float   ANSI/IEEE 754 floating point  32 bits        ±1.4 × 10^-45 to ±3.4028235 × 10^38 (6-7 significant decimal digits)
double  ANSI/IEEE 754 floating point  64 bits        ±4.9 × 10^-324 to ±1.7976931348623157 × 10^308 (15 significant decimal digits)

Specifying literal values for bytes can be tricky because a byte is signed, from -128 to 127, while very often you'll be using constants specified from 0 to 255. If the value is within -128 to 127, the code will compile. If the value is from 128 to 255, you can either convert it to its negative equivalent or use a cast operation. The same principles apply to the short type:

byte b1 = 10;
byte b2 = 189;         // error, out of bounds
byte b3 = -67;         // = 189, ok
byte b4 = (byte) 189;  // ok
byte b5 = (byte) 0xbd; // ok, same value

Floating-point literals are assumed to be the double type unless suffixed by an f for float.
Floating-point types can also be represented using scientific notation, where valEscale means val × 10^scale:

float f1 = 0.1;    // compiler error; by default "0.1" is a double type
float f2 = 0.1f;   // correct
double d1 = 0.1;
double d2 = 1.0E2; // == 100

In practice, the short type is rarely used, because it is almost always converted into an int type before anything else is done. The float type also isn't used much, because it doesn't provide enough precision for most applications.

Type Conversion

Conversions between types can be either widening or narrowing. A widening conversion is one where the new type can represent larger numbers and can "contain" the old type. For an integer-to-integer type conversion (e.g., short to long), this just means putting the same number in a larger container. Integer-to-floating-point conversions are also considered widening, but some of the least-significant digits may be scrambled or zeroed due to how floating-point numbers are represented. These types of conversion happen automatically and silently, either at compile time or at run time. For example:

int i1 = 123;
long l1 = i1;  // ok -- l1 = 123
float f1 = i1; // ok -- f1 = 123
int i2 = 123456789;
float f2 = i2; // ok, but f2 = 123456792

Narrowing conversions may result in a loss of magnitude and precision. Any conversion from floating point to an integer type is considered narrowing, as is a conversion from a larger integer type to a smaller integer type. Java will not do narrowing conversions automatically, and a compiler error will be issued if the compiler detects such a conversion. To do these conversions, you must explicitly declare them with the cast operator. Converting floating point to integers drops or truncates any digits to the right of the decimal point:

float f1 = 0.123456f;
int i0 = f1; // compiler error -- requires a cast.
int i1 = (int) f1;  // == 0
float f2 = 0.99999f;
int i2 = (int) f2;  // == 0
float f3 = 100.9f;
int i3 = (int) f3;  // == 100
float f4 = -100.9f;
int i4 = (int) f4;  // == -100
int i5 = 0xffL;     // compiler error; long to int conversion

For narrowing conversions from one integer type to another, the least-significant bits (or bytes) of the larger type are kept:

int i1 = 0xfffff01;
byte b1 = (byte) i1; // b1 = 0x01

These rules are summarized in Table 1.9, with N representing a narrowing conversion, W a widening conversion, and W* a widening conversion that may result in a loss of precision.

Table 1.9 Primitive Type Conversion

FROM \ TO  BYTE  SHORT  INT  LONG  FLOAT  DOUBLE
byte       -     W      W    W     W      W
short      N     -      W    W     W      W
int        N     N      -    W     W*     W
long       N     N      N    -     W*     W*
float      N     N      N    N     -      W
double     N     N      N    N     N      -

Unsigned-to-Signed Conversions

There are no unsigned types in Java; however, you can still use the signed types as unsigned containers. Even though a byte is signed, it still has 8 bits and can represent an integer from 0 to 255, even if Java thinks otherwise. To use the unsigned value, it must be converted into a larger type using an AND mask. To convert an unsigned byte:

byte x = (byte) 254;          // x = -2 in Java
short c1 = (short)(x & 0xff); // c1 = 254
int c2 = x & 0xff;            // c2 = 254
long c3 = x & 0xffL;          // c3 = 254; must put L at the end of 0xffL

Unsigned short values are converted by the following:

int c = x & 0xffff;
long c = x & 0xffffL;

Finally, an unsigned int is converted into a signed long by the following:

long c = x & 0xffffffffL;

It's critical to put an L at the end of the mask when working with long types. Without it, Java will behave as if you are doing an int computation, then do a widening conversion to a long:

byte b = (byte) 0xff;
long l1 = (b & 0xff) << 56;              // wrong: computed as an int
long l2 = (long)((int)(b & 0xff) << 56); // same as the previous line
long l3 = (b & 0xffL) << 56;             // correct:
// = 0xff00000000000000

Overflow

When a computation exceeds an integer type's bounds, the value is "rolled over" silently; no exception is thrown:

byte b = (byte) 127; // b = 01111111
b++;                 // b = 10000000, which has the value -128

While the silence may seem disturbing, in practice overflow is rarely a problem. For floating-point computations, overflow is still silent, but the result is not rolled over. Instead, the type takes on a special Infinity value that can be checked using normal comparison operators or by using the Double.isInfinite method. Also note that Java does not treat 0.0 the same as the integer 0. Instead, 0.0 is treated as a number that is very close to zero, so division by 0.0 results in an infinite value. Keep in mind, however, that Java throws an exception if you try to divide by the integer 0. In the event that a computation doesn't make sense, the value is NaN, which stands for "not a number." To see if a value is NaN, you must use the Double.isNaN method; NaN never compares equal to anything, including itself. Various examples of manipulating double types are shown as follows:

double d1 = 1.7E308 * 2.0; // overflow = Infinity
double d2 = 1.0 / 0.0;     // = Infinity
double d3 = -1.0 / 0.0;    // = -Infinity
int i = 1 / 0;             // throws an ArithmeticException
double d4 = 0.0 / 0.0;     // = NaN
boolean b1 = (d4 == d4);            // = false, always
boolean b2 = Double.isNaN(d4);      // = true
boolean b3 = Double.isInfinite(d3); // = true

Arrays

In Java, native-type arrays are treated and created as if they were objects, constructed with the new operator. Arrays can also be set with initial values, as shown in the following code snippet:

int[] a = new int[3]; // all three elements set to zero
int[] a = {1, 2, 3};  // a preset three-element array

Once constructed, an array is not resizable. If dynamically allocated storage is needed, you should use one of the collections in the java.util package. Arrays are indexed with int types, and the first element starts at 0 (as in C).
Smaller types used as an index will have widening conversions done to them, and the long type cannot be used as an index without a cast operation. Thus, single-dimensional arrays cannot have more than 2^31, or roughly 2.1 billion, entries. Hopefully, this will be sufficient for your needs. Since arrays are objects, assignment is done by reference, as shown in the following:

int[] a = {1, 2};
int[] b = a;
b[0] = 100;               // modifies the object that both a and b point to
System.out.println(a[0]); // will print 100, not 1
a = null;                 // a is now null, but b (and the array) is intact

In addition, arrays can be copied using the clone method. To use this, cast the result to the correct array type:

int[] a = {1, 2};
int[] b = (int[]) a.clone(); // b is a "deep copy" of a

The System.arraycopy method is extremely useful for copying parts of an array or concatenating arrays. It makes a native system call to copy memory directly instead of copying each element individually and is much faster than writing a custom for loop:

System.arraycopy(Object src, int srcPos, Object dest, int destPos, int length)

Even though the method has Object in its signature, this method only works on array types. Other useful methods for manipulating native arrays can be found in the class java.util.Arrays and are summarized in Table 1.10.

Table 1.10 Summary of Methods from java.util.Arrays

METHOD                                                   DESCRIPTION
static boolean equals(type[] a, type[] b)                Returns true if and only if the arrays are the same length and every element is equal.
static void fill(type[] a, type val)                     Fills an array with a single value.
static void sort(type[] a)                               Performs a fast numeric sort. The array is modified.
static void sort(type[] a, int fromIndex, int toIndex)   Sorts only part of an array.
static int binarySearch(type[] a, type val)              Performs a binary search of a sorted array. Returns the array index if a match is found and a negative value if not. The array must be sorted first.
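The array behaviors just described (clone producing an independent copy, System.arraycopy for concatenation, and the java.util.Arrays helpers) can be exercised in a short sketch; the class name is illustrative, and asserts require java -ea:

```java
import java.util.Arrays;

public class ArrayDemo {
    public static void main(String[] args) {
        int[] a = {5, 3, 1};
        int[] b = (int[]) a.clone(); // independent copy of a
        b[0] = 100;
        assert a[0] == 5; // a is unaffected by changes to b

        // concatenate a and b using System.arraycopy
        int[] cat = new int[a.length + b.length];
        System.arraycopy(a, 0, cat, 0, a.length);
        System.arraycopy(b, 0, cat, a.length, b.length);
        assert cat.length == 6 && cat[3] == 100;

        // sort in place, then binary search the sorted array
        Arrays.sort(a); // a is now {1, 3, 5}
        assert Arrays.binarySearch(a, 3) == 1;
        assert Arrays.equals(a, new int[]{1, 3, 5});
        System.out.println("ok");
    }
}
```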
Numeric Classes

Each numeric type has a matching class that acts as an object wrapper, as shown in Table 1.11. These classes do not have any support for mathematical operations; you can't add two Integer objects directly. Instead, these classes are primarily used to allow numeric types to be used in methods that expect an Object type, such as in collections from the java.util package (e.g., ArrayList, HashMap). The wrapper classes also provide basic string formatting and parsing of numbers. Since they are objects, they can also be null. Note that object references are passed by value, so reassigning a parameter inside a method does not change the caller's reference:

public void changeInt(Integer i) {
    i = new Integer("1");
}
Integer i = new Integer("0");
changeInt(i); // i is still 0

All of the classes share some common traits (examples are just shown for Integer and Long):

Two public fields, MAX_VALUE and MIN_VALUE, in case you don't want to memorize the previous table.

byteValue, doubleValue, floatValue, intValue, and longValue methods that return the underlying number as a native type.

A static method valueOf that accepts a string and an optional radix, parses the string, and returns an object. The radix can be 2 to 36.

static Integer Integer.valueOf(String s)
static Integer Integer.valueOf(String s, int radix)
static Long Long.valueOf(String s)
static Long Long.valueOf(String s, int radix)

A static method parseType (where Type names the type, such as Int or Long) that also accepts a string and an optional radix, but returns the native type instead of an object.

static int Integer.parseInt(String s)
static int Integer.parseInt(String s, int radix)
static long Long.parseLong(String s)
static long Long.parseLong(String s, int radix)

A toString method that returns the number as an unformatted decimal number.
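A quick sketch of the parsing methods just listed, showing the radix argument in action (class name illustrative; run with java -ea):

```java
public class WrapperDemo {
    public static void main(String[] args) {
        // parse strings with an explicit radix
        assert Integer.parseInt("ff", 16) == 255;
        assert Integer.parseInt("1010", 2) == 10;
        assert Long.parseLong("777", 8) == 511L;

        // valueOf returns an object wrapper instead of a native type
        Integer obj = Integer.valueOf("255", 10);
        assert obj.intValue() == 255;
        System.out.println("ok");
    }
}
```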
Table 1.11 Java Class Wrappers for Native Types

NATIVE TYPE   MATCHING JAVA.LANG CLASS
byte          Byte
short         Short
int           Integer
long          Long
float         Float
double        Double

The Long and Integer classes have a few other useful static methods that provide an unsigned representation of the number in binary, hex, or octal formats, as shown:

static String Integer.toBinaryString(int val)
static String Integer.toHexString(int val)
static String Integer.toOctalString(int val)
static String Long.toBinaryString(long val)
static String Long.toHexString(long val)
static String Long.toOctalString(long val)

The binary representation is especially useful for debugging bit fields. These methods do not add any leading zeros to the output, so Integer.toHexString(15) returns "f", not "0f" or "0x0f".

Booleans and Bit Fields

Java provides a native boolean type that can be either true or false. Unlike C and C++, it is not a numeric type, does not have a numeric value, and is not automatically cast. Like the numeric types, there is also a Boolean class wrapper. If you want to create an array of boolean values, you could use Boolean with one of the collection classes (e.g., ArrayList). However, there is a special class, java.util.BitSet, specially designed for working with sets of bits that provides a huge performance increase and memory savings. More specialized applications will convert a native type or byte array into a bit field directly.

Chars

Java has another native type, char, that represents a Unicode character and is used by String and StringBuffer internally. It is represented by an unsigned 16-bit integer, but it is not a numeric type. It is automatically cast to an integer in situations when needed, such as in mathematical or bitwise operations, but it's not automatically cast to any other type. However, you can explicitly convert a char to another type by using a cast operator.
Since it's unsigned, it might be tempting to use char for mathematical purposes and for raw bit fields. In practice, there is not much point, since char types are automatically converted into an int before any operation is done, eliminating any advantage.

Working with Bytes

Because the Java byte type is signed, and because of the automatic conversion rules, working with bytes and byte arrays can be frustrating. Following are some tips and tricks to simplify byte manipulation.

Sign Problems

It's always easiest to specify a constant byte value by using a cast (later in this chapter you will see tricks for making byte arrays):

byte b = 0xff;        // compiler error: 0xff is an int type
byte b = (byte) 0xff; // right

Bytes range from -128 to 127, not 0 to 255. Setting a byte with a value from 128 to 255 using a cast is fine; the byte contains the correct bits. The problem is when you want to retrieve the unsigned value, or when you perform a computation mixing bytes and ints. For instance:

byte b = (byte) 128;
int i = 1;
System.out.println(i + b); // prints -127, not 129

In the addition step, byte b is automatically converted to an int type, including the sign. While b was set with 128, internally it's really -128, and this is its value when converted. The key point to remember is that all bitwise and arithmetic operators work on int and long types only. All other types are automatically cast. If you are using a byte as an unsigned value, you must manually convert it to an int or long before the operation, using val & 0xFF or val & 0xFFL, respectively.

The problem is more mysterious with the shift operators. Let's say you want to do a right-shift on 0xFF. In binary, this is 11111111, so you'd expect an unsigned right-shift by one position to give 01111111, or 0x7F.
The natural way to write this is:

byte b1 = (byte) 0xff;
byte b2 = b1 >>> 1; // compiler error

However, this results in a strange error from the javac compiler, and the problem isn't clearly explained:

aprogram.java:6: possible loss of precision
found:    int
required: byte
        byte b2 = b1 >>> 1;
                     ^

You might try to fix this by casting the result:

byte b1 = (byte) 0xff;
byte b2 = (byte)(b1 >>> 1); // error #2

This compiles; however, the result is still wrong. The value of b2 is not 0x7f, but strangely remains at 0xff, or -1. Again, Java is converting the byte into an integer before the shift. Since bytes are signed, it converts the byte value of -1 into an int value of -1, which is 0xffffffff. Then it performs the shift, resulting in 0x7fffffff. Finally, the cast takes the least-significant bits, or 0xff, and converts them to a byte. The step-by-step details are listed in Table 1.12. The correct solution is to treat the byte as an unsigned value: convert it to an int, do the shift, and then cast back down to a byte:

byte b1 = (byte) 0xff;
byte b2 = (byte)((b1 & 0xff) >>> 1); // correct: b2 = 0x7f

Table 1.12 Comparison of Bit-Shifting a Byte Type

STEPS               INCORRECT                CORRECT
Initial value       0xff = -1                0xff = -1
Convert to int      0xffffffff = -1          0x000000ff = 255
Right-shift >>>     0x7fffffff = large int   0x0000007f = 127
Cast down to byte   0xff = -1                0x7f = 127

To convert a byte back to its unsigned value, you have to do a little trick: bitwise AND the byte with 0xff, that is, b & 0xff. For example:

byte b = (byte) 128;
int i = 1;
int unsignedByteValue = b & 0xff;
System.out.println(i + unsignedByteValue); // 129
System.out.println(i + (b & 0xff));        // combined

This trick works by observing that the least-significant byte of an int matches exactly with the byte type, as shown in Table 1.13.

Conversion of Integral Types to Byte Arrays

Java does not have any easy way of converting integral types into raw bytes.
There are the serialization methods, but these involve a lot of work for something that should be relatively simple. For example:

public static byte[] intToByteArray(int i) {
    byte[] buf = new byte[4];
    buf[0] = (byte) (i >>> 24);
    buf[1] = (byte) (i >>> 16);
    buf[2] = (byte) (i >>> 8);
    buf[3] = (byte) i;
    return buf;
}

public static int byteArrayToInt(byte[] buf) {
    if (buf.length != 4)
        throw new RuntimeException("Bad Length");
    return ((buf[0] & 0xff) << 24) |
           ((buf[1] & 0xff) << 16) |
           ((buf[2] & 0xff) << 8) |
           (buf[3] & 0xff);
}

Table 1.13 Comparison of Representation between int and byte Types

UNSIGNED VALUE   SIGNED VALUE   BYTE REPRESENTATION   INT REPRESENTATION
0                0              00                    00000000
1                1              01                    00000001
2                2              02                    00000002
126              126            7e                    0000007e
127              127            7f                    0000007f
128              -128           80                    00000080
129              -127           81                    00000081
130              -126           82                    00000082
253              -3             fd                    fffffffd
254              -2             fe                    fffffffe
255              -1             ff                    ffffffff
256              256            00                    00000100

We could do the same thing for longs. Instead, we'll demonstrate a different technique that builds the result by incrementally shifting the value in question:

public static byte[] longToByteArray(long l) {
    byte[] buf = new byte[8];
    for (int j = 7; j >= 0; --j) {
        buf[j] = (byte) (l & 0xffL); // !! must use 0xffL, not 0xff !!
        l >>>= 8;                    // l = l >>> 8;
    }
    return buf;
}

public static long byteArrayToLong(byte[] buf) {
    long l = 0;
    if (buf.length != 8)
        throw new RuntimeException("Bad Length");
    for (int j = 0; j < 8; ++j) {
        l <<= 8;              // l = l << 8;
        l |= buf[j] & 0xffL;
    }
    return l;
}

Converting to Hex Strings

Converting an array into a hexadecimal string is a fairly common task for printing, data interchange, and debugging. A naive way to do hex conversions is by using one of the previously mentioned byte-array-to-long methods followed by Long.toHexString. However, this is very slow and is limited to arrays no larger than 8 bytes. Even worse would be using BigInteger (discussed later in the chapter).
For example:

byte[] mybytes = ...;
BigInteger a = new BigInteger(1, mybytes);
return a.toString(16);

While mathematically correct, most programs expect two hex digits per byte; for example, decimal 15 should be converted to 0F, not F. The other problem is that this method is horrendously slow.

While these methods are useful in a pinch, for high-performance applications they won't do. If one is willing to use a little memory, performance can be doubled or tripled. The conversion itself is fairly simple, but an implementation that uses tables has some important benefits:

Provides typically higher performance (at the expense of some minor setup and memory costs).

Allows using different alphabets instead of the standard one. These issues are further detailed in Chapter 4.

Typically allows simpler coding.

The conversion class starts out defining the hexadecimal alphabet (in our case, the standard one). The hexDecode table is the inverse mapping; for instance, character A is mapped to 10. Characters that are invalid are set to -1. For example:

public class Hexify {
    protected static char[] hexDigits = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    protected static int[] hexDecode = new int[256];
    static {
        for (int i = 0; i < 256; ++i) hexDecode[i] = -1;
        for (int i = '0'; i <= '9'; ++i) hexDecode[i] = i - '0';
        for (int i = 'A'; i <= 'F'; ++i) hexDecode[i] = i - 'A' + 10;
        for (int i = 'a'; i <= 'f'; ++i) hexDecode[i] = i - 'a' + 10;
    }

Encoding is fairly straightforward. Each byte is split into two sections, and the table is used to determine the appropriate character:

public static String encode(byte[] b) {
    char[] buf = new char[b.length * 2];
    int max = b.length;
    int j = 0;
    for (int i = 0; i < max; ++i) {
        buf[j++] = hexDigits[(b[i] & 0xf0) >>> 4];
        buf[j++] = hexDigits[b[i] & 0x0f];
    }
    return new String(buf);
}

Decoding is more interesting. First, a function, given a hexadecimal character, returns the integer value.
This is done via a table lookup, and the result is verified to make sure it's valid. It's always important to check the input for validity, especially if you don't have control over what the input is. You could easily add or inline these checks into the main decode routine for a minimal improvement in execution time, as follows:

protected static int getHexDecode(char c) {
    int x = hexDecode[c];
    if (x < 0)
        throw new RuntimeException("Bad hex digit " + c);
    return x;
}

While the encode function always produces two characters per byte, you may receive a hexadecimal string that doesn't conform to this rule. If you are only processing strings that you created, you could check that the number of input characters is even and throw an exception if it's not. The following implementation handles full generality. The string is checked to see if it has an odd number of characters by computing max & 0x01 (or, equivalently, max % 2), which just looks at the least-significant bit. If it's set, then the length is odd.

public static byte[] decode(String s) {
    char[] input = s.toCharArray();
    int max = input.length;
    int maxodd = max & 0x01;
    byte[] buf = new byte[max / 2 + maxodd];
    int i = 0, j = 0;
    if (maxodd == 1) {
        buf[j++] = (byte) getHexDecode(input[i++]);
    }
    while (i < max) {
        buf[j++] = (byte)(getHexDecode(input[i++]) << 4 |
                          getHexDecode(input[i++]));
    }
    return buf;
}
} // end of class Hexify

BigInteger

The BigInteger class in the java.math package provides a way of doing computations with arbitrarily large integer types. Unlike C++, Java does not overload operators, so to do basic math you have to call methods to perform mathematical operations such as addition. All of the operations create new BigInteger objects as results. The objects used in the operations remain untouched. For instance:

BigInteger b1 = new BigInteger("1");
BigInteger b2 = new BigInteger("2");
BigInteger b3 = b1.add(b2);

At the end, b1 is still 1, b2 is still 2, and b3 is what you'd expect it to be, 3.
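The immutability just described is easy to verify, and it means results chain naturally since every operation returns a new object. A minimal sketch (class name illustrative; run with java -ea):

```java
import java.math.BigInteger;

public class BigIntDemo {
    public static void main(String[] args) {
        BigInteger b1 = new BigInteger("1");
        BigInteger b2 = new BigInteger("2");
        BigInteger b3 = b1.add(b2);

        // the operands are untouched; only b3 holds the sum
        assert b1.toString().equals("1");
        assert b2.toString().equals("2");
        assert b3.toString().equals("3");

        // operations chain: 2^64 - 1, shown in hex
        BigInteger big = b2.pow(64).subtract(BigInteger.ONE);
        assert big.toString(16).equals("ffffffffffffffff");
        assert big.bitLength() == 64;
        System.out.println("ok");
    }
}
```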
Modifying an object requires explicit code. For instance, if we wanted b1 to contain the result of the addition, we would use self-assignment like this:

b1 = b1.add(b2); // b1 is now 3

While the notation is different, the behavior is no different than using normal operators. Adding 1 + 2 doesn't change the original values of 1 or 2. All of the usual arithmetic operations are available: add, subtract, multiply, divide, and remainder. Another method, divideAndRemainder, returns a two-element BigInteger array where element 0 is the division result and element 1 is the remainder. A complete list and comparison of BigInteger operations is given in Table 1.14.

Table 1.14 BigInteger Arithmetic Operators

ARITHMETIC OPERATION      NATIVE TYPE NOTATION                 BIGINTEGER NOTATION
Addition                  a + b                                a.add(b)
Subtraction               a - b                                a.subtract(b)
Multiplication            a * b                                a.multiply(b)
Integer division          a / b                                a.divide(b)
Remainder                 a % b                                a.remainder(b)
Division with remainder   int[] result = {a / b, a % b}        BigInteger[] result = a.divideAndRemainder(b)
Negation                  -a                                   a.negate()
Exponentiation            Math.pow(a, b)                       a.pow(b)
Random value              Random r = new Random();             Random r = ...; int bits = ...;
                          a = r.nextInt();                     BigInteger a = new BigInteger(bits, r);
Absolute value            (a >= 0) ? a : -a, or Math.abs(a)    a.abs()
Minimum of a, b           (a < b) ? a : b, or Math.min(a, b)   a.min(b)
Maximum of a, b           (a > b) ? a : b, or Math.max(a, b)   a.max(b)

Likewise, all the usual bit operations are available, as described in Table 1.15. There is no need for a special unsigned right-shift operator, since BigInteger has unlimited capacity; the result of shifting n places to the right is the equivalent of dividing by 2^n, regardless of sign. In addition, there are a few new methods that simplify working with bits: testBit, setBit, and flipBit. Two others, bitCount and bitLength, need a bit more explaining.
For positive integers the results are what you'd expect: bitLength returns the minimal number of bits required to represent the integer, and bitCount returns the number of one-bits in the representation. For negative numbers, bitLength gives the minimal length excluding the sign bit, and bitCount returns the number of zero-bits in the representation.

Creating and Converting

BigInteger objects can be created from a string representation of a number, a native type, or a byte array.

Strings

You can create BigInteger objects by using a constructor that takes a string representation of a number. By default, the constructor expects the string representation to be in decimal (base 10) form. However, you can also specify another radix, such as the standard base 16 representation:

    BigInteger(String base10Rep)
    BigInteger(String representation, int radix)

Table 1.15 BigInteger Bit Operations

    BIT OPERATION               NATIVE TYPE NOTATION    BIGINTEGER NOTATION
    AND                         a & b                   a.and(b)
    ANDNOT                      a & ~b                  a.andNot(b)
    NOT (complement)            ~a                      a.not()
    OR                          a | b                   a.or(b)
    XOR                         a ^ b                   a.xor(b)
    Shift left n bits, signed   a << n                  a.shiftLeft(n)
    Shift right n bits, signed  a >> n                  a.shiftRight(n)
    Test bit n                  (a & (1 << n)) != 0     a.testBit(n)
    Set bit n                   a | (1 << n)            a.setBit(n)
    Flip bit n                  a ^ (1 << n)            a.flipBit(n)

Conversely, you can retrieve a string representation in an appropriate base by using the toString method:

    String toString()
    String toString(int radix)

Numeric Types

A BigInteger can be created from a long by using the static valueOf method:

    long val = 123456789123L;
    BigInteger bi = BigInteger.valueOf(val);

Once you have the BigInteger object, it can be converted directly into a numeric type by using intValue, longValue, floatValue, or doubleValue. Not that we'll mind, but there is no method to directly convert into a short or char value.
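These conversions can be sketched in a few lines (class name is mine); the second half uses a value wider than an int to show what conversion does to the high bits:

```java
import java.math.BigInteger;

public class ConvertDemo {
    public static void main(String[] args) {
        BigInteger bi = BigInteger.valueOf(123456789123L);
        // Converting back out is exact as long as the value fits
        if (bi.longValue() != 123456789123L) throw new AssertionError();

        // A value too big for an int: only the low 32 bits survive intValue()
        BigInteger big = BigInteger.ONE.shiftLeft(40).add(BigInteger.valueOf(7)); // 2^40 + 7
        if (big.intValue() != 7) throw new AssertionError();
    }
}
```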
If the value is too large to fit into the numeric type, a silent narrowing conversion is done (the high bits are chopped off).

Byte Arrays

Assuming you have a BigInteger object, a byte array can be generated with:

    byte[] toByteArray()

And the result of this output can be used to create a new BigInteger object:

    BigInteger(byte[] array)

However, because of signing issues, this format is typically not used for cryptographic purposes. If the bit length is a multiple of 8, and it always is for cryptographic applications, the byte array starts with an extra byte indicating the sign (0 for positive). Either your application can deal with a leading zero, or you can strip it away, as in the following:

    byte[] ba = bi.toByteArray();
    if (ba[0] == 0) {
        byte[] tmp = new byte[ba.length - 1];
        System.arraycopy(ba, 1, tmp, 0, tmp.length);
        ba = tmp;
    }

A lower-speed option is to use strings as the common medium:

    byte[] ba = Hexify.decode(bi.toString(16));

Since the plain byte-array constructor expects a sign bit, to build a BigInteger from an unsigned magnitude you'll need to use a special constructor and manually indicate the sign by passing in 1, -1, or 0 for positive, negative, or zero:

    BigInteger(int signum, byte[] magnitude)

BigInteger and Cryptography

BigInteger has a few special methods specifically designed for cryptography, listed in Table 1.16. These operations are discussed fully in Chapter 3.

Secret Methods in BigInteger

For Sun Microsystems to effectively test for primality, many generic methods and algorithms had to be implemented; however, they are not part of the public API. For people working in number theory or who are developing more advanced cryptographic algorithms, these "hidden" methods may be useful. Unfortunately, they are declared private; but with a few modifications to the source code, you can create your own enhanced BigInteger variant.
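The cryptographic helpers can be exercised with a short, self-contained sketch (method names as they appear in the java.math API: modPow, modInverse, gcd, isProbablePrime; the small prime 101 is chosen just so the arithmetic is easy to check):

```java
import java.math.BigInteger;
import java.util.Random;

public class ModMathDemo {
    public static void main(String[] args) {
        BigInteger p = new BigInteger("101");   // a small prime
        BigInteger a = new BigInteger("7");

        // a^n mod p; by Fermat's little theorem, 7^100 mod 101 == 1
        BigInteger r = a.modPow(new BigInteger("100"), p);
        if (!r.equals(BigInteger.ONE)) throw new AssertionError();

        // a * a^-1 == 1 (mod p)
        BigInteger inv = a.modInverse(p);
        if (!a.multiply(inv).mod(p).equals(BigInteger.ONE)) throw new AssertionError();

        // 7 and 101 are coprime
        if (!a.gcd(p).equals(BigInteger.ONE)) throw new AssertionError();

        // primality with certainty 1 - 2^(-100)
        if (!p.isProbablePrime(100)) throw new AssertionError();

        // random probable prime via the (bitLength, certainty, rnd) constructor
        BigInteger q = new BigInteger(64, 100, new Random());
        if (!q.isProbablePrime(100)) throw new AssertionError();
    }
}
```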
Table 1.16 BigInteger Operations Useful in Implementing Cryptographic Algorithms

    OPERATION                                        BIGINTEGER NOTATION
    Create a random nonnegative integer              Random r = ...; int numBits = ...;
    uniformly distributed from 0 to 2^n - 1          BigInteger a = new BigInteger(numBits, r);
    Create a random integer that is prime            Random r = ...; int bitLength = ...;
    with certainty 1 - 2^(-c)                        BigInteger a = new BigInteger(bitLength, c, r);
    Check if prime with certainty 1 - 2^(-c)         a.isProbablePrime(c)
    a mod b                                          a.mod(b)
    a^n mod b                                        a.modPow(n, b)
    Find a^-1, such that a * a^-1 = 1 mod b          a.modInverse(b)
    Greatest common divisor of a, b                  a.gcd(b)

The most interesting of these are the following methods:

    int jacobiSymbol(int p, BigInteger n); // this is "package protected"
    private boolean passesMillerRabin(int iterations)
    private boolean passesLucasLehmer()
    private static BigInteger lucasLehmerSequence(int z, BigInteger k, BigInteger n)

In addition, there is an implementation of a sieve for testing primality of small numbers in the class BitSieve. To "free" them:

1. Copy all the source files from this directory into a new directory.
2. Edit each file and change the package name from java.math to one of your own choosing.
3. Make the BitSieve class public.
4. Change the desired methods and classes to public. Note that the method jacobiSymbol in BigInteger and the class BitSieve do not have an access modifier, so you have to add public in front of them.
5. Compile.

This code is copyrighted by Sun Microsystems. For commercial use and distribution, refer to the license agreement distributed with the source code.

Now that we have a full understanding of bit operations and the Java model, we'll discuss secret and public key cryptography.

CHAPTER 2

Secret Key Cryptography

Secret key cryptography is based on cryptographic functions that rely on a single key that must be kept confidential. Secret key ciphers perform encryption and decryption using the same key, and keyed cryptographic hashes must be computed and verified using the same key.
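In Java terms, this single-key model is exactly what the javax.crypto API exposes: the same SecretKey object is passed to the cipher for both ENCRYPT_MODE and DECRYPT_MODE. A minimal sketch (AES is used here purely as a stand-in for "some block cipher"; it assumes a JCE provider supplying AES, as in any modern JDK):

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.Arrays;

public class SameKeyDemo {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();   // the one secret that must stay confidential

        Cipher cipher = Cipher.getInstance("AES");
        byte[] plaintext = "attack at dawn".getBytes("US-ASCII");

        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = cipher.doFinal(plaintext);

        cipher.init(Cipher.DECRYPT_MODE, key);  // the very same key decrypts
        byte[] recovered = cipher.doFinal(ciphertext);

        if (!Arrays.equals(plaintext, recovered)) throw new AssertionError();
    }
}
```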
Symmetric Block Ciphers

In general, a cipher is a function that maps a message, known as plaintext, into an unreadable form, known as ciphertext, by use of a key. This is known as encryption. Without the key, you cannot perform the inverse transformation, decryption, which turns the ciphertext back into its original plaintext form. In lay terminology, code is often used interchangeably with cipher; however, they have different meanings. A code is a translation of a message by use of a dictionary, without any key being used.

Modern block ciphers operate on a fixed number of bytes in a single pass, for instance, transforming 8 bytes of plaintext into 8 bytes of ciphertext. Messages larger than the block must be chopped into pieces, and if the message isn't a multiple of the block size, it must be padded appropriately.

Cipher Properties

To be useful, a cipher must be secure and must be usable. What makes a cipher secure? For ciphers that have stood the test of time (i.e., those in common use that have not been cracked), a rough estimate of security is key length, with a longer key being more secure. A cipher is deemed secure if there is no attack that is significantly better than brute force. Clearly, though, this by itself is not enough; the notion of security is tied to the computational resources currently available to an attacker. An algorithm may be structurally secure, but if the key is too small, brute-force attacks can easily break the cipher by directly recovering the key.

Security Properties

The fundamental property of an encryption algorithm is that its output be effectively random, regardless of what the input or key is.
From this property, many notions emerge regarding how well the algorithm performs:

Semantic security (also called polynomial security or "indistinguishability of encryptions"). There are many variants of this idea, but the simplest is that, given two plaintext messages and their ciphertext equivalents, it is impossible to determine which plaintext matches which ciphertext with a probability greater than .5. In other words, the ciphertext does not reveal any information about the plaintext.

Nonmalleability. Given a ciphertext, it is impossible to manipulate the ciphertext to produce related plaintext. In other words, a 1-bit change to a ciphertext should result in a very different plaintext.

Plaintext awareness. It is computationally infeasible to produce a valid ciphertext without knowing the corresponding plaintext. For instance, you should not be able to generate ciphertext that decrypts to 7-bit ASCII unless you encrypted the original plaintext yourself. This is useful to prevent the generation of false messages.

Lack of algebraic structure. It's very important that a cipher doesn't have certain algebraic properties that might allow multiple encryptions or multiple keys to negate each other. In particular, it is important that the cipher does not form a group, where:

    Each key has an inverse key.
    Encrypting something twice with different keys is equivalent to encrypting it once with another key.
    The result of encrypting with one key and then decrypting with another key is the same as encrypting with some single key.

This would be very useful for attack purposes. Ideally, you don't even want a subset of keys and messages to form a group; however, proving that is quite tricky. You also don't want "fixed points," that is, messages that encrypt to themselves.

Brute-Force Attacks

Brute-force attacks are the simplest attack, and for symmetric ciphers this means trying each key, one after another, until the message decrypts properly.
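The brute-force loop can be made concrete with a toy cipher of my own construction (a repeating 16-bit XOR key, chosen purely so the whole key space fits in a fraction of a second). The attacker tries every key and keeps any decryption whose bytes all "look like" structured data, here 7-bit ASCII:

```java
import java.util.ArrayList;
import java.util.List;

public class ToyBruteForce {
    // A deliberately weak toy "cipher": XOR with a repeating 2-byte key
    static byte[] crypt(byte[] data, int key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            int k = (i % 2 == 0) ? (key >> 8) : key;
            out[i] = (byte) (data[i] ^ (k & 0xFF));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] plaintext = "the quick brown fox jumps over the lazy dog".getBytes("US-ASCII");
        int secretKey = 0xCAFE;
        byte[] ciphertext = crypt(plaintext, secretKey);

        // Try every possible key; keep those whose output is all 7-bit bytes
        List<Integer> candidates = new ArrayList<Integer>();
        for (int key = 0; key <= 0xFFFF; key++) {
            byte[] guess = crypt(ciphertext, key);
            boolean allSevenBit = true;
            for (byte b : guess) {
                if ((b & 0x80) != 0) { allSevenBit = false; break; }
            }
            if (allSevenBit) candidates.add(key);
        }
        // The real key always survives the filter; a human (or a dictionary check)
        // then picks it out of the candidate list
        if (!candidates.contains(secretKey)) throw new AssertionError();
        System.out.println(candidates.size() + " candidate keys out of 65536");
    }
}
```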
In decryption, there is no inherent flag saying "this is the right key" or "decryption occurred correctly." For instance, a ciphertext decrypted with one key might produce cat, while the same ciphertext decrypted with a different key might produce dog. Another way of looking at this: if you encrypted a random number or random binary data, an attacker using the brute-force technique would be unable to determine whether the resulting plaintext was the original number without some additional context. Typically, though, you do not encrypt random binary data, but rather something with more structure. During a brute-force attack, each decrypted result is checked to see if that structure exists. For instance, English text is encoded using ASCII, where only 7 bits in each byte are used. During a brute-force attack, every output that uses only 7 bits per byte is stored in a "candidates" file, which then needs to be checked by an actual human to determine the correct result. False positives are extremely rare for language texts.

Even with small key sizes, such as DES (the Data Encryption Standard), there are 2^55, or approximately 3.6 x 10^16, keys to check. When measuring cryptographic security, you can't be afraid of large numbers. While 2^55 sounds out of reach, in fact it's quite doable. Brute-force algorithms are easily parallelizable, and with the Internet, tens of thousands of computers are easily pooled for such attacks, enabling trillions of keys to be tested per second.

Other Attacks

Brute-force attacks can easily be thwarted by using a larger key, so that a full key-space scan becomes impossible. The following attacks instead attempt to find weaknesses in the cipher that provide shortcuts to a brute-force attack. If a significant attack is found, the cipher is said to be broken and thus insecure.

Ciphertext-only attack. The adversary here has a ciphertext message, and that's it.
In the vast majority of cases, this attack amounts to brute force, where every key is checked.

Chosen plaintext. Here the attacker submits a bulk list of plaintexts, receives the corresponding ciphertexts, and through analysis can determine the key.

Adaptive chosen plaintext. Similar to the previous, except the process is interactive: each plaintext may depend on the results of the previous plaintexts and ciphertexts.

Chosen ciphertext. The attacker selects ciphertexts, which are then decrypted for him.

Related key. The attacker requires plaintexts and ciphertexts produced under different, related keys.

Partial known plaintext attack. This attack is one of my own devising; I haven't seen anything in the literature on it. Here the attacker has the ciphertext and some of the plaintext. This potential attack turns out to have applications for database security.

Power attack. This applies only to smartcard-like devices. An attacker can actually measure the radiation and timing of the cryptographic computation and estimate the key or the size of the key.

Many of these attacks come in the form of linear or differential analysis (or some combination). Knowing the algorithm being used, and with many pairs of text (plaintext and/or ciphertext), an attacker can determine the internal structure and thus extract the key or limit the key space. Many might wonder what the point of these attacks is if the attacker already has plaintext-ciphertext pairs. The trick is that the text pairs are chosen by the attacker and by themselves are not interesting (the values are more or less random or generated by an algorithm). The plan is to use these pairings to recover the key and thus decrypt ciphertexts that do contain useful information. Still, the attacks may seem odd; yet many network protocols permit these types of attacks, since they echo data back. For common ciphers, most of these attacks require a lot of pairings (at least 2^40), which would certainly be noticed by even the most primitive network monitoring.
Another way these attacks can be exploited is when an attacker has access to either the encryption or the decryption functions of a system. A hacker breaking into a system might not have access to the keys but will have access to system calls that do encryption. The attack might then place only a modest strain on the system's resources and not be noticeable.

Common Block Ciphers

There are dozens of block ciphers; however, you really only need to consider a small handful (see Table 2.1). Block ciphers come in two flavors: 64-bit block and 128-bit block.

Data Encryption Standard (DES)

The Data Encryption Standard, first developed in the 1970s, is the most widely used cipher. It was the first industrial-strength algorithm that was fully specified to the public, in Federal Information Processing Standard (FIPS) 46-2. (FIPS are the standards used by the U.S. federal government for information processing.) DES keys are specified as a 64-bit value; however, only 56 bits are used as part of the key. The 8th bit of each byte is a parity bit that causes every byte to have an odd number of one-bits (otherwise known as "odd parity"). This is left over from the early days of computing.

Table 2.1 Selected Secret-Key Ciphers

    ALGORITHM   YEAR  BLOCK SIZE (BITS)       KEY SIZE (BITS)  REASON
    DES         1977  64                      56               Standard
    Triple DES  1985  64                      168              Standard
    Blowfish    1993  64                      40-448           Popular, fast
    RC5         1996  64 standard; supports   0-2048           Popular, supported
                      32 and 128
    IDEA        1992  64                      128              Popular
    Rijndael    1998  128, 192, 256           128, 192, 256    Standard, fast
    Twofish     1998  128                     128, 192, 256    Popular
    RC6         1998  128                     128, 192, 256    Popular, supported

It's worth looking at the inner workings of DES, since its ideas are used in many ciphers and the concept is actually fairly simple. First, the key is expanded: although the key itself is only 56 bits, internally the key is expanded into sixteen 48-bit values. Likewise, various stages in the pipeline also expand data from 32 bits to 48 bits.
Later, these expanded values get contracted as well. While, theoretically, expanding a key into larger values doesn't make the key longer than it really is, it does have the benefit of allowing more complicated functions to be constructed with simpler code.

The basic structure of DES is reused in many other ciphers and is known as a Feistel structure. The plaintext block is split into left and right sections, and then the following is iterated:

    L(n+1) = R(n)
    R(n+1) = L(n) XOR f(R(n), K(n))

where K(n) is a subkey of the main key and f is a mixing function. This mixing function is really what makes DES successful. It uses what are called S-Boxes, which are maps from a 6-bit input to a 4-bit output (the contraction) based on a lookup table. In DES (and other ciphers) the map is nonlinear and normally specified by using tables. The security of DES depends entirely on the values in the S-Boxes, and exactly how they were constructed is a mystery to this day.

DES Security

As of this writing, DES has had no attacks that are significantly better than brute force, so in structure DES is solid and secure. However, the problem of short key size and vulnerability to brute-force attacks remains. Nowadays DES can be cracked either by specialized hardware or by distributed networks in a few hours. (See www.eff.org/descracker/ for full details.) What's interesting is that various modifications have been proposed to enhance DES, and it has been shown that just about any tampering with or enhancement of the S-Box structure results in a much weaker cipher. These days there really is no reason to use DES except in legacy applications.

Triple DES

While DES appears to have a solid design, its key space is just too short, leaving it susceptible to brute-force attacks. To help fix the problem, an interim solution was proposed that effectively performs multiple iterations of DES on the message using different keys.
What's known as Triple DES, TDES, DES-EDE, or 3DES provides 168 bits of key material using three DES keys. A less common version is a two-key variant providing 112 bits, in which three operations are still performed but one key is used twice. Triple DES is the composition of three DES operations, as shown in Table 2.2.

Table 2.2 Triple DES Operations

    OPERATION    3 KEYS              2 KEYS
    Encryption   Ek3(Dk2(Ek1(m)))    Ek1(Dk2(Ek1(m)))
    Decryption   Dk1(Ek2(Dk3(m)))    Dk1(Ek2(Dk1(m)))

In software this is a bit slow, but for basic networking (100 megabit), the speed limitation isn't much of a problem, since the algorithm can be implemented in hardware fairly efficiently. Triple DES is not a FIPS but instead is specified under the ANSI X9.52 standard and NIST SP 800-20 (ANSI stands for American National Standards Institute, NIST for the National Institute of Standards and Technology).

Triple DES Security

With its 168 bits of key material, Triple DES is by far the most studied, strongest algorithm available. When in doubt, Triple DES is probably the best choice. Despite more modern ciphers, nobody suggests that Triple DES is less secure than the others.

Why Not "Double DES"?

Using two keys, or "Double DES," isn't done because, despite its 112-bit key, it isn't much more secure than a single iteration of DES. Given a plaintext-ciphertext pair, an attacker could extract the key using a brute-force attack requiring up to 2^112 computations. However, the attacker can trade computation costs for memory storage by using a "meet-in-the-middle" attack. There are many variations, but the simplest is to compute the 2^56 different encryptions of the plaintext and store them, then begin decrypting the ciphertext under each possible key and comparing the result against the table. The attacker will likely get multiple matches and would need another plaintext-ciphertext pair to verify which is the correct key.
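A toy version of this meet-in-the-middle attack can be run in milliseconds. The "cipher" below is a trivially weak mixing function of my own with a 16-bit key, standing in for DES; against real Double DES the same table would need 2^56 entries:

```java
import java.util.HashMap;
import java.util.Map;

public class MeetInTheMiddle {
    // Toy "block cipher" on 32-bit blocks with a 16-bit key, and its inverse
    static int E(int x, int k) { return Integer.rotateLeft(x + k, 5); }
    static int D(int y, int k) { return Integer.rotateRight(y, 5) - k; }

    public static void main(String[] args) {
        final int KEYS = 1 << 16;                      // tiny key space for the demo
        int k1 = 0x1234, k2 = 0xBEEF;                  // the "secret" double-encryption keys
        int p1 = 0xCAFEBABE, c1 = E(E(p1, k1), k2);    // one known plaintext-ciphertext pair
        int p2 = 0x0BADF00D, c2 = E(E(p2, k1), k2);    // a second pair, to weed out false hits

        // Step 1: encrypt p1 under every possible first key and store the results
        Map<Integer, Integer> table = new HashMap<Integer, Integer>();
        for (int k = 0; k < KEYS; k++) table.put(E(p1, k), k);

        // Step 2: decrypt c1 under every possible second key, looking for a meeting point
        for (int k = 0; k < KEYS; k++) {
            Integer match = table.get(D(c1, k));
            if (match != null && E(E(p2, match), k) == c2) {  // confirm on the second pair
                if (match != k1 || k != k2) throw new AssertionError();
                System.out.println("recovered both keys");
                return;
            }
        }
        throw new AssertionError("no key found");
    }
}
```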
The worst case is that the attacker would need to perform 2^56 decryptions, but even then that's only 2^57 total operations. There is nothing specific to DES in this; any doubled cipher has this problem. The attack extends to the two-key version of Triple DES as well, but there it's generally less feasible, since the storage requirements become quite large.

Blowfish

In 1994, 56-bit DES was becoming obsolete and Triple DES was still very slow. Every other secure encryption algorithm was classified, incompletely specified (GOST, a Soviet algorithm, was known, but its critical S-Boxes were classified), or known but unusable because it was patented (Khufu, REDOC II, IDEA) or crippled for legal export (RC2, RC4). In late 1993, Bruce Schneier presented the Blowfish cipher, a free, nonpatented, drop-in replacement for DES with a 64-bit block size but a variable key length of up to 448 bits [Schneier1994]. In today's world, awash with cryptologists and a healthy open-source movement, it's easy to forget how revolutionary this was: strong cryptography for anyone who wanted it. Even better, the Blowfish cipher is still secure.

The block size is the same as DES, 64 bits. However, Blowfish can have a variable key size in multiples of 8 bits, from 40 to 448 bits (56 bytes). During initialization, the key, regardless of length, is expanded into an internal structure of 4,168 bytes. The setup cost is expensive, since it takes 521 iterations of encryption to generate the key schedule. After this, however, the key size has no effect on performance; in other words, the speed using a 448-bit key is the same as with a 56-bit key. This allows key choice to be based purely on security, storage, and protocol requirements instead of on performance. The initialization penalty can be somewhat avoided by directly specifying the internal array instead of having it generated (of course, then an array of 4,168 bytes instead of a key of at most 56 bytes must be saved and stored).
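Both DES and Blowfish use the Feistel layout described earlier, and it is worth seeing how mechanically it works: decryption is simply the same network run with the subkeys in reverse order, which is why the round function f need not be invertible at all. A toy sketch (f here is an arbitrary stand-in of my own, not DES's or Blowfish's round function):

```java
public class FeistelToy {
    // Any function at all works as f; the Feistel structure is invertible regardless.
    static int f(int r, int k) { return Integer.rotateLeft(r * 0x9E3779B9 + k, 11) ^ k; }

    static int[] encrypt(int l, int r, int[] subkeys) {
        for (int k : subkeys) {
            int t = r;
            r = l ^ f(r, k);   // R(n+1) = L(n) XOR f(R(n), K(n))
            l = t;             // L(n+1) = R(n)
        }
        return new int[] { l, r };
    }

    // The same network, with the subkeys applied in reverse order
    static int[] decrypt(int l, int r, int[] subkeys) {
        for (int i = subkeys.length - 1; i >= 0; i--) {
            int t = l;
            l = r ^ f(l, subkeys[i]);
            r = t;
        }
        return new int[] { l, r };
    }

    public static void main(String[] args) {
        int[] subkeys = { 0x1111, 0x2222, 0x3333, 0x4444 };
        int[] ct = encrypt(0xDEADBEEF, 0x01234567, subkeys);
        int[] pt = decrypt(ct[0], ct[1], subkeys);
        if (pt[0] != 0xDEADBEEF || pt[1] != 0x01234567) throw new AssertionError();
    }
}
```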
In overall structure, Blowfish is not that different from DES. The subkey array is eighteen 32-bit subkeys (compare to DES with sixteen 48-bit subkeys). There are four S-Boxes, each with 256 entries (compare to DES, which has eight S-Boxes with 64 entries). The most important difference is that Blowfish dynamically creates its S-Box values during initialization, while DES uses fixed values. Notice that in Blowfish all data comes in nice round sizes of 32 and 8 bits, designed to fit into machine registers with a minimal amount of bit extraction and shifting. This gives Blowfish its speed.

Cryptographically, Blowfish has been deemed strong and has been quite well received. The algorithm has been included in dozens of free and limited commercial products, and no significant faults have been found with it. In a reduced eight-round version (not the standard 16 rounds), roughly 1 in 2^14 keys is weak [Vau1996]. The weakness allows an attacker to detect that a weak key is being used, but not what the specific key is (and it also allows some other, more abstract attacks). The weak keys come about because Blowfish dynamically creates its S-Boxes: apparently, in some cases Blowfish generates some stinkers in which some of the S-Box values are duplicated. There is no way of knowing in advance whether a key is weak; you have to go through the initialization and then check the S-Box values. Again, no weak keys have been found for the full-strength 16-round version.

IDEA

The IDEA cipher also operates on 64-bit blocks but uses a fixed-size 128-bit key [LaMa1990, LaMaMu1991]. It is generally regarded as a very well-designed cipher and has experienced no known cryptographic attacks better than brute force. It is covered by patents owned by the Swiss firm Ascom, but they have allowed free noncommercial use, provided permissions are first obtained. Because of this licensing, IDEA became popular in the email encryption program PGP (Pretty Good Privacy).
Today IDEA still seems to be quite popular in the Perl community, partly because it is a very secure cipher and received a positive recommendation from Bruce Schneier in Applied Cryptography (at the time, Schneier had just developed Blowfish, and it was still too new to be recommended). However, IDEA, like Blowfish, generates some nonlinear aspects of the cipher from the key, and because of this there are classes of weak keys [DaGoVn1993]. Fortunately, the odds of using one of the weak keys are 2^(-77), so it's not considered a grave threat.

While IDEA is fairly easy to implement in software, it's also very slow because of its generous use of expensive multiplication operations. A naive implementation is typically about the same speed as DES, if not slower. Those same multiplications allow it to speed up significantly on 64-bit machines, or if someone hand-codes an assembly version that uses a microprocessor's extensions such as MMX (Intel and AMD processors), SSE2 (Intel's Pentium 4), or AltiVec (IBM/Motorola's PowerPC). In these cases, it's about twice as fast as DES. See [Lipmaa1998] for a discussion of using the MMX extensions with IDEA.

RC5

RC5 was presented by Ron Rivest (coinventor of the RSA, for Rivest-Shamir-Adleman, algorithm) in 1995 [Rivest1995]. It is also fully specified in RFC 2040; however, it is patented (U.S. Patent 5,724,428) and is proprietary to RSA Security. It is fully parameterized in terms of block size, key size, and security, represented as RC5-w/r/b, where:

    w = Word size in bits (16, 32, or 64), which results in a block size of 2w. The standard is 32.
    r = Number of rounds, which in turn determines security. The appropriate number depends on w.
    b = Number of bytes in the secret key; can be 0 to 255.

Internally, the key expansion stage takes the key and turns it into an internal array S of 2(r + 1) words of w bits each.
Following that, encryption is amazingly short and simple:

    A = A + S[0]; // A is first half of plaintext block
    B = B + S[1]; // B is second half of plaintext block
    for (i = 1 .. r) {
        A = ((A ^ B) <<< B) + S[2i];    // <<< is rotation: data-dependent rotations
        B = ((B ^ A) <<< A) + S[2i+1];
    }

The matching decryption algorithm is equally terse. As you can see, RC5's hallmark is its data-dependent rotations. RC5 is typically used with 64-bit blocks, 16 rounds, and keys of at least 128 bits (e.g., RC5-32/16/128+). While you can use larger blocks, you might instead consider the newer ciphers like Rijndael or RC6 (discussed in the following sections). Using these guidelines, the most effective attacks known are brute force, and the security is based on the size of the key. There is no reason not to use at least 128 bits, since after key setup the cipher runs at the same speed regardless of key size.

Rijndael and the Advanced Encryption Standard (AES)

In 2001, the NIST selected Rijndael as the replacement for DES (FIPS 197). Pronounced roughly "rain-doll" (the name is drawn from those of its designers, Vincent Rijmen and Joan Daemen), Rijndael is an interesting cipher, since it works in a completely different way from the previous ciphers. The algorithm is in some ways similar to shuffling and cutting a deck of cards: the internal state is laid out in a square, and the rows and columns are shifted, mixed, and added in various ways. The entries themselves are also substituted and altered. It has a lot of parallel and symmetric structure because of the underlying mathematics, which provides a lot of flexibility in how it can be implemented. However, some have criticized it as having too much structure, which may lead to future attacks. Apparently that didn't bother the NSA (National Security Agency) or the NIST. No cryptographic attacks are known, and the cipher works well on a wide variety of processors, doesn't use bit shifting or rotation, and is very fast.

The AES Competition

How Rijndael was selected is also interesting.
Knowing that DES had outlived its useful lifetime, the NIST issued an open call for replacements; anyone was welcome to submit a cipher. The general requirements were:

    Must be a symmetric block cipher.
    Must support a block size of 128 bits.
    Must support key sizes of 128, 192, and 256 bits.
    Should be at least as fast as Triple DES.
    Should run on a variety of hardware and should have "good" characteristics for implementation in hardware.
    If selected as the winner, the submitter relinquishes all intellectual property claims, such as patents and trademarks.

Out of 15 semifinalists, 5 finalists were selected:

    Rijndael, the eventual winner.
    MARS, from IBM.
    RC6, from RSA Labs (similar to RC5).
    Serpent, from Ross Anderson, Eli Biham, and Lars Knudsen.
    Twofish, from Bruce Schneier, John Kelsey, Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson. It is the successor to the previously mentioned Blowfish algorithm.

How did they rate? Table 2.3 presents summaries from the conference reports of the Third Advanced Encryption Standard Candidate Conference. The rankings are extremely relative, and a last-place finish does not mean an algorithm was unacceptable. Each algorithm's authors had their own interpretation of what the NIST selection process called for: Serpent, for instance, was extremely conservative about security, and this shows in its performance, while RC6 focused on 32- and 64-bit processors. That the four other ciphers were not selected does not imply they are insecure. To the contrary, the NSA said any of the final candidates would make a good standard from a security standpoint (the speed with which the decision was made would seem to imply that the NSA has much more sophisticated methods of cipher analysis than the public has). In addition, the NIST has not specifically excluded the others from being used. Since the NIST wanted a "generalist" cipher, these alternates may find their way into other, more specialized standards and applications.
However, Twofish and RC6 are particularly noteworthy because they have strong corporate or independent support and are likely to be used in future applications.

Table 2.3 Evaluation of AES Finalists on Various Platforms

    PROCESSOR ARCHITECTURE          RELATIVE RANKING, BETTER TO WORSE
    FPGA (programmable logic)       Rijndael, Serpent, RC6, MARS, Twofish
    DSP TMS320C62x                  RC6, Rijndael, Twofish
    Toshiba T6N55 (smart card)      Rijndael, RC6, Twofish
    Pentium II                      Rijndael, RC6, MARS, Serpent, Twofish
    Itanium / PA-RISC (HP)          Rijndael, RC6, MARS, Serpent, Twofish
    Alpha 21264                     Rijndael, Twofish, RC6, MARS, Serpent
    ASIC hardware, worst case       Rijndael, Serpent, Twofish, MARS/RC6
    "Parallelism" (feedback modes)  Rijndael, Twofish, Serpent, MARS, RC6
    Memory usage                    RC6, Serpent, MARS, Twofish, Rijndael
    Software (summary, Whiting)     Rijndael, MARS, Serpent, Twofish, RC6
    Software (summary, Bassham)     RC6, Rijndael, MARS, Twofish, Serpent
    Java (speed)                    Rijndael, MARS, RC6
    Java (throughput)               RC6, MARS, Rijndael, Twofish
    Java (key setup)                Rijndael, Serpent, RC6, MARS, Twofish

Twofish

The successor to Blowfish, Twofish has already been widely deployed in many open-source and minor commercial projects. Twofish is meticulously constructed and well documented, and the submission paper is an excellent reference and bibliography on cryptanalysis. The developers even published a book on its design [ScKeWhWaFe1999]. In some ways, Twofish is a very traditional design that isn't dissimilar to DES: it's based on a Feistel design using expanded keys and S-Boxes for nonlinearity. The big difference is how each part was designed and how each part interacts with the others. Many components were found by an exhaustive search for a mapping that provided an optimal result using the smallest number of machine operations. While each piece of Twofish separately is fairly straightforward, the final result is hard to understand and analyze. One description of it is "a collection of patches which fix various problems" [Baudron1999].
That said, no one is suggesting Twofish is insecure, and currently the best attack against it is brute force.

RC6

RC6 builds upon the success of RC5, modified to handle 128-bit block sizes using four 32-bit registers. Like RC5, the algorithm is completely parameterized and specified as RC6-w/r/b. As submitted to the AES, RC6 uses w = 32 and r = 20. The length of the key may be any multiple of 8 bits; however, it is most common to use b = 16, 24, or 32 bytes (128, 192, or 256 bits). The key scheduling function is almost identical to the one used in RC5, except that RC6 generates more bytes from the key. It is fast, in many cases faster than Rijndael, and its Java performance is excellent. RSA Security has submitted RC6 to various other standards bodies, so you can be sure you will be seeing more of it. They have also trademarked the name "RC6" and received a patent on its design (U.S. Patent 6,269,163).

Ciphers You Shouldn't Use

This section lists ciphers that are insecure and should not be used. Oddly, they keep turning up in supposedly high-security applications such as authentication tokens (see [FuSiSmFe2001] for examples). These ciphers are most likely used because they are simple to implement in a few lines of code and visually appear to do a good job of obscuring the underlying data.

Password XOR Schemes

In this type of scheme, the user takes a password and repeatedly uses it to XOR against the message. This has been used in various forms, including file protection. It is also common in protecting machine code, where the program starts by decrypting its own machine code and then jumps into the newly decrypted code. However, this is a weak scheme, and it can be easily cracked, especially if there is any knowledge about what the plaintext looks like (such as common headers or ASCII).
It can be used to obscure data to make it visually unreadable, but not for anything that is susceptible to an attack:

    // same function to encode and decode
    public static void xorData(byte[] data, byte[] key) {
        for (int i = 0, j = 0; i < data.length; i++) {
            data[i] ^= key[j++];
            if (j == key.length)
                j = 0;
        }
    }

Classical Cryptography

Classical cryptography involved simple ciphers that typically worked on letters or characters as opposed to bytes. More advanced classical ciphers operated on blocks, but typically blocks of letters instead of blocks of bytes. These are now primarily of historical interest and are useful as examples for more advanced concepts.

Interestingly, some of the work of classical cryptography is still classified. From 1938 to 1941, William Friedman wrote a four-volume series titled Military Cryptanalysis. In 1984 the first two volumes were declassified. A more modern series from Lambros Callimahos included three volumes written from 1959 to 1977. Again, only the first two are declassified. In 1992, an attempt to declassify the remaining volumes using the Freedom of Information Act was rejected. According to the NSA (Gilmore v. NSA, 10-02-93 letter, available at www.eff.org/Privacy/Key_escrow/ITAR_export/ITAR_FOIA/Gilmore_v_NSA/10-02-93.letter):

    The documents are classified because their disclosure could reasonably be expected to cause serious damage to the national security . . . . In addition, this Agency is authorized by various statutes to protect certain information concerning its activities. We have determined that such information exists in these documents.

In other words, 45 years later, World War II cryptography is still essential to national security. One can only speculate as to why this would be. To my knowledge, no further attempts have been made to release the material since 1992, but maybe someone will try again. Regardless of what is hidden in those texts, classical cryptography has no place in your security solution.