This page discusses an assembly language implementation of the encryption algorithm known as Blowfish.
Introduction
Bitwise XOR and modulo additions
Feistel network
The function F
Encoding and decoding routines
Key generation
Testing
References
Blowfish is a fast and secure encryption algorithm, designed by B. Schneier in 1993. Since that time it has never been cracked, in spite of numerous attempts. It is designed for speed, by using only simple operations like additions and bitwise exclusive or (XOR). Its speed and strength are due to the fact that it uses a huge key, over 4 Kbytes of random numbers! Obviously, it is impossible to memorize such a key, but the Blowfish designer provided an elegant solution: a unique key is selected once and for all, and subsequently altered by a password of your choice. The alteration is performed via repeted rounds of Blowfish encryption, in other words the key encodes itself. This self-encoding process is a bit lengthy but only needs to be performed once per session.
Before we go further, let me discuss briefly the two mathematical operations that are at the heart of Blowfish: modulo additions and bitwise XOR.
Modulo additions simply means that there is a maximum value that cannot be trespassed. If the result of the addition is greater than this value, it "wraps aroud" and restart from zero. For instance, additions modulo 216 are 16-bits additions: >F000 + >2002 should be >11002, but because only 16 bits are allowed, the result is >1002 instead. Blowfish uses modulo 232 additions, which means the greatest possible value is >FFFFFFFF.
A bitwise exclusive OR compares two numbers bit-by-bit: when the two bits are identical the corresponding bit in the result os 0, when the bits are different the result is 1. In other words, the truth table is: 0 xor 0 = 0, 0 xor 1 = 1, 1 xor 0 = 1, 1 xor 1 = 0.
XOR is a commonly used encryption tool, as it very effectively mangles the original value (aka plaintext):
>00100110 (plaintext: >46 or 'F')
xor >10110010 (key: >B2)
>11010100 (ciphertext: >D4)
XOR is commutative, so the plaintext can be retrieved by just XORing the ciphertext with the key again:
>11010100 (ciphertext: >D4)
xor >10110010 (key: >B2)
>00100110 (plaintext: >46)
Unfortunately it is also associative, so if an adversary manages to guess one word in the plaintext, he can obtain the key by combining plantext and ciphertext:
>00100110 (plaintext: >46)
xor >11010100 (ciphertext: >D4)
>10110010 (key: >B2)
Several strategies are possible to overcome this problem, by combining XORs in more complex functions.
Blowfish is is a block cipher, which means that is operates on blocks of a fixed size (64 bits, or 8 bytes) that are encoded independently from each other. It is based on a Feistel network, an algorithm that operates as follows:
The function F will be discussed later, for the moment being it is sufficient to know that it returns an arbitrary number that depends on the value of its argument.
It is easy to see that the above algorithm is symetrical and can be taken backwards to decipher the ciphertext. To make the code harder to brake, the process is repeated 8 times. In addition, at each step Xl and Xr are XORed with arbitrarily chosen random numbers, that change at each pass. These numbers are stored in an array called P that contains eighteen 32-bit numbers and forms one part of the key (there are 18 values instead of 16 because an extra XOR is performed after the last step).
The algorithm thus become:
To decode the final value back to the original one, the same method is
used, except that the array P is walked backwards:
Elegant, isn't it?
Before we get to this mysterious function F, let me introduce two important cryptographic concepts: one-way functions and non-linearity.
A one way function is a function where it is very easy to calculate F(x) given x, but very difficult, if at all possible, to work backwards and get x from F(x). A naive exemple would be: y = x2 It is easy to square a number, but much harder to extract its square root. Of course, real-world one-way functions are more complex than this, but you get the idea.
Non-linearity means that even a tiny change in the key results in major changes in the ciphertext. It is important so that an adversary does not get the feeling that "it begins to make sense" as he guesses more and more letters in your password. Ideally, even if he has all characters correct but for a single bit, the ciphertext should still be just as mystifying as if he had it all wrong.
A good way to achieve non-linearity is to use S boxes. A S box is nothing else than an array of randomly chosen numbers. A non-linear function may just return S[x], i.e. the element at position x in the array. Since these elements are chosen at random, chances are that two neighbouring elements have vastly different values. In addition, this is also a one-way function since a given number may appear at several places in the array: given this value it is impossible to tell which element was the one that returned it.
There is one problem with this approach though: remember that the function F in Blowfish takes a 32-bit number as an argument and returns another 32-bit number. If the function was to perform a simple array lookup we would need a S box containing 4 billions 32-bit numbers! To overcome this problem, the function F in Blowfish breaks up the 32-bit input parameter in 4 bytes. Each byte is used to fetch a 32-bit number from a different S box (S1 through S4), and the four numbers are combined with additions and XOR operations. Blowfish uses four S boxes to avoid the symetry problems that could occur for numbers like >11112222 if only one box were used. Together with the P box, the four S boxes make up the encryption key.
The algorithm for function F is:
Note how fast and easy it is to obtain F(x) from x: only 4 lookups, two additions (modulo 232) and an exclusive or. By contrast, given F(x) it is very hard to figure out what the value of x was. It may even be impossible, as F may return the same number for different values of x. Thus F is a non-linear, one-way function.
With all this theory behind us, it is now time for some assembly code. Here is my implementation of the coding and decoding functions for Blowfish. Note that they have been optimised for speed: they almost exclusively use registers and my advise would be to use a workspace located in the fast scratch-pad memory, such as >8300. For the same reason, I did not implement F as a distinct function, but rather integrated it into each routine. This saves us the 16 calls and return operations. To further speed up these routines, you may "open the loop", viz copy F 16 times in a row instead of using a counter. This way, you don't even need to swap Xr and Xl, and you save the time needed to decrement a counter, test it, and jump back to the beginning of the loop.
*--------------------------------------------------------------------- |
*--------------------------------------------------------------------- |
If you followed the theoretical part above, you will have noted that Blowfish uses a really large key: a P box containing 18 32-bit numbers and four S boxes, each with 256 entries. All this adds up to 4168 bytes. What we need now is a way to generate these boxes from a simple, easy to remember, password.
To do this, we'll first start with an arbitrary set of random values in the boxes. By convention these are the decimals of Pi, but any suite of random numbers will do. The nice thing with Pi is that, if you loose the source code, you can always reconstitute it. Also, you can verify that the numbers provided are indeed the decimals of Pi and not values carefully selected by a devious programmer so as to leave a weakness in your encryption scheme, a backdoor that will allow him (or the government) to easily decrypt all your messages...
Second, we'll XOR your password over the bytes in the P box. Your password should be at most 56 characters (I don't know why it cannot be longer). If it's shorter it will be repeated over and over again, until all the bytes in the P box have been XORed with a letter from your password.
Third, we'll Blowfish-encode an arbitrary chosen 64-bit value, using the P and S boxes that we just created. By convention, the value chosen is >0000000000000000. The ciphered result will replace the first 4 bytes in the P box. This ciphered value is then Blowfished again, using the new version of the key, and the result replaces the next 4 bytes in the P box.
The process is repeated 521 times, so as to replace each entry in the P box, then in the four S boxes. Each time, the new ciphered value is encoded again, using the version of the key that it has just created. The whole process is quite lengthy, and takes about 3 seconds with the routine below.
*--------------------------------------------------------------------- |
Note that I also wrote a tiny routine to wipe the key out of memory, just for security purposes.
Oh, and here are the decimals of Pi (in hexadecimal).
Finally, here are some test routines. They encode 512 bytes in an arbitrary chosen buffer.
DEF ENCODE,DECODE,INIKEY,WIPE *--------------------------------------------------------------------- |
To test:
B. Schneider. Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish). Fast Software Encryption, Cambridge Security Workshop Proceedings (December 1993). Springer Verlag, 1994, pp. 191-204.
B Schneider's company website: http://www.counterpane.com
A Blowfish implementation in C language by Paul Kosher (pck@netcom.com), found at the above website.