Implementing SEA on x51 and AVR
SEA - Scalable Encryption Standard - is a newly emerging block cipher from a group of belgian crypto-specialist, targeted primarily on microcontrollers.
The cipher is designed so, that it uses only elementary operations commonly found in microcontrollers - addition, logical AND, OR and XOR, rotation and moves. Thus, implementation of the cipher is straighforward and small, while performance is good.
Similarly to the vast majority of modern blok ciphers, also SEA is a Feistel cipher, but with a simple structure and maybe unusually high number of rounds. As it is only a few months old, there seems to be no independent cryptoanalysis yet, but the authors analysed it for several modes of attack quite exhaustively.
An unusual feature of the cipher is, that it takes into account the native word length of the microcontroller, thus defines e.g. 8-bit, or 32-bit version of the cipher. This increases the ease of its implementation. Although the different word-length versions are mutually incompatible, in the most common scenario this is not a big drawback, as the typical counterpart of the microcontroller is a bigger system (e.g. a PC) with abundant resources, where optimum implementation is not necessary.
Another remarkable feature of the cipher is - as its name indicates - its scalability. It is not an unheard-of feature - for example, one of the requirements for AES was to have variants of the cipher with respect to both block length and key length. But SEA is designed so that it enables virtually any block and key length, provided that both are equal and multiple of 6 word lengths. Of course, in practice a limited set of word length and block/key length will be used, for example the authors compared 96-bit SEA on 8-bit microcontrollers and 192-bit SEA on 32-bit microcontrollers. However, as requirements on key length increase yearly due to advancing technology used (potentially) for brute-force attacks, SEA offers a quick and easy upgrade path.
A minor - but handy - peculiarity of SEA is, that after performing the whole key schedule (i.e. a complete encryption or decryption cycle), the key returns to its initial state. It means that the key does not require any extra memory - it can be stored in the place where it is used.
While attempting to implement the cipher on two popular 8-bit platforms - x51 and AVR, the rather mathematically-oriented notation used by the authors caused confusion in several cases. For example, in the substitution, the "recursivity" means, that in the three consecutive steps the already modified values obtained in the previous steps are used, rather than the original value of the data/key. An another maybe confusing thing is the notation of the rounding (the "half-finished" square bracket) which was due to poor print overlooked; while it is easy to describe the algorithm with no rounding altogether (as the number of rounds is by definition always odd).
The presented implementations are written in assembly language of the respective microcontroller and are optimized for speed, resulting in several times larger code than if the author's guideline was followed. The rationale behind this decision is, that even in this way the code size is well below 1 kbyte and in modern microcontrollers the code memory (usually FLASH) comes very cheap in multi-kilobyte chunks. Also, one-way process is given as the reference implementation (only encryption or only decryption - often only one of them is used in embedded applications), although the both-way version is presented, too, coming at a small penalty in size and some decrease in performance.
SEA for x51
As set forth above, the implementation is slightly different than the description indicates. First, the even and odd rounds are separated, to avoid the left-right swap both in data and key schedule; the swap has been replaced by renaming left for right and vice versa in the even round. Second, the first half of rounds is separated from the second half, thus avoiding the branching based on round count, and also the left-right swap of data in the middle and the end of process is replaced by renaming. The resulting code increased in size almost 4 times, but due to the big number of rounds, even a relatively small change in the rounds' code may have significant impact on overall speed.
Scalability of this implementation is ensured by performing all the operations (both on data and key) strictly on 3-byte chunks. The presented implementation works on 96-bit key/data block, so there are two byte triplets manipulation in each data and key round. For larger bit size, simply copy the triplet-manipulation code as many times as needed and rename the triplet's variables respectively; also change the total number of rounds (as half the calculated round number, in the test in the first part). Even the smallest SEA (48-bit) can be derived easily in this way. Both data and key are stored in directly-accessed memory, so this pratically limits the size of key/data block to 288 bits in reasonable applications.
All three variants (encryption only (Sea), decryption only (iSea), and both way (SeaED)) are presented here in a single file - sea1.a51 - together with code size and cycles count.
SEA for AVR
The AVR implementation is a "manual translation" of the x51 implementation into the AVR assembly language. However, there are two differences. First, AVR does not have byte rotations (without involving carry), so that needs to be simulated by four instructions. More importantly, AVRs don't have an equivalent of x51's directly addressable memory, so both key and data block have to be kept in the register file. As keeping them completely or partially in RAM would meen permanent sweeping it in and out of registers with detrimental impact on performance, any upscaling is out of question.
Also the AVR implementations in all three variants (encryption only (Sea), decryption only (iSea), and both way (SeaED)) are presented here in a single file - sea1.asm - together with code size and cycles count. There is a fourth implementation - Sea2 - nicely demonstrating the possibility to decrease the code length in exchange for decreased speed.