Lecture 11: Instruction encoding (by Trek Palmer)
=================================

Up until now, we have been using ARM instructions without any
understanding of how they are represented within the machine. On the ARM,
all instructions (excluding Thumb) are 32 bits wide. Some architectures
support variable-length encodings, but for reasons of efficiency and ease
of compilation, fixed-length encodings are standard on RISC designs.

An instruction encoding is a method that maps an operation and its set of
arguments into an integer. Because this integer is used to drive control
logic, the encoding is motivated by hardware concerns more than by the
convenience of assembler implementors or assembly coders. Some
architectures are concerned with maximizing code density (x86), so their
encodings are really convoluted and designed to squeeze as much into the
bits as they can. In general, modern architectures encode their
instructions so that they're efficiently decodable and have room for
future extensions. The ARM diverges from this somewhat.

An Example encoding: the ADD instruction
-----------------------------------------

First off, every ARM instruction is predicated, so space in the
instruction encoding must be allocated to the predicate. The ARM has 16
predicates, so 4 bits are necessary. The ARM folks decided to allocate
these four bits at the head of the instruction:

31  28                                                                  0
+----+-------------------------------------------------------------------+
|Cond|                                                                   |
+----+-------------------------------------------------------------------+

Now, because the logical and arithmetic operations have the same basic
syntax and format, it would make sense to have their instruction
encodings look the same. This is the case on the ARM. But we need a way
of distinguishing one arithmetic operation from another, so there is a
special field (usually known as the opcode) that distinguishes
instructions.
For ADD on the ARM, the opcode is 0100, stored in bits 24-21:

31  28                                                                  0
+----+--+-+----+---------------------------------------------------------+
|Cond|00| |0100|                                                         |
+----+--+-+----+---------------------------------------------------------+

Other operations have different opcodes: AND is 0000 and MOV is 1101.
Also note that bits 27 and 26 are 0. This is the part of the encoding
that identifies the instruction as an arithmetic/logic instruction
(rather than a branch or a load or something).

ADD, like other arithmetic operations, takes three arguments: two inputs
and an output. Because the ARM is a register-register machine, the output
argument is a register. And because the ARM is the way it is, the first
input argument is also a register. Because the ARM has 16 registers, four
bits are sufficient to encode a specific register. So we need to
sacrifice 8 bits to encode the output and one of the input registers.

31  28            19   15                                               0
+----+--+-+----+-+----+----+---------------------------------------------+
|Cond|00|I|0100|S| Rn | Rd |                                             |
+----+--+-+----+-+----+----+---------------------------------------------+

Rd is the target register and Rn is the first argument. S is the bit that
signifies whether this is an ADD or an ADDS instruction (S is 1 for
ADDS).

The second input argument is tricky. There are three basic classes of
argument:

  ADD Rx, Ry, #Z              immediate
  ADD Rx, Ry, Rz, LSR #4      immediate shift
  ADD Rx, Ry, Rz, LSL Rw      register shift

For immediates, the ARM is a little wacky. Rather than just encoding a
12-bit constant in the remaining bits, the ARM designers chose to allow
an 8-bit constant with a four-bit rotation argument. Note that bit 25 (I)
is 1, which indicates that the low 12 bits encode an immediate value.
The constant value being added to Rn is immed_8 rotated right by
2*rotate_imm.

31  28            19   15   11         7                                0
+----+--+-+----+-+----+----+----------+----------------------------------+
|Cond|00|1|0100|S| Rn | Rd |rotate_imm|             immed_8              |
+----+--+-+----+-+----+----+----------+----------------------------------+

So, for ADD R0, R1, #0xff000000, the encoding would be:

31  28            19   15   11         7                                0
+----+--+-+----+-+----+----+----------+----------------------------------+
|Cond|00|1|0100|S|0001|0000|   0100   |             11111111             |
+----+--+-+----+-+----+----+----------+----------------------------------+

Here immed_8 is 0xff and rotate_imm is 4: rotating 0xff right by 2*4 = 8
bits yields 0xff000000.

For the immediate shift, we need to encode several things: the register
we're shifting, the amount we're shifting by, and the kind of shift.

31  28            19   15   11       7 6   5 4 3                        0
+----+--+-+----+-+----+----+----------+-----+-+--------------------------+
|Cond|00|0|0100|S| Rn | Rd | shift_imm|shift|0|            Rm            |
+----+--+-+----+-+----+----+----------+-----+-+--------------------------+

shift_imm holds the amount to shift by, and shift encodes what kind of
shifting needs to be done (LSL is 00). Bit 25 (I) is 0, since the second
argument is no longer an immediate. So, for ADD R0, R1, R2, LSL #2 the
encoding is:

31  28            19   15   11        7                                 0
+----+--+-+----+-+----+----+-----------+----+---+------------------------+
|Cond|00|0|0100|S|0001|0000|   00010   | 00 | 0 |          0010          |
+----+--+-+----+-+----+----+-----------+----+---+------------------------+

For register shifts, all we need is to replace the shift_imm with a
register, Rs. Bit 4 becomes 1, which is what distinguishes a register
shift from an immediate shift, and bit 7 is 0:

31  28            19   15   11     8 7 6   5 4 3                        0
+----+--+-+----+-+----+----+--------+-+-----+-+--------------------------+
|Cond|00|0|0100|S| Rn | Rd |   Rs   |0|shift|1|            Rm            |
+----+--+-+----+-+----+----+--------+-+-----+-+--------------------------+

Another encoding, B/BL
-----------------------

Branching is simple. First, there's the opcode (bits 27-25 are 101); then
we need to know whether this is a normal branch or a BL; and lastly we
need the offset (namely, the amount to add to the PC). So here we go:

31  28    24 23                                                         0
+----+---+--+------------------------------------------------------------+
|Cond|101| L|                      signed_immed_24                       |
+----+---+--+------------------------------------------------------------+

L is 1 for BL and 0 for B. The offset is stored as a signed count of
32-bit words: the hardware sign-extends signed_immed_24 and shifts it
left by two bits before adding it to the PC.