Lecture 4: More ARM Goodies (by Trek Palmer)
-------------

Predication & Condition codes, continued
------------------------------

Recall the example from last lecture:

    if(foo == 0) {
        foo++;
    }

Assuming the value of foo is stored in R3:

        SUBS  R3, R3, #0
        ADDEQ R3, R3, #1

Note that the second instruction is ADDEQ, not the standard ADD. In ARM assembly language, the predicate for an instruction follows the mnemonic name. No predicate is the same as the predicate AL (for always). Note also that SUB has an S after it (SUBS). This tells the ARM to use the result of this instruction to set the condition codes.

So, the basic flow of the program is this. The ALU subtracts 0 from the value in R3, stores the result back in R3, and sets the condition codes accordingly. Then the next instruction has its predicate checked against the condition codes. Here, it is important to realize that the predicate names used in assembly code are for the convenience of the programmer, and the condition codes actually checked may not be immediately obvious. In the case of EQ, the processor checks the zero (Z) flag, which SUBS sets when its result is 0. So, in this case, if R3 = 0, the subsequent ADD instruction is executed; otherwise it is simply skipped.

Predication is not present in many architectures; on those machines, the way you accomplish a similar task is with a conditional branch instruction. The ARM also has a branch instruction. In most machines, branches work by simply modifying the value of the PC (or the nPC if the machine has one). Here, again, the ARM is a little wacky. Most machines don't let the programmer access the PC explicitly; normally the programmer can change the PC only through special instructions (like branch). But on the ARM, because R15 is the PC, you can actually use standard ALU ops (like ADD) to change the flow of control. Don't do this. It is bad programming style, hard to debug, and you'll run up against wacky alignment issues.
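To make the flag logic concrete, here is a small Python model of what SUBS does to the four ARM condition flags (N, Z, C and V) and of which flags each predicate actually tests. It is only an illustrative sketch: the names `sub_flags`, `CONDITIONS` and `if_foo_is_zero` are made up here, and the model assumes 32-bit registers. The last lines show why "the condition codes checked may not be immediately obvious": GE does not just look at the sign of the result, it compares N with the overflow flag V.

```python
MASK = 0xFFFFFFFF  # ARM registers are 32 bits wide


def sub_flags(a, b):
    """Return (result, flags) for a 32-bit subtraction a - b, as SUBS or CMP computes it."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    flags = {
        "N": result >> 31 == 1,                    # Negative: top bit of the result
        "Z": result == 0,                          # Zero: result is exactly 0
        "C": a >= b,                               # Carry: set when no borrow occurred
        "V": ((a ^ b) & (a ^ result)) >> 31 == 1,  # oVerflow: signed result wrapped
    }
    return result, flags


# Which flags each predicate tests. AL (no suffix) always passes.
CONDITIONS = {
    "AL": lambda f: True,
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "MI": lambda f: f["N"],
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: not f["Z"] and f["N"] == f["V"],
    "LE": lambda f: f["Z"] or f["N"] != f["V"],
}


def if_foo_is_zero(foo):
    """Model of SUBS R3, R3, #0 followed by ADDEQ R3, R3, #1, with foo in R3."""
    r3, flags = sub_flags(foo, 0)   # SUBS writes R3 back (unchanged) and sets flags
    if CONDITIONS["EQ"](flags):     # ADDEQ executes only when Z is set
        r3 = (r3 + 1) & MASK
    return r3


print(if_foo_is_zero(0))  # 1
print(if_foo_is_zero(7))  # 7

# 0x7FFFFFFF - (-1) overflows, so N is set even though the true answer is positive;
# GE still passes because it tests N == V rather than N alone.
_, f = sub_flags(0x7FFFFFFF, -1)
print(f["N"], f["V"], CONDITIONS["GE"](f))  # True True True
```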
So, using the branch instruction on the ARM (whose mnemonic is B), we can rewrite the preceding code thus:

        SUBS R3, R3, #0
        BNE  after
        ADD  R3, R3, #1
    after:

So here you see several new things. SUBS is the same, because we need to set the condition codes. The next instruction is the branch instruction. Note that it is predicated, but with a different predicate than ADD was previously: this time we're checking whether the value was not equal (NE) to zero. Also, the argument to B is a name, not a register or constant! This name is known as a label. It is a convenience that the assembler provides the assembly programmer, to make the code easier to write and easier to read. The assembler works out how much needs to be added to or subtracted from the PC in order to transfer control to the target label, and replaces the label with that numerical value.

In this case, the label "after" marks the spot just past the instruction following the branch. So, if the branch is taken, how far does the PC need to move (remember, ARM instructions are 4 bytes long)? The target is 8 bytes past the branch, so B needs to add 8 to the branch's address to skip the following instruction. So without labels, you would have to write something like BNE #8 yourself. Aren't labels useful? (Strictly speaking, the offset stored inside the instruction is not 8: in ARM state, reading the PC gives the address of the current instruction plus 8, so branch offsets are encoded relative to that address, and the stored offset for this BNE is actually 0. The assembler takes care of that detail too.)

Note that the ADD is no longer predicated. So now, the flow of control through this branch-enabled code is this: first the SUBS executes and sets the condition codes. Then the BNE's predicate is checked. If R3 != 0, the branch is taken and the PC changes to skip over the ADD following the branch; otherwise the branch is not taken, the PC advances normally, and the ADD executes.
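The assembler's label arithmetic can be sketched in a few lines of Python. This is a hypothetical two-pass toy, not a real assembler: it assumes every instruction is 4 bytes, treats any mnemonic starting with B as a branch, and applies the ARM-state rule that the PC reads 8 bytes ahead of the executing instruction to get the offset that is actually encoded. The names `assign_addresses` and `branch_offsets` are invented for this illustration.

```python
INSTRUCTION_SIZE = 4  # every 32-bit ARM instruction is 4 bytes
PC_READ_AHEAD = 8     # in ARM state, reading the PC gives this instruction's address + 8


def assign_addresses(program, base=0):
    """Pass 1: give each instruction an address; a label names the next instruction's address."""
    labels, instructions = {}, []
    address = base
    for line in program:
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            instructions.append((address, line))
            address += INSTRUCTION_SIZE
    return labels, instructions


def branch_offsets(program):
    """Pass 2: map each branch to (bytes from the branch to its target, encoded word offset)."""
    labels, instructions = assign_addresses(program)
    offsets = {}
    for address, text in instructions:
        mnemonic, *operands = text.split()
        if mnemonic.startswith("B"):  # B, BNE, BGE, BLT, ... in this toy
            target = labels[operands[0]]
            distance = target - address
            encoded = (target - (address + PC_READ_AHEAD)) // INSTRUCTION_SIZE
            offsets[text] = (distance, encoded)
    return offsets


program = ["SUBS R3, R3, #0", "BNE after", "ADD R3, R3, #1", "after:"]
print(branch_offsets(program))  # {'BNE after': (8, 0)}
```

The distance of 8 bytes is the BNE #8 from the text; the encoded word offset is 0 because the PC already reads 8 bytes ahead. A backward branch, like the ones used to build loops, simply gets a negative offset.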
Some Useful Predicates
----------------------

    EQ - equal
    NE - not equal
    GE - greater than or equal
    LE - less than or equal
    GT - greater than
    LT - less than
    MI - negative (minus)

GE, LE, GT and LT treat the compared values as signed numbers.

Another example
----------------

Java-ish code:

    int x;
    if(x < 0) x = 0;
    int z = x + 5;

Assuming x is in R1 and z is in R4, the equivalent ARM code would be:

        CMP   R1, #0
        MOVLT R1, #0
        ADD   R4, R1, #5

The CMP (compare) instruction is a useful instruction. It compares (that is to say, it subtracts) two values and sets the condition codes appropriately, but it does not store the result in any register. That makes CMP more convenient than SUBS for a test like this: it needs no destination register and leaves the compared value untouched. The MOVLT then sets x to 0 only when x was less than 0.

Now we can ask, what are the benefits of predication? Predication allows for simpler and shorter code in certain circumstances. It is the code size issue that is particularly important for the ARM, and it is probably the reason the designers decided to put predication in the processor. But now a new question arises: why would an architecture with predication still need a separate branch instruction?

Loops
======

The answer is that sometimes you need to branch backwards. Most of the time, you branch backwards in order to iterate/loop. So, predication can eliminate many branches, but whenever you need to loop, you'll have to use one. In Java, you're used to many different kinds of loops (for, while and do-while), but in assembly land all you have are labels and branches. Simple as they are, labels and branches can be used to construct loops equivalent to the more familiar for and while loops. For instance, consider the following Java-like code:

    for(int i = 0 ; i < 15 ; i++) {
        j = j + j;
    }

Assuming j is in R1, and that all the other registers are free, what would the ARM code for this loop look like?

w/o predication:

        SUB R0, R0, R0      ; i -> R0 and i = 0
    start_for:
        CMP R0, #15         ; is i < 15?
        BGE end_for
        ADD R1, R1, R1      ; j = j + j
        ADD R0, R0, #1      ; i++
        B   start_for
    end_for:

with predication:

        SUB   R0, R0, R0    ; i -> R0 and i = 0
    start_for:
        CMP   R0, #15       ; is i < 15?
        ADDLT R1, R1, R1    ; j = j + j
        ADDLT R0, R0, #1    ; i++
        BLT   start_for

Another example
----------------

    while(i < j) {
        i = i * 3;
    }
    j++;

Assume i is in R2 and j is in R7, and that R3 is free to hold the constant 3: MUL only accepts register operands, so it cannot multiply by #3 directly. (The operands are written R2, R3, R2 rather than R2, R2, R3 because older ARM cores require MUL's destination register to differ from its first source register.)

w/o predication:

        MOV R3, #3          ; the multiplier
    start_while:
        CMP R2, R7          ; is i < j?
        BGE end_while
        MUL R2, R3, R2      ; i = i * 3
        B   start_while
    end_while:
        ADD R7, R7, #1      ; j++

with predication:

        MOV   R3, #3        ; the multiplier
    start_while:
        CMP   R2, R7        ; is i < j?
        MULLT R2, R3, R2    ; i = i * 3
        BLT   start_while
        ADD   R7, R7, #1    ; j++
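One way to convince yourself that a branching loop and its predicated counterpart compute the same thing is to run both. Below is a hypothetical toy interpreter in Python for just the instructions used in this lecture (MOV, ADD, SUB/SUBS, MUL, CMP and B, each with an optional predicate from the list above). It is a sketch under stated assumptions, not a real ARM emulator: `decode`, `run` and the listing names are invented, only SUB/SUBS and CMP update the flags, and any register you don't set starts at 0.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

# Flag test behind each predicate; no suffix means AL (always).
CONDITIONS = {
    "AL": lambda f: True,
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "MI": lambda f: f["N"],
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: not f["Z"] and f["N"] == f["V"],
    "LE": lambda f: f["Z"] or f["N"] != f["V"],
}
BASE_OPS = ("CMP", "ADD", "SUB", "MUL", "MOV", "B")  # B last: BNE, BLT, ... start with B


def decode(mnemonic):
    """'ADDLT' -> ('ADD', 'LT', False); 'SUBS' -> ('SUB', 'AL', True)."""
    for op in BASE_OPS:
        if mnemonic.startswith(op):
            rest = mnemonic[len(op):]
            set_flags = rest.endswith("S")
            cond = (rest[:-1] if set_flags else rest) or "AL"
            if cond not in CONDITIONS or (set_flags and op != "SUB"):
                raise ValueError(f"not modeled: {mnemonic}")
            return op, cond, set_flags
    raise ValueError(f"not modeled: {mnemonic}")


def run(source, **initial):
    """Execute a listing; keyword args such as R1=1 give initial register values."""
    labels, program = {}, []
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()      # drop comments and indentation
        if line.endswith(":"):
            labels[line[:-1]] = len(program)  # label -> index of the next instruction
        elif line:
            program.append(line)
    regs = {f"R{i}": 0 for i in range(15)}
    regs.update({name: value & MASK for name, value in initial.items()})
    flags = {"N": False, "Z": False, "C": False, "V": False}

    def val(operand):                         # '#imm' or a register name
        return int(operand[1:]) & MASK if operand.startswith("#") else regs[operand]

    pc = 0
    while pc < len(program):
        mnemonic, _, args = program[pc].partition(" ")
        ops = [o.strip() for o in args.split(",")]
        op, cond, set_flags = decode(mnemonic)
        pc += 1                               # default: fall through to the next instruction
        if not CONDITIONS[cond](flags):
            continue                          # predicate failed: instruction is skipped
        if op == "B":
            pc = labels[ops[0]]
        elif op == "MOV":
            regs[ops[0]] = val(ops[1])
        elif op == "ADD":
            regs[ops[0]] = (val(ops[1]) + val(ops[2])) & MASK
        elif op == "MUL":
            if any(o.startswith("#") for o in ops):
                raise ValueError("MUL takes register operands only")
            regs[ops[0]] = (val(ops[1]) * val(ops[2])) & MASK
        else:                                 # SUB, SUBS or CMP: compute a - b
            a, b = (val(ops[0]), val(ops[1])) if op == "CMP" else (val(ops[1]), val(ops[2]))
            result = (a - b) & MASK
            if op == "SUB":
                regs[ops[0]] = result
            if op == "CMP" or set_flags:
                flags = {"N": result >> 31 == 1, "Z": result == 0, "C": a >= b,
                         "V": ((a ^ b) & (a ^ result)) >> 31 == 1}
    return regs


FOR_BRANCH = """
    SUB R0, R0, R0      ; i -> R0 and i = 0
start_for:
    CMP R0, #15         ; is i < 15?
    BGE end_for
    ADD R1, R1, R1      ; j = j + j
    ADD R0, R0, #1      ; i++
    B   start_for
end_for:
"""
FOR_PREDICATED = """
    SUB   R0, R0, R0
start_for:
    CMP   R0, #15
    ADDLT R1, R1, R1
    ADDLT R0, R0, #1
    BLT   start_for
"""

print(run(FOR_BRANCH, R1=1)["R1"], run(FOR_PREDICATED, R1=1)["R1"])  # 32768 32768
```

Both versions double j fifteen times, but the predicated listing does it with four instructions instead of six, which is the code-size benefit mentioned earlier. The while loop can be run the same way once its multiply uses a register operand (3 held in R3, MUL R2, R3, R2 or MULLT R2, R3, R2): starting from R2=2 and R7=50, both the branching and the predicated versions finish with R2 = 54 and R7 = 51. Feeding the interpreter MUL R2, R2, #3 raises an error, since this toy, like real MUL, accepts only register operands.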