Lecture 9: Function calls (by Trek Palmer)
==========================================

As seasoned Java coders, you are used to functions. It may surprise you to learn that architectures (and their corresponding assembly languages) also support functions. Of course, functions at the hardware level operate somewhat differently from their Java counterparts, but many of the same features obtain. To illustrate, consider the following mock Java code:

    class A {
        public static int foo(int bar) {
            int ret = bar;
            if (bar < 0)
                ret = ret * -1;
            return ret + 1;
        }

        public static void main(String[] args) {
            int baz = foo(17);
        }
    }

First, the differences. In Java, all functions must be part of a class. This is because Java is object-oriented through and through. There is no similar requirement at the assembly level. Assembly functions are just collections of instructions. If you tell the processor that an address out there is the start of a stream of instructions (i.e. a function), it'll treat whatever is there as a stream of instructions. Also, Java requires type and protection information to be attached to functions. Because most architectures have no notion of this kind of data protection, annotations like 'public' and 'private' are meaningless. Likewise, most architectures don't support typing, so the type information is unnecessary. In fact, assembly functions often have no explicit way of knowing how many parameters they were given.

Now that we know that assembly functions have no notion of protection, type-safety, or function arity, we can examine the similarities. Like Java functions, assembly functions have names. These names are labels, which means that although we use strings to express them, the CPU understands them as addresses. Like Java functions, assembly functions can take parameters; however, there is no way to verify the type or even the number of parameters! Also like Java functions, assembly functions can return values.

Consider the following translation:

    public static int foo(int bar) {
        int ret = bar;
        if (bar < 0)
            ret = ret * -1;
        return ret + 1;
    }

    ==> (bar is R4)

    foo:
        MOV   R5, R4        ; ret = bar
        MOV   R6, #0
        SUB   R6, R6, #1    ; generating -1
        CMP   R5, #0
        MULLT R5, R5, R6    ; if (bar < 0) ret = ret * -1
        ADD   R5, R5, #1
        MOV   R4, R5        ; putting ret in the return register
        MOV   PC, R14       ; returning

So first we see that functions are responsible for marshaling their own parameters. Foo "knows" that its first parameter will be in R4. The next two instructions generate the constant -1 in a register (MUL takes only register operands, and -1 cannot be encoded directly as a MOV immediate). Only then do we get into the actual body of the function. After the ADD that computes the return value, the next instruction places the return value in a register where the calling function "knows" it will be. The last instruction moves R14 (the link register) into the PC. This causes the CPU to return into the calling function. The first bits of code, where the function gathers its parameters, are commonly referred to as the function prologue. The last few instructions, where the return value is placed in the expected register and control returns to the calling function, are often called the function epilogue.

A few things are worth noting. First, registers are global: when foo is called, the registers contain whatever values the calling function put there. This implementation of foo will overwrite (the technical term is clobber) the values in R5, R6 and R4. If main was keeping any values it still needed in those registers, then calling foo can actually break the program. This is bad.

How do we fix this problem? Several solutions have been tried in architecture. One is the notion of register windows: in this scheme functions can request a fresh set of registers, so they don't have to worry about clobbering the caller's data. From an assembly programmer's point of view, this seems like an ideal solution.
But it plays merry hell with compilation and complicates the hardware considerably, so it is used on only a few systems (the SPARC and the Itanium have register windows). The far more popular solution is to save the values of the registers you're going to use out to memory. In its prologue, the function saves the values of the registers it is going to use, and in its epilogue it restores those values to the registers just before it returns. Now look at this code:

    foo:
        STM   R13, {R5, R6}  ; saving registers
        ADD   R13, R13, #8   ; moving pointer
        MOV   R5, R4         ; ret = bar
        MOV   R6, #0
        SUB   R6, R6, #1     ; generating -1
        CMP   R5, #0
        MULLT R5, R5, R6     ; if (bar < 0) ret = ret * -1
        ADD   R5, R5, #1
        MOV   R4, R5         ; putting ret in the return register
        SUB   R13, R13, #8   ; moving pointer back
        LDM   R13, {R5, R6}  ; restoring old values
        MOV   PC, R14        ; returning

In this code, I'm using two new instructions, LDM and STM (load and store multiple, respectively). STM RX, {RY, RZ, RW} will store the values in RY, RZ, RW at addresses beginning with [RX]. So [RX] will have RY in it, [RX + 4] will have RZ, and [RX + 8] will have RW (assuming RY, RZ, RW are listed in ascending register order, which is the order the hardware uses). LDM will load those values from memory and put them in the registers specified in the register list. Now the function prologue saves out R5 and R6, while the function epilogue restores R5 and R6 to their old values.

Why didn't I save and restore R4? R4 is special: R4 was used to pass a parameter to foo, and R4 was used to return a value back to the calling function. This is an example of a calling convention, which is a system (which may or may not be enforced by hardware) for calling functions, passing parameters, and returning values back.

Now, using STM and LDM, we can write functions that play nice with caller data. But there are still some unresolved issues. The first one is: how does one even call a function? Consider the following translation:

    public static void main(String[] args) {
        int baz = foo(17);
    }

    => (baz in R4)

    main:
        MOV R4, #17   ; passing the parameter
        BL  foo

That's it. Once the parameter is in R4, you just use a variant of branching to call a function.
BL is different from B in only one respect: BL saves the address of the following instruction in R14 (also known as the link register). This is why, in the implementation of foo, we moved R14 into PC in order to jump back to the calling function. This works much like function calling in Java: when the callee returns, execution resumes in the caller just after the call. In assembly, this is accomplished by dedicating a register to holding this address (known as a return address). But, of course, this has its own, assembly-specific implications.

Our scheme so far is incomplete. Specifically, it is not re-entrant: it breaks as soon as a function that was itself called makes another call. To see why this is a problem, consider the following Java pseudo-code:

    int bar()               |  bar:
    {                       |
        int a = baz();      |      BL    baz
        return a * 2;       |      MOV   R4, R4, LSL #1   ; a * 2 (MUL has no immediate form)
    }                       |      MOV   PC, R14
                            |
    int baz()               |  baz:
    {                       |      STM   R13!, {R5}
        int b = gah();      |      BL    gah
        return b + 1;       |      ADD   R5, R4, #1
    }                       |      MOV   R4, R5
                            |      LDMDB R13!, {R5}
                            |      MOV   PC, R14
                            |
    int gah()               |  gah:
    {                       |      MOV   R4, #14
        return 14;          |      MOV   PC, R14
    }                       |

(The ! after R13 tells STM and LDM to write the updated address back into R13, and LDMDB walks downward in memory so that it undoes the STM. Both are covered in more detail at the end of these notes.)

Assuming we start executing in bar, we jump right away to baz. Now we're at baz and R14 has the address of the shift instruction in bar (the one just after the BL). baz now saves the value of R5, and then jumps into gah. Now we're executing at gah and R14 has the address of the ADD instruction in baz. gah moves the constant 14 into the return register and then jumps back to the calling function (in this case, baz). We resume execution at the ADD. After the ADD, R5 has the value 15 in it, and baz then moves that value into the return register. Then baz restores the value of R5 and prepares to jump back to the calling function. And this is where things go haywire. Remember, R14 still has the address of the ADD instruction in baz. When baz moves the link register into PC, control is NOT going to be transferred back to bar, but rather to the ADD instruction in baz. This is bad.

So, how do we fix this? The problem is that R14 is shared across all function calls.
We need to preserve the value of R14 through function calls, but because R14 should really only be affected by BL invocations, we can save R14 in the function prologue and restore it in the epilogue. So, if we replace STM R13!, {R5} with STM R13!, {R5, R14} and likewise add R14 to the register list of the matching LDM, then baz won't clobber R14 and when it returns it'll return into bar. The same goes for bar, or any other function that makes a call of its own.

The Stack
---------

If you look at how the functions save data you see a picture like this:

    bar | baz | gah

Also note that for the purpose of saving registers, all a function needs to know is where the last function's data stopped. And to restore registers, all the function needs to know is where its own data starts. Because the function knows how many registers it's saving (and therefore the total size of its saved data), it can calculate the starting point of its data from the location of the last function's data. Therefore all the function needs is a pointer to the end of the last function's data. If we turn the previous picture on its side and add the pointer, it looks like:

    ---   <------------ Pointer (after STM is executed)
    gah
    ---
    baz
    ---
    bar
    ---

A called function adds its data to the top, and removes its data from the top when it returns (reassigning the pointer as necessary). If we name the pointer top, you can see that this structure resembles a stack. And, in fact, systems commonly use a stack to hold saved function data. In some register-starved systems (like the 32-bit x86) nearly everything goes on the stack (including parameters and return addresses). In most systems the pointer to the top of the stack is called the stack pointer (or sp). Some systems have an additional pointer to the previous value of top, known as the frame pointer (or fp).

When using a stack, the function's prologue needs to reset the sp (R13 on the ARM), and then store the values out to memory. In the epilogue, the function needs to restore the values and then reset the sp.
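
To make the R14 discussion concrete, here is a minimal sketch in Java of a toy register machine that runs the bar/baz/gah example. It is not an ARM emulator: the opcode names, the class name LinkRegisterDemo, the main stub at address 0, and the step budget are all assumptions made up for this illustration. When saveLr is false, the slots that would save and restore R14 are no-ops, which reproduces the bug described above; when it is true, every function that makes a call pushes R14 in its prologue and pops it in its epilogue. The toy stack grows upward, matching the picture above.

```java
/** Toy register machine showing why R14 must be saved; not a real ARM emulator. */
class LinkRegisterDemo {
    // Hypothetical opcodes invented for this sketch.
    static final int BL = 0, RET = 1, PUSH = 2, POP = 3, MOVI = 4,
                     ADDI = 5, MOV = 6, ADD = 7, HALT = 8, NOP = 9;
    static final int LR = 14; // link register, as in the text

    /** Builds main/bar/baz/gah; if saveLr is false the R14 save/restore slots are NOPs. */
    static int[][] program(boolean saveLr) {
        int[] pushLr = saveLr ? new int[] {PUSH, LR} : new int[] {NOP};
        int[] popLr  = saveLr ? new int[] {POP, LR}  : new int[] {NOP};
        return new int[][] {
            /*  0 main */ {BL, 2},
            /*  1      */ {HALT},
            /*  2 bar  */ pushLr,
            /*  3      */ {BL, 7},
            /*  4      */ {ADD, 4, 4, 4},   // a * 2
            /*  5      */ popLr,
            /*  6      */ {RET},
            /*  7 baz  */ {PUSH, 5},
            /*  8      */ pushLr,
            /*  9      */ {BL, 15},
            /* 10      */ {ADDI, 5, 4, 1},  // b + 1
            /* 11      */ {MOV, 4, 5},
            /* 12      */ popLr,
            /* 13      */ {POP, 5},
            /* 14      */ {RET},
            /* 15 gah  */ {MOVI, 4, 14},
            /* 16      */ {RET},
        };
    }

    /** Runs the program; returns R4 at HALT, or -1 if the step budget runs out. */
    static int run(boolean saveLr, int maxSteps) {
        int[][] code = program(saveLr);
        int[] r = new int[16];
        int[] stack = new int[64];
        int sp = 32; // start mid-array so stray POPs in the buggy run stay in bounds
        int pc = 0;
        for (int step = 0; step < maxSteps; step++) {
            int[] insn = code[pc++];
            switch (insn[0]) {
                case BL:   r[LR] = pc; pc = insn[1]; break; // pc already points past the BL
                case RET:  pc = r[LR]; break;               // MOV PC, R14
                case PUSH: stack[sp++] = r[insn[1]]; break;
                case POP:  r[insn[1]] = stack[--sp]; break;
                case MOVI: r[insn[1]] = insn[2]; break;
                case ADDI: r[insn[1]] = r[insn[2]] + insn[3]; break;
                case MOV:  r[insn[1]] = r[insn[2]]; break;
                case ADD:  r[insn[1]] = r[insn[2]] + r[insn[3]]; break;
                case HALT: return r[4];
                default:   break;                           // NOP
            }
        }
        return -1; // never reached HALT
    }

    public static void main(String[] args) {
        System.out.println("saving R14:     " + run(true, 100));
        System.out.println("not saving R14: " + run(false, 100));
    }
}
```

With R14 saved, run(true, 100) returns 30, which is (14 + 1) * 2. Without it, run(false, 100) returns -1: baz keeps returning to its own ADD and never gets back to bar.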

On many systems, for reasons of history and virtual memory, the stack actually grows downwards in memory. So, the prologue would look thus:

    SUB R13, R13, #12
    STM R13, {R5, R6, R8}

And the epilogue:

    LDM R13, {R5, R6, R8}
    ADD R13, R13, #12

Now, because this happens all the time and LDM and STM are used primarily for functions, the ARM instruction set has collapsed the SP increment/decrement functionality into LDM and STM:

    SUB R13, R13, #12          ===>  STMDB R13!, {R5, R6, R8}
    STM R13, {R5, R6, R8}

    LDM R13, {R5, R6, R8}      ===>  LDMIA R13!, {R5, R6, R8}
    ADD R13, R13, #12

DB stands for "decrement before" and IA for "increment after"; the ! asks for the updated address to be written back into R13.
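
The effect of STMDB and LDMIA with writeback can be sketched in Java as follows. This is only a model under stated assumptions: the class DescendingStackDemo and its methods stmdb and ldmia are hypothetical names, memory is an array of 4-byte words, the register list is assumed to be given in ascending order, and the base register is assumed not to appear in the list.

```java
/** Toy model of a downward-growing stack using STMDB / LDMIA with writeback. */
class DescendingStackDemo {
    final int[] mem = new int[64]; // each slot stands for one 4-byte word
    final int[] r = new int[16];   // R0..R15

    DescendingStackDemo() {
        r[13] = mem.length * 4;    // sp starts just past the top of memory
    }

    /** STMDB Rbase!, {regs}: move the base down first, then store upward from it. */
    void stmdb(int base, int... regs) {
        r[base] -= 4 * regs.length;
        int word = r[base] / 4;
        for (int k = 0; k < regs.length; k++) {
            mem[word + k] = r[regs[k]]; // lowest register goes to the lowest address
        }
    }

    /** LDMIA Rbase!, {regs}: load upward from the base, then move the base up. */
    void ldmia(int base, int... regs) {
        int word = r[base] / 4;
        for (int k = 0; k < regs.length; k++) {
            r[regs[k]] = mem[word + k];
        }
        r[base] += 4 * regs.length;
    }

    public static void main(String[] args) {
        DescendingStackDemo cpu = new DescendingStackDemo();
        cpu.r[5] = 11; cpu.r[6] = 22; cpu.r[8] = 33;
        cpu.stmdb(13, 5, 6, 8);                  // prologue: STMDB R13!, {R5, R6, R8}
        cpu.r[5] = 0; cpu.r[6] = 0; cpu.r[8] = 0; // function body clobbers the registers
        cpu.ldmia(13, 5, 6, 8);                  // epilogue: LDMIA R13!, {R5, R6, R8}
        System.out.println(cpu.r[5] + " " + cpu.r[6] + " " + cpu.r[8] + " sp=" + cpu.r[13]);
    }
}
```

After the prologue the stack pointer is 12 bytes lower than where it started, and after the epilogue both the registers and the stack pointer are back to their original values.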