Libraries (by Trek Palmer)
=============

Libraries in Java
------------------
In Java, libraries are referenced with the import keyword.  In order to
reference Java code in a library, you need to know its name.

Libraries in assembly
---------------------
In assembly-land, in order to call a function, you need to know its address.
This serves the same purpose as the name in Java.  However, if the code is
stored in a different file (as is the case with libraries), how do you know
what the address is?  This is a tricky problem, and it is the crux of linking.

Static Linking
---------------
On Unix systems, there is a tool (known, conveniently enough, as the linker)
that links together files that reference each other.  So if you had two
assembly files, foo.s and bar.s, that were coded thus:

    in file foo              in file bar
    -------------            --------------
    foo:                     bar:
      STMDB R13!, {R14}        STMDB R13!, {R14}
      ADD R4, R4, #5           SUB R4, R4, #12
      LDMIA R13!, {R14}        LDMIA R13!, {R14}
      MOV PC, R14              MOV PC, R14
      ...                      ...
      BL bar                   BL foo
      ...                      ...

how could they be assembled to appropriately call each other?

Well, the first thing is that the linker will need access to the symbol table
of the other file.  Here, we have two options.  We can just concatenate the two
files, suitably adjusting the symbol table entries to reflect the new position
of the code, and then reassemble the combined file.  This, of course, only
works if you have the source code handy, which is terribly rare.  Companies are
not in the habit of handing their source code out, as this tends to cut into
their profits.

So the further question is: how do we link together a source file and a binary
file?  If the binary file has a symbol table in it, then we can calculate the
offset of the various functions within the text segment of the file itself.
So, we get the symbol table out of the binary file.  We do an initial pass
through the source file to calculate how large it is.  This value then becomes
an offset added to each of the binary's symbol table entries.
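
The offset adjustment just described can be sketched roughly as follows.  This
is an illustration in Python, not how any particular linker is written: the
helper rebase_symbols, the symbol names, and every number below are made up,
and the sketch assumes the binary's code is placed directly after the code
assembled from the source file.

```python
def rebase_symbols(binary_symbols, source_size):
    """Return a copy of the binary's symbol table shifted by source_size bytes."""
    return {name: addr + source_size for name, addr in binary_symbols.items()}


# Assumed values, for illustration only.
source_size = 0x40                           # size of the assembled source file
binary_symbols = {"bar": 0x00, "baz": 0x1C}  # offsets inside the binary's text segment

# Addresses of the binary's functions once it sits after the source code.
combined = rebase_symbols(binary_symbols, source_size)
```

Under these assumptions bar, which sat at offset 0x00 in the binary, ends up at
0x40 in the combined image, which is the value a BL bar in the source file
would need to target.
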
So when you hit a line of code like BL bar, you look up bar in the symbol table
(which is usually a hash table of some kind), add the offset to it, and then
use that value as the target for the BL instruction.

Of course, this works only when the source file is the only one making calls
into a binary.  What do you do if you have multiple binary files that each need
to call each other?  In the above example, this situation emerges if you
separately assemble foo.s and bar.s, but then need to link them together to
create a usable executable.  This is actually the common case.  Apart from
introductory computer science programming courses, separate compilation is the
norm.

So, how do you link together multiple assembled binaries?  The symbol table is,
once again, necessary.  But because we can't usually rewrite the binary on disk
to patch the BL instructions, we can't use the simple trick above.  In fact, we
can't directly call functions anymore.  We need to introduce a level of
indirection.  If, instead of branching directly to the function, we were to
jump to a function that would retrieve the address from a table, we would be
able to change the address of a function without having to rewrite the code.
So, if the code were to become:

    BL bar    ->    MOV R4, #
                    BL getfunctionaddr
                    ADD PC, PC, R4
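
To make the idea of indirection concrete, here is a minimal sketch in Python
rather than assembly.  It is only a model: addresses are plain integers, the
table layout, the slot index BAR, and the addresses are all assumptions, and
get_function_addr merely stands in for the getfunctionaddr routine named above.

```python
# Hypothetical function table: one slot per externally defined function.
function_table = [0]

BAR = 0  # assumed slot index for bar (the value the caller would load into R4)


def get_function_addr(index):
    """Stand-in for getfunctionaddr: return the address stored in a slot."""
    return function_table[index]


# The linker fills in the slot once it knows where bar lives.
function_table[BAR] = 0x8040
first_target = get_function_addr(BAR)

# If bar moves, only the table entry changes; the calling code is untouched.
function_table[BAR] = 0x9000
second_target = get_function_addr(BAR)
```

The calling sequence is identical in both cases; only the table entry differs,
which is what lets the address change without rewriting the code.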