Dynamic Libraries (by Trek Palmer)
==================================

As we saw in the last lecture, even in the static case, if you want to support separate compilation and linking (generally considered a good thing), then you have to replace direct function calls to external code with calls to a lookup function. This introduces a level of indirection, which allows a linker to link binaries together by modifying just a few well-defined global data structures (until now, all we've seen is the symbol table).

To summarize,

    BL foo

isn't good if foo is an external function, because to link together two binary files, the code would have to be rewritten so that the call used the correct offset. This is generally impossible (or, at the very least, unwise). It also complicates assembling and makes it impossible to link files together incrementally (like doing some at compile time, and then the rest at run time). So to solve this problem:

    BL foo  =>  BL lookup

Here, we've replaced the general problem of calling functions that are defined in other binaries with the simpler problem of calling a linker-supplied routine to fetch the address and then calling the function indirectly. As long as the linker places the lookup code in the binary before it finishes linking it, we should be ok. In general, service routines like lookup are guaranteed to be stored at known locations in memory, so the compiler can do most of the legwork.

Dynamic Linking
---------------

Up until now, we've been discussing static linking, which is the process of taking pre-compiled/assembled binary files and combining them into one large monolithic executable. In modern systems, this is not how most executables work. Most binaries are dynamically linked, by which I mean that they depend on binary files on the system that are only linked against when the executable is run.

In the static case, each file knows which external functions it is calling and which internal functions it defines. This is the same for the dynamic case.
The difference is that the actual process of linking happens AFTER the program is loaded in memory. And, in some cases, it happens on demand AS THE PROGRAM IS RUNNING!

So now, binaries need the following information to support linking:

1) Table of internally defined functions (for others to link against)
2) Table of externally called functions (for the linker to resolve)
3) Table of libraries needed (so the linker knows what to fetch from the file system)

Each binary file will have this data. Also bear in mind that there are now several global data structures floating around to support the indirect referencing that linking requires. This is a lot of stuff to keep track of, and rather than placing each thing at a known distinguished address (which would have the effect of partitioning the address space), we have one data structure at a distinguished address that tells us where everybody else is. This is called the Global Offset Table or GOT. You look up the location of everything external or global in the GOT. So now static data, as well as functions, must be accessed in an indirect fashion.

    ADR Rx, foo

      =>

    ADR Ry, GOT_BASE            ; Ry = address of the GOT
    LDR Ry, [Ry, #foo_offset]   ; Ry = GOT entry for foo (its address)
    LDR Rx, [Ry]                ; Rx = foo itself

Bear in mind that this is the simplest case, where foo_offset can be known in advance. In general, it may be necessary to calculate the offset (as in the case of a hash table).

At this point, we're entering into system specifics. Each OS/binary format pair has its own particular linking semantics, but for the rest of this discussion we're going to be using one of the most popular formats, ELF, as a motivating example. Using ELF isn't so bad, because it is used on most modern UNIX systems, and it was designed as the successor to COFF, which is also the basis for the Windows binary format, PE.

Dynamic Linking in ELF
----------------------

The first thing about ELF is that all library code is so-called Position Independent Code (PIC). This is code that uses offsets ONLY for local calls/loads/stores.
Therefore, it can be loaded anywhere in memory. This is very handy for libraries, as the loader can just place them anywhere convenient. Note that many binary formats do not require this of their libraries. In Windows, for instance, all DLL code is merely relocatable, which means that it wants to be located at a specific place in memory, and if that place isn't available, the linker/loader has to rewrite portions of the code so that it will execute properly at its new location.

In an ELF file, there is a .dynamic section. It has many things in it, but the most important are:

    NEEDED - names of the libraries needed by this executable/library
    SONAME - this library's own name (the name other binaries list in NEEDED, and that the linker uses to find it on the filesystem)
    SYMTAB - pointer to the symbol table
    PLTGOT - pointer to the GOT and the PLT
    INIT   - pointer to code to be executed on load
    FINI   - pointer to code to be executed on program exit

NEEDED and SONAME are used by the linker to fetch all the libraries that this program needs. The linker ends up doing an exhaustive dependency search through all the libraries, topologically sorting them, then loading them in that order. So, for instance, in almost all C programs there will be a reference to the C library (libc). Somewhere in NEEDED, then, there will be the string libc. Actually, there is usually version information attached to it as well, for instance libc.so.6 on Linux, meaning libc version 6. On many systems, library versioning is a hack (or just broken, as on Windows). OS X, however, has gotten it right. In OS X, it is possible to have many, many different versions of a library co-existing peacefully, so that older binaries can co-exist with younger stock.

SYMTAB is the table of internal symbols in this library. This is useful for looking them up during function/symbol resolution. The GOT (each library has its own) is the repository of location information for all other static and linking data.
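
The dependency search described above (follow each binary's NEEDED entries, sort the result topologically, load in that order) can be sketched in a few lines. This is only an illustration under stated assumptions: the NEEDED table, the library names, and the load_order helper are all hypothetical, and the sort comes from Python's standard graphlib module rather than from any real linker.

```python
from graphlib import TopologicalSorter

# Hypothetical NEEDED lists: each key is a binary, each value lists
# the libraries that binary says it needs.
NEEDED = {
    "app": ["libfoo.so", "libc.so"],
    "libfoo.so": ["libbar.so", "libc.so"],
    "libbar.so": ["libc.so"],
    "libc.so": [],
}

def load_order(root, needed):
    """Return every binary reachable from root, dependencies first."""
    graph = {}
    pending = [root]
    while pending:                      # exhaustive dependency search
        name = pending.pop()
        if name in graph:
            continue                    # already visited
        graph[name] = needed[name]      # KeyError here models "library not found"
        pending.extend(needed[name])
    # static_order() yields a node only after all of its dependencies.
    return list(TopologicalSorter(graph).static_order())

order = load_order("app", NEEDED)
```

With this table, libc.so comes out ahead of everything that needs it and app comes out last, which is the order a loader would want when initialization code in one library may rely on another.
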
The PLT (Procedure Linkage Table) fulfils the role for dynamic linking that the plain symbol table did for static linking. The PLT is a table that holds code snippets for jumping to external functions. So, to do an ELF-style function address lookup:

    ADR Rx, GOT
    LDR Ry, [Rx, #PLT_OFFSET]
    SUB Ry, Ry, PC      ; turning an address into a PC-relative offset
    NOP                 ; remember PC is actually PC + 8
    ADD PC, PC, Ry      ; making the indirect call

And in the PLT:

    MOV PC, #function_address

So, why the extra layer of indirection? Why can't the PLT just be a table of addresses? Well, the reason for having code in the PLT is that ELF supports (and, in fact, recommends) using lazy linking. Laziness is actually a generic term in computer science, and it means delaying work until the last possible moment. The hope is that you'll never have to do the work at all, so you'll be more efficient by just postponing it indefinitely.

Laziness applied to linking means the following: the linker/loader won't actually load a library into memory until it is needed by some code already loaded. This is efficient insofar as it is rare for a program to call more than a small subset of a library's routines (you've probably never written code that calls more than a third of the methods in java.util.HashMap, for instance). So, if a library is never called, you don't spend all that time loading it into memory. And if a function is never called, you don't spend all that time resolving its address. The downside is that if a library is used a lot, then it costs somewhat more to do lazy linking.

In this lazy system, then, each PLT entry is initialized to a so-called trampoline routine. A trampoline is a wrapping function that directs the flow of control into a service function that patches up some stuff, then actually jumps to the code the user program wanted to call.
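
As a rough model of that idea, the sketch below treats the PLT as a dictionary of callables. It is not how a real loader is written: LIBRARY, resolutions, make_trampoline, and the function foo are all hypothetical names invented for the illustration, and a Python dictionary stands in for patchable code.

```python
def foo(x):
    """Stands in for a function defined in some shared library."""
    return x + 1

LIBRARY = {"foo": foo}   # hypothetical symbol table of that library
resolutions = []         # records each time the "linker" has to do work

def make_trampoline(plt, name):
    """Build the initial PLT entry for the external function `name`."""
    def trampoline(*args):
        resolutions.append(name)   # service routine: resolve the symbol,
        target = LIBRARY[name]
        plt[name] = target         # patch the PLT slot,
        return target(*args)       # then jump to the code the caller wanted
    return trampoline

plt = {}
plt["foo"] = make_trampoline(plt, "foo")

first = plt["foo"](41)   # goes through the trampoline and patches the slot
second = plt["foo"](1)   # goes straight to foo; no resolution this time
```

The caller's code is identical on both calls; only the contents of the table change, which is why the cost of resolution is paid once per function and never for functions that are not called.
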
In our simplified ARM ELF system, a reasonable PLT default entry for the function "foo" would look like:

    MOV Rx, #
    BL load_library

Then, when the user program calls this code, it will cause the linker to first load the library. After the library's code and data are loaded into memory, the PLT entries for that library will be rewritten thus:

    MOV Rx, #
    BL resolve_function_address

So, after loading, the linker will jump back to the PLT code, which will now cause the function address to be resolved. This will then cause the PLT entry to be rewritten to what we saw before, namely:

    MOV PC, #function_address

In actuality, things aren't so simple. The PLT code is a lot more convoluted and involves much GOT wrangling. The reasons are somewhat obscure, but they relate to being able to keep much of the PLT read-only. So rather than actually rewrite PLT entries (which is infeasible on a variable-length instruction architecture like x86), you write values into the GOT that all the PLT entries reference. The effect, although obscured by ugly code, is exactly the same as the code-rewriting scheme described above.

That is, in a nutshell, what dynamic linking is all about. Of course, it is unlikely that you'll be able to go out and write a linker/loader from what I've told you, but what actually happens on your systems when you run a program should now be far less mysterious. Also, you'll be able to appreciate such error message gems as "foo.dll not found" on Windows. Or, if you're more of a UNIX person (and therefore inherently more tasteful than the Windows crowd :) errors like "libfoo.so: undefined symbol: bar", or the ever-popular "Error loading shared object libfoo.so: relocation error"!