Virtual Memory (by Trek Palmer)
================

So far we've been treating addresses as literal physical addresses. That is to say, when we use the value 0 as an address, we've been assuming that it references data actually stored in the lowest bytes of RAM. There are two problems with a system like this:

  1) How can you address a 32-bit address space unless you have 4GB of RAM?
  2) How can multiple programs co-exist without clobbering each other?

Problem 1 may seem a little odd. Phrased another way, it is simply this: we have our heap at low addresses, growing up in memory, and we have our stack at high addresses, growing down. If we don't actually have enough RAM to back all the addresses between where our heap starts and where our stack starts, how can we run programs?

Problem 2 is more subtle. So far, we haven't had to do anything when writing ARM code to make sure we weren't violating some other program's data. But we know from experience that modern systems are capable of running hundreds of programs at the same time. How are they protected from one another? If each program has its own stack, what stops the stacks from colliding in memory? Or, for that matter, what keeps their heaps from colliding?

The solution employed by modern systems is virtual memory.

Pages
-------

The first step to understanding virtual memory is knowing that modern systems partition all of memory into fixed-size chunks called pages. Commonly, pages are 4-8K in size. The ARM supports page sizes of 1K, 4K, and 64K.

So, if you've chopped memory up into 4K pages, then you can chop up the addresses as well. Because 4K is 2^12, you can treat the low-order 12 bits of an address as a page offset. The remaining 20 bits can be considered the page number. This is illustrated below:

     31                               12 11                 0
    +-----------------------------------+--------------------+
    |            Page Number            | offset within page |
    +-----------------------------------+--------------------+

Now, the act of addressing is a two-step process.
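The first step, splitting an address into the page number and offset shown in the diagram, is just a shift and a mask. Here is a minimal Python sketch, assuming 4K pages and 32-bit addresses as above; the names `PAGE_SHIFT`, `OFFSET_MASK`, and `split_address` are made up for illustration.

```python
# Sketch only: assumes 4K pages and 32-bit addresses, as in the text.
PAGE_SHIFT = 12                   # 4K = 2^12 bytes, so 12 offset bits
PAGE_SIZE = 1 << PAGE_SHIFT       # 4096
OFFSET_MASK = PAGE_SIZE - 1       # 0xFFF selects the low-order 12 bits

def split_address(addr):
    """Split a 32-bit address into (page_number, offset)."""
    addr &= 0xFFFFFFFF                 # treat the value as a 32-bit address
    page_number = addr >> PAGE_SHIFT   # upper 20 bits
    offset = addr & OFFSET_MASK        # lower 12 bits
    return page_number, offset

for addr in (0x00000004, 0xFFFFFFFF):
    page, offset = split_address(addr)
    print(f"{addr:#010x}: page {page:#x}, offset {offset:#x}")
```

For 0xFFFFFFFF this yields page number 0xFFFFF (all 20 upper bits set) and offset 0xFFF.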
Rather than just sending a 32-bit value to memory and asking for that byte, you now ask the MMU (Memory Management Unit) for a specific page, and then the MMU gets a specific byte off of that page for you.

This simple modification is surprisingly powerful. Now the MMU can transparently maintain a mapping between a program's page numbers and actual physical pages. As an example, assume that our program's text starts at address 0, and the stack begins at the highest point in memory, 0xFFFFFFFF. With direct physical addressing, you would have to have 4GB of memory for the program to be able to use those addresses, but with virtual addresses, the MMU can place the virtual pages in different locations in physical memory. So for address 0x4, the page number is 0 and the offset is 4. For address 0xFFFFFFFF, the page number is 0xFFFFF and the offset is 0xFFF, so the MMU could easily have the following mapping:

      virtual page  | physical page
    ----------------+---------------
        0           |   0
        0xFFFFF     |   1
    ----------------+---------------

See what's just happened? Now, instead of your program needing all 4GB to run, all it needs is 8K (two 4K pages)! This mapping would be laid out in physical memory thus:

    physical page:    0         1
    virtual page:     0         0xFFFFF

Now here's where it gets tricky. Let's assume that your program does a lot of deep function calls, and the stack needs to grow larger than 4K. So your program needs to address values on the previous virtual page, namely 0xFFFFE. Because the MMU is hiding the virtual-to-physical mapping from us, virtual pages don't need to be laid out in ascending order in physical memory. The MMU can just map page 0xFFFFE to physical page 2:

    physical page:    0         1         2
    virtual page:     0         0xFFFFF   0xFFFFE

Now consider the situation with multiple programs. Each program can have its own virtual addresses, and as long as the MMU translates them to different physical addresses, there's no way they can conflict!
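To make this concrete, here is a rough Python sketch in which each program's mapping is a plain dict from virtual page number to physical page number. The dicts stand in for whatever structure the MMU actually consults, the physical page numbers are illustrative, and `translate` is a hypothetical helper rather than any real API.

```python
# Sketch only: 4K pages; each program's mapping is a dict standing in for the
# structure the MMU consults. Physical page numbers are illustrative.
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

def translate(mapping, vaddr):
    """Translate a virtual address through one program's page mapping."""
    vpage = vaddr >> PAGE_SHIFT
    offset = vaddr & OFFSET_MASK
    if vpage not in mapping:
        raise KeyError(f"virtual page {vpage:#x} is not mapped")
    ppage = mapping[vpage]
    return (ppage << PAGE_SHIFT) | offset   # the offset is copied unchanged

# Both programs use the same virtual pages (text at page 0, stack at the top
# of the address space), but they are mapped to different physical pages.
program1 = {0x0: 0, 0xFFFFF: 1, 0xFFFFE: 4}
program2 = {0x0: 2, 0xFFFFF: 3}

for name, mapping in (("program 1", program1), ("program 2", program2)):
    print(f"{name}: virtual 0x4 -> physical {translate(mapping, 0x4):#x}")
```

Both programs ask for virtual address 0x4, but program 1 reaches physical address 0x4 while program 2 reaches 0x2004, so neither can touch the other's page.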
Consider the following mapping:

      virtual page  | physical page
    ----------------+---------------
        0           |   0   \
        0xFFFFF     |   1    |  Program 1
        0xFFFFE     |   4   /
    ----------------+---------------
        0           |   2   \
        0xFFFFF     |   3   /   Program 2
    ----------------+---------------

So now programs can all be written as if there weren't any other programs running on the system, and the MMU will protect their data from being clobbered by other programs! Isn't that nice? You don't have to rewrite a lick of code to get it to work on a system with virtual memory support.

Page Tables
------------

So far, we haven't discussed how exactly the MMU gets these mappings. It is surprisingly simple. The MMU gets the mappings from a data structure known as a page table. A page table is just an array, where the virtual page number is the index and the physical page number is stored at that index. Each program has its own page table, stored somewhere in physical memory.

The MMU has a distinguished register called the PTBR (Page Table Base Register). When the OS decides to switch over to a program, it sets the MMU's PTBR to point to that program's page table. Then all subsequent virtual accesses (until the next program switch, that is) will be done through that program's page table. So, the page table for the previously described program 1 would be (X marks an entry with no mapping):

    index:  0   1   2   3   4   . . .   0xFFFFE   0xFFFFF
    value:  0   X   X   X   X   . . .   4         1

TLB
-------

There is, however, a problem with virtual memory. Now, in order to access memory, the MMU first needs to go out to memory to get the page table entry; only then can it construct the physical address and fetch the data from memory. We have taken one memory access and turned it into two. Without some help, this will at the very least double the time it takes to get anything from memory.

Page table accesses have a different access pattern from ordinary data and so don't always cache well. Therefore, data caches often don't bother caching page table data, and MMUs instead have their own specialized page table caches.
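The page table scheme, and the cost problem it creates, can be sketched as a toy Python model. Everything here is an assumption for illustration: 4K pages, 32-bit addresses, a Python list standing in for each in-memory page table, `None` standing in for the X (unmapped) entries, and a counter of memory reads so the doubling is visible. `ToyMMU`, `make_page_table`, and `switch_to` are hypothetical names.

```python
# Toy model, not real hardware: 4K pages, 32-bit addresses, a Python list per
# page table, None for unmapped ("X") entries, and a counter of memory reads.
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
NUM_VPAGES = 1 << (32 - PAGE_SHIFT)   # 2^20 virtual pages

def make_page_table(mapping):
    """Build a page table: index = virtual page, value = physical page."""
    table = [None] * NUM_VPAGES
    for vpage, ppage in mapping.items():
        table[vpage] = ppage
    return table

class ToyMMU:
    def __init__(self, physical_memory):
        self.memory = physical_memory  # dict: physical address -> value
        self.ptbr = None               # Page Table Base Register
        self.memory_reads = 0

    def switch_to(self, page_table):
        """What the OS does on a program switch: reload the PTBR."""
        self.ptbr = page_table

    def load(self, vaddr):
        vpage, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        self.memory_reads += 1         # read 1: the page table entry (in RAM)
        ppage = self.ptbr[vpage]
        if ppage is None:
            raise MemoryError(f"virtual page {vpage:#x} is not mapped")
        self.memory_reads += 1         # read 2: the data itself
        return self.memory.get((ppage << PAGE_SHIFT) | offset)

program1_table = make_page_table({0x0: 0, 0xFFFFF: 1, 0xFFFFE: 4})
program2_table = make_page_table({0x0: 2, 0xFFFFF: 3})

mmu = ToyMMU({0x0004: "program 1's data", 0x2004: "program 2's data"})
mmu.switch_to(program1_table)
print(mmu.load(0x4), mmu.memory_reads)
mmu.switch_to(program2_table)
print(mmu.load(0x4), mmu.memory_reads)
```

Each call to `load` performs two counted reads, one for the page table entry and one for the data, which is exactly the overhead the specialized page table caches mentioned above are meant to remove.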
The MMU's page table caches are called TLBs (Translation Lookaside Buffers). A TLB caches recently used virtual page numbers along with their corresponding physical pages, so a translation that hits in the TLB doesn't need to read the page table from memory. Of course, when the PTBR is reset, the TLB needs to be flushed so that the previous program's mappings aren't used. But, with a TLB, much of the memory access overhead that virtual memory introduced can be eliminated.

Metadata
---------

If each page table entry is 32 bits (which is normal), but only the upper 20 bits are needed for the physical page number (with 4K pages), then the MMU can store additional metadata flags in the low 12 bits. More on what those flags can be used for in the next lecture.
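Here is a rough Python sketch tying the TLB and the metadata bits together. Everything in it is an assumption for illustration: a 32-bit page table entry with the physical page number in its upper 20 bits and flags in its low 12 bits, a made-up `FLAG_VALID` bit (the real flags are the subject of the next lecture, and real ARM descriptors use their own layout), a dict standing in for the mostly empty page table array, and a hypothetical `ToyTLB` class that, unlike a real TLB, has no limit on its number of entries.

```python
# Sketch only. Assumed (hypothetical) entry layout: physical page number in the
# upper 20 bits, flags in the low 12 bits, bit 0 = a made-up VALID flag.
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
FLAG_VALID = 0x1

def make_pte(ppage, flags):
    """Pack a physical page number and flags into one 32-bit entry."""
    return ((ppage << PAGE_SHIFT) | (flags & OFFSET_MASK)) & 0xFFFFFFFF

def pte_ppage(pte):
    return pte >> PAGE_SHIFT           # upper 20 bits

def pte_flags(pte):
    return pte & OFFSET_MASK           # low 12 bits

class ToyTLB:
    """Caches virtual page -> physical page translations (unbounded here)."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def flush(self):
        """Needed whenever the PTBR changes, so old mappings aren't reused."""
        self.entries.clear()

    def translate(self, page_table, vaddr):
        vpage, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        if vpage in self.entries:
            self.hits += 1             # hit: no page table read needed
            ppage = self.entries[vpage]
        else:
            self.misses += 1           # miss: read the entry from the table
            pte = page_table.get(vpage, 0)
            if not pte_flags(pte) & FLAG_VALID:
                raise MemoryError(f"virtual page {vpage:#x} is not mapped")
            ppage = pte_ppage(pte)
            self.entries[vpage] = ppage
        return (ppage << PAGE_SHIFT) | offset

# A dict stands in for the (mostly empty) page table array.
page_table = {0x0: make_pte(0, FLAG_VALID), 0xFFFFF: make_pte(1, FLAG_VALID)}
tlb = ToyTLB()
for vaddr in (0x4, 0x8, 0xFFFFFFFC, 0xFFFFFFF8):
    tlb.translate(page_table, vaddr)
tlb.flush()                            # e.g. the OS just reloaded the PTBR
tlb.translate(page_table, 0x4)         # misses again after the flush
print(tlb.hits, tlb.misses)
```

The second access to each page hits in the TLB, so only the first access to a page (and the first access after a flush) pays for the page table read.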