Lecture 23: Sequential execution in detail (by Trek Palmer)
================================================

CPSR Structure:

 31 30 29 28 27 26                                   8 7 6 5 4  3  2  1  0
+--+--+--+--+--+--------------------------------------+-+-+-+--+--+--+--+--+
|N | Z| C| V| Q|          DNM (don't matter)          |I|F|T|M4|M3|M2|M1|M0|
----------------------------------------------------------------------------

N, Z, C and V are the condition codes.
Q is a special condition code used by DSP instructions that you won't
  (mercifully) be exposed to.
I is the interrupt enable/disable flag (1 = disabled)
F is the fast interrupt enable/disable flag (1 = disabled)
T is the thumb mode flag
M4 - M0 are the mode bits (telling you what exception mode you're in)

An example, the getc (SWI #h00F00001) trap
-------------------------------------------

Actual I/O for a real OS (like Linux) is fairly complicated, involving
file descriptors, buffers, and a whole bunch of overhead. So we use a
simpler abstraction: getc just grabs the next character off of standard
input. Of course, this is implemented in the simulator as just a hunk of
code, but as an exception it works like this:

at address 0xFEED0000, SWI #h00F00001 is encountered

1) We enter supervisor exception handling mode
2) R14_svc = 0xFEED0004 (the address of the instruction after the SWI;
   the PC on ARM always reads 8 ahead, and the hardware saves PC - 4)
3) SPSR_svc = CPSR
4) CPSR[4:0] = 0b10011
5) CPSR[5] = 0 (no thumb exception handlers)
6) CPSR[6] = 0 (we allow fast interrupts, b/c they have higher priority)
7) CPSR[7] = 1 (disable normal interrupts)
8) PC = 0x00000008

Now at 0x00000008, there will be a branch instruction, like

    B 0xFFFF0000 (address of the SWI handler)

The SWI handler will decode the SWI instruction to decide which syscall
it's doing (inst & 0x000FFFFF), and will then use that number as an index
into an array of addresses, each of which is the address of the
appropriate trap code.
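The entry steps and the decode can be sketched in a few lines of Python. This is only a toy model, not how the simulator actually does it: the function names, the dictionary of results, and the contents of the trap table are all made up for illustration, and the return address is passed in by the caller rather than computed. The bit positions, the mode value, the vector address and the mask come from the notes above.

```python
# Bit positions taken from the CPSR layout in these notes.
I_BIT = 1 << 7           # normal interrupt disable
F_BIT = 1 << 6           # fast interrupt disable
T_BIT = 1 << 5           # thumb mode
MODE_MASK = 0b11111      # M4 - M0
MODE_SVC = 0b10011       # supervisor mode
SWI_VECTOR = 0x00000008  # where the PC goes on an SWI
TRAP_MASK = 0x000FFFFF   # mask the notes use to pull out the trap number


def enter_swi(cpsr, return_addr):
    """Return the register state after SWI entry steps 2-8 (hypothetical helper)."""
    new_cpsr = (cpsr & ~MODE_MASK) | MODE_SVC  # step 4: switch to supervisor mode
    new_cpsr &= ~T_BIT                         # step 5: no thumb handlers
    new_cpsr &= ~F_BIT                         # step 6: fast interrupts allowed
    new_cpsr |= I_BIT                          # step 7: normal interrupts disabled
    return {
        "r14_svc": return_addr,  # step 2 (value supplied by the caller)
        "spsr_svc": cpsr,        # step 3: old CPSR saved for the return
        "cpsr": new_cpsr,
        "pc": SWI_VECTOR,        # step 8
    }


def trap_number(inst):
    """Decode the trap number the way the handler does: inst & 0x000FFFFF."""
    return inst & TRAP_MASK


# Hypothetical trap table: entry 1 is getc (0x00F00001 & 0x000FFFFF == 1),
# entry 0 is a placeholder invented for this sketch.
def unknown_trap():
    return "unknown"


def getc_trap():
    return "getc"


TRAP_TABLE = [unknown_trap, getc_trap]


def dispatch(inst):
    """Index the table with the trap number and call the trap code."""
    return TRAP_TABLE[trap_number(inst)]()
```

Starting from a user-mode CPSR of 0x60000010, enter_swi leaves the condition codes alone and returns a CPSR of 0x60000093: mode bits 0b10011 with I set.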
For instance, assuming the trap number is in R4:

    ADR R5, trapBaseAddr
    LDR PC, [R5, R4, LSL #2]

The LSL is necessary because each entry in the table is a 4-byte address,
so you need to scale the index by 4, which is the same as shifting left
by 2. Now the PC will have the address of the OS internal function to
process this trap (reading the PC would actually give that address + 8).
At the end of the trap handler will be the following line of code:

    MOVS PC, R14

In supervisor mode, this will copy R14_svc into PC AND restore CPSR from
SPSR. And that's it, now you know how user-mode programs can communicate
with supervisor-mode programs!

Sequential Execution
---------------------

At the beginning of the semester, we went over an abstract picture of the
internals of a processor. Now, we're going to look at the picture in much
more detail.

Single-bus organization
------------------------

                             Bus
                       PC <-->||
                              ||<--->Instruction Decode and control
                      MAR<----||          ^
                              ||          |
                      MDR<--->||          |
                              ||--------->IR
                Y<------------||
          4     |             ||<--->Register File
          |     |       +-----||
          v     v       |     ||
        +---------+     |     ||
  sel--> \  MUX  /      |     ||
             |          |     ||
           A v        B v     ||
          --------\/--------  ||
   control-\     ALU      /   ||
            \____________/    ||
                  |           ||<-->PSR
                  v           ||
                  Z---------->||

In a single-bus architecture, everything is connected to the same bus.
Each device capable of reading from the bus has an "in" control line
that, when set, causes the bus value to be read in and latched. Each
device capable of writing to the bus has an "out" line that, when set,
causes the device to write its value out onto the bus.

In this system, there is a small set of operations available:
-Transfer a word from one register to another
-Perform an ALU op and store that in a register
-Fetch a word from memory and store it in a register
-Store a value from a register into memory

By and large, these four functions can be composed to perform any action
in the system.

Register Transfers
------------------

Register transfers are easy. Registers are just arrays of flip-flops, so
the out logic can just be a tri-state device and the input logic can be a
2-input MUX that selects between the bus value and the flip-flop's
current output.
Bus
----+----------------------------------------------------+----------
    |                                                    |
    |    +----------------------------------+            |
    |    |   ___                            |   |\       |
    |    +--|0  \                           +---| \------+
    |       |MUX \     +-------+            |   | /
    +-------|1    |----|D     Q|------------+   |/|
            |    /     |       |                  |
            |___/  clk-|>    ~Q|                  |
             |         +-------+                  |
        Register in                          Register out

A tri-state device is just a gated buffer. If its second (enable) input
is high, it passes through the value; otherwise it acts like an open
circuit. This is necessary for electrical reasons (so that a device that
isn't driving the bus doesn't overwrite another device's value with all
0s, for instance), but for all intents and purposes, you can treat a
tri-state device like a switch.

To write to a register, you dump the value to be written onto the bus,
assert Register in, and the flip-flop takes care of the rest.
To write out a value, assert Register out and the Q output will be routed
to the bus.

Fetching values from memory
---------------------------

To read a value from memory, first MAR must have the address written to
it. Both MAR and MDR are a little different, in that they are connected
to two buses. One is the internal CPU bus, the other is the bus that
leads (eventually) to the memory. So they each have two sets of in/out
lines: one for the CPU, one for the memory system.

Memory accesses are also complicated by the fact that memory often
operates at a different speed from the CPU (so it can take many CPU
cycles to read a value from memory). Also, due to cache effects, it is
even possible for different memory accesses to have different access
times! To combat this problem a new signal is needed. MFC (memory
function completed) is a signal that is asserted by the memory system
when it's done with the action.
So to read from memory:
1) write address to MAR
2) Start a memory read operation in the memory system
3) Wait for MFC to go high
4) write MDR onto the bus
5) Read in bus value into target register

Writing values to memory
------------------------

Similar to reading values, writing them out requires:
1) write address register's value onto bus
2) Read bus value into MAR
3) write contents register's value onto bus
4) Read bus value into MDR
5) Start memory write
6) Wait for MFC to go high
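
The two step lists above can be walked through with a toy Python model. Treat this as a sketch under some loud assumptions: the class and method names are invented here, every bus transfer is charged one cycle, and "wait for MFC" is modeled as a fixed number of stall cycles (real memory latency varies, as noted earlier). The read and write sequences themselves follow the numbered steps.

```python
class SingleBusCPU:
    """Toy single-bus datapath: registers, MAR, MDR, one bus (hypothetical model)."""

    def __init__(self, mem_latency=3):
        self.regs = {"R0": 0, "R1": 0, "R2": 0}
        self.mar = 0
        self.mdr = 0
        self.bus = 0
        self.memory = {}                # address -> word
        self.mem_latency = mem_latency  # assumed fixed; stands in for MFC timing
        self.cycles = 0

    def _transfer(self, value):
        """One bus cycle: the source asserts 'out', the destination latches."""
        self.bus = value
        self.cycles += 1
        return self.bus

    def _wait_mfc(self):
        """Stall until the memory system would assert MFC."""
        self.cycles += self.mem_latency

    def load(self, addr_reg, target_reg):
        """Read from memory, following read steps 1-5."""
        self.mar = self._transfer(self.regs[addr_reg])    # 1) address into MAR
        self._wait_mfc()                                  # 2-3) start read, wait for MFC
        self.mdr = self.memory.get(self.mar, 0)           # memory side fills MDR
        self.regs[target_reg] = self._transfer(self.mdr)  # 4-5) MDR -> bus -> register

    def store(self, addr_reg, data_reg):
        """Write to memory, following write steps 1-6."""
        self.mar = self._transfer(self.regs[addr_reg])    # 1-2) address -> bus -> MAR
        self.mdr = self._transfer(self.regs[data_reg])    # 3-4) data -> bus -> MDR
        self._wait_mfc()                                  # 5-6) start write, wait for MFC
        self.memory[self.mar] = self.mdr


if __name__ == "__main__":
    cpu = SingleBusCPU(mem_latency=3)
    cpu.regs["R0"] = 0x100  # address
    cpu.regs["R1"] = 42     # value to store
    cpu.store("R0", "R1")
    cpu.load("R0", "R2")
    print(cpu.regs["R2"], cpu.cycles)  # prints: 42 10
```

With a latency of 3, the store costs 2 bus cycles plus 3 stall cycles, and the load costs the same, so the pair takes 10 cycles in this model. Most of that time is spent waiting on memory, not moving data over the bus.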