Lecture 3: Negative Numbers
(by Trek Palmer)
-------------

Representation of Negative Numbers
==================================

On paper, we represent negative numbers by putting the symbol "-" in
front of a number. A computer doesn't have it so easy. All it knows is
bits, so we need some way to encode a negative number as a bit pattern
that the computer can recognize as a negative value. Several schemes
have been used over the years:

1) Sign and magnitude
2) 1's complement
3) 2's complement

- Sign and magnitude

  This is, perhaps, the most obvious way to represent a negative number.
  You simply mark one of the bits (usually the high-order bit) as a
  'sign bit' and use the other bits to hold the value of the number
  (this is the magnitude part of sign and magnitude). It's simple, and
  it has the advantage that you can tell whether or not something is
  negative just by looking at a single bit.

  Ex. 0101010 = +42, 1101010 = -42

  Sign and magnitude isn't without its disadvantages, however. Because
  you have to sacrifice one bit to hold sign information, you reduce the
  range of integers you can represent in a single word. It also means
  that the computer has to have separate subtraction circuitry (more on
  this later). There are also two forms of 0: +0 and -0.

- 1's complement

  1's complement is a way of representing negative numbers by
  transforming the positive values. To turn a positive number into its
  negative equivalent, you simply flip all the bits (e.g. 101 -> 010).
  Note that this is equivalent to subtracting the number from 2^n - 1
  (for an n-bit number). The payoff is that subtraction becomes addition
  of the 1's complement form of the subtracted value! It isn't straight
  addition, however. In the case of overflow, an additional step is
  needed: if a bit 'falls off the end', it has to be added back into the
  sum. 1's complement also has two forms of 0: +0 (all 0s) and -0
  (all 1s).

  Ex.    11110           11110
       - 11101   ==>   + 00010
                       -------
                        100000  =>  00000 + 1 = 00001
                        ^ overflow bit, added back in

  Ex. (again with an overflow bit):
         100000           100000
       - 011101   ==>   + 100010
                        --------
                         1000010  =>  000010 + 1 = 000011

- 2's complement

  2's complement, like 1's complement, requires a transformation of the
  positive value in order to generate its negative representation. It is
  equivalent to subtracting the number from 2^n. The easiest way to
  generate it, though, is to flip all the bits and add 1
  (e.g. 101 -> 010 + 1 = 011). Although this conversion is more
  complicated than the 1's complement version, subtraction becomes
  straight addition! We just ignore the overflow bit, and as long as the
  answer is between -2^(n-1) and 2^(n-1) - 1 we're ok. Note also that
  2's complement has only one form for 0 (this is good).

  Ex.    11110           11110
       - 11101   ==>   + 00011
                       -------
                        100001  =>  00001

  Ex.    100000           100000
       - 011101   ==>   + 100011
                        --------
                         1000011  =>  000011

  Also note that the MSB is effectively a sign bit (0 = positive,
  1 = negative), so you get easy sign checking with 2's complement as
  well. Most modern systems employ 2's complement for representing
  signed values.

Overflow
========

Overflow is a property of fixed-precision number representations (such
as a 32-bit integer). Overflow is, basically, what happens when the
result of an operation is too big to fit in the space allotted. When
adding unsigned numbers, overflow is when the MSB has a carry out.
However, when subtracting two values using 2's complement, the meaning
of such a carry out is less clear. To decide what to do, we need to do a
little impromptu number theory. Consider two numbers a and b that have
the same sign. If we subtract a from b, the result can't have a greater
magnitude than the larger of a and b. Subtracting a from b is the same
as adding -a to b, so adding two numbers of different sign cannot
generate overflow. That leaves only one case to check: if a and b have
the same sign, but a+b has a different sign, then overflow has occurred.

Binary Coded Decimal
====================

An obscure but still used encoding for numbers is BCD, or Binary Coded
Decimal. The system is simple: every 4 bits stand for a particular
decimal digit.
For instance:

  37  -> 0011 0111
  191 -> 0001 1001 0001

Simple, right? Unfortunately, BCD complicates arithmetic circuitry
(whenever a digit sum goes over 9, the excess has to be carried into the
next digit), and it wastes space (each 4-bit group uses only 10 of its
16 possible patterns). So, it's rarely used in new systems. But BCD used
to be fairly widespread (particularly in IBM systems), and you often run
across it in older systems (or in new systems that are trying to emulate
older systems).

Intro to the ARM
================

The ARM is a 32-bit RISC ISA with some DSP garnish. ARM chips are very
popular, particularly for embedded systems. The ARM comes in a variety
of speeds and memory sizes and is used in systems ranging from
microwaves to the Game Boy Advance.

The ARM has 16 32-bit user accessible registers, labelled R0-R15. The
ARM is a little unusual in that R15 is actually the PC (more precisely,
because the ARM is pipelined, R15 reads as the PC + 8).

The ARM actually has two instruction sets. The main instruction set is
composed of fixed-size 32-bit instructions (these are the ARM
instructions). The other instruction set is composed of 16-bit
instructions (the Thumb instructions). Thumb instructions can perform a
subset of what the full ARM instruction set can do. The purpose of the
smaller Thumb instructions is to increase code density (which matters in
memory-constrained systems). In this class, we're only going to deal
with ARM instructions.

The ARM is a RISC machine (also known as a load-store architecture), so
special instructions are used to get and set values in memory. ALU
instructions, therefore, can only have two different kinds of arguments:
registers and constants. ARM instructions have a three-address syntax:
the first argument specifies the destination register, while the second
and third specify the arguments to the ALU.

Example:

  ADD R1, R2, R5
  SUB R4, R5, #25
  AND R3, R7, R7

Note that constants are prefixed with the # symbol (# or #d for decimal
and #h for hexadecimal).
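The three-address form above can be sketched as a toy interpreter in
Python. This is only an illustration, not how the ARM or any assembler is
implemented: the names operand, execute, OPS and regs are made up for
this sketch, the starting register values are arbitrary, and only plain
decimal constants such as #25 are handled.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

# Hypothetical table covering just the three operations used in the example.
OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
}

def operand(regs, arg):
    """Value of an ALU argument: '#25' is a constant, 'R5' is a register."""
    if arg.startswith("#"):
        return int(arg[1:]) & MASK32  # assumption: plain decimal constants only
    return regs[arg]

def execute(regs, line):
    """Execute one three-address instruction such as 'ADD R1, R2, R5'."""
    mnemonic, rest = line.split(None, 1)
    dest, src1, src2 = [part.strip() for part in rest.split(",")]
    # First argument is the destination; second and third feed the ALU.
    value = OPS[mnemonic](operand(regs, src1), operand(regs, src2))
    regs[dest] = value & MASK32  # wrap to 32 bits (2's complement)

regs = {f"R{i}": 0 for i in range(16)}
regs["R2"], regs["R5"], regs["R7"] = 10, 30, 0b1100  # arbitrary start values
for line in ("ADD R1, R2, R5", "SUB R4, R5, #25", "AND R3, R7, R7"):
    execute(regs, line)
print(regs["R1"], regs["R4"], regs["R3"])  # prints: 40 5 12
```

With R2 = 10, R5 = 30 and R7 = 12, the three example instructions leave
40 in R1, 5 in R4 and 12 in R3.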

Some useful ALU ops
--------------------

  ADD  (add)
  SUB  (subtract)
  RSB  (reverse subtract)
  MUL  (multiply)
  AND  (bitwise and)
  ORR  (bitwise or)
  EOR  (bitwise exclusive or)
  MOV  (move)
  BIC  (bit clear)

Predication & Condition codes
------------------------------

The ARM is a predicated architecture, which means that each instruction
is tagged with a condition. The ARM, like many architectures, uses
condition codes to evaluate conditional expressions. On the ARM, there
are four bits in the condition code (N,Z,V,C). N signifies whether or
not the last computed value was negative. Z signifies whether or not the
last computed value was zero. V signifies whether or not the last
computed value generated overflow. And lastly, C signifies whether or
not the last computed value generated a carry (into the 33rd bit).

Note that C and V are distinct. C reports a carry out of the top bit,
which is overflow for unsigned values, while V reports overflow for
signed (2's complement) values. A single addition can set either one
without the other: 7FFFFFFF (hex) + 1 sets V but not C, while
FFFFFFFF (hex) + 1 sets C but not V.

Condition codes are used, at the assembly level, to perform the same
kinds of things that constructs like if are used for in higher-level
languages. For example, in Java, to decide if a number is zero you might
use the following code:

  if(foo == 0) {
      foo++;
  }

To perform the equivalent task in ARM assembly, you would perform an
operation on the register containing the value of foo, and then use the
condition codes to decide what to do next. For instance, assuming foo
was stored in R3:

  SUBS  R3, R3, #0
  ADDEQ R3, R3, #1

Note that the second instruction is not a plain ADD. In ARM assembly
language, the predicate for an instruction follows the mnemonic name. No
predicate is the same as the predicate AL (for always). Note also that
SUB has an S after it. This tells the ARM to use the result of this
instruction to set the condition codes.

So, the basic flow of the program is this. The ALU subtracts 0 from the
value in R3, stores the result back in R3, and sets the condition codes
accordingly. Then, the next instruction has its predicate checked
against the condition codes.
Here, it is important to realize that the predicate names used for
assembly coding are for the convenience of the programmer, and the
condition codes actually checked may not be immediately obvious. In the
case of EQ, the processor checks the zero flag (Z). So, in this case, if
R3 = 0, the subtraction produces zero, Z is set, and the subsequent ADD
instruction is executed. Otherwise it is simply skipped over.
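
The SUBS/ADDEQ sequence above can be modelled in a few lines of Python.
This is a rough sketch under stated assumptions, not a description of the
real hardware: the helper names subs and run are invented for
illustration, registers are modelled as unsigned 32-bit integers, and
only the flag behaviour discussed in these notes is covered.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def subs(a, b):
    """Model SUBS: return the 32-bit value of a - b plus the N, Z, C, V flags."""
    # Subtraction is addition of the 2's complement of b: flip the bits, add 1.
    total = a + ((~b) & MASK32) + 1
    result = total & MASK32
    flags = {
        "N": bool(result & SIGN_BIT),  # result is negative (MSB set)
        "Z": result == 0,              # result is zero
        "C": total > MASK32,           # carry out of the top bit
        # Signed overflow: a and b differ in sign, and the result's sign
        # differs from a's sign.
        "V": bool((a ^ b) & (a ^ result) & SIGN_BIT),
    }
    return result, flags

def run(foo):
    """Model 'SUBS R3, R3, #0' followed by 'ADDEQ R3, R3, #1'."""
    r3, flags = subs(foo & MASK32, 0)
    if flags["Z"]:  # the EQ predicate means: execute only if Z is set
        r3 = (r3 + 1) & MASK32
    return r3

print(run(0), run(5))  # prints: 1 5
```

Here run(0) returns 1 because Z is set and the ADD executes, while run(5)
returns 5 because the ADD is skipped. The V computation is the sign rule
from the Overflow section applied to subtraction.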