Lecture 29: Disks, our spinning, high-capacity friends
(by Trek Palmer)
========================================================

Although many storage systems have been employed throughout the history
of computing, most modern systems use magnetic hard disks to store data.

How bits are stored
--------------------

The bits themselves are encoded in the orientation of metallic particles
deposited on the surface of circular platters (the discs from which the
storage medium derives its name). The read/write heads of the disk are
actually little electromagnets. If a current is driven through the head,
it generates a magnetic field. This induced field causes the magnetic
film underneath the head to become aligned parallel to the field
generated by the head. When no current is applied, the head can read
instead: as it passes over changes in magnetic alignment, those changes
induce a current in the coil, and with appropriate electronics this
minuscule current can be detected and amplified.

Because only changes in alignment can be detected, the encoding of bits
on the platters is non-trivial. A simple encoding looks like this:

       1           0
    ---+           +---
       |           |
       |           |
       +---     ---+

SHOW ENCODING OF 101010

But this wastes half the bits on the disk, so disk manufacturers have
come up with considerably more complicated, but denser, encodings.

Basic disk mechanisms
=====================

A disk basically consists of a central spinning spindle, attached to
which are multiple platters. Hovering above and below each of these
platters are read/write heads attached to a motorized armature. This arm
can swing in and out, allowing the read/write heads to be positioned
anywhere from the edge of the platter to the part nearest the spindle.

        spindle
         |
         v    +-------+
    =====||======     |<----- Arm
         ||   +-------+
    =====||======     |
         ||   +-------+
    =====||======     |
              +-------+

The platters
============

The platters themselves are made of high-strength alloys. They have to
be hard-wearing because they spin at 5,000 to 10,000 rpm.
All kinds of material science is invested in the tricky problems of
adhering magnetic material to a fast-spinning surface. Fortunately, as
computer scientists, we can just trust that all this stuff has been
solved for us, and ignore the fact that hard drives represent a serious
engineering effort.

As with any storage mechanism, it is not sufficient to simply cram it
full of bits; you've got to have some easy way of getting them back out
again. Hard drives have an addressing system, but unlike RAM, hard-drive
addressing is based on the geometry of the drive. Each platter is
divided into concentric rings. These rings are called tracks. The tracks
are further sub-divided into sectors. The sector divisions stretch out
from the spindle like spokes, forming wedges. Because the platters are
stacked one on top of the other, tracks at the same radius can be
grouped vertically into cylinders. Because the basic quantum of data on
a disk is the sector (512 bytes is a normal size), any data on the disk
can be addressed with a track number, sector number, and platter number
(also known as a surface number).

Unlike RAM, where it costs the same to access data regardless of the
address, disks have variable access times. For instance, data on the
current cylinder is much cheaper to access than data on another
cylinder (because the read/write arm will have to be moved). Basically,
there are two sources of delay.

One is the rotational delay: how long you have to wait for the platter
to rotate around so that the desired sector is under the read/write
heads. This isn't constant, because it depends on how far away the
sector was when the read/write head was positioned. This is why
faster-spinning disks are faster: the rotational latency of a 10000 rpm
drive is roughly half that of a 5400 rpm drive.

The second source of latency is called the seek time. This is the amount
of time that it takes to reposition the read/write heads. This, too,
isn't constant.
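As a rough sketch of the rotational delay described above: if we assume
the desired sector is equally likely to be anywhere on the track when
the head arrives, the average wait is half a revolution. The function
names here are made up for illustration, and the half-revolution
average is an assumption of this sketch, not a figure from any drive's
data sheet.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full revolution of the platter, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60_000.0 / rpm  # 60,000 ms per minute / revolutions per minute


def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector at a uniformly random position.

    Assumption: on average the platter must turn half a revolution.
    """
    return rotation_time_ms(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (5400, 10000):
        latency = average_rotational_latency_ms(rpm)
        print(f"{rpm:>6} rpm: {latency:.2f} ms average rotational latency")
```

For 5400 and 10000 rpm this works out to about 5.56 ms and 3.00 ms,
which is where the "roughly half" figure comes from.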
The mechanism used to position the heads has two speeds. The slow speed
is used to shift the heads by a few tracks, while the fast speed is used
to move by hundreds of tracks at a time. This complicated timing is why
hard-drive access times are given as averages.

Disk Controllers
================

We don't get to access hard drives directly. Drives are chained off of a
controller, which is an I/O device that allows the computer to use
normal I/O routines to read and write to a hard drive. The controller
abstraction is useful for many other reasons. It allows the hard drive
system to use all kinds of buffering and remapping without having to
expose the details to the operating system. This comes in handy because
most modern hard drives have bad block detection in hardware. What this
means is that when the hardware determines that some defect on the
platter is interfering with reading and writing, it will remap the
defective sector to a free reserve sector.

RAID
====

Another advantage of the controller abstraction is that many drives can
be made to appear as a single drive. Although it is expensive (perhaps
impossible) to make a 1 TB drive, it is possible to lump together four
250 GB drives and have the controller present the I/O bus with the
illusion of a single drive.
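To make the "illusion of a single drive" a little more concrete, here is
a minimal sketch of how a controller might translate a logical sector
number into a physical drive and a sector on that drive. The notes don't
say how the mapping is done; this sketch assumes simple round-robin
striping with one sector per stripe unit, and all the names in it are
hypothetical.

```python
SECTOR_SIZE = 512                                # bytes per sector
NUM_DRIVES = 4                                   # physical drives behind the controller
SECTORS_PER_DRIVE = 250 * 10**9 // SECTOR_SIZE   # 250 GB, decimal gigabytes
TOTAL_SECTORS = NUM_DRIVES * SECTORS_PER_DRIVE   # size of the one "big" drive


def map_logical_sector(logical: int) -> tuple[int, int]:
    """Map a logical sector to (drive index, sector number on that drive).

    Assumption: consecutive logical sectors are dealt out to the drives
    round-robin, like cards, so sector 0 is on drive 0, sector 1 on
    drive 1, and so on.
    """
    if not 0 <= logical < TOTAL_SECTORS:
        raise ValueError(f"logical sector {logical} is out of range")
    drive = logical % NUM_DRIVES
    sector_on_drive = logical // NUM_DRIVES
    return drive, sector_on_drive


if __name__ == "__main__":
    for logical in (0, 1, 4, 5, TOTAL_SECTORS - 1):
        drive, sector = map_logical_sector(logical)
        print(f"logical {logical:>10} -> drive {drive}, sector {sector}")
```

The operating system only ever sees sector numbers from 0 to
TOTAL_SECTORS - 1; which physical drive actually holds a given sector is
the controller's business.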