Part Four Storage Management Since main memory is usually too small to accommodate all the data and programs permanently, the computer system must provide secondary storage to back up main memory Modern computer systems use disks as the primary on-line storage medium for information (both programs and data) The file system provides the mechanism for on-line storage of and access to both data and programs residing on the disks A file is a collection of related information defined by its creator The files are mapped by the operating system onto physical devices Files are normally organized into directories for ease of use The devices that attach to a computer vary in many aspects Some devices transfer a character or a block of characters at a time Some can be accessed only sequentially, others randomly Some transfer data synchronously, others asynchronously Some are dedicated, some shared They can be read-only or read – write They vary greatly in speed In many ways, they are also the slowest major component of the computer Because of all this device variation, the operating system needs to provide a wide range of functionality to applications, to allow them to control all aspects of the devices One key goal of an operating system’s I/O subsystem is to provide the simplest interface possible to the rest of the system Because devices are a performance bottleneck, another key is to optimize I/O for maximum concurrency Download at http://www.pin5i.com/ Download at http://www.pin5i.com/ 10 CHAPTER Mass -Storage Structure The file system can be viewed logically as consisting of three parts In Chapter 11, we examine the user and programmer interface to the file system In Chapter 12, we describe the internal data structures and algorithms used by the operating system to implement this interface In this chapter, we begin a discussion of file systems at the lowest level: the structure of secondary storage We first describe the physical structure of magnetic disks and magnetic tapes We then describe disk-scheduling algorithms, which schedule the order of disk I/Os to maximize performance Next, we discuss disk formatting and management of boot blocks, damaged blocks, and swap space We conclude with an examination of the structure of RAID systems CHAPTER OBJECTIVES • To describe the physical structure of secondary storage devices and its effects on the uses of the devices • To explain the performance characteristics of mass-storage devices • To evaluate disk scheduling algorithms • To discuss operating-system services provided for mass storage, including RAID 10.1 Overview of Mass-Storage Structure In this section, we present a general overview of the physical structure of secondary and tertiary storage devices 10.1.1 Magnetic Disks Magnetic disks provide the bulk of secondary storage for modern computer systems Conceptually, disks are relatively simple (Figure 10.1) Each disk platter has a flat circular shape, like a CD Common platter diameters range from 1.8 to 3.5 inches The two surfaces of a platter are covered with a magnetic material We store information by recording it magnetically on the platters 467 Download at http://www.pin5i.com/ 468 Chapter 10 Mass-Storage Structure track t spindle arm assembly sector s read-write head cylinder c platter arm rotation Figure 10.1 Moving-head disk mechanism A read–write head “flies” just above each surface of every platter The heads are attached to a disk arm that moves all the heads as a unit The surface of a platter is logically divided into circular tracks, 
which are subdivided into sectors The set of tracks that are at one arm position makes up a cylinder There may be thousands of concentric cylinders in a disk drive, and each track may contain hundreds of sectors The storage capacity of common disk drives is measured in gigabytes When the disk is in use, a drive motor spins it at high speed Most drives rotate 60 to 250 times per second, specified in terms of rotations per minute (RPM) Common drives spin at 5,400, 7,200, 10,000, and 15,000 RPM Disk speed has two parts The transfer rate is the rate at which data flow between the drive and the computer The positioning time, or random-access time, consists of two parts: the time necessary to move the disk arm to the desired cylinder, called the seek time, and the time necessary for the desired sector to rotate to the disk head, called the rotational latency Typical disks can transfer several megabytes of data per second, and they have seek times and rotational latencies of several milliseconds Because the disk head flies on an extremely thin cushion of air (measured in microns), there is a danger that the head will make contact with the disk surface Although the disk platters are coated with a thin protective layer, the head will sometimes damage the magnetic surface This accident is called a head crash A head crash normally cannot be repaired; the entire disk must be replaced A disk can be removable, allowing different disks to be mounted as needed Removable magnetic disks generally consist of one platter, held in a plastic case to prevent damage while not in the disk drive Other forms of removable disks include CDs, DVDs, and Blu-ray discs as well as removable flash-memory devices known as flash drives (which are a type of solid-state drive) Download at http://www.pin5i.com/ 10.1 Overview of Mass-Storage Structure 469 A disk drive is attached to a computer by a set of wires called an I/O bus Several kinds of buses are available, including advanced technology attachment (ATA), serial ATA (SATA), eSATA, universal serial bus (USB), and fibre channel (FC) The data transfers on a bus are carried out by special electronic processors called controllers The host controller is the controller at the computer end of the bus A disk controller is built into each disk drive To perform a disk I/O operation, the computer places a command into the host controller, typically using memory-mapped I/O ports, as described in Section 9.7.3 The host controller then sends the command via messages to the disk controller, and the disk controller operates the disk-drive hardware to carry out the command Disk controllers usually have a built-in cache Data transfer at the disk drive happens between the cache and the disk surface, and data transfer to the host, at fast electronic speeds, occurs between the cache and the host controller 10.1.2 Solid-State Disks Sometimes old technologies are used in new ways as economics change or the technologies evolve An example is the growing importance of solid-state disks, or SSDs Simply described, an SSD is nonvolatile memory that is used like a hard drive There are many variations of this technology, from DRAM with a battery to allow it to maintain its state in a power failure through flash-memory technologies like single-level cell (SLC) and multilevel cell (MLC) chips SSDs have the same characteristics as traditional hard disks but can be more reliable because they have no moving parts and faster because they have no seek time or latency In addition, they consume less power 
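To put the seek and rotational terms in perspective, the short sketch below estimates the time to service one small random read on a conventional magnetic disk. The figures used (a 9 ms average seek, 7,200 RPM, a 100 MB/s sustained transfer rate, and a 4 KB request) are illustrative assumptions for this example, not the specifications of any particular drive; the point is that almost all of the resulting time is mechanical positioning, which an SSD does not incur.

```c
/* Rough average service time for one small random read on a magnetic disk.
 * The figures below are illustrative assumptions, not measurements of any
 * particular drive; an SSD avoids the seek and rotational terms entirely. */
#include <stdio.h>

int main(void)
{
    double seek_ms       = 9.0;       /* assumed average seek time        */
    double rpm           = 7200.0;    /* assumed spindle speed            */
    double transfer_mb_s = 100.0;     /* assumed sustained transfer rate  */
    double request_kb    = 4.0;       /* size of one random request       */

    double rotation_ms = 60000.0 / rpm;      /* time for one full revolution   */
    double latency_ms  = rotation_ms / 2.0;  /* average rotational latency     */
    double transfer_ms = request_kb / (transfer_mb_s * 1024.0) * 1000.0;

    double total_ms = seek_ms + latency_ms + transfer_ms;

    printf("seek %.2f ms + latency %.2f ms + transfer %.3f ms = %.2f ms\n",
           seek_ms, latency_ms, transfer_ms, total_ms);
    return 0;
}
```

With these numbers the total comes to roughly 13 ms, of which the data transfer itself accounts for only a few hundredths of a millisecond; eliminating the seek and rotational components is what makes SSD random access so much faster.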
However, they are more expensive per megabyte than traditional hard disks, have less capacity than the larger hard disks, and may have shorter life spans than hard disks, so their uses are somewhat limited. One use for SSDs is in storage arrays, where they hold file-system metadata that require high performance. SSDs are also used in some laptop computers to make them smaller, faster, and more energy-efficient. Because SSDs can be much faster than magnetic disk drives, standard bus interfaces can cause a major limit on throughput. Some SSDs are designed to connect directly to the system bus (PCI, for example). SSDs are changing other traditional aspects of computer design as well. Some systems use them as a direct replacement for disk drives, while others use them as a new cache tier, moving data between magnetic disks, SSDs, and memory to optimize performance. In the remainder of this chapter, some sections pertain to SSDs, while others do not. For example, because SSDs have no disk head, disk-scheduling algorithms largely do not apply. Throughput and formatting, however, apply. 10.1.3 Magnetic Tapes Magnetic tape was used as an early secondary-storage medium. Although it is relatively permanent and can hold large quantities of data, its access time is slow compared with that of main memory and magnetic disk. In addition, random access to magnetic tape is about a thousand times slower than random access to magnetic disk, so tapes are not very useful for secondary storage. DISK TRANSFER RATES As with many aspects of computing, published performance numbers for disks are not the same as real-world performance numbers. Stated transfer rates are always lower than effective transfer rates, for example. The transfer rate may be the rate at which bits can be read from the magnetic media by the disk head, but that is different from the rate at which blocks are delivered to the operating system. Tapes are used mainly for backup, for storage of infrequently used information, and as a medium for transferring information from one system to another. A tape is kept in a spool and is wound or rewound past a read–write head. Moving to the correct spot on a tape can take minutes, but once positioned, tape drives can write data at speeds comparable to disk drives. Tape capacities vary greatly, depending on the particular kind of tape drive, with current capacities exceeding several terabytes. Some tapes have built-in compression that can more than double the effective storage. Tapes and their drivers are usually categorized by width, including 4, 8, and 19 millimeters and 1/4 and 1/2 inch. Some are named according to technology, such as LTO-5 and SDLT. 10.2 Disk Structure Modern magnetic disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer. The size of a logical block is usually 512 bytes, although some disks can be low-level formatted to have a different logical block size, such as 1,024 bytes. This option is described in Section 10.5.1. The one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially. Sector 0 is the first sector of the first track on the outermost cylinder. The mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost. By using this mapping, we can—at least in theory—convert a logical block number into an old-style disk address
that consists of a cylinder number, a track number within that cylinder, and a sector number within that track In practice, it is difficult to perform this translation, for two reasons First, most disks have some defective sectors, but the mapping hides this by substituting spare sectors from elsewhere on the disk Second, the number of sectors per track is not a constant on some drives Let’s look more closely at the second reason On media that use constant linear velocity (CLV), the density of bits per track is uniform The farther a track is from the center of the disk, the greater its length, so the more sectors it can hold As we move from outer zones to inner zones, the number of sectors per track decreases Tracks in the outermost zone typically hold 40 percent more sectors than tracks in the innermost zone The drive increases its rotation speed as the head moves from the outer to the inner tracks to keep the same rate of data moving under the head This method is used in CD-ROM Download at http://www.pin5i.com/ 10.3 Disk Attachment 471 and DVD-ROM drives Alternatively, the disk rotation speed can stay constant; in this case, the density of bits decreases from inner tracks to outer tracks to keep the data rate constant This method is used in hard disks and is known as constant angular velocity (CAV) The number of sectors per track has been increasing as disk technology improves, and the outer zone of a disk usually has several hundred sectors per track Similarly, the number of cylinders per disk has been increasing; large disks have tens of thousands of cylinders 10.3 Disk Attachment Computers access disk storage in two ways One way is via I/O ports (or host-attached storage); this is common on small systems The other way is via a remote host in a distributed file system; this is referred to as network-attached storage 10.3.1 Host-Attached Storage Host-attached storage is storage accessed through local I/O ports These ports use several technologies The typical desktop PC uses an I/O bus architecture called IDE or ATA This architecture supports a maximum of two drives per I/O bus A newer, similar protocol that has simplified cabling is SATA High-end workstations and servers generally use more sophisticated I/O architectures such as fibre channel (FC), a high-speed serial architecture that can operate over optical fiber or over a four-conductor copper cable It has two variants One is a large switched fabric having a 24-bit address space This variant is expected to dominate in the future and is the basis of storage-area networks (SANs), discussed in Section 10.3.3 Because of the large address space and the switched nature of the communication, multiple hosts and storage devices can attach to the fabric, allowing great flexibility in I/O communication The other FC variant is an arbitrated loop (FC-AL) that can address 126 devices (drives and controllers) A wide variety of storage devices are suitable for use as host-attached storage Among these are hard disk drives, RAID arrays, and CD, DVD, and tape drives The I/O commands that initiate data transfers to a host-attached storage device are reads and writes of logical data blocks directed to specifically identified storage units (such as bus ID or target logical unit) 10.3.2 Network-Attached Storage A network-attached storage (NAS) device is a special-purpose storage system that is accessed remotely over a data network (Figure 10.2) Clients access network-attached storage via a remote-procedure-call interface such as NFS for UNIX systems or 
CIFS for Windows machines The remote procedure calls (RPCs) are carried via TCP or UDP over an IP network—usually the same localarea network (LAN) that carries all data traffic to the clients Thus, it may be easiest to think of NAS as simply another storage-access protocol The networkattached storage unit is usually implemented as a RAID array with software that implements the RPC interface Download at http://www.pin5i.com/ 472 Chapter 10 Mass-Storage Structure client NAS LAN/WAN client NAS client Figure 10.2 Network-attached storage Network-attached storage provides a convenient way for all the computers on a LAN to share a pool of storage with the same ease of naming and access enjoyed with local host-attached storage However, it tends to be less efficient and have lower performance than some direct-attached storage options iSCSI is the latest network-attached storage protocol In essence, it uses the IP network protocol to carry the SCSI protocol Thus, networks—rather than SCSI cables—can be used as the interconnects between hosts and their storage As a result, hosts can treat their storage as if it were directly attached, even if the storage is distant from the host 10.3.3 Storage-Area Network One drawback of network-attached storage systems is that the storage I/O operations consume bandwidth on the data network, thereby increasing the latency of network communication This problem can be particularly acute in large client–server installations—the communication between servers and clients competes for bandwidth with the communication among servers and storage devices A storage-area network (SAN) is a private network (using storage protocols rather than networking protocols) connecting servers and storage units, as shown in Figure 10.3 The power of a SAN lies in its flexibility Multiple hosts and multiple storage arrays can attach to the same SAN, and storage can be dynamically allocated to hosts A SAN switch allows or prohibits access between the hosts and the storage As one example, if a host is running low on disk space, the SAN can be configured to allocate more storage to that host SANs make it possible for clusters of servers to share the same storage and for storage arrays to include multiple direct host connections SANs typically have more ports—as well as more expensive ports—than storage arrays FC is the most common SAN interconnect, although the simplicity of iSCSI is increasing its use Another SAN interconnect is InfiniBand — a special-purpose bus architecture that provides hardware and software support for high-speed interconnection networks for servers and storage units 10.4 Disk Scheduling One of the responsibilities of the operating system is to use the hardware efficiently For the disk drives, meeting this responsibility entails having fast Download at http://www.pin5i.com/ 10.4 Disk Scheduling 473 client server client storage array storage array LAN/WAN server client SAN data-processing center tape library web content provider Figure 10.3 Storage-area network access time and large disk bandwidth For magnetic disks, the access time has two major components, as mentioned in Section 10.1.1 The seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector The rotational latency is the additional time for the disk to rotate the desired sector to the disk head The disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer We can 
improve both the access time and the bandwidth by managing the order in which disk I/O requests are serviced Whenever a process needs I/O to or from the disk, it issues a system call to the operating system The request specifies several pieces of information: • • • • Whether this operation is input or output What the disk address for the transfer is What the memory address for the transfer is What the number of sectors to be transferred is If the desired disk drive and controller are available, the request can be serviced immediately If the drive or controller is busy, any new requests for service will be placed in the queue of pending requests for that drive For a multiprogramming system with many processes, the disk queue may often have several pending requests Thus, when one request is completed, the operating system chooses which pending request to service next How does the operating system make this choice? Any one of several disk-scheduling algorithms can be used, and we discuss them next 10.4.1 FCFS Scheduling The simplest form of disk scheduling is, of course, the first-come, first-served (FCFS) algorithm This algorithm is intrinsically fair, but it generally does not provide the fastest service Consider, for example, a disk queue with requests for I/O to blocks on cylinders 98, 183, 37, 122, 14, 124, 65, 67, Download at http://www.pin5i.com/ 474 Chapter 10 Mass-Storage Structure queue 98, 183, 37, 122, 14, 124, 65, 67 head starts at 53 14 37 5365 67 98 122124 183 199 Figure 10.4 FCFS disk scheduling in that order If the disk head is initially at cylinder 53, it will first move from 53 to 98, then to 183, 37, 122, 14, 124, 65, and finally to 67, for a total head movement of 640 cylinders This schedule is diagrammed in Figure 10.4 The wild swing from 122 to 14 and then back to 124 illustrates the problem with this schedule If the requests for cylinders 37 and 14 could be serviced together, before or after the requests for 122 and 124, the total head movement could be decreased substantially, and performance could be thereby improved 10.4.2 SSTF Scheduling It seems reasonable to service all the requests close to the current head position before moving the head far away to service other requests This assumption is the basis for the shortest-seek-time-first (SSTF) algorithm The SSTF algorithm selects the request with the least seek time from the current head position In other words, SSTF chooses the pending request closest to the current head position For our example request queue, the closest request to the initial head position (53) is at cylinder 65 Once we are at cylinder 65, the next closest request is at cylinder 67 From there, the request at cylinder 37 is closer than the one at 98, so 37 is served next Continuing, we service the request at cylinder 14, then 98, 122, 124, and finally 183 (Figure 10.5) This scheduling method results in a total head movement of only 236 cylinders—little more than one-third of the distance needed for FCFS scheduling of this request queue Clearly, this algorithm gives a substantial improvement in performance SSTF scheduling is essentially a form of shortest-job-first (SJF) scheduling; and like SJF scheduling, it may cause starvation of some requests Remember that requests may arrive at any time Suppose that we have two requests in the queue, for cylinders 14 and 186, and while the request from 14 is being serviced, a new request near 14 arrives This new request will be serviced next, making the request at 186 wait While this request is being 
serviced, another request close to 14 could arrive In theory, a continual stream of requests near one another could cause the request for cylinder 186 to wait indefinitely Download at http://www.pin5i.com/ 20.9 IBM OS/360 899 venient and practical mode of computing One result of CTSS was increased development of time-sharing systems Another result was the development of MULTICS 20.8 MULTICS The MULTICS operating system was designed from 1965 to 1970 at MIT as a natural extension of CTSS CTSS and other early time-sharing systems were so successful that they created an immediate desire to proceed quickly to bigger and better systems As larger computers became available, the designers of CTSS set out to create a time-sharing utility Computing service would be provided like electrical power Large computer systems would be connected by telephone wires to terminals in offices and homes throughout a city The operating system would be a time-shared system running continuously with a vast file system of shared programs and data MULTICS was designed by a team from MIT, GE (which later sold its computer department to Honeywell), and Bell Laboratories (which dropped out of the project in 1969) The basic GE 635 computer was modified to a new computer system called the GE 645, mainly by the addition of pagedsegmentation memory hardware In MULTICS, a virtual address was composed of an 18-bit segment number and a 16-bit word offset The segments were then paged in 1-KB-word pages The second-chance page-replacement algorithm was used The segmented virtual address space was merged into the file system; each segment was a file Segments were addressed by the name of the file The file system itself was a multilevel tree structure, allowing users to create their own subdirectory structures Like CTSS, MULTICS used a multilevel feedback queue for CPU scheduling Protection was accomplished through an access list associated with each file and a set of protection rings for executing processes The system, which was written almost entirely in PL/1, comprised about 300,000 lines of code It was extended to a multiprocessor system, allowing a CPU to be taken out of service for maintenance while the system continued running 20.9 IBM OS/360 The longest line of operating-system development is undoubtedly that of IBM computers The early IBM computers, such as the IBM 7090 and the IBM 7094, are prime examples of the development of common I/O subroutines, followed by development of a resident monitor, privileged instructions, memory protection, and simple batch processing These systems were developed separately, often at independent sites As a result, IBM was faced with many different computers, with different languages and different system software The IBM/360 —which first appeared in the mid 1960’s — was designed to alter this situation The IBM/360 ([Mealy et al (1966)]) was designed as a family of computers spanning the complete range from small business machines to large scientific machines Only one set of software would be needed for these systems, which all used the same operating system: OS/360 This arrangement Download at http://www.pin5i.com/ 900 Chapter 20 Influential Operating Systems was intended to reduce maintenance problems for IBM and to allow users to move programs and applications freely from one IBM system to another Unfortunately, OS/360 tried to be all things to all people As a result, it did none of its tasks especially well The file system included a type field that defined the type of each file, and 
different file types were defined for fixed-length and variable-length records and for blocked and unblocked files Contiguous allocation was used, so the user had to guess the size of each output file The Job Control Language (JCL) added parameters for every possible option, making it incomprehensible to the average user The memory-management routines were hampered by the architecture Although a base-register addressing mode was used, the program could access and modify the base register, so that absolute addresses were generated by the CPU This arrangement prevented dynamic relocation; the program was bound to physical memory at load time Two separate versions of the operating system were produced: OS/MFT used fixed regions and OS/MVT used variable regions The system was written in assembly language by thousands of programmers, resulting in millions of lines of code The operating system itself required large amounts of memory for its code and tables Operating-system overhead often consumed one-half of the total CPU cycles Over the years, new versions were released to add new features and to fix errors However, fixing one error often caused another in some remote part of the system, so that the number of known errors in the system remained fairly constant Virtual memory was added to OS/360 with the change to the IBM/370 architecture The underlying hardware provided a segmented-paged virtual memory New versions of OS used this hardware in different ways OS/VS1 created one large virtual address space and ran OS/MFT in that virtual memory Thus, the operating system itself was paged, as well as user programs OS/VS2 Release ran OS/MVT in virtual memory Finally, OS/VS2 Release 2, which is now called MVS, provided each user with his own virtual memory MVS is still basically a batch operating system The CTSS system was run on an IBM 7094, but the developers at MIT decided that the address space of the 360, IBM’s successor to the 7094, was too small for MULTICS, so they switched vendors IBM then decided to create its own time-sharing system, TSS/360 Like MULTICS, TSS/360 was supposed to be a large, time-shared utility The basic 360 architecture was modified in the model 67 to provide virtual memory Several sites purchased the 360/67 in anticipation of TSS/360 TSS/360 was delayed, however, so other time-sharing systems were developed as temporary systems until TSS/360 was available A time-sharing option (TSO) was added to OS/360 IBM’s Cambridge Scientific Center developed CMS as a single-user system and CP/67 to provide a virtual machine to run it on When TSS/360 was eventually delivered, it was a failure It was too large and too slow As a result, no site would switch from its temporary system to TSS/360 Today, time sharing on IBM systems is largely provided either by TSO under MVS or by CMS under CP/67 (renamed VM) Neither TSS/360 nor MULTICS achieved commercial success What went wrong? 
Part of the problem was that these advanced systems were too large and too complex to be understood Another problem was the assumption that computing power would be available from a large, remote source Download at http://www.pin5i.com/ 20.11 CP/M and MS/DOS 901 Minicomputers came along and decreased the need for large monolithic systems They were followed by workstations and then personal computers, which put computing power closer and closer to the end users 20.10 TOPS-20 DEC created many influential computer systems during its history Probably the most famous operating system associated with DEC is VMS, a popular business-oriented system that is still in use today as OpenVMS, a product of Hewlett-Packard But perhaps the most influential of DEC’s operating systems was TOPS-20 TOPS-20 started life as a research project at Bolt, Beranek, and Newman (BBN) around 1970 BBN took the business-oriented DEC PDP-10 computer running TOPS-10, added a hardware memory-paging system to implement virtual memory, and wrote a new operating system for that computer to take advantage of the new hardware features The result was TENEX, a generalpurpose timesharing system DEC then purchased the rights to TENEX and created a new computer with a built-in hardware pager The resulting system was the DECSYSTEM-20 and the TOPS-20 operating system TOPS-20 had an advanced command-line interpreter that provided help as needed to users That, in combination with the power of the computer and its reasonable price, made the DECSYSTEM-20 the most popular time-sharing system of its time In 1984, DEC stopped work on its line of 36-bit PDP-10 computers to concentrate on 32-bit VAX systems running VMS 20.11 CP/M and MS/DOS Early hobbyist computers were typically built from kits and ran a single program at a time The systems evolved into more advanced systems as computer components improved An early “standard” operating system for these computers of the 1970s was CP/M, short for Control Program/Monitor, written by Gary Kindall of Digital Research, Inc CP/M ran primarily on the first “personal computer” CPU, the 8-bit Intel 8080 CP/M originally supported only 64 KB of memory and ran only one program at a time Of course, it was text-based, with a command interpreter The command interpreter resembled those in other operating systems of the time, such as the TOPS-10 from DEC When IBM entered the personal computer business, it decided to have Bill Gates and company write a new operating system for its 16-bit CPU of choice —the Intel 8086 This operating system, MS-DOS, was similar to CP/M but had a richer set of built-in commands, again mostly modeled after TOPS-10 MS-DOS became the most popular personal-computer operating system of its time, starting in 1981 and continuing development until 2000 It supported 640 KB of memory, with the ability to address “extended” and “expanded” memory to get somewhat beyond that limit It lacked fundamental current operating-system features, however, especially protected memory Download at http://www.pin5i.com/ 902 Chapter 20 Influential Operating Systems 20.12 Macintosh Operating System and Windows With the advent of 16-bit CPUs, operating systems for personal computers could become more advanced, feature rich, and usable The Apple Macintosh computer was arguably the first computer with a GUI designed for home users It was certainly the most successful for a while, starting at its launch in 1984 It used a mouse for screen pointing and selecting and came with many utility programs that took advantage 
of the new user interface Hard-disk drives were relatively expensive in 1984, so it came only with a 400-KB-capacity floppy drive by default The original Mac OS ran only on Apple computers and slowly was eclipsed by Microsoft Windows (starting with Version 1.0 in 1985), which was licensed to run on many different computers from a multitude of companies As microprocessor CPUs evolved to 32-bit chips with advanced features, such as protected memory and context switching, these operating systems added features that had previously been found only on mainframes and minicomputers Over time, personal computers became as powerful as those systems and more useful for many purposes Minicomputers died out, replaced by general and special-purpose “servers.” Although personal computers continue to increase in capacity and performance, servers tend to stay ahead of them in amount of memory, disk space, and number and speed of available CPUs Today, servers typically run in data centers or machine rooms, while personal computers sit on or next to desks and talk to each other and servers across a network The desktop rivalry between Apple and Microsoft continues today, with new versions of Windows and Mac OS trying to outdo each other in features, usability, and application functionality Other operating systems, such as AmigaOS and OS/2, have appeared over time but have not been long-term competitors to the two leading desktop operating systems Meanwhile, Linux in its many forms continues to gain in popularity among more technical users —and even with nontechnical users on systems like the One Laptop per Child (OLPC) children’s connected computer network (http://laptop.org/) 20.13 Mach The Mach operating system traces its ancestry to the Accent operating system developed at Carnegie Mellon University (CMU) Mach’s communication system and philosophy are derived from Accent, but many other significant portions of the system (for example, the virtual memory system and task and thread management) were developed from scratch Work on Mach began in the mid 1980’s and the operating system was designed with the following three critical goals in mind: Emulate 4.3 BSD UNIX so that the executable files from a UNIX system can run correctly under Mach Be a modern operating system that supports many memory models, as well as parallel and distributed computing Have a kernel that is simpler and easier to modify than 4.3 BSD Download at http://www.pin5i.com/ 20.13 Mach 903 Mach’s development followed an evolutionary path from BSD UNIX systems Mach code was initially developed inside the 4.2BSD kernel, with BSD kernel components replaced by Mach components as the Mach components were completed The BSD components were updated to 4.3BSD when that became available By 1986, the virtual memory and communication subsystems were running on the DEC VAX computer family, including multiprocessor versions of the VAX Versions for the IBM RT/PC and for SUN workstations followed shortly Then, 1987 saw the completion of the Encore Multimax and Sequent Balance multiprocessor versions, including task and thread support, as well as the first official releases of the system, Release and Release Through Release 2, Mach provided compatibility with the corresponding BSD systems by including much of BSD’s code in the kernel The new features and capabilities of Mach made the kernels in these releases larger than the corresponding BSD kernels Mach moved the BSD code outside the kernel, leaving a much smaller microkernel This system implements only 
basic Mach features in the kernel; all UNIX-specific code has been evicted to run in user-mode servers Excluding UNIX-specific code from the kernel allows the replacement of BSD with another operating system or the simultaneous execution of multiple operating-system interfaces on top of the microkernel In addition to BSD, user-mode implementations have been developed for DOS, the Macintosh operating system, and OSF/1 This approach has similarities to the virtual machine concept, but here the virtual machine is defined by software (the Mach kernel interface), rather than by hardware With Release 3.0, Mach became available on a wide variety of systems, including single-processor SUN, Intel, IBM, and DEC machines and multiprocessor DEC, Sequent, and Encore systems Mach was propelled to the forefront of industry attention when the Open Software Foundation (OSF) announced in 1989 that it would use Mach 2.5 as the basis for its new operating system, OSF/1 (Mach 2.5 was also the basis for the operating system on the NeXT workstation, the brainchild of Steve Jobs of Apple Computer fame.) The initial release of OSF/1 occurred a year later, and this system competed with UNIX System V, Release 4, the operating system of choice at that time among UNIX International (UI) members OSF members included key technological companies such as IBM, DEC, and HP OSF has since changed its direction, and only DEC UNIX is based on the Mach kernel Unlike UNIX, which was developed without regard for multiprocessing, Mach incorporates multiprocessing support throughout This support is also exceedingly flexible, ranging from shared-memory systems to systems with no memory shared between processors Mach uses lightweight processes, in the form of multiple threads of execution within one task (or address space), to support multiprocessing and parallel computation Its extensive use of messages as the only communication method ensures that protection mechanisms are complete and efficient By integrating messages with the virtual memory system, Mach also ensures that messages can be handled efficiently Finally, by having the virtual memory system use messages to communicate with the daemons managing the backing store, Mach provides great flexibility in the design and implementation of these memory-objectmanaging tasks By providing low-level, or primitive, system calls from which more complex functions can be built, Mach reduces the size of the kernel Download at http://www.pin5i.com/ 904 Chapter 20 Influential Operating Systems while permitting operating-system emulation at the user level, much like IBM’s virtual machine systems Some previous editions of Operating System Concepts included an entire chapter on Mach This chapter, as it appeared in the fourth edition, is available on the Web (http://www.os-book.com) 20.14 Other Systems There are, of course, other operating systems, and most of them have interesting properties The MCP operating system for the Burroughs computer family was the first to be written in a system programming language It supported segmentation and multiple CPUs The SCOPE operating system for the CDC 6600 was also a multi-CPU system The coordination and synchronization of the multiple processes were surprisingly well designed History is littered with operating systems that suited a purpose for a time (be it a long or a short time) and then, when faded, were replaced by operating systems that had more features, supported newer hardware, were easier to use, or were better marketed We are sure this trend 
will continue in the future Exercises 20.1 20.2 Discuss what considerations the computer operator took into account in deciding on the sequences in which programs would be run on early computer systems that were manually operated What optimizations were used to minimize the discrepancy between CPU and I/O speeds on early computer systems? 20.3 Consider the page-replacement algorithm used by Atlas In what ways is it different from the clock algorithm discussed in Section 9.4.5.2? 20.4 Consider the multilevel feedback queue used by CTSS and MULTICS Suppose a program consistently uses seven time units every time it is scheduled before it performs an I/O operation and blocks How many time units are allocated to this program when it is scheduled for execution at different points in time? 20.5 What are the implications of supporting BSD functionality in user-mode servers within the Mach operating system? 20.6 What conclusions can be drawn about the evolution of operating systems? What causes some operating systems to gain in popularity and others to fade? Bibliographical Notes Looms and calculators are described in [Frah (2001)] and shown graphically in [Frauenfelder (2005)] The Manchester Mark is discussed by [Rojas and Hashagen (2000)], and its offspring, the Ferranti Mark 1, is described by [Ceruzzi (1998)] Download at http://www.pin5i.com/ Bibliography 905 [Kilburn et al (1961)] and [Howarth et al (1961)] examine the Atlas operating system The XDS-940 operating system is described by [Lichtenberger and Pirtle (1965)] The THE operating system is covered by [Dijkstra (1968)] and by [McKeag and Wilson (1976)] The Venus system is described by [Liskov (1972)] [Brinch-Hansen (1970)] and [Brinch-Hansen (1973)] discuss the RC 4000 system The Compatible Time-Sharing System (CTSS) is presented by [Corbato et al (1962)] The MULTICS operating system is described by [Corbato and Vyssotsky (1965)] and [Organick (1972)] [Mealy et al (1966)] presented the IBM/360 [Lett and Konigsford (1968)] cover TSS/360 CP/67 is described by [Meyer and Seawright (1970)] and [Parmelee et al (1972)] DEC VMS is discussed by [Kenah et al (1988)], and TENEX is described by [Bobrow et al (1972)] A description of the Apple Macintosh appears in [Apple (1987)] For more information on these operating systems and their history, see [Freiberger and Swaine (2000)] The Mach operating system and its ancestor, the Accent operating system, are described by [Rashid and Robertson (1981)] Mach’s communication system is covered by [Rashid (1986)], [Tevanian et al (1989)], and [Accetta et al (1986)] The Mach scheduler is described in detail by [Tevanian et al (1987a)] and [Black (1990)] An early version of the Mach sharedmemory and memory-mapping system is presented by [Tevanian et al (1987b)] A good resource describing the Mach project can be found at http://www.cs.cmu.edu/afs/cs/project/mach/public/www/mach.html [McKeag and Wilson (1976)] discuss the MCP operating system for the Burroughs computer family as well as the SCOPE operating system for the CDC 6600 Bibliography [Accetta et al (1986)] M Accetta, R Baron, W Bolosky, D B Golub, R Rashid, A Tevanian, and M Young, “Mach: A New Kernel Foundation for UNIX Development”, Proceedings of the Summer USENIX Conference (1986), pages 93–112 [Apple (1987)] Apple Technical Introduction to the Macintosh Family AddisonWesley (1987) [Black (1990)] D L Black, “Scheduling Support for Concurrency and Parallelism in the Mach Operating System”, IEEE Computer, Volume 23, Number (1990), pages 35–43 Download 
at http://www.pin5i.com/ 906 Chapter 20 Influential Operating Systems [Bobrow et al (1972)] D G Bobrow, J D Burchfiel, D L Murphy, and R S Tomlinson, “TENEX, a Paged Time Sharing System for the PDP-10”, Communications of the ACM, Volume 15, Number (1972) [Brinch-Hansen (1970)] P Brinch-Hansen, “The Nucleus of a Multiprogramming System”, Communications of the ACM, Volume 13, Number (1970), pages 238–241 and 250 [Brinch-Hansen (1973)] Hall (1973) [Ceruzzi (1998)] P Brinch-Hansen, Operating System Principles, Prentice P E Ceruzzi, A History of Modern Computing, MIT Press (1998) [Corbato and Vyssotsky (1965)] F J Corbato and V A Vyssotsky, “Introduction and Overview of the MULTICS System”, Proceedings of the AFIPS Fall Joint Computer Conference (1965), pages 185–196 [Corbato et al (1962)] F J Corbato, M Merwin-Daggett, and R C Daley, “An Experimental Time-Sharing System”, Proceedings of the AFIPS Fall Joint Computer Conference (1962), pages 335–344 [Dijkstra (1968)] E W Dijkstra, “The Structure of the THE Multiprogramming System”, Communications of the ACM, Volume 11, Number (1968), pages 341–346 [Frah (2001)] (2001) G Frah, The Universal History of Computing, John Wiley and Sons [Frauenfelder (2005)] M Frauenfelder, The Computer — An Illustrated History, Carlton Books (2005) [Freiberger and Swaine (2000)] P Freiberger and M Swaine, Fire in the Valley — The Making of the Personal Computer, McGraw-Hill (2000) [Howarth et al (1961)] D J Howarth, R B Payne, and F H Sumner, “The Manchester University Atlas Operating System, Part II: User’s Description”, Computer Journal, Volume 4, Number (1961), pages 226–229 [Kenah et al (1988)] L J Kenah, R E Goldenberg, and S F Bate, VAX/VMS Internals and Data Structures, Digital Press (1988) [Kilburn et al (1961)] T Kilburn, D J Howarth, R B Payne, and F H Sumner, “The Manchester University Atlas Operating System, Part I: Internal Organization”, Computer Journal, Volume 4, Number (1961), pages 222–225 [Lett and Konigsford (1968)] A L Lett and W L Konigsford, “TSS/360: A Time-Shared Operating System”, Proceedings of the AFIPS Fall Joint Computer Conference (1968), pages 15–28 [Lichtenberger and Pirtle (1965)] W W Lichtenberger and M W Pirtle, “A Facility for Experimentation in Man-Machine Interaction”, Proceedings of the AFIPS Fall Joint Computer Conference (1965), pages 589–598 [Liskov (1972)] B H Liskov, “The Design of the Venus Operating System”, Communications of the ACM, Volume 15, Number (1972), pages 144–149 [McKeag and Wilson (1976)] R M McKeag and R Wilson, Studies in Operating Systems, Academic Press (1976) Download at http://www.pin5i.com/ Bibliography 907 [Mealy et al (1966)] G H Mealy, B I Witt, and W A Clark, “The Functional Structure of OS/360”, IBM Systems Journal, Volume 5, Number (1966), pages 3–11 [Meyer and Seawright (1970)] R A Meyer and L H Seawright, “A Virtual Machine Time-Sharing System”, IBM Systems Journal, Volume 9, Number (1970), pages 199–218 [Organick (1972)] E I Organick, The Multics System: An Examination of Its Structure, MIT Press (1972) [Parmelee et al (1972)] R P Parmelee, T I Peterson, C C Tillman, and D Hatfield, “Virtual Storage and Virtual Machine Concepts”, IBM Systems Journal, Volume 11, Number (1972), pages 99–130 [Rashid (1986)] R F Rashid, “From RIG to Accent to Mach: The Evolution of a Network Operating System”, Proceedings of the ACM/IEEE Computer Society, Fall Joint Computer Conference (1986), pages 1128–1137 [Rashid and Robertson (1981)] R Rashid and G Robertson, “Accent: A Communication-Oriented Network 
Operating System Kernel”, Proceedings of the ACM Symposium on Operating System Principles (1981), pages 64–75 [Rojas and Hashagen (2000)] R Rojas and U Hashagen, The First Computers — History and Architectures, MIT Press (2000) [Tevanian et al (1987a)] A Tevanian, Jr., R F Rashid, D B Golub, D L Black, E Cooper, and M W Young, “Mach Threads and the Unix Kernel: The Battle for Control”, Proceedings of the Summer USENIX Conference (1987) [Tevanian et al (1987b)] A Tevanian, Jr., R F Rashid, M W Young, D B Golub, M R Thompson, W Bolosky, and R Sanzi, “A UNIX Interface for Shared Memory and Memory Mapped Files Under Mach”, Technical report, Carnegie-Mellon University (1987) [Tevanian et al (1989)] A Tevanian, Jr., and B Smith, “Mach: The Model for Future Unix”, Byte (1989) Download at http://www.pin5i.com/ Download at http://www.pin5i.com/ Credits • Figure 1.11: From Hennesy and Patterson, Computer Architecture: A Quanti- C 2002, Morgan Kaufmann Publishers, Figure tative Approach, Third Edition, 5.3, p 394 Reprinted with permission of the publisher • Figure 6.24 adapted with permission from Sun Microsystems, Inc C 1971, Interna• Figure 9.18: From IBM Systems Journal, Vol 10, No 3, tional Business Machines Corporation Reprinted by permission of IBM Corporation • Figure 12.9: From Leffler/McKusick/Karels/Quarterman, The Design and C 1989 by AddisonImplementation of the 4.3BSD UNIX Operating System, Wesley Publishing Co., Inc., Reading, Massachusetts Figure 7.6, p 196 Reprinted with permission of the publisher • Figure 13.4: From Pentium Processor User’s Manual: Architecture and Programming Manual, Volume 3, Copyright 1993 Reprinted by permission of Intel Corporation • Figures 17.5, 17.6, and 17.8: From Halsall, Data Communications, Computer C 1992, Addison-Wesley PubNetworks, and Open Systems, Third Edition, lishing Co., Inc., Reading, Massachusetts Figure 1.9, p 14, Figure 1.10, p 15, and Figure 1.11, p 18 Reprinted with permission of the publisher • Figure 6.14: From Khanna/Sebree/Zolnowsky, “Realtime Scheduling in SunOS 5.0,” Proceedings of Winter USENIX, January 1992, San Francisco, California Derived with permission of the authors 909 Download at http://www.pin5i.com/ Download at http://www.pin5i.com/ Index A access-control lists (ACLs), 832 ACLs (access-control lists), 832 ACPI (advanced configuration and power interface), 862 address space layout randomization (ASLR), 832 admission-control algorithms, 286 advanced configuration and power interface (ACPI), 862 advanced encryption standard (AES), 677 advanced local procedure call (ALPC), 135, 854 ALPC (advanced local procedure call), 135, 854 AMD64 architecture, 387 Amdahl’s Law, 167 AMD virtualization technology (AMD-V), 720 Android operating system, 85–86 API (application program interface), 63–64 Apple iPad, 60, 84 application containment, 713, 727–728 Aqua interface, 59, 84 ARM architecture, 388 arrays, 31 ASIDs (address-space identifiers), 374 ASLR (address space layout randomization), 832 assembly language, 77 asynchronous threading, 172 augmented-reality applications, 36 authentication: multifactor, 689 automatic working-set trimming, 446 B background processes, 74–75, 115, 296 balanced binary search trees, 33 binary search trees, 33 binary translation, 718–720 binary trees, 33 bitmaps, 34 bourne-Again shell (bash), 789 bridging, 732 bugs, 66 C CFQ (Completely Fair Queueing), 817 children, 33 chipsets, 836 Chrome, 123 CIFS (common internet file system), 871 circularly linked lists, 32 client(s): thin, 35 client-server model, 854–855 
clock algorithm, 418–419 clones, 715 cloud computing, 41–42, 716 Cocoa Touch, 84 911 Download at http://www.pin5i.com/ 912 Index code integrity module (Windows 7), 832 COM (component object model), 873 common internet file system (CIFS), 871 Completely Fair Queueing (CFQ), 817 computational kernels, 835–836 computer environments: cloud computing, 41–42 distributed systems, 37–38 mobile computing, 36–37 real-time embedded systems, 43 virtualization, 40–41 computing: mobile, 36–37 concurrency, 166 Concurrency Runtime (ConcRT), 297, 880–881 condition variables, 879 conflict phase (of dispatch latency), 285 containers, 728 control partitions, 723 coupling, symmetric, 17 CPU scheduling: real-time, 283–290 earliest-deadline-first scheduling, 288–289 and minimizing latency, 283–285 POSIX real-time scheduling, 290 priority-based scheduling, 285–287 proportional share scheduling, 289–290 rate-monotonic scheduling, 287–288 virtual machines, 729 critical-section problem: and mutex locks, 212–213 D Dalvik virtual machine, 86 data parallelism, 168–169 defense in depth, 689 desktop window manager (DWM), 831 device objects, 855 Digital Equipment Corporation (DEC), 379 digital signatures, 832 DirectCompute, 835 discovery protocols, 39 disk(s): solid-state, 469 dispatcher, 294 DMA controller, 595 doubly linked lists, 32 driver objects, 855 DWM (desktop window manager), 831 dynamic configurations, 837, 838 E earliest-deadline-first (EDF) scheduling, 288–289 EC2, 41 EDF (earliest-deadline-first) scheduling, 288–289 efficiency, 837 emulation, 40, 727 emulators, 713 encryption: public-key, 678 energy efficiency, 837 Erlang language, 241–242 event latency, 283–284 event-pair objects, 855 exit() system call, 120, 121 Download at http://www.pin5i.com/ Index ext2 (second extended file system), 811 ext3 (third extended file system), 811–813 ext4 (fourth extended file system), 811 extended file attributes, 505 extensibility, 736 F fast-user switching, 863–864 FIFO, 32 file info window (Mac OS X), 505 file replication, 767 file systems: Windows 7, see Windows foreground processes, 115, 296 fork-join strategy, 172 fourth extended file system (ext4), 811 G GCD (Grand Central Dispatch), 182–183 general trees, 33 gestures, 60 global positioning system (GPS), 36 GNOME desktop, 60 GPS (global positioning system), 36 Grand Central Dispatch (GCD), 182–183 granularity, minimum, 797 graphics shaders, 835 guard pages, 847 GUIs (graphical user interfaces), 59–62 913 H Hadoop, 765 Hadoop distributed file system (HDFS), 767 handle tables, 844 hands-on computer systems, 20 hardware: virtual machines, 720–721 hash collisions, 471 hash functions, 33–34 hash maps, 471 HDFS (Hadoop distributed file system), 767 hibernation, 860–861 hybrid cloud, 42 hybrid operating systems, 83–86 Android, 85–86 iOS, 84–85 Mac OS X, 84 hypercalls, 726 hypervisors, 712 type 0, 723–724 type 1, 724–725 type 2, 725 I IA-32 architecture, 384–387 paging in, 385–387 segmentation in, 384–385 IA-64 architecture, 387 IaaS (infrastructure as a service), 42 idle threads, 840 IDSs (intrusion-detection systems), 691–694 imperative languages, 241 impersonation, 853 implicit threading, 177–183 Download at http://www.pin5i.com/ ... 183, 37, 122 , 14, 124 , 65, 67, Download at http://www.pin5i.com/ 474 Chapter 10 Mass-Storage Structure queue 98, 183, 37, 122 , 14, 124 , 65, 67 head starts at 53 14 37 5365 67 98 122 124 183 199... 
request at cylinder 2,150, and the previous request was at cylinder 1,805. The queue of pending requests, in FIFO order, is: 2,069, 1,212, 2,296, 2,800, 544, 1,618, 356, 1,523, 4,965, 3681 ... from 17 to 202, moving them all down one spot. That is, sector 202 is copied into the spare, then sector 201 into 202, then 200 into 201, and so on, until sector 18 is copied into sector 19. Slipping
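The following sketch reproduces the FCFS and SSTF comparison from Section 10.4 for the reference queue (98, 183, 37, 122, 14, 124, 65, 67) with the head starting at cylinder 53. It is a minimal illustration of the two policies, not the scheduler of any real disk driver; it prints 640 cylinders of total head movement for FCFS and 236 for SSTF, matching the totals worked out in the text.

```c
/* Minimal comparison of FCFS and SSTF disk scheduling on the example queue
 * from Section 10.4 (head starts at cylinder 53).  Illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define NREQ 8

static int fcfs(const int *q, int n, int head)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(q[i] - head);   /* move to the next request in arrival order */
        head = q[i];
    }
    return total;
}

static int sstf(const int *q, int n, int head)
{
    int pending[NREQ];
    int total = 0;

    for (int i = 0; i < n; i++)
        pending[i] = q[i];

    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (pending[i] < 0)
                continue;            /* already serviced */
            if (best < 0 || abs(pending[i] - head) < abs(pending[best] - head))
                best = i;            /* closest pending cylinder so far */
        }
        total += abs(pending[best] - head);
        head = pending[best];
        pending[best] = -1;          /* mark as serviced */
    }
    return total;
}

int main(void)
{
    int queue[NREQ] = { 98, 183, 37, 122, 14, 124, 65, 67 };

    printf("FCFS total head movement: %d cylinders\n", fcfs(queue, NREQ, 53));
    printf("SSTF total head movement: %d cylinders\n", sstf(queue, NREQ, 53));
    return 0;
}
```

As the text notes, the SSTF result is little more than one-third of the FCFS distance for this queue, but because SSTF always chooses the closest pending request it can starve requests far from the current head position.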