Operating Systems Design and Implementation, Third Edition
By Andrew S. Tanenbaum - Vrije Universiteit Amsterdam, The Netherlands, Albert S. Woodhull - Amherst, Massachusetts
Publisher: Prentice Hall
Pub Date: January 04, 2006
Print ISBN-10: 0-13-142938-8
Print ISBN-13: 978-0-13-142938-3
eText ISBN-10: 0-13-185991-9
eText ISBN-13: 978-0-13-185991-3
Pages: 1080

Revised to address the latest version of MINIX (MINIX 3), this streamlined, simplified new edition remains the only operating systems text to first explain relevant principles, then demonstrate their applications using a Unix-like operating system as a detailed example. MINIX 3 has been especially designed for high reliability, for use in embedded systems, and for ease of teaching.

For the latest version of MINIX and simulators for running MINIX on other systems, visit: www.minix3.org

Table of Contents

Copyright
Preface
Chapter 1 Introduction
  Section 1.1 What Is an Operating System?
  Section 1.2 History of Operating Systems
  Section 1.3 Operating System Concepts
  Section 1.4 System Calls
  Section 1.5 Operating System Structure
  Section 1.6 Outline of the Rest of This Book
  Section 1.7 Summary
  Problems
Chapter 2 Processes
  Section 2.1 Introduction to Processes
  Section 2.2 Interprocess Communication
  Section 2.3 Classical IPC Problems
  Section 2.4 Scheduling
  Section 2.5 Overview of Processes in MINIX
  Section 2.6 Implementation of Processes in MINIX
  Section 2.7 The System Task in MINIX
  Section 2.8 The Clock Task in MINIX
  Section 2.9 Summary
  Problems
Chapter 3 Input/Output
  Section 3.1 Principles of I/O Hardware
  Section 3.2 Principles of I/O Software
  Section 3.3 Deadlocks
  Section 3.4 Overview of I/O in MINIX
  Section 3.5 Block Devices in MINIX
  Section 3.6 RAM Disks
  Section 3.7 Disks
  Section 3.8 Terminals
  Section 3.9 Summary
  Problems
Chapter 4 Memory Management
  Section 4.1 Basic Memory Management
  Section 4.2 Swapping
  Section 4.3 Virtual Memory
  Section 4.4 Page Replacement Algorithms
  Section 4.5 Design Issues for Paging Systems
  Section 4.6 Segmentation
  Section 4.7 Overview of the MINIX Process Manager
  Section 4.8 Implementation of the MINIX Process Manager
  Section 4.9 Summary
  Problems
Chapter 5 File Systems
  Section 5.1 Files
  Section 5.2 Directories
  Section 5.3 File System Implementation
  Section 5.4 Security
  Section 5.5 Protection Mechanisms
  Section 5.6 Overview of the MINIX File System
  Section 5.7 Implementation of the MINIX File System
  Section 5.8 Summary
  Problems
Chapter 6 Reading List and Bibliography
  Section 6.1 Suggestions for Further Reading
  Section 6.2 Alphabetical Bibliography
Appendix A Installing MINIX
  Section A.1 Preparation
  Section A.2 Booting
  Section A.3 Installing to the Hard Disk
  Section A.4 Testing
  Section A.5 Using a Simulator
Appendix B
The MINIX Source Code
Appendix C Index to Files
About the Authors
About the MINIX CD
  System Requirements
  Hardware
  Software
  Installation
  Product Support
Index

Copyright

Library of Congress Cataloging in Publication Data

Tanenbaum, Andrew S.
Operating Systems: Design and Implementation / Andrew S. Tanenbaum, Albert S. Woodhull. 3rd ed.
ISBN: 0-13-142938-8
Operating systems (Computers). I. Woodhull, Albert S. II. Title.
QA76.76.O63T36 2006
005.4'3 dc22

Vice President and Editorial Director, ECS: Marcia J. Horton
Executive Editor: Tracy Dunkelberger
Editorial Assistant: Christianna Lee
Executive Managing Editor: Vince O'Brien
Managing Editor: Camille Trentacoste
Director of Creative Services: Paul Belfanti
Art Director and Cover Manager: Heather Scott
Cover Design and Illustration: Tamara Newnam
Managing Editor, AV Management and Production: Patricia Burns
Art Editor: Gregory Dulles
Manufacturing Manager, ESM: Alexis Heydt-Long
Manufacturing Buyer: Lisa McDowell
Executive Marketing Manager: Robin O'Brien
Marketing Assistant: Barrie Reinhold

© 2006, 1997, 1987 by Pearson Education, Inc.
Pearson Prentice Hall
Pearson Education, Inc.
Upper Saddle River, NJ 07458

All rights reserved. No part of this book may be reproduced in any form or by any means, without permission in writing from the publisher.

Pearson Prentice Hall® is a trademark of Pearson Education, Inc.

The authors and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in this book. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the
furnishing, performance, or use of these programs.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America

Pearson Education Ltd., London
Pearson Education Australia Pty Ltd., Sydney
Pearson Education Singapore, Pte Ltd
Pearson Education North Asia Ltd., Hong Kong
Pearson Education Canada, Inc., Toronto
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education-Japan, Tokyo
Pearson Education Malaysia, Pte Ltd
Pearson Education, Inc., Upper Saddle River, New Jersey

Dedication

To Suzanne, Barbara, Marvin, and the memory of Sweetie π and Bram - AST
To Barbara and Gordon - ASW

The MINIX Mascot

Other operating systems have an animal mascot, so we felt MINIX ought to have one too. We chose the raccoon because raccoons are small, cute, clever, agile, eat bugs, and are user-friendly, at least if you keep your garbage can well locked.

Preface

Most books on operating systems are strong on theory and weak on practice. This one aims to provide a better balance between the two. It covers all the fundamental principles in great detail, including processes, interprocess communication, semaphores, monitors, message passing, scheduling algorithms, input/output, deadlocks, device drivers, memory management, paging algorithms, file system design, security, and protection mechanisms. But it also discusses one particular system, MINIX 3, a UNIX-compatible operating system, in detail, and even provides a source code listing for study. This arrangement allows the reader not only to learn the principles, but also to see how they are applied in a real operating system.

When the first edition of this book appeared in 1987, it caused something of a small revolution in the way operating systems courses were taught. Until then, most courses just covered theory. With the appearance of MINIX, many schools began to have laboratory courses in which students examined a real operating
system to see how it worked inside. We consider this trend highly desirable and hope it continues.

In its first 10 years, MINIX underwent many changes. The original code was designed for a 256K 8088-based IBM PC with two diskette drives and no hard disk. It was also based on UNIX Version 7. As time went on, MINIX evolved in many ways: it supported 32-bit protected mode machines with large memories and hard disks. It also changed from being based on Version 7, to being based on the international POSIX standard (IEEE 1003.1 and ISO 9945-1). Finally, many new features were added, perhaps too many in our view, but too few in the view of some other people, which led to the creation of Linux. In addition, MINIX was ported to many other platforms, including the Macintosh, Amiga, Atari, and SPARC. A second edition of the book, covering this system, was published in 1997 and was widely used at universities.

The popularity of MINIX has continued, as can be observed by examining the number of hits for MINIX found by Google.

This third edition of the book has many changes throughout. Nearly all of the material on principles has been revised, and considerable new material has been added. However, the main change is the discussion of the new version of the system, called MINIX 3, and the inclusion of the new code in this book. Although loosely based on MINIX 2, MINIX 3 is fundamentally different in many key ways.

The design of MINIX 3 was inspired by the observation that operating systems are becoming bloated, slow, and unreliable. They crash far more often than other electronic devices such as televisions, cell phones, and DVD players and have so many features and options that practically nobody can understand them fully or manage them well. And of course, computer viruses, worms, spyware, spam, and other forms of malware have become epidemic.

To a large extent, many of these problems are caused by a fundamental design flaw in current operating systems: their lack of modularity. The entire
operating system is typically millions of lines of C/C++ code compiled into a single massive executable program run in kernel mode. A bug in any one of those millions of lines of code can cause the system to malfunction. Getting all this code correct is impossible, especially when about 70% consists of device drivers, written by third parties, and outside the purview of the people maintaining the operating system.

With MINIX 3, we demonstrate that this monolithic design is not the only possibility. The MINIX 3 kernel is only about 4000 lines of executable code, not the millions found in Windows, Linux, Mac OS X, or FreeBSD. The rest of the system, including all the device drivers (except the clock driver), is a collection of small, modular, user-mode processes, each of which is tightly restricted in what it can do and with which other processes it may communicate.

While MINIX 3 is a work in progress, we believe that this model of building an operating system as a collection of highly encapsulated user-mode processes holds promise for building more reliable systems in the future. MINIX 3 is especially focused on smaller PCs (such as those commonly found in Third-World countries) and on embedded systems, which are always resource constrained. In any event, this design makes it much easier for students to learn how an operating system works than attempting to study a huge monolithic system.

The CD-ROM that is included in this book is a live CD. You can put it in your CD-ROM drive, reboot the computer, and MINIX 3 will give a login prompt within a few seconds. You can log in as root and give the system a try without first having to install it on your hard disk. Of course, it can also be installed on the hard disk. Detailed installation instructions are given in Appendix A.

As suggested above, MINIX 3 is rapidly evolving, with new versions being issued frequently. To download the current CD-ROM image file for burning, please go to the official Website: www.minix3.org. This site also
contains a large amount of new software, documentation, and news about MINIX 3 development. For discussions about MINIX 3, or to ask questions, there is a USENET newsgroup: comp.os.minix. People without newsreaders can follow discussions on the Web at http://groups.google.com/group/comp.os.minix.

As an alternative to installing MINIX 3 on your hard disk, it is possible to run it on any one of several PC simulators now available. Some of these are listed on the main page of the Website.

Instructors who are using the book as the text for a university course can get the problem solutions from their local Prentice Hall representative. The book has its own Website. It can be found by going to www.prenhall.com/tanenbaum and selecting this title.

We have been extremely fortunate in having the help of many people during the course of this project. First and foremost, Ben Gras and Jorrit Herder have done most of the programming of the new version. They did a great job under tight time constraints, including responding to e-mail well after midnight on many occasions. They also read the manuscript and made many useful comments. Our deepest appreciation to both of them.

Kees Bot also helped greatly with previous versions, giving us a good base to work with. Kees wrote large chunks of code for versions up to 2.0.4, repaired bugs, and answered numerous questions. Philip Homburg wrote most of the networking code as well as helping out in numerous other useful ways, especially providing detailed feedback on the manuscript.

People too numerous to list contributed code to the very early versions, helping to get MINIX off the ground in the first place. There were so many of them and their contributions have been so varied that we cannot even begin to list them all here, so the best we can do is a generic thank you to all of them.

Several people read parts of the manuscript and made suggestions. We would like to give our special thanks to Gojko Babic, Michael Crowley, Joseph M. Kizza, Sam Kohn, Alexander Manov,
and Du Zhang for their help.

Finally, we would like to thank our families. Suzanne has been through this 16 times now. Barbara has been through it 15 times now. Marvin has been through it 14 times now. It's kind of getting to be routine, but the love and support is still much appreciated. (AST)

Al's Barbara has been through this twice now. Her support, patience, and good humor were essential. Gordon has been a patient listener. It is still a delight to have a son who understands and cares about the things that fascinate me. Finally, step-grandson Zain's first birthday coincides with the release of MINIX 3. Some day he will appreciate this. (ASW)

Andrew S. Tanenbaum
Albert S. Woodhull

Chapter 1 Introduction

Without its software, a computer is basically a useless lump of metal. With its software, a computer can store, process, and retrieve information; play music and videos; send e-mail; search the Internet; and engage in many other valuable activities to earn its keep. Computer software can be divided roughly into two kinds: system programs, which manage the operation of the computer itself, and application programs, which perform the actual work the user wants. The most fundamental system program is the operating system, whose job is to control all the computer's resources and provide a base upon which the application programs can be written. Operating systems are the topic of this book. In particular, an operating system called MINIX 3 is used as a model, to illustrate design principles and the realities of implementing a design.

A modern computer system consists of one or more processors, some main memory, disks, printers, a keyboard, a display, network interfaces, and other input/output devices. All in all, a complex system. Writing programs that keep track of all these components and use them correctly, let alone optimally, is an extremely difficult job. If every programmer had to be concerned with how disk drives work, and with all the dozens of things that could go wrong when reading a
disk block, it is unlikely that many programs could be written at all.

Many years ago it became abundantly clear that some way had to be found to shield programmers from the complexity of the hardware. The way that has evolved gradually is to put a layer of software on top of the bare hardware, to manage all parts of the system, and present the user with an interface or virtual machine that is easier to understand and program. This layer of software is the operating system.

The placement of the operating system is shown in Fig. 1-1. At the bottom is the hardware, which, in many cases, is itself composed of two or more levels (or layers). The lowest level contains physical devices, consisting of integrated circuit chips, wires, power supplies, cathode ray tubes, and similar physical devices. How these are constructed and how they work is the province of the electrical engineer.

Figure 1-1. A computer system consists of hardware, system programs, and application programs.