POSIX.4: Programming for the Real World
Bill O. Gallmeister

O'Reilly & Associates, Inc.
103 Morris Street, Suite A
Sebastopol, CA 95472

Copyright © 1995 O'Reilly & Associates, Inc. All rights reserved.
Printed in the United States of America.

Editor: Mike Loukides
Production Editor: Clairemarie Fisher O'Leary

Printing History:
January 1995: First Edition

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks and The Java™ Series is a trademark of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book is printed on acid-free paper with 85% recycled content, 15% post-consumer waste. O'Reilly & Associates is committed to using paper with the highest recycled content available consistent with high quality.

ISBN: 1-56592-074-0 [11/98]

To my son, Ian, and my father, who was right.

Table of Contents

Preface xv

PART I: Programming for the Real World

Introduction
    What's POSIX?
    Doing More the POSIX Way
    The POSIX Environment
    The Applications Are Varied
    The Problems Are the Same 13
    Some Solutions to These Problems 16
    What POSIX Does Not Do 16

The POSIX Way 19
    What POSIX Is 19
    Compile-Time Checking 24
    Run-Time Checking 30
    Headers and Namespace Pollution 36
    Who's Got What? 39
    Conclusion 39

The Basics of Real-Time: Multiple Tasks 41
    Doing Everything at Once 41
    Running in Cycles 43
    Multiple Processes 47
    Signals 55
    Conclusion 82
    Exercises 82

Better Coordination: Messages, Shared Memory, and Synchronization 85
    Communicating Between Processes 85
    POSIX.1 Communication: Pipes and FIFOs 88
    System V Message Queues 94
    POSIX.4 Message Queues 94
    POSIX.4 Shared Memory and File Mapping 110
    Synchronizing Multiple Processes 129
    Conclusion 146
    Exercises 146

On Time: Scheduling, Time, and Memory Locking 149
    Trying to Make Things Happen On Time 149
    Rates and Responses 151
    Standard Scheduling Solutions Under UNIX 153
    Portable Real-Time Scheduling: the POSIX.4 Scheduling Interfaces 159
    Making Things Happen On Time 171
    Keeping Your Memory Handy: UNIX and POSIX Memory Locking 193
    Brass Tacks 200
    Nice but Not Necessary: How to Make the Time Readable 207
    Conclusion 209
    Exercises 209

I/O for the Real World 213
    I/O Is Everything 213
    I/O in Real-Time Systems 213
    UNIX Has a Problem with Real-Time I/O 214
    Standard UNIX Facilities for Real-World I/O 217
    Achieving Greater Control over File Operations 219
    Asynchronous I/O: I/O While You Don't Wait 224
    Deterministic I/O 245
    Conclusion 248
    Exercises 248

Performance, or How to Choose an Operating System 251
    Performance in Real-Time Systems 252
    Measuring the Right Thing 258
    Metrics for POSIX Systems 260
    Conclusion 272
    Exercises 273

PART II: Manpages 277
    281 283 290 293 295 297 303 306 307 311
    aio_cancel 315
    aio_error 317
    aio_read 319
    aio_return 322
    aio_suspend 324
    aio_write 326
    clock_getres 329
    clock_gettime 331
    clock_settime 333
    close 335
    exec 337
    exit 341
    fdatasync 343
    fork 345
    fsync 348
    kill 351
    lio_listio 354
    mkfifo 358
    mlock 360
    mlockall 363
    mmap 365
    mprotect 370
    mq_close 373
    mq_getattr 375
    mq_notify 377
    mq_open 379
    mq_receive 383
    mq_send 385
    mq_setattr 387
    msync 389
    munlock 392
    munlockall 394
    munmap 396
    nanosleep 398
    pathconf, fpathconf 400
    pipe 403
    sched_get_priority_max 405
    sched_get_priority_min 407
    sched_getparam 409
    sched_getscheduler 411
    sched_rr_get_interval 413
    sched_setparam 415
    sched_setscheduler 417
    sched_yield 420

    /* Jitter wasn't measured on the first signal */
    printf("Average jitter: %f nsec\n", totaltime / (float)(nsig-1));
    if (jbuf.next) {
        /* There was jitter */
        start_sec = ta[0].tv_sec;
        for (i=0; i= 0) {
        if ((ta[this].tv_sec != ta[prev].tv_sec + secs) ||
            (ta[this].tv_nsec != ta[prev].tv_nsec + nsecs)) {
            /* There seems to have been jitter. Verify. */
            if ((ta[this].tv_sec == ta[prev].tv_sec + 1) &&
                (ta[this].tv_nsec == 0))
                /* No jitter; the seconds just clicked over */
                goto skip;
            /* Calculate the amount of jitter */
            jbuf.j[jbuf.next].tv_sec = ta[this].tv_sec - ta[prev].tv_sec;
            jbuf.j[jbuf.next].tv_nsec = ta[this].tv_nsec - ta[prev].tv_nsec;
            jbuf.next++;
            if (jbuf.next == JITTERBUF_MAX) {
                ctrlc(0);    /* Terminate */
            }
        }
skip:
        prev = this;
        this++;
        if (this == TIMEBUF_MAX)
            this = 0;
    }
}

Chapter 6: I/O for the Real World

iopipe.c

#define _POSIX_C_SOURCE 199309L

#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * This program copies its standard input to standard output
 * using read and write BUFSIZE bytes at a time.
 */
#define BUFSIZE (64*1024)

char buf[BUFSIZE];

int
main(int argc, char **argv)
{
    ssize_t nbytes;
    int flags;

    /* F_GETFL hands back the flags as the return value of fcntl */
    flags = fcntl(fileno(stdout), F_GETFL);
    flags |= O_DSYNC;    /* Just get the data down */
    fcntl(fileno(stdout), F_SETFL, flags);

    while (1) {
        nbytes = read(fileno(stdin), buf, BUFSIZE);
        if (nbytes si_value.sival_ptr);
    /* No need to call aio_error here: we know the AIO is done */
    nbytes_read = aio_return(&ap->a);
    if (nbytes_read > 0) {
        /* Read some data, so turn around and write it out */
        ap->a.aio_fildes = fileno(stdout);
        ap->a.aio_buf = ap->buffer;
        ap->a.aio_nbytes = nbytes_read;
        ap->a.aio_offset = ap->curr_offset;
        ap->a.aio_reqprio = 0;
        ap->a.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
        ap->a.aio_sigevent.sigev_signo = SIG_AIO_WRITE_DONE;
        ap->a.aio_sigevent.sigev_value.sival_ptr = (void *)ap;
        aio_write(&ap->a);
    } else {
        aio_to_go--;
    }
}

/*
Called when this write is complete */
void aiowrite_done(int signo, siginfo_t *info, void *ignored)
{
    struct aiocb_plus *ap;
    ssize_t nbytes_written;

    ap = (struct aiocb_plus *)(info->si_value.sival_ptr);
    /* No need to call aio_error here: we know the AIO is done */
    nbytes_written = aio_return(&ap->a);
    /* Fire up another aio_read, skipping the data being read by our peer */
    ap->a.aio_fildes = fileno(stdin);
    ap->a.aio_buf = ap->buffer;
    ap->a.aio_nbytes = BUFSIZE;
    ap->curr_offset += 2*BUFSIZE;
    ap->a.aio_offset = ap->curr_offset;
    ap->a.aio_reqprio = 0;
    ap->a.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    ap->a.aio_sigevent.sigev_signo = SIG_AIO_READ_DONE;
    ap->a.aio_sigevent.sigev_value.sival_ptr = (void *)ap;
    aio_read(&ap->a);
}

main(int argc, char **argv)
{
    sigset_t allsigs;
    struct sigaction sa;

    /* Handler for read completions */
    sa.sa_sigaction = aioread_done;
    sa.sa_flags = SA_SIGINFO;    /* Three-argument handler */
    /* Prevent the WRITE signal from coming in while we're handling
     * a READ completion, just to keep things cleaner. */
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIG_AIO_WRITE_DONE);
    if (sigaction(SIG_AIO_READ_DONE, &sa, NULL) < 0) {
        perror("sigaction");
        exit(1);
    }

    /* Handler for write completions */
    sa.sa_sigaction = aiowrite_done;
    sa.sa_flags = SA_SIGINFO;    /* Three-argument handler */
    /* Prevent the READ signal from coming in while we're handling
     * a WRITE completion, just to keep things cleaner. */
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIG_AIO_READ_DONE);
    if (sigaction(SIG_AIO_WRITE_DONE, &sa, NULL) < 0) {
        perror("sigaction");
        exit(1);
    }

    /* Block these signals from the mainline code so we can safely
     * examine the global variable aio_to_go */
    sigemptyset(&allsigs);
    sigaddset(&allsigs, SIG_AIO_READ_DONE);
    sigaddset(&allsigs, SIG_AIO_WRITE_DONE);
    sigprocmask(SIG_BLOCK, &allsigs, NULL);

    aio_to_go = 2;    /* Global flag */

    /* F_GETFL hands back the flags as the return value of fcntl */
    flags = fcntl(fileno(stdout), F_GETFL);
    flags |= O_DSYNC;    /* Just get the data down */
    fcntl(fileno(stdout), F_SETFL, flags);

    /* Set up asynchronous I/O */
    a1.a.aio_fildes = fileno(stdin);
    a1.a.aio_buf = a1.buffer = buf1;
    a1.a.aio_nbytes = BUFSIZE;
    a1.a.aio_offset = a1.curr_offset = (off_t)0;
    a1.a.aio_reqprio = 0;
    a1.a.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    a1.a.aio_sigevent.sigev_signo = SIG_AIO_READ_DONE;
    a1.a.aio_sigevent.sigev_value.sival_ptr = (void *)&a1;
    aio_read(&a1.a);

    a2.a.aio_fildes = fileno(stdin);
    a2.a.aio_buf = a2.buffer = buf2;
    a2.a.aio_nbytes = BUFSIZE;
    a2.a.aio_offset = a2.curr_offset = (off_t)BUFSIZE;
    a2.a.aio_reqprio = 0;
    a2.a.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    a2.a.aio_sigevent.sigev_signo = SIG_AIO_READ_DONE;
    a2.a.aio_sigevent.sigev_value.sival_ptr = (void *)&a2;
    aio_read(&a2.a);

    /* Let the signals take it from here! */
    sigemptyset(&allsigs);    /* Mask no signals when we suspend */
    while (aio_to_go) {
        sigsuspend(&allsigs);
    }
    exit(0);
}

Chapter 7: Performance, or How To Choose an Operating System

noswitch.c

#define _POSIX_C_SOURCE 199309    /* POSIX 9/1993: .1, .4 */

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sched.h>

/*
 * Measure overhead of using the sched_yield call.
 */
int nyield = 0;

main()
{
    struct sigaction sa;
    extern void alarm_handler();

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = alarm_handler;    /* Terminates experiment */
    if (sigaction(SIGALRM, &sa, NULL) < 0) {
        perror("sigaction SIGALRM");
        exit(1);
    }
    switcher();    /* with self ==> no ctxtsw */
    fprintf(stderr, "Unexpected exit from test program!\n");
    exit(4);
}

/* Should never wake up from the sigsuspend; SIGUSR1 is blocked */
switcher()
{
    alarm(60);
    while (1) {
        sched_yield();
        nyield++;
    }
}

void alarm_handler()
{
    printf("%d yield calls in 60 seconds = %d yield calls/sec\n",
        nyield, nyield / 60);
    exit(0);
}

switch.c

#define _POSIX_C_SOURCE 199309    /* POSIX 9/1993: .1, .4 */

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sched.h>

/*
 * Measure context switch time using the sched_yield call.
 */
int nswitch = 0;
pid_t chpid;

main()
{
    struct sigaction sa;
    extern void alarm_handler(), child_terminate();

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = alarm_handler;    /*
Terminates experiment */
    if (sigaction(SIGALRM, &sa, NULL) < 0) {
        perror("sigaction SIGALRM");
        exit(1);
    }
    sa.sa_handler = child_terminate;    /* Terminates child */
    sigfillset(&sa.sa_mask);    /* Take no signals after experiment done */
    if (sigaction(SIGUSR2, &sa, NULL) < 0) {
        perror("sigaction SIGUSR2");
        exit(1);
    }

    /* Should set scheduler here, or use atprio */
    switch (chpid = fork()) {
    case -1:    /* error */
        perror("fork");
        exit(3);
        break;
    default:    /* parent, set alarm and fall through
                 * to common case */
        alarm(60);
    case 0:     /* everybody */
        switcher();
        exit(0);
        break;
    }
    fprintf(stderr, "Unexpected exit from test program!\n");
    exit(4);
}

/* Should never wake up from the sigsuspend; SIGUSR1 is blocked */
switcher()
{
    while (1) {
        sched_yield();
        nswitch++;
    }
}

child_terminate()
{
    printf("%d switches in 60 seconds = %d switch/sec\n",
        nswitch, nswitch / 60);
    exit(0);
}

void alarm_handler()
{
    printf("%d switches in 60 seconds = %d switch/sec\n",
        nswitch, nswitch / 60);
    kill(chpid, SIGUSR2);
    exit(0);
}

sending_sigs_self.c

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

/*
 * Measure time spent just for one process to send signals to itself.
 * Avoid all signal-handling overhead by having the signal blocked.
 */
int nsigs = 0;

main()
{
    struct sigaction sa;
    sigset_t blockem;
    extern void panic_handler(), alarm_handler(), child_terminate();

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = alarm_handler;    /* Terminates experiment */
    if (sigaction(SIGALRM, &sa, NULL) < 0) {
        perror("sigaction SIGALRM");
        exit(1);
    }

    /* Should never _receive_ SIGUSR1; it's blocked.
     * Setting up this handler is just paranoia. */
    sa.sa_handler = panic_handler;
    if (sigaction(SIGUSR1, &sa, NULL) < 0) {
        perror("sigaction SIGUSR1");
        exit(1);
    }

    sigemptyset(&blockem);
    sigaddset(&blockem, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &blockem, NULL) < 0) {
        perror("sigprocmask");
        exit(2);
    }
    send_sigs_self();
    fprintf(stderr, "Unexpected exit from test program!\n");
    exit(4);
}

send_sigs_self()
{
    pid_t self = getpid();
    alarm(60);
    while (1) {
        if (kill(self, SIGUSR1) < 0) {
            perror("kill");
            return;
        }
        nsigs++;
    }
}

void panic_handler(int sig)
{
    char *signame;

    switch (sig) {
    case SIGUSR1: signame = "SIGUSR1"; break;
    case SIGUSR2: signame = "SIGUSR2"; break;
    case SIGALRM: signame = "SIGALRM"; break;
    default: signame = ""; break;
    }
    printf("ERROR: received signal %d (%s)\n", sig, signame);
    exit(5);
}

void alarm_handler()
{
    printf("%d signals sent (%d/sec)\n", nsigs, nsigs/60);
    exit(0);
}

sending_recving_sigs_self.c

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

/*
 * Measure time spent just for one process to send signals to itself
 * AND handle those signals.
 */
int nsigs_sent = 0;
int nsigs_recv = 0;

main()
{
    struct sigaction sa;
    sigset_t blockem;
    extern void null_handler(), alarm_handler();

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = alarm_handler;    /* Terminates experiment */
    if (sigaction(SIGALRM, &sa, NULL) < 0) {
        perror("sigaction SIGALRM");
        exit(1);
    }

    /* SIGUSR1 is not blocked in this test: each one is delivered
     * to null_handler, which counts it. */
    sa.sa_handler = null_handler;
    if (sigaction(SIGUSR1, &sa, NULL) < 0) {
        perror("sigaction SIGUSR1");
        exit(1);
    }
    send_sigs_self();
    fprintf(stderr, "Unexpected exit from test program!\n");
    exit(4);
}

send_sigs_self()
{
    pid_t self = getpid();

    alarm(60);
    while (1) {
        if (kill(self, SIGUSR1) < 0) {
            perror("kill");
            return;
        }
        nsigs_sent++;
    }
}

void null_handler()
{
    nsigs_recv++;
}

void alarm_handler()
{
    printf("%d signals sent (%d/sec)\n", nsigs_sent, nsigs_sent/60);
    printf("%d signals received (%d/sec)\n", nsigs_recv, nsigs_recv/60);
    exit(0);
}

sending_sigs.c

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

/*
 * Measure time spent just for one process to send signals to another process.
 * Avoid all overhead on the child side by having the signal blocked.
 */
int nsigs = 0;
pid_t chpid;

main()
{
    struct sigaction sa;
    sigset_t blockem;
    extern void panic_handler(), alarm_handler(), child_terminate();

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = alarm_handler;    /* Terminates experiment */
    if (sigaction(SIGALRM, &sa, NULL) < 0) {
        perror("sigaction SIGALRM");
        exit(1);
    }

    /* No one should ever _receive_ SIGUSR1; it's blocked.
     * Setting up this handler is just paranoia. */
    sa.sa_handler = panic_handler;
    if (sigaction(SIGUSR1, &sa, NULL) < 0) {
        perror("sigaction SIGUSR1");
        exit(1);
    }
    sa.sa_handler = child_terminate;    /* Terminates child */
    sigfillset(&sa.sa_mask);    /* Take no signals after experiment done */
    if (sigaction(SIGUSR2, &sa, NULL) < 0) {
        perror("sigaction SIGUSR2");
        exit(1);
    }

    sigemptyset(&blockem);
    sigaddset(&blockem, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &blockem, NULL) < 0) {
        perror("sigprocmask");
        exit(2);
    }

    switch (chpid = fork()) {
    case -1:    /* error */
        perror("fork");
        exit(3);
        break;
    case 0:     /* child */
        be_a_child();
        exit(0);
        break;
    default:    /* parent */
        be_the_parent();
        exit(0);
        break;
    }
    fprintf(stderr, "Unexpected exit from test program!\n");
    exit(4);
}

/* Should never wake up from the sigsuspend; SIGUSR1 is blocked */
be_a_child()
{
    sigset_t sigset;

    sigfillset(&sigset);
    sigdelset(&sigset, SIGUSR2);    /* Wait for only SIGUSR2 */
    while (1) {
        sigsuspend(&sigset);
    }
}

be_the_parent()
{
    alarm(60);
    while (1) {
        if (kill(chpid, SIGUSR1) < 0) {
            perror("kill");
            return;
        }
        nsigs++;
    }
}

void panic_handler(int sig)
{
    char *signame;

    switch (sig) {
    case SIGUSR1: signame = "SIGUSR1"; break;
    case SIGUSR2: signame = "SIGUSR2"; break;
    case SIGALRM: signame = "SIGALRM"; break;
    default: signame = ""; break;
    }
    printf("ERROR: Child received signal %d (%s)\n", sig, signame);
    kill(getppid(), SIGALRM);    /* Terminate experiment */
    exit(1);
}

void child_terminate()
{
    exit(0);
}

void alarm_handler()
{
    printf("%d signals sent by parent (%d/sec)\n", nsigs, nsigs/60);
    kill(chpid, SIGUSR2);
    exit(0);
}

sigs_sent_noswtch.c, sigs_sent_swtch.c

The code for these exercises can be found in the section for Chapter 3, earlier in this appendix.

Bibliography

Throughout the book, I've referred to several books
for additional information or just plain fun bedside reading. Here are the complete references.

Bach, Maurice J. The Design of the UNIX Operating System. Prentice-Hall. 0-13-201799-7. This is one of the canonical references to how UNIX is put together.

Frisch, Æleen. Essential System Administration. O'Reilly & Associates. 0-937175-80-3.

Institute of Electrical and Electronics Engineers. Portable Operating System Interface (POSIX)—Part 1: System Application Program Interface (API) [C Language]. Institute of Electrical and Electronics Engineers. 1-55937-061-0. This is POSIX.1, also known as ISO 9945-1 (1990).

Institute of Electrical and Electronics Engineers. Portable Operating System Interface (POSIX)—Part 2: Shell and Utilities (volumes 1 and 2). Institute of Electrical and Electronics Engineers. 1-55937-255-9. This is POSIX.2. It's not an ISO standard yet.

Institute of Electrical and Electronics Engineers. Portable Operating System Interface (POSIX)—Part 1: Application Program Interface (API) [C Language]—Amendment: Realtime Extensions. Institute of Electrical and Electronics Engineers. 1-55937-375-X. This is POSIX.4.

Jain, Raj. The Art of Computer Systems Performance Analysis. John Wiley and Sons, Inc. 0-471-50336-3. If you're really serious about performance measurement, you should read this book.

Klein, Mark H., Ralya, Thomas, Pollak, Bill, Obenza, Ray, Harbour, Michael González. A Practitioner's Handbook for Real-Time Analysis: Guide to Rate Monotonic Analysis for Real-Time Systems. Kluwer Academic. 0-7923-9361-9. This book tells you everything you might want to know about Rate Monotonic Analysis.

Leffler, Samuel J., McKusick, Kirk, Karels, Mike, Quarterman, John. The Design and Implementation of the 4.3BSD UNIX Operating System. Addison-Wesley. 0-201-06196-1. This is the other canonical reference to how UNIX is put together.

Lewine, Don. POSIX Programmer's Guide. O'Reilly & Associates. 0-937175-73-0.

Zlotnick, Fred. The POSIX.1 Standard: A Programmer's Guide. Benjamin Cummings. 0-8053-9605-5.