Ruby is a powerful programming language with a focus on simplicity, but beneath its elegant syntax it performs countless unseen tasks.

Ruby Under a Microscope gives you a hands-on look at Ruby’s core, using extensive diagrams and thorough explanations to show you how Ruby is implemented (no C skills required). Author Pat Shaughnessy takes a scientific approach, laying out a series of experiments with Ruby code to take you behind the scenes of how programming languages work. You’ll even find information on JRuby and Rubinius (two alternative implementations of Ruby), as well as in-depth explorations of Ruby’s garbage collection algorithm.

Ruby Under a Microscope will teach you:

  How a few computer science concepts underpin Ruby’s complex implementation
  How Ruby executes your code using a virtual machine
  How classes and modules are the same inside Ruby
  How Ruby employs algorithms originally developed for Lisp
  How Ruby uses grammar rules to parse and understand your code
  How your Ruby code is translated into a different language by a compiler

No programming language needs to be a black box. Whether you’re already intrigued by language implementation or just want to dig deeper into Ruby, you’ll find Ruby Under a Microscope a fascinating way to become a better programmer.

About the Author

Well known for his coding expertise and passion for the Ruby programming language, Pat Shaughnessy blogs and writes tutorials at http://patshaughnessy.net/. He also develops Ruby applications at management consulting firm McKinsey & Co. Shaughnessy is a regular presenter on the Ruby conference circuit, and his articles and presentations have been featured in the Ruby Weekly newsletter, the Ruby5 podcast, and The Ruby Show.

Covers Ruby 2.x, 1.9, and 1.8
Ruby Under a Microscope
An Illustrated Guide to Ruby Internals
Pat Shaughnessy

Advance Praise for Ruby Under a Microscope

“Many people have dug into the Ruby source code, but few make it back out and tell the tale as elegantly as Pat does in Ruby Under a Microscope! I particularly love the diagrams (and there are lots of them), as they make many opaque implementation topics a lot easier to understand, especially when coupled with Pat’s gentle narrative. This book is a delight for language implementation geeks and Rubyists with a penchant for digging into the guts of their tools.”
- Peter Cooper (@peterc), Editor of Ruby Inside and Ruby Weekly

“Man, this book was missing in the Ruby landscape: awesome content.”
- Xavier Noria (@fxn), Ruby Hero, Ruby on Rails Core Team Member

“Pat Shaughnessy did a tremendous job writing THE book about Ruby internals. Definitely a must read: you won’t find information like this anywhere else.”
- Santiago Pastorino (@spastorino), WyeWorks Co-Founder, Ruby on Rails Core Team Member

“I really enjoyed the book and now have a far better understanding of both Ruby and CS. The writing made very complex topics (at least for me) very accessible, and I found the book hard to put down. Diagrams were awesome and are already popping in my head as I code. This is by far one of my top favourite Ruby books.”
- Vlad Ivanovic (@vladiim), Digital Strategist at Holler Sydney

“While I’m not usually digging into Ruby Internals, this book was an absolutely awesome read.”
- David Deryl Downey (@daviddwdowney), Founder of Cyber Space Technologies Group

Ruby Under a Microscope. Copyright © 2014 by Patrick Shaughnessy.

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system,
without the prior written permission of the copyright owner and the publisher.

Printed in USA
First printing
17 16 15 14 13

ISBN-10: 1-59327-527-7
ISBN-13: 978-1-59327-527-3

Publisher: William Pollock
Production Editor: Riley Hoffman
Cover Illustration: Charlie Wylie
Interior Design: Octopod Studios
Developmental Editor: William Pollock
Technical Reviewer: Aaron Patterson
Copyeditor: Julianne Jigour
Compositors: Susan Glinert Stevens and Riley Hoffman
Proofreader: Elaine Merrill

For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:

No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 415.863.9900; fax: 415.863.9950; info@nostarch.com; www.nostarch.com

Library of Congress Cataloging-in-Publication Data

Shaughnessy, Pat.
Ruby under a microscope : an illustrated guide to Ruby internals / by Pat Shaughnessy.
pages cm
Summary: "An under-the-hood look at how the Ruby programming language runs code. Extensively illustrated with complete explanations and hands-on experiments. Covers Ruby 2.x." Provided by publisher.
ISBN 978-1-59327-527-3 (paperback)
ISBN 1-59327-527-7 (paperback)
Ruby (Computer program language) I. Title.
QA76.73.R83S53 2013
005.1'17--dc23
2013030614

No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the
information contained in it.

To my wife, Cristina; my daughter, Ana; and my son, Liam: thanks for supporting me all along.

About the Author

Pat Shaughnessy is a Ruby developer working at McKinsey & Co., a management consulting firm. Pat was originally trained as a physicist at MIT, but later spent more than 20 years working as a software developer using C, Java, PHP, and Ruby, among other languages. Writing Ruby Under a Microscope has given him an excuse to reuse bits of his scientific training while studying Ruby. A fluent Spanish speaker, Pat frequently visits his wife’s family in northern Spain. He lives outside of Boston with his wife and two children.

Brief Contents

Foreword by Aaron Patterson
Acknowledgments
Introduction
Chapter 1: Tokenization and Parsing
Chapter 2: Compilation
Chapter 3: How Ruby Executes Your Code
Chapter 4: Control Structures and Method Dispatch
Chapter 5: Objects and Classes
Chapter 6: Method Lookup and Constant Lookup
Chapter 7: The Hash Table: The Workhorse of Ruby Internals
Chapter 8: How Ruby Borrowed a Decades-Old Idea from Lisp
Chapter 9: Metaprogramming
Chapter 10: JRuby: Ruby on the JVM
Chapter 11: Rubinius: Ruby Implemented with Ruby
Chapter 12: Garbage Collection in MRI, JRuby, and Rubinius
Index

Contents in Detail

Foreword by Aaron Patterson
Acknowledgments
Introduction
  Who This Book Is For
  Using Ruby to Test Itself
  Which Implementation of Ruby?
  Overview

Chapter 1: Tokenization and Parsing
  Tokens: The Words That Make Up the Ruby Language
  The parser_yylex Function
  Experiment 1-1: Using Ripper to Tokenize Different Ruby Scripts
  Parsing: How Ruby Understands Your Code
  Understanding the LALR Parse Algorithm
  Some Actual Ruby Grammar Rules
  Reading a Bison Grammar Rule
  Experiment 1-2: Using Ripper to Parse Different Ruby Scripts
  Summary

Chapter 2: Compilation
  No Compiler for Ruby 1.8
  Ruby 1.9 and 2.0 Introduce a Compiler
  How Ruby Compiles a Simple Script
  Compiling a Call to a Block
  How Ruby Iterates Through the AST
  Experiment 2-1: Displaying YARV Instructions
  The Local Table
  Compiling Optional Arguments
  Compiling Keyword Arguments
  Experiment 2-2: Displaying the Local Table
  Summary

Chapter 3: How Ruby Executes Your Code
  YARV’s Internal Stack and Your Ruby Stack
  Stepping Through How Ruby Executes a Simple Script
  Executing a Call to a Block
  Taking a Close Look at a YARV Instruction
  Experiment 3-1: Benchmarking Ruby 2.0 and Ruby 1.9 vs. Ruby 1.8

Summary

This chapter has covered one of the most important but least understood areas of Ruby internals: garbage collection. We learned that garbage collectors allocate memory for new objects and clean up unused garbage objects. We examined the basic algorithms used by MRI, Rubinius, and JRuby for garbage collection and discovered that MRI allocates and reclaims memory using a free list, while Rubinius and the JVM use the semi-space algorithm. We also saw how Rubinius and JRuby employ concurrent and generational GC techniques, which MRI starts to use in Ruby 2.1.

But we’ve only scratched the surface of garbage collection. Since its invention in 1960, many complex GC algorithms have been developed; indeed, garbage collection is still an active area of computer science research. The GC implementations in MRI, Rubinius, and JRuby are likely to continue to evolve and improve over time.
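The free-list idea mentioned in the summary can be sketched in a few lines of Ruby. This is a toy model under stated assumptions, not MRI's actual implementation: ToyHeap, Slot, and their methods are hypothetical names invented for illustration, pairing the free list with a simple mark-and-sweep pass is a choice made for this sketch, and the collector treats the roots it is given as the only live objects rather than tracing references between objects.

```ruby
# Toy model of a free-list allocator with a mark-and-sweep pass.
# ToyHeap and Slot are hypothetical names; this is not MRI's real code.
class ToyHeap
  Slot = Struct.new(:value, :used, :marked, :next_free)

  attr_reader :free_count

  def initialize(size)
    @slots = Array.new(size) { Slot.new(nil, false, false, nil) }
    # Thread every slot onto the free list, in order.
    @slots.each_cons(2) { |a, b| a.next_free = b }
    @free_head = @slots.first
    @free_count = size
  end

  # Allocation pops the first slot off the free list.
  def allocate(value)
    slot = @free_head
    raise "out of memory" if slot.nil?
    @free_head = slot.next_free
    slot.next_free = nil
    slot.value = value
    slot.used = true
    @free_count -= 1
    slot
  end

  # Mark the root slots as live, then sweep: every used slot that was
  # not marked is pushed back onto the front of the free list.
  def collect(roots)
    roots.each { |slot| slot.marked = true }
    @slots.each do |slot|
      if slot.marked
        slot.marked = false          # reset for the next collection
      elsif slot.used
        slot.value = nil
        slot.used = false
        slot.next_free = @free_head
        @free_head = slot
        @free_count += 1
      end
    end
  end
end

heap = ToyHeap.new(4)
a = heap.allocate("a")
b = heap.allocate("b")
c = heap.allocate("c")
free_before = heap.free_count   # one slot has never been used
heap.collect([a, b])            # "c" is not a root, so its slot is reclaimed
free_after = heap.free_count
d = heap.allocate("d")          # reuses the slot that held "c"
```

In this sketch the reclaimed slot is pushed onto the front of the free list, so the next allocation reuses it immediately; a real collector would also have to trace references from each live object to the objects it points to.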
Index

Symbols
& operator, 47
$& special variable, 76
* (splat) operator, 47

A
abstract syntax tree (AST), 23–29, 32–44. See also AST nodes
algorithm
  constant lookup, 162, 163–164
  Immix, 315
  LALR parse, 13–19
  method lookup, 138–151
  semi-space, 311–312
allocator pointer, 126
ancestors method, 259
Appleby, Austin, 183
args_add_block AST node, 25
arguments
  to a block, 96
  default values for, 47
  keyword
    compiling, 49
    exploring how Ruby implements, 99–103
  method, 70–71
  optional, 48, 96
  preparing, for normal Ruby methods, 95
  unnamed, 47, 96
ARGV array, 65, 75, 79, 262
Array (C++ class), 287
Array class, 294–291
arrays, in Rubinius and MRI, 284–287
Array#sample method, 260–263
Array#shift method, 288–291
AST (abstract syntax tree), 23–29, 32–44. See also AST nodes
AST nodes, 23–29, 32–44
  args_add_block, 25
  binary, 27–28
  NEW_CALL, 22–23
  NODE_CALL, 23, 26, 35–43
  NODE_DVAR, 41
  NODE_FCALL, 34–38, 41–43
  NODE_ITER, 39–41
  NODE_SCOPE, 34–44, 46
attr_accessor method, 98–99, 116
attribute get and set methods, 94, 98–99
attribute names, 116–119
attribute names table, 125, 127
attr_reader method
  calling, 97–98
  optimization by method dispatch, 98–99
ATTRSET methods, 94, 98–99
attr_writer method
  calling, 97–98
  optimization by method dispatch, 98–99
autoload keyword, 164

B
backtraces, in Rubinius and MRI, 281–284
Baker, Henry, 309
BasicObject class, 259
benchmarking Ruby versions, 65–67
binary AST node, 27–28
bin density (in a hash table), 175
Binding class, 208
binding keyword, 238
bins (in a hash table), 169
Bison, 12, 22–23
bitmap marking, 299–300
Blackburn, Stephen M., 324
block arguments, 96
blocks
  calling a method with, 71–72
  as closures, 192–198
  calling, 61–62, 194–196
  compiling calls to, 38–44
  lexical scope for, representation by Ruby, 244–245
  vs. while loops, speed of, 200–203
BMETHOD methods, 94
brace_block grammar rule, 21
branchunless YARV instruction, 85
built-in Ruby methods, calling, 97–99
bump allocation, 310
bytecode, 33
bytecode (JRuby option), 256
bytecode instructions
  Java, 254
  Rubinius, 277, 278–279
ByteList (Java object), 264

C
C++, working together with Ruby, 279–280
C4 (continuously concurrent compacting) collector, 320
caches, clearing Ruby’s method, 143–144
calling
  attr_reader, 97–98
  attr_writer, 97–98
  blocks, 61–62, 194–196
  built-in Ruby methods, 97–99
  eval with binding, 238–240
  lambda more than once in the same scope, 216–217
  lambdas, 209–211
  methods with blocks, 71–72
  normal Ruby methods, 95–97
call stack, 56
catch tables, 88–90
CFP (current frame pointer), 57, 88, 95
CFUNC methods, 61, 94, 97
child grammar rule, 15
Class (C++ class), 280
Class (Ruby class), 117
class