Part 1 of the book Visualization Analysis and Design contains: What’s Vis, and Why Do It?; What: Data Abstraction; Why: Task Abstraction; Analysis: Four Levels for Validation; Marks and Channels; Rules of Thumb; Arrange Tables.
AN A K PETERS BOOK
A K Peters Visualization Series
Series Editor: Tamara Munzner

Visualization Analysis & Design
Tamara Munzner
Department of Computer Science, University of British Columbia
Illustrations by Eamonn Maguire
2014

“A must read for researchers, sophisticated practitioners, and graduate students.”
—Jim Foley, College of Computing, Georgia Institute of Technology; Author of Computer Graphics: Principles and Practice

“Munzner’s new book is thorough and beautiful. It belongs on the shelf of anyone touched and enriched by visualization.”
—Chris Johnson, Scientific Computing and Imaging Institute, University of Utah

“This is the visualization textbook I have long awaited. It emphasizes abstraction, design principles, and the importance of evaluation and interactivity.”
—Jim Hollan, Department of Cognitive Science, University of California, San Diego

“Munzner elegantly synthesizes an astounding amount of cutting-edge work on visualization into a clear, engaging, and comprehensive textbook that will prove indispensable to students, designers, and researchers.”
—Steven Franconeri, Department of Psychology, Northwestern University

“Munzner shares her deep insights in visualization with us in this excellent textbook, equally useful for students and experts in the field.”
—Jarke van Wijk, Department of Mathematics and Computer Science, Eindhoven University of Technology

“The book shapes the field of visualization in an unprecedented way.”
—Wolfgang Aigner, Institute for Creative Media Technologies, St Pölten University of Applied Sciences

“Munzner is one of the world’s very top researchers in information visualization, and this meticulously crafted volume is probably the most thoughtful and deep synthesis the field has yet seen.”
—Michael McGuffin, Department of Software and IT Engineering, École de Technologie Supérieure

“This book provides the most comprehensive coverage of the fundamentals of visualization design that I have found. It is a much-needed and long-awaited resource for both teachers and practitioners of visualization.”
—Kwan-Liu Ma, Department of Computer Science, University of California, Davis

This book’s unified approach encompasses information visualization techniques for abstract data, scientific visualization techniques for spatial data, and visual analytics techniques for interweaving data transformation and analysis with interactive visual exploration. Suitable for both beginners and more experienced designers, the book does not assume any experience with programming, mathematics, human–computer interaction, or graphic design.

K14708
Visualization/Human–Computer Interaction/Computer Graphics

Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business

Cover art: Genesis 6-3-00, by Aribert Munzner. Casein on paperboard, 26” × 20”, 2000. http://www.aribertmunzner.com

For reuse of the diagram figures released under the CC-BY-4.0 license, written permission from the publishers is not required.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
No claim to original U.S. Government works
Version Date: 20140909
International Standard Book Number-13: 978-1-4665-0893-4 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish
reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
  Why a New Book?
  Existing Books
  Audience
  Who’s Who
  Structure: What’s in This Book
  What’s Not in This Book
  Acknowledgments

1 What’s Vis, and Why Do It?
  1.1 The Big Picture
  1.2 Why Have a Human in the Loop?
  1.3 Why Have a Computer in the Loop?
  1.4 Why Use an External Representation?
  1.5 Why Depend on Vision?
  1.6 Why Show the Data in Detail?
  1.7 Why Use Interactivity?
  1.8 Why Is the Vis Idiom Design Space Huge?
  1.9 Why Focus on Tasks?
  1.10 Why Focus on Effectiveness?
  1.11 Why Are Most Designs Ineffective?
  1.12 Why Is Validation Difficult?
  1.13 Why Are There Resource Limitations?
  1.14 Why Analyze?
  1.15 Further Reading

2 What: Data Abstraction
  2.1 The Big Picture
  2.2 Why Do Data Semantics and Types Matter?
  2.3 Data Types
  2.4 Dataset Types
    2.4.1 Tables
    2.4.2 Networks and Trees
      2.4.2.1 Trees
    2.4.3 Fields
      2.4.3.1 Spatial Fields
      2.4.3.2 Grid Types
    2.4.4 Geometry
    2.4.5 Other Combinations
    2.4.6 Dataset Availability
  2.5 Attribute Types
    2.5.1 Categorical
    2.5.2 Ordered: Ordinal and Quantitative
      2.5.2.1 Sequential versus Diverging
      2.5.2.2 Cyclic
    2.5.3 Hierarchical Attributes
  2.6 Semantics
    2.6.1 Key versus Value Semantics
      2.6.1.1 Flat Tables
      2.6.1.2 Multidimensional Tables
      2.6.1.3 Fields
      2.6.1.4 Scalar Fields
      2.6.1.5 Vector Fields
      2.6.1.6 Tensor Fields
      2.6.1.7 Field Semantics
    2.6.2 Temporal Semantics
      2.6.2.1 Time-Varying Data
  2.7 Further Reading

3 Why: Task Abstraction
  3.1 The Big Picture
  3.2 Why Analyze Tasks Abstractly?
  3.3 Who: Designer or User
  3.4 Actions
    3.4.1 Analyze
      3.4.1.1 Discover
      3.4.1.2 Present
      3.4.1.3 Enjoy
    3.4.2 Produce
      3.4.2.1 Annotate
      3.4.2.2 Record
      3.4.2.3 Derive
    3.4.3 Search
      3.4.3.1 Lookup
      3.4.3.2 Locate
      3.4.3.3 Browse
      3.4.3.4 Explore
    3.4.4 Query
      3.4.4.1 Identify
      3.4.4.2 Compare
      3.4.4.3 Summarize
  3.5 Targets
  3.6 How: A Preview
  3.7 Analyzing and Deriving: Examples
    3.7.1 Comparing Two Idioms
    3.7.2 Deriving One Attribute
    3.7.3 Deriving Many New Attributes
  3.8 Further Reading

4 Analysis: Four Levels for Validation
  4.1 The Big Picture
  4.2 Why Validate?
  4.3 Four Levels of Design
    4.3.1 Domain Situation
    4.3.2 Task and Data Abstraction
    4.3.3 Visual Encoding and Interaction Idiom
    4.3.4 Algorithm
  4.4 Angles of Attack
  4.5 Threats to Validity
  4.6 Validation Approaches
    4.6.1 Domain Validation
    4.6.2 Abstraction Validation
    4.6.3 Idiom Validation
    4.6.4 Algorithm Validation
    4.6.5 Mismatches
  4.7 Validation Examples
    4.7.1 Genealogical Graphs
    4.7.2 MatrixExplorer
    4.7.3 Flow Maps
    4.7.4 LiveRAC
    4.7.5 LinLog
    4.7.6 Sizing the Horizon
  4.8 Further Reading

5 Marks and Channels
  5.1 The Big Picture
  5.2 Why Marks and Channels?
  5.3 Defining Marks and Channels
    5.3.1 Channel Types
    5.3.2 Mark Types
  5.4 Using Marks and Channels
    5.4.1 Expressiveness and Effectiveness
    5.4.2 Channel Rankings
  5.5 Channel Effectiveness
    5.5.1 Accuracy
    5.5.2 Discriminability
    5.5.3 Separability
    5.5.4 Popout
    5.5.5 Grouping
  5.6 Relative versus Absolute Judgements
  5.7 Further Reading

6 Rules of Thumb
  6.1 The Big Picture
  6.2 Why and When to Follow Rules of Thumb?
  6.3 No Unjustified 3D
    6.3.1 The Power of the Plane
    6.3.2 The Disparity of Depth
    6.3.3 Occlusion Hides Information
    6.3.4 Perspective Distortion Dangers
    6.3.5 Other Depth Cues
    6.3.6 Tilted Text Isn’t Legible
    6.3.7 Benefits of 3D: Shape Perception
    6.3.8 Justification and Alternatives
      Example: Cluster–Calendar Time-Series Vis
      Example: Layer-Oriented Time-Series Vis
    6.3.9 Empirical Evidence
  6.4 No Unjustified 2D
  6.5 Eyes Beat Memory
    6.5.1 Memory and Attention
    6.5.2 Animation versus Side-by-Side Views
    6.5.3 Change Blindness
  6.6 Resolution over Immersion
  6.7 Overview First, Zoom and Filter, Details on Demand
  6.8 Responsiveness Is Required
    6.8.1 Visual Feedback
    6.8.2 Latency and Interaction Design
    6.8.3 Interactivity Costs
  6.9 Get It Right in Black and White
  6.10 Function First, Form Next
  6.11 Further Reading

7 Arrange Tables
  7.1 The Big Picture
  7.2 Why Arrange?
  7.3 Arrange by Keys and Values
  7.4 Express: Quantitative Values
      Example: Scatterplots
  7.5 Separate, Order, and Align: Categorical Regions
    7.5.1 List Alignment: One Key
      Example: Bar Charts
      Example: Stacked Bar Charts
      Example: Streamgraphs
      Example: Dot and Line Charts
    7.5.2 Matrix Alignment: Two Keys
      Example: Cluster Heatmaps
      Example: Scatterplot Matrix
    7.5.3 Volumetric Grid: Three Keys
    7.5.4 Recursive Subdivision: Multiple Keys
  7.6 Spatial Axis Orientation
    7.6.1 Rectilinear Layouts
    7.6.2 Parallel Layouts
      Example: Parallel Coordinates
    7.6.3 Radial Layouts
      Example: Radial Bar Charts
      Example: Pie Charts
  7.7 Spatial Layout Density
    7.7.1 Dense
      Example: Dense Software Overviews
    7.7.2 Space-Filling
  7.8 Further Reading

8 Arrange Spatial Data
  8.1 The Big Picture
  8.2 Why Use Given?
  8.3 Geometry
    8.3.1 Geographic Data
      Example: Choropleth Maps
    8.3.2 Other Derived Geometry
  8.4 Scalar Fields: One Value
    8.4.1 Isocontours
      Example: Topographic Terrain Maps
      Example: Flexible Isosurfaces
    8.4.2 Direct Volume Rendering
      Example: Multidimensional Transfer Functions

Figure 7.13. Parallel coordinates were designed to show correlation between neighboring axes. At the top, parallel lines show perfect positive correlation. At the bottom, all of the lines cross over each other at a single spot in between the two axes, showing perfect negative correlation. In the middle, the mix of crossings shows uncorrelated data. From [Wegman 90, Figure 3].

Margin note: Section 13.4.1 covers scaling to larger datasets with hierarchical parallel coordinates.

terms of number of attributes that can be displayed: dozens is common. As the number of attributes shown increases, so does the width required to display them, so a parallel coordinates display showing many attributes is typically a wide and flat rectangle. Assuming that the axes are vertical,
then the amount of vertical screen space required to distinguish position along them does not change, but the amount of horizontal screen space increases as more axes are added. One limit is that there must be enough room between the axes to discern the patterns of intersection or parallelism of the line segments that pass between them.

The basic parallel coordinates idiom scales to showing hundreds of items, but not thousands. If too many lines are overplotted, the resulting occlusion yields very little information. Figure 7.14 contrasts the idiom used successfully with 13 items and attributes, as in Figure 7.14(a), versus ineffectively with over 16,000 items and attributes, as in Figure 7.14(b). In the latter case, only the minimum and maximum values along each axis can be read; it is nearly impossible to see trends, anomalies, or correlations.

The patterns made easily visible by parallel coordinates have to do with the pairwise relationships between neighboring axes. Thus, the crucial limitation of parallel coordinates is how to determine the order of the axes. Most implementations allow the user to interactively reorder the axes. However, exploring all possible configurations of axes through systematic manual interaction would be prohibitively time consuming as the number of axes grows, because of the exploding number of possible combinations.

Figure 7.14. Parallel coordinates scale to dozens of attributes and hundreds of items, but not to thousands of items. (a) Effective use with 13 items and attributes. (b) Ineffective use with over 16,000 items and attributes. From [Fua et al 99, Figures 1 and 2].

Another limitation of parallel coordinates is training time; first-time users do not have intuitions about the meaning of the patterns they see, which must thus be taught explicitly. Parallel coordinates are often used in one of several multiple views showing different visual encodings of the same dataset, rather than as the only encoding. The
combination of more familiar views such as scatterplots with a parallel coordinates view accelerates learning, particularly since linked highlighting reinforces the mapping between the dots in the scatterplots and the jagged lines in the parallel coordinates view. Multiple view design choices are discussed in Sections 12.3 and 12.4.

Idiom        Parallel Coordinates
What: Data   Table: many value attributes.
How: Encode  Parallel layout: horizontal spatial position used to separate axes, vertical spatial position used to express value along each aligned axis with connection line marks as segments between them.
Why: Tasks   Find trends, outliers, extremes, correlation.
Scale        Attributes: dozens along secondary axis. Items: hundreds.

7.6.3 Radial Layouts

In a radial spatial layout, items are distributed around a circle using the angle channel in addition to one or more linear spatial channels, in contrast to the rectilinear layouts that use only two spatial channels.

The natural coordinate system in radial layouts is polar coordinates, where one dimension is measured as an angle from a starting line and the other is measured as a distance from a center point. Figure 7.15 compares polar coordinates, as shown in Figure 7.15(a), with standard rectilinear coordinates, as shown in Figure 7.15(b). From a strictly mathematical point of view, rectilinear and radial layouts are equivalent under a particular kind of transformation: a box bounded by two sets of parallel lines is transformed into a disc where one line is collapsed to a point at the center and the other line wraps around to meet up with itself, as in Figure 7.15(c).

Margin note: In mathematical language, the angle channel is nonmonotonic.

However, from a perceptual point of view, rectilinear and radial layouts are not equivalent at all. The change of visual channel has two major consequences from visual encoding principles alone. First, the angle channel is less accurately perceived than a rectilinear spatial position channel. Second,
the angle channel is inherently cyclic, because the start and end point are the same, as opposed to the inherently linear nature of a position channel. The expressiveness and effectiveness principles suggest some guidelines on the use of radial layouts. Radial layouts may be more effective than rectilinear ones in showing the periodicity of patterns, but encoding nonperiodic data with the periodic channel of angle may be misleading. Radial layouts imply an asymmetry of importance between the two attributes and would be inappropriate when the two attributes have equal importance.

Figure 7.15. Layout coordinate systems. (a) Radial layouts use polar coordinates, with one spatial position and one angle channel. (b) Rectilinear layouts use two perpendicular spatial position channels. After [Wickham 10, Figure 8]. (c) Transforming rectilinear to radial layouts maps two parallel bounding lines to a point at the center and a circle at the perimeter.

Example: Radial Bar Charts

The same five-attribute dataset is encoded with a rectilinear bar chart in Figure 7.16(a) and with a radial alternative in Figure 7.16(b). In both cases, line marks are used to encode a quantitative attribute with the length channel, and the only difference is the radial versus the rectilinear orientation of the axes.

Figure 7.16. Radial versus rectilinear layouts. (a) Rectilinear bar chart. (b) Radial bar chart. After [Booshehrian et al 11, Figure 4].

Idiom        Radial Bar Charts
What: Data   Table: one quantitative attribute, one categorical attribute.
How: Encode  Length coding of line marks; radial layout.

Example: Pie Charts

Margin note: Synonyms for polar area chart are rose plot and coxcomb plot; these were first popularized by Florence Nightingale in the 19th century in her analysis of Crimean War medical data.

The most commonly used radial statistical graphic is the pie chart, shown in Figure 7.17(a). Pie charts encode a single attribute with area
marks and the angle channel. Despite their popularity, pie charts are clearly problematic when considered according to the visual channel properties discussed in Section 5.5. Angle judgements on area marks are less accurate than length judgements on line marks. The wedges vary in width along the radial axis, from narrow near the center to wide near the outside, making the area judgement particularly difficult. Figure 7.17(b) shows a bar chart with the same data, where the perceptual judgement required to read the data is the high-accuracy position along a common scale channel. Figure 7.17(c) shows a third radial chart that is a more direct equivalent of a bar chart transformed into polar coordinates. The polar area chart also encodes a single quantitative attribute but varies the length of the wedge just as a bar chart varies the length of the bar, rather than varying the angle as in a pie chart.

The data in Figure 7.17 shows the clarity distribution of diamonds, where I1 is worst and IF is best. These instances redundantly encode each mark with color for easier legibility, but these idioms could be used without color coding.

Figure 7.17. Pie chart versus bar chart accuracy. (a) Pie charts require angle and area judgements. (b) Bar charts require only high-accuracy length judgements for individual items. (c) Polar area charts are a more direct equivalent of bar charts, where the length of each wedge varies like the length of each bar. From [Wickham 10, Figures 15 and 16].
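The difference between the three encodings compared in Figure 7.17 can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the book or from [Wickham 10]: the function names and the sample counts are hypothetical, the diamonds data is not reproduced, and all values are assumed to be positive. For the same values it computes the wedges of a pie chart (constant radius, varying angle), the wedges of a polar area chart (constant angle, varying radial length), and the bars of a bar chart (varying length).

```python
import math


def pie_wedges(values):
    """Pie chart: each wedge is (angle, radius).

    The radius is the same for every wedge; the angle carries the value.
    """
    total = sum(values)
    return [(2.0 * math.pi * v / total, 1.0) for v in values]


def polar_area_wedges(values):
    """Polar area chart: each wedge is (angle, radius).

    The angle is the same for every wedge; the radial length carries the
    value, scaled so that the largest value reaches radius 1.
    """
    angle = 2.0 * math.pi / len(values)
    longest = max(values)
    return [(angle, v / longest) for v in values]


def bar_lengths(values):
    """Bar chart: bar length carries the value, largest bar scaled to 1."""
    longest = max(values)
    return [v / longest for v in values]


# Hypothetical counts for four categories (not the diamonds data).
counts = [10, 20, 30, 40]

for label, wedges in (("pie", pie_wedges(counts)),
                      ("polar area", polar_area_wedges(counts))):
    for angle, radius in wedges:
        print(f"{label:10s} angle={math.degrees(angle):6.1f} deg  radius={radius:.2f}")

print("bar lengths:", bar_lengths(counts))
```

Under these assumptions the polar area radii are the same numbers as the bar lengths, which is the sense in which the polar area chart is the more direct radial equivalent of the bar chart. Only the pie chart asks the viewer to read the value from angle.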
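The rectilinear-to-radial transformation described at the start of Section 7.6.3, and drawn in Figure 7.15(c), can also be sketched in a few lines. This is a minimal illustration under assumed conventions, not the book's code: the function name rect_to_radial, the choice of the horizontal axis as the one that wraps into angle, and the unit-sized defaults are all assumptions made for the example.

```python
import math


def rect_to_radial(x, y, width=1.0, height=1.0, radius=1.0):
    """Map a point in a width-by-height box onto a disc.

    Assumed convention: horizontal position becomes the angle and
    vertical position becomes the distance from the center.
    """
    theta = 2.0 * math.pi * (x / width)   # x wraps once around the circle
    r = radius * (y / height)             # y grows outward from the center
    return (r * math.cos(theta), r * math.sin(theta))


# The bottom edge of the box (y = 0) collapses to the center point.
print(rect_to_radial(0.3, 0.0))

# The top edge (y = height) lands on the perimeter circle.
print(math.hypot(*rect_to_radial(0.3, 1.0)))

# The left and right edges (x = 0 and x = width) meet each other.
print(rect_to_radial(0.0, 0.5), rect_to_radial(1.0, 0.5))
```

The mapping is well defined mathematically, but it also shows why the two layouts differ perceptually: equal horizontal steps in the box become equal angular steps, whose arc length shrinks toward the center of the disc.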