Interactive Data Visualization with Python
Second Edition

Present your data as an effective and compelling story

Abha Belorkar
Sharath Chandra Guntuku
Shubhangi Hora
Anshu Kumar

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Abha Belorkar, Sharath Chandra Guntuku, Shubhangi Hora, and Anshu Kumar
Technical Reviewer: Saurabh Dorle
Managing Editor: Ranu Kundu
Acquisitions Editor: Kunal Sawant
Production Editor: Shantanu Zagade
Editorial Board: Shubhopriya Banerjee, Bharat Botle, Ewan Buckingham, Mahesh Dhyani, Manasa Kumar, Alex Mazonowicz, Bridget Neale, Dominic Pereira, Shiny Poojary, Abhisekh Rane, Erol Staveley, Ankita Thakur, Nitesh Thakur, and Jonathan Wray

First published: October 2019
Second edition: April 2020
Production Reference: 1130420
ISBN: 978-1-80020-094-4

Published by Packt Publishing Ltd
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK

Table of Contents

Preface

Chapter 1: Introduction to Visualization with Python – Basic and Customized Plotting
    Introduction
    Handling Data with pandas DataFrame
    Reading Data from Files
    Exercise 1: Reading Data from Files
    Observing and Describing Data
    Exercise 2: Observing and Describing Data
    Selecting Columns from a DataFrame
    Adding New Columns to a DataFrame
    Exercise 3: Adding New Columns to the DataFrame
    Applying Functions on DataFrame Columns
    Exercise 4: Applying Functions on DataFrame Columns
    Exercise 5: Applying Functions on Multiple Columns
    Deleting Columns from a DataFrame
    Exercise 6: Deleting Columns from a DataFrame
    Writing a DataFrame to a File
    Exercise 7: Writing a DataFrame to a File
    Plotting with pandas and seaborn
    Creating Simple Plots to Visualize a Distribution of Variables
    Exercise 8: Plotting and Analyzing a Histogram
    Bar Plots
    Exercise 9: Creating a Bar Plot and Calculating the Mean Price Distribution
    Exercise 10: Creating Bar Plots Grouped by a Specific Feature
    Tweaking Plot Parameters
    Exercise 11: Tweaking the Plot Parameters of a Grouped Bar Plot
    Annotations
    Exercise 12: Annotating a Bar Plot
    Activity 1: Analyzing Different Scenarios and Generating the Appropriate Visualization
    Summary

Chapter 2: Static Visualization – Global Patterns and Summary Statistics
    Introduction
    Creating Plots that Present Global Patterns in Data
    Scatter Plots
    Exercise 13: Creating a Static Scatter Plot
    Hexagonal Binning Plots
    Exercise 14: Creating a Static Hexagonal Binning Plot
    Contour Plots
    Exercise 15: Creating a Static Contour Plot
    Line Plots
    Exercise 16: Creating a Static Line Plot
    Exercise 17: Presenting Data across Time with Multiple Line Plots
    Heatmaps
    Exercise 18: Creating and Exploring a Static Heatmap
    The Concept of Linkage in Heatmaps
    Exercise 19: Creating Linkage in Static Heatmaps
    Creating Plots That Present Summary Statistics of Your Data
    Histogram Revisited
    Example 1: Histogram Revisited
    Box Plots
    Exercise 20: Creating and Exploring a Static Box Plot
    Violin Plots
    Exercise 21: Creating a Static Violin Plot
    Activity 2: Design Static Visualization to Present Global Patterns and Summary Statistics
    Summary

Chapter 3: From Static to Interactive Visualization
    Introduction
    Static versus Interactive Visualization
    Applications of Interactive Data Visualizations
    Getting Started with Interactive Data Visualizations
    Interactive Data Visualization with Bokeh
    Exercise 22: Preparing Our Dataset
    Exercise 23: Creating the Base Static Plot for an Interactive Data Visualization
    Exercise 24: Adding a Slider to the Static Plot
    Exercise 25: Adding a Hover Tool
    Interactive Data Visualization with Plotly Express
    Exercise 26: Creating an Interactive Scatter Plot
    Activity 3: Creating Different Interactive Visualizations Using Plotly Express
    Summary

Chapter 4: Interactive Visualization of Data across Strata
    Introduction
    Interactive Scatter Plots
    Exercise 27: Adding Zoom-In and Zoom-Out to a Static Scatter Plot
    Exercise 28: Adding Hover and Tooltip Functionality to a Scatter Plot
    Exercise 29: Exploring Select and Highlight Functionality on a Scatter Plot
    Exercise 30: Generating a Plot with Selection, Zoom, and Hover/Tooltip Functions
    Selection across Multiple Plots
    Exercise 31: Selection across Multiple Plots
    Selection Based on the Values of a Feature
    Exercise 32: Selection Based on the Values of a Feature
    Other Interactive Plots in altair
    Exercise 33: Adding a Zoom-In and Zoom-Out Feature and Calculating the Mean on a Static Bar Plot
    Exercise 34: An Alternative Shortcut for Representing the Mean on a Bar Plot
    Exercise 35: Adding a Zoom Feature on a Static Heatmap
    Exercise 36: Creating a Bar Plot and a Heatmap Next to Each Other
    Exercise 37: Dynamically Linking a Bar Plot and a Heatmap
    Activity 4: Generate a Bar Plot and a Heatmap to Represent Content Rating Types in the Google Play Store Apps Dataset
    Summary

Chapter 5: Interactive Visualization of Data across Time
    Introduction
    Temporal Data
    Types of Temporal Data
    Why Study Temporal Visualization?
    Understanding the Relation between Temporal Data and Time-Series Data
    Examples of Domains That Use Temporal Data
    Visualization of Temporal Data
    How Time-Series Data Is Manipulated and Visualized
    Date/Time Manipulation in pandas
    Building a DateTime Index
    Choosing the Right Aggregation Level for Temporal Data
    Exercise 38: Creating a Static Bar Plot and Calculating the Mean and Standard Deviation in Temporal Data
    Exercise 39: Calculating zscore to Find Outliers in Temporal Data
    Resampling in Temporal Data
    Common Pitfalls of Upsampling and Downsampling
    Exercise 40: Upsampling and Downsampling in Temporal Data
    Using shift and tshift to Introduce a Lag in Time-Series Data
    Exercise 41: Using shift and tshift to Shift Time in Data
    Autocorrelation in Time Series
    Interactive Temporal Visualization
    Bokeh Basics
    Advantages of Using Bokeh
    Exercise 42: Adding Interactivity to Static Line Plots Using Bokeh
    Exercise 43: Changing the Line Color and Width on a Line Plot
    Exercise 44: Adding Box Annotations to Find Anomalies in a Dataset
    Interactivity in Bokeh
    Activity 5: Create an Interactive Temporal Visualization
    Summary

Chapter 6: Interactive Visualization of Geographical Data
    Introduction
    Choropleth Maps
    Worldwide Choropleth Maps
    Exercise 45: Creating a Worldwide Choropleth Map
    Exercise 46: Tweaking a Worldwide Choropleth Map
    Exercise 47: Adding Animation to a Choropleth Map
    USA State Maps
    Exercise 48: Creating a USA State Choropleth Map
    Plots on Geographical Maps
    Scatter Plots
    Exercise 49: Creating a Scatter Plot on a Geographical Map
    Bubble Plots
    Exercise 50: Creating a Bubble Plot on a Geographical Map
    Line Plots on Geographical Maps
    Exercise 51: Creating Line Plots on a Geographical Map
    Activity 6: Creating a Choropleth Map to Represent Total Renewable Energy Production and Consumption across the World
    Summary

Chapter 7: Avoiding Common Pitfalls to Create Interactive Visualizations
    Introduction
    Data Formatting and Interpretation
    Avoiding Common Pitfalls while Dealing with Dirty Data
    Outliers
    Exercise 52: Visualizing Outliers in a Dataset with a Box Plot
    Exercise 53: Dealing with Outliers
    Missing Data
    Exercise 54: Dealing with Missing Values
    Duplicate Instances and/or Features
    Bad Feature Selection
    Activity 7: Determining Which Features to Visualize on a Scatter Plot
    Data Visualization
    Choosing a Visualization
    Common Pitfalls While Visualizing Data
    Exercise 55: Creating a Confusing Visualization
    Activity 8: Creating a Bar Graph for Improving a Visualization
    Cheat Sheet for the Visualization Process
    Summary

Appendix
Index

...TrainingByPackt/Interactive-Data-Visualization-with-Python/tree/master/Graphics/Lesson1

Handling Data with pandas DataFrame

The pandas library is an extremely resourceful...

"https://raw.githubusercontent.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/master/datasets/diamonds.csv"

Read files from the URL into the pandas DataFrame:

#Yes, we can read files...
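
The excerpt above breaks off at the step that reads a file from a URL into a pandas DataFrame. A minimal sketch of that step might look like the following. The URL is the one quoted in the text with its extraction spacing repaired, which is an assumption about the original. The helper name `load_csv` and the three sample rows are hypothetical and exist only so the example runs without network access.

```python
import io

import pandas as pd

# URL quoted in the text, with spacing repaired (assumed to match the original).
# Reading it requires network access, so it is not fetched here.
DIAMONDS_URL = (
    "https://raw.githubusercontent.com/TrainingByPackt/"
    "Interactive-Data-Visualization-with-Python/master/datasets/diamonds.csv"
)


def load_csv(source):
    """Read a CSV from a URL, a local path, or a file-like object.

    pandas.read_csv accepts all three, so the same call works for the
    remote dataset and for the offline stand-in below.
    """
    return pd.read_csv(source)


# Offline stand-in with made-up rows (hypothetical, not the real dataset).
sample_csv = io.StringIO(
    "carat,cut,price\n"
    "0.23,Ideal,326\n"
    "0.21,Premium,326\n"
    "0.29,Good,334\n"
)
sample_df = load_csv(sample_csv)
print(sample_df.head())
```

With network access, `load_csv(DIAMONDS_URL)` would be the equivalent call for the remote file.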