Table of Contents
1 Introduction
1.1 Why visualize data ?
1.2 Who is this book for ?
1.3 How is this book different ?
3.4 Converting "Business questions" to the language of data
4 The Crux of Tableau
4.1 The 4 building pillars
4.1.1 Dimensions, Measures & Aggregations
4.1.2 Viz Pane - columns & rows shelf
4.1.3 Marks card
4.1.3.1 Color block
4.1.3.2 Size block
4.1.3.3 Label block
4.1.3.4 Detail block
4.1.3.5 Tooltip block
4.1.4 Filters
4.2 Putting it all together
4.3 Show Me
4.4 Sheets & Dashboards
5 Calculations
5.1 Grouping values
5.2 Calculated Fields
5.3 Row level, Aggregation & Dis-aggregation
5.4 Bringing in more data
5.4.1 From Excel/ CSV
5.4.2 From MySQL
5.5 Importance of cardinality - A practical example
5.6 Data Modeling
6 Tables & Table calculations
6.1 Show me or start from scratch ?
6.2 Table totals
6.3 Table calculations
6.3.1 Table & Pane - Down & Across
6.3.2 Down then Across & Across then Down
6.3.3 Shortcut to reading Table calculations in English
6.3.4 Formulation of Table calculations
6.3.5 Comparisons - YoY, WoW, MoM
7.4 Shapes & Icons
7.5 Level of Detail (LOD) calculations
7.5.1 Fixed LOD
7.5.2 Include LOD
7.5.3 Exclude LOD
7.6 Reference Lines & Forecasts
7.6.1 Reference Lines using Parameters
7.6.2 Reference Lines using secondary data
7.6.3 Forecast & Trend lines
7.7 Order of operations
8 Dashboards
8.1 Less is more
8.2 Dashboards: A view from 10000ft
8.3 Fit & Layout
8.4 Filters & Interactions
8.4.1 Customizing filters
8.4.2 Discrete vs continuous filters
8.4.3 Filter domain
9 Useful links
Tableau Public & Tableau Desktop
Most of the examples illustrated here can be followed along with Tableau Public. Cases requiring Tableau Desktop are highlighted.
1.1 Why visualize data ?
Over the past few decades, Excel has become the de facto data analytics tool for most business users. When you need to sum two values, it couldn’t be simpler: click on the first value that you need to add, follow it up with a "+" sign and then click on the next value to be added. Voilà, you’ve got yourself the total of two values. Drag the formula down by its corner and you’ve got yourself the sum of 2 columns.
Unfortunately, this flexibility comes at a cost. The user gradually gets trapped in the world of quick fixes and patched formulae that Excel has to offer. Lotus 123, the predecessor of Excel, was initially conceived primarily as a data entry tool, and indeed Excel "excels" at this task.
But now, in this new era of big data, data visualization and data analytics deserve their own tool-kit. The human brain does a very poor job of deciphering meaningful trends from a table of raw data (numbers) but excels at comparing, extrapolating and spotting trends in visual shapes and colors. It turns out that the brain is able to take in a picture and process it in one stroke, whereas it processes text in a linear fashion. Imagine a bar chart which condenses 100 rows of data into a few columns, against reading the rows one by one. You start to get the picture. Having said that, it’s the responsibility of the analyst to effectively distill and convey the meaning hidden behind the numbers through effective and meaningful visualizations.
If you take a close look at the following table of raw numbers, you’ll be able to make a few observations:
There are four datasets.
Each dataset has an x and a y column.
Numbers seem to range from 4 to 13.
There are at most 2 decimals.
Figure 1: Anscombe’s dataset
And we start squeezing our eyelids together to squeeze more information out of this table. The more astute among you might have copy-pasted these numbers into a good ol’ Excel sheet and grabbed your "I’m a data scientist" coffee mug. You start making a list:
Average of x: 9
Sample variance of x: 11
Average of y: 7.5
Sample variance of y: 4.125
Correlation between x and y: 0.816
A nice linear regression line: y = 3 + 0.5x
R² of the linear regression line: 0.67
That’s a lot of numbers, and the strange thing you now notice is that the above list is the same across all 4 datasets. Based on the summary statistics alone, it’s tempting to make the simplistic reduction that the distributions of x and y are similar. A quick visualization instantly reveals the hidden gems in the distributions. This example also helps underline the importance of exploratory data analytics before drawing any inferences and conclusions.
Figure 2: Anscombe’s dataset in Tableau
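If you would rather check the list of summary statistics yourself than trust a spreadsheet, here is a minimal Python sketch. It assumes the table in Figure 1 contains the standard published values of Anscombe's quartet, which are typed in below. The helper name summarize is made up for this illustration.

```python
from statistics import mean, variance

# Anscombe's quartet, standard published values (assumed to match Figure 1).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Compute the summary statistics from the list above for one dataset."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                 # least-squares slope
    intercept = my - slope * mx       # least-squares intercept
    corr = sxy / (sxx * syy) ** 0.5   # Pearson correlation
    return {
        "mean_x": mx, "var_x": variance(x),
        "mean_y": my, "var_y": variance(y),
        "corr": corr, "slope": slope, "intercept": intercept, "r2": corr ** 2,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows agree to about two decimals, which is exactly why the numbers alone cannot tell the datasets apart.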
Having said that, I’m definitely not discouraging the use of Excel in any way. It’s a powerful tool in the repertoire of any competent professional. The main point I want to drive home is that Excel often needs to be complemented by a data visualization tool to help you effectively communicate and share your findings. Microsoft PowerBI is just as good a tool as Tableau, but this book being about Tableau, I’ll confine myself to illustrations with Tableau.
1.2 Who is this book for ?
The primary intended audience of this book is business analysts, data analysts and financial analysts or, more broadly, anyone who is hitting the limits of Excel with their data analytics needs. If your day-to-day revolves around staring at numbers all day long, then you’re definitely part of the target audience. There are no prerequisites to follow along with the concepts in this book. We will work our way gradually from the very fundamentals of data all the way up to building fancy dashboards & visualizations on gigabytes of data.
1.3 How is this book different ?
There are many books on the market which are excellent Tableau user guides and reference manuals. They do an excellent job of presenting every menu tab, button, pane and shelf in Tableau. If you’re the kind of person who needs to know every single button and functionality tucked into Tableau, then this might not be the right book for you.
When you start to learn a new language and want to go about it in a systematic and methodical way, you would start with the grammar. Understanding the foundational underpinnings of the language helps you get the basics right, and then it’s a matter of stringing words together to make sentences. Lining up words within the rules defined by the grammar (or not) in infinite possible ways, to write Shakespearean poetry or tabloid articles or have conversations, is a logical next step.
This book intends to approach the subject of mastering Tableau in a similar fashion. We’ll try to distill the very core essence of Tableau into a few concepts, and then it’s just a matter of combining them in infinite possible ways to build the required data visualizations.
Icons used in this book
Tips & shortcuts worth keeping in mind.
Traps to watch out for in Tableau, which could help you avoid potential headaches down the road.
Technical details which are not necessary to follow along with the chapters in this book and can be comfortably skipped or glossed over.
2.1 Installation of Tableau
We’ll gloss very quickly over the installation of Tableau Public, which you will require to follow along with this book. Unfortunately for the corner cases, business users working on Macs and Linux, I’ll have to redirect your questions and concerns to the almighty Google. Head over to Tableau Public’s site. Hand over your email, as usual, in return for the Tableau Public installation file (.exe).
Figure 3: Tableau public download page
Double click on the exe and follow the instructions to complete the installation of Tableau Public.
Figure 4: Tableau public download page
It’s time to take the cover off and take Tableau for a quick spin. As soon as you open up Tableau, click on Microsoft Excel under Connect as shown in the figure below.
Figure 5: Tableau - Connect to Excel
In the next screen, drag and drop the Excel tab from which you’d like to import the data as shown in the figure below. Make sure that the data looks right in the preview pane before clicking on Sheet 1.
Figure 6: Tableau - Import Data from Tab
2.2 Data sources required for the exercises in the book
The various Tableau workbooks, Excel and CSV files required for you to follow along are available at this book’s Github source page. You can also directly get the workbooks from the Tableau Public server, available under my profile.
3.1 Data types
Now that we’re done with the formalities of setting up Tableau, let’s step into the shallow end of the pool with some basic data types. Essentially, we can broadly categorize any piece of data into the following 3 types: strings, numbers and dates.
Absolutely nothing mind-blowing here so far. To give you a few business examples (keeping in line with the book’s audience):
String / textual data types could include product names, categories, country names etc. (Identified by "Abc" next to column names in Tableau)
Numeric data types could include profit, sales, sale price etc. (Identified by the icon "#")
Date types could include invoice date, shipping date, return time etc. (Identified by the calendar icon)
You have the flexibility to change the data types if Tableau didn’t interpret a data type correctly. Say, for example, the column Row ID, which is interpreted as a number, needs to be changed into a String: click on the "123" number icon next to the column name and switch it to the String data type.
Figure 7: Variable types in Tableau
You should also have noticed that the numeric columns are arranged at the bottom of the screen under the "Measures" pane, and the textual and date-time columns are arranged at the top under the "Dimensions" pane. This is based on how Tableau treats these data types as either continuous or discrete quantities.
In addition to classifying data types based on their intrinsic value, we can classify them based on whether or not their values are continuous. For example, product names are discrete values. In the Superstore dataset, you’ll notice that there are 3 values in the category column (Furniture, Office Supplies and Technology). There is no sense of continuity between the three distinct values and hence we call them discrete values.
Dimensions Measures
You can quickly filter the columns by type using the following shortcuts:
D: Displays just the dimensions.
M: Displays just the measures.
C: Displays just the calculated fields that you have created in Tableau (covered in the section about calculations).
On the other hand, take the example of the order date column, which contains dates. When you want to understand the sales over the past 3 years of order dates, we consider the date column as a continuous range, as the days are in successive order and we want to see the evolution across time.
Figure 8: Discrete vs Continuous variables
Discrete vs Continuous quantities
Discrete columns are identified by blue pills and continuous quantities by green pills. This subtle difference often leads to "unexpected outcomes" in visualizations.
You can switch the type of a variable from discrete to continuous or vice versa, and there are many ways of doing it. You could right click on the YEAR(Order Date) green pill as shown in figure 8 and switch it to discrete. This will ensure that this value is treated as a discrete value in just this analysis.
3.2 Data sources
Just as you need fuel to get a car started, you need to import some data to start building your visualizations with Tableau. Keeping the primary audience of this book in mind, let’s walk through the steps with Excel and CSV files to begin with. In section 5.4, we will see how to bring in more than one data source, as you might have data coming in from multiple sources. For example, you might have to consolidate budget data from the finance teams, sales data from Salesforce and web performance from Google Analytics in your reports. Let’s build our way slowly to that, starting with a single data source in this section.
In the Global superstore.xlsx file (available here), there are 3 tabs: Orders, Returns and People. If you’ve been following along, we imported the Orders tab in section 2.1. I just want to draw your attention to a couple of points on this step.
By default, when Tableau imports data, it will provide you a preview of the first few rows in a tabular format, as you can see in figure 9. Sometimes you’d prefer to see the list of columns to make sure that they are imported properly and, more importantly, to check that their data types are inferred properly. You can switch views by clicking on the "Manage metadata" icon in the box highlighted as 1. This will provide you the metadata as shown in figure 10. You now have the possibility to give the column names an alias. Let’s say, for some reason, you want to change the column name from "Row ID" to "Sno Row": you would change the value under the column "Field Name". These aliases will replace your column names everywhere in your Tableau reports.
Figure 9: Excel data import
Figure 10: Excel metadata
The second thing which could be interesting for you at this step is the Data Source filters. Let’s say you’re responsible for the French sales and you would like to import just the data where the column "country" is equal to France. You could add a global "Data source" filter, which will filter the data upstream before it gets imported into Tableau. By clicking on "Add" next to Filters, as highlighted in block 2 in figure 9, you’ll be able to go through the steps illustrated in figure 11.
Figure 11: Data source filters
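To make the idea of filtering upstream concrete, here is a minimal Python sketch of the same principle outside Tableau. The sample rows, the column names and the load_rows helper are all hypothetical. They stand in for the Orders tab and are not how Tableau implements its data source filters.

```python
import csv
import io

# Hypothetical sample rows standing in for the Orders tab.
raw = """Country,Sales
France,120
Germany,80
France,45
Qatar,60
"""


def load_rows(text, country=None):
    """Read CSV rows, keeping only `country` when one is given.

    The filter is applied while reading, so rows for other countries
    never reach the analysis step at all.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if country is None or row["Country"] == country]


all_rows = load_rows(raw)
french_rows = load_rows(raw, country="France")
print(len(all_rows), "rows without a filter,", len(french_rows), "rows with it")
```

Every chart built on french_rows is restricted to France without any further filtering, which is the effect a data source filter has on all the sheets of a workbook.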
Extract vs Live Data ?
When you work with SQL-like data sources, you’ve got 2 ways of working with data. You could work in "Live" mode, in which the data will be queried in real time by Tableau, i.e. every time you build a visualization for your report. On the other hand, if you select the "Extract" mode, Tableau will pull down the data and store it locally in ".hyper" format on your computer. As a result, subsequent querying and processing will be much faster compared to the "Live" mode.
In case you’re working with voluminous data, it’s definitely recommended to put the data source in "Extract" mode to keep things fluid while you slice and dice the data. We will get into more detail about this in section 7.7.
3.3 Data preparation
You have probably heard analysts who work with a lot of data gripe that 80% of their time is spent cleaning and preparing data. Unfortunately it is no exaggeration; I would even push it up to 90-95%. There is a vast suite of tools which can help you get data in the right format for your analysis, such as Alteryx, Tableau Data Prep etc. Excel VBA can be very helpful at times when you want to perform certain manual actions on your Excel files. Data preparation is a topic that merits its own standalone book. But one thing that you need to keep in mind is that data analytics solutions work best on tabular data in which each row presents a unique combination of values.
Country     Segment       Profit
Algeria     Consumer      $9,000
Australia   Home Office   $5,000
Hungary     Corporate     $7,000
Sweden      Home Office   $9,000
Canada      Corporate     $10,000
Australia   Consumer      $5,000
Hungary     Consumer      $3,000
Canada      Home Office   $9,000
Sweden      Consumer      $5,000
Total                     $62,000
Table 1: Tabular Data
Table 1 above provides you a classical tabular dataset of profits by country and segment. You will notice that Australia, for example, appears twice (one row where the segment = Home Office and one where the segment = Consumer). You will also observe that the sum of profits across all countries and segments is $62,000.
Table 2: Pivoted Data
The same values can be represented in a pivoted fashion as illustrated in Table 2. The pivoted dataset manages to provide the same values in a more condensed format, but unfortunately it would not work as well as the tabular dataset with any data analytics tool. You need to transpose the 3 segments which are in the column headers into a new column called segment to get back your initial tabular dataset. This operation goes by many names: Transpose, Stacking, Pivoting etc.
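For readers who want to see the transpose operation spelled out, here is a small Python sketch. It assumes the pivoted layout has one row per country and one column per segment, filled with the profit values from Table 1. The unpivot helper and its parameter names are made up for this illustration.

```python
# Pivoted layout (assumed): one row per country, one column per segment.
# Profits come from Table 1; None marks combinations that Table 1 does not list.
pivoted = [
    {"Country": "Algeria", "Consumer": 9000, "Corporate": None, "Home Office": None},
    {"Country": "Australia", "Consumer": 5000, "Corporate": None, "Home Office": 5000},
    {"Country": "Canada", "Consumer": None, "Corporate": 10000, "Home Office": 9000},
    {"Country": "Hungary", "Consumer": 3000, "Corporate": 7000, "Home Office": None},
    {"Country": "Sweden", "Consumer": 5000, "Corporate": None, "Home Office": 9000},
]


def unpivot(rows, id_column, value_columns, var_name, value_name):
    """Turn column headers into values of a new column, one output row per cell."""
    tabular = []
    for row in rows:
        for column in value_columns:
            if row[column] is not None:  # skip empty cells
                tabular.append({
                    id_column: row[id_column],
                    var_name: column,
                    value_name: row[column],
                })
    return tabular


tabular = unpivot(pivoted, "Country", ["Consumer", "Corporate", "Home Office"],
                  var_name="Segment", value_name="Profit")
total_profit = sum(row["Profit"] for row in tabular)
print(len(tabular), "rows, total profit:", total_profit)
```

The result has one row per country and segment combination, and the total profit is unchanged by the reshaping.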
Table 3: Monthly sales data
Let’s say you’ve got a table of sales across the last 12 months for 5 countries in a pivoted data format. We can easily transform this data into a tabular version in Tableau. Import the Excel file (Ch 3 - Transpose tab in the Global Superstore dataset available on Github) into Tableau as usual, and then select the 12 months of data while keeping the shift key pressed. Right click on the column and click on "Pivot" as highlighted in figure 12. This will create your tabular dataset, on which we can easily start making MoM (Month over Month) and QoQ (Quarter over Quarter) comparisons, which we will get into in the following chapters.
Figure 12: Pivot data in Tableau
3.4 Converting "Business questions" to the language of data
Let’s say, in our dataset, you would like to visualize the profit generated by Qatar in the last 3 months for every category of product sold. A recommended way to gently rephrase the question would be as follows:
Total Profits by Category when Sale Date <= 3 months and Country is equal to Qatar.
We start with the value that we’d like to measure (appropriately called a "Measure" in Tableau) and how we’d like to aggregate this measure (average, sum, minimum, maximum etc.).
We then follow it with the variable that we would like to break down this measure by. In our example, it is the category of product. This variable is called a "Dimension". We can follow this up with as many dimensions as we would like. But unfortunately, as soon as we start to add more than 2 dimensions in a chart, the visualizations tend to get much more complicated and harder to interpret.
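The rephrased question maps almost word for word onto a filter followed by a grouped aggregation. Here is a minimal Python sketch of that mapping. The rows, the column names, the reporting date and the 90-day approximation of "3 months" are all assumptions made for the illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical order rows; names and values are made up for illustration.
orders = [
    {"country": "Qatar", "category": "Furniture", "sale_date": date(2024, 5, 20), "profit": 100.0},
    {"country": "Qatar", "category": "Furniture", "sale_date": date(2024, 6, 10), "profit": 50.0},
    {"country": "Qatar", "category": "Technology", "sale_date": date(2024, 4, 15), "profit": 200.0},
    {"country": "Qatar", "category": "Technology", "sale_date": date(2024, 1, 5), "profit": 999.0},
    {"country": "France", "category": "Furniture", "sale_date": date(2024, 6, 1), "profit": 300.0},
]

today = date(2024, 6, 30)            # assumed reporting date
cutoff = today - timedelta(days=90)  # "last 3 months" approximated as 90 days


def total_profit_by_category(rows, country, since):
    """SUM(profit) broken down by category, after applying the two filters."""
    totals = defaultdict(float)
    for row in rows:
        # Filters: Country is equal to ..., Sale Date within the window.
        if row["country"] == country and row["sale_date"] >= since:
            # Measure aggregated (sum) by the dimension (category).
            totals[row["category"]] += row["profit"]
    return dict(totals)


result = total_profit_by_category(orders, country="Qatar", since=cutoff)
print(result)
```

The old Technology order and the French order are filtered out before anything is summed, so only the remaining Qatar rows contribute to the totals per category.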
Let’s do another example. This time we would like to calculate the unique number of orders by country for each sub-category in the Office Supplies category.
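A "unique number of orders" is a distinct count, so an order with several line items is counted only once. The following Python sketch illustrates this with hypothetical rows and column names.

```python
from collections import defaultdict

# Hypothetical order lines; an order can span several lines.
order_lines = [
    {"order_id": "A1", "country": "France", "category": "Office Supplies", "sub_category": "Paper"},
    {"order_id": "A1", "country": "France", "category": "Office Supplies", "sub_category": "Paper"},
    {"order_id": "A2", "country": "France", "category": "Office Supplies", "sub_category": "Binders"},
    {"order_id": "A3", "country": "Qatar", "category": "Office Supplies", "sub_category": "Paper"},
    {"order_id": "A4", "country": "Qatar", "category": "Furniture", "sub_category": "Chairs"},
]

# Collect the distinct order ids per (country, sub-category) pair.
order_ids = defaultdict(set)
for line in order_lines:
    if line["category"] == "Office Supplies":  # filter
        order_ids[(line["country"], line["sub_category"])].add(line["order_id"])

# Measure: distinct count of orders; dimensions: country and sub-category.
unique_orders = {key: len(ids) for key, ids in order_ids.items()}
print(unique_orders)
```

Order A1 appears on two lines but contributes a single order to the France and Paper combination.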
We can get fancy in our formulations by stacking bricks on top of each other. Say, for example, we wanted to analyze the year over year (YoY) % growth of the unique orders that we calculated above. Our entire reformulated question would read as such:
4.1 The 4 building pillars
If you have been following along so far, you should have a fair understanding of how to import data into Tableau, how to make sure that the data types are properly inferred, and the basic semantics of converting your business questions into the language of data analytics. Now that we have got the foundation set, it’s time to look at the crux of Tableau, i.e. the 4 building blocks which will allow us to build and refine our data visualizations.
4.1.1 Dimensions, Measures & Aggregations
By now, you should have a fair understanding of what Dimensions, Measures and Aggregations mean in the context of data analytics. I just noticed that in the paid version of Tableau (Tableau Desktop), Dimensions and Measures are highlighted appropriately, unlike in Tableau Public, which is the free version. I’m guessing the $70 license fee buys a few nice-looking labels in addition to a plethora of other functionalities.
The continuous quantities highlighted in green (measures) are aggregated and then broken down, or not, by dimensions. You could take a non-continuous quantity such as "User Id", aggregate it (count or unique count) to make it a measure, and then continue with the dimensions in your analysis.
In the dimensions and measures shelf, if you scroll to the bottom of each shelf, you’ll notice the two values measure names & measure values tucked in. Unfortunately, most Tableau beginners fail to take notice of these 2 columns, which bring a lot of flexibility to your analysis. We’ll make extensive use of these 2 columns later in this book when we look at tables in more detail in section 6.1.
Figure 13: Dimensions, Measures & Aggregates
By default, when you drag and drop your measures, Tableau will aggregate them as a sum. This is handy most of the time, but let’s say you have a column containing a ratio and you always need to average these values instead of summing them. You can change the default aggregation of the measure to an average, for example, by right-clicking on the measure in question as highlighted in figure 14.
Figure 14: Set default aggregations for measures
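A quick numeric illustration of why the default sum is the wrong aggregation for a ratio column, sketched in Python with made-up discount values:

```python
from statistics import mean

# Hypothetical discount ratios, one per order line.
discounts = [0.10, 0.20, 0.30, 0.40]

summed = sum(discounts)     # what a default SUM aggregation would display
averaged = mean(discounts)  # the aggregation that is usually meaningful for a ratio

print(f"sum: {summed:.2f}, average: {averaged:.2f}")
```

A summed discount of 100% is meaningless, whereas the average of 25% is a number you can actually report. Keep in mind that a plain average of row-level ratios weighs every row equally, which may or may not be what your business question calls for.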
A possible use-case for measure names & measure values
In this unrealistic use-case, let’s say you want to pile up discounts, profits and sales in a bar chart. If you try dragging the three measures into the rows shelf, you’ll end up getting three separate bars in blue on the right, as shown in figure 15. But if you instead drop measure values on the rows shelf and measure names in the filters shelf, you can pile them up on top of each other.
Figure 15: Measures names & values
4.1.2 Viz Pane - columns & rows shelf
To illustrate the use of the columns and rows shelves in Tableau, let’s go back to the basic terminology of a bar chart as illustrated in figure 16. In the case of this bar chart, the columns shelf houses the dimension, which is category in this example. As a result, we have got three vertical columns (Furniture, Office Supplies & Technology). Similarly, the rows shelf houses the measure, total profits. As a result, on your Y-axis we have the total profits, which determine the heights of the vertical bars. Just to hammer the point home: if we were to swap the pills in the rows and columns shelves, i.e. SUM(Profit) in the columns shelf and category in the rows shelf, what would be the end result ?
Do we always need the rows & column shelf ?
Turns out, you don’t absolutely need to use the Rows & Columns shelves in all your visualizations. For example, in the case of Treemaps, Pie charts or Packed Bubbles, we can completely do away with the concept of rows & columns. Do you see why ? If not, don’t worry, this is the topic that we will dive into in the next section.
You have the flexibility to pile up dimensions and measures in the columns and rows shelves. Let’s say you drop in the year of order date in addition to the category along the columns shelf. Given that there are 3 unique categories and 4 years of order history, can you predict the number of bars in the graph ? 12 ! Each category of product gets broken down by the year of orders, and hence 3 * 4 = 12. The cartesian product allows you to very quickly pore over various combinations in order to spot outliers and trends. If rightly applied, it can be very helpful in your exploratory data analysis. But as a rule of thumb, for the visualizations in your dashboards, do restrict yourself to no more than 2 dimensions.
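The 3 * 4 = 12 count is simply the size of a cartesian product, which you can enumerate in a few lines of Python. The category names come from the dataset. The four year values are placeholders, since all that matters here is that there are 4 of them.

```python
from itertools import product

categories = ["Furniture", "Office Supplies", "Technology"]
years = [2011, 2012, 2013, 2014]  # placeholder values for the 4 years of orders

# One bar per (category, year) combination.
bars = list(product(categories, years))
print(len(bars))

# Adding a third dimension multiplies the count again.
segments = ["Consumer", "Corporate", "Home Office"]
marks_with_segments = len(list(product(categories, years, segments)))
print(marks_with_segments)
```

The second count shows how fast the number of marks grows with every extra dimension, which is why dashboards get hard to read beyond a couple of dimensions.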
Figure 16: Bar Chart
Figure 17: Columns & row shelf flipped - Horizontal bar
Figure 18: Stacking the column & row shelf
4.1.3 Marks card
Figure 19: Marks Card
Armed with the knowledge of the first 2 building blocks, you can start unleashing bar charts, horizontal bar charts, line charts and a slew of other visualizations. So now, the next challenge is to change the color of the charts, increase the size of the elements in the visualization, or show labels & tooltips on your visualizations to make them striking. This is where the marks section comes in handy.
The marks card, as highlighted in figure 19, is essentially composed of 2 main blocks. The block in orange is a dropdown which lets you play around with the shape of the visualization. The second block, in blue, is in turn composed of 6 sub-blocks which help you customize the various visual elements in a graph. Let’s start with the blue block and get to the visualization type (orange block) later.
4.1.3.1 Color block
This block, as you can imagine, allows you to change the colors of the graphical elements. You have two ways of making use of this functionality. Quite simply, you could use it to change the color of the graph. When you click on it directly, you get the popup shown in figure 20. If you have company color codes, you can add them to your custom color palette here and save yourself some time.
The second, and more interesting, approach is to drop a dimension into the color box. This breaks down the visualization into a more granular, detailed visualization by associating each unique value in the dimension with a color. In figure 21, you see how the line chart gets broken into 3 distinct lines, one for each category. This allows you to see how each of the categories performed with respect to the others across the years in terms of profit.
Figure 20: Option 1: Change colors
I would like to draw your attention to a subtle point about the result that we get when we drop the category dimension in the columns shelf along with the year, as illustrated in figure 22. (Note: you can drop the same dimension in multiple shelves simultaneously, i.e. columns, rows, color etc.)
Figure 21: Option 2: Breakdown by dimensions
Figure 22: Stacking dimensions in the column shelf
Custom color palettes ?
You can create your custom color palettes and save them in Tableau. I’m going to refer you to the almighty Google again, but with the keyword Preferences.tps. Without going too much into details, it’s an XML file (specifying the color hex codes of your custom palette) that you need to save in your Documents > My Tableau Repository folder.
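To give a rough idea of what such a file looks like, here is a small Python sketch that generates one. The element and attribute names (workbook, preferences, color-palette, color, type="regular") reflect my understanding of the layout described in Tableau's documentation on custom color palettes, so please double check them there before relying on this. The palette name and the hex codes are made up.

```python
import xml.etree.ElementTree as ET


def build_preferences(palette_name, hex_codes):
    """Build the XML text of a Preferences.tps file with one categorical palette.

    The structure used here is an assumption based on Tableau's documented
    custom palette format; verify it against the official documentation.
    """
    workbook = ET.Element("workbook")
    preferences = ET.SubElement(workbook, "preferences")
    palette = ET.SubElement(
        preferences, "color-palette", {"name": palette_name, "type": "regular"}
    )
    for code in hex_codes:
        ET.SubElement(palette, "color").text = code
    return ET.tostring(workbook, encoding="unicode")


# Hypothetical company colors.
xml_text = build_preferences("Company Colors", ["#1F77B4", "#FF7F0E", "#2CA02C"])
print(xml_text)
```

You would save the printed text as Preferences.tps in the repository folder mentioned above, restart Tableau, and then look for the palette by name in the color popup.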
4.1.3.2 Size block
No surprises here. This block allows you to change the size of the visualizations. Let’s build a quick visualization of a stacked bar chart of total profits generated by ship mode. We add the profit in the rows shelf, as we would like to sum over it, and drop the ship mode in the color block, as we saw in the previous section. This provides a single multi-color bar as illustrated in the visualization on the left of figure 23.
We can quickly see that the total profits are the highest in the standard class of shipping and decrease progressively as we move up to same day shipping. But let us say we want to accentuate this and clearly make the point that total profits are very low on same day shipping. We would CMD / CTRL click on the SUM(Profit) pill on the rows shelf and drop it into the size block as well. This will ensure that the width of the bars is determined by the total profits. (Homework: show that the profitability of the orders is not influenced by the ship mode. It might be tempting to conclude that the same day shipping mode is not profitable.)
Figure 23: Sizing up the graphs
At the beginning of this section, I mentioned briefly that there is a dropdown tucked in sneakily above the size, color & label blocks. This dropdown allows us to change the shapes of the visualizations. Now is as good a time as any to take a detour and check this out.
Let’s make a small change to the visualization in figure 24. Let’s empty the rows shelf and simply shift the shape from a bar to a square in the dropdown. We’ve got ourselves a beloved Treemap. Fancy a packed bubble chart ? Switch the square to a circle !
Figure 24: Change the shapes
4.1.3.3 Label block
We’re off to a great start with just 3 blocks on the marks card. You’ll be able to craft more than 90% of the typical graph types that you would require for your dashboard by playing with these combinations.
Let’s run with the Treemap to showcase the label functionality. As highlighted in figure 25, drag the ship mode on to the label card. This allows you to annotate the visual elements in a chart. Clicking on the label card itself opens a popup which allows endless customization possibilities.
Figure 25: Annotating with labels
4.1.3.4 Detail block
Let’s go back to the basic bar chart to help us wrap our heads around the detail block. By now, you should be capable of building a bar chart with your eyes closed (almost!). In the previous section about labels, section 4.1.3.3, we saw that adding a dimension to the label annotates the graph. In figure 26, let’s start off with a simple total sales by category and add segment to the color card. Now you have 3 segments highlighted in each vertical bar. Let’s then add the ship mode to the label card. As you can see, ship mode is not part of the initial configuration of the visualization. So visually nothing will look different apart from the labels that were just added, which might initially seem randomly strewn about. As you start hovering your mouse over the bars, you will notice that the bars are broken down into granular blocks. Each of the sub-blocks represents the ship mode corresponding to the segment and category combination, and now the labels start to make more sense.
Figure 26: Adding more details
This is just the same as adding ship mode to the detail block instead. In essence, the detail block allows you to break down a visual element into its granular details. As you can see, there are 3 pills in the Marks section: the first pill, segment, dictates the color, while the second and third pills, which are both ship mode, provide the granular details within each segment. Note: There is a hierarchy in the order of the color and the detail blocks. If you flip the order of segment and ship mode in the pill order, you will end up with the adjacent visualization.
Figure 27: Hierarchy in details
4.1.3.5 Tooltip block
We can continue to add more details into a visualization, but there comes a point when the visualization is no longer meaningful or decipherable, or both. Tooltips fortunately provide you an elegant way to load more information without visually cramming your visualization. Used appropriately, tooltips can help you tell the right story effectively by reducing "Chartjunk".
"The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies: to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk."
(Edward Tufte)