Tag Archives: mapping

Waffle Charts and Unit Charts on Maps

One of the people I really wanted to meet at #data17 was Sarah Battersby, if you haven’t seen her blogs on mapping you’re missing out. It turns out she wanted to meet me, too!

So Sarah does really awesome stuff (can you tell I’m a fanboy yet??) and yesterday she posted about doing spiral jittering on maps:

That inspired me to dig out an old post where I’d done a waffle chart or unit chart jittering technique on a time series:

And adapt that to a map, like this one that makes me think of bunches of grapes:

Why would we need to do this?

I’m not going to go into all the details, you can go read Sarah’s fantastic Jittering post instead for an overview. Basically there are times when we have more than one “thing” for one or more point(s) on a map and we want to show them that way. For example in the map below there are 5 IDs for the selected zip code that are all plotted on top of one another. Through using jittering we can show a mark for each ID like in the jittered map above where that one visible circle is replaced by five circles.

There is nothing new under the sun

Back in 2013 Stephen Few proposed the idea of bricks on maps as a way of encoding a value, the technique I’m using here could be used to get the same result. The key difference is that instead of using the unit or waffle chart to encode the single value we’re using the unit chart to encode multiple values. Andy Cotgreave also had a nice post on some of the issues with this kind of chart.

How do we put a waffle chart on a map?

First of all we need a data source that has latitudes & longitudes that we can do calculations on. If you don’t there are a variety of ways to get one. After that we need to get mathy. Here’s my workout view:

All of the table calculations have a compute using on ID, the idea is that the compute using is set to the dimension(s) that are making the extra marks and then the dimension(s) left for partitioning (Zip Code in this case) are the ones that are creating the initial map positions.

I’ll briefly walk through each calculation.

Index uses Tableau’s INDEX() formula and uniquely identifies each mark.

Column Count gets us a count of columns that is based on forming the smallest possible square that will contain all the marks, here’s the formula:

//this formula will try to draw a square, if you want something else then just put in a fixed aggregate number like MIN(3)
IF INT(SQRT(SIZE())) == SQRT(SIZE()) THEN
 SQRT(SIZE())
ELSE
 INT(SQRT(SIZE()))+1
END
This would be a lot simpler if Tableau supported the CEILING() function for table calculations, because then it would be CEILING(SQRT(SIZE())). Vote up https://community.tableau.com/ideas/6239 if you’d like that too.

Column Offset then computes the number of positions of lateral (latitude) offset for each mark from the center of the unit chart. I’ve done some different experiments on where to position the unit chart and for me the most natural is to have the center of the unit chart be the centroid of the point.

//this gets the column position in the row
IF [Index] % [Column Count] = 0 THEN
 [Column Count]
ELSE
 [Index] % [Column Count]
END
//subtract the position of the midline to get the offset
 - ([Column Count] + 1)/2

Row Count then gets the number of rows in the unit chart, and would again benefit from a CEILING() function for table calcs (hint hint).

IF INT(SIZE()/[Column Count]) = SIZE()/[Column Count] THEN
 SIZE()/[Column Count]
ELSE
 INT(SIZE()/[Column Count]) + 1
END

Row Offset computes the number of positions of vertical (longitude) offset for each mark from the center of the unit chart:

//row position in column
IF INT([Index]/[Column Count]) = [Index]/[Column Count] THEN
 [Index]/[Column Count]
ELSE
 INT([Index]/[Column Count]) + 1
END
//subtract the midline of the row to get the offset
- ([Row Count] + 1)/2

Jitter – Waffle – Latitude then gets the latitude for the point and subtracts the row offset multiplied by a waffle jitter parameter that lets us deal with the zoom level to offset by just the right amount of latitude. This field has the Latitude geographic role assigned.

TOTAL(AVG([Latitude])) - [Row Offset] * [waffle jitter]

Jitter – Waffle – Longitude does the same for the longitude.

TOTAL(AVG([Longitude])) + [Column Offset] * [waffle jitter]

Note that I’ve only tested this so far with US latitude & longitudes, these calcs might need to be slightly different if you are east of 0 longitude or south of the equator.

Building the View

With all the calcs created in the workout sheet and verified, then we can duplicate that sheet and start dragging & dropping pills to build a view. In this case I’ve created a dual axis on to show that the centers of the unit chart are on the centroids of the zip codes (the black crosses) and at this zoom level a waffle jitter value of 0.02 works well:

For this view I’ve made the waffle jitter slightly smaller to 0.015 degrees and added white borders to the marks:

Why Stop Here?

We don’t have to use just circle marks, in this view I’ve used shape marks and different colors for each shape just to show how much information we could place on a single view.

And in Tableau v10.5 we could even do something like only show the unit chart on a map if there weren’t too many marks nearby (by using a binning or hex bin function in an LOD expression) and use the new Viz in Tooltip feature to get the detail.

Conclusion

Thanks to Alan Eldridge for the original jittering post and Sarah for adding more jittering options and inspiring this post! Here’s the unit or waffle chart jitter on maps workbook on Tableau Public.

 

 

 

 

Cross Data Source Joins + Join on Calcs = UNION Almost Anything in Tableau v10.2

Since Tableau v9.0 or so every new release has come with new features that simplify and reduce the amount of data prep I have to do outside of Tableau. Pivot in version 9.0, the first batch of union support in v9.3, support for ad hoc groups in calculations, cross data source joins and filters in v10.0, and more in-database unions and join calculations in v10.2. With the join calculations we can now do unions and cross/cartesian joins within or across almost any data source without needing Custom SQL or linked databases and without waiting for Tableau to implement more union support, read on to learn how!

Here are some use cases for unions across data sources:

  • Union data that is coming from different systems, for example when different subsidiaries of an organization are using different databases but you want a single view of the company.
  • Union actual sales data from transactional systems and budget data that might come from an Excel spreadsheet.
  • Union customer & store/facility data sets so you can draw both on the same map.

This post goes through examples of all three using a combination of text files and superstore, and Rody Zakovich will be doing a post sometime soon on unions and joins with Tableau data extracts. (Did you know you could do cross data source joins to extracts? That capability came with v10.0, and we can have all sorts of fun with that using join calculations!)

Continue reading

Two Sequential Color Palettes on the Same Map: Coloring by a Dimension and a Measure

Back in 2014 I was inspired by this New York Times post The Most Detailed Maps You’ll See from the Midterm Elections to try to figure out how to replicate those maps in Tableau. The maps essentially uses three color palettes on the same map, blue for Democratic leaning areas, red for Republican, and purple for tossups where the lightness (relative light/dark) of each color palette is based on the population of the precinct. So we might think of this in Tableau as coloring by a dimension (really a discrete value) of red/blue/purple and then shading that by a measure (the population), however that’s not something available in a couple of clicks.

This post covers a variety of techniques to build such a map, to simplify for this post we’ll build this map where there are two color palettes

screen-shot-2016-10-22-at-12-43-17-pm

For this example the data set is Superstore Sales with some filters applied to create more variation. A set of states have been arbitrarily grouped as Red and another set as Blue, and the goal is to shade them within the Red and Blue hues by the % of Total Sales. I added some additional requirements:

  1. The lightness of the colors in each palette must be equivalent (symmetrical) so the “darkest” color in each color ramp corresponds to the highest % of Total Sales regardless of which hue the color is (red or blue). This allows the user to compare the relative lightness across the hues, if we don’t then then the visualization will lead to inaccurate conclusions.
  2. The color ramps for the palettes must automatically update as the data changes, for example if a user filters the view or new data arrives.
  3. The color legends themselves have to be accurate and appear in the worksheet. This is an ease-of-construction requirement because if we’ve used a technique that hacks the color legends too much then we’d have to manually build a color legend and put that in a dashboard.
  4. We need flexibility with the color palettes so they can look as good (as vibrant) as we want.
  5. The method needs to support at least 1 additional palette, for example if this were used in a US election prediction map to be able to use a single third hue for a true tossup state, or as in the NYTimes post to add an additional purple sequential palette to identify precincts that voted close to 50-50.

I tried several different methods, the only method that satisfies all the requirements is the 6th one. The six methods are:

  1. A calculated field that makes blue negative and red positive and then a diverging palette based on that field. Problem: The color ramps are not symmetrical and the color legend would have to be manually created.screen-shot-2016-10-22-at-12-21-58-pm
  2. Using a custom color palette (see my post An Exploration of Custom Color Palettes) with calculated fields to normalize the values and properly assign the colors in the palette. This enables the lightness to be equivalent and dynamically updates with the data so it’s really close to the requirements and could work in some situations. Problem: This requires more effort setting up the color palettes and calculations and the color legends would have to be manually created.
    screen-shot-2016-10-23-at-7-51-45-am
  3. A dual axis map with background color and foreground as a sequential black palette with transparency. I first learned about this technique from Wil Jones of Interworks in Create a Dual Axis Color Map in Tableau. Problem: The colors are too muted.screen-shot-2016-10-22-at-12-24-17-pm
  4. Calculated field that returns discrete values and then manual assignment of colors. Problem: Colors are limited to the values that appear in the data, in this example the data is sparse so there’s no Red 6-8. so if the data changes the palette won’t automatically update (Tableau will assign the default palette) and and will need manual effort.screen-shot-2016-10-22-at-12-25-09-pm
  5. A dual axis map with calculations that are used to change the level of detail of each Marks Card by returning red states or blue states that have a geographic role assigned, then separate measures are available for each. This is getting close because it gets us completely separate palettes that can have the colors that we want. Problem: To keep the shading of each color palette the same the color ramps had to be fixed at 10%, so if the data changes the ramps will need to be edited to reflect the data or they could be misleading.screen-shot-2016-10-22-at-12-27-43-pm
  6. This starts out similar to method #5: A dual axis map with calculations that are used to change the level of detail of each Marks Card by returning red states or blue states that have a geographic role assigned, then separate measures are available for each. In addition a duplicate State dimension that is not a geography is used to increase the level of detail so the color calculations can have same ramp of values. Problem: Most complicated to set up, and can have bigger queries (which aren’t a big deal when we’re only looking at 50 states), but it’s totally dynamic and meets all the requirements.

Here’s how to build method #6:

Part 1: Dual axis maps at different levels of detail

  1. Make sure you have in your data a field that will group the states. In my case I built a map of the 50 states and grouped the states by manually selecting some and creating an ad hoc group that I called State (group). If there’s a dimension value that can slice the states this will also work, or even a calculated field that creates a dimension.
  2. Create two calculated dimensions for Red States and Blue States. Here’s the Red States formula:  IF [State (group)] = 'Red' THEN [State] END. This only returns the state value and then Null for everything else. Also note the ability to use ad hoc groups in calculations is new in version 10.0, if you’re using an earlier version of Tableau you’d need to use a calculated field or field from the data source as a dimension in the calc.
  3. Assign the Red/Blue States dimensions to the State geographic role.
  4. Create a filled map with the Red States dimension on the level of detail.
  5. Ctrl+drag (Cmd+drag on Mac) the Latitude (generated) pill on Rows to the right to create a second axis (and a third marks card, so there’s All and two Latitude (generated) cards).
  6. Activate the bottom Marks card by clicking on the the right-hand Latitude (generated) pill on Rows.
  7. On the bottom marks card replace the Red States dimension with the Blue States dimension. You should now see a view like this:
    screen-shot-2016-10-22-at-12-30-39-pm

What we’ve done here is to create a dual axis map with two different levels of detail where the two Marks Cards for this map have different dimensions. The upper map has the level of detail of Red States that includes all the states in the Red group plus one Null value for all the blue states, the lower map has the level of detail of Blue States that includes all the states in the Blue group plus one Null value for all the red states.

Now at this point we could put separate % of total measures on the two Marks Cards with a Compute Using on the Red/Blue state, respectively and put those on the Color Shelf. This is method #5 before we’ve fixed the colors:

screen-shot-2016-10-22-at-12-31-51-pm

However this has two issues:

  1. Each Marks Card has a Null value (for blue states on the Red map and red states on the blue Map) that rolls up all those opposing states and ends up being the biggest % of Total and that affects the color ramp so the color range isn’t as big as it could be. Note how the largest blue value is 36.88%, but the map is entirely light blue. We can workaround this by using the filter option on the null/unknown marks indicator in the lower right, however…
  2. The two color ramps are not symmetrical at the top end in terms of lightness/darkness and the only way here to make them symmetrical is to manually fix the color ramps, which then doesn’t meet the requirement to have the view automatically update.

While can manually fix the bottom end of the color ramps at 0 to make them symmetrical there, at the top end we really need to have the % of total measures return the same values so that way we don’t have to manually fix the color ramps. That leads to the next steps in Part 2.

Part 2: Increase the level of detail without increasing the # of visible marks

I played around with a bunch of ideas on how to fix the problem including doing a linear scaling, nesting table calculations, precomputing all the values so I wouldn’t need to use table calculations, and so on, until I came back to revisiting how Tableau’s color ramp is determined for sequential palettes. The default is that bottom end of the ramp is assigned to the lowest value of the field on the Color shelf and top end of the ramp is assigned to the highest value of the field on Color. The challenge with the approach so far is that in each of the Red & Blue color ramps there’s a % of total value (the Null value for the blue states & red states, respectively) that is being calculated but is not visible. And that led to my insight: “Why not enable Tableau to have an accurate set of values for the color ramp that aren’t visible?” and a few clicks later I had it.

This solution leans on a couple of bits of somewhat esoteric Tableau knowledge:

a) In maps the visible marks are the ones that have an assigned geography. In this case for only a subset of states have an assigned geography when the Red States dimension is used on one Marks card and a different subset has an assigned geography when the Blue States dimension is used on the other Marks card. So we’ve already seen that there can be additional marks in the view that aren’t visible, therefore we can have all the states available on each marks card by adding a State dimension that isn’t drawn as a geography.

b) Table calcs have some interesting (to me, at least) behavior when there are multiple marks cards at different grains of detail the table calculation results depend on what Marks card(s) the table calculation is on. If a table calculation is on a single Marks card then only the dimensions from that Marks card plus Rows, Columns, and Pages are available for addressing and partitioning. If a table calculation is on the All Marks card then it can use any of the dimensions anywhere for addressing and partitioning, however the table calculation is separately computed for each Marks card based on the dimensions available for each Marks card i.e. the dimensions on that Marks card + Rows, Columns, and Pages, and dimensions that are not on that particular Marks card are ignored. That’s a lot of words, someday it’ll have to be another post to go into more detail.

In this case to get the % of total table calculation to the return the same result across both the red state and blue state Marks cards we need to set up the addressing of the calculation so it is across all the State dimensions that we are using – the State dimension that will get an accurate % of total plus the Red and Blue States dimensions that are used to create visible marks.

Here’s how to build it starting from Part 1 above:

  1. Duplicate the State dimension.
  2. Remove the State geographic role from the State (copy) dimension that you created in step 1 by right-clicking on the State (copy) dimension in the Dimensions window and choosing Geographic Role->None:
    screen-shot-2016-10-22-at-12-35-03-pmThis is necessary so when we add the the new State (copy) dimension to the view that Tableau doesn’t draw all the states on both the red & blue Marks cards.
  3. Click on the All Marks card header to activate it.
  4. Add the State (copy) dimension to the Detail on the All Marks card. Pro tip – you can drag it onto the Detail button or any part of the open area in the bottom of the Marks card. You should see a view that looks like this:
    screen-shot-2016-10-22-at-12-36-02-pm
    Note the “27 unknown” in the lower right, that’s the sign that the level of detail has increased without making more states visible. If you see all the states in each map then you’ll need to re-do step 2.
  5. We need to build three calculations to make the colors. The first is a bog-standard % of total calculation, we can do that entirely with drag & drop. Start by dragging Sales from Measures to the Detail on the All Marks card.
  6. Right-click on the Sales pill on the Marks card and choose Quick Table Calculation->Percent of Total.
  7. Right-click on the Sales pill again and choose Edit Table Calculation… to open the Table Calculation window.
  8. Make sure Specific Dimensions is selected with all three State dimensions clicked on: Red States, Blue States, and State (copy). Validate that the table calculation is accurate by hovering over the marks, using View Data, etc.
  9. Drag the Sales pill from the Detail back to the Measures window. Tableau creates a new calculated field and gives you a chance to name it, I called it % of Total Sales.
  10. Now to build out the measures that will be used on Color. Right-click on the new % of Total Sales measure and chose Create->Calculated Field… to create a new calculated field.
  11. Set the name of the new calculated field to be Red % of Total Sales, then click Apply.
  12. Repeat step 10.
  13. Set the name of this new calculated field to be Blue % of Total Sales, then click Apply.
  14. Drag both the Red & Blue % of Total Sales measures onto the All Marks card. The All Marks card should now look like this:
    screen-shot-2016-10-22-at-12-39-00-pm
  15. Activate the red state Marks card by either clicking on its header or clicking on the appropriate Latitude (generated) pill.
  16. Click to the left of the Red % of Total pill to put it on Color. The view should now look like this:
    screen-shot-2016-10-22-at-12-39-55-pm
  17. Hover over the color legend and click on the drop-down menu->Edit Colors… to open the color palette.
  18. Change the palette to Tableau’s default Red sequential palette.
  19. Click on Advanced >> to open the advanced section.
  20. Click on the Start checkbox and enter 0. This is to ensure that both color palettes start from the same value.
  21. Click OK. The map should now look like this:
    screen-shot-2016-10-22-at-12-41-07-pm
  22. Activate the blue state Marks Card.
  23. Click to the left of the Blue % of Total pill to put it on Color. Note that both color legends have the same ending value.
  24. Hover over the color legend and click on the drop-down menu->Edit Colors… to open the color palette.
  25. Click on Advanced >> to open the advanced section.
  26. Click on the Start checkbox and enter 0. This is to ensure that both color palettes start from the same value.
  27. Click OK. The map should now look like this:
    screen-shot-2016-10-22-at-12-42-12-pm
  28. The last step in building the map is to right-click on the right-hand Latitude (generated) pill on Rows and choose Dual Axis. Here’s the map:

screen-shot-2016-10-22-at-12-44-15-pm

From here we can clean up the map formatting, edit the tooltips, turn off the unknown indicator, etc.

Notes and Next Steps

A key part of what makes the colors more accurate is that Tableau’s default red & blue color palettes use the same lightness throughout the palettes. If you are going to use a different set of palettes then take some time to make sure the lightness values are equivalent for the palettes.

If what distinguishes between the states is a measure (as opposed to a dimension like the ad hoc group used here) then you’ll need to use a variation on this technique, if you have questions post below in the comments.

If you want to get an additional discrete color (such as to mark tossup states) then you can change the measure that draws colors for a given Marks card to return a single negative value and use a diverging palette with more configuration of the Advanced >> options, for example I’ve set up Minnesota to be yellow in this  view:

screen-shot-2016-10-22-at-12-44-59-pm

If you’re careful with sequential or diverging palettes you can actually make multiple continuous ranges and/or discrete colors in the same palette, see An Exploration of Custom Color Palettes and Keith Helfrich’s post at http://redheadedstepdata.io/color-the-dupes/ for more details. So either method #6 or method #2 could build something like the NYTimes viz that I linked to at the beginning of this post, the question is how much mucking around with color palettes and color legends do you want to do?

Here’s a link to the Two Sequential Palettes on a Map workbook on Tableau Public.

If you think this is awesome and want more you then please check out DataBlick, we provide Tableau training and consulting and even short-term support to help you make better views and dashboards.