Creating a Dynamic Range “Parameter” in Tableau

Tableau’s native parameters have a couple of key limitations as of this writing in February 2018: The list of values is static and we can only select one value at a time, not a range of values. So when we want to do something like set up a viz with user-selectable top and bottom thresholds we have to set two parameters, not one, when what we really want is a nice range control like the view below:

What it looks like here is that I’m using a ranged “parameter” to change the mark color & shape whose set of possible values is based on other filters…which is what we’ve wanted in Tableau for awhile now. Now you might be asking “Why go to all this trouble, why not use Tableau’s JavaScript API or the (forthcoming) Extensions API?”. And my response to that is that not everyone has the expertise available to use Tableau’s APIs or D3 so we’ve got to make the best use of the tool that we have at hand. Curious about how can you build this for yourself? Read on to find out!

Continue reading

Using a Filter Action as a Parameter

I got a question recently about wanting to use an action in Tableau to set a parameter. For example in this view below the goal is to hover over a one of the bars below to send the action to the circles on top and use that value to color the marks, change the shape, etc. In this case what we want to do is some sort of evaluation like [Circles Continent] = [Selected bars Continent] to be able to flag the selected continent and treat it differently, just like we would if we had something like [Circles Continent] = [Continent Parameter].

But for actions that cross worksheets all we have are highlight & filter actions. Here’s what happens with a highlight action:

Tableau’s highlight is limited to greying out the non-highlighted marks and being able to optionally display text.

And if we try a filter action we are even more stuck:

The filter action removes all marks but the selected mark which then breaks the rank table calculation, positions the mark in the wrong place, and doesn’t really let us do things like change the selected mark’s color vs. the other marks.

So Tableau doesn’t have an action to set a parameter value so we’re kind of out of luck…or are we? Try out this viz, you can hover and see the color and size change while the rank value is still preserved:

Now you can get this kind of effect using Tableau’s JavaScript API, this was done without using any JS at all. Read on for an explanation of how you can do this for yourself! Also thanks to Rody Zakovich, he gave some feedback to this and came up with some great extensions that he’ll be posting about!

A caution for Tableau newbies: this uses some relatively advanced data preparation, Level of Detail expressions, data blending, filter actions, understanding of the difference between the grain of the data and the viz level of detail, and knowledge of Tableau’s order of operations. If those terms don’t mean anything to you then you might want to start out by learning about those first.

Tableau is a Data-Driven Drawing Engine

The key to all of this is that fundamentally Tableau is a data-driven drawing engine. By that I mean that what we see in the viz and available interactivity are dependent on the data. So if we feed the right data to Tableau we can get it to do (al)most anything we want. For example in a post from last year I set up waffle or unit charts inside a map.

In this case what we’re wanting to do is change Tableau’s interaction behavior across worksheets. Looking at our options for interacting across worksheets in a workbook:

  • Highlight actions can identify specific values have a very specific set of behaviors around appearance so we can’t change that.
  • Filter actions can identify specific values in the target viz but remove other values.

So there’s a loophole in filter actions…filter actions remove other values, but since the origin & targets of the filter action are coming from the data if we feed Tableau the right data we can have it keep what we want and no more. So in this case we just need to feed Tableau more data (as in copies of the data) so that after the filter action takes effect we have enough data to identify the selected & non-selected marks. Here’s a description of what I mean:

How I think of this is that we’re starting out with i continent values and what we’ll do is expand that out to some number j continent values (actually 2i or i*i), then the filter action will cut that number down to a manageable number k continent values that we can then use calculations to identify the selected and non-selected marks.

Introducing the Scaffold

A scaffold is used in building construction, and in Tableau a scaffold data source is one that helps us get the data “just so”. In Multiple Ways to Multi-Select and Highlight I did a version of this where a union was used to give enough data so that way a mark could be highlighted. That required a full union of the data which can get prohibitively large, so for this method we’ll use a scaffold source that has just the values we need, and then when we want measures from the underlying data we can use a Tableau data blend.

The scaffold uses multiple copies of the list of the values that we want to filter for (Continent in this case). Now if you just have a “flat” table of data and don’t have a separate unique list of values then there are multiple ways to get one, please see Creating a List of Values in Tableau from Text and Excel Sources. I’ll be using

Once you have the list there are two different scaffolds you can use: One uses a cross product (i.e. for every continent there is every other continent), the second uses a union (thanks to Rody for that suggestion and demo). I’ll go through the cross product scaffold first because that’s a bit easier to set up than the union.

Using a Cross Product Scaffold to use a Filter Action as a Parameter

This section goes through the cross product scaffold. A cross product is also called a cross join or cartesian join or Cartesian product and a simple description is “for each value of A return each and every value of B”. So if we start out with the two values [A1, A2] and three values [B1, B2, B3] then we get the six combinations [A1B1, A1B2, A1B3, A2B1, A2B2, A2B3].

In this case we’re building a cross product of the dimension we want to build an action on and for this example we’re using Continent so the cross product will be 6 continents * 6 continents and end up with 36 rows in the scaffold. It’s important that the scaffold only has one record for each combination, if it has more than one record then the calculations below will break and alternative formulae would be required.

I’ll explain a little further about how this ends up working down below.

Creating the scaffold and setting up initial interactivity

  1. In Tableau connect to your original data source, in this case Superstore.
  2. Followed the instructions for creating an aggregated extract source from  Creating Lists of Values for Tableau from Text & Excel Sources. For this next step I used the Continent dimension. Note where you saved the extract.
  3. Connect to the extract (.tde or .hyper file) in Tableau.
  4. Drag a second copy of the extract onto the canvas.
  5. In the join window set up an inner join with two join calculations so the join is 1 = 1.
  6. Rename the copy of the dimension to something useful, I used xprod continent. (xprod is short for cross product).
  7. Rename the data source to something useful, I named it xprod Continents.
  8. Create a worksheet for the target using the scaffold (xprod) source as the primary with any necessary fields from the secondary source. This view requires the dimension & xprod dimension to be somewhere on the viz. To help see what’s going on I used a crosstab to start. Note that the xprod dimension is not in the compute using of the rank table calculation since there are multiple copies of the data.
  9. Connect to your original data source.
  10. Create the origin worksheet, in this case it’s a simple set of bars:
  11. Build a dashboard with the origin and target sheets.
  12. Add a filter action as a Select filter and and add a filter that for the source field uses the original dimension from the raw source and for the target field uses the xprod dimension from the scaffold source from step 6.

Here’s a demo:

How does this work?

The scaffold source has 6 copies of the data, one for each xprod Continent. The filter action targets the xprod continent so when the filter action is triggered only one xprod Continent remains, and because we’ve multiplied the data there are the 6 Continent values remaining.

This leaves us with two useful attributes for each mark – the Continent, and the xprod Continent that identifies the selected value. The xprod continent is effectively the [Selected bars Continent] or [Continent Parameter] that we originally wanted to be able to do evaluations like  [Circles Continent] = [Selected bars Continent] or [Circles Continent] = [Continent Parameter], only we had to do some extra data preparation to get there!

Identifying Selected Marks for Setting Color, Size, etc.

Now to we can do the evaluations to identify the selection status. There are three states to track:

  • No selection made at all (which is something we can’t do with a regular parameter)
  • The selected mark
  • The non-selected marks (when there is a selected mark)

When there is no selection at all then there are 6 xprod continents for each Continent so we can count those and be able to flag the selected/non-selected state. Then if a selection is made the Filter Action reduces the data to only one value of xprod continent so we can test for that to identify the selected mark vs. non selected marks.

Here’s the Selection Status (xprod) formula used in the scaffold source:

//given the scaffold source COUNT(continent) across the data will return more than
//1 when the scaffold isn't filtered
//this uses the ability of EXCLUDE LOD expressions to be evaluated as
//record level calcs before they are aggregated in the view
IF {EXCLUDE [xprod continent]: COUNT([Continent])} > 1 THEN
    -1 //no continent selected
ELSE
    // identify selected continent
    IF [Continent] = [xprod continent] THEN
        1 //selected
    ELSE
        0 //not selected
   END
END

With this flag now in place we can create additional calculations that can be placed on Color, Size, Shape, Label, Tooltip, etc. or even elsewhere in the viz.

For example here’s the Highlight Text calculation:

IF [Selection Status Flag (xprod)] = 1 THEN "I'm selected!" END

This only returns “I’m Selected” for the selected mark and Null for everything else. By putting this on Label it only appears when the mark is selected and can be used on Color as well. Note that it uses the ATTR() aggregation because the Selection Status Flag (xprod) is using an EXCLUDE LOD expression.

I created another calculated field for Size and some customization of the Size so that the nothing selected state has a mid-size neutral state, the selected mark is large, the non-selected marks are small. Here’s the completed viz:

Setting up with Select and Iterating

A couple of notes on setting these calculations up in the view – since we are using fields that have different results depending on the filter action status we will need to do an iterative process. For example when using Highlight Text on Color I needed to put the field in, trigger the filter action as a select action (so it would stay in place when I moved off the mark), then set the color for the selected mark, then verify everything was working by turning the action on and off, and then finally making the action a hover action.

Removing the Extraneous Scaffolded Marks

If we select all the marks in an unfiltered/non-selected scaffold view there are 36 marks – behind each Continent mark we can see there are the scaffolded continent marks from the xprod continent dimension. Personally I don’t like views that have extra marks kicking around for the following reasons:

  • The more marks Tableau has to draw the slower the viz.
  • Even though the marks are hidden they can cause confusion on the parts of users as they interact with the viz.
  • The extra marks will be part of any viewing of summary data or data downloads and that can be especially confusing.

So how can we filter out the extraneous marks? This is where knowledge of Tableau’s order of operations, the viz level of detail (vizLOD), and filter actions comes into play. ~~link to documentation. The vizLOD is Continent and xprod continent and when there’s no filter action there are 6 xprod continents for each Continent, whereas when the filter action is activated there is 1 xprod continent for each Continent. Now the filter action is applied as a record-level aka dimension filter in Tableau’s order of operations ~~link so we need to use a filter that comes after that which could be an aggregate filter, a table calculation filter, or (as in this case) and INCLUDE or EXCLUDE LOD expression-based filter. Here’s the formula for the Remove Extra Marks (xprod) calculation:

[xprod continent] = {EXCLUDE [Continent], [xprod continent] : MIN([xprod continent])}

This uses a variation of the technique from my earlier post on identifying a dimension at a lower level  where we’re using a Level of Detail expression to compute a result as an aggregate and then comparing it a record level. In the filtered view we know there’s only one xprod continent for each continent so that works out just fine.

~pic of selected

In the unfiltered view the EXCLUDE LOD will return the first xprod continent (probably Antarctica) and then only that one is kept while showing the 6 continents.

~pic of not selected

With this filter in place we end up with only 6 marks either way and have removed the extra marks added by the scaffolding to get a nice clean viz.

Final Notes on using a Cross Product Scaffold to use a Filter Action  as a Parameter

This is not a technique for the faint of heart, it’s using a wide range of Tableau’s functionality to get a specific set of user interactivity. So it might not be for you. In building views like this for me where I’ve worked out the details of how the calcs need to work the most challenging part is often building the scaffold source. For example if you have hundreds or thousands of values of the dimension(s) you need to scaffold then the cross product can get prohibitively large, and for that we’ve got the alternative of using a union, we’ll cover that in the next section.

Using a Union Scaffold to use a Filter Action as a Parameter

Rody pointed this out to me as an option, this method uses a union’ed scaffold source instead of a cross product and a filter action whose filter pill is set to Exclude. So the scaffold source can be a lot smaller, but the set up is a little more complicated.

Overview 

For this method instead of having N sets of values in the scaffold there are only 2 sets of values. We set up special calculated fields in the scaffold and the original data that will enable the filter action to exclude (remove) from a selected value from the scaffold so we can use that difference to detect what has been selected.

How to Build the Union Scaffold

Here’s how to build this, this is a slight variation on the instructions for an aggregated extract from Creating Lists of Values for Tableau from Text & Excel Sources:

  1. In Tableau connect to the raw data source.
  2. Union the raw data to itself.
  3. Create a worksheet that only has the necessary dimensions plus the Table Name and Sheet dimensions.
  4. Create an aggregated extract per the instructions in the link. ~pic

This ends up with a scaffold source where there are two copies of the list of values, like this: ~pic

Setting Up Interactivity

  1. In the original data source create an ExcludeOrigin field in the original data with the formula '~~' + [Continent].
  2. Create an origin worksheet with the Continent & ExcludeOrigin fields.
  3. In the scaffold source create an ExcludeTarget field with the formula:
//there's an implied ELSE Null, the Null values are the ones we will ultimately keep
IF [Table Name] = 'Data1' THEN '~~' + [Continent] END
  1. Build the target Scaffold sheet with Continent and ExcludeTarget as dimensions. Note that there are 2N marks where N is the number of Continents with 2 values of ExcludeTarget for each. ~pic
  2. Add any measure(s) you want from the original data via a data blend.
  3. Create a dashboard with the two worksheets.
  4. Add a filter action on Select from the origin worksheet to the target worksheet that goes from the ExcludeOrigin field to the ExcludeTarget field. ~pic
  5. Trigger the filter action by selecting a mark on the origin worksheet.
  6. Go to the target worksheet.
  7. Right-click on the Action (ExcludeTarget) pill on Filters and choose Edit Filter…  The Edit Filter window appears. (If you don’t see the pill on Filters  then you haven’t triggered the Filter Action).
  8. Click on Exclude, then click OK. ~pic
  9. Go back to the dashboard and click on different marks on the origin worksheet, you’ll see the target update.

To explain how this works we have to keep in mind that there are effectively two states:

  • When there are no marks selected in the origin worksheet then nothing is excluded from the target sheet and we see all N (12) marks from the scaffold.
  • When a mark is selected in the origin worksheet then the corresponding mark with the non-Null value of ExlcudeTarget is removed from the viz, leaving us with N-1 (11) marks remaining.

Identfying Selected Marks for Color, Size, etc.

Because this scaffold is built using a union the detection of mark selection status works a little differently, here’s the formula for the Mark Selection Status (union) field:

//In the union scaffold there are two states: all rows exist or one has been filtered out by the selection
//if all rows exist then there are 2x the number of continents and we can test for that
IF {EXCLUDE [Continent], [ExcludeTarget] : COUNT([Continent])} % 2 = 0 THEN
    -1 //no selection made
ELSE
    IF {EXCLUDE [ExcludeTarget] : COUNT([Continent])}  = 1 THEN
        1 //the selected value
    ELSE
        0 //the non-selected values
    END
END

Essentially since we’ve doubled the data then we can use the modulo (%)  operator to detect that doubling and identify the no selection status, then by counting continents we can find out whether are 1 or 2 records and identify the selected/non-selected marks.

From here the other calculations are all the same as for the cross product scaffold except for the Remove Extra Marks calculation. In that case the Remove Extra Marks (union) formula is:

ISNULL(ATTR([ExcludeTarget]))

Note that we could just use ATTR([Exclude Target]) and filter for Null as an alternative…this is one of those cases where I like having a separate calculated field because then by the name of the field I can give the viz maintainers a chance to understand what is going on.

Here’s a completed dashboard using the union, you’ll find the interactivity to be the same as the cross product version:

Conclusion…or…When Should I Use This?

When I’m building a dashboard and my users are wanting interactivity that is more than what Tableau immediately offers I go through a mental checklist:

  1. Is the goal something that we can pull off using highlighting, sheet swapping, filter actions, parameters, sets, etc.?
  2. Is this the only “ask” for additional interactivity or are there other cases for this dashboard where the desired user experience is pushing the boundaries of what is provided in Tableau? If so are there resources to use some JavaScript and Tableaus JS API?
  3. Only then do I start considering more complicated methods that require more data prep and configuration like the one presented here.

Here’s a link to the Filter Action as Parameter dashboard on Tableau Public. Hopefully you learned a bit about how to take advantage of Tableau’s capabilities, if you have any alternatives or questions please ask in the comments below!

Creating Lists of Values for Tableau from Text & Excel Sources

There are various use cases where we start out with a “flat” table like the Superstore sample data that has a number of columns with various dimensions and we want to make a simple list of unique values of one or more dimensions. such as a list that has just the six continents in Superstore:

The use cases for this include:

  • Using a filter action value as a parameter in the target source (look for posts from myself and Rody Zakovich on this in the next week).
  • Cross data source filters with higher performance when the list of filter values can be small compared to the volume of data.
  • Creating scaffold data sources to pad out data and ensure there are no sparse combinations of values.
  • Situations where we’d want to do a union or cross product of the data to do something like a market basket analysis but the union or cross product would be prohibitively large, so instead we only union or cross product desired dimension(s) and then join in the original data as necessary.
  • The last multi-select highlighter method from Multiple Ways to Multi-Select and Highlight in Tableau can use a self-union.

If you are starting out with a well-structured data warehouse with dimension tables, can write SQL, Python, or R, build custom views on the data source, use data preparation tools like Alteryx or Easymorph or Trifacta, etc. then obtaining or generating these kinds of lists is pretty straightforward. But not everyone has those skills or resources, and in the case of users who just have Excel and/or text files we need to get creative. This post goes through a three different methods to get these lists in Tableau:

    1. Ask!
    2. Aggregated Extract
    3. Excel Pivot Table as a Data Source
    4. Custom SQL

In this post I’ll go through each of these options. [Note: this post was updated on 10 Jan 2018 to make the aggregated extract method a little simpler.]

1. Ask!

This might seem obvious, but sometimes we’re stressed out and under deadlines and don’t realize we might be able to get help. If the data you are working with is coming from someone else then go ahead and ask them if they have a list of unique values. I’ve found that most people want the data they produce to be used and used well and if I’m coming back to them asking for something so I can do more with “their” data they are happy to accommodate me. I might phrase the request like “I want to make sure I’m using the latest list of departments, can you give me that list?”

The one caveat to getting data back from your ask is that you’ll need to go through some validation to make sure the list matches up with the “real” data, sometimes the amount of validation and cleansing isn’t worth the effort and one of these other approaches is better. However if you’re in a data-starved environment the kind of relationships you can make by asking for data can lead to more trust and ultimately more access to the data you want (and need).

2. Aggregated Extracts

For this method we’re going to connect to the data source and build an extract only we’ll be telling Tableau to aggregate the data to the desired level of detail (the field(s) we want to use) before Tableau builds the extract. The resulting extract then just has a record for each combination of field(s) that we want to use.

  1. Connect to the data source.
  2. Create a single worksheet with the field(s) you want to use as dimension pills, I usually just put them on Rows as discrete (blue) pills:
  3. Right-click on the source and choose Extract Data… The Extract Data window opens.
  4. Click on the Aggregate data for visible dimensions checkbox.
  5. Click the Hide All Unused Fields button.
  6. Click Extract. Tableau will ask where to save the extract. Choose a location and click OK.

Voila, you now have an aggregated extract source that you can use in Tableau data blends and/or join to!

Notes on Aggregated Extracts

There are a few things to keep in mind when using aggregated extracts: First of all there’s the need to refresh them to keep up with the data so if you have Tableau Server you’ll need to set up an appropriate schedule, if not then you’ll need to set up your own manual or automated workflow that gets the results you need. One possibility is using Tableau’s extract API.

Secondly if new columns are later added to the data they are automatically added to the extract. This may be ok for some use cases, there are others where this will break views that depend on that extracted data.

Finally, if you want to join on this aggregated extract you’ll need to join directly to the .tde or .hyper file.  Where this gets complicated is handling data updates. You’ll need one workbook or workflow to update the extract and then use the extract in a second workbook. Unfortunately we can’t publish the extract to Tableau Server or Online and join to that published data source (yet), otherwise that would be an easy workaround. There are a number of cases where a Tableau data blend is sufficient, we’ll be demonstrating one in the next week.

3. Excel Pivot Table as a Data Source

For Excel sources besides connecting to worksheets with raw data we can connect to worksheets that are built as a pivot table.

Here’s how using Excel 2016 for Mac:
  1. Open the source in Excel.
  2. Create a pivot table in a new worksheet.
  3. Drag the field(s) you are interested in to Rows.
  4. Rename the Row Labels header to have appropriate values if necessary.
  5. Remove the grand total.
  6. Rename the worksheet to something more meaningful than Sheet2.
  7. Save the workbook in Excel.
  8. Open up Tableau and connect to the Excel workbook.
  9. Drag the pivot table you just added onto the canvas:
Now you can use this to join to other tables and/or use in data blends.

Notes on using Excel Pivot Tables as a Data Source

Before Tableau introduced Level of Detail expressions in version 9 I used pivot tables in production views to pre-aggregate the data for some values and also to create tables I could join on to pad out the data so I could be sure to see records for every (person, office, metric) for every month. This method has one potentially major challenge around data updates, though, and that is that if we have data in worksheet A and a pivot table in worksheet B and we update the data in A (such as adding a new value that should appear in the pivot table B) that change won’t be reflected in the pivot table B until there is an explicit command in Excel to update the pivot table B and then save the workbook.

Even though we can tell Excel to do things like “Refresh data when opening file” this flag is only detected by Excel, not Tableau. Therefore to get updates to the data to be reflected in the pivot table the workflow has to include the steps to do a Data->Refresh All or open the pivot table worksheet before saving the workbook.

4. Custom SQL for Excel & Text Files

When I’m delivering Tableau training classes and we get to the point of talking about SQL & Tableau there are two common reactions: 1) yeay! and 2) [eyes glaze over]. This part is for the people in the latter category. Tableau hasn’t turned everything we might want to do into point & click, so sometimes we need to work with raw data. We do this in our everyday lives…there’s no good vegetarian restaurant in my town so when my wife & I want African ground nut stew we’ve got to make it ourselves. So I think of using Custom SQL as using the raw ingredients of the data to get a result I don’t have another way to get. However, in this case we’re going to be lazy (in a good way) and make Tableau write the SQL for us! Here’s how (these instructions don’t work for Tableau for Mac, see the Notes section below for more info):

    1. Start adding a new data source that is the Excel or text file you want to connect to.
    2. In the Open dialog select the file, then on the Open button click the drop down carat and choose “Open with Legacy Connection”.  You’ll return to the data source window.
    3. Drag the worksheet or file if necessary onto the canvas.
    4. Use the Data->Convert to Custom SQL menu option. The Convert to Custom SQL window will appear.
    5. Edit the Custom SQL to remove all the fields that you don’t need.
    6. Make sure to delete the trailing comma from the last field in the SELECT before the FROM.
    7. Add the DISTINCT keyword after the SELECT before the first field. The SQL query will now look something like this:
    8. Click Preview Results… to test. If it comes back with an error then check your syntax (see notes below for some tips) and try again. If it works by showing a View Data window with your results close the View Data window and then click OK to close the Custom SQL window You’ve now created a unique list of values using custom SQL!

The advantage of using Custom SQL compared to using an aggregated extract or pivot table is that it updates with the data and doesn’t require the more complicated workflows of the other methods.

Simple SQL SELECT Query Syntax

Here’s a really simple example for getting one field from one table:
SELECT DISTINCT [table].[field1] AS [field1]
FROM [table]
If you want multiple fields from one table the SQL query looks like this:
SELECT DISTINCT  [table].[field1] AS [field1],
   [table].[field2] AS [field2],
   [table].[field3] AS [field3]
FROM [table]

In some ways SQL is written a little backwards, and in more complicated queries backwards and forwards. To me the real “starting place” of a SQL query is the FROM part because that is telling the SQL engine where (what table, worksheet, or text file, generically called “table”) to get the data from. Then the SELECT is going to grab the set of fields that we specify. The DISTINCT keyword tells the SQL engine to only get the unique (distinct) combinations of values of those fields instead of grabbing every single record.

The field names themselves use the [table name].[field name] convention so that if there are multiple tables in a query each field referenced can be uniquely identified. The table and field names are surrounded by square brackets by default to handle situations where the table or field name might have spaces. Finally Tableau uses the AS [field name] aliasing option to ensure that the name used by Tableau is a usable name in Tableau.

SQL doesn’t care about spaces & line feeds, we could write SELECT DISTINCT [table].[field1] AS [field1] FROM [table] all one one line and it would work just fine.

SQL cares very much about the placement of square brackets & commas, if one is out of place or missing then the whole query will fail. Make sure that you have all brackets in place and make sure that the last field in the SELECT doesn’t have a comma after it.

Notes on Custom SQL for Excel & Text Files

The Legacy Connector is not available on Tableau for Mac, so we can’t use this particular method for connecting to Excel or text files on the Mac.

The Legacy Connector is actually the Microsoft JET driver that was phased out in Tableau version 8.3 for a variety of reasons, here’s a link of differences to be aware of from the Tableau legacy connector documentation. Also here’s the Tableau documentation on Connect to a Custom SQL Query. Finally I did a post awhile back on details of using the Custom SQL in the context of Microsoft Access connections which also use the MS JET driver, some of the points there are useful to keep in mind.

Hacky…or not?

If it all seems a bit hacky and contrived then I agree with you. At this time if all we have are Excel or text files and what features Tableau provides we’re in a low-resource environment and workarounds are necessary.

I regularly see projects I’m working with needing to invest more in data preparation in order to keep Tableau humming along. That investment could be in scripting languages like Python or PowerShell or R, using PowerQuery, starting the process of moving data into a database (there are free versions of many databases), and/or use more dedicated data preparation tools like Alteryx, Easymorph, or Trifacta. I like to set expectations around this early on in new projects because once they start using Tableau invariably projects run into imitations of their existing data pipeline to provide the volume and variety of data that they can now analyze in Tableau.

Conclusion

The goal for this post was to set you up with the skills you need to get a custom list of distinct values to support several different use cases and I hope this did that for you. As mentioned early on, Rody Zakovich and I have some posts in the works that use this to do some new things in Tableau!

Text Wrapping within Tableau Panes

I recently got a request for help with a wrapping challenge. Not holiday presents, instead text wrapping in Tableau. Here’s a demo view illustrating the problem:

There’s a whole bunch of ellipses (…) where there’s too much data to display. The only options Tableau gives us in this case are to:

  • Change the order of the text by sorting the Product Name dimension.
  • Manually change the size of each pane by resizing the headers.
  • Use the Fit options to dynamically Fit Width, Fit Height, or Entire View.

The manual sizing is problematic because it won’t dynamically adjust to the number of marks, and in the case of views with lots of marks like this one it takes a lot of effort to figure out what size will get all the marks, never mind that the list of values is really hard to read:

And while the Fit options are great at ensuring a view with only a few marks takes up the available space when there are many marks it ends up either not displaying values or creating overlapping values depending on the settings.

Controlling X + Y

In this view the mark layout–either no pills or only discrete (blue) pills on Rows and Columns– is generating a pane for each distinct combination of header values and then Tableau’s mark stacking algorithm is laying out the text. So at this point we’re stuck and can’t do anything about what Tableau is up to. This is where we need to keep in mind one of the master Tableau concepts: everything is a scatterplot. If Tableau won’t place the marks where we want them then we can generate our own X + Y coordinates whether by adding data or creating our own calculations. This is the approach taken by the tile maps introduced to Tableau by Brittany Fong or or as a model of the solar system that I made awhile back: . More recently Ken Flerlage did a great introduction on his Beyond Show Me series of blog posts.

Therefore “all” we need to do is figure out where to place the marks. More on that in a moment, there are two more details I want to go into:

Green Panes vs. Blue Panes: Pane Sizing

Tableau’s logic around what generates a pane vs. marks in a pane is a little complicated so I’m going to keep this focused on three key elements, here are the first two:

  1. All panes created by a given combination of pills are the same size.
  2. (corollary to #1) If we resize one header or axis then all the other panes for that header or axis will resize as well. Tableau does this because it’s easier to visually parse (read) a view that has consistent sizing of elements.

Here’s a view with COUNTD(Product Names) on Text & Color with just discrete (blue) pills on Rows and Columns:

Somehow we need to fit 509 Product Names into the pane for Q4 2015/Office Supplies. If I resize Office Supplies to be taller then both Furniture & Technology change as well:

The same goes if I’ve got a continuous axis to place X/Y coordinates on. In this view I’ve simply put MIN(0) and MIN(1) on Columns & MIN(0) Rows and we can see a set of axes:

If I resize MIN(1) on Columns then to make it wider then all of the panes for MIN(0) and MIN(1) on Columns are resized.

So we can’t really dynamically resize panes to fit the data, all we can do is fit more or less into a pane. Therefore the desired solution can’t involve resizing panes, instead we will need to be generating more or fewer panes, and that leads to the next point around panes.

Green Panes vs. Blue Panes: Number of Panes

The third key element around green panes and blue panes is this:

  1. a) Continuous pills generate an axis for every discrete pill header. b) Discrete pills generate a header for every value of the pill.

We can see a) in action in the continuous views above, with MIN(0) and MIN(1) on Columns we get two axes for each quarter/year combination. So to add more axes we’d need to add more continuous pills but we can’t dynamically add them, and the number of axes ultimately depends on the discrete pills anyway so discrete is the way to go.

We can see b) in the discrete views above, there’s a header for each quarter in each year. Where this gets a little more interesting (and more useful in our case) is when the data is sparse, as in this case where the Avery products are not sold in every customer Segment:

Avery 5 is only sold in one segment so there is only a single header for Consumer, whereas Avery 494 is sold in all three segments so there’s a header for each.

So how this comes together is that in reating X/Y coordinates for positioning the text in our desired view we’re going to use discrete headers that can give us just enough headers (and no more) for the task, here’s a pic of the desired view with those headers:

Packing the Marks: the Algorithm

I experimented with some different layouts and looked at the following factors:

  • In each pane there’s a list of 0 or more values (marks).
  • At least in English when we’re reading lists we tend to make them top to bottom and when more is needed we add another column to the right.
  • There’s a balance in readability between too many columns vs. too tall columns. When there are many columns already then adding more columns for the list makes the view harder to read; in other words, a “tall” view with fewer columns is easier to read than a “wide” view.
  • When the panes in a row or panes in a column have different numbers of marks it’s important to efficiently stack the marks: too much white space can make the view harder to read.
  • A stacking layout that is closer to a differently-sized squares is easier to read than one that ends up with differently-sized rectangles.

The algorithm I came up with is a variation on the panel chart layout I used in Waffle Charts and Unit Charts on Maps that uses table calculations. The algorithm does the following:

  • Calculates the index for each mark in a pane using INDEX() and the number of marks in a pane using SIZE(). These calculations are used in the following calculations.
  • Counts the number of mark columns needed for each pane where there’s a Max # of Mark Columns. parameter to set a “no more than” value to prevent views from getting too wide. Then a nested calculation counts the maximum number of mark columns in each column.
  • Once we have the number of mark columns then the algorithm computes the number of mark rows for each pane, and then gets the maximum number of mark rows for each row.
  • Finally the mark row position and mark column position can be computed based on the index for each mark in the pane and the available number of rows and columns.

I numbered the calculations so they can be each brought into a workout view in order with their compute using set and validated before moving to the next calc. Calcs 1 & 2 require a compute using on the dimension to be used on Text and Calcs 4 & 6 have nested compute usings, see the comments on the calcs for details.

Here’s the workout view:

One complication is that the date dimensions are on Detail with custom dates with the ATTR() aggregation on Rows. This is a method to prevent unwanted data densification.

Once the workout view is built and validated then it’s possible to duplicate the view and rearrange pills, here’s that view:

There’s still a bit of manual resizing required, in this case it’s just to have enough size in each of the panes created by the column and row position table calculations to display the text. Once that is done those headers can be hidden for the final view:

We’re not limited to a text display, for example here’s a highlight table that only took a couple more clicks:

 

Conclusion

Here’s a view to play with where you can adjust the Max # of Columns parameter and the number of states (which is a proxy for how many products are displayed). Click the image to open the text wrapping in pane view on Tableau Public:
The key concept to keep in mind is that when Tableau won’t plot marks where we want we can add to the data source to get the necessary X&Y coordinates via joins, blends, and/or writing calculations. Since Tableau was designed as a tool to support interactive visual analytics tasks like making giant text tables with the desired text wrapping can take more effort than we might like, however given Tableau’s flexibility we can get the job done.

Waffle Charts and Unit Charts on Maps

One of the people I really wanted to meet at #data17 was Sarah Battersby, if you haven’t seen her blogs on mapping you’re missing out. It turns out she wanted to meet me, too!

So Sarah does really awesome stuff (can you tell I’m a fanboy yet??) and yesterday she posted about doing spiral jittering on maps:

That inspired me to dig out an old post where I’d done a waffle chart or unit chart jittering technique on a time series:

And adapt that to a map, like this one that makes me think of bunches of grapes:

Why would we need to do this?

I’m not going to go into all the details, you can go read Sarah’s fantastic Jittering post instead for an overview. Basically there are times when we have more than one “thing” for one or more point(s) on a map and we want to show them that way. For example in the map below there are 5 IDs for the selected zip code that are all plotted on top of one another. Through using jittering we can show a mark for each ID like in the jittered map above where that one visible circle is replaced by five circles.

There is nothing new under the sun

Back in 2013 Stephen Few proposed the idea of bricks on maps as a way of encoding a value, the technique I’m using here could be used to get the same result. The key difference is that instead of using the unit or waffle chart to encode the single value we’re using the unit chart to encode multiple values. Andy Cotgreave also had a nice post on some of the issues with this kind of chart.

How do we put a waffle chart on a map?

First of all we need a data source that has latitudes & longitudes that we can do calculations on. If you don’t there are a variety of ways to get one. After that we need to get mathy. Here’s my workout view:

All of the table calculations have a compute using on ID, the idea is that the compute using is set to the dimension(s) that are making the extra marks and then the dimension(s) left for partitioning (Zip Code in this case) are the ones that are creating the initial map positions.

I’ll briefly walk through each calculation.

Index uses Tableau’s INDEX() formula and uniquely identifies each mark.

Column Count gets us a count of columns that is based on forming the smallest possible square that will contain all the marks, here’s the formula:

//this formula will try to draw a square, if you want something else then just put in a fixed aggregate number like MIN(3)
IF INT(SQRT(SIZE())) == SQRT(SIZE()) THEN
 SQRT(SIZE())
ELSE
 INT(SQRT(SIZE()))+1
END
This would be a lot simpler if Tableau supported the CEILING() function for table calculations, because then it would be CEILING(SQRT(SIZE())). Vote up https://community.tableau.com/ideas/6239 if you’d like that too.

Column Offset then computes the number of positions of lateral (latitude) offset for each mark from the center of the unit chart. I’ve done some different experiments on where to position the unit chart and for me the most natural is to have the center of the unit chart be the centroid of the point.

//this gets the column position in the row
IF [Index] % [Column Count] = 0 THEN
 [Column Count]
ELSE
 [Index] % [Column Count]
END
//subtract the position of the midline to get the offset
 - ([Column Count] + 1)/2

Row Count then gets the number of rows in the unit chart, and would again benefit from a CEILING() function for table calcs (hint hint).

IF INT(SIZE()/[Column Count]) = SIZE()/[Column Count] THEN
 SIZE()/[Column Count]
ELSE
 INT(SIZE()/[Column Count]) + 1
END

Row Offset computes the number of positions of vertical (longitude) offset for each mark from the center of the unit chart:

//row position in column
IF INT([Index]/[Column Count]) = [Index]/[Column Count] THEN
 [Index]/[Column Count]
ELSE
 INT([Index]/[Column Count]) + 1
END
//subtract the midline of the row to get the offset
- ([Row Count] + 1)/2

Jitter – Waffle – Latitude then gets the latitude for the point and subtracts the row offset multiplied by a waffle jitter parameter that lets us deal with the zoom level to offset by just the right amount of latitude. This field has the Latitude geographic role assigned.

TOTAL(AVG([Latitude])) - [Row Offset] * [waffle jitter]

Jitter – Waffle – Longitude does the same for the longitude.

TOTAL(AVG([Longitude])) + [Column Offset] * [waffle jitter]

Note that I’ve only tested this so far with US latitude & longitudes, these calcs might need to be slightly different if you are east of 0 longitude or south of the equator.

Building the View

With all the calcs created in the workout sheet and verified, then we can duplicate that sheet and start dragging & dropping pills to build a view. In this case I’ve created a dual axis on to show that the centers of the unit chart are on the centroids of the zip codes (the black crosses) and at this zoom level a waffle jitter value of 0.02 works well:

For this view I’ve made the waffle jitter slightly smaller to 0.015 degrees and added white borders to the marks:

Why Stop Here?

We don’t have to use just circle marks, in this view I’ve used shape marks and different colors for each shape just to show how much information we could place on a single view.

And in Tableau v10.5 we could even do something like only show the unit chart on a map if there weren’t too many marks nearby (by using a binning or hex bin function in an LOD expression) and use the new Viz in Tooltip feature to get the detail.

Conclusion

Thanks to Alan Eldridge for the original jittering post and Sarah for adding more jittering options and inspiring this post! Here’s the unit or waffle chart jitter on maps workbook on Tableau Public.

 

 

 

 

A Dating Lesson: Using Math to Get Dates (by Converting Julian Dates)

I recently had to convert some date fields formatted as Julian dates into more usable Tableau date data types (it’s not a date format that Tableau natively understands) and my web search got me a little frustrated because all the solutions I found were long formulas using string manipulation functions like LEFT(), RIGHT(), etc. That kind of solution can get the job done, but there’s something to be left desired:

String manipulation is sloooooooooooooooooooooooooooooowwww.

How slow? How about one thousand times slower (or more) than using math functions! I like my calculations to be efficient from the beginning: The practical reason is that I don’t have to spend time later going back to do performance tuning. Computers were built to do math really really fast so whenever possible I’ll use math to do my date conversions. The aesthetic reason is that there’s an elegance to creating equations that use the least number of steps, and the play reason is because math is fun!

Therefore in this post I’ll describe a faster, more elegant, and more fun way (IMO) of converting Julian dates into Tableau dates. Note that in some cases you might be able to do the conversion to a date data type inside your database, that could be even faster than the formulas I describe here.

What is a Julian date, anyways?

There are two formats of Julian dates that I’ve run into:

  • The JDE (for JD Edwards) or CYYDDD format where the DDD is the day of the year (a number from 1 to 366), YY is the two digit year, and C is the century where 0 is 1900, 1 is 2000, and so on. For example 5001 is 1905-01-01 and 117365 is 2017-12-31.
  • The “modern Julian date” or YYYYDDD format where DDD is again the day of year and YYYY is the four digit year, so 2017365 would be 2017-12-31.

Now the Julian dates might be stored in raw data as strings or as numbers, either way we’re going to work with them as numbers. So if you’re starting out with Julian dates as strings then the first step is to right-click on those fields and change the data type to Number (Whole).

Learning Three Math & Date Techniques

We can convert Julian JDE/CYYDDD dates in a single formula:

DATE(DATEADD('day',[Julian JDE/CYYDDD] % 1000 - 1,
    DATEADD('year',INT([Julian JDE/CYYDDD]/1000),#1900-01-01#)))

There are three techniques at work in this formula:

  1. Extract the right-most N digits from a number using the modulo operator, for example to get the DDD value.
  2. Remove the right-most N digits from a number to extract the left-most M digits from a number using the INT() function, for example to get the CYY value.
  3. Using DATEADD() to add intervals to a starting date and nest for each date_part as necessary.

Extracting the right-most N digits from a number

The slow way to get the right-most N digits from a number would be something like INT(RIGHT(STR([number]),N)). We can completely avoid the string conversion and need for RIGHT() by using some math.

The modulo operator % is a math function that returns the remainder for a whole number. So 5 % 2 returns 1, 17 % 10 returns 7, and so on. We can use this to return the right-most N digits from a number without needing to do any string manipulation by using [number] % 10^N.

In our case [Julian JDE/CYYDDD] % 1000 will return the last 3 digits of the field corresponding to the DDD aka day of year.

A couple of other uses of the modulo function are jittering marks in a dot plot and creating groups/bins based on the right-most digits when numbers are categorical variables.

Remove the right-most N digits from a number to extract the left-most M digits from the number

The starting way to get the left-most M digits from a number would be something like INT(LEFT(STR([number]),M)). However this gets more complicated with the JDE/CYYDDD format because depending on the date then we might want the 2 left most digits (for years in the 1900s) or the 3 left most digits (for years in the 2000s) which would lead to even more string manipulation to get the desired result, for example here’s a formula I found online for getting the year: IF LEN([Julian JDE/CYYDDD]) = 6 AND LEFT([Julian JDE/CYYDDD], 1) = '1' THEN '20' + MID([Julian JDE/CYYDDD], 2, 2) ELSEIF LEN([Julian JDE/CYYDDD]) = 5 THEN '19' + MID([Julian JDE/CYYDDD], 1, 2) END. All those LEN(), LEFT(), and MID() calls will be slooooww.

Thinking about this from a math standpoint what we want to do is with the JDE/CYYDDD format is to remove the DDD (the right-most N digits) and extract what’s left. The way we can do that is in two steps: 1) turn the DDD portion into a decimal so CYYDDD becomes CYY.DDD, then 2) truncate the DDD decimal portion so we’re just left with the CYY.

We can do that with the formula INT([number] / 10^N). The [number] / 10^N divides the number by the number of digits we want to remove, so 5001 becomes 5.001, 117365 becomes 117.365, and so on. Then the INT() around that truncates the decimal places and leaves us with the remaining whole number.

In our case INT([Julian JDE/CYYDDD] / 1000) removes the right-most three digits so what we’re left with is the century and number of years.

I regularly use this technique to build my own bins, for example INT([Sales] / 100) * 100 creates bins of 0, 100, 200, and so on.

Using DATEADD() to add a number of intervals to a starting date

Tableau’s DATEADD() function takes three arguments:

  • date_part: a value like ‘day’, ‘month’, ‘hour’, etc.
  • interval: a whole number of intervals
  • date: a starting date

The advantage of using DATEADD() over something like DATE([lots of string manipulation to build a string of YYYY-MM-DD or MM/DD/YYYY or DD/MM/YYYY format]) is again that we’re avoiding all that string manipulation and just doing date math, which is really math.

With the first two techniques we’ve converted the JDE/CYYDDD format to into the intervals for the day date_part and the intervals for the year date_part, so now we can use DATEADD() once for each date_part in a nested fashion. All we need is a starting date, and in this case we can take advantage of the CYY structure to figure out the appropriate starting date. This is the number of years since 1900, so to generate the year we can use:

DATEADD('year',INT([Julian JDE/CYYDDD]/1000),#1900-01-01#)

Then to add the DDD day of year we can use DATEADD() with a starting date of the year date we just made. We do need to subtract 1 from the DDD day of year because the first day of the year is 1, not 0.

DATEADD('day',[Julian JDE/CYYDDD] % 1000 - 1,
    DATEADD('year',INT([Julian JDE/CYYDDD]/1000),#1900-01-01#))

Julian JDE/CYYDDD Conversion

Tableau’s DATEADD() always returns a datetime data type, so using the above formula and wrapping the calculation in a final DATE() gives us the desired date data type:

DATE(DATEADD('day',[Julian JDE/CYYDDD] % 1000 - 1,
    DATEADD('year',INT([Julian JDE/CYYDDD]/1000),#1900-01-01#)))

Here’s a Tableau view showing the DDD and CYY calculations along with the final Converted JDE/CYYYDDD calculation:

 

Julian YYYYDDD Conversion

The YYYYDDD conversion uses a quite similar formula, with one change:

DATE(DATEADD('day',[Julian YYYYDDD] % 1000 - 1, 
    DATEADD('year',INT([Julian YYYYDDD]/1000)-1900,#1900-01-01#)))

The difference is that the YYYY portion of the calculation is returning the full four digit year so we subtract 1900 from that to get a number of years to add to the 1900-01-01 start date.

Here’s a Tableau view showing the DDD and YYYY calculations along with the final Converted YYYYDDD calculation:

Conclusion

Math is fun. Math is fast. Math is your friend. Use math to get dates. (Date data types, that is).

Here’s a link to the converting julian dates to dates workbook on Tableau Public.

 

A little plug: If you like this post (and can tolerate the occasional bad pun) and want help in finding dates (in your data) please check out my company, DataBlick. We offer Tableau and Alteryx support, consulting, and training and can help you get the most out of your data!

Get Some Pointers on How to Use Tableau (Or Not) Thursday, April 27 in SF

This Thursday, April 27th I’ll be speaking at the San Francisco/Bay Area Tableau User Group at Mapbox’s brand new headquarters on How (Not) to Use Tableau. The talk is based on my experiences helping users and organizations get the most out of Tableau and has something for everyone – tips for brand-new users, dashboard developers, managers, trainers, Center of Excellence/BICC staff, and more! You’ll even see what this has to do with dataviz:

And maybe even see some of my childhood Lego collection as part of a lesson on data structure..here’s a registration link, I hope to see you there! https://www.eventbrite.com/e/how-not-to-use-tableau-lessons-from-zen-master-jonathan-drummey-tickets-33972395349