Tag Archives: aggregation

LOD Expression Remix – Finding a Dimension at a Lower Level

Last week Mark Jackson had a great post on using Tableau v9 Level of Detail expressions to find a dimension at a lower level (with an update here). In his Superstore example where there are multiple Categories in each State, the goal is to show a view of each State with the largest Category in that State based on the number of customers, like this:

In this remix post I’ll demonstrate an alternative solution that doesn’t require any string manipulation, along with going through my current process for building & verifying LOD expressions.

[Post edited 20140527 to include links to Mark’s update to his original post.]

Continue reading →

How to Have Sets with Your Secondary (Data Sources)

9 Replies

Tableau 9.2 adds the ability to use boolean dimensions directly from secondary sources in the view and on the Filters Shelf so it simplifies this setup somewhat. Read How to Have Sets with your Secondary (9.2 Style) with Aggregated Booleans for details.

Tableau Sets can turn incredibly complicated interactions into a few mouse clicks that can reveal patterns in your data. However, if you’ve ever created a Set in a secondary data source and tried to use it across a blend in a primary, this is what you see:

Lots of grayed out items that can’t be touched. It’s like you’re at a party and have just met the hot-babe-of-appropriate-gender-of-your-dreams and they are totally into you, but only at the party, and not back at your place.

However, like many things in Tableau, with a little creativity you can get what you need, and there is a way where you can have your Sets anywhere you want.

If you’re not familiar with sets, then I suggest you start with Hot. Dirty. Sets. That was the title of a great session on sets by Russell Christopher and Michael Kravec at the Tableau Customer Conference in DC last month, the session video is up at http://tcc13.tableauconference.com/sessions. In their presentation Russell and Michael plumbed the depths of bad puns and went through a number of use cases. If you’re not familiar with Tableau version 8 sets, definitely watch the video. If you didn’t got to TCC, I suggest you watch this (vanilla) Tableau On-Demand training video on sets.

For this post I’ll attempt to avoid any more jokes about sets and show you how you can ~~get jiggy with~~ use sets from blended secondary data sources, and probably learn a bit more about data blending on the way.

In this tutorial I’ll be using the Superstore Sales and Coffee Chain data that ship with Tableau, with the Superstore Sales as the primary, Coffee Chain as the secondary. The Coffee Chain data is an Access database, and Tableau uses Microsoft JET for the driver for Access, Excel, and text files (this will change somewhat in 8.2). Microsoft JET has more limited functionality than other database connectors, so the first step is to extract that data into Tableau’s data engine. That will enable us to use the IN/OUT of sets.

Then in the Coffee Chain database we create a set for the Top N States by Sum of Sales:

And here’s a view of that Coffee Chain data, where there are only 20 states:

But when we set up a view based on Superstore Sales and switch over to the Coffee Chain data, any and all sets are ~~taboo~~ greyed out:

However, we can use sets in calculated fields. The In/Out of a Set returns a boolean value, and here’s a simple formula used for the In Top N CC States calculated field:

IIF([Top N CC States],"In","Out")

This is treated as a dimension in the CoffeeChain data, which can lead to all sorts of applications, including using as a filter, color, or in additional calculated fields.

Note that in production, I’d probably use 1 and 0 as values instead of “In” and “Out” and then use aliases (because integers are faster than strings), and to solve a filter issue, however for this demo I’ll stick with In and Out to make it easier to understand and explain the filter issue further down.

Set from Secondary as Filter

Here’s a starting view from Superstore Sales as our primary data source:

Before putting the In Top N CC States on the Filter Shelf, though, in the Coffee Chain data click on State as a linking field:

Important note: whatever dimension(s) you use for the Set must be the blending dimension(s), no more and no less, if they are not then your results will be mighty strange. I’m still trying to work out what’s happening there. If you really need some calculation that blends on a different set of dimensions, then use a duplicated data connection for that blend.

Now you can drag the In Top N CC States calculated dimension from the Coffee Chain data onto the Filters Shelf, and here’s the filter setting:

And the view:

The worksheet is now only returning the 5 states from the primary and secondary data sources that meet the filter criteria from the set from the secondary data source. You’ve now used a Set from a secondary source!

Set from Secondary on Color

How about using the set calculation in the view? If we change the In Top CC States to All values, and put a copy of the pill on the Color Shelf then we see three different values: In, Out, and Null.

Why are there Nulls? This is one aspect of data blending that can be confusing. In the Superstore Sales data, there are 49 US states (48 contiguous states plus the District of Columbia). In the Coffee Chain data, there are 20 states. 5 of those are In the Top N CC States set (blue above), 15 of those are Out (light orange). That leaves 29 states in the Superstore Sales data have no corresponding rows in Coffee Chain. For the In Top Top N CC States set calculation, rather than assigning those 29 states to In or Out, Tableau assigns a value of Null to those states because they don’t have corresponding values, and in this case they get a dark orange color. This is the same behavior Tableau has for any linked dimension value from the secondary that doesn’t have corresponding values in the primary.

How can we help those Nulls ~~come Out of the closet~~ become part of the Out of the set?

This assignment of Null values is done in the primary data source, as there’s so way we can change it in the secondary (without doing some sort of padding in the secondary for all of those states). However, there is a way we can do that in the primary data source. Tableau does all the computation it can within each data source, then blends the data sources together, at which point things like calculations that refer to another data source are computed. We can create a calculation in the primary that refers to the set in the secondary, and test whether the set calculation is returning Null. Here’s the formula from Superstore Sales for the Primary In Top N CC States calculated field:

IFNULL(MIN(IIF([Sample - Coffee Chain (Access)].[Top N CC States],"In","Out")),"Out")

The inner IIF() is a row-level calculation that evaluates the In/Out of the Set, then that gets wrapped in MIN() because we’re working across the blend – Tableau requires us to aggregate all measures and dimensions used in calculated fields from other data sources. The IFNULL() then tests the result of the MIN(), and if there’s a Null from one of those 29 states that is in the primary but not the secondary, the IFNULL() Outs that one as well. Here’s a view with the calculation, where now everything is In or Out:

But you don’t have to ~~be monogamous~~ limit yourself to just one set. You can use combined sets from the secondary, for example in this view with CoffeeChain as the Primary I’m showing the top and bottom 5 states from the CoffeeChain data:

Now to talk about complications and workarounds before getting into some advanced cases:

Discrete vs. Continuous Measures

It’s important to note that the Primary In Top N CC States set calculation is a measure, because it’s using an aggregate from the secondary data source.

One effect is that Tableau won’t let us filter on discrete (blue) measures like the Primary In Top N CC States:

My understanding is that this has to do with Tableau needing to know the domain (range of values) of measures to build the filter. (Here’s a Tableau Idea to support this, vote it up!) However, we can filter on continuous measures, so if you want to use that primary filter you can change the calc to return numbers instead of strings, like this calc called Primary In Top N CC States Continuous:

IFNULL(MIN(IIF([Sample - Coffee Chain (Access)].[Top N CC States],1,0)),0)

And this will work fine as a filter:

Alternatively, you could use a table calc filter such as LOOKUP([Primary In Top N CC States],0) since discrete measures based on table calculations can be used as filters. I prefer the regular aggregate for performance reasons: Table calculation filters are applied after the data has been returned to Tableau, so that can lead to a lot of unnecessary traffic across the wire.

Using In/Out of Secondary Set in Primary Crosstab

In a simple view in Coffee Chain and our Top N set, we can quickly see the Sum of Sales for the In/Out of the set:

We can even set that up from the Superstore as Primary, using the In Top N CC States set calculation from the Coffee Chain secondary, all we need to do is make sure that State is turned on as a blending field and do a little extra filtering to get rid of those pesky Null values:

Advanced Uses for Secondary Sets

You can mix and match Sets from Primary and Secondary sources, here are three examples:

Cohort Analysis

In this view, we’re looking at trends for profits broken down by the performance reviews of our sales people, looking to see if there are any trends. The sales data comes from Superstore Sales, the performance reviews are coming from a secondary data source. The panes are created by the In/Out of the Top 40 Customers by Profit. The top pane shows the sales for the Top 40 customers, the bottom pane everyone else. The lines are colored by the In/Out of the Top 5 Salespeople:

Combining Sets Across Data Sources

To create a combined set, we can use a calculated field that evaluates the sets from the different data sources, like this one that gets the intersection of the Top N States for Sales from both Superstore Sales and Coffee Chain, assuming the Superstore Sales is primary:

And here it is used in a view, with the Top 5 States from each, only 3 states overlap:

The Three Way

Why stop at only two sets from different data sources? In this view using Coffee Chain as primary, we’re coloring the states based on the In/Out of the Top N States by Coffee Chain Sales, the Top N States with highest # of Starbucks per capita (data from Statemaster.com), and Top N States by Superstore Sales. Three data sources, three sets:

And, of course, you can build calculated fields as well, here’s one in Superstore:

Conclusion

Here’s a link for the Tableau Public workbook for Sets with your Secondary.

With a little creativity and a whole lot of jokes at the maturity level of the average American teenage boy, we can get more out of data blending and sets. If you’d like Tableau to have more support for secondary sets out of the box (and generally treat secondary data sources more like primary data sources), please vote for http://community.tableausoftware.com/ideas/2773.

Now, go forth and use sets ~~in strange places~~ ~~any position you want~~ in new ways!

Table Calculations

LOOKUP() vs. PREVIOUS_VALUE

See here: http://community.tableausoftware.com/message/173752

Also this from email from Joe Mako:
Have you attempted the formula that you recommended at
http://community.tableausoftware.com/message/173588#173588 ?

Instead of ATTR([Status]) != PREVIOUS_VALUE(ATTR([Status]))

try ATTR([Status]) != LOOKUP(ATTR([Status]),-1)

The PREVIOUS_VALUE() function is self referential, kind of like a recursive function or a loop. The argument you pass the function is just the starting value for the partition, while the LOOKUP() function gets the value based on an offset, and an offset of -1 means previous.

In your statement, the first mark will result in False, effectively [Status] != [Status], and every other record will also result in True because it is likely [Status] != “False” or [Status] != “True”

The formula ATTR([Status]) != LOOKUP(ATTR([Status]),-1) is basically saying:
“Compare the Status of the current mark and the Status of the previous mark in the partition.”

I recommend you try some trial and error with the PREVIOUS_VALUE() function

Try some formulas like:

PREVIOUS_VALUE(0)+1
PREVIOUS_VALUE(100)-1
PREVIOUS_VALUE(0)+SUM([Sales])
PREVIOUS_VALUE("")+ATTR([Product])+", "

Addressing and Partitioning

The addressing fields define what part of the table you are computing along. The partitioning fields define how to group the calculation. In the example of a running sum of product sales across several years, the addressing field is the Date field while the parititioning field is the product field. When youd define the addressing for a table calculation, all the other fields are used for partitioning.
You can specify the addressing in the Table Calculation dialog box. The addressing can be relative to the table structure or a specific field. Each addressing option is described below.
Table (Across)
This option sets the adressing to compute along the entire table moving horizontally through each partition.

From Table Calcs primer PDF:
• Partitioning: the scope or grouping of the calculation. This can be the full table, a pane, a cell, a dimension or it can be customized even further for more advanced calculations.
• Addressing: the anchor or the source of each partition. It defines the root of the calculation.
Consider, for example, looking at a running sum of sales by product across several years. In our example, the running sum is partitioned by product: every product’s sale is summed over time, so the result of the calculation is a running sum of coffee sales, tea sales, etc. The addressing field is the date field. With every new date, the sales of that data are added to the sum. When you define the addressing for a table calculation, all the other fields are used for partitioning.
Table (Across)
Table (Across) is a Calculate Along option. The chart below uses Table (Across) to set the addressing to compute along the entire table, moving horizontally through each partition.
When calculation addressing is set to Table( Across), the fields that span horizontally across the table are the addressing fields (Category and Region).

From: http://www.tableausoftware.com/support/forum/topic/version-6-table-calcs-understanding-few-basics#comment-27646
The fields specified in the ‘Compute Using’ menu or ‘Edit Table Calculation…’ dialog are termed the addressing fields. All of the dimensions on a sheet that are not addressing fields are partitioning fields. This corresponds to the ‘computed along’ (addressing) and ‘for each’ (partitioning) components of the description in the table calculation dialog. One other nice thing is that in the final v6, the tooltip for a table calculation will actually contain the description also, so you don’t have to open the dialog to understand what it is doing.

From http://www.tableausoftware.com/support/forum/topic/version-6-table-calcs-understanding-few-basics#comment-27736
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning. Similarly ‘At the level’ moves fields after it in the list to be partitioning, though there is a subtle difference in how that partitioning is done (its not done on value, but rather on postition within the partition, i’ll give more details on that in the tutorial).

One might ask why you would put something in the ordering list in Advanced… only to partition on it by setting ‘Restarting every’. The answer lies in the sorting behavior. For example, if you want to know the top products for each market, you can’t simply sort products by sales (which sorts based on sales for all markets). Instead, you sort Market,Product by sales. Then, when you partition on Market, the products within each of those Market partitions are still sorted by the sales within that market. Play around with it to see the difference.

My notes:
Addressing is the compute along/over/by.
Partitioning is the “grouping” or “scope” of the calculation – to some degree defined by the “context of the calculation, since All dimensions not part of addressing are used for partitioning. Also includes dimensions (and discrete pills) on Level of Detail shelf (and Color shelf too?)
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning.
Example of running sum of sales by product across several years:
product is partition, each new product creates a new returned value of sum of sales
date is address, sum of sales for each new date is added to the returned value for that partition
Where I’ve been confused (I think) is that there are three things in table calcs, not two: partitioning, addressing, and what is being calculated

More notes from convo with Joe on 3/2012

I don’t want to misrepresent Joe’s awesome knowledge here, if there are any mistakes in these notes from our conversation they are almost certainly mine.

Table (Across), etc. do a visual sort base on the layout of the view. The results from Table (Across), etc. can sometimes be the same as what you can get by using Advanced Compute Using, they aren’t always. For example, you can have a complex sort set up in the view and Table (Across), etc. will work just fine, but you can’t duplicate that sort using Advanced Compute Using because Advanced Compute Using creates a set of the fields in the Compute Using and then sorts that set along whatever is set in the Order Along.

This can be confusing because the English language description Tableau gives when you hover over the pill in the view can be the same for a Table (Across) and an Advanced Compute Using(), but the results will be different.

~~~Example using Superstore Sales:
Put Container on Rows, Category on Color, and Sales on Columns. Create an INDEX() function and see what it does try to duplicate that using an Advanced Compute Using

Show Me can be used to “freeze” a setup, typically by Table (Across) or Table (Down). Can be used to get to a view that you can’t get w/Advanced…Compute Using. “Table (Down) is magic dust with Show Me.” ~~~Need an example here, probably one with a funky INDEX().

Roll Your Own Index

IF ATTR([Field]) = LOOKUP(ATTR([Field],-1) THEN
 PREVIOUS_VALUE(0)
ELSE
 PREVIOUS_VALUE(0)+1
END

This creates a rank that shows ties, or can be used to ignore something in a table calc.

~~~Example:
Rows: Roll Your Own Index (set Compute Using to Container), Container (sorted on Sum(Sales) Asc) Color Shelf: Category

This allows the index to be computed on container but ignore category for sort and partitioning. This would be a nice feature.

Padding

Using dimensions on both Rows and Columns shelves causes Tableau to pad, while dimensions only on Rows doesn’t cause Tableau to pad.

~~~Put Container on Rows, Category on Cols, see what INDEX() does Put Container on Rows, Category on Rows, see what INDEX() does – will see different marks used now Compute Using set to Across then Down vs. Down.

next example: Put Customer, Order Date on Rows. Index() on Text Index on Customer – ranks them Index on Order Date – causes padding, Tableau takes a really long time to return data. If you use MONTH(Order Date) it doesn’t take as long.

the key issues w/padding are:
Aggregate vs. dimension
continuous vs. discrete
datepart vs. datetrunc

Some details w/display can get lost when using Primary/Secondary sources. example: duplicate Superstore sales, when Customer from both is in the Customer from secondary does not show ATTR(), i.e. doesn’t show that it’s an aggregate. (thought here is to outline the pills)

How Joe makes table calcs work

1. put every dimension on Rows shelf
2. Measure Names as Columns, Measure Values as text 3. Do addressing on the right-most pill

This way, can turn on totals and subtotals to check what they return vis-a-vis the table calcs. Also can see the partitioning and padding. Using a full-blown cross tab makes it more confusing. Also allows filters using table calcs to be tested since all dimensions are available on the Rows shelf.

Filtering Notes

Table calc filters can only see what’s on Columns and Rows shelves, and/or see their own instance on the Level of Detail, so if needed fields for the calc aren’t on Columns or Rows you can put an instance of the table calc on the Level of detail shelf.

Another advantage to using Joe’s method for making table calcs work is that table calc filters can be tested as well.

Aggregated/continuous pills can go on Filter shelf Aggregated/discrete pills can’t go on Filter shelf Table calc/continuous or discrete pills can go on Filter shelf

Example of this
Create SUM(1) calculated field, call it “Test Agg”. As Continuous, it can go on Filter shelf. As Discrete, can’t go on filter shelf. Create another calculated field LOOKUP([Test Agg],0). As Continuous or Discrete, it can go on the Filter shelf.

(end of Joe notes)

More on At the Level and Restarting Every (Alex Kerin question leading to Ross Bunker post)

http://community.tableausoftware.com/message/178239#178239

Table calc – result only needed for one row in partition:

From http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html

IF (FIRST()==0) THEN
   WINDOW_SUM(SUM([Population]), 0, IIF(FIRST()==0, LAST(), 0)) 
END

Don’t try this with TOTAL(), it can mess things up

Table calc – result needed for all rows in partition (same answer for each case)

From http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html

PREVIOUS_VALUE(WINDOW_SUM(SUM([foo]), 0, IIF(FIRST()==0, LAST(), 0)))

Probably shouldn’t try this with TOTAL() either
TOTAL() hits DB while WINDOW_SUM() does not, would probably affect order of filters? TOTAL() causes any enclosed aggregate functions to be evaluated over each partition – but I’m not really sure why “Table Across” behaves differently to
WINDOW_XXX() functions.

Uncheck Ignore in Table Calculations so discrete field can be used in a table calc:
http://community.tableausoftware.com/message/173434#173434

Idea for Table Calcs:
Duplicate a dimension and put the copy on the LoD shelf so calcs can be along that, but summed overall.
http://community.tableausoftware.com/message/174254#174254

Index (roll your own)
From: http://www.tableausoftware.com/support/forum/topic/ignoring-order-along-advanced-table-calc#comment-44787
Based on: http://www.tableausoftware.com/support/forum/topic/top-x-shows-more-x-values-when-using-color-shelf

IF LOOKUP(MIN([Product]),-1)=MIN([Product]) THEN 
   PREVIOUS_VALUE(0)
ELSE
   PREVIOUS_VALUE(0)+1
END

Discussion of this and partitioning (Ross Bunker)
http://community.tableausoftware.com/message/183504#183504

Using PREVIOUS_VALUE to combine multiple rows into one
Example here is to make a list of names like Jane, John, Joe, etc. from multiple rows.
http://community.tableausoftware.com/thread/118259

TODAY() and NOW() generate Internal Expression Error inside table calc
As of Tableau v7, the following formula generates an “Internal Expression Error: Function ‘TODAY’ is not defined in the current context” error.
WINDOW_SUM(IF ATTR([Order Date]) >= TODAY()-29 THEN WINDOW_SUM(SUM([Sales])) END)
The same happens when NOW() is used instead of TODAY(). The solution is to make a calculated field with NOW() or TODAY(), and then use the calculated field in the function.

Given an index to another row, a formula to get an offset to that row from the current row, to use in a LOOKUP() statement:

FIRST()+[the index] -1

Improving table calc partitioning & addressing
http://community.tableausoftware.com/ideas/1189

[loop category=”wikicontent” tag=”tc,tablecalcs,table-calculations”]
[field title] – Added [field date]
[content]
[/loop]

[field title-link][field thumbnail-link]

[/loop]

Preparing Data

Key bits on migrating from Excel

http://kb.tableausoftware.com/articles/knowledgebase/preparing-excel-files-analysis
http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel

anonymizing data for Tableau

http://community.tableausoftware.com/thread/118843

DATE() and STR() functions behave differently depending on the data source (Excel/text/Access vs. extract)

http://community.tableausoftware.com/message/179624

DATEPARSE() for converting different date formats

DATEPARSE-ing Tips for Successful Speed Dating with Tableau 8.1

Joining Vs. Blending

http://www.tableausoftware.com/support/knowledge-base/join-vs-relationship-60
http://www.tableausoftware.com/support/knowledge-base/summary-and-detail-60

When one data source has subcategory & measures, and another has category & subcategory, can use Primary Groups to effectively create the category within the first data source. KB has article on this, search for “creating primary groups from a secondary source”.

Data blending and getting aggregations right:

Workbook with three options (extract, control data relationships for blend, custom SQL) by Jeff Mills:
http://community.tableausoftware.com/message/183888#183888
http://community.tableausoftware.com/thread/118319

Creating a Tableau Data Source .tds for connecting to just the database and not the table by Craig Bloodworth:
http://www.theinformationlab.co.uk/2012/11/26/connect-to-a-database-without-specifying-a-table-video/

Outer joining multiple Excel tables
https://www.interworks.com/blogs/tmccullough/2013/06/26/outer-joining-multiple-excel-tables-tableau-8

Custom datasource config files:
http://community.tableausoftware.com/message/227764

[loop category=”wikicontent” tag=”preparing,reshaping”]
[field title] – Added [field date]
[content]
[/loop]

[field title-link][field thumbnail-link]

[/loop]

Partitioning by Table Calculations

9 Replies

Here’s a issue on the Tableau forums that shows up at least once per month or so: someone has a data set that they want to compute a measure over, and the measure requires table calculations. Then they want to use the results of those calculations as a discrete value for partitioning other calculations. For example, given a set of student grades, compute each student’s percentile, then show the average of each percentile in the same view. Or a similar idea, only the percentile is based on sum of sales, and you want to show the top 10% and bottom 10% of sales. Continue reading →

Drawing with Numbers

Thoughts on data visualization and Tableau

Tag Archives: aggregation

LOD Expression Remix – Finding a Dimension at a Lower Level

How to Have Sets with Your Secondary (Data Sources)

Set from Secondary as Filter

Set from Secondary on Color

Discrete vs. Continuous Measures

Using In/Out of Secondary Set in Primary Crosstab

Advanced Uses for Secondary Sets

Cohort Analysis

Combining Sets Across Data Sources

The Three Way

Conclusion

Table Calculations

LOOKUP() vs. PREVIOUS_VALUE

Ignore in Table Calculations

Lots of Table Calc Links

Order of calculations:

Addressing and Partitioning

More notes from convo with Joe on 3/2012

Roll Your Own Index

Padding

How Joe makes table calcs work

Filtering Notes

More on At the Level and Restarting Every (Alex Kerin question leading to Ross Bunker post)

Table calc – result only needed for one row in partition:

Table calc – result needed for all rows in partition (same answer for each case)

Given an index to another row, a formula to get an offset to that row from the current row, to use in a LOOKUP() statement:

Related posts:

Preparing Data

Key bits on migrating from Excel

anonymizing data for Tableau

DATE() and STR() functions behave differently depending on the data source (Excel/text/Access vs. extract)

Joining Vs. Blending

Data blending and getting aggregations right:

Related posts:

Partitioning by Table Calculations