Over on the Tableau forums Alexander Mou answered a thread on generating a count from sparse data, and the solution he came up with is found in his blog post Dynamic Histogram Over Time. In this post I’m diving into some details of what Alexander did, coming up with a couple of alternative remixes of that solution, and describing a couple of different ways to effectively partition a table calculation via another table calculation. Read on for details!
Tag Archives: partition
From http://www.tableausoftware.com/support/forum/topic/trellis-plots comments by Joe Mako:
1. First step in table calcs is what kind of pills you have:
a. aggregated continuous
b. aggregated discrete
c. dimension continuous
d. dimension discrete
2. What shelf these pill types are on
– each of these types has a different effect on the evaluation of a table calc depending on their arrangement.
– when you have a dimension on both the rows and columns shelf, Tableau will padd for missing dinension combinations, unless you are using a set based on these dimensions
– building your view with all dimensions on the rows shelf is a good place to start, and once results are as expected, then move pills one at a time to other shelf, and see how shelf placement effects evaluation.
3. How you set your compute using
– if you use the compute using selection from the context menu, or the drop-down in the dialog, and set it to a dimension, that dimension will be used for addressing, and all others will be used for partitioning. I call this Regular Compute Using
– when setting the compute using to something like Table/Pane/Cell, I call that Layout Compute Using, and try to avoid this because I find it to be a delicate setting (move a pill, move a dimension pill, and your evaluation will likely be effected).
– with Regular Compute Using, if all addressing values do not exist in all partitioning values, the INDEX() value for each addressing value will not be the same across partition values. This is apparent with not all combinations of dimension values exist. (In this situation, there are states that have no sales records for some months, so whit your compute using set to State, and partitioning on Month, each State will not have the same INDEX() value in each month.)
– when using a Regular Compute Using and a crosstab of the dimensions in use (one on the Rows, and the other on the Columns), and no Set based on those dimensions, Tableau will pad the data making each State have the same INDEX() value for each Month.
– So without the crosstab setup, but with the desire for Tableau to pad the data to cause the INDEX() function to evaluate as desired, you can have Tableau pad the data as desired with an Advanced Compute Using.
– When you use multiple dimensions on the Compute Using (right-side list box) for addressing, I think of it as like a pseudo-crosstab of all potential combinations of the dimension values (this is what we want in this situation). If you use a Set instead of multiple dimensions, that data will not be padded.
It is my current belief that if you are aware of what pills you have in use, what shelf they are on, how your compute using is setup, and how all these potential combinations of setups effect the evaluation of table calcs, then table calcs are straightforward.
Without doubt, I have not included all information needed to understand table calcs here, but only some key concepts that play a factor in the attached “(fixed)” worksheets.
(end of post)
Another couple of posts by Joe:
Joe’s notes on 6.1 to 7.0 differences:
LOOKUP() vs. PREVIOUS_VALUE
See here: http://community.tableausoftware.com/message/173752
Also this from email from Joe Mako:
Have you attempted the formula that you recommended at
Instead of ATTR([Status]) != PREVIOUS_VALUE(ATTR([Status]))
try ATTR([Status]) != LOOKUP(ATTR([Status]),-1)
The PREVIOUS_VALUE() function is self referential, kind of like a recursive function or a loop. The argument you pass the function is just the starting value for the partition, while the LOOKUP() function gets the value based on an offset, and an offset of -1 means previous.
In your statement, the first mark will result in False, effectively [Status] != [Status], and every other record will also result in True because it is likely [Status] != “False” or [Status] != “True”
The formula ATTR([Status]) != LOOKUP(ATTR([Status]),-1) is basically saying:
“Compare the Status of the current mark and the Status of the previous mark in the partition.”
I recommend you try some trial and error with the PREVIOUS_VALUE() function
Try some formulas like:
PREVIOUS_VALUE(0)+1 PREVIOUS_VALUE(100)-1 PREVIOUS_VALUE(0)+SUM([Sales]) PREVIOUS_VALUE("")+ATTR([Product])+", "
Ignore in Table Calculations
Lots of Table Calc Links
http://community.tableausoftware.com/thread/111337 – multiple “reference” lines on the data using table calcs and/or duplicating data
http://www.tableausoftware.com/support/forum/topic/multiple-reference-lines – using average age for male & female reference lines, interesting table calcs here
http://www.tableausoftware.com/support/forum/topic/display-dimension-text-associated-measure – older discussion from 2010 showing use of multiple table calcs trying to get a minimum of a set of values for display, great comments from Ross Bunker
Order of calculations:
The following text from http://onlinehelp.tableausoftware.com/v7.0/pro/online/en-us/calculations_tablecalculations_understanding_addressing.html
Addressing and Partitioning
The addressing fields define what part of the table you are computing along. The partitioning fields define how to group the calculation. In the example of a running sum of product sales across several years, the addressing field is the Date field while the parititioning field is the product field. When youd define the addressing for a table calculation, all the other fields are used for partitioning.
You can specify the addressing in the Table Calculation dialog box. The addressing can be relative to the table structure or a specific field. Each addressing option is described below.
This option sets the adressing to compute along the entire table moving horizontally through each partition.
From Table Calcs primer PDF:
• Partitioning: the scope or grouping of the calculation. This can be the full table, a pane, a cell, a dimension or it can be customized even further for more advanced calculations.
• Addressing: the anchor or the source of each partition. It defines the root of the calculation.
Consider, for example, looking at a running sum of sales by product across several years. In our example, the running sum is partitioned by product: every product’s sale is summed over time, so the result of the calculation is a running sum of coffee sales, tea sales, etc. The addressing field is the date field. With every new date, the sales of that data are added to the sum. When you define the addressing for a table calculation, all the other fields are used for partitioning.
Table (Across) is a Calculate Along option. The chart below uses Table (Across) to set the addressing to compute along the entire table, moving horizontally through each partition.
When calculation addressing is set to Table( Across), the fields that span horizontally across the table are the addressing fields (Category and Region).
The fields specified in the ‘Compute Using’ menu or ‘Edit Table Calculation…’ dialog are termed the addressing fields. All of the dimensions on a sheet that are not addressing fields are partitioning fields. This corresponds to the ‘computed along’ (addressing) and ‘for each’ (partitioning) components of the description in the table calculation dialog. One other nice thing is that in the final v6, the tooltip for a table calculation will actually contain the description also, so you don’t have to open the dialog to understand what it is doing.
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning. Similarly ‘At the level’ moves fields after it in the list to be partitioning, though there is a subtle difference in how that partitioning is done (its not done on value, but rather on postition within the partition, i’ll give more details on that in the tutorial).
One might ask why you would put something in the ordering list in Advanced… only to partition on it by setting ‘Restarting every’. The answer lies in the sorting behavior. For example, if you want to know the top products for each market, you can’t simply sort products by sales (which sorts based on sales for all markets). Instead, you sort Market,Product by sales. Then, when you partition on Market, the products within each of those Market partitions are still sorted by the sales within that market. Play around with it to see the difference.
Addressing is the compute along/over/by.
Partitioning is the “grouping” or “scope” of the calculation – to some degree defined by the “context of the calculation, since All dimensions not part of addressing are used for partitioning. Also includes dimensions (and discrete pills) on Level of Detail shelf (and Color shelf too?)
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning.
Example of running sum of sales by product across several years:
product is partition, each new product creates a new returned value of sum of sales
date is address, sum of sales for each new date is added to the returned value for that partition
Where I’ve been confused (I think) is that there are three things in table calcs, not two: partitioning, addressing, and what is being calculated
More notes from convo with Joe on 3/2012
Table (Across), etc. do a visual sort base on the layout of the view. The results from Table (Across), etc. can sometimes be the same as what you can get by using Advanced Compute Using, they aren’t always. For example, you can have a complex sort set up in the view and Table (Across), etc. will work just fine, but you can’t duplicate that sort using Advanced Compute Using because Advanced Compute Using creates a set of the fields in the Compute Using and then sorts that set along whatever is set in the Order Along.
This can be confusing because the English language description Tableau gives when you hover over the pill in the view can be the same for a Table (Across) and an Advanced Compute Using(), but the results will be different.
~~~Example using Superstore Sales:
Put Container on Rows, Category on Color, and Sales on Columns. Create an INDEX() function and see what it does try to duplicate that using an Advanced Compute Using
Show Me can be used to “freeze” a setup, typically by Table (Across) or Table (Down). Can be used to get to a view that you can’t get w/Advanced…Compute Using. “Table (Down) is magic dust with Show Me.” ~~~Need an example here, probably one with a funky INDEX().
Roll Your Own Index
IF ATTR([Field]) = LOOKUP(ATTR([Field],-1) THEN PREVIOUS_VALUE(0) ELSE PREVIOUS_VALUE(0)+1 END
This creates a rank that shows ties, or can be used to ignore something in a table calc.
Rows: Roll Your Own Index (set Compute Using to Container), Container (sorted on Sum(Sales) Asc) Color Shelf: Category
This allows the index to be computed on container but ignore category for sort and partitioning. This would be a nice feature.
Using dimensions on both Rows and Columns shelves causes Tableau to pad, while dimensions only on Rows doesn’t cause Tableau to pad.
~~~Put Container on Rows, Category on Cols, see what INDEX() does Put Container on Rows, Category on Rows, see what INDEX() does – will see different marks used now Compute Using set to Across then Down vs. Down.
next example: Put Customer, Order Date on Rows. Index() on Text Index on Customer – ranks them Index on Order Date – causes padding, Tableau takes a really long time to return data. If you use MONTH(Order Date) it doesn’t take as long.
the key issues w/padding are:
Aggregate vs. dimension
continuous vs. discrete
datepart vs. datetrunc
Some details w/display can get lost when using Primary/Secondary sources. example: duplicate Superstore sales, when Customer from both is in the Customer from secondary does not show ATTR(), i.e. doesn’t show that it’s an aggregate. (thought here is to outline the pills)
How Joe makes table calcs work
1. put every dimension on Rows shelf
2. Measure Names as Columns, Measure Values as text 3. Do addressing on the right-most pill
This way, can turn on totals and subtotals to check what they return vis-a-vis the table calcs. Also can see the partitioning and padding. Using a full-blown cross tab makes it more confusing. Also allows filters using table calcs to be tested since all dimensions are available on the Rows shelf.
Table calc filters can only see what’s on Columns and Rows shelves, and/or see their own instance on the Level of Detail, so if needed fields for the calc aren’t on Columns or Rows you can put an instance of the table calc on the Level of detail shelf.
Another advantage to using Joe’s method for making table calcs work is that table calc filters can be tested as well.
Aggregated/continuous pills can go on Filter shelf Aggregated/discrete pills can’t go on Filter shelf Table calc/continuous or discrete pills can go on Filter shelf
Example of this
Create SUM(1) calculated field, call it “Test Agg”. As Continuous, it can go on Filter shelf. As Discrete, can’t go on filter shelf. Create another calculated field LOOKUP([Test Agg],0). As Continuous or Discrete, it can go on the Filter shelf.
(end of Joe notes)
More on At the Level and Restarting Every (Alex Kerin question leading to Ross Bunker post)
Table calc – result only needed for one row in partition:
IF (FIRST()==0) THEN WINDOW_SUM(SUM([Population]), 0, IIF(FIRST()==0, LAST(), 0)) END
Don’t try this with TOTAL(), it can mess things up
Table calc – result needed for all rows in partition (same answer for each case)
PREVIOUS_VALUE(WINDOW_SUM(SUM([foo]), 0, IIF(FIRST()==0, LAST(), 0)))
Probably shouldn’t try this with TOTAL() either
TOTAL() hits DB while WINDOW_SUM() does not, would probably affect order of filters? TOTAL() causes any enclosed aggregate functions to be evaluated over each partition – but I’m not really sure why “Table Across” behaves differently to
Uncheck Ignore in Table Calculations so discrete field can be used in a table calc:
Idea for Table Calcs:
Duplicate a dimension and put the copy on the LoD shelf so calcs can be along that, but summed overall.
Index (roll your own)
Based on: http://www.tableausoftware.com/support/forum/topic/top-x-shows-more-x-values-when-using-color-shelf
IF LOOKUP(MIN([Product]),-1)=MIN([Product]) THEN PREVIOUS_VALUE(0) ELSE PREVIOUS_VALUE(0)+1 END
Discussion of this and partitioning (Ross Bunker)
Using PREVIOUS_VALUE to combine multiple rows into one
Example here is to make a list of names like Jane, John, Joe, etc. from multiple rows.
TODAY() and NOW() generate Internal Expression Error inside table calc
As of Tableau v7, the following formula generates an “Internal Expression Error: Function ‘TODAY’ is not defined in the current context” error.
WINDOW_SUM(IF ATTR([Order Date]) >= TODAY()-29 THEN WINDOW_SUM(SUM([Sales])) END)
The same happens when NOW() is used instead of TODAY(). The solution is to make a calculated field with NOW() or TODAY(), and then use the calculated field in the function.
Given an index to another row, a formula to get an offset to that row from the current row, to use in a LOOKUP() statement:
FIRST()+[the index] -1
Improving table calc partitioning & addressing
[loop category=”wikicontent” tag=”tc,tablecalcs,table-calculations”]
[field title] – Added [field date]
- [loop tag=”densification,address,addressing,tc,table-calcs,table-calculations,aggregation,index,padding,partition,partitioning,PREVIOUS_VALUE(),rank,ranking,at-the-level” exclude=”this” relation=”and” compare=”not” taxonomy=”category” value=”wikicontent”]
- [field title-link][field thumbnail-link]
Tableau Tongue Twister #2
A partition can be a pane, and a pane can be a partition, and panes can be a pain to partition.