Tag Archives: partitioning

At the Level – Unlocking the Mystery Part 2: Rank Functions

Many moons ago I did a first post exploring the non-obvious logic of the most secretive of Tableau table calculation configuration options: At the Level. A few weeks ago I was inspired by a question over email to dive back in, this post explores At the Level for the five rank functions: RANK(), RANK_DENSE(), RANK_MODIFIED(), RANK_UNIQUE(), and RANK_PERCENTILE(). The rank functions add a level of indirection to the already complicated behavior of At the Level and I don’t have any particular use cases, so…

If you are like me and won’t rest until you understand every detail of Tableau’s functionality, then this post is for you. Otherwise you may find this post unhelpful and/or confusing due to extreme table calculation geekery. You have been warned.

The particular challenge with ordinal functions like INDEX(), FIRST(), and the rank functions is that we absolutely have to understand how addressing and partitioning works in Tableau, and then we tack onto that an understanding of how the calculations work, and finally we can add on how At the Level works. For the first part, I suggest you read the Part 1 post on At the Level, it goes into some detail on addressing and partitioning. To understand the rank functions here’s the Tableau manual for table calculations (scroll down to the Rank functions section). Finally, read on for how At the Level works for rank functions.

Continue reading

Counting from Nothing – A Double Remix (or, Partitioning via Table Calculations v2)

Over on the Tableau forums Alexander Mou answered a thread on generating a count from sparse data, and the solution he came up with is found in his blog post Dynamic Histogram Over Time. In this post I’m diving into some details of what Alexander did, coming up with a couple of alternative remixes of that solution, and describing a couple of different ways to effectively partition a table calculation via another table calculation. Read on for details!

Continue reading

At the Level – Unlocking the Mystery Part 1: Ordinal Calcs

There was a Tableau forums thread on At the Level awhile back where Matthew Lutton asked for an alternative explanation of this somewhat puzzling table calculation configuration option, and I’d promised I’d take a swing at it. Plus, I’ve been deep into book writing about shaping data for Tableau, and a taking a break to write about obscure table calc options sounds like fun! (Yes, I’m wired differently.)

Read on for a refresher on addressing and partitioning and my current understanding of uses of At the Level for ordinal table calculations such as INDEX() and SIZE(). Part 2 will cover LOOKUP(), and Part 3 will cover WINDOW_SUM(), RUNNING_SUM(), and R scripts. If you’re new to table calcs, read through at least the Beginning set of links in  Want to Learn Table Calculations. Thanks to Alex Kerin, Richard Leeke, Dimitri Blyumin, Joe Mako, and Ross Bunker for their Tableau forum posts that have informed what you’re about to read.

Continue reading

Table Calculations

See Want to Learn Table Calculations? Here’s How for a list of posts that I recommend.

From http://www.tableausoftware.com/support/forum/topic/trellis-plots comments by Joe Mako:

1. First step in table calcs is what kind of pills you have:
a. aggregated continuous
b. aggregated discrete
c. dimension continuous
d. dimension discrete

2. What shelf these pill types are on
– each of these types has a different effect on the evaluation of a table calc depending on their arrangement.
– when you have a dimension on both the rows and columns shelf, Tableau will padd for missing dinension combinations, unless you are using a set based on these dimensions
– building your view with all dimensions on the rows shelf is a good place to start, and once results are as expected, then move pills one at a time to other shelf, and see how shelf placement effects evaluation.

3. How you set your compute using
– if you use the compute using selection from the context menu, or the drop-down in the dialog, and set it to a dimension, that dimension will be used for addressing, and all others will be used for partitioning. I call this Regular Compute Using
– when setting the compute using to something like Table/Pane/Cell, I call that Layout Compute Using, and try to avoid this because I find it to be a delicate setting (move a pill, move a dimension pill, and your evaluation will likely be effected).

– with Regular Compute Using, if all addressing values do not exist in all partitioning values, the INDEX() value for each addressing value will not be the same across partition values. This is apparent with not all combinations of dimension values exist. (In this situation, there are states that have no sales records for some months, so whit your compute using set to State, and partitioning on Month, each State will not have the same INDEX() value in each month.)

– when using a Regular Compute Using and a crosstab of the dimensions in use (one on the Rows, and the other on the Columns), and no Set based on those dimensions, Tableau will pad the data making each State have the same INDEX() value for each Month.

– So without the crosstab setup, but with the desire for Tableau to pad the data to cause the INDEX() function to evaluate as desired, you can have Tableau pad the data as desired with an Advanced Compute Using.

– When you use multiple dimensions on the Compute Using (right-side list box) for addressing, I think of it as like a pseudo-crosstab of all potential combinations of the dimension values (this is what we want in this situation). If you use a Set instead of multiple dimensions, that data will not be padded.

It is my current belief that if you are aware of what pills you have in use, what shelf they are on, how your compute using is setup, and how all these potential combinations of setups effect the evaluation of table calcs, then table calcs are straightforward.

Without doubt, I have not included all information needed to understand table calcs here, but only some key concepts that play a factor in the attached “(fixed)” worksheets.

(end of post)

Another couple of posts by Joe:
http://community.tableausoftware.com/message/175545
http://community.tableausoftware.com/message/136628#136628

Joe’s notes on 6.1 to 7.0 differences:
http://community.tableausoftware.com/thread/117103?start=0&tstart=0

LOOKUP() vs. PREVIOUS_VALUE

See here: http://community.tableausoftware.com/message/173752

Also this from email from Joe Mako:
Have you attempted the formula that you recommended at
http://community.tableausoftware.com/message/173588#173588 ?

Instead of ATTR([Status]) != PREVIOUS_VALUE(ATTR([Status]))

try ATTR([Status]) != LOOKUP(ATTR([Status]),-1)

The PREVIOUS_VALUE() function is self referential, kind of like a recursive function or a loop. The argument you pass the function is just the starting value for the partition, while the LOOKUP() function gets the value based on an offset, and an offset of -1 means previous.

In your statement, the first mark will result in False, effectively [Status] != [Status], and every other record will also result in True because it is likely [Status] != “False” or [Status] != “True”

The formula ATTR([Status]) != LOOKUP(ATTR([Status]),-1) is basically saying:
“Compare the Status of the current mark and the Status of the previous mark in the partition.”

I recommend you try some trial and error with the PREVIOUS_VALUE() function

Try some formulas like:

PREVIOUS_VALUE(0)+1
PREVIOUS_VALUE(100)-1
PREVIOUS_VALUE(0)+SUM([Sales])
PREVIOUS_VALUE("")+ATTR([Product])+", "

Ignore in Table Calculations

http://www.tableausoftware.com/fr-fr/support/forum/topic/aggregating-aggregate-histograms
http://community.tableausoftware.com/thread/114442
http://community.tableausoftware.com/thread/110683

Lots of Table Calc Links

http://www.tableausoftware.com/table-calculations
http://www.tableausoftware.com/support/knowledge-base/table-calculations
http://www.tableausoftware.com/support/knowledge-base/pareto-analysis
http://www.tableausoftware.com/support/knowledge-base/compare-two-table-calcs-60
http://www.tableausoftware.com/support/knowledge-base/nestedsorting
http://www.tableausoftware.com/support/knowledge-base/sorting-concepts-related-quick-table-calculations
http://www.tableausoftware.com/support/forum/topic/plotting-custom-median-and-connecting-marks
http://community.tableausoftware.com/thread/111337 – multiple “reference” lines on the data using table calcs and/or duplicating data
http://www.tableausoftware.com/support/forum/topic/multiple-reference-lines – using average age for male & female reference lines, interesting table calcs here
http://www.tableausoftware.com/support/forum/topic/display-dimension-text-associated-measure – older discussion from 2010 showing use of multiple table calcs trying to get a minimum of a set of values for display, great comments from Ross Bunker

http://interworks.co.uk/blog/groundwork-for-custom-table-calculations/

 

Order of calculations:

http://community.tableausoftware.com/message/139603#139603

 

The following text from http://onlinehelp.tableausoftware.com/v7.0/pro/online/en-us/calculations_tablecalculations_understanding_addressing.html

Addressing and Partitioning

The addressing fields define what part of the table you are computing along. The partitioning fields define how to group the calculation. In the example of a running sum of product sales across several years, the addressing field is the Date field while the parititioning field is the product field. When youd define the addressing for a table calculation, all the other fields are used for partitioning.
You can specify the addressing in the Table Calculation dialog box. The addressing can be relative to the table structure or a specific field. Each addressing option is described below.
Table (Across)
This option sets the adressing to compute along the entire table moving horizontally through each partition.

From Table Calcs primer PDF:
• Partitioning: the scope or grouping of the calculation. This can be the full table, a pane, a cell, a dimension or it can be customized even further for more advanced calculations.
• Addressing: the anchor or the source of each partition. It defines the root of the calculation.
Consider, for example, looking at a running sum of sales by product across several years. In our example, the running sum is partitioned by product: every product’s sale is summed over time, so the result of the calculation is a running sum of coffee sales, tea sales, etc. The addressing field is the date field. With every new date, the sales of that data are added to the sum. When you define the addressing for a table calculation, all the other fields are used for partitioning.
Table (Across)
Table (Across) is a Calculate Along option. The chart below uses Table (Across) to set the addressing to compute along the entire table, moving horizontally through each partition.
When calculation addressing is set to Table( Across), the fields that span horizontally across the table are the addressing fields (Category and Region).

 

From: http://www.tableausoftware.com/support/forum/topic/version-6-table-calcs-understanding-few-basics#comment-27646
The fields specified in the ‘Compute Using’ menu or ‘Edit Table Calculation…’ dialog are termed the addressing fields. All of the dimensions on a sheet that are not addressing fields are partitioning fields. This corresponds to the ‘computed along’ (addressing) and ‘for each’ (partitioning) components of the description in the table calculation dialog. One other nice thing is that in the final v6, the tooltip for a table calculation will actually contain the description also, so you don’t have to open the dialog to understand what it is doing.

From http://www.tableausoftware.com/support/forum/topic/version-6-table-calcs-understanding-few-basics#comment-27736
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning. Similarly ‘At the level’ moves fields after it in the list to be partitioning, though there is a subtle difference in how that partitioning is done (its not done on value, but rather on postition within the partition, i’ll give more details on that in the tutorial).

One might ask why you would put something in the ordering list in Advanced… only to partition on it by setting ‘Restarting every’. The answer lies in the sorting behavior. For example, if you want to know the top products for each market, you can’t simply sort products by sales (which sorts based on sales for all markets). Instead, you sort Market,Product by sales. Then, when you partition on Market, the products within each of those Market partitions are still sorted by the sales within that market. Play around with it to see the difference.

My notes:
Addressing is the compute along/over/by.
Partitioning is the “grouping” or “scope” of the calculation – to some degree defined by the “context of the calculation, since All dimensions not part of addressing are used for partitioning. Also includes dimensions (and discrete pills) on Level of Detail shelf (and Color shelf too?)
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning.
Example of running sum of sales by product across several years:
product is partition, each new product creates a new returned value of sum of sales
date is address, sum of sales for each new date is added to the returned value for that partition
Where I’ve been confused (I think) is that there are three things in table calcs, not two: partitioning, addressing, and what is being calculated

More notes from convo with Joe on 3/2012

I don’t want to misrepresent Joe’s awesome knowledge here, if there are any mistakes in these notes from our conversation they are almost certainly mine.

Table (Across), etc. do a visual sort base on the layout of the view. The results from Table (Across), etc. can sometimes be the same as what you can get by using Advanced Compute Using, they aren’t always. For example, you can have a complex sort set up in the view and Table (Across), etc. will work just fine, but you can’t duplicate that sort using Advanced Compute Using because Advanced Compute Using creates a set of the fields in the Compute Using and then sorts that set along whatever is set in the Order Along.

This can be confusing because the English language description Tableau gives when you hover over the pill in the view can be the same for a Table (Across) and an Advanced Compute Using(), but the results will be different.

~~~Example using Superstore Sales:
Put Container on Rows, Category on Color, and Sales on Columns. Create an INDEX() function and see what it does try to duplicate that using an Advanced Compute Using

Show Me can be used to “freeze” a setup, typically by Table (Across) or Table (Down). Can be used to get to a view that you can’t get w/Advanced…Compute Using. “Table (Down) is magic dust with Show Me.” ~~~Need an example here, probably one with a funky INDEX().

Roll Your Own Index

IF ATTR([Field]) = LOOKUP(ATTR([Field],-1) THEN
 PREVIOUS_VALUE(0)
ELSE
 PREVIOUS_VALUE(0)+1
END

This creates a rank that shows ties, or can be used to ignore something in a table calc.

~~~Example:
Rows: Roll Your Own Index (set Compute Using to Container), Container (sorted on Sum(Sales) Asc) Color Shelf: Category

This allows the index to be computed on container but ignore category for sort and partitioning. This would be a nice feature.

Padding

Using dimensions on both Rows and Columns shelves causes Tableau to pad, while dimensions only on Rows doesn’t cause Tableau to pad.

~~~Put Container on Rows, Category on Cols, see what INDEX() does Put Container on Rows, Category on Rows, see what INDEX() does – will see different marks used now Compute Using set to Across then Down vs. Down.

next example: Put Customer, Order Date on Rows. Index() on Text Index on Customer – ranks them Index on Order Date – causes padding, Tableau takes a really long time to return data. If you use MONTH(Order Date) it doesn’t take as long.

the key issues w/padding are:
Aggregate vs. dimension
continuous vs. discrete
datepart vs. datetrunc

Some details w/display can get lost when using Primary/Secondary sources. example: duplicate Superstore sales, when Customer from both is in the Customer from secondary does not show ATTR(), i.e. doesn’t show that it’s an aggregate. (thought here is to outline the pills)

How Joe makes table calcs work

1. put every dimension on Rows shelf
2. Measure Names as Columns, Measure Values as text 3. Do addressing on the right-most pill

This way, can turn on totals and subtotals to check what they return vis-a-vis the table calcs. Also can see the partitioning and padding. Using a full-blown cross tab makes it more confusing. Also allows filters using table calcs to be tested since all dimensions are available on the Rows shelf.

Filtering Notes

Table calc filters can only see what’s on Columns and Rows shelves, and/or see their own instance on the Level of Detail, so if needed fields for the calc aren’t on Columns or Rows you can put an instance of the table calc on the Level of detail shelf.

Another advantage to using Joe’s method for making table calcs work is that table calc filters can be tested as well.

Aggregated/continuous pills can go on Filter shelf Aggregated/discrete pills can’t go on Filter shelf Table calc/continuous or discrete pills can go on Filter shelf

Example of this
Create SUM(1) calculated field, call it “Test Agg”. As Continuous, it can go on Filter shelf. As Discrete, can’t go on filter shelf. Create another calculated field LOOKUP([Test Agg],0). As Continuous or Discrete, it can go on the Filter shelf.

(end of Joe notes)

More on At the Level and Restarting Every (Alex Kerin question leading to Ross Bunker post)

http://community.tableausoftware.com/message/178239#178239

 

Table calc – result only needed for one row in partition:

From http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html

IF (FIRST()==0) THEN
   WINDOW_SUM(SUM([Population]), 0, IIF(FIRST()==0, LAST(), 0)) 
END

Don’t try this with TOTAL(), it can mess things up

Table calc – result needed for all rows in partition (same answer for each case)

From http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html

PREVIOUS_VALUE(WINDOW_SUM(SUM([foo]), 0, IIF(FIRST()==0, LAST(), 0)))

Probably shouldn’t try this with TOTAL() either
TOTAL() hits DB while WINDOW_SUM() does not, would probably affect order of filters? TOTAL() causes any enclosed aggregate functions to be evaluated over each partition – but I’m not really sure why “Table Across” behaves differently to
WINDOW_XXX() functions.

Uncheck Ignore in Table Calculations so discrete field can be used in a table calc:
http://community.tableausoftware.com/message/173434#173434

Idea for Table Calcs:
Duplicate a dimension and put the copy on the LoD shelf so calcs can be along that, but summed overall.
http://community.tableausoftware.com/message/174254#174254

Index (roll your own)
From: http://www.tableausoftware.com/support/forum/topic/ignoring-order-along-advanced-table-calc#comment-44787
Based on: http://www.tableausoftware.com/support/forum/topic/top-x-shows-more-x-values-when-using-color-shelf

IF LOOKUP(MIN([Product]),-1)=MIN([Product]) THEN 
   PREVIOUS_VALUE(0)
ELSE
   PREVIOUS_VALUE(0)+1
END

Discussion of this and partitioning (Ross Bunker)
http://community.tableausoftware.com/message/183504#183504

Using PREVIOUS_VALUE to combine multiple rows into one
Example here is to make a list of names like Jane, John, Joe, etc. from multiple rows.
http://community.tableausoftware.com/thread/118259

TODAY() and NOW() generate Internal Expression Error inside table calc
As of Tableau v7, the following formula generates an “Internal Expression Error: Function ‘TODAY’ is not defined in the current context” error.
WINDOW_SUM(IF ATTR([Order Date]) >= TODAY()-29 THEN WINDOW_SUM(SUM([Sales])) END)
The same happens when NOW() is used instead of TODAY(). The solution is to make a calculated field with NOW() or TODAY(), and then use the calculated field in the function.

 

Given an index to another row, a formula to get an offset to that row from the current row, to use in a LOOKUP() statement:

FIRST()+[the index] -1

Improving table calc partitioning & addressing
http://community.tableausoftware.com/ideas/1189

[loop category=”wikicontent” tag=”tc,tablecalcs,table-calculations”]
[field title] – Added [field date]
[content]
[/loop]

Related posts:

    [loop tag=”densification,address,addressing,tc,table-calcs,table-calculations,aggregation,index,padding,partition,partitioning,PREVIOUS_VALUE(),rank,ranking,at-the-level” exclude=”this” relation=”and” compare=”not” taxonomy=”category” value=”wikicontent”]

  • [field title-link][field thumbnail-link]

[/loop]