Here’s a issue on the Tableau forums that shows up at least once per month or so: someone has a data set that they want to compute a measure over, and the measure requires table calculations. Then they want to use the results of those calculations as a discrete value for partitioning other calculations. For example, given a set of student grades, compute each student’s percentile, then show the average of each percentile in the same view. Or a similar idea, only the percentile is based on sum of sales, and you want to show the top 10% and bottom 10% of sales.
The problem here is that Tableau does not support partitioning table calculations by the results of other table calculations. You can partition table calculations by dimensions, and even by discrete aggregates if you turn off Ignore Table Calculations using the pill’s menus, but not by table calculations. An alternative workaround is to do the percentile calc in one worksheet, export the data, then use that as a datasource for an additional worksheet, but that has the limitations of a) not being dynamic and b) prone to failure for datasets in the millions of rows.
However, table calculations are magical things that can slice, dice, rearrange, re-aggregate, and yes, Martha, even partition by the results of other table calculations, and you’re about to learn how. Thanks to Joe Mako for his pioneering work, it led me to a Tableau epiphany that I’ll be sharing in a future post.
Here’s the basic process for partitioning the results of a table calculation by other table calculations:
- Start with a text table. Put all the dimensions and discrete aggregates you’ll need for the final view on the Rows Shelf, with the addressing pill(s) on the right, partitioning pills on the left.
- Build the initial table calc and verify that it’s working properly.
- Build the partitioning calc(s), add those and verify them.
- Do the work to reduce overlapping text/additional marks.
- Duplicate the worksheet and rearrange pills to get to the final view.
The tricky bit is step 3, since every calc that we want to use in the final result has to appropriately reference the calc we are partitioning over. For example, for doing an average of test scores within a percentile, we need to compute the total of the scores within each percentile for the numerator, and the total number of tests within each percentile for the denominator, and then in a third calc generate the average. If we want the average of scores within a range of percentiles, then we’ll need to return the scores for the range and compute the average over that.
The data set that I’m using is a bunch of test scores that I mocked up, for three schools with two subjects, here’s a sample of the data and the school data.xlsx spreadsheet:
// The quantile rank gives the proportion of values the same as or less than the value in question // Note that the term subtracting half the number of records for the current value is effectively // returning the average rank in the case of multiple occurences of the same value. // See http://http://en.wikipedia.org/wiki/Percentile_rank. // Calc from http://community.tableausoftware.com/message/189088#189088 IF (ATTR([Grade]) = LOOKUP(ATTR([Grade]), -1)) THEN PREVIOUS_VALUE(0) ELSE INT([Quantile] * (RUNNING_SUM(SUM([Number of Records])) - ((SUM([Number of Records]) - 1)/ 2)) / TOTAL(SUM([Number of Records]))) END
[Grade] is the student’s particular grade, [Quantile] is a parameter, set to 100 in this case. In order to accurately compute the percentiles, we need all the grades i.e. every row, so [Test ID] needs to be in the view, sorted by Sum of Grade/Ascending, and the Compute Using for this calc is set to Test ID.
Here’s a starter view:
Making Friends with LOOKUP() and PREVIOUS_VALUE()
Now we can move on to the partitioning calcs. These calculations make use of two very powerful table calculation functions:
- LOOKUP([field],[offset]) gets the value of [field] from another row in the partition, based on the offset.
- PREVIOUS_VALUE([field]) is a self-referencing function, where it returns the value of the calculation from the prior row in the partition, and returns the given [field] for the first row in the partition.
Here’s the Total of Grades w/in Quantile formula, which has a Compute Using of Test ID:
IF LOOKUP([Quantile (inc)],-1) == [Quantile (inc)] THEN PREVIOUS_VALUE(0)+SUM([Grade]) ELSE SUM([Grade]) END
The IF statement is comparing the current row’s Quantile to the previous Quantile. If the two values are the same, then it adds the current row’s SUM([Grade]) the prior value of the Total of Grades w/in Quantile; this effectively is a running sum. Where the partitioning comes in is that if the current row’s Quantile is different, then the calculation “restarts” and returns the SUM([Grade]).
The Total Records w/in Quantile formula is even simpler, all it does is count from 1 up for each Quantile and still uses the Compute Using of Test ID:
IF LOOKUP([Quantile (inc)],-1) == [Quantile (inc)] THEN PREVIOUS_VALUE(0)+1 ELSE 1 END
The Average Grade w/in Quantile uses this formula, again with a Compute Using of Test ID:
IF LOOKUP([Quantile (inc)],1) != [Quantile (inc)] OR LAST()==0 THEN [Total of Grades w/in Quantile]/[Total Records w/in Quantile] END
The IF statement is used to identify the last row within each Quantile “partition”, and the very last row, and then return the average for that row and Null for everything else. The reason why we need the IF statement is two-fold: One is that every row has a value for the Total Grades and Total Records, but the only accurate division result is the one based on the last row within each Quantile. The second reason is that we really do only want to return one value for the average within each Quantile in order to prevent overlapping text and improve performance by drawing fewer marks.
Here’s the the basic workout view created by these calcs:
Once you have that, then you can duplicate the worksheet and start rearranging pills, to create something like another text table. Note that the table has blank spaces because there aren’t test scores in the data to cover every possible quantile, and the quantile calc is not attempting to interpolate for the blanks:
Or a line chart:
In both of these views, there’s a filter on Average Grade w/in Quantile that is set to only include non-Null values. This reduces the number of marks Tableau will display and significantly improves performance when you are displaying more than a few hundred marks.
Averaging Top 10th and Bottom 10th Percentiles
For these calculations, we again use the Quantile results as the basis for another calculation to create new measures:
IF FIRST()==0 THEN WINDOW_AVG(IF [Quantile (inc)] >= 91 THEN ATTR([Grade]) END,0,IIF(FIRST()==0,LAST(),0)) END
In this case, the calc only return the Grade if the Quantile is in the top 10th percentile, then do a WINDOW_AVG of those results. A similar calc looks at the bottom 10th percentile. With those measures, we can combine them into a plot of the means and the median:
There’s a basic example of how to partition table calculations by the result of other table calculations, and the partitioning by table calculations workbook on Tableau Public.