Tag Archives: lod

TRIMMEAN() in Tableau

Excel’s TRIMMEAN() function can be quite useful at removing outliers, essentially it removes the top and bottom Nth percent of values and then computes the mean of the rest. Here’s the equivalent formula in Tableau that in Superstore Sales computes the TRIMMEAN() of sales at the customer level removing the top and bottom 5th percentile of customers when used with the AVG() aggregation:

{FIXED [Customer Name]: SUM(
    IF {FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
    AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
       [Sales]
    END)
}

Read on for how to build and validate your own TRIMMEAN() equivalent in Tableau.

When building out calculations in Tableau I try to let Tableau do as much of the computation as possible for both the calculations and the validation, so I’m typing as little as I can. Starting with Superstore, let’s identify the top and bottom 5th percentiles, here’s a view using a reference distribution:

Screen Shot 2016-07-12 at 3.57.38 PM

Now we know what we’re going to have to remove. The next step is to duplicate this worksheet as a crosstab, then build out calcs that can return the 5th and 95th percentiles of Sales at the Customer Name level. While this can be done with table calculations (here’s an example from the Tableau forums) I’m going to use FIXED Level of Detail Expressions so I’ve got a dimension I can use, so for example I could compare the trimmed group to the non-trimmed group. Here’s the 95th percentile Level of Detail Expression:

{FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}

The inner LOD is calculating the sales at the Customer level, then the outer LOD is returning the 95th percentile as a record level value. Here’s the two calcs which have values that compare to the reference lines above:

Screen Shot 2016-07-12 at 4.03.46 PM

The next step is to filter out the values outside of the desired range, here’s the TRIMMEAN Filter formula:

{FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)}

This uses the 5th and 95th percentile formulas and only returns True when the Customer level sales is less than the 95th percentile or greater than the 5th percentile, we can visually validate it by dropping it on the Color Shelf:

Screen Shot 2016-07-12 at 4.06.37 PM

Now that we have this the next step is to calculate what the trimmed mean would be. Again, we can use a view with a reference line, this time it’s been filtered using the TRIMMEAN Filter calc and the reference line is an average:

Screen Shot 2016-07-12 at 4.08.16 PM

Now we can embed the TRIMMEAN Filter formula inside an IF/THEN statement to only return the sales for the filtered values, this is the Trimmed Sales calc:

IF {FIXED [Customer Name] : SUM([Sales])} < 
   {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
AND {FIXED [Customer Name] : SUM([Sales])} > 
   {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
   [Sales]
END

And here it is in the workout view, only returning the sales for the trimmed customers:

Screen Shot 2016-07-12 at 4.15.36 PM

Now that we have the trimmed sales there are two ways we can go. If we want the trimmed mean without the Customer Name in the Level of Detail then we can validate that in our workout view by using Tableau’s two-pass Grand Totals to get the average of the customer-level trimmed sales. This was created by:

  1. Removing the TRIMMEAN Filter pill from Colors (this increases the vizLOD and is no longer necessary).
  2. Clicking on the Analytics tab.
  3. Dragging out a Column Grand Total.
  4. Right-clicking the SUM(Trimmed Sales) pill on Measure Values and setting Total Using->Average.

Scrolling down to the bottom we can see that the overall trimmed mean matches of 2,600.79 matches the one from the reference line.

Screen Shot 2016-07-12 at 4.20.33 PM

Note that we could have used the Summary Card instead, however using the Grand Total lets us see exact values.

There’s a problem, though, if we use the Trimmed Sales all on its own in a view it breaks, whether using SUM() or AVG():

Screen Shot 2016-07-12 at 4.25.49 PM

The reason why is that the Trimmed Sales is a record level value and Superstore is at the level of detail of individual order items, but we’re trying to compute the trimmed mean across Customer Names. For the true trimmed mean in this case we need to aggregate this trimmed sales to the Customer Name like we did in the workout view, here’s the Trimmed Sales (Customer Level) formula that uses the Trimmed Sales and wraps that in an LOD to get the Customer Level sales:

{FIXED [Customer Name]: SUM(
    IF {FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
    AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
       [Sales]
    END)
}

This returns the same results in the workout view:

Screen Shot 2016-07-12 at 4.31.49 PM

And works all on its own in a view:

Screen Shot 2016-07-12 at 4.32.32 PM

Now this is a case where the FIXED level of detail expression is returning different results depending on the level of detail of the view, if we want it to return the same result then we can wrap all that in one more LOD expression, this is the TRIMMEAN Fixed calculation:

{FIXED : AVG(
    {FIXED [Customer Name]: SUM(
        IF {FIXED [Customer Name] : SUM([Sales])} < 
        {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
        AND {FIXED [Customer Name] : SUM([Sales])} > 
        {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
            [Sales]
        END)
    })
}

And here it is in the workout view and a view without any dimensions:

Screen Shot 2016-07-12 at 4.34.36 PM

Screen Shot 2016-07-12 at 4.35.35 PM

 

 

Final Comments

This is a good (and bad) example of how Tableau is different from Excel. In one bad sense note that I didn’t parameterize the percentage for the trimmed mean, this is because in Tableau it would require two parameters because we can’t put calculations as the arguments to the PERCENTILE() function. In another bad sense the calculation requires understanding Level of Detail expressions and is not wrapped into a simple formula. On the other hand we’ve got very precise control over what the calculation is computing over with those Level of Detail expressions and aren’t just limited to doing trimmed means, we could do trimmed medians, get the Nth percentile of the trimmed values, etc.

Here’s the trimmed mean workbook on Tableau Public.

Keeping a Value in Totals Whilst Excluding from Quick Filter List

Over at Peter Gilk’s Paint by Numbers blog there was a question on this post on filtering while retaining results. Here’s the what Jeremy asked:

May I ask if it would be possible to get a detailed explanation of applying this principle to a different type of data?

For example, I would like to see the US Sales totals, and have the ability to filter it to a US state without the ability to select a US territory (Guam, Puerto Rico, etc), but to have the US territory sales remain in the US national totals. How could I do this?

In this short post I cover two different techniques how to do this using a self-data blend and LOD expressions, respectively.

Continue reading

LOD Expressions and Custom Grand Totals: Replacing Table Calculations and Self-Data Blends with LOD Expressions

I’ve been trying to figure out how to write about this one, I think I finally have a simple enough scenario to describe: In my world of healthcare delivery, I have things like different payors where I want to know what % of the population is covered by a certain payor (like Medicare and Medicaid), and I don’t really need to show anything about the rest of the population other than have the raw numbers available in the computation. Using Superstore, we can do a equivalent modeling of that using Customer Segment as a stand-in for a set of possible distinct conditions for each patient (Customer). So I want to know what % of total Sales are in a given Segment, being able to filter for any set of Customer Segment(s) I want, and show the sum of the % of total Sales for only my filtered Customer Segments. Ideally ending up with something like this:

2015-06-17 07_52_51-Tableau - LOD and Grand Totals

Read on for how this goes from relatively difficult in earlier versions to relatively simple in Tableau version 9.

Continue reading

LOD Expression Remix – Finding a Dimension at a Lower Level

Last week Mark Jackson had a great post on using Tableau v9 Level of Detail expressions to find a dimension at a lower level (with an update here). In his Superstore example where there are multiple Categories in each State, the goal is to show a view of each State with the largest Category in that State based on the number of customers, like this:

2015-05-27 10_43_52-Tableau - return lower level dim from LODIn this remix post I’ll demonstrate an alternative solution that doesn’t require any string manipulation, along with going through my current process for building & verifying LOD expressions.

[Post edited 20140527 to include links to Mark’s update to his original post.]

Continue reading

Level of Detail (LOD) Expressions

Level of Detail (LOD) Expressions (also called LOD Calculations)

White paper by Alan Eldridge & Tara Walker: https://www.tableau.com/learn/whitepapers/understanding-lod-expressions

Top 15 LOD Expressions by Bethany Lyons: https://www.tableau.com/LOD-expressions

Tableau online help for LOD expressions: http://onlinehelp.tableau.com/current/pro/online/windows/en-us/help.html#calculations_calculatedfields_lod.html

Video intro for LOD expressions by Craig Bloodworth @ the Information Lab:
https://www.youtube.com/watch?v=BnSF6i4MW0E&feature=youtu.be

Bora Beran’s blog introducing LOD expressions:
https://boraberan.wordpress.com/2015/01/30/whats-new-in-tableau-9-0-part-2-level-of-detail-expressions/

Mark Jackson on Using LOD to return a lower level dimension value (2 posts, plus Jonathan’s remix)
http://ugamarkj.blogspot.com/2015/05/using-tableau-lod-to-find-dimension-at.html
http://ugamarkj.blogspot.com/2015/05/updated-using-tableau-lod-to-find.html
http://drawingwithnumbers.artisart.org/lod-expression-remix-finding-a-dimension-at-a-lower-level/

Alan Eldridge’s blog (this material is refined in the white paper mentioned above):
Understanding Level of Detail Expressions – Part 1
Understanding Level of Detail Expressions – Part 2
Understanding Level of Detail Expressions – Part 3

LOD Rework 1 – Relative Date Filter

Mark Jackson on using LOD expressions for a nested filtering situation:
http://ugamarkj.blogspot.com/2015/02/advanced-grouping-with-tableau-level-of.html

LOD expressions for semi-additive measures:
Guest Post: Tableau LOD calcs for semi-additive measures

Comparison of LOD expressions to techniques in other tools (Excel, SQL, Altered, etc.)
https://dabblingwithdata.wordpress.com/2015/03/05/why-version-9-will-fix-tableau-for-me-and-workarounds-in-the-mean-time/

Using FIXED vs INCLUDE/EXCLUDE:

 

[loop category=”wikicontent” tag=”lod, lod calculations, lod expressions”]
[field title] – Added [field date]
[content]
[/loop]

Related posts:

      [loop tag=”lod, lod calculations, lod expressions” exclude=”this” relation=”and” compare=”not” taxonomy=”category” value=”wikicontent”]
  • [field title-link][field thumbnail-link]

[/loop]