Tag Archives: Excel

TRIMMEAN() in Tableau

Excel’s TRIMMEAN() function can be quite useful for removing outliers: it excludes a given fraction of the data points, split evenly between the top and bottom of the data set, and then computes the mean of the rest. Here’s the equivalent formula in Tableau. In Superstore Sales, when used with the AVG() aggregation, it computes the TRIMMEAN() of sales at the customer level, removing the customers below the 5th percentile and above the 95th percentile:

{FIXED [Customer Name]: SUM(
    IF {FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
    AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
       [Sales]
    END)
}

Read on for how to build and validate your own TRIMMEAN() equivalent in Tableau.
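For comparison, Excel's documentation defines TRIMMEAN's percent argument as the total fraction of points to exclude. The count is rounded down to an even number and split equally between the two ends, so trimming below the 5th and above the 95th percentile corresponds to TRIMMEAN(range, 0.1). Here is a minimal Python sketch of that documented rule (not Excel's own implementation). Because the Tableau calc above cuts on percentile values rather than on a count of points, the two can differ slightly, especially on small data sets:

```python
import math


def excel_trimmean(values, percent):
    """Sketch of Excel's documented TRIMMEAN rule (not Excel's actual code).

    `percent` is the TOTAL fraction of points to drop; the count is rounded
    down to an even number and split equally between the two ends.
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= percent < 1:
        raise ValueError("percent must be in [0, 1)")
    data = sorted(values)
    n = len(data)
    # Half of the even-rounded exclusion count, dropped from each end.
    per_side = math.floor(n * percent / 2)
    kept = data[per_side:n - per_side]
    return sum(kept) / len(kept)


# 10% of 30 points = 3, rounded down to 2: one point dropped from each end.
print(excel_trimmean(list(range(1, 31)), 0.1))  # 15.5
# The outlier 100 is dropped along with the low value 1.
print(excel_trimmean([1, 2, 3, 4, 100], 0.4))  # 3.0
```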

When building out calculations in Tableau, I try to let Tableau do as much of the work as possible for both the calculations and the validation, so that I’m typing as little as I can. Starting with Superstore, let’s identify the 5th and 95th percentiles. Here’s a view using a reference distribution:

Screen Shot 2016-07-12 at 3.57.38 PM

Now we know what we’re going to have to remove. The next step is to duplicate this worksheet as a crosstab, then build calcs that return the 5th and 95th percentiles of Sales at the Customer Name level. This could be done with table calculations (here’s an example from the Tableau forums), but I’m going to use FIXED Level of Detail Expressions so that I’ve got a dimension I can use, for example to compare the trimmed group to the non-trimmed group. Here’s the 95th percentile Level of Detail Expression:

{FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}

The inner LOD calculates the sales at the Customer level, then the outer LOD returns the 95th percentile as a record-level value. Here are the two calcs, whose values line up with the reference lines above:

Screen Shot 2016-07-12 at 4.03.46 PM
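To check the two-level logic outside Tableau, here is a minimal Python sketch on a few hypothetical order-item rows (not Superstore data). It totals sales per customer, like the inner LOD, then takes percentiles across those totals, like the outer LOD. The "inclusive" interpolation method is an assumption; Tableau's PERCENTILE() may compute slightly different cutoffs:

```python
import statistics
from collections import defaultdict

# Hypothetical order-item rows: (customer name, sales). Not Superstore data.
records = [
    ("A", 10.0), ("A", 5.0), ("B", 30.0), ("C", 1.0),
    ("D", 50.0), ("D", 50.0), ("E", 20.0),
]

# Inner LOD: {FIXED [Customer Name] : SUM([Sales])} -> one total per customer.
totals = defaultdict(float)
for customer, sales in records:
    totals[customer] += sales

# Outer LOD: {FIXED : PERCENTILE(<inner>, .95)} -> one number over all customers.
# ASSUMPTION: "inclusive" linear interpolation; Tableau may interpolate differently.
cuts = statistics.quantiles(list(totals.values()), n=100, method="inclusive")
p05, p95 = cuts[4], cuts[94]  # cuts[k] is the (k + 1)th percentile

# The outer FIXED value is repeated on every record, like a record-level field.
per_record_p95 = [p95 for _ in records]
print(p05, p95)  # 3.8 86.0
```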

The next step is to filter out the values outside of the desired range. Here’s the TRIMMEAN Filter formula:

{FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)}

This uses the 5th and 95th percentile formulas and only returns True when the Customer-level sales are less than the 95th percentile and greater than the 5th percentile. We can visually validate it by dropping it on the Color Shelf:

Screen Shot 2016-07-12 at 4.06.37 PM
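The filter logic can be sketched in Python with hypothetical customer totals and cutoffs. The point of the sketch is the AND: with OR, every customer would pass, because any value is either below the 95th percentile or above the 5th:

```python
# Hypothetical customer totals and cutoffs (e.g. from the 5th/95th percentile
# calcs); the numbers are illustrative only.
totals = {"A": 15.0, "B": 30.0, "C": 1.0, "D": 100.0, "E": 20.0}
p05, p95 = 3.8, 86.0


def trimmean_filter(total, low, high):
    """TRIMMEAN Filter: True only when the customer total is strictly
    inside (low, high). Both conditions must hold, hence `and`."""
    return total < high and total > low


keep = {c: trimmean_filter(t, p05, p95) for c, t in totals.items()}
print(keep)  # {'A': True, 'B': True, 'C': False, 'D': False, 'E': True}

# Using OR by mistake keeps everyone, so nothing would be trimmed.
keep_or = {c: (t < p95 or t > p05) for c, t in totals.items()}
print(all(keep_or.values()))  # True
```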

Now that we have this, the next step is to calculate what the trimmed mean should be. Again, we can use a view with a reference line; this time it’s been filtered using the TRIMMEAN Filter calc, and the reference line is an average:

Screen Shot 2016-07-12 at 4.08.16 PM

Now we can embed the TRIMMEAN Filter formula inside an IF/THEN statement so that it only returns the sales for the customers that pass the filter. This is the Trimmed Sales calc:

IF {FIXED [Customer Name] : SUM([Sales])} < 
   {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
AND {FIXED [Customer Name] : SUM([Sales])} > 
   {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
   [Sales]
END

And here it is in the workout view, only returning the sales for the customers that remain after trimming:

Screen Shot 2016-07-12 at 4.15.36 PM
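A Python sketch of the same IF/THEN on hypothetical rows shows what the workout view is doing: trimmed rows become Null (None here), and summing per customer leaves Null for the trimmed customers. The keep flags are assumed rather than computed:

```python
# Hypothetical order-item rows and assumed per-customer keep flags.
records = [("A", 10.0), ("A", 5.0), ("B", 30.0), ("C", 1.0),
           ("D", 50.0), ("D", 50.0), ("E", 20.0)]
keep = {"A": True, "B": True, "C": False, "D": False, "E": True}

# Record-level Trimmed Sales: IF <filter> THEN [Sales] END.
# With no ELSE, trimmed rows get Null (None here).
trimmed = [(c, s if keep[c] else None) for c, s in records]


def sum_ignoring_null(values):
    """Like SUM(): skip Nulls; a group of only Nulls stays Null."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


# Workout view: SUM(Trimmed Sales) per customer.
by_customer = {}
for c, s in trimmed:
    by_customer.setdefault(c, []).append(s)
sums = {c: sum_ignoring_null(v) for c, v in by_customer.items()}
print(sums)  # {'A': 15.0, 'B': 30.0, 'C': None, 'D': None, 'E': 20.0}
```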

Now that we have the trimmed sales, there are two ways we can go. If we want the trimmed mean without Customer Name in the Level of Detail, we can validate it in our workout view by using Tableau’s two-pass Grand Totals to get the average of the customer-level trimmed sales. This was created by:

  1. Removing the TRIMMEAN Filter pill from Colors (the pill increases the viz LOD and is no longer necessary).
  2. Clicking on the Analytics tab.
  3. Dragging out a Column Grand Total.
  4. Right-clicking the SUM(Trimmed Sales) pill on Measure Values and setting Total Using->Average.

Scrolling down to the bottom, we can see that the overall trimmed mean of 2,600.79 matches the one from the reference line.

Screen Shot 2016-07-12 at 4.20.33 PM

Note that we could have used the Summary Card instead; however, using the Grand Total lets us see exact values.
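The two-pass total boils down to averaging the per-customer sums rather than the rows. A minimal Python sketch with hypothetical per-customer values, assuming Nulls are skipped the way Tableau's AVG() skips them:

```python
# Hypothetical per-customer SUM(Trimmed Sales) values from the workout view;
# None marks customers removed by the filter.
customer_sums = {"A": 15.0, "B": 30.0, "C": None, "D": None, "E": 20.0}

# Two-pass grand total with "Total Using -> Average": take the per-customer
# sums first, then average those sums, skipping Nulls.
present = [v for v in customer_sums.values() if v is not None]
trimmed_mean = sum(present) / len(present)
print(round(trimmed_mean, 2))  # 21.67
```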

There’s a problem, though: if we use Trimmed Sales on its own in a view, it breaks, whether we use SUM() or AVG():

Screen Shot 2016-07-12 at 4.25.49 PM

The reason is that Trimmed Sales is a record-level value and Superstore is at the level of detail of individual order items, but we’re trying to compute the trimmed mean across Customer Names. For the true trimmed mean in this case, we need to aggregate the trimmed sales to the Customer Name level, as we did in the workout view. Here’s the Trimmed Sales (Customer Level) formula, which wraps the Trimmed Sales calc in an LOD to get the Customer-level sales:

{FIXED [Customer Name]: SUM(
    IF {FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
    AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
       [Sales]
    END)
}
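To see why the record-level field gives the wrong answer, here is a small Python sketch with hypothetical rows. Averaging order items weights each customer by how many items they bought, while the trimmed mean should count each customer once:

```python
# Hypothetical order-item rows with record-level Trimmed Sales
# (None = the row belongs to a trimmed customer).
trimmed = [("A", 10.0), ("A", 5.0), ("B", 30.0), ("C", None),
           ("D", None), ("D", None), ("E", 20.0)]

values = [s for _, s in trimmed if s is not None]

# AVG(Trimmed Sales) with no dimensions averages order items, not customers.
record_avg = sum(values) / len(values)  # 65 / 4

# The trimmed mean we want: total per customer first, then average.
per_customer = {}
for c, s in trimmed:
    if s is not None:
        per_customer[c] = per_customer.get(c, 0.0) + s
customer_avg = sum(per_customer.values()) / len(per_customer)  # 65 / 3

print(record_avg, round(customer_avg, 2))  # 16.25 21.67
```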

This returns the same results in the workout view:

Screen Shot 2016-07-12 at 4.31.49 PM

And works all on its own in a view:

Screen Shot 2016-07-12 at 4.32.32 PM

Now this is a case where the FIXED Level of Detail Expression returns different results depending on the level of detail of the view. If we want it to return the same result regardless of the view, we can wrap all of that in one more LOD expression; this is the TRIMMEAN Fixed calculation:

{FIXED : AVG(
    {FIXED [Customer Name]: SUM(
        IF {FIXED [Customer Name] : SUM([Sales])} < 
        {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
        AND {FIXED [Customer Name] : SUM([Sales])} > 
        {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
            [Sales]
        END)
    })
}

And here it is in the workout view and a view without any dimensions:

Screen Shot 2016-07-12 at 4.34.36 PM

Screen Shot 2016-07-12 at 4.35.35 PM
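Conceptually, the extra FIXED wrapper computes one number and attaches it to every row, so aggregating it within any grouping returns that same number. A Python sketch with hypothetical rows and a hypothetical Region column; treating the outer AVG as one value per customer is an assumption about how the nested LOD is evaluated:

```python
from collections import defaultdict

# Hypothetical rows: (customer, region, customer-level Trimmed Sales).
# The customer-level value repeats on each of that customer's rows;
# None marks trimmed customers.
rows = [("A", "East", 15.0), ("A", "West", 15.0), ("B", "East", 30.0),
        ("C", "West", None), ("D", "East", None), ("D", "West", None),
        ("E", "West", 20.0)]

# {FIXED : AVG(<customer-level calc>)}, ASSUMED here to average one value
# per customer, skipping Nulls.
per_customer = {c: v for c, _, v in rows if v is not None}
trimmean_fixed = sum(per_customer.values()) / len(per_customer)

# The outer FIXED with no dimensions attaches that single value to every row...
rows_with_fixed = [(c, r, trimmean_fixed) for c, r, _ in rows]

# ...so averaging it within any grouping (here, region) returns the same number.
by_region = defaultdict(list)
for _, region, v in rows_with_fixed:
    by_region[region].append(v)
region_avgs = {r: sum(v) / len(v) for r, v in by_region.items()}
print({r: round(a, 2) for r, a in region_avgs.items()})  # {'East': 21.67, 'West': 21.67}
```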

 

 

Final Comments

This is a good (and bad) example of how Tableau is different from Excel. On the bad side, note that I didn’t parameterize the percentage for the trimmed mean: in Tableau that would require two parameters (one for the low cutoff and one for the high cutoff), because we can’t use calculations as the arguments to the PERCENTILE() function. Also on the bad side, the calculation requires understanding Level of Detail expressions and isn’t wrapped into a simple formula. On the other hand, those Level of Detail expressions give us very precise control over what the calculation computes over, and we aren’t limited to trimmed means: we could do trimmed medians, get the Nth percentile of the trimmed values, etc.
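Outside Tableau, the single-parameter version is easy because the upper cutoff can be derived from the lower one. Here is a hypothetical Python sketch that returns both a trimmed mean and a trimmed median. The function name and the whole-percent parameter are my own, and the interpolation may not match Tableau's PERCENTILE():

```python
import statistics


def trimmed_stats(totals, trim_pct):
    """Hypothetical helper: trimmed mean and trimmed median from ONE parameter.

    trim_pct: whole-number percent cut from EACH end (e.g. 5 keeps values
    strictly between the 5th and 95th percentiles). The upper cutoff is
    derived as 100 - trim_pct.
    ASSUMPTION: "inclusive" interpolation; Tableau's PERCENTILE() may differ.
    """
    if not 1 <= trim_pct <= 49:
        raise ValueError("trim_pct must be between 1 and 49")
    cuts = statistics.quantiles(totals, n=100, method="inclusive")
    low, high = cuts[trim_pct - 1], cuts[100 - trim_pct - 1]
    kept = [t for t in totals if low < t < high]
    if not kept:
        raise ValueError("no values left after trimming")
    return statistics.mean(kept), statistics.median(kept)


# Hypothetical customer totals.
mean, median = trimmed_stats([15.0, 30.0, 1.0, 100.0, 20.0], 5)
print(round(mean, 2), median)  # 21.67 20.0
```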

Here’s the trimmed mean workbook on Tableau Public.

Conditional Formatting

Conditional Formatting Equivalents

The most complete post on conditional formatting in Tableau is this one: http://drawingwithnumbers.artisart.org/older-but-still-useful-conditional-formatting/, which describes the multiple-axis crosstab here: http://public.tableausoftware.com/views/conditionalformattingv4/Introduction. Other than the Formatting Numbers section, almost everything else is covered in that post and workbook.

Building a Measure Names/Values Table – Have only One Measure on Measure Values Card

Put Measure Names on the Filters Shelf first and keep only the desired measure(s). This way it’s possible to have just one measure on Measure Values and still have a header. http://community.tableausoftware.com/message/214871#214871

Formatting Numbers

http://www.clearlyandsimply.com/clearly_and_simply/2011/04/tableau-quick-tip-2-custom-number-formats.html#more

http://community.tableausoftware.com/thread/118621

Creating custom color palettes
http://kb.tableausoftware.com/articles/knowledgebase/creating-custom-color-palettes

highlighting specific rows: Rapid Graphs by Stephen McDaniel and Eileen McDaniel, http://www.freakalytics.com

aligning shapes for marks: mark_align.twbx file by Joe Mako in http://community.tableausoftware.com/thread/113351

setting up custom highlight table: margin_zone.twbx file by Joe Mako in http://community.tableausoftware.com/thread/110007

http://www.tableausoftware.com/support/forum/topic/conditional-formatting-columns-cross-tabs-conditional-appearance-marks-0
http://www.tableausoftware.com/support/forum/topic/conditional-formatting
http://www.tableausoftware.com/support/forum/topic/conditional-background-formatting
http://www.tableausoftware.com/support/forum/topic/conditional-formatting-columns-cross-tabs-conditional-appearance-marks
http://www.tableausoftware.com/support/forum/topic/conditional-formating-measures
http://www.tableausoftware.com/support/forum/topic/how-can-i-do-these-things
http://www.tableausoftware.com/support/forum/topic/shape-alignment#comment-39152

Make an “empty” header:
http://www.tableausoftware.com/support/forum/topic/output-particular-format-column#comment-43090

Force Tableau to show text headers in a row and not hide them in cross-tab display:
http://community.tableausoftware.com/message/172574#172574

Removing last column (Abc column)

http://community.tableausoftware.com/message/174253#174253
http://kb.tableausoftware.com/articles/knowledgebase/removing-abc-placeholders

Crosstabs and Text Tables

Putting more columns than the default 6 on tables – from the menu bar, select Table->Options and increase the number of columns


Preparing Data

Key bits on migrating from Excel

http://kb.tableausoftware.com/articles/knowledgebase/preparing-excel-files-analysis
http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel

anonymizing data for Tableau

http://community.tableausoftware.com/thread/118843

DATE() and STR() functions behave differently depending on the data source (Excel/text/Access vs. extract)

http://community.tableausoftware.com/message/179624

DATEPARSE() for converting different date formats

DATEPARSE-ing Tips for Successful Speed Dating with Tableau 8.1

 

Joining Vs. Blending

http://www.tableausoftware.com/support/knowledge-base/join-vs-relationship-60
http://www.tableausoftware.com/support/knowledge-base/summary-and-detail-60

When one data source has subcategory and measures, and another has category and subcategory, you can use primary groups to effectively create the category within the first data source. The KB has an article on this; search for “creating primary groups from a secondary source”.

Data blending and getting aggregations right:

Workbook with three options (extract, control data relationships for blend, custom SQL) by Jeff Mills:
http://community.tableausoftware.com/message/183888#183888
http://community.tableausoftware.com/thread/118319

Creating a Tableau Data Source .tds for connecting to just the database and not the table by Craig Bloodworth:
http://www.theinformationlab.co.uk/2012/11/26/connect-to-a-database-without-specifying-a-table-video/

Outer joining multiple Excel tables
https://www.interworks.com/blogs/tmccullough/2013/06/26/outer-joining-multiple-excel-tables-tableau-8

Custom datasource config files:
http://community.tableausoftware.com/message/227764


Stacked Bars and Lines

Bars and Lines

This is a post about getting the output you want, despite what Tableau thinks.
This is a post about making Tableau do what Excel can do (whether it’s a good idea or not).
This is a post about gaining better understanding of dimensions, measures, continuous, and discrete.
This is a post about putting bars and lines on the same chart.