Tag Archives: padding

Seismometer-style animations in Tableau v2020.1

Note: this post uses a lot of animations and might be slow to load. Also the post was updated about 2 hours after publishing thanks to a tip from Pauli Isaacs @ Tableau about thinking about viz Level of Detail and what makes a markID.

I use a heart rate variability monitor and yesterday I was watching the graph on the app (sample below):

At first I was thinking of this as a “pulse rate” visualization but that style of monitoring has the trace of the pulse rate going from left to right, and then I realized it’s more like a seismometer or lie detector where the trace is drawn on the right while the paper is pulled to the left.

It takes a little bit of data preparation but we can get that effect in Tableau with the new #VizAnimations animated transitions feature coming in v2020.1. Here’s a view using climate data from the Met Office Hadley Centre showing 170 years of variance from the 1960-1990 global mean temperature:

I really like how having the marks “come in from the right” makes it seems like something is happening “right now”, and as the marks get to the late 20th century to the first couple of decades of the 21st the rising temperature change has a bigger impact. This kind of effect wasn’t really possible in earlier versions

Read on to find out how to apply this to your own data!

Overview

There are several steps to this method:

  1. Identify the dimension that you are using on Pages and the sort order. In this case that’s the Year and we’re going in ascending order. Then pad out the data so that each of those values of that dimension has all the data from the preceding values. So for each Year we’ll do a self-join to get the prior years.
  2. Create calculated fields to identify the latest/last value of the dimension on Pages and create a common x axis.
  3. Adjust the animation speed so that it doesn’t overly animate.
  4. Create calculated fields to act as a display x axis.

Note that technically this method will work for versions of Tableau prior to v2020.1, however with the animation controls it’s smoother.

1. At the core: a self-join to pad out the data

If you’ve ever tried to use the Pages Shelf in Tableau to draw a line mark over time you’ve ended up with something like this:

There’s a workaround where instead of using a line mark type we can use circles or something else and turn on Show history for Pages with mark trails and get a “line”, but the color of the trail depends on the color of the current mark, so the Color Shelf isn’t very useful:

Also if we hover over the mark trail there is no tooltip since there’s only one mark displayed.

Now a way I like to think of the Pages Shelf in Tableau is that it’s like a flip book:

https://images.app.goo.gl/FzTeRSi8ddcfq1aU6

The Pages Shelf is slicing the view into a set of “pages” based on the values of the field(s) on Pages. Since Year is on Pages that means we see one year for each Page, and in this case since Year is the only dimension in the view (setting the viz level of detail aka vizLOD) there’s only one mark for each year.* Mark trails are a convenience that Tableau adds.

* I’m skipping the other factor in the number of marks that is the number and layout of continuous fields on Rows/Columns and measures on Measure Values. #tableaunerdybits

Therefore if we think of going left to right with the years and adding a mark for each additional year of data what we really want to do is display all the years preceding each year as we add each new year. For example in this data set in 1850 we just have 1850, in 1851 we want to display the data for 1850 & 1851, in 1852 we want the data for 1850, 1851, and 1852, and so on. So we need to pad out 1851 with two years, 1852 with three years, and so on.

I term the field that we’re paging by the “Page” field, so “Page Year” in this case, and then the version that is getting displayed the “Display” field, so “Display Year” in this case.

The way we can set up this data source in Tableau is with a non-equi-join on the Year, which is supported in Tableau Desktop and Tableau Prep for most all data sources:

Note that the non-equi-join is doing a form of cross product on the data, so if you have a lot of original data you may need to aggregate first using a custom query or view, or Tableau Prep or some other tool. Here’s a screenshot of the resulting data, not the replication of the Display Years across the Page Years:

Once we’ve got that then in a few clicks we can get a left to right animation that preserves the color, has tooltips, etc. (This animation is running slower because it’s using default settings).

2. Setting up the right to left seismograph animation

In order to build the desired view we can’t just reverse the axes because we need to preserve the time order of the data. What we really want to do is draw the first mark (1850) at the right-most edge of the x axis. Then for the next Page Year 1851 we want to draw 1850 just to the left and then 1851 at the right-most edge, and so on.

Another piece to know about the Pages Shelf is that it draws all of the discrete headers & continuous axis ranges in advance to the full set of values/widest extent. Therefore we need to make a common axis for all of the marks. This leads to a level of abstraction where instead of plotting the marks based on the year we’ll indirectly do that using some math where the first Page Year & Display Year 1850 is at position X, then in Page Year 1851 the Display Year 1850 will move to position X-1 (i.e. to the left) and Display Year 1851 will be at position X, and so on.

To do this I created a Max X Position calculation with using a FIXED level of detail expression and the formula {FIXED : COUNTD([Page Year])} to identify the number of years.

Then the X Position calculation has the formula [Max X Position] – ([Page Year] – [Display Year]). This appropriately offsets each year, as in this workout view:

I also added a Year Label calculation with the formula IF MIN([Display Year]) % 10 = 0 THEN MIN([Display Year]) END, this will be used later to simulate an X axis.

A couple of notes on this:

  • This formula is assuming that the dimension used on Pages is sequential and has no gaps, if there were then you’d need a different calculation for the Max X Position.
  • It could also be possible to get the desired formula using a series of table calculations (and potentially have higher performance in some situations), in this case I skipped that due to the complexity of the calculations; directly using the Year values in the calculations feels more “direct” to me.

With that in place we can use the Page Year on Pages, the Display Year on Dtail and ATTR(X Position) on Columns and see the view going from right to left:

We’ll fix the X axis further below, the next step is to get the desired animation behavior.

3. Moving from “worm-like” to fully “sliding”

With the default animation settings (note this is the v2020.1 beta, the final display might change) as the new marks are added to the right the animation slides left, but the right-most mark is “extruded to the right, then up or down” and looks to me like a worm searching for food. This is due to the Sequential animation that is animating all the marks in sequence.

The solution is to change the animation Style to Simultaneous, and personally I like the look of changing the Duration to a Custom setting of 0.1.

Here’s a GIF with the revised setting:

Now we’re getting somewhere!

4. Setting up an X axis for display

The last piece is to set up a working X axis. Since the X Position calculation is “fixing” the X axis for the Pages Shelf we can’t just use that X axis, instead we need something else. In this case I’m turning to a labeling technique that I learned from Ramon Martinez’s Overweight & Obesity viz Workbook: Overweight and Obesity across Countries where we use a dual axis. In this case the synchronized second axis is using a type in calculation to set the Y location and then the Year Label calculation is on Text. Since the Year Label calculation only returns every 10th year then it appears to be a label:

After building & verifying that it works then we can turn off Show Headers and do other formatting to get the final view:

Conclusion

As I’d written earlier in this post I really like the effect of the marks coming in from the right instead of the left as a way to help telling the story of the data; besides making for a common “entrance” for each new year’s data the sliding effect of moving the earlier years’ marks to the left helps remind me of the “what happened before” and keep that trend in mind, which in the case of climate data is getting quite extreme. I’ll definitely be using this method as time goes on.

I’ve also got a couple of ideas for future exploration…the first is to build a truly “moving window” like we see in an ongoing seismometer or polygraph readout. The second idea is to extend the “axis for display” concept to enable custom zoom effects on the Y axis to help draw attention to changes in the data.

Here’s a link to the packaged workbook on Dropbox: global temp variance seismograph.twbx (I’ll post to Tableau Public once it’s updated to 2020.1).

When 576 = 567 = 528 = 456: Counting Marks

Tableau’s data densification is like…nothing else I’ve ever used. It’s a feature that is totally brilliant when it “just works” like automatically building out a running sum on sparse data and mind-taxingly complicated when a data blend’s results go haywire because densification was accidentally triggered.

What I’ve historically taught users is to always ALWAYS look at the marks count in the status bar as a first way to detect when data densification occurs. Here’s Superstore Sales data with MONTH(Order Date) on Columns, Region and State on Rows, there are 499 marks and we can see that the data is sparse by the class that are missing Abcs:

Screen Shot 2016-08-16 at 11.52.15 PM

If I add SUM(Sales) to the Level of Detail Shelf and set it to a Running Total Quick Table Calculation with the default Compute Using of Table (Across) so it’s addressing on Order Date then I see 576 marks and all the Abcs are filled in, this is Tableau’s data densification at work:

Screen Shot 2016-08-16 at 11.55.19 PM

However, here are three additional views all still using the same pill layout and Quick Table Calculations  showing three different marks counts (567, 528, and 456):

Screen Shot 2016-08-16 at 11.59.11 PMScreen Shot 2016-08-17 at 12.00.55 AM

The marks count is changing based on a variety of factors, the different quick table calculations used (running total, difference, and percent difference) are a part of it but the underlying behavior depends on whether a mark is densified or not, the pill arrangement, and whether or not a densified mark has been assigned a value (including Null) or not. Prior to Tableau version 9.0 these all would have been counted in the marks count and the views would show 576 marks for each, Tableau v9.0 changed the behavior to only count the “visible” marks.

I’ll walk through the above there examples. In this one the Running Total has been moved from the Level of Detail to the Rows Shelf and there are 567 marks.

Screen Shot 2016-08-16 at 11.59.11 PM

The reason why is that even though those combinations of Region, State, and Month have been densified for states like Iowa that don’t have any sales in the first month(s) of the year (more on how I know that below) those densified marks don’t have any assigned value (even Null) so they are not counted in the marks count nor are they counted in the Special Values indicator in the lower right.

In this view using the Difference calculation there are 528 marks and the Special Values indicator shows 48 nulls (528+48 = 576). In this case the Difference calculation is using the LOOKUP() function that is returning Null for the densified values.

Screen Shot 2016-08-16 at 11.59.11 PM

Finally in this view using the % Difference calculation there are 456 marks and the Special Values indicator shows 120 nulls (456+120 = 576). In this case the % difference calculation is spitting out extra nulls due to divide by 0 results.

Screen Shot 2016-08-16 at 11.59.11 PM

The difference is due to a change made in Tableau v9.0 where the marks count now only counts “visible” marks (Tableau’s term), where the definition of a “visible” mark is complicated, they are the “Yes” answers in the table below:

Screen Shot 2016-08-17 at 12.09.17 AM

Now one of the ways I’ve been used to checking for densification is selecting all the marks (either by Right+Clicking and choosing Select All or pressing Ctrl/Cmd+A) and then hovering over a mark and Right+Clicking and choosing View Data… or waiting for the tooltip to come up and using View Data. For example here’s the select all View Data in v9.0 for the % Difference on Rows view, the yellow cells indicate where data was densified and there are 576 rows:

Screen Shot 2016-08-17 at 12.12.24 AM

However, that doesn’t work anymore in Tableau v10.0, there was change made to the Select All functionality such that Select All only gets the “visible” marks, here’s that same view data in v10 and there are only 456 rows:

Screen Shot 2016-08-17 at 12.12.58 AM

So Select All doesn’t work the way it used to, and the marks count can change in “interesting ways” (and we haven’t gone into what things like formatting Special Values can do), so what can we do to spot densification? There are three workarounds for this, all documented in the right-most column of the table above:

  1. Select a discrete header or a range of headers, wait for the tooltip to come up, and click on the View Data icon.
  2. Right-click in the view (but not on a mark) and choose View Data…
  3. Use the Analyis->View Data… menu option.

All of these will show the densified values, here’s an animated GIF of selecting Iowa selected in the Difference on Rows view where we can see the  two Null values:

2016-08-17 00_21_03

However only one of those is actually densified, to tell that exactly we need to add a field that actually has data. In this case I’ve added SUM(Sales) to the Level of Detail Shelf and the View Data for Iowa now shows that it’s really only January that is densified, since there’s nothing at all in the January SUM(Sales) cell:

Screen Shot 2016-08-17 at 12.27.28 AM

Conclusion

The marks count is not a reliable indicator of the volume of densification and we need to resort to various selection mechanisms and the View Data dialog to more specifically identify how much has been densified. I’m not a fan of these changes: what I’d really like Tableau to do is to add a count of densified values to the status bar and details on what was densified to the default caption and the Worksheet->Describe Sheet… Until that time, though, hopefully this post will help you keep track of what Tableau is doing!

Here’s a link to the marks count workbook in v8.3 format (so you can open it up for yourself and see the differences in different versions).

Counting Pairwise Similar Votes in Tableau

This is another post inspired by a Tableau forums thread. Given a set of survey data that is in a “tall” format with a record for each voter & item (survey question) with their vote the goal is to end up with the sum of matching votes for each pair of voters. So if John & Karen both voted ‘yes’ on the same question that would count as 1 for that question, and then all other matching votes for the questions that John & Karen answered would be totaled up and that number put in a cell for the combination of John & Karen, like so:


The easiest way to do this would be using a join; however the data is from an OData source and those don’t support joins. Also data from OData sources has to be extracted and Tableau doesn’t currently support joining across extracts. The original poster indicated that doing any ETL wasn’t possible, the desire is to have everything just work in Tableau. So we turn to some alternatives, read on for how to build this with and without joins.

Approach

The way I approach this kind of problem is first to understand the goal and understand the data. The data seems pretty clear, and the goal is to end up with a matrix defined by the voter on Rows & Columns. So however many records there are in the data we want to see N^2 values where N is the number of voters. Given that the data source is OData (so no custom SQL) my first thought was to use the No-SQL Cross Product via Tableau’s data densification. That would requiring densifying both the voters (to make the matrix) and the items (to do the comparisons for each voter/item) and my initial attempts got way too complicated way too quickly so I bailed out on that. I came up with a slightly modified solution involving Tableau data blends, however I’m going to go through this first using a join-based solution because it’s easier to describe some of the subtleties involved (plus that will work for many data sources) and then the second time around with the blend-based solution

Join Solution

In this solution I set up the data so it has everything we need – all the combinations of votes and voters – then all we need to do is count records. Since we want to set up pairs of voters for each item, I set up a self-join on item:

This gets us the 47 combinations of voters & votes in the data. Now we can set up a view with the Voter dimensions from the original and the join:

Note that there are some empty cells here: the pairs Tom & Steave and Tom & Toney didn’t vote on any of the same items at all. We’ll come back to this later.

We only want to count voters that had the same vote, so the following Pairwise Vote Filter calc will return only those votes:

[Vote] = [vote (Sheet1$1)]

With that on the Filters shelf, we can set up a view using SUM(Number of Records):

There’s a bunch of empty cells here, what if we want 0’s to show? We can use Format->Pane Tab->Special Values->Text, but that will only work where there is data and we know there are some cells that don’t have data. To get those cells to be marks we can take advantage of Tableau’s domain completion by having a table calculation address on one of the voter dimensions.

We can use a simple table calculation like INDEX() (the field is called Domain Completion Trigger) and the default Compute Using of Table (Across) will address on the voter (Sheet1$1) dimension, padding out the marks:

With that in place we can now build the final view for the join. I set Format->Pane tab->Special Values->Text to be 0, changed the Mark Type to Square, edited the color to use a custom diverging palette (starting at 0), and turned off “Allow labels to overlap other marks” to have Tableau auto-swap the text color so the darker cells have white text:

So the data didn’t have quite all the granularity that we needed for display and we had to turn on data densification with a table calculation to pad it out. In the next section we’ll use Tableau’s ability to do even more padding.

Blend Solution

This uses a different approach. In this case, we set up the view so it has all the marks that we need (but not quite all the marks we’d want), blend in the data for the each half of a pair of voters and then use calculated fields to compute across the data and  “paint” the right values into the marks. It uses the original data as a primary source and then the domain completion technique outlined in the No-SQL Cross Product post to effectively get the necessary marks, then uses two self-data blends to get the comparison data that take advantage of the fact that Tableau data blends are computed after densification. For more information on that, see the Extreme Data Blending session from the 2014 Tableau Conference. https://tc14.tableau.com/schedule/content/1045.

Starting out I duplicated the Voter twice, naming one Voter (Rows) and Voter (Cols), then put those on Rows and Columns, respectively. We only see the 5 marks for the 5 voters:

Then we can use the same INDEX() calc to trigger domain completion:

We need to do the comparison at the level of voter *and* item, and for Tableau to compare across data sources the comparisons have to be done as aggregates, so that means that item has to be in the view. When we add Item to the view what we’re seeing in each cell is a mark for each time the the voter on Rows and Voter on Columns both had votes for the same Item. There are 47 marks here, just like the 47 rows we got from the self-join solution.

A problem here is that we lose some of the domain completion, we’ll work around that. (Where I’d tried to start was to do the second set of densification necessary domain complete on Item as well, but that got too complicated.)

The view just got a lot bigger here, that’s because of Tableau’s mark stacking behavior. We’ll fix that later with a table calculation filter.

Now we can set up a couple of self-blends by first duplicating the data source twice. It’s also possible to duplicate the data connection only (for example by directly connecting to the extract), however that requires more effort to set up. I named the duplicated sources Rows and Cols, and in each created calcs for Voter (Rows) and Voter (Cols), respectively. Then I could add in the Vote fields from each source and Tableau automatically blends the Rows source on Item, Voter (Rows) and blends the Cols source on Item, Voter (Cols). If I hadn’t named the fields the same then I could have used Data->Edit Relationships… Here’s a view showing for each voter pair the vote (Item), the votes from Rows, and votes from the Cols source:

This view lets us see what votes line up with what. So in row karen/column john, we can see that they both voted for items 21, 25, and 32, and had the same votes for each.

The next step is to build a test calculation for the view. Here’s the formula for Pairwise Similar Vote Test:

IF ATTR([blend Sheet1 (test_voting) (rows)].[Vote])
 == ATTR([blend Sheet1 (test_voting) (cols)].[Vote]) THEN 1 ELSE 0 END

We’re using ATTR() as an aggregation because a) that’s the default and b) we are comparing fields from two different sources and Tableau requires them to have some sort of aggregation applied.

In the view, we can see that the calculation is working accurately:

The Vote dimensions from the secondary are useful for checking the calcs, but they aren’t needed at this point so we can get rid of them:

Now to count up the votes in each cell. Here’s the Pairwise Similar Votes w/0 table calculation:

IF FIRST()==0 THEN
    WINDOW_SUM(IF ATTR([blend Sheet1 (test_voting) (rows)].[Vote])
      == ATTR([blend Sheet1 (test_voting) (cols)].[Vote]) THEN 1 ELSE 0 END)
END

This has a Compute Using on the Item so it partitions on Voter (Cols) and Voter (Rows). The inner IF statement is our same calc, those results get summed across all Items in each partition, and then the IF FIRST()==0 returns only a single non-Null value in each cell. Here it is:

We can then duplicate that view, make the marks Square, duplicate the Pairwise Similar Votes pill to the Color Shelf, set up a custom diverging color, duplicate the pill again to the Filters shelf to set it to filter for non-Null values, and we end up with this:

There are those empty holes where there are no Items for Tom & Steave and Tom & Toney. There’s no way that I know of using this particular blend to fill them in, because Item has to be a dimension in the view the domain completion is limited. This might be useful in some cases, I also came up with an alternative.

In this alternative instead of returning 0 when there are no pairwise similar votes the calc returns Null, here’s the revised Pairwise Similar Votes formula:

IF FIRST()==0 THEN
    WINDOW_SUM(IF ATTR([blend Sheet1 (test_voting) (rows)].[Vote])
      == ATTR([blend Sheet1 (test_voting) (cols)].[Vote]) THEN 1 END)
END

This has the same settings as the first, only now it won’t show any numbers. Then using the same process as before along with tweaking the color palette to start at 0 we can have a view that only shows where there are non-zero results, with white for everything else:

Conclusion

So there’s a couple of ways to go at this, the relatively easy way with a join and the more complicated way with the data blend. Personally, I’m in favor of voting up the Join Data from Different Sources feature request to allow joins across data sources, then even something like an OData source could be extracted twice and joined to create the desired view.

And the Tableau Public link: Pairwise Similar Votes.

Padding and Working with Null or Missing Values

Padding, Null/Missing Values

When We Need to Pad Data Outside Tableau

Some cases when we need to pad data “outside” Tableau, either in the data connection or underlying queries.

  • When doing future projections, in order to get future dates.
  • When combining two data sources that are both incomplete. An example is the basic falls view, where we want to look at fall rates and # of falls by unit over time. Some units may have no falls in the time period being reviewed, so we need to pad for the unit so that is always displayed. In addition, if there are no falls for the given time period for the unit, then Tableau won’t display headers for discrete dates for the unit, so we need to pad for the date. In addition, there might not be falls in a given unit when we don’t yet have the patient days (denominator for the fall rate), so we need to be sure to pad the date to get to the current date.
  • In order to get lines to draw properly in time-series data. (see a discussion link somewhere here)

More on padding data from Joe Mako – using Custom SQL to pad data:
http://community.tableausoftware.com/thread/104070?start=30&tstart=0

Most everything else in this section is about padding data in Tableau…

Padding Data

One way:
ZN(LOOKUP(SUM(IF [fieldname] = [value] THEN 1 ELSE 0 END),0))

Padding event information to get lines to draw
http://community.tableausoftware.com/message/174518#174518

Using ZN(LOOKUP()) to pad to get an area chart to draw:
http://community.tableausoftware.com/thread/119225

Use ZN() function to fill in zeroes (for example, to get the ref lines to show up) http://www.tableausoftware.com/support/forum/topic/filling-zero-blank-values

Working with Null/Missing Values

Post on how to create an average that includes setting missing days as 0:
http://community.tableausoftware.com/thread/116248

Getting from an “empty” cell to data:
http://community.tableausoftware.com/message/173105
ZN(), IFNULL(), etc. work when there is a Null value. When there is no data, LOOKUP(agg([field]),0) returns Null when there is no data and that can be wrapped in the ZN, etc. Alternatively the Format special values text can be used.

From http://community.tableausoftware.com/message/176503#176503
When working with blends where the secondary datasource has no corresponding rows, can use IFNULL(ATTR([secondary field])) as a calc in the primary to determine if there’s data returned, for example the following to return 0 for no value else return a value:
IF ISNULL(ATTR([secondary datasource field])) THEN 0 ELSE SUM([secondary datasourcefield]) END

more on padding data (getting Null values so Tableau will draw lines correctly):
http://community.tableausoftware.com/thread/104070?start=0&tstart=0

More on padding (at least for Tableau 6), from http://www.tableausoftware.com/support/forum/topic/trellis-plots comments by Joe Mako:

When dimensions are on Rows & Columns shelf, Tableau pads the domain. When one dimension is on the columns shelf, and one on rows shelf, tableau doesn’t pad the domain.

Having both dimensions used for addressing (right-side list box in Advanced…->Compute using) effectively causes Tableau to pad the domain.

Notes from Mar 2012 WebEx w/Joe Mako

I don’t want to misrepresent Joe’s awesome knowledge here, if there are any mistakes in these notes from our conversation they are almost certainly mine.

Joe notes there’s some strangeness when there’s there are distinct groups, i.e. a one-to-many hierarchy like category/sub-category in superstore sales. Joe walked me through an example: Put Sub-category on columns, SUM(Profit) on rows, Category on LoD. Have 17 marks in a Bar chart. Change mark type to line, suddenly there are 51 marks.

Also, show missing values can do weird stuff in Tableau 7.0.

For dates, use DATEPART/DATETRUNC instead of Tableau’s options because Tableau’s options are harder to use for table calcs.

If a combination of dimensions doesn’t exist, Tableau can get weird fast with padding.
Example w/SuperStore Sales:
Put Customer on Columns, Container & Category on Rows. See blank cells, and Abcs, with 4514 marks. Create Index() calculated field, and put that on the Text shelf. Suddenly there are 15900 marks.

also, see table calc padding.twbx workbook

From Joe Mako post http://community.tableausoftware.com/message/186882#186882
– Arrangement of dimension pills on shelves, Rows, Columns, and shelf on the Marks card, eg making a crosstab
– Using a date pill for the compute using
– Show Missing Values for discrete date pills
– Your Mark type, Line and Polygon mark types will pad more with a dimension on the Level of Detail shelf, depending on your data contents.

Each combination here has different results and affects padding differently. Another factor is how is padded marks can be detected and handled either in filtering or in formulas, because the concept of “Missing” is not equal to Null. The order of operations needs to be considered.

Controlling padding/densification

Use discrete measures on Rows & Columns to prevent unwanted densification, post from Joe Mako: http://community.tableausoftware.com/message/223044

There’s a lot of undocumented functionality in terms of padding within Tableau, so be careful about using these last bits.

[loop category=”wikicontent” tag=”padding”]
[field title] – Added [field date]
[content]
[/loop]

Related posts:

[loop tag=”padding,densification,domain-completion,domain-padding,missing-values,show-empty-columns,show-empty-rows,show-missing-values,null,empty” exclude=”this” relation=”and” compare=”not” taxonomy=”category” value=”wikicontent”]

  • [field title-link][field thumbnail-link]

[/loop]

Table Calculations

See Want to Learn Table Calculations? Here’s How for a list of posts that I recommend.

From http://www.tableausoftware.com/support/forum/topic/trellis-plots comments by Joe Mako:

1. First step in table calcs is what kind of pills you have:
a. aggregated continuous
b. aggregated discrete
c. dimension continuous
d. dimension discrete

2. What shelf these pill types are on
– each of these types has a different effect on the evaluation of a table calc depending on their arrangement.
– when you have a dimension on both the rows and columns shelf, Tableau will padd for missing dinension combinations, unless you are using a set based on these dimensions
– building your view with all dimensions on the rows shelf is a good place to start, and once results are as expected, then move pills one at a time to other shelf, and see how shelf placement effects evaluation.

3. How you set your compute using
– if you use the compute using selection from the context menu, or the drop-down in the dialog, and set it to a dimension, that dimension will be used for addressing, and all others will be used for partitioning. I call this Regular Compute Using
– when setting the compute using to something like Table/Pane/Cell, I call that Layout Compute Using, and try to avoid this because I find it to be a delicate setting (move a pill, move a dimension pill, and your evaluation will likely be effected).

– with Regular Compute Using, if all addressing values do not exist in all partitioning values, the INDEX() value for each addressing value will not be the same across partition values. This is apparent with not all combinations of dimension values exist. (In this situation, there are states that have no sales records for some months, so whit your compute using set to State, and partitioning on Month, each State will not have the same INDEX() value in each month.)

– when using a Regular Compute Using and a crosstab of the dimensions in use (one on the Rows, and the other on the Columns), and no Set based on those dimensions, Tableau will pad the data making each State have the same INDEX() value for each Month.

– So without the crosstab setup, but with the desire for Tableau to pad the data to cause the INDEX() function to evaluate as desired, you can have Tableau pad the data as desired with an Advanced Compute Using.

– When you use multiple dimensions on the Compute Using (right-side list box) for addressing, I think of it as like a pseudo-crosstab of all potential combinations of the dimension values (this is what we want in this situation). If you use a Set instead of multiple dimensions, that data will not be padded.

It is my current belief that if you are aware of what pills you have in use, what shelf they are on, how your compute using is setup, and how all these potential combinations of setups effect the evaluation of table calcs, then table calcs are straightforward.

Without doubt, I have not included all information needed to understand table calcs here, but only some key concepts that play a factor in the attached “(fixed)” worksheets.

(end of post)

Another couple of posts by Joe:
http://community.tableausoftware.com/message/175545
http://community.tableausoftware.com/message/136628#136628

Joe’s notes on 6.1 to 7.0 differences:
http://community.tableausoftware.com/thread/117103?start=0&tstart=0

LOOKUP() vs. PREVIOUS_VALUE

See here: http://community.tableausoftware.com/message/173752

Also this from email from Joe Mako:
Have you attempted the formula that you recommended at
http://community.tableausoftware.com/message/173588#173588 ?

Instead of ATTR([Status]) != PREVIOUS_VALUE(ATTR([Status]))

try ATTR([Status]) != LOOKUP(ATTR([Status]),-1)

The PREVIOUS_VALUE() function is self referential, kind of like a recursive function or a loop. The argument you pass the function is just the starting value for the partition, while the LOOKUP() function gets the value based on an offset, and an offset of -1 means previous.

In your statement, the first mark will result in False, effectively [Status] != [Status], and every other record will also result in True because it is likely [Status] != “False” or [Status] != “True”

The formula ATTR([Status]) != LOOKUP(ATTR([Status]),-1) is basically saying:
“Compare the Status of the current mark and the Status of the previous mark in the partition.”

I recommend you try some trial and error with the PREVIOUS_VALUE() function

Try some formulas like:

PREVIOUS_VALUE(0)+1
PREVIOUS_VALUE(100)-1
PREVIOUS_VALUE(0)+SUM([Sales])
PREVIOUS_VALUE("")+ATTR([Product])+", "

Ignore in Table Calculations

http://www.tableausoftware.com/fr-fr/support/forum/topic/aggregating-aggregate-histograms
http://community.tableausoftware.com/thread/114442
http://community.tableausoftware.com/thread/110683

Lots of Table Calc Links

http://www.tableausoftware.com/table-calculations
http://www.tableausoftware.com/support/knowledge-base/table-calculations
http://www.tableausoftware.com/support/knowledge-base/pareto-analysis
http://www.tableausoftware.com/support/knowledge-base/compare-two-table-calcs-60
http://www.tableausoftware.com/support/knowledge-base/nestedsorting
http://www.tableausoftware.com/support/knowledge-base/sorting-concepts-related-quick-table-calculations
http://www.tableausoftware.com/support/forum/topic/plotting-custom-median-and-connecting-marks
http://community.tableausoftware.com/thread/111337 – multiple “reference” lines on the data using table calcs and/or duplicating data
http://www.tableausoftware.com/support/forum/topic/multiple-reference-lines – using average age for male & female reference lines, interesting table calcs here
http://www.tableausoftware.com/support/forum/topic/display-dimension-text-associated-measure – older discussion from 2010 showing use of multiple table calcs trying to get a minimum of a set of values for display, great comments from Ross Bunker

http://interworks.co.uk/blog/groundwork-for-custom-table-calculations/

 

Order of calculations:

http://community.tableausoftware.com/message/139603#139603

 

The following text from http://onlinehelp.tableausoftware.com/v7.0/pro/online/en-us/calculations_tablecalculations_understanding_addressing.html

Addressing and Partitioning

The addressing fields define what part of the table you are computing along. The partitioning fields define how to group the calculation. In the example of a running sum of product sales across several years, the addressing field is the Date field while the parititioning field is the product field. When youd define the addressing for a table calculation, all the other fields are used for partitioning.
You can specify the addressing in the Table Calculation dialog box. The addressing can be relative to the table structure or a specific field. Each addressing option is described below.
Table (Across)
This option sets the adressing to compute along the entire table moving horizontally through each partition.

From Table Calcs primer PDF:
• Partitioning: the scope or grouping of the calculation. This can be the full table, a pane, a cell, a dimension or it can be customized even further for more advanced calculations.
• Addressing: the anchor or the source of each partition. It defines the root of the calculation.
Consider, for example, looking at a running sum of sales by product across several years. In our example, the running sum is partitioned by product: every product’s sale is summed over time, so the result of the calculation is a running sum of coffee sales, tea sales, etc. The addressing field is the date field. With every new date, the sales of that data are added to the sum. When you define the addressing for a table calculation, all the other fields are used for partitioning.
Table (Across)
Table (Across) is a Calculate Along option. The chart below uses Table (Across) to set the addressing to compute along the entire table, moving horizontally through each partition.
When calculation addressing is set to Table( Across), the fields that span horizontally across the table are the addressing fields (Category and Region).

 

From: http://www.tableausoftware.com/support/forum/topic/version-6-table-calcs-understanding-few-basics#comment-27646
The fields specified in the ‘Compute Using’ menu or ‘Edit Table Calculation…’ dialog are termed the addressing fields. All of the dimensions on a sheet that are not addressing fields are partitioning fields. This corresponds to the ‘computed along’ (addressing) and ‘for each’ (partitioning) components of the description in the table calculation dialog. One other nice thing is that in the final v6, the tooltip for a table calculation will actually contain the description also, so you don’t have to open the dialog to understand what it is doing.

From http://www.tableausoftware.com/support/forum/topic/version-6-table-calcs-understanding-few-basics#comment-27736
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning. Similarly ‘At the level’ moves fields after it in the list to be partitioning, though there is a subtle difference in how that partitioning is done (its not done on value, but rather on postition within the partition, i’ll give more details on that in the tutorial).

One might ask why you would put something in the ordering list in Advanced… only to partition on it by setting ‘Restarting every’. The answer lies in the sorting behavior. For example, if you want to know the top products for each market, you can’t simply sort products by sales (which sorts based on sales for all markets). Instead, you sort Market,Product by sales. Then, when you partition on Market, the products within each of those Market partitions are still sorted by the sales within that market. Play around with it to see the difference.

My notes:
Addressing is the compute along/over/by.
Partitioning is the “grouping” or “scope” of the calculation – to some degree defined by the “context of the calculation, since All dimensions not part of addressing are used for partitioning. Also includes dimensions (and discrete pills) on Level of Detail shelf (and Color shelf too?)
‘Restarting every’ moves the given field and those above it in the Advanced… dialog to be partitioning.
Example of running sum of sales by product across several years:
product is partition, each new product creates a new returned value of sum of sales
date is address, sum of sales for each new date is added to the returned value for that partition
Where I’ve been confused (I think) is that there are three things in table calcs, not two: partitioning, addressing, and what is being calculated

More notes from convo with Joe on 3/2012

I don’t want to misrepresent Joe’s awesome knowledge here, if there are any mistakes in these notes from our conversation they are almost certainly mine.

Table (Across), etc. do a visual sort base on the layout of the view. The results from Table (Across), etc. can sometimes be the same as what you can get by using Advanced Compute Using, they aren’t always. For example, you can have a complex sort set up in the view and Table (Across), etc. will work just fine, but you can’t duplicate that sort using Advanced Compute Using because Advanced Compute Using creates a set of the fields in the Compute Using and then sorts that set along whatever is set in the Order Along.

This can be confusing because the English language description Tableau gives when you hover over the pill in the view can be the same for a Table (Across) and an Advanced Compute Using(), but the results will be different.

~~~Example using Superstore Sales:
Put Container on Rows, Category on Color, and Sales on Columns. Create an INDEX() function and see what it does try to duplicate that using an Advanced Compute Using

Show Me can be used to “freeze” a setup, typically by Table (Across) or Table (Down). Can be used to get to a view that you can’t get w/Advanced…Compute Using. “Table (Down) is magic dust with Show Me.” ~~~Need an example here, probably one with a funky INDEX().

Roll Your Own Index

IF ATTR([Field]) = LOOKUP(ATTR([Field],-1) THEN
 PREVIOUS_VALUE(0)
ELSE
 PREVIOUS_VALUE(0)+1
END

This creates a rank that shows ties, or can be used to ignore something in a table calc.

~~~Example:
Rows: Roll Your Own Index (set Compute Using to Container), Container (sorted on Sum(Sales) Asc) Color Shelf: Category

This allows the index to be computed on container but ignore category for sort and partitioning. This would be a nice feature.

Padding

Using dimensions on both Rows and Columns shelves causes Tableau to pad, while dimensions only on Rows doesn’t cause Tableau to pad.

~~~Put Container on Rows, Category on Cols, see what INDEX() does Put Container on Rows, Category on Rows, see what INDEX() does – will see different marks used now Compute Using set to Across then Down vs. Down.

next example: Put Customer, Order Date on Rows. Index() on Text Index on Customer – ranks them Index on Order Date – causes padding, Tableau takes a really long time to return data. If you use MONTH(Order Date) it doesn’t take as long.

the key issues w/padding are:
Aggregate vs. dimension
continuous vs. discrete
datepart vs. datetrunc

Some details w/display can get lost when using Primary/Secondary sources. example: duplicate Superstore sales, when Customer from both is in the Customer from secondary does not show ATTR(), i.e. doesn’t show that it’s an aggregate. (thought here is to outline the pills)

How Joe makes table calcs work

1. put every dimension on Rows shelf
2. Measure Names as Columns, Measure Values as text 3. Do addressing on the right-most pill

This way, can turn on totals and subtotals to check what they return vis-a-vis the table calcs. Also can see the partitioning and padding. Using a full-blown cross tab makes it more confusing. Also allows filters using table calcs to be tested since all dimensions are available on the Rows shelf.

Filtering Notes

Table calc filters can only see what’s on Columns and Rows shelves, and/or see their own instance on the Level of Detail, so if needed fields for the calc aren’t on Columns or Rows you can put an instance of the table calc on the Level of detail shelf.

Another advantage to using Joe’s method for making table calcs work is that table calc filters can be tested as well.

Aggregated/continuous pills can go on Filter shelf Aggregated/discrete pills can’t go on Filter shelf Table calc/continuous or discrete pills can go on Filter shelf

Example of this
Create SUM(1) calculated field, call it “Test Agg”. As Continuous, it can go on Filter shelf. As Discrete, can’t go on filter shelf. Create another calculated field LOOKUP([Test Agg],0). As Continuous or Discrete, it can go on the Filter shelf.

(end of Joe notes)

More on At the Level and Restarting Every (Alex Kerin question leading to Ross Bunker post)

http://community.tableausoftware.com/message/178239#178239

 

Table calc – result only needed for one row in partition:

From http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html

IF (FIRST()==0) THEN
   WINDOW_SUM(SUM([Population]), 0, IIF(FIRST()==0, LAST(), 0)) 
END

Don’t try this with TOTAL(), it can mess things up

Table calc – result needed for all rows in partition (same answer for each case)

From http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html

PREVIOUS_VALUE(WINDOW_SUM(SUM([foo]), 0, IIF(FIRST()==0, LAST(), 0)))

Probably shouldn’t try this with TOTAL() either
TOTAL() hits DB while WINDOW_SUM() does not, would probably affect order of filters? TOTAL() causes any enclosed aggregate functions to be evaluated over each partition – but I’m not really sure why “Table Across” behaves differently to
WINDOW_XXX() functions.

Uncheck Ignore in Table Calculations so discrete field can be used in a table calc:
http://community.tableausoftware.com/message/173434#173434

Idea for Table Calcs:
Duplicate a dimension and put the copy on the LoD shelf so calcs can be along that, but summed overall.
http://community.tableausoftware.com/message/174254#174254

Index (roll your own)
From: http://www.tableausoftware.com/support/forum/topic/ignoring-order-along-advanced-table-calc#comment-44787
Based on: http://www.tableausoftware.com/support/forum/topic/top-x-shows-more-x-values-when-using-color-shelf

IF LOOKUP(MIN([Product]),-1)=MIN([Product]) THEN 
   PREVIOUS_VALUE(0)
ELSE
   PREVIOUS_VALUE(0)+1
END

Discussion of this and partitioning (Ross Bunker)
http://community.tableausoftware.com/message/183504#183504

Using PREVIOUS_VALUE to combine multiple rows into one
Example here is to make a list of names like Jane, John, Joe, etc. from multiple rows.
http://community.tableausoftware.com/thread/118259

TODAY() and NOW() generate Internal Expression Error inside table calc
As of Tableau v7, the following formula generates an “Internal Expression Error: Function ‘TODAY’ is not defined in the current context” error.
WINDOW_SUM(IF ATTR([Order Date]) >= TODAY()-29 THEN WINDOW_SUM(SUM([Sales])) END)
The same happens when NOW() is used instead of TODAY(). The solution is to make a calculated field with NOW() or TODAY(), and then use the calculated field in the function.

 

Given an index to another row, a formula to get an offset to that row from the current row, to use in a LOOKUP() statement:

FIRST()+[the index] -1

Improving table calc partitioning & addressing
http://community.tableausoftware.com/ideas/1189

[loop category=”wikicontent” tag=”tc,tablecalcs,table-calculations”]
[field title] – Added [field date]
[content]
[/loop]

Related posts:

    [loop tag=”densification,address,addressing,tc,table-calcs,table-calculations,aggregation,index,padding,partition,partitioning,PREVIOUS_VALUE(),rank,ranking,at-the-level” exclude=”this” relation=”and” compare=”not” taxonomy=”category” value=”wikicontent”]

  • [field title-link][field thumbnail-link]

[/loop]