Here’s a problem that has been bouncing around in my brain since I first used Tableau. How do I compare the results of every permutation of one item vs. another? Here’s an example using Superstore Sales – I put Region on Rows and Columns, and SUM(Sales) on the Text Shelf, and only see four values:
What if I want to compare Sales in Central to those in East, South, and West, and Sales in East to South and West, and Sales in West to Sales in South simultaneously? We can compare two at a time using parameters or a self-blend, or one vs. the rest in different ways via sets or table calcs or calculated fields, but how about each against each other? What if we want a correlation matrix? Read on to find out how to do this without any SQL, and learn a little bit about domain completion.
When we put Region on both Rows and Columns, even though we have 4 rows by 4 columns = 16 cells, Tableau generates only 4 marks, because there are only 4 Regions in the data. Here’s the status bar for the above view, showing the 4 marks:
So we need to pad out the data somehow to be able to compare each to the other. The typical solution to this is to do a cross product, cross join, or Cartesian join of the data where every row in the data is crossed with the same rows from other sources. In a simple data set using Microsoft JET SQL syntax, the cross product might be written as:
SELECT .[Region] as [Region], .[Sales] AS [Sales], .[Region] AS [Region cross], .[Sales] AS [Sales cross] FROM [Data] AS , [Data] AS 
If we started out with 4 rows of data, we’d end up with 16 rows. If we started out with 4000 rows, we’d end up with 16,000,000 rows, which is a whole lot of extra data for a seemingly simple analysis. If we did some work to pre-aggregate the data in SQL, we could cut that down significantly, but then we’d have to be careful about what dimensions we wanted to filter or blend on to keep them in the data, and besides not everybody knows SQL and, perhaps more importantly, nor should everybody have to.
Now, with a little dose of table calculations, you don’t have to. The key to doing a no-SQL cross product in Tableau is domain completion. Domain completion is a form of padding (also called densification) that Tableau does when building crosstabs for table calculations to pad out sparse data and return meaningful results. For example, here’s Superstore Sales with Region on Rows and Order Date on Columns for the first 10 days of December and there are 8 empty cells where there are no values in the data for a given Region or Order Date:
With an understanding of domain completion, now comes the beginnings of the cross product. Going back to our original view, Tableau has created 16 cells:
All we have to do is have a table calculation (or two) that will fill in those cells. Here’s an introductory calculation to do just that:
Why isn’t it padded out? This reveals another factor in domain completion, that we need to have different dimensions in play, right now there’s only Region. The next step is to duplicate the Region dimension, and we can use that copy instead:
With the default Compute Using of Table (Across), now we’re seeing Tableau complete the domain for each Region (copy) for each Region and therefore generating a mark for every combination of Region and Region (copy). We can also see this down below in the status bar, Tableau is showing 8 rows (4 Regions x 2 measures) by 4 columns, and 32 marks, indicating a fully completed domain:
The reason why we use the WINDOW_MAX(SUM([Sales])) is that within a given row there’s only a single value of sales and we want to return that value to every column. We could use WINDOW_AVG, _MIN, _MEDIAN, or _SUM, I’m just used to using WINDOW_MAX. Also, we can make the calculation more efficient by computing the result only once for each partition by wrapping it in PREVIOUS_VALUE(), to get:
In this case PREVIOUS_VALUE() returns the value of the calculation from the prior column to the next column, and for the very first column uses the WINDOW_MAX(SUM([Sales])) as the argument. Now we can take that formula and use it in two identical calculated fields to fully pad out the data, one for the columns and one for the rows:
Here, I’m using Table (Across) as the Compute Using for across and Table (Down) for down. To compare each row vs. the other, I created a basic % difference calc:
([Sales (Down)]-[Sales (Across)])/ABS([Sales (Across)])
However this looks confusing, is that 59.0% in the first column indicating that South had 59% more Sales than Central or vice versa? The easier choice would be a form of triangular matrix, showing one against the other. In a regular crosstab, we could potentially filter out the dimension combinations that we don’t want, however here we’ve used domain completion and a filter on dimensions wouldn’t remove the data that’s been padded in by the table calculations. In this case, the table calcs call for a table calculation filter. To set that up, we can use the INDEX() table calculation twice to identify each cell going across or down, and then do a simple comparison to return True for only the upper/right triangular matrix:
//creates triangle in upper right. flip < sign to have triangle in lower left [Index (Down)]<=[Index (Across)]
Finally, we can build the matrix as a highlight table. This is a little convoluted to work around what I believe is a a Tableau bug in version 8.0 and 8.1 (at least) where we have to use Show Me to get the highlight table with text that automatically becomes lighter on a dark background (if we manually build the highlight table Tableau will only have a single text color), here are the steps:
- Duplicate the workout crosstab worksheet.
- Moved the Upper/Right Triangle filter calc to the Filters Shelf, filtering for True.
- Drag the SUM(Sales), Index (Down), and Index (Across) measures off the Measure Values Shelf.
- Change the Sales (Down) measure into a discrete. (Tableau Tip – Once a measure is on the Measure Values card, you can change it into a discrete, even though you can’t drag a discrete measure onto the Measure Values card in the first place.)
- Change the Sales (Across) measure into a discrete.
- Drag Sales (Down) and Sales (Across) pills onto the Filters Shelf, filtering for All values. At this point, Tableau will replace Measure Values with the % Difference measure on the Text Shelf. (This is to park two measures, because using Show Me would otherwise remove the measure).
- On Show Me, click the Highlight Table. Tableau creates the highlight table, but moves the Region (copy) dimension to the Rows Shelf:
- Move Region (copy) back to the Columns Shelf. Now the table looks like an upper triangular matrix:
- Drag Sales (Down) off the Filters Shelf to the Columns Shelf to provide an extra label for comparison:
- Drag Sales (Across) off the Filters Shelf to the Rows Shelf to provide an extra label as well.
- For cleanup, change the Color Scheme to a red/black or something else that will work for color blind folks, get rid of the extra row and column dividers, and hide the labels to have a usable view:
- Finally, I added an additional calc that returns above/below/the same as to build a useful tooltip:
Here’s an alternate lower triangular matrix, and the whole workbook on Tableau Public:
I’m not sure how useful that kind of triangular matrix view is, so I’d been sitting on this post for awhile. It wasn’t until Monday that I got inspired to do something more…
Building a Correlation Matrix
This week Bora Beran had a post on creating a correlation matrix in Tableau using the car data from R, with two versions: one demonstrating R integration and the other table calcs:
I think correlation coefficients are very cool, and underutilized because historically the’ve been more painful to build in Tableau due to the combination of pre-processing the data and needing to build out the table calcs. I was excited to see this and tried out this technique on the correlation matrix Bora had created and got it working for both table calcs and R. The R side got a bit more complicated because I had to pad out more of the measures, but it works, here’s a download link because Tableau Public doesn’t support R. (Soon, I hope?). This makes a great example of how using this technique can reduce load on Tableau, the original data set with the cross product has over 11000 rows in it, using domain completion and table calcs cuts it down to 1/352nd of its size while retaining full ability to filter the data.
In a sign of not seeing the forest for the trees, I hadn’t actually realized that this technique is a cross product without SQL until working on Bora’s post, and with his encouragement I finished up this post (and thanks for the shout-out, too!). Also, thanks to Jim Wahl, I’d forgotten about the dual-index filter trick until he used it in a Tableau forum post. If you’re looking for other ways to compare ones to others, there’s a list of different tricks in the Analyses wiki. If you use this or some other comparison technique in one of your analyses, please link to it below!