Around here (southern Maine) there’s a bug called a horse fly. It’s like a super sized housefly that bites. Fortunately, it’s slow and easy to swat, if you see it. Last week I was bitten by Tableau’s sorting behavior, and thought I’d share the problem, a couple of simple solutions, and the lessons learned.
The story begins with me helping another Tableau user write a complicated calculation to calculate “empty” miles from a set of transit trips – buses and vans in this case – for a given day. The data set was a couple of hundred thousand rows with a lot of dimensions and measures, each row had a TripID to identify the passenger along with start and stop odometer readings. Here’s some simplified data filtered for just the one day:
Since multiple people could be on the bus at the same time, in order to calculate the miles that the vehicle was not in use I needed to properly account for the overlapping miles, and to do that I needed the data sorted, from the lowest StartOdometer/lowest EndOdometer on up. So, I set up a calculated field STR([StartOdometer]) +” ” + STR([End Odometer]), and the sorted TripID on that field, Ascending, with aggregation Minimum. Here’s the sort and the result:
Everything was working fine, until we created another view to do a calculation across days, and this happened:
We can see that TripID 9 shows up first on 2/1/12, even though it’s got a higher start and stop mileage than the others. What’s going on? Here was the evidence of my mistake – not getting to know the data. It turns out that TripID does not uniquely identify a trip – there are multiple records for each TripID.
An easy visual wat to check for uniqueness is to set up a Bar chart with your discrete dimensions and measures on the Rows Shelf, and SUM(Number of Records) on the Columns Shelf. If the Number of Records Axis only shows 0, 1, and 2 for values, then you have a unique set of records. This takes advantage of how Tableau draws axes as it sizes for the existing range and a bit more so the bar doesn’t go all the way to the right. Using this method, I found that TripID and TripDate did in fact identify unique records:
The reason why this happens is that Tableau doesn’t sort a value (and then re-arrange everything else in the table/view), it sorts a dimension by a value. An advantage to this is that dimensions can be independently sorted by different measures, the disadvantage is that we’ve been trained by applications like Excel to expect the former behavior.
What’s happening under the hood is that Tableau is doing exactly what I told it to, sorting TripID by the Minimum of Start Odometer+End Odometer, Ascending:
Given that sorting of TripID on the aggregation of Start Odometer+End Odometer, Tableau then proceeds to use that in every pane of the view, repeating the 1,9,3,2,4,5,6 order no matter what the rest of the pane might look like. This becomes more obvious when another day is added to the view. The rows for 1/3/2012 should really be sorted in reverse order:
Now, what we need here is what Tableau calls a nested sort, where the dimensional value used for sorting is made up of the Date and the TripID. Tableau has a KB article on nested sorting http://kb.tableausoftware.com/articles/knowledgebase/nestedsorting that describes how to use a Set and then sort on that. Unfortunately, Sets are slow when there is a large combination of the members, which is the case here. A better option for our needs is to have a field that uniquely identifies each row, either as a column in the data source or a calculated field in Tableau.
One method in this case is to create a calculated field with the formula STR([Date]) + ” ” + STR([TripID]), then put that into the view and set up the sort on StartOdometer+EndOdometer on the calculated field:
Or, if you have an actual ID column in the data source (which there was, to my embarrassment – it had just been buried amongst all the other measures in the Measures window), you can just as easily apply the sort to that:
There’s a lesson about Tableau’s sorting and getting to know your data set, and now you know how to avoid getting bitten by the Sort Fly.
Sort fly workbook on Tableau Public
Another key concept with sorting in Tableau is awareness of the two times that sorting is applied, sorting in the query to the data source (the sorting that you directly allied to the bold pills), and the visual sorting (sorting that happens after all computations have been evaluated (this is being applied to the Date pill when it is the first pill (not bold), it is being sorted, but its sorting is happening at a different time).
This is why you cannot sort a pill (using the sort dialog) based on a table calculation or aggregation from secondary data source. To sort cells/panes based on values not available in the sort dialog, you can use a visual sort, make the field a discrete, and place it on the Rows/Columns shelf in order to get the desired sort.
Sorting on a Set is the same kind of sorting that happens when you use an Advanced Compute Using on a table calc with multiple dimensions in the compute using list box, the sort will be applied to the combination of the dimension values, like what happens with you sort a Set, but unlike a Set you can control the order of the sorting.
Another situation with sorting is the z-order of marks in a view. This one is still a mystery to me, but sometimes trial and error can get the desired result. Things such as changing the order of the pills, changing the pill types, and putting them on different shelves have an effect on the z-order or marks.
Sorting is a big deal, and it is very nuanced, and mostly not documented in Tableau, it is not straightforward or directly controllable in some situations.
Thank you so much for this Jonathan! I added the unique identifier column and then hid it, but didn’t realize that I needed to use it for sorting. It was driving me nuts that I wasn’t able to sort my other columns. This helped so much!!!
I’m glad it was useful to you!