Around here (southern Maine) there’s a bug called a horse fly. It’s like a super sized housefly that bites. Fortunately, it’s slow and easy to swat, if you see it. Last week I was bitten by Tableau’s sorting behavior, and thought I’d share the problem, a couple of simple solutions, and the lessons learned.
The story begins with me helping another Tableau user write a complicated calculation to calculate “empty” miles from a set of transit trips – buses and vans in this case – for a given day. The data set was a couple of hundred thousand rows with a lot of dimensions and measures, each row had a TripID to identify the passenger along with start and stop odometer readings. Here’s some simplified data filtered for just the one day:
Since multiple people could be on the bus at the same time, in order to calculate the miles that the vehicle was not in use I needed to properly account for the overlapping miles, and to do that I needed the data sorted, from the lowest StartOdometer/lowest EndOdometer on up. So, I set up a calculated field STR([StartOdometer]) +” ” + STR([End Odometer]), and the sorted TripID on that field, Ascending, with aggregation Minimum. Here’s the sort and the result:
Everything was working fine, until we created another view to do a calculation across days, and this happened:
We can see that TripID 9 shows up first on 2/1/12, even though it’s got a higher start and stop mileage than the others. What’s going on? Here was the evidence of my mistake – not getting to know the data. It turns out that TripID does not uniquely identify a trip – there are multiple records for each TripID.
An easy visual wat to check for uniqueness is to set up a Bar chart with your discrete dimensions and measures on the Rows Shelf, and SUM(Number of Records) on the Columns Shelf. If the Number of Records Axis only shows 0, 1, and 2 for values, then you have a unique set of records. This takes advantage of how Tableau draws axes as it sizes for the existing range and a bit more so the bar doesn’t go all the way to the right. Using this method, I found that TripID and TripDate did in fact identify unique records:
The reason why this happens is that Tableau doesn’t sort a value (and then re-arrange everything else in the table/view), it sorts a dimension by a value. An advantage to this is that dimensions can be independently sorted by different measures, the disadvantage is that we’ve been trained by applications like Excel to expect the former behavior.
What’s happening under the hood is that Tableau is doing exactly what I told it to, sorting TripID by the Minimum of Start Odometer+End Odometer, Ascending:
Given that sorting of TripID on the aggregation of Start Odometer+End Odometer, Tableau then proceeds to use that in every pane of the view, repeating the 1,9,3,2,4,5,6 order no matter what the rest of the pane might look like. This becomes more obvious when another day is added to the view. The rows for 1/3/2012 should really be sorted in reverse order:
Now, what we need here is what Tableau calls a nested sort, where the dimensional value used for sorting is made up of the Date and the TripID. Tableau has a KB article on nested sorting http://kb.tableausoftware.com/articles/knowledgebase/nestedsorting that describes how to use a Set and then sort on that. Unfortunately, Sets are slow when there is a large combination of the members, which is the case here. A better option for our needs is to have a field that uniquely identifies each row, either as a column in the data source or a calculated field in Tableau.
One method in this case is to create a calculated field with the formula STR([Date]) + ” ” + STR([TripID]), then put that into the view and set up the sort on StartOdometer+EndOdometer on the calculated field:
Or, if you have an actual ID column in the data source (which there was, to my embarrassment – it had just been buried amongst all the other measures in the Measures window), you can just as easily apply the sort to that:
There’s a lesson about Tableau’s sorting and getting to know your data set, and now you know how to avoid getting bitten by the Sort Fly.