
Creating Lists of Values for Tableau from Text & Excel Sources

There are various use cases where we start out with a “flat” table like the Superstore sample data that has a number of columns with various dimensions, and we want to make a simple list of unique values of one or more of those dimensions, such as a list that has just the six continents in Superstore.

The use cases for this include:

  • Using a filter action value as a parameter in the target source (look for posts from myself and Rody Zakovich on this in the next week).
  • Cross data source filters with higher performance when the list of filter values can be small compared to the volume of data.
  • Creating scaffold data sources to pad out data and ensure there are no sparse combinations of values.
  • Situations where we’d want to do a union or cross product of the data to do something like a market basket analysis but the union or cross product would be prohibitively large, so instead we only union or cross product desired dimension(s) and then join in the original data as necessary.
  • The last multi-select highlighter method from Multiple Ways to Multi-Select and Highlight in Tableau can use a self-union.

If you are starting out with a well-structured data warehouse with dimension tables, can write SQL, Python, or R, can build custom views on the data source, or can use data preparation tools like Alteryx, Easymorph, or Trifacta, then obtaining or generating these kinds of lists is pretty straightforward. But not everyone has those skills or resources, and in the case of users who just have Excel and/or text files we need to get creative. This post goes through four different methods to get these lists in Tableau:

    1. Ask!
    2. Aggregated Extract
    3. Excel Pivot Table as a Data Source
    4. Custom SQL

In this post I’ll go through each of these options. [Note: this post was updated on 10 Jan 2018 to make the aggregated extract method a little simpler.]

1. Ask!

This might seem obvious, but sometimes we’re stressed out and under deadlines and don’t realize we might be able to get help. If the data you are working with is coming from someone else then go ahead and ask them if they have a list of unique values. I’ve found that most people want the data they produce to be used and used well and if I’m coming back to them asking for something so I can do more with “their” data they are happy to accommodate me. I might phrase the request like “I want to make sure I’m using the latest list of departments, can you give me that list?”

The one caveat to getting data back from your ask is that you’ll need to go through some validation to make sure the list matches up with the “real” data, sometimes the amount of validation and cleansing isn’t worth the effort and one of these other approaches is better. However if you’re in a data-starved environment the kind of relationships you can make by asking for data can lead to more trust and ultimately more access to the data you want (and need).

2. Aggregated Extracts

For this method we’re going to connect to the data source and build an extract, only we’ll be telling Tableau to aggregate the data to the desired level of detail (the field(s) we want to use) before Tableau builds the extract. The resulting extract then has just one record for each combination of values of the field(s) that we want to use.

  1. Connect to the data source.
  2. Create a single worksheet with the field(s) you want to use as dimension pills; I usually just put them on Rows as discrete (blue) pills.
  3. Right-click on the source and choose Extract Data… The Extract Data window opens.
  4. Click on the Aggregate data for visible dimensions checkbox.
  5. Click the Hide All Unused Fields button.
  6. Click Extract. Tableau will ask where to save the extract. Choose a location and click OK.

Voila, you now have an aggregated extract source that you can use in Tableau data blends and/or join to!
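For readers who like to see the idea in code: an aggregated extract at this level of detail is essentially a distinct-values pass over the data. Here’s a minimal Python sketch of that idea, using made-up rows standing in for a Superstore-like table (the field names and values are hypothetical):

```python
# Sketch (with hypothetical data) of what an aggregated extract produces:
# one row per distinct combination of the chosen dimension(s).
rows = [
    {"Continent": "Europe", "Sales": 120},
    {"Continent": "Asia",   "Sales": 75},
    {"Continent": "Europe", "Sales": 40},
    {"Continent": "Africa", "Sales": 12},
]

def distinct_values(rows, field):
    """Return the sorted unique values of one dimension."""
    return sorted({row[field] for row in rows})

print(distinct_values(rows, "Continent"))  # ['Africa', 'Asia', 'Europe']
```

The set comprehension does the same collapsing that the “Aggregate data for visible dimensions” checkbox does; hiding unused fields is the equivalent of only keeping the one key in each row.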

Notes on Aggregated Extracts

There are a few things to keep in mind when using aggregated extracts. First of all, there’s the need to refresh them to keep up with the data: if you have Tableau Server you’ll need to set up an appropriate schedule; if not, you’ll need to set up your own manual or automated workflow that gets the results you need. One possibility is using Tableau’s extract API.

Secondly, if new columns are later added to the data they are automatically added to the extract. This may be ok for some use cases; in others it will break views that depend on that extracted data.

Finally, if you want to join on this aggregated extract you’ll need to join directly to the .tde or .hyper file. Where this gets complicated is handling data updates: you’ll need one workbook or workflow to update the extract, and then use the extract in a second workbook. Unfortunately we can’t publish the extract to Tableau Server or Online and join to that published data source (yet), otherwise that would be an easy workaround. There are a number of cases where a Tableau data blend is sufficient; we’ll be demonstrating one in the next week.

3. Excel Pivot Table as a Data Source

For Excel sources, besides connecting to worksheets with raw data, we can also connect to worksheets that are built as pivot tables.

Here’s how using Excel 2016 for Mac:
  1. Open the source in Excel.
  2. Create a pivot table in a new worksheet.
  3. Drag the field(s) you are interested in to Rows.
  4. Rename the Row Labels header to have appropriate values if necessary.
  5. Remove the grand total.
  6. Rename the worksheet to something more meaningful than Sheet2.
  7. Save the workbook in Excel.
  8. Open up Tableau and connect to the Excel workbook.
  9. Drag the pivot table you just added onto the canvas.
Now you can use this to join to other tables and/or use in data blends.

Notes on using Excel Pivot Tables as a Data Source

Before Tableau introduced Level of Detail expressions in version 9, I used pivot tables in production views to pre-aggregate the data for some values and also to create tables I could join on to pad out the data, so I could be sure to see records for every (person, office, metric) for every month. This method has one potentially major challenge around data updates, though: if we have data in worksheet A and a pivot table in worksheet B, and we update the data in A (such as adding a new value that should appear in the pivot table), that change won’t be reflected in the pivot table until there is an explicit command in Excel to update the pivot table and then save the workbook.

Even though we can tell Excel to do things like “Refresh data when opening file” this flag is only detected by Excel, not Tableau. Therefore to get updates to the data to be reflected in the pivot table the workflow has to include the steps to do a Data->Refresh All or open the pivot table worksheet before saving the workbook.

4. Custom SQL for Excel & Text Files

When I’m delivering Tableau training classes and we get to the point of talking about SQL & Tableau there are two common reactions: 1) yeay! and 2) [eyes glaze over]. This part is for the people in the latter category. Tableau hasn’t turned everything we might want to do into point & click, so sometimes we need to work with raw data. We do this in our everyday lives…there’s no good vegetarian restaurant in my town so when my wife & I want African ground nut stew we’ve got to make it ourselves. So I think of using Custom SQL as using the raw ingredients of the data to get a result I don’t have another way to get. However, in this case we’re going to be lazy (in a good way) and make Tableau write the SQL for us! Here’s how (these instructions don’t work for Tableau for Mac, see the Notes section below for more info):

    1. Start adding a new data source that is the Excel or text file you want to connect to.
    2. In the Open dialog select the file, then on the Open button click the drop-down caret and choose “Open with Legacy Connection”. You’ll return to the data source window.
    3. Drag the worksheet or file if necessary onto the canvas.
    4. Use the Data->Convert to Custom SQL menu option. The Convert to Custom SQL window will appear.
    5. Edit the Custom SQL to remove all the fields that you don’t need.
    6. Make sure to delete the trailing comma from the last field in the SELECT before the FROM.
    7. Add the DISTINCT keyword after the SELECT, before the first field. The SQL query will now look something like the examples in the next section.
    8. Click Preview Results… to test. If it comes back with an error, check your syntax (see the notes below for some tips) and try again. If it works by showing a View Data window with your results, close the View Data window and then click OK to close the Custom SQL window. You’ve now created a unique list of values using custom SQL!

The advantage of using Custom SQL compared to using an aggregated extract or pivot table is that it updates with the data and doesn’t require the more complicated workflows of the other methods.

Simple SQL SELECT Query Syntax

Here’s a really simple example for getting one field from one table:
SELECT DISTINCT [table].[field1] AS [field1]
FROM [table]
If you want multiple fields from one table the SQL query looks like this:
SELECT DISTINCT [table].[field1] AS [field1],
   [table].[field2] AS [field2],
   [table].[field3] AS [field3]
FROM [table]

In some ways SQL is written a little backwards, and in more complicated queries backwards and forwards. To me the real “starting place” of a SQL query is the FROM part because that is telling the SQL engine where (what table, worksheet, or text file, generically called “table”) to get the data from. Then the SELECT is going to grab the set of fields that we specify. The DISTINCT keyword tells the SQL engine to only get the unique (distinct) combinations of values of those fields instead of grabbing every single record.
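If you want to experiment with how DISTINCT behaves outside of Tableau, here’s a small Python sketch using the built-in sqlite3 module as a stand-in for the legacy connector; the table name and values are made up for illustration:

```python
import sqlite3

# sqlite3 stands in here for the Excel/text legacy connector; the table
# and values are hypothetical, but the DISTINCT behavior is the same.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, sales REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("East", 10.0), ("West", 5.0), ("East", 7.5)])

# DISTINCT collapses duplicate combinations of the selected fields,
# so the two "East" rows come back as one.
unique_regions = [row[0] for row in
                  con.execute("SELECT DISTINCT region FROM orders ORDER BY region")]
print(unique_regions)  # ['East', 'West']
```

Dropping the DISTINCT keyword from the query would return one row per record, duplicates and all, which is exactly what we’re trying to avoid.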

The field names themselves use the [table name].[field name] convention so that if there are multiple tables in a query each field referenced can be uniquely identified. The table and field names are surrounded by square brackets by default to handle situations where the table or field name might have spaces. Finally Tableau uses the AS [field name] aliasing option to ensure that the name used by Tableau is a usable name in Tableau.

SQL doesn’t care about spaces & line feeds; we could write SELECT DISTINCT [table].[field1] AS [field1] FROM [table] all on one line and it would work just fine.

SQL cares very much about the placement of square brackets & commas: if one is out of place or missing, the whole query will fail. Make sure that you have all brackets in place and that the last field in the SELECT doesn’t have a comma after it.

Notes on Custom SQL for Excel & Text Files

The Legacy Connector is not available on Tableau for Mac, so we can’t use this particular method for connecting to Excel or text files on the Mac.

The Legacy Connector is actually the Microsoft Jet driver that was phased out in Tableau version 8.3 for a variety of reasons; here’s a link to the differences to be aware of from the Tableau legacy connector documentation. Also, here’s the Tableau documentation on Connect to a Custom SQL Query. Finally, I did a post a while back on the details of using Custom SQL in the context of Microsoft Access connections, which also use the MS Jet driver; some of the points there are useful to keep in mind.

Hacky…or not?

If it all seems a bit hacky and contrived, then I agree with you. If all we have are Excel or text files and the features Tableau provides out of the box, we’re in a low-resource environment and workarounds are necessary.

I regularly see projects I’m working with needing to invest more in data preparation in order to keep Tableau humming along. That investment could be in scripting languages like Python or PowerShell or R, using PowerQuery, starting the process of moving data into a database (there are free versions of many databases), and/or using more dedicated data preparation tools like Alteryx, Easymorph, or Trifacta. I like to set expectations around this early on in new projects, because once they start using Tableau, projects invariably run into limitations of their existing data pipeline in providing the volume and variety of data that they can now analyze in Tableau.

Conclusion

The goal for this post was to set you up with the skills you need to get a custom list of distinct values to support several different use cases and I hope this did that for you. As mentioned early on, Rody Zakovich and I have some posts in the works that use this to do some new things in Tableau!

Parallel Coordinates via Pivot and LOD Expressions

Parallel coordinates are a useful chart type for comparing a number of variables at once across a dimension. They aren’t a native chart type in Tableau, but have been built at different times; here’s one by Joe Mako that I use in this post for the data and basic chart. The data is a set of vehicle attributes from the 1970s; I first saw it used in this post from Robert Kosara. This post updates the method Joe used with two enhancements that make the parallel coordinates plot easier to create and more extensible, namely pivot and Level of Detail Expressions.

The major challenge in creating a parallel coordinates chart is getting all the ranges of data for each variable onto a common scale. The easiest way to do this is to linearly scale each measure to a range from 0-1; the equation is of the form (x – min(x))/(max(x) – min(x)). Once that scale is made, laying out the viz only needs 4 pills to get the initial chart:

[Screenshot: the four pills used in the view]

Category is a dimension holding the different variables, ID identifies the different cars in this case, and Value Scaled is the scaled measure that draws the axis. Value Scaled is hidden in the tooltips, while Value is shown in them.
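As a sanity check on the scaling equation, here is a minimal Python sketch of (x - min(x))/(max(x) - min(x)); the measure values are made up for illustration:

```python
def scale(x, xmin, xmax):
    """Linearly scale x into the 0-1 range: (x - min) / (max - min)."""
    return (x - xmin) / (xmax - xmin)

values = [9.0, 11.5, 20.5]  # made-up measure values
lo, hi = min(values), max(values)
scaled = [scale(v, lo, hi) for v in values]

# The smallest value always maps to 0.0 and the largest to 1.0,
# which is what puts every axis on a common footing.
print(scaled[0], scaled[-1])  # 0.0 1.0
```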

Where this gets easier is with Tableau’s pivot feature. In Joe’s original version the data is in a “wide” format like this:

[Screenshot: the wide-format data]
So for each of the measures a calculation had to be built, and then the view was built using Measure Names and Measure Values:

[Screenshot: the view built with Measure Names and Measure Values]

The major limitation here is in the tooltips (in fact, Joe had rightly hidden them in the original, they were so useless):

[Screenshot: tooltip showing the scaled value]

The tooltip is showing the scaled value, not the actual value of acceleration. This is a limitation of Tableau’s Measure Names/Measure Values pills: if I put the other measures on the tooltip then I see all of them for every measure, and it’s harder to identify the one I’m looking at. Plus, axis ranges are harder to describe.

Pivoting Makes A Dimension

I think of Tableau’s Measure Names as a form of pivoting the data to create a faux dimension. I write faux because, beyond the limits mentioned above, we can’t group Measure Names, we can’t blend on Measure Names, we can’t do cascading filters on Measure Names, etc. The workaround is to pivot our data so we turn those columns of measures into rows and get an actual “Pivot field names” dimension (renamed to Category in my case) and a single “Pivot field values” measure (renamed to Value in my case):

[Screenshot: the pivoted data]

Then for the scaling we can use a single calculation (instead of one for every original column), here’s the Value Scaled measure’s formula:

([Value] - {EXCLUDE [ID] : MIN([Value])})/
({EXCLUDE [ID] : MAX([Value])} - {EXCLUDE [ID] : MIN([Value])})

I used an EXCLUDE Level of Detail Expression here rather than a TOTAL() table calculation as an example of how we can use LODs to replace table calculations and have a simpler view because we don’t have to set the compute using of the table calculation.
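To make the EXCLUDE logic concrete, here is a hedged Python sketch of the same computation: the min and max are computed per Category while ignoring ID, which is what {EXCLUDE [ID] : MIN([Value])} does. The rows are hypothetical stand-ins for the pivoted data:

```python
from collections import defaultdict

# Hypothetical pivoted rows: one row per (ID, Category) pair, mirroring
# the "Pivot field names"/"Pivot field values" structure.
rows = [
    {"ID": 1, "Category": "MPG",    "Value": 18.0},
    {"ID": 2, "Category": "MPG",    "Value": 30.0},
    {"ID": 1, "Category": "Weight", "Value": 3500.0},
    {"ID": 2, "Category": "Weight", "Value": 2200.0},
]

# {EXCLUDE [ID] : MIN([Value])} removes ID from the level of detail, so
# the min/max are computed once per Category across all IDs.
bounds = defaultdict(lambda: (float("inf"), float("-inf")))
for r in rows:
    lo, hi = bounds[r["Category"]]
    bounds[r["Category"]] = (min(lo, r["Value"]), max(hi, r["Value"]))

# Apply the (Value - min) / (max - min) scaling per Category.
for r in rows:
    lo, hi = bounds[r["Category"]]
    r["Value Scaled"] = (r["Value"] - lo) / (hi - lo)

print(rows[0]["Value Scaled"], rows[1]["Value Scaled"])  # 0.0 1.0
```

One calculation covers every Category, which is exactly why the pivoted layout beats maintaining one scaling calc per original column.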

Now with a real Category dimension in the view the Value Scaled calc is computed for each Category & ID, and this also means that if we put the Value measure in the view then that is computed for each Category & ID as well, immediately leading to more usable tooltips:

[Screenshot: the more usable tooltip showing the actual value]

For a quick interactive analysis this view takes just a couple of minutes to set up and the insights can be well worth the effort. Prior to the existence of Pivot and LOD expressions this view would have taken several times as long to create, so for me this revised method takes this chart type from “do I want to?” to “why not??”

Cleaning Up

To put this on a dashboard, some further cleanup and additions are necessary. Identifying the axis ranges is also easier with the pivoted data. In this case I used a table calculation to identify the bottom- and top-most marks on each axis and used that as mark labels to identify the axis range:

[Screenshot: axis range labels]

The Value for Label calculation has the formula:

IF FIRST()==0 OR LAST()==0 THEN
    SUM([Value])
END

The addressing is an advanced Compute Using so that it identifies the very first or last mark in each Category based on the value:

[Screenshot: the Compute Using settings for the table calculation]
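For the curious, the effect of that table calculation can be sketched in Python: within each Category, when the addressing sorts marks by value, only the first and last marks (the endpoints of each axis) keep a label. The sample values are made up:

```python
from collections import defaultdict

# Made-up (Category, Value) marks.
marks = [
    ("MPG", 24.0), ("MPG", 18.0), ("MPG", 30.0),
    ("Weight", 2200.0), ("Weight", 3500.0),
]

by_category = defaultdict(list)
for category, value in marks:
    by_category[category].append(value)

# Equivalent of IF FIRST()==0 OR LAST()==0 THEN SUM([Value]) END with an
# advanced Compute Using that sorts by value within each Category: only
# the two endpoint marks get a label.
labels = {}
for category, values in by_category.items():
    ordered = sorted(values)
    labels[category] = (ordered[0], ordered[-1])

print(labels["MPG"])  # (18.0, 30.0)
```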

In addition I created two different versions of the value pill that each had different number formatting and used those on the tooltips, used Joe’s original parameters for setting the color and sort order with revised calculations (which were also easier to use since Category is a dimension), and finally added a couple of other worksheets to be the target of a Filter Action to show details of the vehicle:
[Screenshot: the final parallel coordinates dashboard]

Click on the image above to download the workbook from Tableau Public.

The Letdown and the Pivot

The Letdown

Tableau does amazing demos. Fire up the software, connect to a data source, select a couple pills, click Show Me, boom there’s a view. Do a little drag and drop, boom, another view. Duplicate that one, boom, another view to rearrange. Within three minutes or less you can have a usable dashboard, for 200 rows of data or 200 million.

If you’ve seen those demos, the not-so-dirty little secret of Tableau is that they pretty much all start with clean, well-formatted, analytics-ready data sources. As time goes on, I’ve interacted with more and more new Tableau users who are all fired up by what they saw in the demos, and then let down when they can’t immediately do that with their own data. They’ve got to reshape the data, learn some table calcs right away, or figure out data blending to deal with differing levels of granularity, and/or put together their first ever SQL query to do a UNION or a cross product, etc. Shawn Wallwork put it this way in a forum thread back in January: “On the one hand Tableau is an incredibly easy tool to use, allowing the non-technical, non-programmers, non-analysis to explore their data and gain useful insights. Then these same people want to do something ‘simple’ like a sort, and bang they hit the Table Calculation brick wall…”

I work with nurses and doctors who are smart, highly competent people who daily make life or death decisions. Give them a page of data and they all know how to draw bar charts, line charts, and scatterplots with that data. They can compute means and medians, and with a little help get to standard deviations and more. But hand them a file of messy data and they are screwed: they end up doing a lot of copy & paste, or even printing out the file and manually typing the data into a more usable format. The spreadsheet software they are used to (hello, Excel) lets them down…

…and so does Tableau.

A data analyst like myself can salivate over the prospect of getting access to our call center data and swooping and diving through hundreds of thousands of call records looking for patterns. However, the call center manager might just want to know if the outgoing reminder calls are leading to fewer missed appointments. In other words, the call center manager has a job to do, that leads to a question she wants to answer, and she doesn’t necessarily care about the tool, the process, or the need to tack on a few characters as a prefix to the medical record number to make it correspond to what comes out of the electronic medical record system; she just wants an answer to her question so she can do her job better. To the degree that the software doesn’t support her needs, there has to be something else to help her get her job done.

The Pivot

When Joe Mako and I first talked about writing a book together, our vision was to write “the book” on table calculations and advanced use cases for Tableau. We wanted (and still want) to teach people *how* to build the crazy-awesome visualizations that we’ve put together, and how they can come up with their own solutions to the seemingly-intractable and impossible problems that get posted on the Tableau forums and elsewhere. And we’ve come to realize that there is a core set of understandings about data and how Tableau approaches data that are not explicitly revealed in the software nor well-covered in existing educational materials. Here are a few examples:

  • Spreadsheets can have a table of data, and so do databases (we’ll leave JSON and XML data sources out of the mix for the moment). But spreadsheet tables and database tables are very different: spreadsheet tables are very often formatted for readability by humans, with merged cells and extra layers of headers that don’t make sense to computers. A single column in a spreadsheet can have many different data types and cells with many meanings, whereas databases are more rigid in their approach. We tend to assume that new users know this, and then they get confused when their data has a bunch of Null values because the Microsoft Jet driver assumed the column starting with numbers was numeric, and wiped out the text values.
  • We—Tableau users who train and help other users—talk about how certain data sets are “wide” vs. “tall”, and that tall data is (usually) better for Tableau, but we don’t really talk about the specific characteristics of the data and principles involved in a way that new Tableau users who are non-data-analysts can understand, so they can apply those principles themselves to arrange their data for best use in Tableau.
  • Working with Tableau, we don’t just need to know the grain of the data–what makes a unique row in the data–we also need to understand the grain of the view–the distinct combinations of values of the dimensions in the view. There can be additional grains involved when we start including features like data blending and top filters. Even “simple” aggregations get confusing when we don’t understand the data or Tableau well enough to make sense of how adding a dimension to the view can change the granularity.

[Image: Carnation, Lily, Lily, Rose by John Singer Sargent, from Wikimedia Commons]
Just as we can’t expect to be a brilliant painter without an understanding of the interplay between color and light, we can’t expect to be a master of Tableau without a data- and Tableau-specific set of understandings. Therefore, we’ve been pivoting our writing to have more focus on these foundational elements. When they are in place, doing something like a self-blend to get an unfiltered data source for a Filter Action becomes conceivable and implementable.

This kind of writing takes time to research, think about, synthesize, and explain. I’ve been reading a lot of books, trawling through painfully difficult data sets, filling up pages with throw-away notes & diagrams, and always trying to keep in mind the nurses and doctors I work with, the long-time Tableau users who tell me that they still “don’t get” calculated fields in Tableau (never mind table calcs), and the folks I’m helping out on the Tableau forums. So “the book” is going slower than I’d hoped, and hopefully will be the better for it.

If you’d like a taste of this approach, I’ll be leading a hands-on workshop on pill types and granularity at this month’s Boston Tableau User Group on April 29.

Postscript #1: I’m not the only person thinking about this. Kristi Morton, Magdalena Balazinska, Dan Grossman (of the University of Washington), and Jock Mackinlay (of Tableau) have published a new paper Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems. I’m looking forward to what might come out of their research.

Postscript #2: This post wouldn’t have been possible without the help (whether they knew it or not) of lots of other smart people, including: Dan Murray, Shawn Wallwork, Robin Kennedy, Chris Gerrard, Jon Boeckenstedt, Gregory Lewandoski, and Noah Salvaterra. As I was writing this post, I read this quote from a Tableau user at the Bergen Record via Jewel Loree & Dustin Smith on Twitter: “Data is humbling, the more I learn, the less I know.” That’s been true for me as well!