
#data16 Makes 5: My 2016 Tableau Conference Preview

In about 42 hours I'll be getting on a plane to Austin, TX for the 2016 Tableau Conference, the highlight of my Tableau year, where I get to meet, learn from, and celebrate with 12,000 dataviz geeks and a couple of thousand of their enablers (the lovely and talented Tableau employees who make it all possible). This will be my fifth Tableau Conference and it hasn't gotten old. I'm in a new role this year as a consultant and trainer at DataBlick, so my focus is a little different. I'm certainly going to learn some more, and I'm also networking [plug-alert] hey, did you know that you can hire DataBlick to help you do awesome things with Tableau? [/plug-alert]. Most importantly, though, I'm getting the chance to serve the fantastic Tableau community by cheering on some great work, showing off some cool projects (and hopefully inspiring others to get involved), and helping other users. Here's a list of some anticipated highlights of my week:

  • All week
    • Meeting people, making new friends, finally getting to see friends who I’ve never met in person, talking about my favorite piece of software, and whatever seems important.
    • Handing out new business cards! We did a redesign at DataBlick, and for our website and business cards we have caricatures of ourselves; here's my totally Zen'ed out Tableau fanboi pic:
      [image: my DataBlick caricature]
      We’re running a contest for whoever can collect all 5 (or 6!) of them; check it out at link. And we’ve got some other swag that I’m really looking forward to handing out, including some presents for the Tableau devs.
  •  Monday
    • Hackathon: One of my #TCMemories from last year’s conference (pic below) was the Web Data Connector Hackathon. Seeing dozens of customer developers building out new connectors was fantastic, and I’m excited to see even more development that builds out the Tableau ecosystem.
      [photo from the #data15 Web Data Connector Hackathon]
    • Data+Women Meetup: I’m really happy to support the fantastic work and community-building that has been happening with the Data+Women effort.
    • MakeoverMonday: This runs concurrently with Data+Women so I won’t be there. 🙁 What the Andys (Cotgreave and Kriebel) started and what the community has collectively created is amazing! Did you know they just won an Information is Beautiful Award?
  •  Tuesday
    • Healthcare & Life Sciences Meetup. Andy Dé, who leads Tableau’s work in healthcare, will be facilitating this, and I’m looking forward to meeting other healthcare users there. And as I’m one of the leaders of the Tableau Healthcare User Group I’ll be networking to find speakers for our next meetings!
    • Keynote – Tableau Vision. Like it or not, Tableau will go through some amount of culture shift with the new CEO; I’m looking forward to seeing what they say in public and reading the tea leaves.
    • Tableau Labs (they run all week). I used to work in software development and I love that Tableau gives us the opportunity as users to talk directly with the devs about what’s coming, what we like, and what we need.
    • Data-Driven Alerts Come to Tableau. As one of the developers of VizAlerts I applaud Tableau for continuing to build out native alerting functionality and I’m looking forward to Zac Woodall’s presentation.
    • Supporting Eradicating Malaria in Zambia by 2020. I’ve been a volunteer on the #visualizenomalaria project with the Ministry of Health in Zambia, PATH, and the Tableau Foundation and it’s really inspiring. In fact some of the code I’ve contributed to VizAlerts has been to meet requirements for Zambia. Jeff Bernson, Anya A’Hearn, Allan Walker, and I will be sharing our stories and pointing out ways that you can get involved in this or other projects.
    • Devs on Stage. As much as I can point out the flaws in Tableau, the developers know them better than I do because they live with them every day, and there are many fantastic things too. This is the devs’ moment to shine, and a chance for me to cheer them on and appreciate them for what they’ve built that has changed my life. I’ll be down in front and making a lot of noise.
    • Elections! Besides the general elections the *really* important election for me is at my daughter’s awesome school where she (at the age of 9!) is running for school president, they’ve done a whole election theme where she’s had to write and give speeches, produce media, participate in debates, get interviewed, etc. (Yes, I’m a proud papa). If you see me staring at my phone Tuesday afternoon it’s because I’m updating my Facebook feed for the election results.
  • Wednesday
    • Healthcare User Group. I’m facilitating this lunchtime group where we’ll have another meetup and Nicole Webb and Marina Chakhalyan of Adventist Health will be presenting a great story on a workforce dashboard. I’m told that this user group will fill up, so get there early!
    • Jedi Chart Types. Chris DeMartini and Adam McCann are going to describe how they build their stunningly beautiful network graphs.
    • VizAlerts: KPI-Based Alerting, Burst Emails, and More for Tableau. Matt Coles and I will be presenting VizAlerts 2.0 and a whole series of demos of use cases for VizAlerts and where it’s applicable (or not). We’ve also got a special audience-interactive demo (and contest!) planned.
    • Use Tableau Like a Sith. Unfortunately this is scheduled at the same time as my VizAlerts session so I can’t be there but I’ll be there in spirit. Plus DataBlick has something special in store for the Sith-lords in training.
    • Data Night Out. I have so many great memories of making new friends here and catching up with old ones, it’s always a highlight. Here’s a shot of a few thousand folks at #data15, plus a few crazy ones heading out on an aerial tour of the party:
      [photo: Data Night Out at #data15]
  • Thursday
    • Zens in the Community. I’ll be in the Tableau Community area for an hour to meet whoever shows up, I love to hear what people are doing and encourage them to get more involved.
    • Tableau Zen Doctor. One of the responsibilities of being a Tableau Zen Master is spending some time in Tableau Doctor helping users. Helping other users on the Tableau forums, over email, over screenshares, and with HelpMeDataBlick has taught me so much that I’m grateful for the opportunity to do it in person.
    • Bill Nye. The USA only seems to have room for a couple of public scientists at a time; I’m thankful we got to see Neil deGrasse Tyson (twice!) and now Bill Nye.

And then there are all the sessions that I won’t get to see in person but will be watching on video afterwards: sessions by Doug Grindel and Alan Eldridge and Bethany Lyons and Jock MacKinlay and Maureen Stone and Craig Bloodworth and Jewel Loree and Michael Kovner and Amanda Pype and Mark Jackson and Matt Francis and Heidi Lam and Andy Kriebel and Melanie Tory and Jeffrey Shaffer and so many more…it’s going to be a great week.

If you want to connect with me during the week, look for me on Twitter @jonathandrummey. Also Keith Helfrich and Chris DeMartini have put together a time-suck of a #data16 Twitter network analysis viz at rhsd.io/tc16-network-viz, it’s totally fascinating! Tweet #data16net to let them know you are looking at it!

See you there!


You Can’t Get There From Here: My Checklist for Connecting to Databases

I grew up with the Maine story “Which Way to Millinocket” that famously ends in “You can’t get there from here.” Sometimes I feel like that when connecting to a brand-new database in Tableau; here are a couple of recent errors:

 


“Unable to connect to ODBC Data Source. Check that the necessary drivers are installed and that the connection properties are valid. Unable to connect to the server […]. Check that the server is running and that you have access privileges to the requested database.”

The message doesn’t indicate what the exact problem is nor how to fix it, just some unhelpful suggestions. It’s the “You can’t get there from here” story.


[IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: “TCP/IP”. Communication API being used: “SOCKETS”. Location where the error was detected: “10.31.10.31”. Communication function detecting the error: “selectForConnectTimeout”. Protocol specific error code(s): “0”, “*”, “*”. SQLSTATE=08001

Unable to connect to the server […]. Check that the server is running and that you have access privileges to the requested database.

This one is even worse…there’s a bunch of developer-ese that is most likely left over from back when IBM still shipped Token Ring and other protocols, and it has no meaning to us. It’s like the other famous Maine farmer-giving-directions story, where the farmer says “Go a few miles or thereabouts and take a left at the corner where the Hansons’ barn used to be that burned down a while back.” The only marginally useful bit of the error message is the suggestion at the end (which just repeats what’s already at the top of the dialog).

The challenge with error messages like these is that the developers are doing the best they can: interconnecting pieces of software (like Tableau and your database of choice) is hard, and sometimes there’s just not much information available to explain why things don’t work. Even more challenging is that all users like us care about is what we need to do next. This post is about what I do next when I run into errors like this. I use a checklist:

Checklist for Database Connection Errors

  1. Double-check that whatever information you were given about the database connection is correct and try again. For example I once was trying (and trying) to connect to a database server named something like “mhlsql” only it was really “mh1sql”. Oops!
  2. Triple-check that your information is correct. For example, I was once given a database password full of gobbledygook and numbers, typed the numbers in again and again, and had accidentally reversed two digits. If you can copy & paste information into the forms, do that instead of typing.
  3. Check whether you have network connectivity in general. Browsing a random site on the web covers this. I’ve been frustrated thinking I couldn’t connect to a server when in fact the entire building had lost internet access.
  4. Depending on your system and permissions you may need to skip this step: Check whether you can connect to the database server using a local (i.e. on your machine) database querying application like SQL Server Management Studio, MySQL Workbench, TOAD, etc.*** or an existing BI or reporting tool like Cognos, Crystal Reports, etc. This requires:
    1. Having the right database driver. You can check your local driver against the versions at http://www.tableau.com/support/drivers and install a new one if necessary. Also databases are sensitive to 32-bit and 64-bit distinctions so you’ll need to know what is right for your computer *and* the database. This is a place where you may need to ask your IS/IT department for information and/or to install the driver.
    2. Has that database server authorized your IP address? If you’re connecting to a database server for the first time it might be locked down and blocking your IP address. You can do some testing of this in step 5 below on using ping, telnet, etc. but ultimately this may need to go to IS/IT.
    3. Do you have the right username/password or credentials to connect? Again, if you’re connecting to a database server for the first time you might not be set up to connect to that server.
    4. Use the querying application or BI/reporting tool to check whether you can access the particular database as well as table(s)/view(s)/stored procedure(s) that you want to.  For example you might be able to connect to the desired database but not have permissions on certain tables in the database.
    5. Verify that your Tableau data source is using the same settings as in the previous steps. For example, I’ve made the mistake of accidentally connecting to the production database in Tableau (where my credentials were limited) and the test database in the querying tool, and then wondering why I got errors in Tableau.
  5. If you don’t have a querying application, or you tried one and couldn’t get it to connect to your source, then check whether you can ping or telnet or Remote Desktop/ssh into the database server machine (i.e. confirm that the machine is up). These aren’t always available depending on the configuration of the server and what kind of permissions you have.
    1. You can run ping from the command line; the command is ping [hostname or IP address]. Getting the hostname or IP address for some databases can be difficult, for example on Windows if you just have a SQL Server database name. If you check your DSN (Windows) you can get the IP address or host name; you’ll need to open the ODBC Administrator application, which Microsoft has moved around in different releases, so do a web search for your version. If ping doesn’t work it could be that ping has been disabled on the database server machine or that the database server is refusing connections from your machine.
    2. You can also run telnet from the command line; the command is telnet [hostname or IP address] [port]. Again you can sometimes find the port in the DSN, or you can do a web search to find the default port for your database. If this doesn’t work it’s not a sign that the database doesn’t exist: the port could have been mapped to something else, or the database server could be refusing connections from your machine.
    3. If you have login credentials on the machine that the database server is running on then Remote Desktop or ssh can be useful to validate that your machine can at least “talk” to the remote machine.
  6. If none of the above works and you have a co-worker who does have a successful setup, try connecting from their machine. On Windows, if you’re using saved DSNs then you’ll also need to check that the DSNs are exactly the same on both machines.
  7. Do a web search and hopefully find something.
  8. Wave the white flag and submit a trouble ticket to IS/IT.
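If you’re comfortable with a little scripting, the ping/telnet tests in step 5 can also be done with a few lines of Python using only the standard library. This is just a sketch, not a substitute for the real tools, and the host name and port below are placeholders for your own server’s values:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical example -- substitute your own server name and port
# (e.g. 1433 for SQL Server, 3306 for MySQL, 50000 for DB2):
# print(can_connect("dbserver.example.com", 1433))
```

A True result only tells you that the machine is up and something is listening on that port; credentials and database-level permissions (step 4) are separate hurdles.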

*** In general, having a local querying application is useful in debugging performance issues because you can grab a query from the Tableau Performance Recorder or Tableau logs and then run it to see what it does.

If you have any other tips, please submit them in the comments below!

Do you like this post? Would you like to get this kind of help available to you? Check out DataBlick, we offer Tableau and Alteryx consulting, training, and on-demand support.


Two Sequential Color Palettes on the Same Map: Coloring by a Dimension and a Measure

Back in 2014 I was inspired by this New York Times post The Most Detailed Maps You’ll See from the Midterm Elections to try to figure out how to replicate those maps in Tableau. The maps essentially use three color palettes on the same map: blue for Democratic-leaning areas, red for Republican, and purple for tossups, where the lightness (relative light/dark) of each color palette is based on the population of the precinct. So we might think of this in Tableau as coloring by a dimension (really a discrete value) of red/blue/purple and then shading that by a measure (the population); however, that’s not something available in a couple of clicks.

This post covers a variety of techniques to build such a map. To simplify things, for this post we’ll build this map, which uses two color palettes:

[map: red and blue sequential palettes on a US state map]

For this example the data set is Superstore Sales with some filters applied to create more variation. A set of states have been arbitrarily grouped as Red and another set as Blue, and the goal is to shade them within the Red and Blue hues by the % of Total Sales. I added some additional requirements:

  1. The lightness of the colors in each palette must be equivalent (symmetrical) so the “darkest” color in each color ramp corresponds to the highest % of Total Sales regardless of which hue the color is (red or blue). This allows the user to compare the relative lightness across the hues; if we don’t then the visualization will lead to inaccurate conclusions.
  2. The color ramps for the palettes must automatically update as the data changes, for example if a user filters the view or new data arrives.
  3. The color legends themselves have to be accurate and appear in the worksheet. This is an ease-of-construction requirement because if we’ve used a technique that hacks the color legends too much then we’d have to manually build a color legend and put that in a dashboard.
  4. We need flexibility with the color palettes so they can look as good (as vibrant) as we want.
  5. The method needs to support at least 1 additional palette, for example if this were used in a US election prediction map to be able to use a single third hue for a true tossup state, or as in the NYTimes post to add an additional purple sequential palette to identify precincts that voted close to 50-50.
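Requirements 1 and 2 boil down to one idea: both palettes must share the same ramp domain, derived from the live data rather than fixed by hand. Here’s a tiny Python sketch of that idea, with made-up % of Total Sales values:

```python
# Hypothetical % of Total Sales values for the states in each group:
red_pcts = [0.05, 0.12, 0.30]
blue_pcts = [0.08, 0.25]

# Both palettes map the same domain, so equal values get equal lightness:
ramp_start = 0.0                      # symmetric bottom end (requirement 1)
ramp_end = max(red_pcts + blue_pcts)  # recomputed as the data changes (requirement 2)
```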

I tried several different methods, the only method that satisfies all the requirements is the 6th one. The six methods are:

  1. A calculated field that makes blue negative and red positive, and then a diverging palette based on that field. Problem: The color ramps are not symmetrical and the color legend would have to be manually created.
  2. Using a custom color palette (see my post An Exploration of Custom Color Palettes) with calculated fields to normalize the values and properly assign the colors in the palette. This enables the lightness to be equivalent and dynamically updates with the data so it’s really close to the requirements and could work in some situations. Problem: This requires more effort setting up the color palettes and calculations and the color legends would have to be manually created.
  3. A dual axis map with background color and foreground as a sequential black palette with transparency. I first learned about this technique from Wil Jones of InterWorks in Create a Dual Axis Color Map in Tableau. Problem: The colors are too muted.
  4. Calculated field that returns discrete values and then manual assignment of colors. Problem: Colors are limited to the values that appear in the data. In this example the data is sparse so there’s no Red 6-8, so if the data changes the palette won’t automatically update (Tableau will assign the default palette) and will need manual effort.
  5. A dual axis map with calculations that are used to change the level of detail of each Marks Card by returning red states or blue states that have a geographic role assigned, so separate measures are available for each. This is getting close because it gets us completely separate palettes that can have the colors we want. Problem: To keep the shading of each color palette the same, the color ramps had to be fixed at 10%, so if the data changes the ramps will need to be edited to reflect the data or they could be misleading.
  6. This starts out similar to method #5: a dual axis map with calculations that are used to change the level of detail of each Marks Card by returning red states or blue states that have a geographic role assigned, so separate measures are available for each. In addition, a duplicate State dimension that is not a geography is used to increase the level of detail so the color calculations can have the same ramp of values. Problem: Most complicated to set up, and it can generate bigger queries (which aren’t a big deal when we’re only looking at 50 states), but it’s totally dynamic and meets all the requirements.

Here’s how to build method #6:

Part 1: Dual axis maps at different levels of detail

  1. Make sure you have in your data a field that will group the states. In my case I built a map of the 50 states and grouped the states by manually selecting some and creating an ad hoc group that I called State (group). If there’s a dimension value that can slice the states this will also work, or even a calculated field that creates a dimension.
  2. Create two calculated dimensions for Red States and Blue States. Here’s the Red States formula: IF [State (group)] = 'Red' THEN [State] END. This returns the state value for red states and Null for everything else. Also note that the ability to use ad hoc groups in calculations is new in version 10.0; if you’re using an earlier version of Tableau you’d need to use a calculated field or a field from the data source as the dimension in the calc.
  3. Assign the Red/Blue States dimensions to the State geographic role.
  4. Create a filled map with the Red States dimension on the level of detail.
  5. Ctrl+drag (Cmd+drag on Mac) the Latitude (generated) pill on Rows to the right to create a second axis (and a third marks card, so there’s All and two Latitude (generated) cards).
  6. Activate the bottom Marks card by clicking on the right-hand Latitude (generated) pill on Rows.
  7. On the bottom marks card replace the Red States dimension with the Blue States dimension. You should now see a view like this:
    [screenshot of the view after step 7]

What we’ve done here is to create a dual axis map with two different levels of detail where the two Marks Cards for this map have different dimensions. The upper map has the level of detail of Red States that includes all the states in the Red group plus one Null value for all the blue states, the lower map has the level of detail of Blue States that includes all the states in the Blue group plus one Null value for all the red states.
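As a mental model for these level-of-detail calculations, here’s a small Python sketch of what the Red States and Blue States dimensions do (the state-to-group assignments are hypothetical):

```python
def red_states(state, group):
    # Tableau: IF [State (group)] = 'Red' THEN [State] END
    return state if group == "Red" else None

def blue_states(state, group):
    # Tableau: IF [State (group)] = 'Blue' THEN [State] END
    return state if group == "Blue" else None

# On the red Marks card every blue state collapses into a single Null
# (None) member, and vice versa on the blue card:
rows = [("Texas", "Red"), ("California", "Blue"), ("Ohio", "Red")]
red_members = {red_states(s, g) for s, g in rows}
# red_members == {"Texas", "Ohio", None}
```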

Now at this point we could put separate % of total measures on the two Marks Cards, each with a Compute Using on the Red/Blue States dimension respectively, and put those on the Color shelf. This is method #5 before we’ve fixed the colors:

[screenshot: method #5 before fixing the colors]

However this has two issues:

  1. Each Marks Card has a Null value (for blue states on the Red map and red states on the blue Map) that rolls up all those opposing states and ends up being the biggest % of Total and that affects the color ramp so the color range isn’t as big as it could be. Note how the largest blue value is 36.88%, but the map is entirely light blue. We can workaround this by using the filter option on the null/unknown marks indicator in the lower right, however…
  2. The two color ramps are not symmetrical at the top end in terms of lightness/darkness and the only way here to make them symmetrical is to manually fix the color ramps, which then doesn’t meet the requirement to have the view automatically update.

While we can manually fix the bottom end of the color ramps at 0 to make them symmetrical there, at the top end we really need the % of total measures to return the same values so we don’t have to manually fix the color ramps. That leads to the next steps in Part 2.

Part 2: Increase the level of detail without increasing the # of visible marks

I played around with a bunch of ideas on how to fix the problem, including doing a linear scaling, nesting table calculations, precomputing all the values so I wouldn’t need to use table calculations, and so on, until I came back to revisiting how Tableau’s color ramp is determined for sequential palettes. The default is that the bottom end of the ramp is assigned to the lowest value of the field on the Color shelf and the top end of the ramp is assigned to the highest value of the field on Color. The challenge with the approach so far is that in each of the Red & Blue color ramps there’s a % of total value (the Null value for the blue states & red states, respectively) that is being calculated but is not visible. And that led to my insight: “Why not enable Tableau to have an accurate set of values for the color ramp that aren’t visible?” A few clicks later I had it.

This solution leans on a couple of bits of somewhat esoteric Tableau knowledge:

a) In maps the visible marks are the ones that have an assigned geography. In this case only a subset of states has an assigned geography when the Red States dimension is used on one Marks card, and a different subset has an assigned geography when the Blue States dimension is used on the other Marks card. So we’ve already seen that there can be additional marks in the view that aren’t visible; therefore we can have all the states available on each Marks card by adding a State dimension that isn’t drawn as a geography.

b) Table calcs have some interesting (to me, at least) behavior when there are multiple Marks cards at different grains of detail: the table calculation results depend on which Marks card(s) the table calculation is on. If a table calculation is on a single Marks card then only the dimensions from that Marks card plus Rows, Columns, and Pages are available for addressing and partitioning. If a table calculation is on the All Marks card then it can use any of the dimensions anywhere for addressing and partitioning; however, the table calculation is separately computed for each Marks card based on the dimensions available for that card (i.e. the dimensions on that Marks card plus Rows, Columns, and Pages), and dimensions that are not on that particular Marks card are ignored. That’s a lot of words; someday it’ll have to be another post to go into more detail.

In this case, to get the % of total table calculation to return the same result across both the red state and blue state Marks cards, we need to set up the addressing of the calculation so it is across all the State dimensions that we are using – the State dimension that will get an accurate % of total plus the Red and Blue States dimensions that are used to create visible marks.
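In other words, the % of total should be computed once over all the states and shared by both Marks cards. A minimal Python sketch of that logic, with hypothetical sales values:

```python
# Hypothetical sales by state; group membership doesn't matter here because
# the % of total is computed across all states at once:
sales = {"Texas": 200.0, "California": 500.0, "Ohio": 300.0}

total = sum(sales.values())
pct_of_total = {state: amount / total for state, amount in sales.items()}
# Both the red and blue Marks cards read from this single set of values,
# so their color ramps share the same 0-to-max domain automatically.
```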

Here’s how to build it starting from Part 1 above:

  1. Duplicate the State dimension.
  2. Remove the State geographic role from the State (copy) dimension that you created in step 1 by right-clicking on the State (copy) dimension in the Dimensions window and choosing Geographic Role->None:
    [screenshot: Geographic Role->None menu] This is necessary so that when we add the new State (copy) dimension to the view, Tableau doesn’t draw all the states on both the red & blue Marks cards.
  3. Click on the All Marks card header to activate it.
  4. Add the State (copy) dimension to the Detail on the All Marks card. Pro tip – you can drag it onto the Detail button or any part of the open area in the bottom of the Marks card. You should see a view that looks like this:
    [screenshot of the view after step 4]
    Note the “27 unknown” in the lower right; that’s the sign that the level of detail has increased without making more states visible. If you see all the states in each map then you’ll need to re-do step 2.
  5. We need to build three calculations to make the colors. The first is a bog-standard % of total calculation, which we can do entirely with drag & drop. Start by dragging Sales from Measures to the Detail on the All Marks card.
  6. Right-click on the Sales pill on the Marks card and choose Quick Table Calculation->Percent of Total.
  7. Right-click on the Sales pill again and choose Edit Table Calculation… to open the Table Calculation window.
  8. Make sure Specific Dimensions is selected with all three State dimensions clicked on: Red States, Blue States, and State (copy). Validate that the table calculation is accurate by hovering over the marks, using View Data, etc.
  9. Drag the Sales pill from the Detail back to the Measures window. Tableau creates a new calculated field and gives you a chance to name it; I called it % of Total Sales.
  10. Now to build out the measures that will be used on Color. Right-click on the new % of Total Sales measure and choose Create->Calculated Field… to create a new calculated field.
  11. Set the name of the new calculated field to be Red % of Total Sales, then click Apply.
  12. Repeat step 10.
  13. Set the name of this new calculated field to be Blue % of Total Sales, then click Apply.
  14. Drag both the Red & Blue % of Total Sales measures onto the All Marks card. The All Marks card should now look like this:
    [screenshot of the All Marks card]
  15. Activate the red state Marks card by either clicking on its header or clicking on the appropriate Latitude (generated) pill.
  16. Click to the left of the Red % of Total pill to put it on Color. The view should now look like this:
    [screenshot of the view with the red palette applied]
  17. Hover over the color legend and click on the drop-down menu->Edit Colors… to open the color palette.
  18. Change the palette to Tableau’s default Red sequential palette.
  19. Click on Advanced >> to open the advanced section.
  20. Click on the Start checkbox and enter 0. This is to ensure that both color palettes start from the same value.
  21. Click OK. The map should now look like this:
    [screenshot: the red color ramp fixed to start at 0]
  22. Activate the blue state Marks Card.
  23. Click to the left of the Blue % of Total pill to put it on Color. Note that both color legends have the same ending value.
  24. Hover over the color legend and click on the drop-down menu->Edit Colors… to open the color palette.
  25. Click on Advanced >> to open the advanced section.
  26. Click on the Start checkbox and enter 0. This is to ensure that both color palettes start from the same value.
  27. Click OK. The map should now look like this:
    [screenshot: the blue color ramp fixed to start at 0]
  28. The last step in building the map is to right-click on the right-hand Latitude (generated) pill on Rows and choose Dual Axis. Here’s the map:

[screenshot: the finished dual axis map]

From here we can clean up the map formatting, edit the tooltips, turn off the unknown indicator, etc.

Notes and Next Steps

A key part of what makes the colors more accurate is that Tableau’s default red & blue color palettes use the same lightness throughout the palettes. If you are going to use a different set of palettes then take some time to make sure the lightness values are equivalent for the palettes.
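One quick way to check a custom pair of palettes is to compare the HSL lightness of corresponding color stops. Here’s a small Python sketch using only the standard library (the two hex colors in the comment are just examples):

```python
import colorsys

def lightness(hex_color):
    """HSL lightness (0.0-1.0) of a color like '#d62728'."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    _, l, _ = colorsys.rgb_to_hls(r, g, b)
    return l

# Compare corresponding stops of two hypothetical ramps; the closer the
# lightness values, the more comparable the hues will read on the map:
# print(lightness("#d62728"), lightness("#1f77b4"))
```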

If what distinguishes between the states is a measure (as opposed to a dimension like the ad hoc group used here) then you’ll need to use a variation on this technique; if you have questions, post below in the comments.

If you want to get an additional discrete color (such as to mark tossup states) then you can change the measure that draws colors for a given Marks card to return a single negative value and use a diverging palette with more configuration of the Advanced >> options; for example, I’ve set up Minnesota to be yellow in this view:

[map with Minnesota colored yellow via a diverging palette]

If you’re careful with sequential or diverging palettes you can actually make multiple continuous ranges and/or discrete colors in the same palette, see An Exploration of Custom Color Palettes and Keith Helfrich’s post at http://redheadedstepdata.io/color-the-dupes/ for more details. So either method #6 or method #2 could build something like the NYTimes viz that I linked to at the beginning of this post, the question is how much mucking around with color palettes and color legends do you want to do?

Here’s a link to the Two Sequential Palettes on a Map workbook on Tableau Public.

If you think this is awesome and want more you then please check out DataBlick, we provide Tableau training and consulting and even short-term support to help you make better views and dashboards.


What I did on my summer vacation (hint: it’s about Tableau v10 and Marimekko charts)

This summer while beta testing Tableau v10 I was very curious about the new mark sizing feature. Bora Beran did a new-feature video during the beta showing a Marimekko chart (aka mosaic plot). There have been a few posts on building Marimekko charts in the past by Joe Mako, Rob Austin, and a not-quite-a-Marimekko in an old Tableau KB article, but the first two required extra data prep and the KB article wasn’t really a Marimekko, so I was really interested in what Tableau v10 could do.

I asked Bora for the workbook and he graciously sent it to me. (Thanks, Bora!) I found a problematic calculation, in particular a use of the undocumented At the Level feature that could return nonsensical results if the data was sparse. I rewrote the calculation, sent it back to Bora, and he came back asking if I’d like to write a blog post on the subject. (Thanks, Bora.) There are two lessons I learned from this: 1) the Tableau devs are happy to help users learn more about the new features, and 2) if a user helps them back they will ask for more. Caveat emptor!

Over the course of the next few weeks I did a lot of research, created a workbook with several dashboards & dozens of worksheets, arranged for Anya A'Hearn to sprinkle some of her brilliant design glitter, learned some new tricks, and wrote (and rewrote) 30+ pages (!) of documentation including screenshots. Martha Kang at Tableau made some edits and split it into 3 parts plus a bonus troubleshooting document, and they've been posted this week; here are the links:

On Gender and Color

As part of the design process Anya and I had some conversations about how to color the marks. The data set I used for the Marimekko tutorial is the UC Berkeley 1973 graduate admissions data that was used to counter claims of gender bias in admissions, so gender is a key dimension in the data, and I didn't want to use the common blue/pink scheme for male/female. That color coding is a recent historical development, and as a father I want my daughter to have a full range of opportunities in life, including access to more than just the pale red end of the color spectrum in her clothes, tools, and toys. Anya and I shared some ideas back and forth and eventually Anya landed on using a color scheme from a Marimekko print she found online.

Screen Shot 2016-08-24 at 10.58.30 PM

Anya is going into more detail on the process in her Women in Data SF talk on Designing Effective Visualizations on Saturday 27 August from 10am-12:30pm Pacific, here’s the live presentation info and the virtual session info. It’s going to be a blast so check it out!

So that’s how I spent my summer vacation. Can’t wait for next year!

 

Screen Shot 2016-08-17 at 12.12.24 AM

When 576 = 567 = 528 = 456: Counting Marks

Tableau’s data densification is like…nothing else I’ve ever used. It’s a feature that is totally brilliant when it “just works” like automatically building out a running sum on sparse data and mind-taxingly complicated when a data blend’s results go haywire because densification was accidentally triggered.

What I’ve historically taught users is to always ALWAYS look at the marks count in the status bar as a first way to detect when data densification occurs. Here’s Superstore Sales data with MONTH(Order Date) on Columns, Region and State on Rows, there are 499 marks and we can see that the data is sparse by the class that are missing Abcs:

Screen Shot 2016-08-16 at 11.52.15 PM

If I add SUM(Sales) to the Level of Detail Shelf and set it to a Running Total Quick Table Calculation with the default Compute Using of Table (Across), so it's addressing on Order Date, then I see 576 marks and all the Abcs are filled in; this is Tableau's data densification at work:

Screen Shot 2016-08-16 at 11.55.19 PM

However, here are three additional views, all still using the same pill layout and quick table calculations, showing three different marks counts (567, 528, and 456):

Screen Shot 2016-08-16 at 11.59.11 PM
Screen Shot 2016-08-17 at 12.00.55 AM

The marks count is changing based on a variety of factors: the different quick table calculations used (running total, difference, and percent difference) are part of it, but the underlying behavior depends on whether a mark is densified or not, the pill arrangement, and whether or not a densified mark has been assigned a value (including Null). Prior to Tableau version 9.0 these all would have been counted in the marks count and the views would show 576 marks each; Tableau v9.0 changed the behavior to only count the "visible" marks.

I’ll walk through the above there examples. In this one the Running Total has been moved from the Level of Detail to the Rows Shelf and there are 567 marks.

Screen Shot 2016-08-16 at 11.59.11 PM

The reason why is that even though those combinations of Region, State, and Month have been densified for states like Iowa that don't have any sales in the first month(s) of the year (more on how I know that below), those densified marks don't have any assigned value (not even Null), so they are not counted in the marks count, nor are they counted in the Special Values indicator in the lower right.

In this view using the Difference calculation there are 528 marks and the Special Values indicator shows 48 nulls (528 + 48 = 576). In this case the Difference calculation uses the LOOKUP() function, which returns Null for the densified values.

Screen Shot 2016-08-16 at 11.59.11 PM

Finally in this view using the % Difference calculation there are 456 marks and the Special Values indicator shows 120 nulls (456+120 = 576). In this case the % difference calculation is spitting out extra nulls due to divide by 0 results.

Screen Shot 2016-08-16 at 11.59.11 PM
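The extra nulls can be sketched in plain Python with hypothetical numbers (this is not Tableau's internals, just the arithmetic): both quick table calcs are Null when there is no prior value for LOOKUP() to find, but % Difference is also Null whenever the prior value is zero, since division by zero returns Null in Tableau.

```python
def difference(cur, prev):
    # Tableau's Difference: ZN(SUM([Sales])) - LOOKUP(ZN(SUM([Sales])), -1)
    return None if prev is None else cur - prev

def pct_difference(cur, prev):
    # Tableau's % Difference: (cur - prev) / ABS(prev)
    # Null when there is no prior value OR the prior value is zero.
    if prev is None or prev == 0:
        return None
    return (cur - prev) / abs(prev)

# Four months of sales for one state; January was densified, so ZN()
# turned its Null into 0.
sales = [0, 120, 150, 90]
prior = [None] + sales[:-1]
print([difference(c, p) for c, p in zip(sales, prior)])
# [None, 120, 30, -60]        -> one null
print([pct_difference(c, p) for c, p in zip(sales, prior)])
# [None, None, 0.25, -0.4]    -> two nulls
```

The densified-to-zero January generates one extra null per state for % Difference, which is where the larger null count comes from.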

The difference is due to a change made in Tableau v9.0 where the marks count now only counts "visible" marks (Tableau's term). The definition of a "visible" mark is complicated; they are the "Yes" answers in the table below:

Screen Shot 2016-08-17 at 12.09.17 AM

Now, one of the ways I've gotten used to checking for densification is selecting all the marks (either by Right+Clicking and choosing Select All or pressing Ctrl/Cmd+A) and then hovering over a mark and Right+Clicking and choosing View Data…, or waiting for the tooltip to come up and using View Data. For example, here's the Select All View Data in v9.0 for the % Difference on Rows view; the yellow cells indicate where data was densified and there are 576 rows:

Screen Shot 2016-08-17 at 12.12.24 AM

However, that doesn’t work anymore in Tableau v10.0, there was change made to the Select All functionality such that Select All only gets the “visible” marks, here’s that same view data in v10 and there are only 456 rows:

Screen Shot 2016-08-17 at 12.12.58 AM

So Select All doesn’t work the way it used to, and the marks count can change in “interesting ways” (and we haven’t gone into what things like formatting Special Values can do), so what can we do to spot densification? There are three workarounds for this, all documented in the right-most column of the table above:

  1. Select a discrete header or a range of headers, wait for the tooltip to come up, and click on the View Data icon.
  2. Right-click in the view (but not on a mark) and choose View Data…
  3. Use the Analysis->View Data… menu option.

All of these will show the densified values. Here's an animated GIF of selecting Iowa in the Difference on Rows view, where we can see the two Null values:

2016-08-17 00_21_03

However, only one of those is actually densified; to tell exactly which, we need to add a field that actually has data. In this case I've added SUM(Sales) to the Level of Detail Shelf, and the View Data for Iowa now shows that it's really only January that is densified, since there's nothing at all in the January SUM(Sales) cell:

Screen Shot 2016-08-17 at 12.27.28 AM

Conclusion

The marks count is not a reliable indicator of the volume of densification, so we need to resort to various selection mechanisms and the View Data dialog to identify more specifically how much has been densified. I'm not a fan of these changes: what I'd really like Tableau to do is add a count of densified values to the status bar, plus details on what was densified to the default caption and to Worksheet->Describe Sheet… Until then, hopefully this post will help you keep track of what Tableau is doing!

Here’s a link to the marks count workbook in v8.3 format (so you can open it up for yourself and see the differences in different versions).

Screen Shot 2016-07-13 at 2.29.44 PM

Getting the Version of a Tableau Workbook in a Few Clicks

In helping other Tableau users as part of DataBlick or my pro-bono contributions to the community I get a lot of Tableau workbooks in a lot of versions; in the last two weeks I've received v8.3, v9.0, v9.2, v9.3, and v10beta workbooks, and when I edit them I need to make sure I'm using the same version of Tableau. I'm often frustrated because I'll open the workbook in the wrong version of Tableau and get this message:

Screen Shot 2016-07-13 at 2.30.52 PM

or this one:

Screen Shot 2016-07-13 at 2.31.34 PM

And then I have to open it up in other versions, which can take a while. I shared this problem in a Tableau Zen Master email thread and Shawn Wallwork replied with his trick for Windows that only takes a few seconds; I was able to take that and come up with one for the Mac as well.

The basis for this technique is that the version of a Tableau workbook is stored in the XML of the .twb (Tableau Workbook) file, in particular in the version attribute of the <workbook> tag, and we can open up the XML in any old text editor. I've highlighted the <workbook> tag that shows this workbook was created in version 9.1:

Screen Shot 2016-07-13 at 2.29.44 PM

However, a .twbx (Tableau Packaged Workbook) is stored as a zip file, so I've always just opened workbooks up in different versions of Tableau until I found the right one, or once in a while taken the time to fire up a zip application, extract the TWBX to a folder, and then looked at the .twb file. Both of these methods are slow. Shawn pointed out a shortcut on Windows using the free 7-Zip application that only takes a few seconds, here's a demo:

check version windows

The steps on Windows are:

  1. Right-click on the TWBX and choose 7-Zip->Open Archive.
  2. Right-click on the .twb and choose View.
  3. Find the <workbook> tag and version attribute.
  4. Close the window.

That inspired me to figure out an equivalent on the Mac. I use BBEdit for my text editor (BareBones also offers a free text editor called TextWrangler that can do the same thing) and BBEdit natively supports zip files. It turns out I can just drag the TWBX onto the BBEdit icon on the dock and BBEdit will open up the zip file:

check version mac

The steps on Mac are:

  1. Drag the TWBX to the BBEdit or TextWrangler icon.
  2. Click on the .twb.
  3. Find the <workbook> tag and version attribute.
  4. Close the window.
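For completeness, the same check can be scripted. This is my own sketch (not part of Shawn's trick), using only the Python standard library; the filename in the commented-out usage line is hypothetical:

```python
# Read the version attribute of the <workbook> tag from a .twb, or from
# the .twb packaged inside a .twbx (which is just a zip archive).
import zipfile
import xml.etree.ElementTree as ET

def get_workbook_version(path):
    """Return the version attribute of a .twb or .twbx file."""
    if zipfile.is_zipfile(path):
        # .twbx: open the archive and find the .twb inside it
        with zipfile.ZipFile(path) as z:
            twb_name = next(n for n in z.namelist() if n.endswith(".twb"))
            with z.open(twb_name) as f:
                root = ET.parse(f).getroot()
    else:
        # plain .twb: it's just XML
        root = ET.parse(path).getroot()
    return root.get("version")

# print(get_workbook_version("MyDashboard.twbx"))  # hypothetical file
```

This works the same on Windows and Mac, so it's handy if you end up checking versions in bulk.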

This will easily save me hours over the course of a year, thanks, Shawn! There's also a feature request for making version-checking easier that you can vote up. Also, if you'd like to change the version of the file you can edit that yourself, or (if you're ok running the workbook XML through a web-based tool) use Jeffrey Shaffer's Tableau File Conversion Utility.

Screen Shot 2016-07-12 at 4.32.32 PM

TRIMMEAN() in Tableau

Excel’s TRIMMEAN() function can be quite useful at removing outliers, essentially it removes the top and bottom Nth percent of values and then computes the mean of the rest. Here’s the equivalent formula in Tableau that in Superstore Sales computes the TRIMMEAN() of sales at the customer level removing the top and bottom 5th percentile of customers when used with the AVG() aggregation:

{FIXED [Customer Name]: SUM(
    IF {FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
    AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
       [Sales]
    END)
}

Read on for how to build and validate your own TRIMMEAN() equivalent in Tableau.
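As a sanity check on the logic (not something you need inside Tableau), here's the percentile-threshold approach sketched in plain Python. It assumes sales have already been summed to the customer level, i.e. the inner {FIXED [Customer Name] : SUM([Sales])} step, and the percentile() helper assumes linear interpolation between closest ranks, which is my understanding of how Tableau's PERCENTILE() behaves:

```python
def percentile(values, p):
    """Percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    k = (len(s) - 1) * p
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def trimmed_mean(customer_sales, low=0.05, high=0.95):
    """Mean of the values strictly between the low and high percentiles."""
    lo_val = percentile(customer_sales, low)
    hi_val = percentile(customer_sales, high)
    kept = [v for v in customer_sales if lo_val < v < hi_val]
    return sum(kept) / len(kept)

# Made-up customer-level sales with tail values at both ends:
sales_by_customer = [100, 200, 300, 400, 500, 600, 700, 800, 900, 10000]
print(trimmed_mean(sales_by_customer))  # 550.0 -- the 100 and 10000 are trimmed
```

Note the strict < and > comparisons, matching the AND conditions in the Tableau formula above.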

When building out calculations in Tableau I try to let Tableau do as much of the computation as possible for both the calculations and the validation, so I’m typing as little as I can. Starting with Superstore, let’s identify the top and bottom 5th percentiles, here’s a view using a reference distribution:

Screen Shot 2016-07-12 at 3.57.38 PM

Now we know what we're going to have to remove. The next step is to duplicate this worksheet as a crosstab, then build out calcs that can return the 5th and 95th percentiles of Sales at the Customer Name level. While this can be done with table calculations (here's an example from the Tableau forums), I'm going to use FIXED Level of Detail Expressions so that I have a dimension I can use; for example, I could compare the trimmed group to the non-trimmed group. Here's the 95th percentile Level of Detail Expression:

{FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}

The inner LOD calculates the sales at the Customer level, then the outer LOD returns the 95th percentile as a record-level value. Here are the two calcs, which have values that match the reference lines above:

Screen Shot 2016-07-12 at 4.03.46 PM

The next step is to filter out the values outside of the desired range, here’s the TRIMMEAN Filter formula:

{FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)}

This uses the 5th and 95th percentile formulas and only returns True when the Customer-level sales is less than the 95th percentile and greater than the 5th percentile; we can visually validate it by dropping it on the Color Shelf:

Screen Shot 2016-07-12 at 4.06.37 PM

Now that we have this the next step is to calculate what the trimmed mean would be. Again, we can use a view with a reference line, this time it’s been filtered using the TRIMMEAN Filter calc and the reference line is an average:

Screen Shot 2016-07-12 at 4.08.16 PM

Now we can embed the TRIMMEAN Filter formula inside an IF/THEN statement to only return the sales for the filtered values, this is the Trimmed Sales calc:

IF {FIXED [Customer Name] : SUM([Sales])} < 
   {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
AND {FIXED [Customer Name] : SUM([Sales])} > 
   {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
   [Sales]
END

And here it is in the workout view, only returning the sales for the trimmed customers:

Screen Shot 2016-07-12 at 4.15.36 PM

Now that we have the trimmed sales there are two ways we can go. If we want the trimmed mean without the Customer Name in the Level of Detail then we can validate that in our workout view by using Tableau’s two-pass Grand Totals to get the average of the customer-level trimmed sales. This was created by:

  1. Removing the TRIMMEAN Filter pill from Colors (this increases the vizLOD and is no longer necessary).
  2. Clicking on the Analytics tab.
  3. Dragging out a Column Grand Total.
  4. Right-clicking the SUM(Trimmed Sales) pill on Measure Values and setting Total Using->Average.

Scrolling down to the bottom we can see that the overall trimmed mean of 2,600.79 matches the one from the reference line.

Screen Shot 2016-07-12 at 4.20.33 PM

Note that we could have used the Summary Card instead, however using the Grand Total lets us see exact values.

There’s a problem, though, if we use the Trimmed Sales all on its own in a view it breaks, whether using SUM() or AVG():

Screen Shot 2016-07-12 at 4.25.49 PM

The reason why is that Trimmed Sales is a record-level value and Superstore is at the level of detail of individual order items, but we're trying to compute the trimmed mean across Customer Names. For the true trimmed mean in this case we need to aggregate the trimmed sales to the Customer Name level like we did in the workout view. Here's the Trimmed Sales (Customer Level) formula, which takes the Trimmed Sales and wraps it in an LOD to get the Customer-level sales:

{FIXED [Customer Name]: SUM(
    IF {FIXED [Customer Name] : SUM([Sales])} < 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
    AND {FIXED [Customer Name] : SUM([Sales])} > 
    {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
       [Sales]
    END)
}

This returns the same results in the workout view:

Screen Shot 2016-07-12 at 4.31.49 PM

And works all on its own in a view:

Screen Shot 2016-07-12 at 4.32.32 PM
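The nesting in Trimmed Sales (Customer Level) can be mimicked in Python with made-up order rows: the inner {FIXED [Customer Name] : SUM([Sales])} rolls record-level rows up to customer totals, and the trim then happens at the customer level, not the order-item level. This is a sketch of that two-step structure, assuming a linear-interpolation percentile:

```python
from collections import defaultdict

def percentile(values, p):
    """Percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    k = (len(s) - 1) * p
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def trimmed_customer_sales(rows, low=0.05, high=0.95):
    """rows is a list of (customer, sale) order items."""
    # Inner FIXED: sum record-level sales up to each customer
    totals = defaultdict(float)
    for customer, sale in rows:
        totals[customer] += sale
    # Thresholds are computed over the customer-level totals
    lo_val = percentile(list(totals.values()), low)
    hi_val = percentile(list(totals.values()), high)
    # Outer FIXED: keep a customer's total only if it's inside the band
    return {c: t for c, t in totals.items() if lo_val < t < hi_val}
```

Averaging the returned totals then gives the trimmed mean, which is what the final {FIXED : AVG(…)} wrapper below does in Tableau.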

Now this is a case where the FIXED level of detail expression returns different results depending on the level of detail of the view; if we want it to return the same result everywhere then we can wrap all that in one more LOD expression. This is the TRIMMEAN Fixed calculation:

{FIXED : AVG(
    {FIXED [Customer Name]: SUM(
        IF {FIXED [Customer Name] : SUM([Sales])} < 
        {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .95)}
        AND {FIXED [Customer Name] : SUM([Sales])} > 
        {FIXED : PERCENTILE({FIXED [Customer Name] : SUM([Sales])}, .05)} THEN
            [Sales]
        END)
    })
}

And here it is in the workout view and a view without any dimensions:

Screen Shot 2016-07-12 at 4.34.36 PM

Screen Shot 2016-07-12 at 4.35.35 PM

 

 

Final Comments

This is a good (and bad) example of how Tableau is different from Excel. In one bad sense, note that I didn't parameterize the percentage for the trimmed mean; in Tableau this would require two parameters, because we can't use calculations as the arguments to the PERCENTILE() function. In another bad sense, the calculation requires understanding Level of Detail expressions and isn't wrapped into a simple formula. On the other hand, with those Level of Detail expressions we've got very precise control over what the calculation computes over, and we aren't limited to trimmed means: we could do trimmed medians, get the Nth percentile of the trimmed values, etc.

Here’s the trimmed mean workbook on Tableau Public.