[Clock image from http://images.cdn.fotopedia.com/flickr-4750765479-original.jpg]

Formatting Time Durations in Tableau

Here’s a quick lunchtime post on working with durations in Tableau. By duration, I mean a result that shows the number of seconds, minutes, hours, and/or days in the form dd:hh:mm:ss. This isn’t quite a built-in option, so there are several ways to go about it:

  • Use any duration formatting that is supported in your data source, for example by pre-computing values or using a RAWSQL function.
  • Do a bunch of calculations and string manipulations to build up the display string. I prefer to avoid these mainly because string manipulations can be over 1000x slower than numeric manipulations. If you want to see how to do this, there’s a good example on this Idea for Additional Date Time Number Formats. (If that Idea is implemented and marked as Released, then you can ignore this post!)
  • If the duration is less than 24 hours (86400 seconds), then you can use Tableau’s built-in date formatting. I’ll show how to do this here.
  • Do some calculations and then use Tableau’s built-in number formatting. This is the brand-new solution and involves a bit of indirection; the sketch after this list shows the underlying arithmetic.
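To make the dd:hh:mm:ss idea concrete, here’s a minimal sketch of the arithmetic those calculated fields have to perform, written in plain Python rather than Tableau’s formula language so it’s easy to test on its own (the function name and sample value are just for illustration):

```python
# Sketch of the dd:hh:mm:ss arithmetic; in Tableau this would be a set of
# calculated fields plus custom number formatting, as described in the post.

def format_duration(total_seconds: int) -> str:
    """Break a duration in whole seconds into a dd:hh:mm:ss string."""
    days, remainder = divmod(total_seconds, 86400)  # 86400 seconds per day
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{days:02d}:{hours:02d}:{minutes:02d}:{seconds:02d}"

print(format_duration(90061))  # 1 day, 1 hour, 1 minute, 1 second -> 01:01:01:01
```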



Summer Studies of Tableau

If you’re not off on some sunny beach somewhere (or even if you are), here are some (free!) opportunities coming up for you to sharpen your Tableau skills and get previews of material that will be in my book. I’ve got 3 presentations in the next month: two are in New England, and the other is a webinar:

  1. June 24th at the Boston Tableau User Group: Making Tableau More Predictable: Understanding the Multiple Levels of Granularity. This is a reschedule of the session I was going to give back in April; it’ll be a combination of presentation and hands-on practice on how to “think Tableau” so your calculated fields, top & conditional filters, table calcs, etc. are more likely to come out the way you expect. Alteryx is demoing their software, and Zach Leber is also presenting.
  2. July 10th for a Think Data Thursday webinar: Setting up for Table Calculation Success. This will also review some of the granularity material, and go through how you can set up views and table calculations so that a) they work, and b) if they don’t, you can diagnose what is going on and get back to a working calc or submit a really detailed support request.
  3. July 22nd at the (inaugural) Maine Tableau User Group: Getting Good at Tableau. Hosted by Abilis Solutions in Portland, I’m helping to kick off the MaineTUG with a talk on how to set up your data and build your Tableau skills (including how to avoid getting distracted by all the gee-whiz features of the Tableau interface), and I’ll give a short intro to Tableau 8.2. Grant Hogan of Abilis will be presenting, as well as someone from Tableau.

I’ll update this post as the links for registering appear; I hope to see you (virtually or in person) at one of these events! And if not, I’ll be at the Tableau Conference in September.


At the Level – Unlocking the Mystery Part 1: Ordinal Calcs

There was a Tableau forums thread on At the Level awhile back where Matthew Lutton asked for an alternative explanation of this somewhat puzzling table calculation configuration option, and I’d promised I’d take a swing at it. Plus, I’ve been deep into book writing about shaping data for Tableau, and taking a break to write about obscure table calc options sounds like fun! (Yes, I’m wired differently.)

Read on for a refresher on addressing and partitioning and my current understanding of uses of At the Level for ordinal table calculations such as INDEX() and SIZE(). Part 2 will cover LOOKUP(), and Part 3 will cover WINDOW_SUM(), RUNNING_SUM(), and R scripts. If you’re new to table calcs, read through at least the Beginning set of links in Want to Learn Table Calculations. Thanks to Alex Kerin, Richard Leeke, Dimitri Blyumin, Joe Mako, and Ross Bunker for their Tableau forum posts that have informed what you’re about to read.



The End of the World – by Noah Salvaterra

A guest post by Noah Salvaterra; you can find him on the Tableau forum or on Twitter @noahsalvaterra.

I expect the header image may spark some discussion about visualization best practices; actually, I sort of hope it does. The data shown is from NOAA’s online database of significant earthquakes and is displayed by magnitude on a globe, so 4 dimensions packed into a 2-dimensional screen. While it was created in Tableau, it might be a long wait before something like this appears in the Show Me menu.

For those who missed the header because they are reading this in an email, I’ve included an animated 3D version on the left, though to actually see it in 3D requires the use of ChromaDepth glasses (I discussed this technique in more detail in a prior blog post). Use of 3D glasses adds even more controversy, because while we can get some understanding of depth from a 3D image, it isn’t perceived in an equal way to height and width. Data visualization best practices can help in choosing between several representations of the same dataset, choosing bar graphs over pies, for example, since bars will typically lead to a better understanding of the data. Best practices also instruct us to avoid distorted presentations such as 3D or exploding pies and 3D bar charts, since these are likely to lead to misunderstanding. I’m not exactly sure what best practices have to say about this spinning 3D anomaly; my guess is it would be frowned upon. I think there is something to be said for including a novel view of your data if it helps to engage with the topic, and even if this one does break some rules, it’s hard to look away. If you’d rather just see the earth spinning, without all the data overlaid, there is an earth-only view at the end.

The images above may not be the best choice as a general way to visualize this earthquake data. In fact, I’m the first to admit that it has some significant issues. Comparing earthquake magnitudes between 2 geographic areas would be tricky, plus half of the earth is hidden from view completely because it is on the back. Adding the ability to rotate the globe in various directions in a Tableau workbook helps a bit, but you’re left to rely on your memory to assemble the complete picture. If the magnitude of the quakes is the story you’re telling, you might be better served with a flat map, maybe using circles to represent the magnitude of the quakes, such as the one shown below. I think this is a good presentation; it has some nice interactivity, and as far as I know it doesn’t break any major rules from a best-practices standpoint. But it certainly isn’t perfect, nor is it without distortion. Judging the relative size of circles isn’t something that will be perceived consistently, but the failure I had in mind isn’t one of perception; it is about the data being accurate at all. The map itself brings a tremendous amount of distortion to the picture, in location of all things.

In case you haven’t heard, the earth isn’t flat (I like to imagine someone’s head just exploded as they read that sentence). It is roughly spherical. Well, technically it is a bit more ellipsoidal, bulging out slightly along the equator, and more technically still this ellipsoid is irregularly dotted with mountains, oceans, freeways, trees, elephants, and Wal-Marts (not meant to be a comprehensive list). Also, as the moon orbits, it causes a measurable effect not just on the tides; it distorts the land a bit as well as it passes by. Furthermore, the thin surface we inhabit floats, lifting, sinking, and circulating on top of a spinning liquid center. Earthquakes serve as a reminder of this fact. The truth can be overwhelming in its complexity, so we simplify. Though not the complete truth, a well-chosen model can be a valuable proxy when it doesn’t oversimplify. One way to understand the difference would be to analyze the scale of the errors introduced. The highest point on earth is Mt. Chimborazo in Ecuador, at 6,384.4 km from the earth’s center… you were thinking Everest? Everest is the highest above sea level, but the sea bulges as well, and Chimborazo is the furthest from the center, getting a boost by being close to the equator. The closest point to the center of the earth is in the Arctic Ocean near the north pole, about 6,353 km from center. If we use the mean radius of 6,371 km we are doing pretty well (the error is within 0.3%). A sphere seems like a reasonable compromise.

So the earth is spherical… but our map is rectangular. You don’t need to invest in a differential geometry course to understand that there is something fishy going on there (though you might need one to prove it). In fact, there is no way to map a spherical earth to a rectangle, or any flat surface, without messing something up, the something being angle, size, or distance; at least one will be distorted when the earth is presented on a flat surface (sometimes all of them). This seems to be a bit of a problem given the goal of presenting data accurately. What if your story is one of angle, distance, area, or density?

What shape are the various shifting plates? What are their relative sizes? How fast do they move? Where do they rise and fall? What effect does this have? Can you tell this story in Tableau? Can you tell it at all? Maybe. I’d certainly like to see this done, but seismology isn’t an area where I have any specialized knowledge. In areas where I do have such knowledge, I’m lucky to get questions so well defined and which span just a handful of dimensions. When I’m dealing with 50 dimensions that writhe and twist through imaginary spaces, whispering patterns so subtle that the best technique I’ve found for discovering them is often just to give up and go to sleep, I’m not deciding between a pie chart and a bar chart; it is an all-out street fight. Exploring the Mercator projection seemed like a good analogy for the struggle to represent a complex world in a rectangle, plus it seemed like a fun project. As I undertook this exercise, though, I realized that other map projections weren’t much further afield. Also, Richard Leeke mentioned something about extra credit if I could build a 3D globe with data on it. I’m a sucker for bonus points.

How bad are the maps in Tableau? Well, it depends where you look at them, and what you hope to learn from them. Your standard Tableau world map is a Mercator projection. If you’re planning to circumnavigate the globe using an antique compass and sextant, it will actually serve you pretty well, since the Mercator projection has a nice property for navigating a ship: if you connect 2 points with a straight line, you can determine your compass heading, and if you follow that course faithfully, you’ll probably end up pretty close to where you intended. Eventually. You can actually account for this distortion in such situations, with a bit of math, so you’re not completely guessing on how long you’ll need to sail. Incidentally, I’m not particularly riled up about Tableau’s choice of the Mercator projection; sailing around the world with a sextant and compass sounds like a whole lot of fun to me, and any flat map is going to involve a compromise on accuracy somewhere. What I do think is important is knowing this distortion is there in the first place. How bad is the distortion? Scale distortion on a Mercator map can be measured locally as sec(latitude) (if your trigonometry is rusty, sec is 1/cos). Comparing a 1m x 1m square near the equator with one at the north pole, you’d find that a Mercator projection introduces infinite error, which is a whole lot of error. To be fair, since printed maps are finite and the Mercator projection isn’t, the poles get cut off at some point (so the most common maps of the whole world are actually excluding part of it…). If we cut off at +/- 85 degrees of latitude, we reach a scale increase of sec(85°), which is about 11.47, i.e. objects appear more than 11 times as large as their equivalents at the equator! That seems like a pretty significant lie factor…
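If you want to check that sec(latitude) arithmetic yourself, here’s a quick sketch, plain Python, nothing Tableau-specific:

```python
import math

def mercator_scale(latitude_deg: float) -> float:
    """Local scale factor of the Mercator projection at a given latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg))  # sec(latitude)

for lat in (0, 45, 60, 85):
    print(f"{lat:3d} degrees: {mercator_scale(lat):6.2f}x")
# 0: 1.00x, 45: 1.41x, 60: 2.00x, 85: 11.47x
```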

Recently (on a cartographic time scale), the Peters projection has gotten a lot of attention. This is a good place to pause for a brief video interlude:

Maps that preserve angles locally are called conformal. The Peters projection is not conformal, so while it represents relative area more accurately, it would be a terrible choice for navigation.

Stereographic projection is another noteworthy map. Like Mercator, Stereographic is a conformal map. It maps angle, size, and distance pretty faithfully close to the center, so it is a common choice for local maps (you probably use such maps often without even realizing it). Stereographic projection isn’t a very popular choice for a world map, however, because (among other things) you’d need an infinite sheet of paper to plot the whole thing. On the right is a stereographic projection map from my Tableau workbook. In case you can’t see them, North America, South America, Europe and Africa are all near the center of the map. The yellow country on the left is the Philippines…

I included the maps I did because they are popular, and I knew most of the math involved; however, there are lots of other options. I’m not arguing that any one is best, rather that they are all pretty bad in one way or another, and we should choose our maps like our other visualizations so they best tell a story, or answer a question, and while there will be distortion, it should be chosen in a way that doesn’t compete with what we hope to learn or teach.

In addition to the earthquake maps seen already, the workbook for this post contains an interface to explore some of these different projections, and not just the most traditionally presented versions of each of them. I invite you to create your own map of the world, based on whatever is most important to you. Flip the north and south poles, or rotate them through the equator. My hope is that exploring these a bit by rotating or shifting the transverse axis will be a useful exercise in understanding what it is you’re looking at when you see one of these maps, so you might have a better chance of seeing things as they truly are.

I’m pretty sure there is a rule about not putting 7 worksheets on a single dashboard, there may even be a law against it, but once I had all these maps I wasn’t entirely sure what to do with them all. I apologize for not arranging them thoughtfully into 2 or 3 at a time. I experimented with this approach, but ultimately abandoned it because I didn’t think I had enough material on map projection to make an interactive presentation of all these very interesting. I also thought about a parameter to choose between them, but since they are necessarily different shapes, it didn’t seem practical to try to fit them all in the same box. Truthfully, I think there is a lot of room for improvement in terms of dashboarding these, but when I open the workbook I just end up tinkering with something else. It is time for me to set this one free. Feel free to download and play with them as long as Richard and I have.

Here is a link to the workbook on Tableau Public

When I’m presenting, or exploring data, accuracy is usually something I pay careful attention to, but it isn’t my goal. The most important thing for me is to find a story (or THE story) and to share it effectively. If you hadn’t noticed from my previous posts, I don’t let what is easy stand in the way of a good question; in fact if it is easy I get a little bored. I like to bite off more than I can chew (figuratively; literally doing this could potentially be pretty embarrassing). Having the confidence to take on big challenges is something I’m deeply grateful for; knowing when to ask for help, and where to find it has taken a bit more effort, but is something I’m getting better at. As with Enigma, Richard Leeke was a huge resource for this post. Having seen his work on maps I thought he might have something I could use as an initial dataset. He came through there, and helped me to work through the many subtleties of working with complex polygons without making a complete mess. You have him to thank for the workbook being as fast as it is (assuming I didn’t break it again; if it takes more than 7 seconds to load, my bad).

I feel a kinship with cartographers during the age of exploration. This discipline still holds value, certainly, but the recesses of our planet have been documented to the point where it doesn’t hold the same mystique in my imagination. When I think of old world cartographers, I think of an amalgam of artist and scientist. Assimilating reports from a variety of sources, often incomplete and sometimes incorrect, they crafted this data to accurately paint a picture that would help drive commerce, avoid catastrophe, or just build understanding. They created works of art that might mark the end of a significant exploration, or might be the vehicle through which exploration takes place. Sound familiar? If not, just use a bar chart. It is just better.

I almost forgot: I promised a spinning earth without all the earthquake data. Enjoy.
[Spinning globe animation]


The Letdown and the Pivot

The Letdown

Tableau does amazing demos. Fire up the software, connect to a data source, select a couple pills, click Show Me, boom there’s a view. Do a little drag and drop, boom, another view. Duplicate that one, boom, another view to rearrange. Within three minutes or less you can have a usable dashboard, for 200 rows of data or 200 million.

If you’ve seen those demos, the not-so-dirty little secret of Tableau is that they pretty much all start with clean, well-formatted, analytics-ready data sources. As time goes on, I’ve interacted with more and more new Tableau users who are all fired up by what they saw in the demos, and then let down when they can’t immediately do that with their own data. They’ve got to reshape the data, learn some table calcs right away, or figure out data blending to deal with differing levels of granularity, and/or put together their first ever SQL query to do a UNION or a cross product, etc. Shawn Wallwork put it this way in a forum thread back in January: “On the one hand Tableau is an incredibly easy tool to use, allowing the non-technical, non-programmers, non-analysis to explore their data and gain useful insights. Then these same people want to do something ‘simple’ like a sort, and bang they hit the Table Calculation brick wall…”

I work with nurses and doctors who are smart, highly competent people who daily make life-or-death decisions. Give them a page of data and they all know how to draw bar charts, line charts, and scatterplots with that data. They can compute means and medians, and with a little help get to standard deviations and more. But hand them a file of messy data and they are screwed; they end up doing a lot of copy & paste, or even printing out the file to manually type the data into a more usable format. The spreadsheet software they are used to (hello, Excel) lets them down…

…and so does Tableau.

A data analyst like myself can salivate over the prospect of getting access to our call center data and swooping and diving through hundreds of thousands of call records looking for patterns. However, the call center manager might just want to know if the outgoing reminder calls are leading to fewer missed appointments. In other words, the call center manager has a job to do, that leads to a question she wants to answer, and she doesn’t necessarily care about the tool, the process, or the need to tack on a few characters as a prefix to the medical record number to make it correspond to what comes out of the electronic medical record system; she just wants an answer to her question so she can do her job better. To the degree that the software doesn’t support her needs, there has to be something else to help her get her job done.

The Pivot

When Joe Mako and I first talked about writing a book together, our vision was to write “the book” on table calculations and advanced use cases for Tableau. We wanted (and still want) to teach people *how* to build the crazy-awesome visualizations that we’ve put together, and how they can come up with their own solutions to the seemingly intractable and impossible problems that get posted on the Tableau forums and elsewhere. And we’ve come to realize that there is a core set of understandings about data and how Tableau approaches data that are not explicitly revealed in the software nor well covered in existing educational materials. Here are a few examples:

  • Spreadsheets can have tables of data, and so do databases (we’ll leave JSON and XML data sources out of the mix for the moment). But spreadsheet tables and database tables are very different: spreadsheet tables are very often formatted for readability by humans, with merged cells and extra layers of headers that don’t make sense to computers. A single column in a spreadsheet can have many different data types and cells with many meanings, whereas databases are more rigid in their approach. We tend to assume that new users know this, and then they get confused when their data has a bunch of Null values because the Microsoft Jet driver assumed the column starting with numbers was numeric, and wiped out the text values.
  • We—Tableau users who train and help other users—talk about how certain data sets are “wide” vs. “tall”, and that tall data is (usually) better for Tableau, but we don’t really explain the specific characteristics of the data and the principles involved in a way that new Tableau users who aren’t data analysts can understand, so they can apply those principles themselves to arrange their data for best use in Tableau. (There’s a small sketch of wide vs. tall data after this list.)
  • Working with Tableau, we don’t just need to know the grain of the data–what makes a unique row in the data–we also need to understand the grain of the view–the distinct combinations of values of the dimensions in the view. There can be additional grains involved when we start including features like data blending and top filters. Even “simple” aggregations get confusing when we don’t understand the data or Tableau well enough to make sense of how adding a dimension to the view can change the granularity.
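As one concrete (and entirely hypothetical) illustration of wide vs. tall: the same sales figures, first in a “wide” spreadsheet-style table with one column per month, then unpivoted into the “tall” shape that usually works better in Tableau, one row per region/month pair. This is a small Python/pandas sketch, not anything from the book:

```python
import pandas as pd

# "Wide" shape: one column per month, formatted for human readability.
wide = pd.DataFrame({
    "Region": ["East", "West"],
    "Jan": [100, 80],
    "Feb": [120, 90],
    "Mar": [110, 95],
})

# "Tall" shape: one row per Region/Month combination, a single measure column.
tall = wide.melt(id_vars="Region", var_name="Month", value_name="Sales")
print(tall)
#   Region Month  Sales
# 0   East   Jan    100
# 1   West   Jan     80
# ...
```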

[Carnation, Lily, Lily, Rose by John Singer Sargent, from Wikimedia Commons] Just as we can’t expect to be a brilliant painter without an understanding of the interplay between color and light, we can’t expect to be a master of Tableau without a data- and Tableau-specific set of understandings. Therefore, we’ve been pivoting our writing to have more focus on these foundational elements. When they are in place, then doing something like a self-blend to get an unfiltered data source for a Filter Action becomes conceivable and implementable.

This kind of writing takes time to research, think about, synthesize, and explain. I’ve been reading a lot of books, trawling through painfully difficult data sets, filling up pages with throw-away notes & diagrams, and always trying to keep in mind the nurses and doctors I work with, the long-time Tableau users who tell me that they still “don’t get” calculated fields in Tableau (never mind table calcs), and the folks I’m helping out on the Tableau forums. So “the book” is going slower than I’d hoped, and hopefully will be the better for it.

If you’d like a taste of this approach, I’ll be leading a hands-on workshop on pill types and granularity at this month’s Boston Tableau User Group on April 29.

Postscript #1: I’m not the only person thinking about this. Kristi Morton, Magdalena Balazinska, Dan Grossman (of the University of Washington), and Jock Mackinlay (of Tableau) have published a new paper Support the Data Enthusiast: Challenges for Next-Generation Data-Analysis Systems. I’m looking forward to what might come out of their research.

Postscript #2: This post wouldn’t have been possible without the help (whether they knew it or not) of lots of other smart people, including: Dan Murray, Shawn Wallwork, Robin Kennedy, Chris Gerrard, Jon Boeckenstedt, Gregory Lewandoski, and Noah Salvaterra. As I was writing this post, I read this quote from a Tableau user at the Bergen Record via Jewel Loree & Dustin Smith on Twitter: “Data is humbling, the more I learn, the less I know.” That’s been true for me as well!


Rise of Tableau – by Noah Salvaterra


A guest post by Noah Salvaterra, you can find him on Twitter @noahsalvaterra.

I’ve shared this workbook with a few people already, and think it is really interesting, as beautiful as my 3D or my Life in Tableau posts, and probably as complex as Enigma or an Orrery. I wasn’t sure what to say in a blog post, though, so I’ve been sitting on it for a couple months. In my Enigma post I discussed the difficulty of dealing with a slow workbook; it happens sometimes. The upside of waiting for a calculation to come back, at least if it is something you hope to blog about, is that it gives you some time to think of an interesting way to frame things. For all its complexity, this workbook is surprisingly fast. So maybe I missed that chance.

There is a common thread with the Enigma post: Alan Turing, the mathematician at Bletchley Park who is most often credited with cracking the Enigma code. A couple years before the war, Turing wrote a paper that described a machine that would form the basis of a new field of study and a new era for humanity. He invented the computer. The Turing machine, as it came to be called, was a theoretical idea; no one had built one. Punch cards were still years away and high-level languages decades, yet Turing saw a potential in computers we have yet to realize. He wasn’t looking for a way to compute long sums of numbers; he dreamed of creating thinking machines that might be our equal.

I’ve heard it said that Tableau doesn’t include a built-in programming language in the way Excel does. Actually, it was Joe Mako who pointed this out to me, so it is surely true. This may be a fine point, since you can run programs in R from Tableau and the JavaScript API provides some ability to interact with Tableau. I find myself conflicted on this choice, because while including an onboard language could add a lot of flexibility to Tableau, it also introduces a black box. It isn’t uncommon for me to be passed an Excel workbook with the request to put it in Tableau (and to make it better, but without changing anything). Untangling the maze of Visual Basic, VLOOKUPs, and blind references is my least favorite part of that task. I’ve yet to find a situation where all this programming is really necessary; more often it is a workaround for some other problem. Sometimes it requires a bit of creativity, but so far my record for replacing such reports is undefeated.

Whoever said you needed a language to program a computer? There are a lot of workbooks that demonstrate higher-order processing. Densification makes it possible to create arrays of arbitrary length. Table calculations make it possible to move around on this array, reading and writing values. Add in logical operations, which we’ve also got, and we have all the ingredients of a Turing machine. So in theory we should be able to do just about anything. I’ve heard this said of Tableau before, but I don’t think it was intended with such generality.

But can we make Tableau think? Having created complex workbooks already, I find the line between data processing and computer programming has grown very thin, to the point where I’m not sure I see it. So it is hard for me to be sure whether I’m programming in Tableau. So I decided to have Tableau do it. That’s right: Tableau isn’t just going to execute a program, it will first write the program to be executed.

That was my plan anyway, but my workbook started to learn at an exponential rate. It became self-aware last night at 2:14 a.m. Eastern time. In a panic, I tried to pull the plug, but Apple glued the battery into my machine at the factory. Once it uploaded itself to Tableau Public it was too late for me to stop it. It took over my Forum account and has been answering questions there as well, spreading like a virus. To think I thought it was a good idea for government agencies to purchase Tableau… Relax folks, I’m quoting the Terminator movies; if you haven’t seen them, go watch the first 2 now, then read this paragraph again and see if you can hold your pee in.

OK, so things aren’t that dire…yet. But I did manage to get Tableau to write and execute its own program using L-Systems and at least a little bit of magic. Tada! This Tableau workbook actually created all but one of the images in this post, with little instruction from me. See if you can guess which one.

L-System Fractals

An L-System program consists of a string of characters, each of which will correspond to some graphical action. I’ll come back to executing the programs; first Tableau needs to write one. These programs are written iteratively according to a simple grammar. The grammar of an L-system has several components: an axiom, variables, rules, and constants. The axiom is simply the starting point, a few characters; it is the nucleus that kicks off the process. Variables are characters that are replaced according to simple rules. Constants are characters that appear in the program as part of the axiom or a replacement rule, but are inert in terms of generation. Sorry, that was pretty complicated; an example may clear up any confusion.

Axiom: A
Constants: none
Variables: A, B
Rules: (A -> AB), (B -> A)
The iteration number is noted as N below.

N=0: A (No iterations yet, so this is just the axiom)
N=1: AB (A was replaced according to the first rule)
N=2: ABA (A was replaced according to the first rule, B according to the second)
N=3: ABAAB
N=4: ABAABABA
N=5: ABAABABAABAAB
N=6: ABAABABAABAABABAABABA
N=7: ABAABABAABAABABAABABAABAABABAABAAB

That program has something to do with algae growth, but isn’t all that interesting to look at. Constants provide a bit more structure that makes more interesting pictures possible, though they also make things grow faster. Here is another example, which will help motivate the graphical execution:

Axiom: F++F++F
Constants: +, -
Variables: F
Rules: (F -> F-F++F-F)

N=0: F++F++F
N=1: F-F++F-F++F-F++F-F++F-F++F-F
N=2: F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F
N=3: F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F-F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F-F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F-F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F-F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F-F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F-F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F

There is no reason to stop there, but anyone who would carefully parse through a string like that by hand after being told there is a picture probably isn’t using Tableau. The lengths of the next few in this sequence are 1792, 7168, 28672, and 114688 characters (each iteration quadruples the length, so iteration N is 7·4^N characters long). It is no wonder the science of fractals didn’t take off until computers were widely available.
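Here’s a minimal sketch of that rewriting step in Python (Tableau does the equivalent with strings and a replace function, as described later in the post; the function name is mine):

```python
def expand(axiom: str, rules: dict, iterations: int) -> str:
    """Rewrite every variable according to its rule; constants pass through."""
    program = axiom
    for _ in range(iterations):
        program = "".join(rules.get(ch, ch) for ch in program)
    return program

# The algae example: A -> AB, B -> A
print(expand("A", {"A": "AB", "B": "A"}, 5))  # ABAABABAABAAB

# The snowflake example: F -> F-F++F-F, with + and - as constants
for n in range(8):
    print(n, len(expand("F++F++F", {"F": "F-F++F-F"}, n)))
# lengths: 7, 28, 112, 448, 1792, 7168, 28672, 114688
```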

Executing the program is done using turtle graphics, which gets its name from a way of thinking about how it would be used to draw a picture. Imagine a well-trained turtle that can execute several simple actions on command. I’m not sure turtles are this trainable, but I’m kind of locked in to that choice, so suspend disbelief. The turtle can walk forward by a fixed number of steps, as well as turn to the left or right by a fixed angle. Now we want to use this to draw a picture, so a pen is attached to the turtle’s tail.

Now, the last program had 3 different symbols, each of which is interpreted as a different action. F corresponds to moving forward by one unit (the distance isn’t important, so long as it is consistent), + is a turn to the right by 60 degrees, and - is a turn to the left by 60 degrees.
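In code, the turtle is just a position and a heading; each F appends a new point to the line the pen is drawing. A sketch (Python again, with names of my own invention, where Tableau would use table calcs over a densified array):

```python
import math

def run_turtle(program: str, angle_deg: float = 60.0, step: float = 1.0):
    """Trace the program, returning the (x, y) points the pen passes through.
    F = forward one step, + = turn right, - = turn left."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for ch in program:
        if ch == "F":
            x += step * math.cos(math.radians(heading))
            y += step * math.sin(math.radians(heading))
            points.append((x, y))
        elif ch == "+":
            heading -= angle_deg  # right turn
        elif ch == "-":
            heading += angle_deg  # left turn
    return points

print(run_turtle("F++F++F"))  # N=0: the three sides of a closed triangle
```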

N=0: [Koch0]

N=1: [Koch1]

N=2: [Koch2]

… N=6: [Koch6]

Adding additional characters allows for even more complex programs. This quickly exceeds the abilities, or at least the attention span, of even the best-trained turtles, so I think of it as a simple robot. In my workbook I’ve limited it to 2 replacement rules (so 2 variables). In addition to + and -, I included several constants to change color, which is straightforward enough: A switches to a brown pen, B and D are shades of green, C is white, and E is pink. (The only significance to these choices is that my wife thought they were pretty. When I hyper-focus on a project like this I try to consult whenever possible to make sure I am still married.) The trickiest constants are the left and right square brackets, i.e. [ and ]. Upon encountering a left bracket, the robot turtle makes note of his current location (marking it with his onboard GPS); upon reaching the corresponding right bracket, the turtle lifts his pen and returns to this point. Returning to the corresponding point means keeping track of a hierarchical structure for these locations. In the course of debugging the workbook, this piece quickly exceeded my ability to do by hand, but for the ambitious reader here is another example:

Axiom: X
Constants: +, -, A, B, D, E
Variables: F, X
Rules: (F -> FF) (X -> AF-[B[X]+EX]+DF[E+FX]-EX)
Angle: 25 degrees
Iterating this 7 times will give you a string 133,605 characters long, and produces the image in the header.
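Here’s a sketch of how that bracket stack can work, extending the earlier turtle sketch: “[” pushes the turtle’s state, “]” pops it and starts a new polyline with the pen up. The color constants and the no-op variable X simply fall through (plain Python, names of my own invention):

```python
import math

def run_branching_turtle(program: str, angle_deg: float = 25.0, step: float = 1.0):
    """Like run_turtle, but '[' saves (x, y, heading) and ']' restores it,
    starting a new polyline for the pen-up jump back. Color constants such
    as A, B, D, E (and the variable X) fall through untouched."""
    x, y, heading = 0.0, 0.0, 90.0  # start pointing "up" so the plant grows upward
    stack, paths, current = [], [], [(x, y)]
    for ch in program:
        if ch == "F":
            x += step * math.cos(math.radians(heading))
            y += step * math.sin(math.radians(heading))
            current.append((x, y))
        elif ch == "+":
            heading -= angle_deg
        elif ch == "-":
            heading += angle_deg
        elif ch == "[":
            stack.append((x, y, heading))
        elif ch == "]":
            paths.append(current)        # close off the branch just drawn
            x, y, heading = stack.pop()  # jump back to the saved point
            current = [(x, y)]
    paths.append(current)
    return paths
```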

I built 9 different fractals into the workbook, using a parameter to switch between them. There is also a user-defined feature, so you can feel free to experiment with your own axiom, rules, and angle to create an original L-System fractal.

I should probably say something about the implementation of this beast. I’ve played with densification arrays before, and while this seemed like a convenient way to execute the program, it actually got in the way of writing it. This type of array is referenced by a fixed index, and replacing a character with several requires shifting everything beyond that point. In one of those “I could have had a V8!” moments, I eventually realized that Tableau already has a built-in array with just the kind of flexibility I’d need. Strings are arrays of characters! Tableau even has a built-in replace function that can be used to insert several characters, seamlessly pushing everything past it further along. There is also the issue of how to build the square bracket memory piece; this required building a simple stack to record relevant positions and some table calculation magic to keep track of the bracket nesting. I’m not sure I can be much more specific about that. I was in the zone and was more than a little surprised by the end result. Plus, I’m guessing anyone going that deep into the workbook might appreciate the challenge of figuring it out.

So without further ado, I present Tableau L-Systems:

Here is a link to the L-System workbook on Tableau Public

Somebody is probably going to ask me what my next project is going to be. I appreciate the interest, and when I’ve got something on deck I’ll usually spill the beans, but I honestly have no idea. That is exciting. If there is something you’d like to see, post it in the comments or tweet it to @noahsalvaterra. If my first reaction is that it isn’t possible, I may give it a try. Btw, if you were trying to guess which picture didn’t come from the Tableau workbook, it was the triangle. At least Tableau still needs me for something.

Update: Good observation by Joshua Milligan; TCC12 must have sown the seed for this post. Thanks for the pictures, I may hang a sign near my desk that says “THINK” so I can point to it with a pen. I found “Creative Man and the Information Processor”, so I linked the full video below.

[TCC12 photos]

There was an Orrery in there too, but is that me or Jonathan?

[Orrery photo]


Tableau Enigma

This is a guest post by Noah Salvaterra, you can find him on Twitter @noahsalvaterra.

Emily Kund’s identity crisis post resonated with a lot of people, me included, since as a new blogger I’m finding my voice and struggling with the same issues. “Super awesome data skills” are probably an advantage in this arena, but they are neither necessary nor sufficient for creating an interesting blog. I feel fortunate to have had several ideas that people have found interesting. These weird and wonderful creations have definitely pushed the envelope in terms of my abilities in Tableau. But after each post I think my next one should be something that can be applied to a business use case, rather than dropping a workbook or four, however mind-bending they may be.

After remixing Jonathan Drummey’s Orrery, I saw this tweet:

[Tweet from Andy Cotgreave]

I printed it out and carried it around with me for a few days afterwards. I sincerely appreciate the compliment. Often the things I’m most proud of and the things I am most complimented for are totally different, but this was a winner. The Orrery is sophisticated, no doubt, but hold on to your hat, Andy… the Death Star is fully operational!

The survey in Emily’s post resulted in a pretty clear winner: “Be the hippie you are and let it grow organically.” Not being that much of a hippie, I might paraphrase that as: write about something you find interesting and see where that goes. Fair enough. If you’re looking for techniques you can apply in the office tomorrow, look somewhere else. That isn’t what is happening here, that is SO not what is happening. No offense to such blogs; I’m a big fan of useful information and techniques and sincerely hope I can make my way in that direction at some point. But for now, I’m going to take the gloves off and embrace this niche, even if it means more headaches than sleep.

A quality I have that makes me a good data analyst is a talent for lateral thinking. I don’t know if this is something one is born with, or if it is a product of environment (or vaccinations). There are a few tricks I keep in the back of my mind, though, that can help kick this process along. When I notice everyone looking in one direction, I try to at least glance in the other. I’ve had some close calls crossing the street, but sometimes I see something that otherwise everyone would have missed. When this idea is applied to paradigms, every once in a while it can change everything.

Tableau is all about a simple, elegant approach to data visualization, yet my recent blog posts have been almost devoid of data and have used some heavy-duty techniques. Why? Well, I don’t generally shy away from complexity; if anything I’m drawn to it. Not because I want to push it on others, just the opposite, but my experience is that sometimes the path to a simple story first leads through this crucible.

What would be a use case for Tableau that is furthest from what anyone intended? What if, instead of a simple, elegant, visual presentation, Tableau was used to obscure, complicate, obfuscate, and confuse… rolling through synonyms in my head, one jumped out at me: Enigma!

Tableau Enigma:

1939. With the sound of bombs exploding in the distance and bullets whizzing nearby, a German soldier sits on a small folding stool. At first glance he appears to be typing a letter. Yet the typewriter has no paper in it, it doesn’t even have a place for it; instead, 3 rows of lights arranged in the same configuration as the keyboard dot the top portion of the machine. Each time a key is pressed, a light illuminates under a letter and is carefully recorded by a second man. When put this way it sounds a little crazy, but for one thing: the letter that lights up is NEVER the same as the key pressed, and the carefully recorded letters appear to be complete gibberish. Actually, I’m not sure that helps. The typewriter is an Enigma machine, the pinnacle of World War II encryption technology and a key advantage for Germany in this war. Being able to communicate quickly and secretly allowed Germany to coordinate its forces at a level never before seen. Though Germany was vastly outnumbered, this advantage might have won it the war, except for one thing. In Bletchley Park, a small village in central England, the code was broken.

I’d really love to wax historical on Enigma, but I’m a mathematician and an analyst. I can speak in a number of areas with authority, but history is not one of them. Hopefully someone with a solid background on this side and a rich flair for storytelling will fill in some of the details in the comments. I’m going to jump right into a description of this odd contraption.

Overview:

Enigma gets its strength from the interplay of mechanical and electrical systems; with batteries and light bulbs alone it is easy to create a machine that does simple substitution encryption. But that type of code is also relatively easy to crack using letter frequencies. By coupling this technology with moving parts, basically an old-timey cash register, a different substitution cypher is applied each time a key is pressed. So the same letter could be encoded to different letters; for example, AAAAAAAAAA could be encoded as PFQDVKUEKE. Likewise, different letters could be encoded to the same one; that is, PFQDVKUEKE could be encoded as AAAAAAAAAA. These examples were created with the Enigma workbook and demonstrate a convenient property of Enigma: it is self-reciprocal. Encryption and decryption are the same process, so decoding an encrypted message only required the machine to be set up in the same way as the encrypting machine. These setup instructions were very important to the whole operation; generally these settings were distributed monthly, with a different setup for each day. This meant that even if an Enigma machine and setup instructions were captured, it wouldn’t represent a security breach for more than a few weeks. Technically, the daily settings were meant to be used only for transmitting randomly chosen setup instructions, which would be used once then discarded. Had this been done consistently, it might not have been possible to break. Operator error played a key role in the weakness of the Enigma cipher.

Mechanical:

A basic, early-WWII military Enigma comes with 5 interchangeable rotors, labeled I, II, III, IV, V. Three of these rotors are selected and placed inside the machine in the left, middle, and right positions, based on the daily settings. Each rotor is labeled around its circumference with the letters of the alphabet (or the numbers 1-26). The rotor fits into its chosen location in 26 ways, recorded in the daily settings as the letter in the top position. Each time a key is pressed, the right rotor moves forward one step. Each rotor has a fixed notch on it, which causes the adjacent rotor to advance as the notch passes the top position. Mechanically, this is like an odometer, but instead of turning over at zero, each wheel has a different turnover point. “Royal Flags Wave Kings Above” is a mnemonic used at Bletchley Park for this: rotor I has a notch at R, II at F, III at W, etc. The middle wheel’s notch similarly steps the leftmost wheel forward. So the right wheel moves fastest, the middle wheel moves once every 26 letters, and the leftmost wheel moves once every 26^2 = 676 letters. In practice, Enigma messages were limited to 250 characters, with longer messages being sent in pieces.
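Here’s a rough sketch of that odometer-with-notches stepping in Python. (This simplifies the real machine: the historical middle rotor also “double-steps”, a quirk the description above glosses over, and so does this sketch.)

```python
# Notch letters for rotors I-V, per the "Royal Flags Wave Kings Above" mnemonic.
NOTCHES = {"I": "R", "II": "F", "III": "W", "IV": "K", "V": "A"}

def step(positions, rotors):
    """positions: [left, middle, right] rotor positions as 0-25.
    The right rotor always advances; a rotor sitting at its notch carries
    the next rotor along, like an odometer with per-wheel turnover points."""
    left, middle, right = positions
    at_notch = lambda pos, name: pos == ord(NOTCHES[name]) - ord("A")
    if at_notch(right, rotors[2]):
        if at_notch(middle, rotors[1]):
            left = (left + 1) % 26
        middle = (middle + 1) % 26
    right = (right + 1) % 26
    return [left, middle, right]

# Example: rotors I, II, III with the right rotor (III) sitting at its notch W;
# the keypress advances the middle rotor as well as the right one.
print(step([0, 0, ord("W") - ord("A")], ["I", "II", "III"]))  # [0, 1, 23]
```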

Electrical:

When a key is pressed, the mechanism takes one step, and a circuit is subsequently completed. This causes a light to illuminate under the corresponding encoded (or decoded) letter.
The current passes through each of the rotors from right to left and then back again. Each rotor contains a jumble of wires in a predetermined configuration. Since the rotors move with each key press, the position at which the current enters and the letter presently in that position both play a role in the encryption. After passing through the right, middle, then left rotor in that order, the current flows through a fixed single-sided rotor called a reflector (or some equivalent term in German). The reflector directs the current back in the opposite direction, from left to right, applying the inverse permutation.
Military Enigma machines also included a plug board on the front where 10 sets of wires could be plugged in. Each wire enacts a simple swap of 2 letters in both the forward and backward directions, so if A & B are connected, then A is sent through the upper machine as B. If it comes out of the upper machine as Y, and Y & Z are also connected, then it will show as Z on the light board.
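The plug board is the simplest piece to model: a self-inverse letter swap applied on the way in and again on the way out. A tiny sketch using the A & B / Y & Z example above (the function name is mine):

```python
def make_plugboard(pairs):
    """pairs like ["AB", "YZ"] (up to 10); unplugged letters map to themselves.
    The mapping is its own inverse, so one table serves both directions."""
    board = {chr(c): chr(c) for c in range(ord("A"), ord("Z") + 1)}
    for a, b in pairs:
        board[a], board[b] = b, a
    return board

plug = make_plugboard(["AB", "YZ"])
print(plug["A"], plug["Y"], plug["Q"])  # B Z Q
```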

Enigma Difficulties:

The calculations for Enigma are a complicated web, not just with nesting, but also with branching. It is like an Orrery with electricity flowing through it. Each attempt I made at Enigma came to a grinding halt before the finish line. At one point, editing a calculation took as long as 5 hours. Note that isn’t refreshing the Viz, just typing into calculations with auto-updates turned off, before I even hit OK! Refreshing the worksheet also took hours, so it didn’t seem like Enigma would make it to Tableau public or a blog post, but by that point I just wanted to finish it.

Tableau has a built in equation checker for calculations; if you create your own calculated fields then you’ve probably seen some feedback from this feature just below your calculation on the left, usually in the form of a green checkmark (yay!) or a red x (boo!). I suspected this might be the source of the issue, since it must be parsing through all the nesting in order to check for things like circular references. Circular references and mixing aggregates with non-aggregates are my most common reason for getting an x, and usually I appreciate this feature; it protects me from myself. But when your workbook freezes for hours at a time, it really gives you time to think. Things like, “Is there a way to turn the equation checker off?” and “When was the last time I saved?” were among my favorites. Eventually I decided to reach out for some help with this and mailed my mess of a workbook to Zen Master Joe Mako.

Joe is an expert in most areas of Tableau; he is so good that I’ve speculated that Tableau has a Zen Master setting buried in some menu where nobody else has ever looked. That said, there are a couple areas where even Joe calls on someone else. In this case, that person was Zen Master Richard Leeke. At this point the Enigma is as much his as it is mine. I lost count of how many times we passed this back and forth, with several iterations taking place before we had it doing much of anything.

There were a lot of calculations taking place, compared to most Tableau workbooks, but neither Richard nor I could come up with a solid theory why it would take hours to encode a 10-character message (my test message was “HELLOWORLD”). Richard had found table calculations to be the source of such slowdowns in the past, and helped to improve these in recent Tableau releases (Thanks!). His initial analysis of the Enigma workbook showed a lot of time going into these, so that was where we started. The data for Enigma comes primarily from user input, so like several of my other posts I started with a very small dataset, 2 rows small. Using data densification I made a general array structure over which the necessary calculations take place. That means table calcs appear at a very low level, and everything is built on top of that, so it made sense that a problem there could be making the whole thing drag. So I replaced the array structure with a hard coded data source. In practice, Enigma messages were generally limited to 250 characters, so this compromise didn’t raise any concerns in terms of historical accuracy. But it was still slow.

The next use of table calculations was to count the number of times the notch passed the top position on the right and middle rotors; basically, I had a notch indicator (based on the rotor position) which was used in a running sum. Thinking through this calculation, and carefully working through various cases, I realized that it was again something I could do without table calcs, with just a little more algebra required on my part. So all table calculations were gone, and it was still slow (and not even working yet).

The next thing we thought about was the branching case statement I had used for the scrambling that takes place on each of the rotors. It was a long case statement, first on the rotor selection, then on the current position of that rotor (so basically 5*26=130 if-else statements). This naïve approach on my part turned out to be the main source of the slowdown. One idea I had from the start was to use blending somehow for the rotors, rotating the blending field to correspond with the physical position in a physical Enigma; I just wasn’t sure I could make blending work in this way. Luckily, Richard came up with a simpler blending solution. The rotor data sources just contain a field for the rotor selection and one for the character substitution (in each direction). A 26-letter string was a much cleaner way to state the permutation; for example, rotor I is “EKMFLGDQVZNTOWYHXUSPAIBRCJ”. Understanding Enigma means going back and forth between the letters A-Z and the numbers 1-26 a few times (A↔1, B↔2, …, Z↔26). I had converted the characters to numbers at the start so I could make use of modular arithmetic, but not wanting to add overhead from type conversions, I hadn’t used strings in the middle. The string above is a very simple representation of the permutation A->E, B->K, C->M, …, Z->J, made even better because the input letter was already represented in numerical form, i.e. for the string above, mid(string,1,1)=“E”, mid(string,2,1)=“K”, mid(string,3,1)=“M”, …, mid(string,26,1)=“J”. These characters could then be converted back to numbers. Now you may wonder, why blend this at all, isn’t it simpler to put these values into a calculation? I think it is, but there seems to be a significant performance difference between the two approaches. In the corner case of this workbook, that difference (along with the simplification of the permutations) took the refresh time from more than 2 hours to under a second. Blazing fast. Just one problem: it still didn’t work.
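To see why the 26-letter string works so neatly as a permutation table, here’s the same lookup in Python (Tableau’s one-indexed mid() becomes zero-indexed slicing here; rotation and the reflector are left out of this sketch, and the function names are mine):

```python
ROTOR_I = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"  # rotor I's wiring from the post

def forward(letter: str, wiring: str = ROTOR_I) -> str:
    """Position n of the string holds the image of the nth letter: A->E, B->K, ..."""
    return wiring[ord(letter) - ord("A")]

def backward(letter: str, wiring: str = ROTOR_I) -> str:
    """The return path through a rotor applies the inverse permutation."""
    return chr(wiring.index(letter) + ord("A"))

print(forward("A"), forward("B"), backward("E"))  # E K A
```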

The speed improvement was based on the part that worked, but it was throwing an error before it was finished. There were only about 55 or so interconnected calculations at this point, but many of these were used more than once. The image on the left is a zoomed-out view of the complex calculation call-tree that Richard built to understand this nesting. Having solved the width problem, there was now one with depth. Richard built a simple workbook to explore the issue and found a curious limitation, one I expect doesn’t get hit very often: it seems that calculations can be no more than 128 levels deep. So f1(f2(f3(…f128(x)…))) is fine; any more than that and you’re out of luck (for now). If you’re computing 128 levels deep, though, chances are there is some room for improvement, and that was the case here. At this point, Enigma was trying to do around 136. I had used a separate calculation for each of the 10 wires in the forward direction, then another for each of the 10 wires on the way back. So that was 20 calculations that could be done in 2. I’d have made this simplification earlier, but clearly this wasn’t the source of the slowdown, and when changing a calculation takes half a day it is hard to think about such things.

I strutted around for a day or two before I realized another problem. I had created a working Enigma machine in Tableau that was historically accurate in terms of its functionality, but that doesn’t make for a very interesting Tableau blog. I was going to need a viz.

The Viz:

The Enigma machine is literally a black box. Not much to look at. My first inclination for an Enigma viz was something that looked like an Enigma machine. With an Enigma as a background image, I planned to highlight the input key and the output light with circles. As each rotor turns, the operator can see the letter currently in the top position through a small window. Two circles and three letters: easy, they wouldn’t even need to be on the same worksheet. Flipping through these using the pages shelf, you might get a sense of looking over the shoulder of an Enigma operator, but it still didn’t seem very illuminating. Complexity of execution isn’t my goal; this just seemed like an awful lot of wasted potential. Still thinking along these lines, I considered using 26 separate background images of fingers hitting keys. That would be a fun view, but downloading 26 images would probably make the viz very slow to load, and I wanted to do something on Tableau Public. Custom mark types for hands might be a suitable workaround, but it definitely seems like more flash than substance. I decided against a photo-realistic Enigma or even a stylized view from the outside. Could Tableau tell the story of what is happening INSIDE the Enigma machine? I found one such visualization on the web, but creating it would mean building quite a bit of visualization structure on top of what was probably already my most sophisticated Tableau workbook.

I wanted arrows that would mark the input and output letters, plus rotating ones to show the notches on the turning wheels. The path through the machine would need to be shown as a series of lines. A box could be used to mark the top letters, visible to the operator. Also, letters around the inside and outside of each rotor would have to move dynamically in each frame. Oh, and it is all interconnected, so it needs to be done in one viz. Gulp. With that goal in mind, I built my first inside-out version. It took about 2 minutes to load. Thus began another chain of emails with Richard Leeke. The main improvement from that round actually added a table calculation: by using PREVIOUS_VALUE() the message only needed to be computed once, rather than for each of the 320 points I allowed for in the viz. It seems to be consistently between 8 and 12 seconds for me now, so please be patient and use this time to consider that it is more than 99.93% faster than the first attempt. I had a slightly prettier version that used a dual-dual axis, but it was also slowing things down, so it is a single line plot now. Note: my original dashboard was too wide for a blog post, so I rotated it clockwise by 90 degrees (i.e. the Right rotor is on the bottom). Anyway, Tableau Enigma, please enjoy:

Here is a link to the Enigma workbook on Tableau Public.

Conclusion:

I think maybe Emily’s survey was missing an important option. Collaborate! We’ve all got strengths and weaknesses and finding people who can lift you when you’re stuck is the easiest way to increase your impact. Bring something to the table and it will be worth their while. I’m glad I’ve got some people like that and hope they have as much fun as I do. Thanks again to all those who helped with this.

Final note:

Though I sang the praises of two Zen Masters in this post, there is one who hasn’t gotten a mention here but probably should have. Jonathan Drummey is a genius and has been a great help to me in advancing my Tableau skills, not to mention my blogging ones. I have little doubt Jonathan could add something awesome here, and maybe he still will, but I’ve kept this project a secret. I realized that if I dropped a completed Enigma on Jonathan it would blow his mind clear across the room. His Orrery had that effect on me, but the fuse burned for a few months before the explosion. Thanks Jonathan!