In all the hype about big data, we have to acknowledge that some of us have “wee” data. Not every organization has a fully-built out Information Systems department or Business Intelligence team with access to petabytes of data and the latest tools like Hadoop and Alteryx. Some of us are still running on legacy hardware and software, have tiny budgets, part-time staff, and thousands or tens of thousands of records that we want to analyze vs. billions.
My day job is in the latter camp. For all that US healthcare includes the latest treatments and technology, healthcare IT has historically been behind the times. My desktop is running Windows XP SP3, Office 2007 is our productivity tool, Microsoft Access is our most commonly used database, and Tableau is our go-to choice for data visualization (so there’s at least one area where we’re we’ve got current technology).
Every couple-few months I get a question about Microsoft Access and Tableau, I thought I’d take a few minutes to combine my answers into one post, so read on for what I know about integrating Access and Tableau.
On Windows, Tableau accesses Microsoft Access databases through the Microsoft JET ODBC driver. If your system doesn’t have Microsoft Access installed, you can get the driver at http://www.tableausoftware.com/support/drivers for read-only access. If you’re on Tableau for Mac you’ll need to buy a 3rd party ODBC driver to connect to an Access DB, here are the two that Microsoft references on their website: Actual Technologies http://www.actualtech.com/ and OpenLink http://openlinksw.com.
Supported Features in Access
The ODBC standard supports a subset of what can be done in Microsoft Access: You can use SELECT queries and/or tables as data sources, include subqueries, linked tables, and cross products/cartesian joins. I haven’t tried using crosstab queries as data sources, why use those when I have Tableau? However, queries involving parameters and queries that use VB code won’t pass through ODBC, those queries won’t even show up in the connection window in Tableau. This is also true for subqueries as well.
If you are using a UNION query, that won’t be selectable either, however you can wrap that query inside a SELECT query and it will then be available.
There are some fundamental limits of JET and Access, namely that a query can’t have more than 255 columns and a given column can’t have more than 255 characters (columns with the memo data type are truncated at 255 characters). In addition, there times when multiple levels of nested queries & aggregations just get too complex for JET & Access and the query doesn’t work.
Given those limitations, we can still do quite a bit. Here’s a screenshot of part of one of my Access queries that is used as a Tableau data source:
Also note that any kind of query that you can write that would be visible as a connection for ODBC can be used as Custom SQL, though the Custom SQL syntax used by Tableau is slightly different.
Performance & Sizing
Access wasn’t built to run entire enterprises (though I’m sure some have tried), so it’s not a highly performant, highly tunable system like Postgres, MySQL, SQL Server, Oracle, etc. When there are a few thousand or tens of thousands of records, performance can be acceptable for a live connection (see notes below about latency). If you’ve got a million records in a single Access table, be prepared to wait several seconds for any given query, even on indexed fields. If you have a million records in a table with dozens of columns, Access might start having a very hard time doing things like compacting & repairing the database. I’ve had to separate data into multiple databases and even a single table across multiple databases to handle the load. If you’re having to store multiple million records in a table, either do text files or get a better database. Some free options are Postgres, MySQL, and SQL Server Express.
All of our Access databases are on network file servers and we’ve found that they are really sensitive to network latency & throughput. A query or refresh that might take a split second at 6am takes several seconds by 8am, or sometimes opening a table or query takes 30-60 seconds in the middle of the day. In addition, when Tableau is connecting to Access to create a new connection or add tables to an existing connection, there’s a set of handshaking that has to go on. In some situations with a large database (lots of data, lots of tables, and/or lots of queries) this can take several minutes at a time, ugh.
For all those performance reasons, for me almost everything that is user-facing uses a Tableau Data Extract, I’ve found extracts to regularly be an order of magnitude faster and up to 1000+ times faster than Access. When working with less than 10K records in an analytic mode, I might use a live connection, but as soon as it’s getting slow I’m extracting the data to take advantage of what Tableau can do.
Fuggedaboutit. As far as I can see, Access was designed as an individual/departmental database, with some multi-user features tacked on over time. The fundamental security model is file-based, not role- or record-based. If you’re developing an entire Access application in VB you can do more with security, but when it comes to Tableau connecting to the underlying data all of those controls go away. With the databases on file servers using Active Directory we’re able to control who has access at the file & folder level. My users are very trustworthy (they are in the business of saving lives, after all) but not that technical, so my biggest fear is someone accidentally deleting a folder full of Access DBs. So I educate my users about what to do and what not to do, keep databases in separate folders with appropriate permissions, and have taught them to be scared of deleting anything. Moving to a more robust database at some point has been on my Someday-Maybe list since day 1; we’re in the process of spinning up a SQL Server to deal with this and handle various performance issues.
Access and Tableau Server
Here are three critical points for successfully using Access with Tableau Server:
- Set up your Tableau Server so that the user Tableau Server runs as has permissions to wherever you are putting the Access databases (or Excel or text files) to use as data sources. If using linked tables in Access, the Tableau Server run-as user is also going to need permissions on those linked databases.
- In Tableau Desktop always use the UNC path like \\localhost\pathname rather than C:\pathname. If you didn’t do that at the beginning, you can always edit the connection later, which is better than replacing the data source because if you do the latter then you’ll lose colors & aliases. I keep a text file of snippets around so every time I have to connect to \\smmc.local\SMHC Quality DB\Core\Core.accdb I can just cut & paste.
- When publishing to server, make sure you *uncheck* “Include External Files”. If you don’t do this, you’ll be scratching your head later when the visualization doesn’t show updated data.
If you didn’t do step 2 or step 1, when you publish as Tableau is rendering the active worksheet in the preview window Tableau will spit errors at you saying it can’t display the viz. Note that Tableau only does that for the data sources required for the particular worksheet that is displayed in the preview, so if you have data sources A and B, and the preview worksheet uses A but B isn’t available, the publishing will work fine and you won’t get the error until someone on Tableau Server tries to access a worksheet or dashboard that uses data source B.
Supported Features in Tableau
Tableau does as much of the computations as possible in the data source, and that does put some limitations on what can be done in Tableau when the data source doesn’t have a ton of functionality, and that is true for ODBC connections to Access databases. The following functions are not available for live connections to a Microsoft Access database: COUNTD(), MEDIAN(), PERCENTILE(), IN/OUT of Sets, otherwise everything else is, including the RAWSQL functions. The two workarounds to those limitations are a) use a more robust data source or b) use a Tableau Data Extract.
It’s also worth noting that there can be differences in how JET sources compute a result vs. Tableau data extracts, for example in past releases I’ve found differences in the ROUND() and SIGN() functions, never mind the can o’ worms that is JET’s handling of epoch dates, so if you’re doing a lot of computation in your Access queries, take a little time to validate that your worksheets aren’t unexpectedly changing when you switch to an extract.
How I Use Microsoft Access with Tableau
Our data is organized in a snowflake schema, with several fact tables for patient encounters, diagnoses, procedures, measure results, etc. and a set of dimension tables. The fact tables are stored in several databases (given volumes), the dimension tables in another database, and then there are separate databases that have linked tables and queries, I’m looking at having that particular bit of maintenance headache go away once we get SQL Server up and running.
Given what I’ve described so far, I’ve got two ways of working with Access and Tableau:
Ad hoc & Unplanned Work w/Thousands of Records
When I’ve got that “Hey, I’ve got this data and a presentation due tomorrow, what can you tell me about it?” request I’ll pull the data in Access, if it’s a few thousand or maybe tens of thousands of records I’ll be likely to leave it a live connection unless I need COUNTD(), MEDIAN(), etc. or I have performance issues. If I need other tables I’ll use Tableau Data Blends to put together what I need. I’ve gotten into the habit of always connecting to Access sources using UNC paths because I never know when I’m going to need to quickly publish something.
Planned Work, or Work with Tens of Thousands of Records or More
In the case that I know I’ve got bigger data, or that what I’m building is something that is going to be published on a recurring basis, then I’ll build a query in Access, joining any tables I need, using subqueries where necessary, etc. Since our environment is such that queries can be slow, then I’ll create an extract in Tableau and work from that.
I generally don’t use Custom SQL nor multiple tables connections in Tableau for the following reasons:
- The biggest is that I like Access’s visual query designer way more than editing the text of Custom SQL. Custom SQL for JET sources has about three error messages that could cover anything from a missing comma to your file server having been dropped into one of those Siberian sinkholes, the messages are incredibly non-helpful.
- The visual connection interface in 8.2 is nice, but still not as easy to work with as Access. For example, in Access I can add a custom WHERE clause on a query or use a cross product, those aren’t possible in 8.2’s connection interface.
- Given the sometimes slowness of any given connection to Access, I don’t want to have to wait several seconds to a minute or more every time I’m adding a table. Using the Access query designer avoids that.
- I’m the the only data analyst in my department so I’m the requirements analyzer, developer, DBA, trainer, technical writer, and most-importantly-for-this-section the person who has to live with anything I build. I’ll do the work of joins and computation in Access or in Tableau’s workspace, but not in Custom SQL nor multiple tables connections, so if/when something breaks I only have two places to look, not potentially four.
- The potential performance gains from letting Tableau manage a multiple tables live connection are ignored when using an extract, and I’m mostly using extracts.
- Maintaining a single query is a lot easier than maintaing several workbooks that use similar multiple tables connections.
I try very hard to avoid using queries that have to write out data to intermediate tables, because that imposes an extra step in updating the data. I might use intermediate tables in an ETL process, but not for data for Tableau.
Why Use Microsoft Access with Tableau?
Given all I’ve dished out on Access, why use it? I think this can be divided in general into two cases, one is “Why Use Access vs. Excel?”, the other is “Why Use MS Access vs. Another Database?”
When it comes to Access vs. Excel, I’m pretty hands-down in favor of Access for data validation alone. As much as we can use Excel like a database (and I have), Excel is fundamentally not a database. Microsoft has a free Access runtime, so setting up a form or data entry table in Access that has valid fields with real data types is as fast or faster than Excel, can have indexes and referential integrity built-in, and then the data can be instantly ready for analysis. Queries are re-usable, whereas in Excel unless you’re careful you’ll end up rebuilding multiple tables connections again and again.
For Access vs. most any other database, that other database almost certainly will have more functionality and performance than Access. However, there are a few points in Access’s favor:
- Skill needed: As much as the various vendors have improved usability over the years, Access is much easier to use and administer (with caveats about Access’s limits). With Access’s visual query designer, I can teach an RN with few technical skills how to write a valid, usable query without them ever having to learn SQL. The data window is familiar to my users from MS Excel and easy to get data in & out of.
- License cost: Certainly there are free options like MySQL, Postgres, and SQL Server Express, however if you’re looking at paid software a few Access licenses will be a lot cheaper than even the maintenance on a paid option.
- What’s available and usable functionality: For example, the fastest SQL Server in the world isn’t so useful if your DBA hasn’t given you permissions to create a view on it (yes, I’ve heard of this happening). Besides the control, Access can be used to connect to most anything out there and create federated databases where a linked table in Access is pointing to SQL Server, see http://office.microsoft.com/en-us/access-help/import-or-link-to-sql-server-data-HA010200494.aspx for one set of instructions.
I think of Access as kind of like a Swiss army knife, within a certain range of tasks and requirements it’s quite functional, beyond that we have to resort to other tools. If you’re exploring using Access and Tableau, I hope this is useful for you, if you have any comments please leave them below!