I Have Wee Data – Microsoft Access and Tableau

In all the hype about big data, we have to acknowledge that some of us have “wee” data. Not every organization has a fully-built out Information Systems department or Business Intelligence team with access to petabytes of data and the latest tools like Hadoop and Alteryx. Some of us are still running on legacy hardware and software, have tiny budgets, part-time staff, and thousands or tens of thousands of records that we want to analyze vs. billions.

My day job is in the latter camp. For all that US healthcare includes the latest treatments and technology, healthcare IT has historically been behind the times. My desktop is running Windows XP SP3, Office 2007 is our productivity tool, Microsoft Access is our most commonly used database, and Tableau is our go-to choice for data visualization (so there’s at least one area where we’re we’ve got current technology).

Every couple-few months I get a question about Microsoft Access and Tableau, I thought I’d take a few minutes to combine my answers into one post, so read on for what I know about integrating Access and Tableau.

Connectivity

On Windows, Tableau accesses Microsoft Access databases through the Microsoft JET ODBC driver. If your system doesn’t have Microsoft Access installed, you can get the driver at http://www.tableausoftware.com/support/drivers for read-only access. If you’re on Tableau for Mac you’ll need to buy a 3rd party ODBC driver to connect to an Access DB, here are the two that Microsoft references on their website: Actual Technologies http://www.actualtech.com/  and OpenLink http://openlinksw.com.

If a workbook is built using an Access DB that is then extracted into a Tableau Data Extract, that workbook can be opened up on Tableau for Mac or Windows without needing an extra driver, because the Tableau data connection is to the extract, not the original data. If you want to update that extract or have a live connection to the Access data then you’d need to have the appropriate drivers installed.

Supported Features in Access

The ODBC standard supports a subset of what can be done in Microsoft Access: You can use SELECT queries and/or tables as data sources, include subqueries, linked tables, and cross products/cartesian joins. I haven’t tried using crosstab queries as data sources, why use those when I have Tableau? However, queries involving parameters and queries that use VB code won’t pass through ODBC, those queries won’t even show up in the connection window in Tableau. This is also true for subqueries as well.

If you are using a UNION query, that won’t be selectable either, however you can wrap that query inside a SELECT query and it will then be available.

There are some fundamental limits of JET and Access, namely that a query can’t have more than 255 columns and a given column can’t have more than 255 characters (columns with the memo data type are truncated at 255 characters). In addition, there times when multiple levels of nested queries & aggregations just get too complex for JET & Access and the query doesn’t work.

Given those limitations, we can still do quite a bit. Here’s a screenshot of part of one of my Access queries that is used as a Tableau data source:

Screen Shot 2014-08-10 at 2.34.02 PM

Also note that any kind of query that you can write that would be visible as a connection for ODBC can be used as Custom SQL, though the Custom SQL syntax used by Tableau is slightly different.

Performance & Sizing

Access wasn’t built to run entire enterprises (though I’m sure some have tried), so it’s not a highly performant, highly tunable system like Postgres, MySQL, SQL Server, Oracle, etc. When there are a few thousand or tens of thousands of records, performance can be acceptable for a live connection (see notes below about latency). If you’ve got a million records in a single Access table, be prepared to wait several seconds for any given query, even on indexed fields. If you have a million records in a table with dozens of columns, Access might start having a very hard time doing things like compacting & repairing the database. I’ve had to separate data into multiple databases and even a single table across multiple databases to handle the load. If you’re having to store multiple million records in a table, either do text files or get a better database. Some free options are Postgres, MySQL, and SQL Server Express.

All of our Access databases are on network file servers and we’ve found that they are really sensitive to network latency & throughput. A query or refresh that might take a split second at 6am takes several seconds by 8am, or sometimes opening a table or query takes 30-60 seconds in the middle of the day. In addition, when Tableau is connecting to Access to create a new connection or add tables to an existing connection, there’s a set of handshaking that has to go on. In some situations with a large database (lots of data, lots of tables, and/or lots of queries) this can take several minutes at a time, ugh.

For all those performance reasons, for me almost everything that is user-facing uses a Tableau Data Extract, I’ve found extracts to regularly be an order of magnitude faster and up to 1000+ times faster than Access. When working with less than 10K records in an analytic mode, I might use a live connection, but as soon as it’s getting slow I’m extracting the data to take advantage of what Tableau can do.

Security

Fuggedaboutit. As far as I can see, Access was designed as an individual/departmental database, with some multi-user features tacked on over time. The fundamental security model is file-based, not role- or record-based. If you’re developing an entire Access application in VB you can do more with security, but when it comes to Tableau connecting to the underlying data all of those controls go away. With the databases on file servers using Active Directory we’re able to control who has access at the file & folder level. My users are very trustworthy (they are in the business of saving lives, after all) but not that technical, so my biggest fear is someone accidentally deleting a folder full of Access DBs. So I educate my users about what to do and what not to do, keep databases in separate folders with appropriate permissions, and have taught them to be scared of deleting anything. Moving to a more robust database at some point has been on my Someday-Maybe list since day 1; we’re in the process of spinning up a SQL Server to deal with this and handle various performance issues.

Access and Tableau Server

Here are three critical points for successfully using Access with Tableau Server:

  1. Set up your Tableau Server so that the user Tableau Server runs as has permissions to wherever you are putting the Access databases (or Excel or text files) to use as data sources. If using linked tables in Access, the Tableau Server run-as user is also going to need permissions on those linked databases.
  2. In Tableau Desktop always use the UNC path like \\localhost\pathname rather than C:\pathname. If you didn’t do that at the beginning, you can always edit the connection later, which is better than replacing the data source because if you do the latter then you’ll lose colors & aliases. I keep a text file of snippets around so every time I have to connect to \\smmc.local\SMHC Quality DB\Core\Core.accdb I can just cut & paste.
  3. When publishing to server, make sure you *uncheck* “Include External Files”. If you don’t do this, you’ll be scratching your head later when the visualization doesn’t show updated data.

If you didn’t do step 2 or step 1, when you publish as Tableau is rendering the active worksheet in the preview window Tableau will spit errors at you saying it can’t display the viz. Note that Tableau only does that for the data sources required for the particular worksheet that is displayed in the preview, so if you have data sources A and B, and the preview worksheet uses A but B isn’t available, the publishing will work fine and you won’t get the error until someone on Tableau Server tries to access a worksheet or dashboard that uses data source B.

Supported Features in Tableau

Tableau does as much of the computations as possible in the data source, and that does put some limitations on what can be done in Tableau when the data source doesn’t have a ton of functionality, and that is true for ODBC connections to Access databases. The following functions are not available for live connections to a Microsoft Access database: COUNTD(), MEDIAN(), PERCENTILE(), IN/OUT of Sets, otherwise everything else is, including the RAWSQL functions. The two workarounds to those limitations are a) use a more robust data source or b) use a Tableau Data Extract.

It’s also worth noting that there can be differences in how JET sources compute a result vs. Tableau data extracts, for example in past releases I’ve found differences in the ROUND() and SIGN() functions, never mind the can o’ worms that is JET’s handling of epoch dates, so if you’re doing a lot of computation in your Access queries, take a little time to validate that your worksheets aren’t unexpectedly changing when you switch to an extract.

How I Use Microsoft Access with Tableau

Our data is organized in a snowflake schema, with several fact tables for patient encounters, diagnoses, procedures, measure results, etc. and a set of dimension tables. The fact tables are stored in several databases (given volumes), the dimension tables in another database, and then there are separate databases that have linked tables and queries, I’m looking at having that particular bit of maintenance headache go away once we get SQL Server up and running.

Given what I’ve described so far, I’ve got two ways of working with Access and Tableau:

Ad hoc & Unplanned Work w/Thousands of Records

When I’ve got that “Hey, I’ve got this data and a presentation due tomorrow, what can you tell me about it?” request I’ll pull the data in Access, if it’s a few thousand or maybe tens of thousands of records I’ll be likely to leave it a live connection unless I need COUNTD(), MEDIAN(), etc. or I have performance issues. If I need other tables I’ll use Tableau Data Blends to put together what I need. I’ve gotten into the habit of always connecting to Access sources using UNC paths because I never know when I’m going to need to quickly publish something.

Planned Work, or Work with Tens of Thousands of Records or More

In the case that I know I’ve got bigger data, or that what I’m building is something that is going to be published on a recurring basis, then I’ll build a query in Access, joining any tables I need, using subqueries where necessary, etc. Since our environment is such that queries can be slow, then I’ll create an extract in Tableau and work from that.

I generally don’t use Custom SQL nor multiple tables connections in Tableau for the following reasons:

  • The biggest is that I like Access’s visual query designer way more than editing the text of Custom SQL. Custom SQL for JET sources has about three error messages that could cover anything from a missing comma to your file server having been dropped into one of those Siberian sinkholes, the messages are incredibly non-helpful.
  • The visual connection interface in 8.2 is nice, but still not as easy to work with as Access. For example, in Access I can add a custom WHERE clause on a query or use a cross product, those aren’t possible in 8.2’s connection interface.
  • Given the sometimes slowness of any given connection to Access, I don’t want to have to wait several seconds to a minute or more every time I’m adding a table. Using the Access query designer avoids that.
  • I’m the the only data analyst in my department so I’m the requirements analyzer, developer, DBA, trainer, technical writer, and most-importantly-for-this-section the person who has to live with anything I build. I’ll do the work of joins and computation in Access or in Tableau’s workspace, but not in Custom SQL nor multiple tables connections, so if/when something breaks I only have two places to look, not potentially four.
  • The potential performance gains from letting Tableau manage a multiple tables live connection are ignored when using an extract, and I’m mostly using extracts.
  • Maintaining a single query is a lot easier than maintaing several workbooks that use similar multiple tables connections.

I try very hard to avoid using queries that have to write out data to intermediate tables, because that imposes an extra step in updating the data. I might use intermediate tables in an ETL process, but not for data for Tableau.

Why Use Microsoft Access with Tableau?

Given all I’ve dished out on Access, why use it? I think this can be divided in general into two cases, one is “Why Use Access vs. Excel?”, the other is “Why Use MS Access vs. Another Database?”

When it comes to Access vs. Excel, I’m pretty hands-down in favor of Access for data validation alone. As much as we can use Excel like a database (and I have), Excel is fundamentally not a database. Microsoft has a free Access runtime, so setting up a form or data entry table in Access that has valid fields with real data types is as fast or faster than Excel, can have indexes and referential integrity built-in, and then the data can be instantly ready for analysis. Queries are re-usable, whereas in Excel unless you’re careful you’ll end up rebuilding multiple tables connections again and again.

For Access vs. most any other database, that other database almost certainly will have more functionality and performance than Access. However, there are a few points in Access’s favor:

  • Skill needed: As much as the various vendors have improved usability over the years, Access is much easier to use and administer (with caveats about Access’s limits). With Access’s visual query designer, I can teach an RN with few technical skills how to write a valid, usable query without them ever having to learn SQL. The data window is familiar to my users from MS Excel and easy to get data in & out of.
  • License cost: Certainly there are free options like MySQL, Postgres, and SQL Server Express, however if you’re looking at paid software a few Access licenses will be a lot cheaper than even the maintenance on a paid option.
  • What’s available and usable functionality: For example, the fastest SQL Server in the world isn’t so useful if your DBA hasn’t given you permissions to create a view on it (yes, I’ve heard of this happening). Besides the control, Access can be used to connect to most anything out there and create federated databases where a linked table in Access is pointing to SQL Server, see http://office.microsoft.com/en-us/access-help/import-or-link-to-sql-server-data-HA010200494.aspx for one set of instructions.

Conclusion

I think of Access as kind of like a Swiss army knife, within a certain range of tasks and requirements it’s quite functional, beyond that we have to resort to other tools. If you’re exploring using Access and Tableau, I hope this is useful for you, if you have any comments please leave them below!

15 thoughts on “I Have Wee Data – Microsoft Access and Tableau

  1. suz

    Thanks for posting. Appreciate the dose of reality, clarity, and tool discernment. I work in healthcare data also. Data can range from big to wee to pint-sized wee (office-based surveys). They can also be in various states of cleanliness. Tool availability can range also, as you pointed out.
    Always interesting, sometimes fun!

    (Any update on the book, btw?)

    Reply
    1. Jonathan Drummey Post author

      Thanks for responding, and asking! Progress on the book has gone in a different direction this summer as I took on several Tableau presentations (Boston Tableau User Group, Think Data Thursday, Maine Tableau User Group, and my session on Extreme Data Blending was a late addition to the Tableau Conference). There are definite “chunks” of material for the book that are ready or near-ready, I’ve been considering ways to get over my own perfectionist tendencies and get some material published sooner.

      Reply
  2. Cathy Bridges

    Thanks for the article – I’m in the same boat! We are also investigating solutions that are a little more robust than Access. I keep fearing the 2GB query number, when Access crashes and I have to pull up some backup version and re-engineer my queries. If I understood your article correctly, it sounds like your final queries that you’re connecting to Tableau are designed to include the information from your other tables (you said you’re not creating the table relationships in Tableau). When I’ve tried that, Access crashes 100% of the time. The query is too cumbersome for Access. I went “shopping” for solutions recently, and we’re going to give Firebird a try. It’s free and has free or nearly free GUI interface tools that function much the same as Access. Hopefully it will be what we’re looking for. SQL server hasn’t provided what we need (yet) because we’re not on Tableau Server (yet) and Tableau Online doesn’t play nice with SQL Server.

    Reply
  3. avadhootb

    Great Article and I could imagine my experiences on the same lines. I am also facing the 2Gb limit and then crash experiences as Cathy mentioned. I used a union query in access (into a select) to create a table in access and then use that to connect to my Tableau workbook. In future if I continue getting issues may be I will ping you for a hand. Thanks again for sharing this.

    Reply
  4. jen2franne

    Awwwww, I really enjoyed this article. Upgrading Access to SQL Server or even using SQL Server linked tables from Access to overcome Access 5 concurrent user limitation, file database corruption, performance and lack of security issues, etc. is easier than you think to do. I was doing these upgrades back in 2001 for the military and for those same reasons 10+ years ago. Some of those projects took as little as one day to complete if the VBA and forms used were not too extreme. It is interesting to see the longevity of practical Access solutions.

    The other reason I liked this article was that it was a fantastic reality check. At times with all the big data hype and buzz, it does seem like all the cool kids are working with Hadoop, awesomely cool, new bleeding edge tech projects…but when you ask around usually those projects are few and far between, possibly a pilot project or two, in the world’s largest companies. The small data projects are all around us, keeping the world running and most likely are the ones most folks are working on right now.

    I did fear a little on the XP in healthcare aspects. I think, not 100% sure, but I think XP is no longer supported by Microsoft meaning no more patches.

    Thanks for sharing these tips and your experiences.

    Reply
  5. Chris Gerrard

    Thanks for bringing some light to the wee data world, Jonathan. Bravo.

    One of the things that consistently gets overlooked, particularly by people coming from a traditional BI world, is that a very large fraction, many times most, of the data people are interesting in is local to them, and not really all that voluminous. Wee.

    While traditionalists are always prioritizing the need to guarantee a ‘single version of the truth’, wee dataistas are interested in understanding the data that matters to them, in their specific context. In this sense, to invoke an older and woefully under appreciated principle: all BI is local. And local business data-based information is the stuff that every organization runs on, not the top-level strategically relevant consolidated data that’s been the stuff of BI for years.

    Echoing jen2franne, the current fascination of big data, often with Hadoop somewhere in the chatter, is grounded in the notion that it’s the “Big Questions” that can only be answered by “Big Data” that are the primary source of value for an organization. There’s a lot wrong with this framing (another-time story) and it completely misses the fact that it’s the multitude of little questions that are the meat and potatoes of any organization’s work.

    Being able to easily and effectively access and understand wee data is arguably more important for the vast majority of people doing their jobs than anything they’ll do with big data. Tableau’s great gift was making this wee data analysis simple and straightforward.

    One of the consequences of wee data analysis is that it dramatically expands the horizons of what can be considered ‘data’. It used to be that data had a very narrow concept – only stuff that had been captured by operational business systems and retained in databases. With wee data analysis we’re learning that pretty much everything and anything is data, or at least easily datafiable. And this is a very good thing.

    Reply
    1. Jonathan Drummey Post author

      Hi Chris,

      Thanks for your wisdom! I especially liked “wee dataistas”, yet I think “all BI is local” is an even more important thought. I’ve met a number of people in healthcare whose Tableau implementations all started when there was some sort of enterprise BI system in place or being purchased, and every single time it was because local (often departmental) needs were not being met.

      I’ve got more thoughts and questions in response to what you wrote, partially in relationship to Jen’s reference to governance vis-a-vis self-service data discovery in her latest blog post http://www.jenunderwood.com/2014/08/14/exploring-qlik-sense-desktop/, I’ll need to think on this some more. One question is how can we get IS (and vendors) to value the wee data – the billions of Excel spreadsheets, for example – as much as the big data?

      Reply
      1. Chris Gerrard

        Governance is a surprisingly slippery topic. On the one hand it’s a legitimate concern. On the other hand it’s a red herring, frequently lauded as an essential element of a sane organization’s data use but all to often honored in principle while unrealized in practice.

        The discussion around governance is typically framed by the same Big BI forces that cannot see the value, or even legitimate existence, of wee data. In the Big BI world governance is oriented around determining what data people can access and what they can do with it. This makes some sense in a world where business analytical data is held by, handled by, or at least passes through the vision of the central data managers.

        There’s another way. Instead of controlling what data people can access and what they can do with it, it’s possible to use the fundamental feature of Tableau (and its cousins) – that of connecting to any data, anywhere, to being transparency to the data analytical environment when and where it makes sense. And it doesn’t always make sense – in the world of local BI there are many, many circumstances where the need to understand the
        data doesn’t reach past the local horizon, when it’s enough to trust the person who’s analyzing his/her own data to use it appropriately.

        The irony of governance is that transparency is always required. Even in the most autocratic data-controlled environment without full transparency from source to presentation it’s impossible to assess the reality of what people are seeing. Traditional BI people try to mitigate this by locking things down tighter and tighter, but this only makes it harder and harder for people to understand the data that matters to them.

        Oddly enough, there’s nothing new here. We had the same challenges thirty years ago in FOCUS environments where business people were writing their own FOCUS reports against data in local databases they were creating with FOCUS from the corporate sources. IBI (FOCUS’ vendor) didn’t solve this problem, leaving it up to ad hoc, custom approaches.

        I created TWIS years ago to address this last-data-mile transparency problem. And Cotgreave created the TWB Auditor for the same reason about the same time.

        Strangely, Tableau Software hasn’t capitalized on the market vacuum in this space to bring this ability into the basic tool – doing so would be a real differentiator that would really make a difference. I’ve talked to a number of Tableau people over the years about this but nothing seems to come of it. Admittedly, doing this is outside Tableau’s basic design space, so it would require rethinking everything from what kind of data Tableau can address analytically along with inventing new mechanisms for modeling and asking questions of the data.

        Reply
  6. Brad Earle (@vzwiz)

    Hi Jonathan, sorry not at the conference this year. But I noticed during Mark’s presentation in the Keynote that there is a new gear icon on the data connection grey bar — check out my tweet @vzwiz. My hope is that the gear permits reducing the number of columns selected from a particular table so that complex joins like you are doing in Access can actually be done in Tableau. Right now, since all columns are always returned, Tableau results in an error message for having too many columns.

    On a related note, one of the approaches that you can do within Tableau is to create a bunch of custom queries and then join those queries within the data connection dialog. Most examples I’ve seen cram all the code into a single custom sql query. With Tableau, you can use a similar approach to what can be done within Access — that is, create sub-queries and then use the results of those queries as joined tables.

    #data14

    Reply
    1. Jonathan Drummey Post author

      Hi Brad, sorry you couldn’t be at the conference!

      FYI, the gear icon in the data connection already exists in Tableau 8.2, in 8.2 it’s used for Custom SQL which as you know can be used to reduce the number of columns.

      Mentioning that custom queries can be used in a Multiple Tables connection is a good point. Personally, I choose not to use them for the reason I mentioned above in the Planned Work… section – using a single Access query (which may contain subqueries) as a data source and Tableau calculated fields where necessary is easier for me to wrap my head around and maintain then adding a third place (a Multiple Tables connection) where data is joined & merged.

      Cheers,

      Jonathan

      Reply
  7. Sophie

    Hi Jonathan! It was refreshing to see your post on Access & Tableau. I, as other have also expressed, am looking into converting our MS Access DB to Oracle, but until that cutover happens, we living the MS Access dream… :/

    I am able to successfully publish a MS Access table to the Tableau Server as a Data Source (using UNC path, unchecked the ‘include external files’ box, etc.). The problem comes in when I attempt to create a new Tableau workbook using the Tableau Server file as the connection. I get the following errors:

    “Microsoft JET database error 0x80004005: The Microsoft Jet database engine cannot open the file ‘\\ny6fl14\abrai$\Records.MDB’. It is already opened exclusively by another user, or you need permission to view its data.
    Unable to connect to the Microsoft Access database “‘\\ny6fl14\abrai$\Records.MDB'”. Check that the path is correct and that you have access privileges for the requested file.

    class DOMException
    Unable to connect to the server “de01tabwebapp001″. Check that the server is running and that you have access privileges to the requested database.
    Error: Unable to connect to this Tableau Server data source.”

    Any experience wit this or thoughts? I’ve come to a halt because of this.

    Thanks.
    Sophie

    Reply
    1. Jonathan Drummey Post author

      Hi Sophie,

      That error can be caused by all sorts of reasons. The most common cause is that the Tableau Server run-as user doesn’t have permissions, but since you could publish the data source in the first place I’m not sure that’s it. I suggest logging in as that user and trying to browse to the \\ny6fl14\abrai$\ folder, and checking the permissions on the Records.MDB file. Beyond that, I’d do a test where you have close all applications that have that database open, so only Tableau Server is trying to access the file. If that does work, then I’d contact Tableau tech support, there have been some cases w/Tableau 8.2 where it is throwing up similar errors where it wasn’t in the past.

      Reply
  8. frank.liu@kennametal.com

    Hi Jonathan,

    Totally agree with u. Not all the companys would have the avaliable “big data” for next step to viz. We have to deal with those “wee data” in daily jobs with a “big data” view. :). Considering the data maintenance and knowldage transfer risk, my boss would be more prefer to do all the data consolidation in tableau directly with the raw data. But the point is we need use the best function which they were really good at, personally speaking, we could consilidate the data by access, then viz it by tableau. I’m considering the solution of Acess 2013 + SharePoint. So in case, I left the company or transfer to another position, the template, logical are still there, no more lost for the company. This might be a possiable solution in practice for me.

    Thks & BR

    Frank

    Reply
  9. Pingback: LOD Expression Remix – Finding a Dimension at a Lower Level | Drawing with Numbers

Please add your thoughts and perspectives