Grr... Argh...

O Extract, Where Art Thou?

This was yesterday’s contribution to a Tableau forums discussion on data extracts, I thought it deserved a separate post that I could keep updated. There are some subtle behaviors and idiosyncracies in working with data connections, Tableau data extracts, and Tableau Server that aren’t fully fleshed out in the documentation, here’s my attempt! I start out with a review of the common file types and .twb vs. .twbx, and then get into some details on different types of connections and what happens based on different orders of operations, and toss in a gratuitous Buffy reference.

File Types

Here are the major types of Tableau files that we work with in Desktop:

  • .twb – A Tableau Workbook. This is an XML file containing the metadata for a Tableau workbook that includes information about the connections to the data source(s), what worksheets and dashboards are in the workbook, calculated fields created in Tableau, default color assignments, Tableau parameters, etc.
  • .tde – A Tableau Data Extract that has the data from a data source stored in a columnar, highly compressed format.
  • .tds – A Saved Data Source. This is an XML file that has the connection information, calculated fields, color assignments, etc. that we might find in a .twb, only stored as a separate file so it’s re-usable.
  • .twbx – A Tableau Packaged Workbook. This is a zip file (with a different extension) that stores the .twb and all data sources (not all data sources can be exported into a .twbx, for example cubes).
  • .tdsx – a zip file containing a .tds and the contents of the data source. I know this is possible, I’ve never seen one actually used in practice.

When we open up a .twb file in Tableau Desktop, Tableau reads the connection information and connects to the data source(s). When we open up a .twbx file, Tableau first unzips the .twbx (which includes the .twb and the data) into the user’s temp folder (the location depends on the OS and the user’s profile) and then works with that in place.

Data Extracts

When we’re extracting data, there are a few different scenarios depending on whether the Tableau workbook is a .twb or .twbx.

  1. If we have a .twb (or haven’t saved a workbook yet), Tableau will prompt us to save the extract as a file.
  2. For a .twbx, there are two outcomes:
    • If we use the “Import all data” / “Import some data” options when we make the connection, Tableau will prompt us to save the extract as a file.
    • If we connect live then in the Data window right-click on the data source and choose Extract Data…, the extract is saved inside the .twbx.

Whatever way we do things, the workbook (the XML) also includes the connection information back to the live data source so the extract can be refreshed.

Saving .twb vs. .twbx

When we save a .twb, the .twb has the connection info but not the data, so all the data lives separately. When we save a .twbx, if there’s a live connection then Tableau will attempt to save that data source inside the .twbx (not all data sources can be saved inside a .twbx, for example cube sources), which effectively makes the data source static.

Classifying Data Connections

When it comes to data connections, I have several classifications that I keep in mind.

  • Live connection to the data source.
  • static data source in a .twbx, created by saving a workbook with a live connection as a .twbx.
  • An extract that is made from a live connection to the data source. This gets the double-cylinder icon.
  • A “naked” extract – we can directly connect to a .tde file as a data source. This gets a single-cylinder icon and is a variation on the live connection, only it’s really a static connection because the raw .tde doesn’t have the information needed us to trigger a refresh. I don’t really recommend this unless you are very sure of what you’re doing, if the .tde changes that can case workbooks built on the naked extract to start throwing up errors, showing [Calculation_2987347315671] instead of your actual field names, etc. Use a .tds instead (a bit more about that below).
  • A connection to a Tableau Server Published Data Source – this has four subtypes, the first are variations of three types mentioned above – a passthrough live connection to the data source, an extract made from a live connection, or a naked extract, and the fourth is a “static” connection to the data source (which is what we get from a  live connection to a data source when we leave the “Include External Files” option to it’s default of being checked, or make a connection to a static source where we Include External Files, there’s a failure mode in this that I’ll talk about further down in publishing).

An aside here: One way to think of .tds files is that they “wrap” any of the above connection types in metadata to make the connections re-usable, so instead of making the connection to the raw Excel file, SQL Server, Tableau Saved Data Source, etc. we “connect” to the .tds.

Some Wrinkles

There’s a wrinkle in the above that some users run into:

  1. Make a workbook
  2. Connect live to a file-based data source, like an Excel file.
  3. Save it as a .twbx. Tableau stores a copy of the data in the .twbx, thereby making it static.
  4. Open up the .twbx.
  5. Try to refresh the data, nothing happens. Grrr. Argh.

Here’s another wrinkle:

  1. Make a workbook
  2. Connect live to a file-based data source, like an Excel file.
  3. Save it as a .twbx. Tableau stores a copy of the data in the .twbx, thereby making it static.
  4. Create an extract of the data. Tableau puts the extract of the static data in the .twbx.
  5. Open up the .twbx.
  6. Try to refresh the data, nothing happens. Grrr. Argh.

Here’s a way that avoids either wrinkle:

  1. Connect live to your data, whatever it is.
  2. Extract the data.
  3. Then save the .twbx.
  4. Try to refresh the extract, and it will refresh. Yeay!

Publishing to Tableau Server

Ok, let’s talk about Tableau Server now: Whether the workbook is a .twb or .twbx, Tableau always publishes the metadata (the XML file part as well as connection information) to Tableau Server. Whether the data is included depends on the kind of connection:

  • When we publish a workbook to Tableau with a live connection to a non-file based data source such as SQL Server, Oracle, etc., then we can choose the Authentication method and go from there.
  • When we publish a workbook to Tableau and the workbook has a live connection (including a connection to a naked extract) to a file-based data source, we get the “Include External Files” option. This default puts a copy of the live connection’s data into Tableau Server (more on that in a minute), thereby making it a static data source. When we turn that option off, then Tableau Server won’t copy the data and will attempt to use that live connection. Which is why we need to be careful about file paths–use UNC paths to be safe–and permissions, so the Tableau Server run-as user can also access your data.If Tableau Server can’t connect, depending on whether the current worksheet is accessing that data source you’ll get “An unexpected error occurred on the server.” message, and possibly be sad and frustrated. You’ll need to clear up the path & permissions problems.
  • If the workbook has a static data source (as in publishing a static file that’s inside a .twbx), we will also get the “Include External Files” option. If we leave the default on, that static data is copied to Tableau Server. If we uncheck that option, Tableau Server won’t copy the data and will attempt to use a live connection back to the source, only the source only lives inside the temp file created on your hard drive when Tableau unzipped the .twbx so Tableau Server is almost certainly not going to be able to connect to it and you’ll get that same “An unexpected error occurred on the server.” message after publishing.
  • When we publish a workbook to Tableau that uses extracts made from a live connection, Tableau copies the extract into Tableau Server, and preserves the connection information about how to get back to the live connection so the extract can be refreshed. Plus we get the Scheduling & Authentication option. (So even if you’ve used a .twb and stored the .tde separately, once you’ve published it Tableau Server ignores the .tde file, instead using and refreshing Tableau Server’s own internal copy).
  • A workbook using a Tableau Server Published Data Source connection is a special variation of a live connection that is always live and doesn’t get the “Include External Files” option. (Even though the Tableau Server Published Data Source might be totally static).

If we’re using multiple data sources in a workbook, we’ll see an appropriate combination of the above options when publishing the workbook. Wee!

When we’re creating a Tableau Saved Data Source, the process is the same as when we publish a .twb or .twbx.

Where art thou, data extract?

When I wrote “copies into Tableau Server”, what happens is:

  • The XML gets copied into Tableau’s PostgreSQL database.
  • File-based data sources that are copied due to “Include External Files” being on are copied into PostgreSQL.
  • During the publishing process Tableau data extracts are copied into a Tableau server temp folder, and copied (and often renamed) into a more permanent location under …Tableau Server\data\tabsvc\dataengine\folder_[####]\upload_[GUID], where the the [####] is an unknown-Tableau-assigned number (maybe something to do with versioning of installs?) and the GUID is a unique identifier of the Tableau workbook (I think). This appears to be true whether the extract is a naked extract (that can’t be updated) or one that has the connection info and can be updated.
  • I’m not sure what happens with custom shapes & background images. Since custom shapes are stored as base64 in the XML, I’m guessing they end up in PostgreSQL as well. For .twb’s background images are left in place as external files, for a .twbx they are stored as files inside the .twbx. I haven’t been able to find the background images anywhere on Tableau Server except in a temp folder so I suspect they go into PostgreSQL. If you have more information on this, please let me know.

Note that while a Tableau Server user is uploading and/or interacting with a workbook we will find various temp files with data on Tableau Server, but those get blown away as soon as the interaction is over and Tableau Server does its cleanup.

It’s possible to update the .tde file in Tableau Server’s …dataengine\… folder structure on your own. And some people (I’m looking at you, Tamar Foldi) have gone into the PostgreSQL tables to do hacky-things, and all of that is way into undocumented/break your warranty territory.

Correction 2014-11-02: This section has been changed from the original post based on a correction from Craig Bloodworth (in the comments below) and further investigation. I also added the extra info on custom shapes & background images.

More (Tableau Server-flavored) Wrinkles

Here’s a scenario, that is very close to a real-life situation that was one of my early experiences with Tableau Server when I was trying to figure this out.

  1. Create a Tableau workbook.
  2. Connect to a file-based data source, like an Excel file.
  3. Save it as a .twbx. Tableau stores a copy of the data in the .twbx, thereby making it static.
  4. Create an extract of the data. Tableau puts the extract of the static data in the .twbx.
  5. Build your viz.
  6. Publish the viz to Tableau Server, including setting an extract refresh schedule.
  7. [time passes]
  8. The extract refresh works perfectly, only refreshing static data.
  9. [time passes]
  10. The original data source gets updated with new records.
  11. [time passes]
  12. The extract refresh again works perfectly, only it never sees the new records, only the static ones.
  13. User tries to see the updated data on Tableau, and doesn’t.
  14. User emails you saying “Where’s my data?”
  15. You scratch your head.
  16. You download the workbook, try refreshing the data, turning off the extract, deleting the extract, etc., and eventually reconnect to the data.
  17. But you repeat step #3 and have to go through the same thing over again. GRR. ARGH.
  18. After getting bruises on your forehead from slamming your head against your desk, you fire up your search engine, and hopefully find this post that reminds you to be careful of what order you do things in.

Or, here’s another scenario with a Tableau Server Published Data Source:

  1. Make a workbook
  2. Connect live to a file-based data source, like an Excel file.
  3. Publish the data source to Tableau Server, leaving the default “Include External Files” checked. This makes the live connection now static, since Tableau copies the data to Tableau Server.
  4. Reconnect to the Tableau Server Published Data Source.
  5. Build the viz.
  6. Publish the workbook to Tableau Server.
  7. [time passes]
  8. The original data source gets updated with new records.
  9. User tries to see the updated data on Tableau, and doesn’t.
  10. User emails you saying “Where’s my data?”
  11. You scratch your head, maybe go back to step 2 and repeat, and life is a world of pain and suffering until you realize that you need to turn off Include External Files.

Conclusion

I’m still figuring out what’s my ideal workflow, and there’s more that could be written about permissions, refreshes on Tableau Server, making and editing .tds files, and what happens when we download a workbook to edit or try to edit a Tableau Saved Data Source. So this topic will be revisited at other times, and I’d love to get your feedback in the comments below!

45 thoughts on “O Extract, Where Art Thou?

  1. Peter Gilks (@pgilks)

    My personal biggest annoyance with extracts is that sometimes I want to locate just the .tde file from an extract local to my laptop, maybe to share, or to experiment with. If we use point 2 from your Data Extracts section (after live connection), wouldn’t it be nice to be able to choose a file location? You can go and find the files, but they are always pretty difficult to locate e.g. C:\Users\peterg\AppData\Local\Temp\TableauTempez0s7o1msvn4e12sje3904qmksa\Data\SData\data.tde

    Reply
    1. Jonathan Drummey Post author

      Hi Peter, I totally agree that hunting through temp folders is painful. Usually I just unpackage the .twbx. However, here’s one other idea for sharing based on the research I did for this post:

      1. Right-click on the data source, choose Add to Saved Data Sources…
      2. Save as Type-> Tableau Packaged Data Source (.tdsx).
      3. Then you’d have the .tde as well as your metadata, and if you want just the .tde you can unpackaged the .tdsx.
      Reply
  2. Craig Bloodworth (@craigbloodworth)

    Hi Jonathan,

    Did I misread or is my assumption just wrong? I believed that the tde was the one thing that isn’t sucked into the postgres database, and instead it lives in a guid folder structure. Your post seems to suggest that EVERYTHING ends up in the Postgres db.

    Also Peter if you want to find the location of an extract linked to a data source check out the datasource.xml I seem to remember it being in there.

    Craig

    Reply
    1. Jonathan Drummey Post author

      Hi Craig,

      You’re right, thanks for the correction! I’m trying to figure out where I got confused, I think it was a combination of not testing/searching quite far enough and Tableau’s naming conventions. In past searches all I’d found were .tde’s in the Tableau Server temp folders that would disappear (presumably from Tableau Server housekeeping), so I’d assumed that the .tde’s were inside Postgres. I don’t think I’d ever searched on *.tde, I’d always looked for specific file names.

      Here’s a test case: I’d created a .twb, connected to a Tableau saved data source (the CoffeeChain data), extracted the CoffeeChain data into “foo.tde” on a local folder, then published the workbook. I could see “foo.tde” on a Tableau Server temp folder, and eventually it disappeared. However, there was a “Sample _ Coffee Chain _Access_ 8.tde” file inside …Tableau Server\data\tabsvc\dataengine\folder_[####]\upload_[GUID] with the right creation date. I did another test by creating a materialize-able calc in my original .twb, refreshing the extract, then republishing the workbook and I could again see the the foo.tde in the temp folder, along with an updated Sample _ Coffee Chain _Access_ 8.tde.

      Looking at all the .tde’s on my Server, it appears that Tableau Server will name the .tde’s in the dataengine\ folders based on the name of a saved data source, a Tableau-assigned name like “msaccess_41271_1234567890123456.tde”, and sometimes the name I’ve given them in Tableau Desktop. I’m not at all clear on what algorithm Tableau is using for that.

      I’ll make an update to the post, thanks again!

      Jonathan

      Reply
  3. Chris Gerrard

    Whew. I’m exhausted just reading through the subtleties, complexities, and consequences, trying to keep everything straight. Thanks for puzzling all this out, getting it organized, and putting it in print so we can hang onto it.

    There are some other really interesting (weird) Tableau file behaviors. Like Tableau’s annoying mechanisms of locating files – it’s particularly irksome when you want to save a file into the location it came from, or into the ‘working’ directory but Tableau offers up as the “Save As…” a location that makes no sense from a human standpoint. I figured out some of this once upon a time: http://tableaufriction.blogspot.com/2013/04/um-wheres-this-workbook.html

    I’m currently immersed in the memories of TDS files vis-a-vis published Data Sources. It’s not completely clear yet, but the evidence is very strong that published Data Sources are weird in some unhelpful ways.
    The short version is that Tableau Server treats published Data Sources differently depending upon whether the source is a TDS file (and there are flavors there) or a previously published data source. This poses a problem with my current client because there’s a need to keep four versions of the same schema/metadata as the data undergoes continuous evolution, and need to use published Data Sources to avoid the problem with Web Editing ignoring embedded credentials with regular data connections (another problem needing fixing).
    The idea to use TDSs to synchronize the metadata is appealing because of the simplicity of automating the synchronization. But Tableau Server doesn’t propagate TDS-sourced Data Sources to the workbooks that reference it like it does “real” Data Sources that have been published and/or previously downloaded.
    (look at me – monologing again)

    The data file management rabbit hole spirals down ever deeper. Maybe we’ll reach to bottom some day.

    Reply
  4. Irfan

    Hi Jonathan

    I have read your post a number of times now, but i am still unable to find my answer. My scenario is close to what you have stated below:

    Or, here’s another scenario with a Tableau Server Published Data Source:

    Make a workbook
    Connect live to a file-based data source, like an Excel file.
    Publish the data source to Tableau Server, leaving the default “Include External Files” checked. This makes the live connection now static, since Tableau copies the data to Tableau Server.
    Reconnect to the Tableau Server Published Data Source.
    Build the viz.
    Publish the workbook to Tableau Server.
    [time passes]
    The original data source gets updated with new records.
    User tries to see the updated data on Tableau, and doesn’t.
    User emails you saying “Where’s my data?”
    You scratch your head, maybe go back to step 2 and repeat, and life is a world of pain and suffering until you realize that you need to turn off Include External Files
    ??

    what i have done is:

    1. Make a Workbook
    2. connect live to Hadoop Hive DB, extracted subset of data by applying filters to extract (tde) and saved the data source as a tdsx file. (not sure if i need to publish tds or tdsx)
    3. published the data source (tdsx) to Tableau server keeping the default “include external files” checked. (i guess an extract is an extrernal file and i am not sure if “include external file’ should eb checked)
    4. reconnect to tableau server published data source. (at this point when i right click the data source, i do not see ‘use extract’ option checked just like the way we see a live connection)
    5. Build viz
    6. publish the workbook to Tableau server (not sure if it sould be twb or twbx)
    7. schedule the published data source (tdsx) to refresh the extract
    8. when the extract gets refresh, it should update the data in my workbook on the server.

    I would really appreciate if you could confirm if this approch is correct with a clarification of few doubts i have. This would help me a lot

    Thanks

    Reply
    1. Jonathan Drummey Post author

      [This thread relates to http://community.tableausoftware.com/message/322529 on the Tableau forums]

      Hi Irfan,

      The set of notes that you copied from my post are not appropriate to your case, because Hadoop is not a file-based data source. I’m guessing that you made that connection because of believing that you had to work with a .tds or .tdsx. I’m not sure where you got the impression that you had to save the data source out as a .tds or .tdsx before publishing it to Tableau Server, if you know can you tell me? I’ve now seen a few Tableau forum posts saying that and working with .tds/.tdsx in conjunction with published data sources can introduce additional (and unnecessary, IMO) complexities.

      Here’s how I’d go about what you’re going for:

      1. Create a workbook.
      2. Connect to your Hadoop Hive DB.
      3. Generate the extract with any required data source filters.
      4. Build your data source in Tableau, including any calculated fields, assign colors, aliases, comments, etc.
      5. Refresh the extract. This will get the latest data and also optimize the extract to materialize record-level calculations so it will be as fast as possible.
      6. Publish the data source to Tableau Server and set up the refresh schedule.
      7. Save this workbook as a master of the Tableau data source in .twb format so you can edit it later.***
      8. Start a new workbook.
      9. Connect to your published data source.^^^
      10. Build your views & publish that new workbook.

      *** You actually have four options here for saving your “master Tableau data source”:
      * .twb – This is what I prefer, because it’s a) saving the key elements of the connection information to the DB, the data source filters, and all the calculated fields, colors, aliases, etc., b) is small (just some XML). The local extract that is created in step 3 is actually extraneous to the published data source since Tableau copies the extract to Tableau Server and refreshes the extract there. There’s also a timing argument to be made: In my case, once I’ve published the final workbook I may not touch the master data source for months, so when I need to go back to the master to update it (such as by adding a new calculated field, etc.) I’m likely going to have to refresh that local extract anyways. The steps for updating the .twb are 1) connect to the workbook, 2) refresh/rebuild the extract if necessary, 3) make updates, 4) save the workbook, 5) publish the data source and ignore the warning about overwriting the published data source.
      * .twbx – Now you could use a .twbx and this has the advantage of keeping both the .twb and the local extract in the same file, however for some people this can end up with too large a file (or too many too large files) due to the size of the extract. In my case, using a .twb gives me the opportunity to delete those local extracts and only rebuild them when I have to edit the master data source. This would use the same steps as the .twb.
      * .tds – Technically this is the smallest file that we could store the master Tableau data source in, however to work with it we have to 1) Create a workbook, 2) connect to the .tds, 3) refresh/rebuild the extract if necessary, 4) make updates, 5) save the .tds and ignore the warning about overwriting the saved data source, 6) publish the data source and ignore the warning about overwriting the published data source, 7) close the workbook and ignore the warning that Tableau will give you about not having saved the workbook. It’s a couple of extra steps and requires ignoring multiple warnings, which I don’t like to have as part of my own workflows.
      * .tdsx – This has the same issues as the .tds along with the potential size issue of the .twbx, and uses the same series of steps as the .tds.

      ^^^ In this scenario the published data source that is an extract. The original connection information to the Hadoop DB is known to Tableau Server but not available to the Tableau Desktop user, so “Use Extract” is not an option, the user of the published data source is only connecting to the published extract. My understanding is that this is intentionally part of the architecture of Tableau Server published data sources so a developer can create a data source with the necessary filters, etc. and then publish it without having to give access rights to the original data to every user of that published data source. That allows for very fine-grained control of who sees what.

      Note that there’s nothing here about using “Include external files” because in this case the original data source is a connection to your Hadoop database. Even though the extract is locally generated as a file, the extract is connected to the original data source (as part of the metadata in the Tableau workbook) and therefore isn’t really an “external file” as far as Tableau is concerned and Tableau won’t even give you the option to include external files. “Include external files” only matters when connecting to Excel, text, or Access files, or naked extracts (i.e. a pre-existing .tde file that doesn’t have the metadata connection info to the original source).

      Does that make sense?

      Jonathan

      Reply
      1. Jonathan Drummey Post author

        Hi Irfan,

        One more thing: Your original question on the forums was about having 5 workbooks that used workbook-specific extracts and how to replace them with a published data source. Once you go through the steps 1-7 that I propose, then you would:

        1. Open up one of the existing workbooks.
        2. Connect to the published data source.
        3. Use the Data->Replace Data Source option to replace the original data source with your published one.
        4. Validate that everything works and make corrections as necessary. For example, changing data source can cause color assignments to be lost.
        5. Save and/or publish the workbook.
        6. Repeat steps 1-5 for each workbook.

        Reply
      2. Irfan

        Thanks a lot Jonathan. This makes so much sense to me now. Based on your suggestions, i am going to do the implementation and will update you about my progress. Regarding your question about why i was saving the data source as tds/tdsx, that’s because i wanted to have a local copy of the data source that can also be used as a template for the other workbooks. The same way you are using a .twb file for storing the master data source. Initially i wasnt using centralized Tableau server data source. I had created an extract based data source and saved it locally as .tds. Using this local tds, i started creating my workbooks. whenever i wanted to edit my data source, i used to edit this .tds and then import it into each of my workbooks and do a replace data source. But i believe your approach of saving the master data source as a .twb file is better than saving it as a .tds file. Once again i appreciate your help. Thanks a lot.

        Reply
      3. Vishal

        Hi Jonathan, I have a bit of confusion, When I try to connect to published data source for creating a new workbook, Should I keep this connection as live or extract? as i dont want it to peform slow due to live connection, but whats the purpose of using published tdsx data source if i have to create extracts for all new workbook

        Reply
        1. Jonathan Drummey Post author

          Hi Vishal,

          You’re mixing terminology a bit and might be going through more steps than you need to. The first thing to keep in mind is that you can publish a Tableau workbook without having to separately publish your data sources, each workbook can have its own connections. The main reasons to use a published data source are that the Tableau metadata (calculated fields, aliases, color settings, etc.) can be shared and you can have more control over the connection, for example if you publish an extract for datasource X then there’s only one extract refresh required for a the published data source instead of needing to refresh that extract individually for every workbook that uses data source X. One drawback of published data sources is that they can be slower.

          The published data source itself can be live or an extract (and Tableau can automatically refresh the extracts on a schedule). My recommendation is to try a live connection first and only use extracts if the live connection is too slow, see my recent post To extract or not for details. I’m suspecting that you are publishing a live connection, not an extract. The steps to publishing an extract are: 1) Connect to the data source in Tableau Desktop. 2) Create an extract. 3) Right-click on the extract to publish it. 4) Connect to Tableau Server to to open a connection to the data source you just published. (Tableau 9.3 removes the need for step 4).

          When we connect to Tableau Server published data sources there are two things we can do: 1) If we click on the name of the data source then we connect to the published source on Server and the icon of the source in the Data window is the Tableau logo. 2) If we click on the down arrow in a circle button to the right of the name of the source then we download the source as a TDSX and connect to that. I suspect that’s what you’re doing because that’s the only place Tableau creates a TDSX as part of normal operations, what you need to do is #1 instead to use the published extract on Tableau Server.

          Is that making sense?

          Jonathan

          Reply
  5. Alexander Mou

    [WARNING] Before step 3 (replacing data source), open (or load) every dashboard in the workbook, and have the dashboard rendered. Otherwise, your quick filters will break. This way, your master data in the workbook will be kept unaffected by the source switching. Ref: http://community.tableausoftware.com/message/322446

    I suspect this may also have caused losing color mentioned above.

    This is a wrinkle that I am urging Tableau to fix. It has been a long time known issue, at least since March 2014. Only recently, Tableau started taking this seriously following a wave of bug reports regarding this issue.

    BTW, thanks for Jonathan’s excellent article!

    Reply
  6. Kathryn Sklar (Kat Wilson)

    Yup, just had the “Tableau Server Published Data Source” wrinkle experience. (On my first broadly distributed report, naturally.) Thank you Jonathan for putting all the information I needed to solve my problem in one place!!! [And thank you, paranoid me from last October, for filing this blog post in the “Tableau – you’ll need this later” Outlook folder.]

    Reply
  7. chris

    Hi Jonathan,

    Great overview of the Tableau data storage and connections. Hopefully you haven’t already answered this one as I’ve skimmed through the comment section and couldn’t see the question asked here.

    I work in a small group of just two analysts and a manager for a medium size company on the business side. My manager is pretty averse to any program language mixed in the process (us two analyst program in Python and SAS) and we are now converting all of our reporting from currently SAS queries then reports in Excel, to Business Objects(BO) data dumps to Excel and then appended to Tableau Data Extracts(TDE). He does not want any middle programming language bridging anywhere in the process.

    My manager sees us just using Tableau, but I am expanding our predictive analytics capabilities through other tools outside of Tableau (Python & R mostly, some SAS). The Excel data dumps from BO have a majority of the data I am looking for, but have changed over time and been added too so they are not uniform…it would be quicker for me to spend time writing large queries in BO. But we already have these account level TDEs that have rich, clean, tested, vetted, and well structured data in them, and the data they contain already gets me like 70% of the data I want for the other tools. Here’s my question, is there anyway to literally dump the data out of the TDE into a CSV file? Or would I have to recreate in worksheet a table of exactly what I want for all the fields, and then export the table that way some how?

    Reply
    1. Jonathan Drummey Post author

      There’s no way I know of to directly get data out of a TDE, Tableau has explicitly not supported that option. There are a variety of ways to get data out of Tableau:

      – at a worksheet level Copy->Crosstab or Data
      – also at a worksheet level Export->Crosstab or Data
      – select all the marks then right-click+View Data, then copy the summary or detail data
      – connect to the TDE then use View Data on the data source and copy that
      – build a worksheet with what you need, publish it to Tableau Server, then download the worksheet as a csv.

      One thing to be aware of is that the copy/paste functionality can use a huge amount of memory, and at least though Tableau 8.0 was limited to hundreds of thousands of rows because the MS JET engine would start throwing errors and/or crash. I haven’t tried it with later versions.

      Jonathan

      Reply
  8. JG

    Hi Jonathan,

    Thanks for the fantastic write-up!

    We’ve been using data sources published to Tableau Server (extracts of Oracle data), but we recently ran into an Oracle tablespace bug that has prevented an extract from refreshing.

    Our workaround has been to create a .tde file using an ETL tool (DMX) and save it on a network drive; we plan to set DMX on a schedule to overwrite that file and we’ll publish an extract to Server that has a UNC path to the file and schedule it to refresh.

    Before we do that, we’re trying to figure out the next step. The published extract on the server that we’re trying to replace has a ton of metadata associated with it and around 40 connected workbooks. t doesn’t seem possible to simply change the existing data source connection from Oracle to our new .tde file, so I’m wondering whether there’s another way to preserve the metadata. I tried creating a local copy of the server extract, saving as a .tds, and editing the connection in XML to point to the .tde file, but I wasn’t successful in preserving aliases and it felt hacky.

    Do you have any suggestions? I feel like I’m overlooking something simple.

    Thanks!
    JG

    Reply
    1. Jonathan Drummey Post author

      Hi JG,

      Other than hacking the XML, you can use the Replace Data Source function and it will mostly work, a key thing is to make sure that every calculated field is on at least one worksheet or else you might get the dreaded [Calculation_########] problem.

      Jonathan

      Reply
  9. HC Fong

    Hi Jonathan,
    I am quite new to Tableau Desktop and Tableau Server.

    I hope you can answer my questions:-

    1.
    I do have a Tableau Extract (data source file) that is scheduled in the Tableau server.
    I am wondering that if the extract is running and will that be impacting my users if they want to view the reports at the same time?

    2,
    Can I stop the extract if it is running? I do not want to stop it using the tabadmin command is there any GUI function in the tableau server that I can stop it?

    Thanks.

    HC Fong

    Reply
    1. Jonathan Drummey Post author

      Hi HC Fong,

      I missed the notification, sorry for the late reply! To answer your questions:

      1) The TDE refresh is going to impose an extra load on your server, how much depends on the server size and current user load. Users can interact with the viz the entire time, while their interactions may be slower while the extract is refreshing once it’s done then they will be seeing the fresh data. Monitoring your server during extract refreshes is the best bet on checking the performance. If you’re running the extract on an underlying database (i.e. not Excel or text files) then I also suggest you check out the Tableau white paper on improving performance at http://www.tableausoftware.com/learn/whitepapers/designing-efficient-workbooks. Making sure your extract is faster to load is certainly one way to reduce the impact on users.

      2) I’m sorry, I don’t know of any GUI function to stop an extract refresh.

      Jonathan

      Reply
  10. Manu Mohan C

    Hi Jonathan
    I have read your post and i have 2 approaches in my mind regarding my requirement. I need your help choosing one .

    Scenario 1
    1.Create a workbook
    2.connected live to SQL server
    3.Build all Viz
    4.Saved and Published as twb to server
    5.Later point in time due to performance related issue user wants only extracted data instead of live
    6.Hence i changed from live to extract and it asks for a place to save the tde file in my local drive
    7.After doing all these steps , i need to publish the workbook to server
    8. In this case i publish twb with extract to server

    Scenario 2
    1.Doing all the steps same up to step 6
    2.Instead of saving as twb , i saved it as twbx after making it as extract
    3.Then publishing the twbx with extract file to server

    Assume in both cases i opt to go with default option checked for “Include external files”

    which scenario is better suited for me if i need to refresh the data monthly (Data will get updated monthly)?

    Reply
    1. Jonathan Drummey Post author

      Hi Manu,

      Neither of your steps make sense exactly vis-a-vis your statement that you used “Include external files”. In Scenario 1 if you’ve included external files then if you’re downloading the workbook at the later point in time (step 5) you’ll get a TWBX, and then you wouldn’t get the TDE prompt in step 6. The only way I can explain that is if you have a TWB saved locally. However if that’s the case then when you go to publish in step 8 you won’t get the “include external files” option when you publish.

      In Scenario 2 since you’ve saved the workbook as a TWBX prior to publishing then you won’t get the include external files option either becasue the file is included in the TWBX.

      That said, which of the following do you want to do?

      1) Do you want Tableau Server to be able to automatically refresh the extract on a schedule?
      2) Do you want to do a manual extract refresh that you trigger from Tableau Server?
      3) Do you want to do a manual extract refresh where you have to download the workbook and republish?

      Jonathan

      Reply
      1. Manu Mohan C

        Hi Jonathan
        I want Tableau Server to be able to automatically refresh the extract on a schedule..
        is there any difference in publishing a workbook file as twb which is using an extract file and as a twbx file which is also using an extract(both are created from live connection to SQL server)?

        Reply
      2. Swe

        Hi Jonathan ,

        This is helpful . But I have scenario where I joined data from sql server with Excel and connection is Extract . When I publish the work book options I am getting for my SQL server connection is to allow refreshes and for excel file If I uncheck external data files include then I am getting message that all users may not be able to succeffully view the dashboards .

        Please advise how to address this

        Thanks

        Reply
        1. Jonathan Drummey Post author

          Hello Swe,

          When we publish live connections to file-based sources like Excel the following conditions must be met for Tableau Server users to be able to use the live connection:

          1) the file must be stored in a shared folder that is accessible by Tableau
          2) the Tableau Server run-as user must have permissions to read that file

          If either are not the case then the file will not be accessible. Tableau gives you the warning message because it’s all to easy for either of those conditions to fail. For example if the Excel file is connected to as M:\myShare\myFile.xls and M:\ is not mapped by the Tableau Server run-as user then the live connection will fail. Typically I use UNC paths like \\myServer\myShare\myFile.xls to connect to file-based sources so that way they will run successfully for both Tableau Desktop and Tableau Server.

          Reply
  11. Sharon Johnson

    Hi Jonathan,

    I’ve been reading all over the place, but can’t seem to figure out why published workbooks using a live connection to a published data source don’t reflect updates made the published data source – like renaming a field, etc. I thought this blog post might be the answer to all my problems, but am still not making the connection. I have a sneaking suspicion that it is related to embedded data sources and the fact that the published workbooks seem to have them, even when the connection is Live to the published data source.

    I’ve unchecked the include external files box. I’ve published the data source live and with an extract. Ditto for the workbook.

    Is this enough information for you to point me in the right direction? It seems that it must be some simple explanation – dare I say this about tableau server?

    Thank you,

    Reply
    1. Jonathan Drummey Post author

      Hi Sharon,

      My experience with problems like this is that I (or someone else with the necessary skills) needs to see exactly what steps you go through and what is happening to understand the problem and suggest a remedy, there are just too many variables. For example you mention “embedded data sources”, it’s possible for someone to connect to a published data source and end up with a local copy. I’d be glad to set up a screenshare to walk through this with you, my email is jonathan (dot) drummey (at) gmail.com

      Reply
      1. Sharon Johnson

        Hi, Thanks for the reply. Very strange as I didn’t change anything and it worked the next time I tried it. So…. not sure. I was a little confused about the embedded data sources, which sent me looking down the wrong path. But, I think I’m all straightened out now. On this item, anyway. On to today’s issue…

        Nice Blog. 🙂

        Reply
  12. Chris

    This resources is so very helpful, Thank YOU.

    I likely am missing it somewhere in here, but I wonder if I am connecting properly to refresh all my data daily.

    I have 3 data sources in my workbook.
    2 of the data sources are Oracle connections that I have extracts from.
    1 of the data sources is a Live connection (excel file updated daily) to a Network drive (using the UNC path).

    Currently, on the Tableau server, both(2) extracts for the Oracle connections will update daily (refresh daily scheduled) using the embedded passwords and such. However, the Excel live connection to the UNC path will not refresh. I assume its refreshing to the packaged “static” data instead of reaching out to where the real (updated) file lives. The viz appears but only has the static data from the publish date.

    I have, in a different workbook, been able to connect LIVE to the Excel file via the UNC path and when published to the server I “unchecked” the Include File option, the Excel live connections is refreshed successfully daily. So I know the Tableau server has access, and this method works.

    So, I have successfully published to the server all of these data sources separately and they refresh on desired schedules successfully in their own workbooks (oracle connection in 1 workbook, and the excel live in a separate workbook).

    However, I am not sure how to keep the oracle extracts and the Live excel connections refreshing daily TOGETHER in one workbook.

    Do I need to understand server published data sources more?

    What am I missing?

    Reply
    1. Jonathan Drummey Post author

      Hi Chris,

      You are mixing and matching terminology a bit and taking that apart could take awhile, so I’m just going to focus on your first three paragraphs since that seems to be the core of the problem.

      First of all you need to check your Tableau Server cache settings. The default setting is to refresh less often, so once you’re sure the Excel data has updated with some new values in the original file then try out the instructions here http://kb.tableau.com/articles/issue/view-out-of-date for refreshing the view. If you’re seeing the updated Excel data then the problem you’re seeing is due to your cache settings.

      If you’re seeing stale data at this point then here’s what to test:

      – Open the workbook from Tableau Server in Tableau Desktop.
      – Right-click on the data source and choose Properties… The Connection Properties window will display. If the File Name property is showing the correct UNC path to your data then there’s something else going on (like there’s actually a data extract in place). I strongly suspect that the File Name property won’t be showing the UNC path and will instead be showing a path to a temp folder.
      – In that case there are two possible causes:
      1) Most likely the workbook was published with Include external files checked on (the default). This has to be unchecked *every* time the workbook is published, unfortunately it’s all too easy for me to forget to uncheck it at one point and then boom, the live connection is gone. The solution in this case is to reconnect to the UNC path and republish and make sure to turn off Include external files.***
      2) The workbook was created as a .twbx (Tableau Packaged Workbook). When the workbook is a .twbx any file-based data sources are placed in the workbook and become static. The solution here is to save the workbook out as a .twb and then publish as above.

      *** A more permanent workaround is to publish the data source to Tableau Server (not within a workbook, but right-click on the data source and choose Publish to Server…), make sure you turn off the Include external files checkbox, and then in your workbook connect to the Published Data Source instead of the file. You’ll lose a tiny bit of performance this way and have some maintenance overhead but you won’t have to remember to turn off that checkbox every time you make a change to your workbook.

      Let me know if this helps!

      Jonathan

      Reply
  13. Pingback: How to share twbx file so that other party doesn't have to have the excel files and can refresh the data? | ASK Dev Archives

  14. Monica

    Thank you so much for this resource. I must confess I have to reread it a few times to really get it. And I have referred many people to this. Has this changes much with 10.5 Hyper extracts? It doesn’t seem to of changed to me but I wanted to get your perspective.
    thanks

    Reply
    1. Jonathan Drummey Post author

      Hi Monica,

      It’s changed a little. With 10.5 the extracts no longer automatically materialize calculated fields, we can force that with the Extract->Compute Calculations Now option. With 2018.3 we have the option to have multi-table extracts where instead of one flat table the extract contains the individual tables.

      Reply
  15. Camille Heinen

    Jonathan –
    Thanks so much for this write up. I have a question about refreshing .csv file data sources.

    Here are the steps I’d like to take:

    1. .csv file is generated and stored on a Network Attached Server (NAS) drive. (Tableau run-as ID has access to this drive).

    2. Create a .twb file on Tableau desktop that connects to the .csv (using full UNC path).

    3. Set connection type to Extract. Tableau creates .hyper file and I save it in the same location.

    4. Publish datasource to Tableau server.

    [.csv file changes]

    5. Refresh data source on Tableau Server.

    My question is that when the refresh happens in step 5, do you know what Tableau is doing behind the scenes. When it reaches out to NAS drive to read the .csv file, does it use some sort of file transfer mechanism? SFTP? (If you couldn’t tell, my enterprise’s security department is very thorough…they are having questions before they grant access to the Tableau run-as id to the NAS drive).

    Thank you for your help!

    Reply
    1. Jonathan Drummey Post author

      Hi Camille, in step 4 when the data source is published the .hyper file is copied to the Tableau Server. So the Tableau Published Data Source has three things: 1) the .hyper file, 2) the UNC path from your step 2, and 3) any other metadata that you’ve added like default formatting, colors, calculations, etc.

      For Step 5 just to be totally clear, there are two refresh definitions in Tableau. “Refresh a data source” typically means re-querying a live connection or (if an extract is used) the extract. I’m guessing you mean the second definition which is “Refresh the extract” to pull the updated/new data into the extract, this can be manually or scheduled on Tableau Server. In the latter case Tableau Server is using the same SMB file sharing mechanism that Tableau Desktop does. So whatever permissions are set up on the NAS are respected by Tableau Server.

      Hope this helps!

      Jonathan

      Reply
  16. Kelly

    Hi Jonathan,

    This post has been a really great resource for me, I appreciate you putting it together.

    I keep running into an issue with publishing my data to Tableau in the twb format. If I uncheck “include external files” I always get the “unable to connect to the data source” error, no matter where my data source is saved. If I keep it checked though, it won’t pull in fresh data.

    You mention UNC’s in your post – how do I go about creating those for a source in a network drive and how do I change that in Tableau? Do you have anything else I could try to make this work?

    Thank you for your help!

    Reply
    1. Jonathan Drummey Post author

      Hi Kelly,

      Sorry for the late reply, comments weren’t working properly for awhile. When publishing your workbook as a TWB that means that you’re only publishing the XML, not the data. So the workbook most likely has a link to a file like S:\mydatafolder\mydata.xlsx and that location is not accessible to Tableau Server. One solution is to set up that network drive on Tableau Server, the other is a UNC path which will be something like \\machine\sharename\mydatafolder\mydata.xlsx. You can find the \\machine\sharename in Windows by opening Windows Explorer and looking at your shared drive mappings because they will show that location.

      Reply

Please add your thoughts and perspectives