Throughout this blog, I will be using several public data sets to build out some examples. I particularly like the ones hosted by Exasol, which you can find here: Dataviz – Exasol. Simply sign up, and you will receive a username and password via e-mail, which you can then use to connect in Tableau or any other BI tool you prefer. You can even download the corresponding .tds files for most of the data sets, which will save you the trouble of defining the joins and renaming the fields in Tableau; another sign of how closely Exasol and Tableau work together.
One reason I like these data sets is that they are not just csv or text files; the data is actually hosted on an Exasol instance. With the advent of database technologies like Exasol and Snowflake (and I expect Tableau’s own Hyper technology to become a competitor in this area), we will see users partly move away from generating extracts in Tableau and instead connect live to these databases to leverage their speed and power. In fact, some of Tableau’s competitors are already fully committed to that strategy. Looker, for instance, doesn’t have any extract functionality at all; it only allows the end user to connect live to data sources.
I have been fortunate enough to work with an Exasol data warehouse in the past. Being able to build data sources on a live connection makes development and maintenance much easier to manage. On the flip side, building live data sources requires a good amount of knowledge of data modelling and SQL. Traditionally, Tableau has focused more on the front-end user than on the back-end. In that sense, it is different from Looker, which gives the developer more control over the actual coding (full disclosure: this is based on what I’ve seen so far, since I’ve never worked with Looker). Tableau doesn’t readily allow custom coding; instead, it translates its own internal functions into SQL. It therefore becomes important to understand what Tableau is really doing in the back-end.
What SQL does Tableau write when you use a level-of-detail calculation? What about a set? How do they differ from a table calculation? And how do these queries perform in certain situations? These are the types of questions you’ll find yourself answering when developing and maintaining live connection data sources.
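To make the first of those questions concrete, here is a minimal, runnable sketch of the pattern a FIXED level-of-detail calculation such as `{FIXED [customer] : SUM([sales])}` typically compiles to: an aggregating subquery joined back to the row-level data. I am using SQLite and an invented `orders` table purely for illustration; the SQL Tableau actually generates varies by version and data source.

```python
import sqlite3

# Invented demo table -- stands in for whatever table Tableau is querying.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, product TEXT, sales REAL);
    INSERT INTO orders VALUES
        ('Alice', 'Chair', 100.0),
        ('Alice', 'Desk',  250.0),
        ('Bob',   'Lamp',   40.0);
""")

# The FIXED LOD pattern: aggregate per customer in a subquery,
# then join the result back onto every detail row.
rows = conn.execute("""
    SELECT o.customer, o.product, o.sales, lod.customer_sales
    FROM orders AS o
    INNER JOIN (
        SELECT customer, SUM(sales) AS customer_sales
        FROM orders
        GROUP BY customer
    ) AS lod
      ON o.customer = lod.customer
    ORDER BY o.customer, o.product
""").fetchall()

for row in rows:
    print(row)
# Both of Alice's rows carry her fixed total of 350.0; Bob's carries 40.0.
```

Note how the LOD value repeats on every row at its fixed level of detail, which is exactly why it behaves differently from a table calculation computed over the query result.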
Which brings me to another reason why I like using the Exasol public data sets: Exasol also gives you access to the user views in the EXA_STATISTICS schema, where you can track the execution plans of the queries you’ve run. For example, this is the type of information you can retrieve when you run the query
SELECT COUNT(*) FROM CHICAGO_TAXI.TAXI_TRIPS
So Exasol first scans the entire TAXI_TRIPS table (0.008 sec), then aggregates the data into a single row (0.003 sec). We’ll explore execution plans in more detail when we go more in-depth on dashboard performance.
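If you want to pull those step timings yourself, something along these lines should work. Treat it as a sketch rather than a verified recipe: the view and column names (EXA_USER_PROFILE_LAST_DAY with PART_NAME, OBJECT_NAME and DURATION) come from my reading of Exasol’s documentation, and profiling has to be switched on for the session first.

```sql
-- Enable profiling for the current session (Exasol syntax).
ALTER SESSION SET PROFILE = 'ON';

-- ... run the query you want to profile, e.g.
-- SELECT COUNT(*) FROM CHICAGO_TAXI.TAXI_TRIPS;

-- Then inspect the per-step profile of your recent statements.
SELECT stmt_id,
       part_id,
       part_name,     -- e.g. SCAN, GROUP BY
       object_name,   -- e.g. TAXI_TRIPS
       duration       -- seconds spent in this step
FROM   exa_statistics.exa_user_profile_last_day
WHERE  session_id = CURRENT_SESSION
ORDER  BY stmt_id, part_id;
```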
However: not all databases are created equal. Just because one query performs better than another in Exasol doesn’t mean the two will behave the same way in another database. Therefore, I will likely sometimes use other databases to build my examples, such as Snowflake, BigQuery or even PostgreSQL, but we’ll tackle that when it happens.